
Neural Network Embeddings: Learning Meaningful Representations

Course: Advanced Concepts
Understanding embeddings in neural networks: from discrete entities to continuous vector spaces.

Author: Rémi Genet
Published: 2025-04-03

Embeddings: Mapping Entities to Vector Spaces

Section 4.13 - The Embedding Principle

Neural networks operate on continuous numerical values, yet many real-world inputs are discrete entities: words, categories, user IDs, or market symbols. Embeddings solve this fundamental mismatch by learning continuous vector representations of discrete entities.

An embedding is formally a mapping function \[ E: X \to \mathbb{R}^d \] that transforms elements from a discrete set \( X \) into \( d \)-dimensional real vectors. The key insight is that these vectors are learned during training to capture meaningful relationships between entities.
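As a minimal illustration, Keras provides an Embedding layer that implements exactly this kind of learned lookup; the vocabulary size (1000) and dimension \( d = 16 \) below are illustrative choices, not values prescribed by the course:

```python
import numpy as np
import keras

# A learned mapping E: {0, ..., 999} -> R^16 (sizes are illustrative).
embedding = keras.layers.Embedding(input_dim=1000, output_dim=16)

ids = np.array([[3, 42, 7]])   # a batch containing three discrete entity indices
vectors = embedding(ids)       # shape (1, 3, 16): one 16-dimensional vector per entity
print(vectors.shape)
```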

Section 4.14 - Mathematical Framework

One-Hot Encoding Limitation

Traditional one-hot encoding represents a categorical variable with \( n \) possible values as:

\[ e_i = [0, \ldots, 0, 1, 0, \ldots, 0] \in \mathbb{R}^n \]

where the 1 appears in the \( i \)-th position. This representation has several limitations:

  • Dimensionality grows linearly with vocabulary size.
  • It provides no notion of similarity between entities.
  • The sparse representation wastes computational resources.
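To see these limitations concretely, here is a small NumPy sketch with a toy vocabulary of six entities (the size is arbitrary):

```python
import numpy as np

# One-hot encoding for n = 6 categories: each entity is a length-6 sparse vector.
n = 6
one_hot = np.eye(n, dtype="float32")

print(one_hot[2])               # [0. 0. 1. 0. 0. 0.]
print(one_hot[0] @ one_hot[3])  # 0.0 — every pair of distinct entities is equally dissimilar
```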

Learned Embeddings

Instead, we learn a dense embedding matrix \[ W \in \mathbb{R}^{d \times n} \] where \( d \ll n \). For an input \( i \), its embedding becomes:

\[ x_i = W e_i \in \mathbb{R}^d. \]

This transformation offers several advantages:

  1. Reduced Dimensionality: \( d \ll n \).
  2. Dense Representation: enables efficient computation.
  3. Learned Similarities: captures relationships between entities.
  4. Continuous Space: supports gradient-based optimization.
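The lookup can be made concrete with a small NumPy sketch (the toy sizes \( n = 5 \), \( d = 3 \) are chosen only for illustration): multiplying by the one-hot vector and reading a single column of \( W \) give the same result, which is why frameworks implement embeddings as table lookups rather than matrix products.

```python
import numpy as np

# Toy embedding matrix W in R^{d x n} (d = 3, n = 5, illustrative sizes).
n, d = 5, 3
rng = np.random.default_rng(0)
W = rng.standard_normal((d, n)).astype("float32")

i = 2
e_i = np.zeros(n, dtype="float32")
e_i[i] = 1.0                   # one-hot vector e_i

x_dense = W @ e_i              # x_i = W e_i via an explicit (wasteful) matrix product
x_lookup = W[:, i]             # the same vector, read directly as column i of W

assert np.allclose(x_dense, x_lookup)
```

Keras stores the transpose of this matrix (an \( n \times d \) table) and reads row \( i \), which is the same operation.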

Section 4.15 - Training Embeddings

Embeddings are learned end-to-end with the neural network through gradient descent. For a loss function \( L \), the gradient with respect to the embedding parameters flows through:

\[ \frac{\partial L}{\partial W} = \frac{\partial L}{\partial x_i} \cdot \frac{\partial x_i}{\partial W} = \frac{\partial L}{\partial x_i} \cdot e_i^\top. \]

This learning process adjusts the embeddings to minimize the task loss while capturing useful relationships between entities.
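A rough end-to-end sketch is shown below; the layer sizes, toy regression task, and random data are illustrative assumptions, not part of the course material. Gradient descent updates the embedding table through the lookup exactly as in the formula above.

```python
import numpy as np
import keras

n_entities, d = 100, 8   # illustrative vocabulary size and embedding dimension

emb = keras.layers.Embedding(input_dim=n_entities, output_dim=d)
model = keras.Sequential([
    emb,
    keras.layers.Flatten(),
    keras.layers.Dense(1),   # a toy regression head on top of the embedding
])
model.compile(optimizer="adam", loss="mse")

ids = np.random.randint(0, n_entities, size=(256, 1))   # random entity indices
y = np.random.randn(256, 1).astype("float32")           # toy targets
model.fit(ids, y, epochs=2, batch_size=32, verbose=0)

W = emb.get_weights()[0]   # learned embedding matrix, shape (n_entities, d)
```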

Geometric Interpretation

The geometry of the embedding space reflects semantic relationships. For instance, if entities \( a \) and \( b \) are represented by vectors \( x_a \) and \( x_b \), their relationship can be measured by:

  1. Euclidean Distance: \[ \| x_a - x_b \|_2. \]
  2. Cosine Similarity: \[ \frac{x_a \cdot x_b}{\|x_a\|_2 \|x_b\|_2}. \]
  3. Dot Product: \[ x_a \cdot x_b. \]

These metrics capture different aspects of similarity in the embedded space.
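As a quick illustration, the three metrics can be computed directly on two hypothetical embedding vectors:

```python
import numpy as np

# Two made-up embedding vectors standing in for x_a and x_b.
x_a = np.array([0.2, -1.0, 0.5])
x_b = np.array([0.1, -0.8, 0.7])

euclidean = np.linalg.norm(x_a - x_b)
cosine = x_a @ x_b / (np.linalg.norm(x_a) * np.linalg.norm(x_b))
dot = x_a @ x_b

print(f"euclidean={euclidean:.3f}, cosine={cosine:.3f}, dot={dot:.3f}")
```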

Section 4.16 - Applications Beyond Categorical Variables

While embeddings were initially developed for categorical variables, their applications extend much further.

Numerical Feature Embedding

Even continuous features can benefit from embeddings. For a numerical value \( v \), we can learn a nonlinear embedding:

\[ E(v) = W_2\, \sigma(W_1 v + b_1) + b_2, \]

where \( \sigma \) is a nonlinear activation function. This allows the network to learn a more expressive representation of the feature.
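One way to realize this is a small two-layer network applied to the scalar feature; the hidden width, output dimension, and choice of ReLU for \( \sigma \) below are illustrative assumptions:

```python
import numpy as np
import keras

def make_numerical_embedding(hidden=32, d=8):
    # E(v) = W2 * sigma(W1 * v + b1) + b2, with sigma = ReLU here.
    return keras.Sequential([
        keras.layers.Dense(hidden, activation="relu"),
        keras.layers.Dense(d),
    ])

embed_v = make_numerical_embedding()
v = np.array([[0.5], [1.2], [-0.3]], dtype="float32")   # scalar feature values
print(embed_v(v).shape)   # (3, 8): one 8-dimensional embedding per value
```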

Time Embeddings

In sequential models, time itself can be embedded. Given a timestamp \( t \), we can create positional embeddings:

\[ E(t) = \bigl[\sin(\omega_k t),\, \cos(\omega_k t)\bigr]_{k=1}^{d/2}, \]

where the \( \omega_k \) are different frequencies. This formulation captures both absolute and relative temporal positions.
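A possible NumPy sketch of such sinusoidal time embeddings is given below; the geometric frequency schedule for the \( \omega_k \) mirrors the common Transformer choice and is only one way to pick the frequencies:

```python
import numpy as np

def time_embedding(t, d=16, max_period=10000.0):
    # d/2 frequencies omega_k, each contributing a sin and a cos component.
    k = np.arange(d // 2)
    omega = 1.0 / (max_period ** (2 * k / d))
    angles = np.outer(t, omega)                                       # (len(t), d/2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)  # (len(t), d)

t = np.arange(5, dtype="float32")
print(time_embedding(t).shape)   # (5, 16)
```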

Entity Embeddings

In financial applications, market symbols, sectors, or other discrete entities can be embedded to capture inherent relationships:

  • Assets within the same sector should have similar embeddings.
  • Companies with similar market behavior should be close in embedding space.
  • The embedding can capture complex relationships not explicitly encoded in the data.

The dimensionality \( d \) of these embeddings is a hyperparameter that balances:

  • Representational Capacity (favoring larger \( d \)),
  • Computational Efficiency (favoring smaller \( d \)), and
  • Generalization (too large a \( d \) risks overfitting).
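As a hypothetical illustration, a handful of market symbols could be mapped through a small embedding table (here \( d = 4 \)) feeding a downstream prediction head; the tickers, dimension, and head are placeholder choices, and in practice the table would be trained jointly with a real task:

```python
import keras

# Map each ticker to an integer id, then embed the ids.
tickers = ["AAPL", "MSFT", "GOOGL", "XOM", "CVX"]
ticker_to_id = {t: i for i, t in enumerate(tickers)}

inputs = keras.Input(shape=(1,), dtype="int32")                        # one asset id per sample
x = keras.layers.Embedding(input_dim=len(tickers), output_dim=4)(inputs)
x = keras.layers.Flatten()(x)
outputs = keras.layers.Dense(1)(x)                                     # e.g. a return-prediction head
model = keras.Model(inputs, outputs)
model.summary()
```

After training, assets that behave similarly with respect to the task should end up with nearby embedding vectors under the metrics discussed above.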

The power of embeddings lies in their ability to learn meaningful representations automatically from data, capturing complex relationships in a form that neural networks can effectively process.
