Encoder-Decoder Architectures

Course · Advanced Concepts

Understanding encoder-decoder architectures: principles, mathematics, and applications in sequence-to-sequence tasks.

Author: Rémi Genet
Published: 2025-04-03

Encoder-Decoder Architectures: Processing Sequential Data

Section 4.22 - The Sequence-to-Sequence Challenge

Many real-world problems involve transforming one sequence into another sequence, potentially of different lengths. Examples include machine translation (sequence of words to sequence of words), time series forecasting (past values to future values), and music generation (audio features to audio features). This presents unique challenges that cannot be addressed by simple feed-forward architectures.

The fundamental difficulty lies in creating a fixed-size representation of variable-length input that contains sufficient information to generate variable-length output. The encoder-decoder architecture emerged as an elegant solution to this challenge.

Section 4.23 - Mathematical Framework

The encoder-decoder architecture decomposes the sequence transformation problem into two phases. For an input sequence \( x = (x_1, \ldots, x_n) \), we want to generate an output sequence \( y = (y_1, \ldots, y_m) \), where \( n \) and \( m \) can be different.

The Encoding Phase

The encoder processes the input sequence to create a context vector ( c ):

$$c = f_{\text{enc}}(x_1, \ldots, x_n)$$

where \( f_{\text{enc}} \) is typically a recurrent neural network that produces both outputs and a final state:

$$h_t, s_t = f_{\text{enc}}(x_t, s_{t-1})$$

The final state \( s_n \) serves as the context vector \( c \), capturing the entire input sequence's information.
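
As a rough illustration, the sketch below implements this encoding phase in Keras 3 with an LSTM; the sequence length, feature count, and latent dimension are hypothetical placeholders, not values from the course.

```python
import keras
from keras import layers

n_steps, n_features, latent_dim = 24, 8, 64  # hypothetical dimensions

# Encoding phase: an LSTM reads the full input sequence x_1, ..., x_n
encoder_inputs = keras.Input(shape=(n_steps, n_features))
# return_state=True exposes the final hidden state h_n and cell state c_n
encoder_outputs, state_h, state_c = layers.LSTM(latent_dim, return_state=True)(encoder_inputs)
context = state_h  # the final hidden state plays the role of s_n = c
```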

The Decoding Phase

The decoder generates the output sequence conditioned on the context vector:

$$y_t = f_{\text{dec}}(c, y_{t-1}, s'_{t-1})$$

where:

  • \( s'_{t-1} \) is the decoder's internal state,
  • \( y_{t-1} \) is the previous output,
  • the initial state \( s'_0 \) is initialized with the context vector \( c \).
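
A complementary sketch of the decoding phase, again with hypothetical dimensions: the decoder LSTM is initialized with the context (here received as the encoder's final states) and, during training, consumes the shifted targets \( y_{t-1} \) as inputs (teacher forcing).

```python
import keras
from keras import layers

n_out, d_out, latent_dim = 12, 1, 64  # hypothetical output length, output dim, state size

# The context vector c is received here as the encoder's final LSTM states
context_h = keras.Input(shape=(latent_dim,))
context_c = keras.Input(shape=(latent_dim,))

# During training the decoder consumes the shifted targets y_{t-1} (teacher forcing)
prev_outputs = keras.Input(shape=(n_out, d_out))
decoder_states = layers.LSTM(latent_dim, return_sequences=True)(
    prev_outputs, initial_state=[context_h, context_c]
)
outputs = layers.Dense(d_out)(decoder_states)  # map decoder states s'_t to outputs y_t

decoder = keras.Model([context_h, context_c, prev_outputs], outputs)
```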

Section 4.24 - State Transfer and Conditioning

A crucial aspect of encoder-decoder architectures is how the encoder’s information is transferred to the decoder. The most common approaches are:

  1. State Transfer: The encoder’s final state initializes the decoder:

    $$s'_0 = c = s_n$$

  2. Context Conditioning: The context vector is used at each decoding step:

    $$y_t = f_{\text{dec}}(c, y_{t-1}, s'_{t-1})$$

When using LSTMs, both the cell state and hidden state are transferred:

$$(h'_0, c'_0) = (h_n, c_n)$$

This dual state transfer helps maintain both short-term and long-term dependencies.
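
Putting the pieces together, here is a minimal end-to-end sketch (all sizes hypothetical) that makes both strategies concrete: the encoder's final LSTM states \( (h_n, c_n) \) initialize the decoder (state transfer), and the context is additionally repeated and concatenated with the decoder inputs at every step (context conditioning).

```python
import keras
from keras import layers

n_in, n_out, d_in, d_out, latent_dim = 24, 12, 8, 1, 64  # hypothetical dimensions

# Encoder: keep the final hidden and cell states (h_n, c_n)
encoder_inputs = keras.Input(shape=(n_in, d_in))
_, state_h, state_c = layers.LSTM(latent_dim, return_state=True)(encoder_inputs)

# Context conditioning: repeat c = h_n over the m decoding steps and
# concatenate it with the previous outputs y_{t-1}
decoder_inputs = keras.Input(shape=(n_out, d_out))
repeated_context = layers.RepeatVector(n_out)(state_h)
decoder_in = layers.Concatenate(axis=-1)([decoder_inputs, repeated_context])

# Dual state transfer: (h'_0, c'_0) = (h_n, c_n)
decoder_seq = layers.LSTM(latent_dim, return_sequences=True)(
    decoder_in, initial_state=[state_h, state_c]
)
outputs = layers.Dense(d_out)(decoder_seq)

model = keras.Model([encoder_inputs, decoder_inputs], outputs)
model.compile(optimizer="adam", loss="mse")
```

In practice a model often uses only one of the two mechanisms; combining them here is purely to illustrate both formulas in a single graph.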

Section 4.25 - Applications and Variants

Time Series Forecasting

In time series forecasting, the architecture processes known past values to predict future values. The encoder processes the historical sequence, while the decoder generates predictions using:

  • the encoded historical context,
  • any known future information (such as calendar features).

The mathematical formulation becomes:

$$y_t = f_{\text{dec}}(c, [y_{t-1}, k_t], s'_{t-1})$$

where \( k_t \) represents known future features at time \( t \).
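
A hedged sketch of this forecasting setup, where the decoder input at step \( t \) concatenates the previous target \( y_{t-1} \) with the known future features \( k_t \) (for example calendar variables); the horizons, feature counts, and input names below are illustrative assumptions.

```python
import keras
from keras import layers

n_past, n_future = 48, 12            # hypothetical look-back and forecast horizons
d_past, d_known, latent_dim = 6, 4, 64

# Encoder over the historical sequence
past_inputs = keras.Input(shape=(n_past, d_past), name="past_values")
_, state_h, state_c = layers.LSTM(latent_dim, return_state=True)(past_inputs)

# Decoder inputs: previous targets y_{t-1} concatenated with known future features k_t
prev_targets = keras.Input(shape=(n_future, 1), name="previous_targets")
known_future = keras.Input(shape=(n_future, d_known), name="known_future_features")
decoder_in = layers.Concatenate(axis=-1)([prev_targets, known_future])

decoder_seq = layers.LSTM(latent_dim, return_sequences=True)(
    decoder_in, initial_state=[state_h, state_c]
)
forecast = layers.Dense(1)(decoder_seq)

model = keras.Model([past_inputs, prev_targets, known_future], forecast)
model.compile(optimizer="adam", loss="mse")
```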

Machine Translation

In translation tasks, the encoder processes the source-language sentence while the decoder generates the target-language translation. The architecture learns to:

  • encode semantic meaning from the source language,
  • generate grammatically correct sequences in the target language,
  • maintain the original message's intent.

Voice Conversion

For audio processing tasks, the encoder-decoder architecture can transform vocal characteristics while preserving linguistic content. The encoder captures phonetic and prosodic features, while the decoder reconstructs the audio with modified characteristics.

Section 4.26 - Theoretical Properties

The encoder-decoder architecture possesses several important theoretical properties:

  1. Information Bottleneck: The context vector \( c \) creates a controlled bottleneck, forcing the model to learn efficient representations of the input sequence.

  2. Variable Length Handling: The architecture naturally accommodates input and output sequences of different lengths without requiring padding or truncation.

  3. Temporal Abstraction: The encoding phase can learn to abstract temporal patterns at multiple scales, while the decoding phase can generate sequences with different temporal characteristics.

This architectural pattern has become fundamental in sequence processing tasks, particularly when combined with attention mechanisms that allow the decoder to selectively focus on different parts of the input sequence during generation.

Section 4.27 - Example Architecture

Figure: TKAN Structure