Long Short-Term Memory Networks (LSTM)

Course: Deep Learning
Understanding the mathematics and principles behind LSTM networks, their historical significance, and practical considerations.
Author: Remi Genet
Published: 2025-04-03

Long Short-Term Memory Networks: The Backbone of Sequence Processing

Section 2.4 - Historical Context and Significance

Origins and Evolution

LSTMs were introduced by Hochreiter & Schmidhuber in 1997, yet they remain one of the most effective sequence processing architectures. Their endurance in the field stems from:

  1. Robust Architecture:
    • Carefully designed gating mechanisms
    • Stable gradient flow
    • Explicit memory management
  2. Practical Success:
    • Proven effectiveness in time series
    • Strong performance in financial forecasting
    • Reliable training behavior
Note

Despite being over 25 years old, LSTMs often outperform newer architectures in many financial applications, particularly in volatility forecasting and trend prediction.

Section 2.5 - LSTM Architecture Deep Dive

Core Components

An LSTM cell maintains two states:

  1. Cell State (\(c_t\)): long-term memory
  2. Hidden State (\(h_t\)): current output / short-term state

Figure: LSTM Cell Structure

Mathematical Formulation

The LSTM updates these states in four steps, built around three gates:

  1. Forget Gate (determines what to remove from cell state): \[ f_t = \sigma\Bigl(W_f \cdot [h_{t-1}, x_t] + b_f\Bigr) \]

  2. Input Gate (determines what new information to store): \[ i_t = \sigma\Bigl(W_i \cdot [h_{t-1}, x_t] + b_i\Bigr) \] \[ \tilde{c}_t = \tanh\Bigl(W_c \cdot [h_{t-1}, x_t] + b_c\Bigr) \]

  3. Cell State Update: \[ c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \]

  4. Output Gate (determines what parts of cell state to output): \[ o_t = \sigma\Bigl(W_o \cdot [h_{t-1}, x_t] + b_o\Bigr) \] \[ h_t = o_t \odot \tanh(c_t) \]

Where:

  • \(\sigma\) is the sigmoid function: \[ \sigma(x) = \frac{1}{1+e^{-x}} \]
  • \(\tanh\) is the hyperbolic tangent: \[ \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} \]
  • \(\odot\) represents element-wise (Hadamard) multiplication.
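To make these equations concrete, here is a minimal NumPy sketch of a single LSTM time step. The weights, biases, and dimensions are illustrative placeholders, not trained values; each weight matrix acts on the concatenation \([h_{t-1}, x_t]\) exactly as in the formulas above.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One LSTM time step following the equations above."""
    z = np.concatenate([h_prev, x_t])      # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)           # forget gate
    i_t = sigmoid(W_i @ z + b_i)           # input gate
    c_tilde = np.tanh(W_c @ z + b_c)       # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde     # cell state update (element-wise)
    o_t = sigmoid(W_o @ z + b_o)           # output gate
    h_t = o_t * np.tanh(c_t)               # hidden state
    return h_t, c_t

# Illustrative dimensions: 3 input features, 4 hidden units
n_in, n_hid = 3, 4
rng = np.random.default_rng(0)
W = lambda: rng.normal(scale=0.1, size=(n_hid, n_hid + n_in))
b = lambda: np.zeros(n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.normal(size=n_in), h, c, W(), W(), W(), W(), b(), b(), b(), b())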

Activation Function Choices

The choice of activation functions is crucial for LSTM stability:

  1. Sigmoid (\(\sigma\)) for Gates:
    • Output range \([0,1]\)
    • Acts as soft gates
    • Smooth gradients
  2. tanh for State Transforms:
    • Output range \([-1,1]\)
    • Zero-centered
    • Helps with gradient flow
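A quick standalone NumPy check of these ranges (purely illustrative):

import numpy as np

x = np.linspace(-10, 10, 1001)
sig = 1.0 / (1.0 + np.exp(-x))
tan = np.tanh(x)

print(sig.min(), sig.max())   # stays within (0, 1): acts as a soft gate
print(tan.min(), tan.max())   # stays within (-1, 1) and is zero-centered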

Section 2.6 - Implementation in Keras

from keras import Sequential, layers

# Example dimensions (illustrative): 30 time steps, 5 features per step
sequence_length, n_features = 30, 5

# Simple LSTM for financial time series
model = Sequential([
    layers.Input(shape=(sequence_length, n_features)),
    layers.LSTM(64,
                activation='tanh',              # State activation
                recurrent_activation='sigmoid', # Gate activation
                return_sequences=True),         # Return full sequence
    layers.LSTM(32),                            # Return only final output
    layers.Dense(1)                             # Prediction
])
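A hedged usage sketch of how this model might be compiled and trained; the synthetic data and hyperparameters below are illustrative assumptions, not part of the course dataset.

import numpy as np

# Synthetic data: 256 samples, each a (sequence_length, n_features) window
X = np.random.randn(256, sequence_length, n_features).astype("float32")
y = np.random.randn(256, 1).astype("float32")

model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2)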

Understanding Each Component:

# Detailed LSTM configuration
lstm_layer = layers.LSTM(
    units=64,                        # Size of output
    activation='tanh',               # State transform
    recurrent_activation='sigmoid',  # Gates
    use_bias=True,                   # Include bias terms
    kernel_initializer='glorot_uniform',   # Weight initialization
    recurrent_initializer='orthogonal',    # Important for stability
    bias_initializer='zeros',
    unit_forget_bias=True            # Initialize forget gate bias to 1
)
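To connect this configuration back to the equations, one can build the layer on a dummy batch and inspect its weights: Keras stores the kernels of the four gates concatenated along the last axis, so with 64 units each kernel has 4 × 64 = 256 columns. A small sketch (the input shape below is an illustrative assumption):

import numpy as np

dummy = np.zeros((2, 30, 5), dtype="float32")   # (batch, time steps, features), illustrative
_ = lstm_layer(dummy)                           # calling the layer builds its weights

kernel, recurrent_kernel, bias = lstm_layer.get_weights()
print(kernel.shape)            # (5, 256): input kernel, 4 gates x 64 units
print(recurrent_kernel.shape)  # (64, 256): recurrent kernel
print(bias.shape)              # (256,): one bias per gate and unit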

Section 2.7 - Memory Management

How LSTM Manages Information

  1. Short-term Memory (Hidden State):
    • Updated at every time step
    • Directly used for outputs
    • Influenced by current input and cell state
  2. Long-term Memory (Cell State):
    • Protected by gates
    • Can maintain information for long sequences
    • Selective updates through forget and input gates
Financial Application Example

In financial time series:

  • Cell state can track market regime
  • Forget gate can adapt to regime changes
  • Input gate can identify significant events
  • Output gate can focus on relevant features for prediction
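Both states can be accessed explicitly in Keras by setting return_state=True; a brief sketch with an illustrative input shape:

from keras import Input, Model, layers

inputs = Input(shape=(30, 5))   # (time steps, features), illustrative
outputs, h_state, c_state = layers.LSTM(32, return_state=True)(inputs)
# With return_sequences=False, `outputs` equals `h_state` (the final short-term state);
# `c_state` is the final long-term cell state, shaped (batch, 32).
state_model = Model(inputs, [h_state, c_state])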

Gradient Flow

The LSTM’s architecture helps with gradient flow through:

  1. Additive Updates: \[ c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \] This additive structure creates a direct path for gradients.

  2. Gating Mechanisms:
    • Gates are differentiable.
    • Help control gradient magnitude.
    • Prevent explosion/vanishing through sigmoid bounds.

  3. Cell State Highway:
    • Provides a direct path through time steps.
    • Protected by the forget gate.
    • Involves minimal transformations.
Note

The careful balance of activation functions (sigmoid and tanh) combined with the gating mechanism helps maintain stable gradient flow, which is crucial for training on long sequences of financial data.
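A small numerical illustration of this point (a NumPy sketch under simplifying assumptions: it follows only the direct cell-state path, where the gradient of \(c_T\) with respect to \(c_0\) reduces to the product of forget gates, and compares it to a vanilla-RNN-style product of tanh derivatives):

import numpy as np

rng = np.random.default_rng(0)
T = 100  # number of time steps

# LSTM-like path: d c_T / d c_0 along the cell state is the product of forget gates
forget_gates = rng.uniform(0.9, 1.0, size=T)       # gates near 1 preserve memory
lstm_factor = np.prod(forget_gates)

# Vanilla-RNN-like path: a product of tanh'(pre-activation) times a recurrent weight,
# each factor typically well below 1
tanh_derivs = 1.0 - np.tanh(rng.normal(size=T)) ** 2
rnn_factor = np.prod(0.9 * tanh_derivs)

print(f"LSTM-like gradient factor after {T} steps: {lstm_factor:.3e}")
print(f"RNN-like gradient factor after {T} steps:  {rnn_factor:.3e}")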
