Automatic Differentiation: The Engine of Deep Learning

Course
Fundamentals
A technical deep-dive into gradient computation methods, highlighting why automatic differentiation became the backbone of modern machine learning.
Author

Remi Genet

Published

2025-04-03

Gradient Computation: Three Approaches


Section 1.12 - Finite Difference Approximation

Core Concept

Estimate derivatives using function evaluations at perturbed inputs:

First Derivative:
\[ \frac{\partial f}{\partial x} \approx \frac{f(x + \varepsilon) - f(x)}{\varepsilon} \]
(Forward difference formula)

Example:
For \( f(x) = x^2 \) at \( x = 2 \) with \( \varepsilon = 0.01 \):
\[ f'(2) \approx \frac{(2.01)^2 - 2^2}{0.01} = \frac{4.0401 - 4}{0.01} = 4.01, \]
while the true derivative is ( 4 ).
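
This is easy to reproduce in code. Below is a minimal Python sketch (the helper name forward_difference is purely illustrative, not part of the course code):

    def forward_difference(f, x, eps=1e-2):
        """Forward-difference estimate of f'(x)."""
        return (f(x + eps) - f(x)) / eps

    # f(x) = x^2, so f'(2) = 4; the forward difference carries an O(eps) error.
    print(forward_difference(lambda x: x**2, 2.0))   # ≈ 4.01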

Limitations

  1. Precision Trade-off:
    • Too large \( \varepsilon \) → Approximation error
    • Too small \( \varepsilon \) → Numerical instability
  2. Computational Cost:
    \( O(n) \) evaluations for \( n \) parameters → Intractable for neural networks (with \( n \approx 10^6 \)–\( 10^9 \)).
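
The precision trade-off is easy to observe numerically. The short sketch below (illustrative, not from the course code) re-estimates \( f'(2) \) for \( f(x) = x^2 \) with progressively smaller step sizes:

    f = lambda x: x**2

    # Forward-difference estimate of f'(2) for decreasing step sizes.
    for eps in (1e-1, 1e-2, 1e-8, 1e-14, 1e-16):
        print(eps, (f(2.0 + eps) - f(2.0)) / eps)
    # Large eps -> approximation error (4.1 at eps = 0.1);
    # tiny eps  -> numerical instability (at eps = 1e-16 the estimate collapses to 0.0,
    #              because 2.0 + eps rounds back to 2.0 in double precision).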

Section 1.13 - Symbolic Differentiation

What It Does

Manipulates mathematical expressions to derive exact derivative formulas.

Input:
\[ f(x) = \sin(x^2) + e^x \]

Symbolic Derivative:
\[ f'(x) = 2x \cdot \cos(x^2) + e^x. \]
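
A computer algebra system performs exactly this manipulation. A minimal sketch, assuming SymPy is available:

    import sympy as sp

    x = sp.symbols('x')
    f = sp.sin(x**2) + sp.exp(x)

    # Symbolic differentiation returns an exact formula, not a number.
    print(sp.diff(f, x))   # 2*x*cos(x**2) + exp(x)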

The Expression Swell Problem

Consider nested operations:

\[ f(x) = (x+1)^{10}, \] so the compact derivative is \[ f'(x) = 10(x+1)^9. \]

If not simplified, the derivative may appear as an unsimplified product:
\[ f'(x) = 10\,(x+1)(x+1)(x+1)(x+1)(x+1)(x+1)(x+1)(x+1)(x+1), \]
which leads to:
  • Memory Impact: Storage grows exponentially with operation depth.
  • Real-World Impact: A 100-layer network would produce unusable expressions.
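
The growth is visible even on this small example once the factored form is lost. A short SymPy sketch (illustrative):

    import sympy as sp

    x = sp.symbols('x')
    d = sp.diff((x + 1)**10, x)   # kept factored: 10*(x + 1)**9

    print(d)                      # 10*(x + 1)**9
    print(sp.expand(d))           # 10-term polynomial: 10*x**9 + 90*x**8 + ... + 10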


Section 1.14 - Automatic Differentiation (AD)

Foundational Idea

Decompose functions into elementary operations, then apply the chain rule numerically.

Example:
For \[ f(x,y) = x \cdot y + \sin(x), \]
the computation is as follows:

  1. Forward Pass:
    • \( a = x \cdot y \)
    • \( b = \sin(x) \)
    • \( f = a + b \)
  2. Backward Pass (Reverse-Mode AD):
    • \( \frac{\partial f}{\partial a} = 1, \quad \frac{\partial f}{\partial b} = 1 \)
    • \( \frac{\partial a}{\partial x} = y, \quad \frac{\partial a}{\partial y} = x \)
    • \( \frac{\partial b}{\partial x} = \cos(x) \)
    • Thus, \[ \frac{\partial f}{\partial x} = \frac{\partial f}{\partial a}\cdot\frac{\partial a}{\partial x} + \frac{\partial f}{\partial b}\cdot\frac{\partial b}{\partial x} = y + \cos(x). \]
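
The same computation can be checked numerically with any reverse-mode AD tool. Below is a minimal sketch using TensorFlow's GradientTape as one possible backend for the Keras 3 setup used in this course (the input values 2 and 3 are arbitrary illustrative choices):

    import numpy as np
    import tensorflow as tf

    x = tf.Variable(2.0)
    y = tf.Variable(3.0)

    with tf.GradientTape() as tape:
        f = x * y + tf.sin(x)                 # forward pass records the computational graph

    df_dx, df_dy = tape.gradient(f, [x, y])   # reverse pass applies the chain rule

    print(float(df_dx), 3.0 + np.cos(2.0))    # both ≈ 2.584, i.e. y + cos(x)
    print(float(df_dy))                       # 2.0, i.e. x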

Why It Scales

  • Complexity: One forward plus one reverse pass yields the gradient with respect to all \( n \) parameters, at a small constant multiple of the cost of a single function evaluation (effectively \( O(1) \) evaluations, versus \( O(n) \) for finite differences).
  • Memory: Stores only intermediate values, not full symbolic expressions.

Section 1.15 - Method Comparison

Method              Precision     Compute Cost   Memory        Scalability
Finite Difference   Approximate   O(n)           O(1)          Small n
Symbolic            Exact         O(1)           Exponential   Shallow nets
Automatic Diff      Exact         O(1)           Linear        Any model

Section 1.16 - Composable Differentiability

Mathematical Guarantee

If ( f ) and ( g ) are differentiable, then ( h(x) = f(g(x)) ) is differentiable with: \[ h'(x) = f'(g(x)) \cdot g'(x). \]

Implication for Deep Learning:
Neural networks can be built from arbitrary differentiable components:
  1. Basic Operations: Matrix multiply, convolution
  2. Advanced Layers: Self-attention, spectral norms
  3. Custom Functions: User-defined gradients
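
To illustrate composability, the sketch below (TensorFlow again, with two arbitrarily chosen building blocks) composes two differentiable functions and compares the AD result with the hand-derived chain rule \( h'(x) = f'(g(x)) \cdot g'(x) \):

    import numpy as np
    import tensorflow as tf

    # Two differentiable building blocks (illustrative choices, not from the course).
    g = lambda x: tf.sin(x)      # g(x) = sin(x),  g'(x) = cos(x)
    f = lambda u: u * u          # f(u) = u^2,     f'(u) = 2u

    x = tf.Variable(1.5)
    with tf.GradientTape() as tape:
        h = f(g(x))              # composition h(x) = f(g(x)) = sin(x)^2

    dh_dx = tape.gradient(h, x)

    # Chain rule by hand: f'(g(x)) * g'(x) = 2*sin(x)*cos(x)
    print(float(dh_dx), 2 * np.sin(1.5) * np.cos(1.5))   # both ≈ 0.141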


Section 1.17 - AD by Example

Computational Graph

Consider the function: \[ f(x,y) = \bigl(x + \sigma(y)\bigr) \cdot \tanh(x), \] where \( \sigma \) is the sigmoid function.

A simplified computational graph is:

    x         y
    │         │
    │         σ
    │         │
    ├──(+)────┘
    │   │
  tanh  │
    │   │
    └─(mul)
        │
        f

Gradient Calculation

  1. Forward Pass: Compute all node values at \( x = 2,\ y = 1 \)
    • \( \sigma(1) \approx 0.731 \)
    • \( \tanh(2) \approx 0.964 \)
    • \( f \approx (2 + 0.731) \cdot 0.964 \approx 2.633 \)
  2. Reverse Pass: Propagate derivatives from the output back to the inputs
    • \( \frac{\partial f}{\partial f} = 1 \)
    • Multiplication node: \( \frac{\partial f}{\partial \tanh(x)} = x + \sigma(y) \approx 2.731 \) and \( \frac{\partial f}{\partial \bigl(x + \sigma(y)\bigr)} = \tanh(2) \approx 0.964 \)
    • tanh branch: \( \frac{\partial \tanh(x)}{\partial x} = 1 - \tanh^2(2) \approx 0.071 \), contributing \( 2.731 \times 0.071 \approx 0.194 \)
    • Addition branch: \( \frac{\partial \bigl(x + \sigma(y)\bigr)}{\partial x} = 1 \), contributing \( 0.964 \times 1 = 0.964 \)
    • Total: \[ \frac{\partial f}{\partial x} \approx 0.194 + 0.964 = 1.158. \]
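
As before, the worked example can be verified with a reverse-mode AD tool. A minimal TensorFlow sketch (one possible backend) reproduces these numbers:

    import tensorflow as tf

    x = tf.Variable(2.0)
    y = tf.Variable(1.0)

    with tf.GradientTape() as tape:
        f = (x + tf.sigmoid(y)) * tf.tanh(x)   # forward pass: f ≈ 2.633

    df_dx, df_dy = tape.gradient(f, [x, y])

    print(float(f))       # ≈ 2.633
    print(float(df_dx))   # ≈ 1.157, matching 0.194 + 0.964 ≈ 1.158 up to rounding
    print(float(df_dy))   # ≈ 0.190, i.e. tanh(2) * σ(1) * (1 - σ(1))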
