GPUs and Deep Learning: When Hardware Matters

Course: Fundamentals
Understanding the role of specialized hardware in accelerating neural network training, and why modern AI relies on GPUs.
Author: Remi Genet
Published: 2025-04-03

CPU vs GPU: The Deep Learning Divide


Section 1.22 - Architectural Differences

CPU (Central Processing Unit)

  • Design: Few complex cores (4–64) optimized for sequential tasks
  • Strengths:
    • Fast single-thread performance
    • Handles diverse workloads (file I/O, system tasks)
  • Analogy: A master chef preparing dishes one at a time

GPU (Graphics Processing Unit)

  • Design: Thousands of simple cores optimized for parallel tasks
  • Strengths:
    • Massively parallel floating-point operations
    • Efficient matrix/tensor computations
  • Analogy: A kitchen army chopping 1000 vegetables simultaneously (the sequential-versus-parallel contrast is sketched below)
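
You can get a feel for this contrast without any GPU at all by comparing element-by-element work in a Python loop (the sequential, one-dish-at-a-time pattern) with a single vectorized NumPy call over the whole array (the pattern that parallel hardware accelerates). This is only an illustrative CPU-side sketch; absolute timings depend on your machine.

```python
import time
import numpy as np

x = np.random.randn(1_000_000).astype(np.float32)

# Sequential pattern: one element at a time, like a single chef plating dishes.
t0 = time.perf_counter()
out_loop = np.empty_like(x)
for i in range(x.shape[0]):
    out_loop[i] = x[i] * 2.0 + 1.0
t_loop = time.perf_counter() - t0

# Parallel-friendly pattern: one vectorized call over the whole array at once.
t0 = time.perf_counter()
out_vec = x * 2.0 + 1.0
t_vec = time.perf_counter() - t0

print(f"loop: {t_loop * 1e3:.0f} ms | vectorized: {t_vec * 1e3:.2f} ms")
```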

Section 1.23 - Matrix Multiplication: GPU’s Sweet Spot

Why GPUs Dominate Deep Learning

Neural network forward pass for layer \(l\): \[ \mathbf{h}^{(l)} = \varphi\Bigl(\mathbf{W}^{(l)}\mathbf{h}^{(l-1)} + \mathbf{b}^{(l)}\Bigr) \]

GPU Advantages:
1. Parallelize matrix multiplications across thousands of cores
2. Batch operations: Process multiple samples simultaneously
3. Specialized cores: Tensor Cores (NVIDIA) accelerate mixed-precision math
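
Concretely, the layer equation above is a single matrix product, and processing a batch of B samples makes the matrices larger rather than adding sequential steps. A minimal NumPy sketch of points 1–2 (the dimensions are arbitrary):

```python
import numpy as np

B, d_in, d_out = 256, 64, 32           # batch size and layer widths (arbitrary)
H_prev = np.random.randn(B, d_in)      # h^(l-1) for a whole batch, one row per sample
W = np.random.randn(d_out, d_in)       # W^(l)
b = np.random.randn(d_out)             # b^(l)

# h^(l) = phi(W h^(l-1) + b) for all B samples in one matrix product (phi = ReLU here)
H = np.maximum(0.0, H_prev @ W.T + b)
print(H.shape)                          # (256, 32)
```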

Performance Gain:
  • CPU: ~100 GFLOPS (e.g., Intel i9)
  • GPU: ~50 TFLOPS (e.g., NVIDIA A100) → ~500× faster (rough timing sketch below)
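
The exact GFLOPS/TFLOPS figures depend heavily on the specific chips, precision, and libraries, but a timing sketch like the one below gives a rough sense of the gap. It assumes PyTorch is installed; on a CPU-only machine only the first call runs.

```python
import time
import torch

n = 4096
flops = 2 * n**3  # multiply-add count of an n x n matrix product

def bench_matmul(device: str) -> None:
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    a @ b                                   # warm-up (lazy init, kernel selection)
    if device == "cuda":
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    a @ b
    if device == "cuda":
        torch.cuda.synchronize()            # wait for the asynchronous GPU kernel to finish
    dt = time.perf_counter() - t0
    print(f"{device}: {flops / dt / 1e9:.0f} GFLOPS")

bench_matmul("cpu")
if torch.cuda.is_available():
    bench_matmul("cuda")
```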


Section 1.24 - When GPUs Aren’t Worth It

Case 1: Small Models

A 2-layer MLP (input=64, hidden=32, output=1):
  • Parameters (verified with Keras in the sketch below):
\[ (64 \times 32 + 32) + (32 \times 1 + 1) = 2113 \]
  • CPU Time: 0.5 ms/batch (direct cache access)
  • GPU Time: 2 ms/batch (data-transfer overhead dominates)
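
You can confirm the parameter count directly in Keras 3, the framework used in this course; with biases the 64 → 32 → 1 MLP has 2113 parameters, a few kilobytes of weights that sit comfortably in CPU cache.

```python
import keras

# 2-layer MLP: input=64, hidden=32, output=1
model = keras.Sequential([
    keras.Input(shape=(64,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),
])

# (64*32 + 32) + (32*1 + 1) = 2113
print(model.count_params())
```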

Case 2: Non-Matrix Work

  • Data preprocessing (Pandas operations)
  • Decision tree training (sequential splits)
  • Serving HTTP requests

Section 1.25 - VRAM: The Memory Bottleneck

Why It Matters

  • Stores model weights and activations during training
  • Example requirements:
    • LSTM with 1000 units: ~16 MB
    • GPT-4: ~1 TB (requires multi-GPU)

Course Context:
Time series models in this course rarely exceed 100 MB, so they fit comfortably in ordinary CPU RAM (a rough memory estimator is sketched below).
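
A back-of-the-envelope estimate is enough here: at float32, each parameter costs 4 bytes, and a standard LSTM layer has 4 × ((input + units) × units + units) weights. The sketch below (the input dimension of 64 is an arbitrary assumption) lands close to the ~16 MB figure above; note that activations, gradients, and optimizer state multiply this footprint during training.

```python
def params_mb(n_params: int, bytes_per_param: int = 4) -> float:
    """Memory for the weights alone, in megabytes (float32 by default)."""
    return n_params * bytes_per_param / 1e6

def lstm_params(input_dim: int, units: int) -> int:
    """Standard LSTM layer: 4 gates, each with (input + units) x units weights + units biases."""
    return 4 * ((input_dim + units) * units + units)

n = lstm_params(input_dim=64, units=1000)      # input_dim=64 is an illustrative assumption
print(n, f"-> {params_mb(n):.1f} MB")          # ~4.3M parameters, ~17 MB of weights
```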


Section 1.26 - Practical Considerations

For This Course

  • No GPU Needed:
    • All practicals are designed for CPU execution
    • Typical training times <10 minutes per exercise
  • Why?
    • Small datasets (synthetic or historical market data)
    • Compact architectures (≤5 layers, ≤256 units)

Experimenting Beyond

Cloud GPU Options:

  Platform       Cost                    Setup Complexity
  -------------  ----------------------  ----------------
  Google Colab   Free (T4 GPU)           Low (browser)
  Vast.ai        ~$0.15/hr (RTX 3090)    Medium (Docker)
  AWS EC2        ~$0.50/hr (T4)          High (IAM/VPC)

First-Time Setup Guide:
1. Create an account on the chosen platform
2. Upload your Jupyter notebook
3. Select a GPU instance type
4. Run !nvidia-smi to verify GPU access (or check from Python, as sketched below)
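
Once the instance is running, you can also confirm from Python that your framework actually sees the GPU. A minimal check, assuming a Keras 3 installation with the PyTorch backend available (TensorFlow and JAX expose equivalent calls):

```python
import keras
import torch

print("Keras backend:", keras.backend.backend())    # "torch", "tensorflow" or "jax"
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))     # e.g. "Tesla T4" on Colab
```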


Historical Note

The 2012 AlexNet breakthrough (ImageNet classification) was enabled by NVIDIA GTX 580 GPUs, training in 5 days versus months on CPUs. Modern LLMs like GPT-4 would be infeasible without GPU clusters.

