Understanding the Training Loop

Course: Fundamentals
Deep dive into the fundamental training loop in deep learning, understanding how models learn step by step.
Author: Rémi Genet
Published: 2025-04-03

The Training Loop: How Models Learn

Section 3.1 - What Happens During Training?

When we call model.fit() in Keras, we’re initiating a complex process that repeatedly updates the model’s weights to minimize the loss function. Let’s understand what happens under the hood.

The Basic Loop Structure

At its core, training follows this pattern:

# Conceptual implementation of the training loop (TensorFlow backend)
for epoch in range(n_epochs):
    for batch_idx in range(n_batches):
        # 1. Select a mini-batch of training data
        X_batch, y_batch = get_batch(batch_idx)
        # 2.-3. Forward pass and loss computation, recorded on the tape
        with tf.GradientTape() as tape:
            y_pred = model(X_batch, training=True)
            loss = loss_function(y_batch, y_pred)
        # 4. Backpropagation: gradients of the loss w.r.t. the trainable weights
        gradients = tape.gradient(loss, model.trainable_variables)
        # 5. Weight update, delegated to the optimizer
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))

Section 3.2 - Step-by-Step Breakdown

1. Batch Selection

def get_batch(idx, batch_size=32):
    # Slice the full training arrays X and y into one mini-batch
    start_idx = idx * batch_size
    end_idx = start_idx + batch_size
    return X[start_idx:end_idx], y[start_idx:end_idx]

This step:
- Selects a subset of training data
- Provides manageable chunks for processing
- Enables stochastic gradient descent

Note

While full-batch gradient descent would use all data at once, mini-batch training offers:
- Better generalization
- Lower memory requirements
- Faster iterations
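
To make this concrete, here is a minimal sketch of one epoch of batch selection, assuming X and y are in-memory NumPy arrays and reusing the get_batch helper above; the dataset shapes and batch size are illustrative assumptions, not values from the course.

import numpy as np

# Illustrative toy dataset: 1000 samples, 8 features
X = np.random.randn(1000, 8).astype("float32")
y = np.random.randn(1000, 1).astype("float32")

batch_size = 32
n_batches = len(X) // batch_size   # drop the last incomplete batch for simplicity

for epoch in range(3):
    # Shuffle once per epoch so the batches differ between epochs
    perm = np.random.permutation(len(X))
    X, y = X[perm], y[perm]
    for batch_idx in range(n_batches):
        X_batch, y_batch = get_batch(batch_idx, batch_size)
        # ... forward pass, loss, gradients and weight update go here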

2. Forward Pass

# Inside model's call method
def call(self, inputs):
    # Layer 1
    x = self.dense1(inputs)
    x = self.activation1(x)
    
    # Layer 2
    x = self.dense2(x)
    x = self.activation2(x)
    
    # Output layer
    return self.output_layer(x)

During this phase:
- Data flows through the network
- Each layer performs its computations
- Activations are stored for backpropagation
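
For context, here is one possible subclassed Keras model whose call method matches the snippet above; the layer sizes, activations and output dimension are illustrative assumptions rather than a reference architecture.

import keras

class SimpleMLP(keras.Model):
    def __init__(self, hidden_units=64):
        super().__init__()
        # Two hidden blocks (linear layer + activation) and a linear output
        self.dense1 = keras.layers.Dense(hidden_units)
        self.activation1 = keras.layers.Activation("relu")
        self.dense2 = keras.layers.Dense(hidden_units)
        self.activation2 = keras.layers.Activation("relu")
        self.output_layer = keras.layers.Dense(1)

    def call(self, inputs):
        x = self.activation1(self.dense1(inputs))
        x = self.activation2(self.dense2(x))
        return self.output_layer(x)

model = SimpleMLP()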

3. Loss Computation

def compute_loss(y_true, y_pred):
    # Example: Mean Squared Error
    return tf.reduce_mean(tf.square(y_true - y_pred))

The loss function:
- Measures prediction error
- Provides the optimization target
- Guides weight updates
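
As a quick sanity check with made-up values, the hand-written MSE above agrees with the built-in Keras loss:

import tensorflow as tf

y_true = tf.constant([[1.0], [2.0], [3.0]])
y_pred = tf.constant([[1.1], [1.9], [3.2]])

# Hand-written mean squared error
mse_manual = tf.reduce_mean(tf.square(y_true - y_pred))

# Built-in Keras equivalent
mse_keras = tf.keras.losses.MeanSquaredError()(y_true, y_pred)

print(float(mse_manual), float(mse_keras))  # both ≈ 0.02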

4. Gradient Computation

def compute_gradients(tape, loss, weights):
    # Automatic differentiation
    gradients = tape.gradient(loss, weights)
    return gradients

During backpropagation:
- Gradients flow backwards through the network
- The chain rule is applied automatically
- Each weight’s contribution is calculated
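
A toy example with a single scalar weight shows the mechanics (the values are illustrative):

import tensorflow as tf

w = tf.Variable(2.0)          # trainable weight
x = tf.constant(3.0)          # input
target = tf.constant(5.0)     # desired output

with tf.GradientTape() as tape:
    y_pred = w * x                        # forward pass
    loss = tf.square(target - y_pred)     # squared error

grad = tape.gradient(loss, w)
print(float(grad))  # d/dw (5 - 3w)^2 = -6 * (5 - 6) = 6.0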

5. Weight Updates

def apply_updates(optimizer, gradients, weights):
    # Basic gradient descent update
    for g, w in zip(gradients, weights):
        w.assign_sub(learning_rate * g)

The update step:
- Modifies weights based on gradients
- Scaled by learning rate
- Controlled by optimizer logic
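
For a plain SGD optimizer, apply_gradients performs essentially the same update; a small check with made-up numbers (variable values and learning rate are illustrative):

import tensorflow as tf

w = tf.Variable([1.0, -2.0])
gradients = [tf.constant([0.5, -0.5])]

optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
optimizer.apply_gradients(zip(gradients, [w]))

print(w.numpy())  # [0.95, -1.95]: each weight moved by -learning_rate * gradient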

Section 3.3 - Memory Management During Training

Forward Pass Storage

During the forward pass, we need to store:
1. Layer inputs for gradient computation
2. Intermediate activations
3. Final outputs for loss calculation

Important

Memory usage scales with:
- Batch size
- Model depth
- Layer sizes
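
A rough order-of-magnitude estimate of the activation storage for a small MLP, assuming float32 values (4 bytes each) and the illustrative sizes below:

batch_size = 256
layer_widths = [128, 64, 64, 1]   # illustrative hidden and output widths

# One stored activation per unit, per sample in the batch
n_values = batch_size * sum(layer_widths)
memory_mib = n_values * 4 / (1024 ** 2)   # float32 -> bytes -> MiB
print(f"~{memory_mib:.2f} MiB of activations per forward pass")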

Gradient Computation Requirements

The backward pass requires:
1. Access to forward pass activations
2. Memory for gradient computations
3. Temporary storage for intermediate results
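
The tape is what holds those intermediate results; a persistent tape makes this explicit, since it keeps them until it is deleted (toy example with illustrative values):

import tensorflow as tf

x = tf.Variable(3.0)

# persistent=True keeps the recorded intermediates so the tape can be
# reused for several gradient calls; it must be released explicitly.
with tf.GradientTape(persistent=True) as tape:
    y = x * x      # intermediate result stored by the tape
    z = y * y

dy_dx = tape.gradient(y, x)   # 2x = 6.0
dz_dx = tape.gradient(z, x)   # 4x^3 = 108.0
del tape                      # free the stored intermediates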

Section 3.4 - Training Loop Variations

Basic Training

# Standard training loop
for epoch in range(epochs):
    for batch in train_dataset:
        train_step(model, batch)
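
The train_step function used in these loops is not shown above; a minimal sketch, assuming a TensorFlow backend, an MSE loss and an Adam optimizer (all illustrative choices), could look like this:

import tensorflow as tf

loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

def train_step(model, batch):
    X_batch, y_batch = batch
    with tf.GradientTape() as tape:
        y_pred = model(X_batch, training=True)
        loss = loss_fn(y_batch, y_pred)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss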

Training with Validation

# Training with validation checks
for epoch in range(epochs):
    # Training
    for batch in train_dataset:
        train_step(model, batch)
    
    # Validation
    for batch in val_dataset:
        validate_step(model, batch)
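
A matching validate_step only needs a forward pass, so no gradient tape is opened (again assuming the illustrative loss_fn above):

def validate_step(model, batch):
    X_batch, y_batch = batch
    # No GradientTape: validation never updates the weights
    y_pred = model(X_batch, training=False)
    return loss_fn(y_batch, y_pred)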

Training with Multiple Losses

# Multiple loss components
for epoch in range(epochs):
    for batch in train_dataset:
        with tf.GradientTape() as tape:
            # Main task loss
            main_loss = compute_main_loss(model, batch)
            
            # Regularization loss
            reg_loss = compute_regularization(model)
            
            # Combined loss
            total_loss = main_loss + reg_loss
        # Gradients are taken with respect to the combined loss
        gradients = tape.gradient(total_loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))

Tip

The training loop’s structure can be customized for:
- Multiple inputs/outputs
- Custom regularization
- Complex loss functions
- Gradient accumulation (see the sketch below)
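
As one example of such a customization, gradient accumulation averages gradients over several batches before applying a single update, mimicking a larger batch size under tight memory. A sketch, reusing the illustrative model, loss_fn, optimizer and train_dataset assumed in the earlier snippets:

accumulation_steps = 4
accumulated = [tf.zeros_like(v) for v in model.trainable_variables]

for step, batch in enumerate(train_dataset):
    X_batch, y_batch = batch
    with tf.GradientTape() as tape:
        loss = loss_fn(y_batch, model(X_batch, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    accumulated = [a + g for a, g in zip(accumulated, grads)]

    # Apply one averaged update every `accumulation_steps` batches
    if (step + 1) % accumulation_steps == 0:
        mean_grads = [a / accumulation_steps for a in accumulated]
        optimizer.apply_gradients(zip(mean_grads, model.trainable_variables))
        accumulated = [tf.zeros_like(v) for v in model.trainable_variables]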
