Understanding the Training Loop

Remi Genet



A deep dive into the training loop at the heart of deep learning, following step by step how a model learns.


Published

2025-04-03

The Training Loop: How Models Learn

Section 3.1 - What Happens During Training?

When we call model.fit() in Keras, we’re initiating a complex process that repeatedly updates the model’s weights to minimize the loss function. Let’s understand what happens under the hood.

The Basic Loop Structure

At its core, training follows this pattern:

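# Conceptual implementation of a training loop (TensorFlow)
import tensorflow as tf

for epoch in range(n_epochs):
    for batch_idx in range(n_batches):
        # 1. Select a mini-batch
        X_batch, y_batch = get_batch(batch_idx, batch_size)
        # 2-3. Forward pass and loss, recorded by the tape
        with tf.GradientTape() as tape:
            y_pred = model(X_batch, training=True)
            loss = loss_function(y_batch, y_pred)
        # 4. Backward pass: gradients of the loss w.r.t. the trainable weights
        gradients = tape.gradient(loss, model.trainable_variables)
        # 5. Let the optimizer update the weights
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))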
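
The same five steps can be run without any framework. The following is a minimal NumPy sketch for a one-feature linear regression model. The synthetic data, learning rate, batch size, and hand-derived MSE gradients are assumptions made for illustration. In TensorFlow, GradientTape and the optimizer take the place of steps 4 and 5.

```python
import numpy as np

# Synthetic regression data (assumed): y = 3x - 1 plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 1))
y = 3.0 * X[:, 0] - 1.0 + 0.1 * rng.normal(size=256)

# Model: y_pred = X @ w + b
w = np.zeros(1)
b = 0.0
learning_rate = 0.1
batch_size = 32
n_epochs = 20
n_batches = len(X) // batch_size

for epoch in range(n_epochs):
    for batch_idx in range(n_batches):
        # 1. Batch selection
        start = batch_idx * batch_size
        X_batch = X[start:start + batch_size]
        y_batch = y[start:start + batch_size]
        # 2. Forward pass
        y_pred = X_batch @ w + b
        # 3. Loss (mean squared error)
        error = y_pred - y_batch
        loss = np.mean(error ** 2)
        # 4. Gradients of the MSE w.r.t. w and b (derived by hand)
        grad_w = 2.0 * X_batch.T @ error / len(X_batch)
        grad_b = 2.0 * np.mean(error)
        # 5. Weight update (plain gradient descent)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

print(w, b)  # should end up close to [3.] and -1.
```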

Section 3.2 - Step-by-Step Breakdown

1. Batch Selection

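def get_batch(idx, batch_size):
    # Assumes the training arrays X and y are available in the enclosing scope
    start_idx = idx * batch_size
    end_idx = start_idx + batch_size
    # Slicing past the end simply returns a smaller final batch
    return X[start_idx:end_idx], y[start_idx:end_idx]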

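This step:
- Selects a subset (a mini-batch) of the training data
- Splits the dataset into chunks small enough to process at once
- Enables stochastic gradient descent, where each update uses a gradient estimated from one batch rather than from the full dataset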

Note

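Full-batch gradient descent uses the entire dataset for every update. Mini-batch training instead offers:
- Lower memory requirements, since only one batch is processed at a time
- Faster, more frequent updates (many updates per pass over the data)
- Often better generalization, because the noise in mini-batch gradients can act as a mild regularizer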
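
The get_batch function above always walks through the data in the same order. Mini-batch SGD usually reshuffles the data every epoch so that the batches differ from one epoch to the next. Below is a minimal NumPy sketch that assumes in-memory arrays. The name iterate_minibatches is a hypothetical helper; in TensorFlow, tf.data.Dataset.shuffle(...).batch(...) typically plays this role.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield (X_batch, y_batch) pairs covering the data once, in a new random order.

    The last batch is smaller when len(X) is not a multiple of batch_size.
    """
    indices = rng.permutation(len(X))  # a fresh shuffle on every call (every epoch)
    for start in range(0, len(X), batch_size):
        batch_idx = indices[start:start + batch_size]
        # Index X and y with the same positions so inputs and targets stay aligned
        yield X[batch_idx], y[batch_idx]

rng = np.random.default_rng(42)
X = np.arange(10).reshape(10, 1)
y = np.arange(10)
sizes = [len(xb) for xb, _ in iterate_minibatches(X, y, batch_size=4, rng=rng)]
print(sizes)  # [4, 4, 2]
```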

2. Forward Pass

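# Hypothetical example: a subclassed tf.keras.Model whose call() defines the forward pass
import tensorflow as tf

class MLP(tf.keras.Model):
    def __init__(self, hidden_units=64, n_outputs=1):
        super().__init__()
        # Layer sizes and ReLU activations are assumptions, for illustration
        self.dense1 = tf.keras.layers.Dense(hidden_units)
        self.activation1 = tf.keras.layers.ReLU()
        self.dense2 = tf.keras.layers.Dense(hidden_units)
        self.activation2 = tf.keras.layers.ReLU()
        self.output_layer = tf.keras.layers.Dense(n_outputs)

    def call(self, inputs):
        # Layer 1
        x = self.dense1(inputs)
        x = self.activation1(x)

        # Layer 2
        x = self.dense2(x)
        x = self.activation2(x)

        # Output layer
        return self.output_layer(x)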

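During this phase:
- Data flows through the network, layer by layer
- Each layer applies its computation (here, a dense transformation followed by an activation)
- Intermediate values are recorded (by GradientTape, in TensorFlow) so they can be reused during backpropagation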
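
The next example is a minimal NumPy sketch of the same forward pass. It assumes ReLU activations and small, illustrative layer widths. The intermediate values go into an explicit cache dictionary, which is a hypothetical structure standing in for what GradientTape records automatically in TensorFlow.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, params):
    """Two hidden ReLU layers and a linear output layer.

    Returns the output and a cache of the intermediate values
    that a backward pass would need.
    """
    cache = {"x": x}
    z1 = x @ params["W1"] + params["b1"]   # layer 1: dense
    a1 = relu(z1)                          # layer 1: activation
    z2 = a1 @ params["W2"] + params["b2"]  # layer 2: dense
    a2 = relu(z2)                          # layer 2: activation
    out = a2 @ params["W3"] + params["b3"] # output layer
    cache.update(z1=z1, a1=a1, z2=z2, a2=a2)
    return out, cache

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 1]  # assumed widths: input, hidden, hidden, output
params = {}
for i in range(3):
    params[f"W{i+1}"] = rng.normal(scale=0.5, size=(sizes[i], sizes[i + 1]))
    params[f"b{i+1}"] = np.zeros(sizes[i + 1])

x = rng.normal(size=(5, 4))  # a batch of 5 examples
out, cache = forward(x, params)
print(out.shape)  # (5, 1)
print({k: v.shape for k, v in cache.items()})
```

The cache grows with both the batch size and the number of layers. This is the memory cost discussed in Section 3.3.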

3. Loss Computation

import tensorflow as tf

def compute_loss(y_true, y_pred):
    # Example: Mean Squared Error, averaged over all elements of the batch
    return tf.reduce_mean(tf.square(y_true - y_pred))

The loss function:
- Measures the prediction error on the batch as a single scalar
- Provides the target that optimization minimizes
- Guides weight updates through its gradients
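
Here is compute_loss worked through in NumPy on a three-element batch. The values are chosen for illustration. The example shows how the loss reduces a whole batch of predictions to a single number.

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean of the squared errors over all elements
    return np.mean(np.square(y_true - y_pred))

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.0])
# errors: -0.5, 0.0, 1.0 -> squares: 0.25, 0.0, 1.0 -> mean: 1.25 / 3
print(mse(y_true, y_pred))  # ≈ 0.4167
```

Squaring makes the error of 1.0 contribute four times as much as the error of 0.5. MSE therefore penalizes large mistakes disproportionately.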

4. Gradient Computation

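def compute_gradients(tape, loss, weights):
    # Automatic differentiation: the tape must have recorded the forward pass
    # and the loss computation (inside a with tf.GradientTape() as tape: block).
    # weights is typically model.trainable_variables.
    gradients = tape.gradient(loss, weights)
    return gradients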

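During backpropagation:
- The gradient of the loss flows backward through the network, from the output layer toward the input
- The chain rule is applied automatically by the tape
- Each weight receives a gradient that measures how the loss changes when that weight changes slightly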
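
You can check what the tape computes by hand. The sketch below assumes a single tanh unit with an MSE loss (an assumption for illustration). It applies the chain rule manually and compares the result with a finite-difference estimate, which nudges each weight slightly and measures how the loss changes. Finite differences serve only as a verification tool here; automatic differentiation does not use them.

```python
import numpy as np

def loss_fn(w, X, y):
    a = np.tanh(X @ w)
    return np.mean((a - y) ** 2)

def grad_chain_rule(w, X, y):
    # Forward pass, keeping the intermediate activation a
    a = np.tanh(X @ w)
    # Backward pass: chain rule from the loss back to w
    dL_da = 2.0 * (a - y) / len(y)   # derivative of the mean squared error
    dL_dz = dL_da * (1.0 - a ** 2)   # tanh'(z) = 1 - tanh(z)^2
    return X.T @ dL_dz               # z = X @ w, so dz/dw = X

def grad_finite_diff(w, X, y, eps=1e-6):
    # Central differences: perturb one weight at a time
    g = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = eps
        g[i] = (loss_fn(w + step, X, y) - loss_fn(w - step, X, y)) / (2 * eps)
    return g

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
w = rng.normal(size=3)

print(np.allclose(grad_chain_rule(w, X, y), grad_finite_diff(w, X, y), atol=1e-6))  # True
```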

5. Weight Updates

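def apply_updates(gradients, weights, learning_rate=0.01):
    # Basic gradient descent update: w <- w - learning_rate * g
    # (weights must be tf.Variable objects so that assign_sub works in place)
    for g, w in zip(gradients, weights):
        w.assign_sub(learning_rate * g)
    # Built-in equivalent:
    # tf.keras.optimizers.SGD(learning_rate).apply_gradients(zip(gradients, weights))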

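The update step:
- Moves each weight in the direction opposite to its gradient
- Scales that step by the learning rate
- Follows the optimizer's rule: plain gradient descent as above, or methods such as momentum and Adam that also keep track of past gradients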
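
The sketch below illustrates what "the optimizer's rule" can mean. It compares the plain update with classical momentum, one common alternative, in NumPy. The objective is an assumed two-dimensional quadratic whose curvature differs by a factor of 10 between its two directions. The learning rate, momentum coefficient, and number of steps are also illustrative choices.

```python
import numpy as np

def sgd_step(w, g, learning_rate):
    # Plain gradient descent: move against the gradient
    return w - learning_rate * g

def momentum_step(w, g, velocity, learning_rate, momentum=0.9):
    # Classical momentum: keep a running velocity, then move along it
    velocity = momentum * velocity - learning_rate * g
    return w + velocity, velocity

# Assumed toy objective: L(w) = 0.5 * sum(scales * w**2), so grad = scales * w
scales = np.array([1.0, 10.0])

def grad(w):
    return scales * w

w_sgd = np.array([1.0, 1.0])
w_mom = np.array([1.0, 1.0])
velocity = np.zeros(2)
for _ in range(100):
    w_sgd = sgd_step(w_sgd, grad(w_sgd), learning_rate=0.01)
    w_mom, velocity = momentum_step(w_mom, grad(w_mom), velocity, learning_rate=0.01)

print(np.linalg.norm(w_sgd), np.linalg.norm(w_mom))
```

Both optimizers receive the same gradients. After 100 steps, plain SGD still sits at about 0.37 along the shallow first direction (each step multiplies that coordinate by 0.99). Momentum's accumulated velocity, by contrast, has brought the weights far closer to the minimum overall.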

Section 3.3 - Memory Management During Training

Forward Pass Storage

During the forward pass, we need to store:
1. Layer inputs, needed to compute the gradients of each layer's weights
2. Intermediate activations
3. Final outputs, for the loss calculation

Important

Memory usage scales with:
- Batch size (activation memory grows roughly linearly with it)
- Model depth
- Layer sizes
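
A back-of-the-envelope estimate shows how these three factors combine. The sketch assumes a plain stack of dense layers in float32 (4 bytes per value) and counts one stored output per layer, including the input. The layer widths are hypothetical. Real frameworks store more than this (pre-activation values, for example) and add their own overhead, so treat the result as a lower bound.

```python
def activation_memory_bytes(batch_size, layer_widths, bytes_per_value=4):
    """Rough lower bound on memory for the layer outputs kept during the forward pass."""
    values_per_example = sum(layer_widths)  # one stored value per unit per example
    return batch_size * values_per_example * bytes_per_value

widths = [784, 256, 128, 10]  # hypothetical MLP: input, two hidden layers, output
for batch_size in (32, 64, 128):
    print(batch_size, activation_memory_bytes(batch_size, widths))
```

Doubling the batch size doubles the estimate (150,784 bytes at 32, 301,568 at 64). Adding layers or widening them increases the per-example sum instead.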

Gradient Computation Requirements

The backward pass requires:
1. Access to the activations stored during the forward pass
2. Memory for the gradients themselves (one per trainable weight)
3. Temporary storage for intermediate results
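
Each trainable weight gets a gradient of the same shape, so the gradient buffer is as large as the model's parameters. The small sketch below counts that buffer, assuming float32 dense layers with hypothetical widths. Unlike activation memory, this part does not depend on the batch size.

```python
def dense_param_count(layer_widths):
    # Each dense layer has an (n_in x n_out) weight matrix plus n_out biases
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_widths[:-1], layer_widths[1:]))

widths = [784, 256, 128, 10]  # hypothetical MLP: input, two hidden layers, output
n_params = dense_param_count(widths)
bytes_per_value = 4  # float32
weight_bytes = n_params * bytes_per_value
gradient_bytes = n_params * bytes_per_value  # one gradient per weight, same shape
print(n_params, weight_bytes + gradient_bytes)  # 235146 1881168
```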

Section 3.4 - Training Loop Variations

Basic Training

# Standard training loop
for epoch in range(epochs):
    for batch in train_dataset:
        train_step(model, batch)

Training with Validation

# Training with validation checks
for epoch in range(epochs):
    # Training
    for batch in train_dataset:
        train_step(model, batch)
    
    # Validation
    for batch in val_dataset:
        validate_step(model, batch)
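
The validation pattern can be made runnable with a NumPy sketch. It assumes a linear model trained with MSE and a fixed 240/60 train/validation split. train_step and validate_step are hypothetical stand-ins for the functions used above, with different signatures. The key difference between them is that validate_step runs only the forward pass and the loss, so the validation data never changes the weights.

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed data: noisy linear target, split into training and validation sets
X = rng.normal(size=(300, 2))
y = X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=300)
X_train, y_train = X[:240], y[:240]
X_val, y_val = X[240:], y[240:]

w = np.zeros(2)
learning_rate, batch_size = 0.05, 30

def train_step(w, X_batch, y_batch):
    # Forward pass, MSE gradient, and one gradient-descent update
    error = X_batch @ w - y_batch
    grad = 2.0 * X_batch.T @ error / len(y_batch)
    return w - learning_rate * grad

def validate_step(w, X_batch, y_batch):
    # Forward pass and loss only: no gradients, no weight update
    return np.mean((X_batch @ w - y_batch) ** 2)

history = []
for epoch in range(10):
    # Training
    for start in range(0, len(X_train), batch_size):
        w = train_step(w, X_train[start:start + batch_size], y_train[start:start + batch_size])
    # Validation (equal-sized batches, so the mean of batch losses is the overall MSE)
    val_losses = [validate_step(w, X_val[s:s + batch_size], y_val[s:s + batch_size])
                  for s in range(0, len(X_val), batch_size)]
    history.append(np.mean(val_losses))
    print(f"epoch {epoch}: val_loss={history[-1]:.4f}")
```

Tracking the validation loss per epoch like this is how overfitting is usually detected: the training loss keeps falling while the validation loss stalls or rises.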

Training with Multiple Losses

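# Multiple loss components
import tensorflow as tf

for epoch in range(epochs):
    for batch in train_dataset:
        with tf.GradientTape() as tape:
            # Main task loss
            main_loss = compute_main_loss(model, batch)

            # Regularization loss
            reg_loss = compute_regularization(model)

            # Combined loss
            total_loss = main_loss + reg_loss

        # Differentiate the combined loss, then update the weights as usual
        gradients = tape.gradient(total_loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))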
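
The combined-loss idea can be sketched in NumPy. The sketch assumes the regularizer is an L2 penalty, l2 * sum(w**2); compute_regularization in the text could be any penalty. Differentiation is linear, so the gradient of the total loss is simply the sum of the two gradients. The penalty's gradient pulls every weight toward zero.

```python
import numpy as np

def total_loss_and_grad(w, X, y, l2=0.01):
    # Main task loss: mean squared error
    error = X @ w - y
    main_loss = np.mean(error ** 2)
    # Regularization loss: L2 penalty on the weights (assumed form)
    reg_loss = l2 * np.sum(w ** 2)
    # Gradients add, just as the losses do
    grad_main = 2.0 * X.T @ error / len(y)
    grad_reg = 2.0 * l2 * w
    return main_loss + reg_loss, grad_main + grad_reg

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 0.0, -2.0])  # noise-free target, for a clean comparison
w_plain = np.zeros(3)
w_reg = np.zeros(3)
for _ in range(500):
    _, g_plain = total_loss_and_grad(w_plain, X, y, l2=0.0)
    _, g_reg = total_loss_and_grad(w_reg, X, y, l2=0.1)
    w_plain -= 0.05 * g_plain
    w_reg -= 0.05 * g_reg

print(np.linalg.norm(w_plain), np.linalg.norm(w_reg))  # the regularized weights are smaller
```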

Tip

The training loop’s structure can be customized for:
- Multiple inputs/outputs
- Custom regularization
- Complex loss functions
- Gradient accumulation
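
Gradient accumulation is the least self-explanatory item in this list. Gradients from several small micro-batches are combined before a single weight update, which simulates a larger batch when that batch would not fit in memory. The NumPy sketch below assumes an MSE loss and equal-sized micro-batches. It checks that averaging the micro-batch gradients reproduces the full-batch gradient, up to floating-point rounding.

```python
import numpy as np

def mse_grad(w, X, y):
    # Gradient of mean((X @ w - y)**2) with respect to w
    return 2.0 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4))
y = rng.normal(size=64)
w = rng.normal(size=4)

accumulation_steps = 4
micro_batch = len(X) // accumulation_steps  # 16 examples per micro-batch

accumulated = np.zeros_like(w)
for k in range(accumulation_steps):
    sl = slice(k * micro_batch, (k + 1) * micro_batch)
    # Compute the gradient on one micro-batch and add it to the running total
    accumulated += mse_grad(w, X[sl], y[sl])
# Average over the micro-batches (valid because they are equal-sized)
accumulated /= accumulation_steps

full_batch_grad = mse_grad(w, X, y)
print(np.allclose(accumulated, full_batch_grad))  # True

# A single optimizer step then uses the accumulated gradient for all 64 examples
w = w - 0.01 * accumulated
```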
