Training Parameters and Practical Considerations

Course: Fundamentals

Understanding key training parameters and practical considerations in deep learning.

Author: Remi Genet
Published: 2025-04-03


Section 3.14 - Feature Scaling: A Critical Step

Why Scaling is Crucial

Feature scaling is arguably the most important preprocessing step in deep learning. Consider a single hidden unit with inputs \( x_1 \) and \( x_2 \):

\[ h = \tanh(w_1 x_1 + w_2 x_2 + b) \]

Problems without scaling:

  1. Gradient Issues:
    • If \( x_1 \) and \( x_2 \) have very different scales (for example, prices in the thousands versus returns close to zero), the gradients \( \partial h / \partial w_1 \propto x_1 \) and \( \partial h / \partial w_2 \propto x_2 \) will have vastly different magnitudes.
    • This imbalance makes optimization nearly impossible (illustrated in the sketch below).
  2. Activation Function Saturation:
    • Large input values can push activations into their saturation regions.
    • For example, \[ \tanh(1000) \approx \tanh(10000) \approx 1 \]
    • This leads to vanishing gradients.
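
To make the imbalance concrete, here is a small NumPy sketch (the feature magnitudes and initial weights are hypothetical) comparing the average gradient magnitudes \( |\partial h / \partial w_1| \) and \( |\partial h / \partial w_2| \) before and after standardization:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unscaled features: a price-like series and a return-like series
x1 = rng.normal(loc=5000.0, scale=500.0, size=1000)   # large magnitude
x2 = rng.normal(loc=0.0, scale=0.01, size=1000)       # small magnitude
w1, w2, b = 1e-4, 1e-4, 0.0

def mean_abs_grads(x1, x2):
    z = w1 * x1 + w2 * x2 + b
    sech2 = 1.0 - np.tanh(z) ** 2                      # derivative of tanh
    # dh/dw1 = x1 * (1 - tanh(z)^2), dh/dw2 = x2 * (1 - tanh(z)^2)
    return np.abs(x1 * sech2).mean(), np.abs(x2 * sech2).mean()

print("unscaled:    ", mean_abs_grads(x1, x2))

# After standardization both gradients have comparable magnitude
x1_s = (x1 - x1.mean()) / x1.std()
x2_s = (x2 - x2.mean()) / x2.std()
print("standardized:", mean_abs_grads(x1_s, x2_s))
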
Impact on Training

Without proper scaling:
  • The optimizer struggles to find good solutions.
  • Training becomes unstable.
  • The model might not converge at all.

Scaling Methods

  1. Standardization (Preferred for deep learning):

    X_scaled = (X - mean) / std
    • Centers data around 0.
    • Scales data to have unit variance.
    • Works well with common activation functions.
  2. Min-Max Scaling:

    X_scaled = (X - X.min()) / (X.max() - X.min())
    • Maps features to the [0, 1] interval.
    • Useful when bounded outputs are required.
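
As a minimal sketch of both methods (assuming hypothetical arrays X_train and X_val), the statistics are computed on the training data only and reused for the validation data, as recommended in Section 3.17:

import numpy as np

def standardize(X_train, X_val, eps=1e-8):
    # Compute mean/std on the training data only, then reuse them
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0) + eps
    return (X_train - mean) / std, (X_val - mean) / std

def min_max_scale(X_train, X_val, eps=1e-8):
    # Map training features to [0, 1]; validation values may fall slightly outside
    x_min = X_train.min(axis=0)
    x_range = X_train.max(axis=0) - x_min + eps
    return (X_train - x_min) / x_range, (X_val - x_min) / x_range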

Section 3.15 - Batch Size: Finding the Sweet Spot

Batch Size Selection Principles

A good rule of thumb is to choose the largest batch size that yields between 100 and 500 batches per epoch. For example:

import numpy as np

def optimal_batch_size(n_samples, target_batches=300):
    batch_size = max(n_samples // target_batches, 1)
    # Round down to the nearest power of 2 for GPU efficiency
    return 2 ** int(np.log2(batch_size))

Example calculations:
  • 100,000 samples → batch_size ≈ 256 (391 batches)
  • 10,000 samples → batch_size ≈ 32 (313 batches)
  • 1,000,000 samples → batch_size ≈ 2048 (488 batches)

Why This Range?
  • Too few batches (<100): Not enough updates per epoch.
  • Too many batches (>500): Training becomes unnecessarily slow.
  • Sweet spot: Provides a good balance between speed and stability.

Impact on Training

  1. Statistical Effects:
    • Larger batches yield more precise gradient estimates.
    • Smaller batches introduce noise, which can help in exploring the loss landscape.
  2. Optimization Effects:
    • Larger batches might require higher learning rates.
    • Smaller batches have an inherent regularization effect.
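
One common heuristic related to the learning-rate point above (not prescribed by this course) is the linear scaling rule: increase the learning rate roughly in proportion to the batch size. A sketch in Keras, assuming a base learning rate tuned for a reference batch size and a hypothetical input dimension, might look like this:

import keras

def scaled_learning_rate(base_lr, base_batch_size, batch_size):
    # Linear scaling heuristic: the learning rate grows with the batch size
    return base_lr * batch_size / base_batch_size

batch_size = 256                                  # e.g. from optimal_batch_size(100_000)
lr = scaled_learning_rate(base_lr=1e-3, base_batch_size=32, batch_size=batch_size)

model = keras.Sequential([
    keras.layers.Input(shape=(10,)),              # hypothetical input dimension
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr), loss="mse")
# model.fit(X_train, y_train, batch_size=batch_size, epochs=10)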

Section 3.16 - Weight Initialization

The Importance of Proper Initialization

Poor weight initialization can lead to:
  1. Vanishing gradients
  2. Exploding gradients
  3. Dead neurons (especially with ReLU activations)

Key Initialization Methods

  1. Xavier/Glorot Initialization (for tanh or sigmoid activations):

    std = np.sqrt(2.0 / (fan_in + fan_out))
    W = np.random.normal(0, std, size=(fan_in, fan_out))
  2. He Initialization (for ReLU activations):

    std = np.sqrt(2.0 / fan_in)
    W = np.random.normal(0, std, size=(fan_in, fan_out))
Initialization Guidelines
  • For ReLU, use He initialization.
  • For tanh or sigmoid, use Xavier (Glorot) initialization.
  • For linear activations, Xavier initialization is also recommended.
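
To see why the choice matters, the following NumPy sketch (layer width and depth are hypothetical) pushes a random batch through a deep stack of ReLU layers and prints the resulting activation scale for a naive small-constant initialization versus He initialization:

import numpy as np

rng = np.random.default_rng(0)
fan_in, depth = 256, 20                        # hypothetical width and depth

def activation_std(weight_std):
    h = rng.normal(size=(64, fan_in))          # random input batch
    for _ in range(depth):
        W = rng.normal(0, weight_std, size=(fan_in, fan_in))
        h = np.maximum(h @ W, 0)               # Dense layer (no bias) + ReLU
    return h.std()

print("naive std=0.01:", activation_std(0.01))                   # activations collapse toward zero
print("He init       :", activation_std(np.sqrt(2.0 / fan_in)))  # activation scale stays stable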

Layer-Specific Considerations

from keras.layers import Dense

# Dense layer with ReLU activation
relu_layer = Dense(
    units=64,
    activation='relu',
    kernel_initializer='he_normal',
    bias_initializer='zeros'
)

# Dense layer with tanh activation
tanh_layer = Dense(
    units=64,
    activation='tanh',
    kernel_initializer='glorot_normal',
    bias_initializer='zeros'
)

Section 3.17 - Validation Strategy

Train-Validation Split

For many applications, a simple holdout validation set is sufficient. For example, in time series data you might use:

# Using the last 20% of data for validation
val_size = int(len(X) * 0.2)
X_train, y_train = X[:-val_size], y[:-val_size]
X_val, y_val = X[-val_size:], y[-val_size:]

Validation Best Practices

  1. Time Series Data:
    • Always split the data chronologically.
    • Do not shuffle the time series.
    • Choose the validation size based on the forecast horizon.
  2. Financial Data:
    • Account for different market regimes.
    • Consider using multiple validation periods.
    • Test on data representing various market conditions.
Practical Tips
  1. Data Preparation:
    • Fit the scaling parameters on the training split only.
    • Apply the same scaling parameters to the validation set (see the sketch at the end of this section).
    • Ensure that the validation set is representative of future data.
  2. Monitoring:
    • Watch the gap between training and validation loss.
    • Monitor multiple performance metrics.
    • Be cautious of sudden changes in validation performance.
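
Putting these pieces together, here is a hedged end-to-end sketch on synthetic data (the array shapes and the small model are hypothetical) that combines the chronological split, training-set-only scaling, a batch size in the 100-500 batches-per-epoch range, and monitoring of the train/validation gap through the Keras history object:

import numpy as np
import keras

# Hypothetical data: X of shape (n_samples, n_features), y of shape (n_samples,)
n_samples, n_features = 10_000, 8
X = np.random.randn(n_samples, n_features).astype("float32")
y = np.random.randn(n_samples).astype("float32")

# 1. Chronological split: the last 20% is kept for validation, no shuffling
val_size = int(len(X) * 0.2)
X_train, y_train = X[:-val_size], y[:-val_size]
X_val, y_val = X[-val_size:], y[-val_size:]

# 2. Standardize using training statistics only
mean, std = X_train.mean(axis=0), X_train.std(axis=0) + 1e-8
X_train, X_val = (X_train - mean) / std, (X_val - mean) / std

# 3. Train with an explicit validation set (8,000 samples / 32 = 250 batches per epoch)
model = keras.Sequential([
    keras.layers.Input(shape=(n_features,)),
    keras.layers.Dense(64, activation="relu", kernel_initializer="he_normal"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    batch_size=32, epochs=5, verbose=0)

# 4. Watch the gap between training and validation loss
gap = history.history["val_loss"][-1] - history.history["loss"][-1]
print("final train loss:", history.history["loss"][-1])
print("final val loss:  ", history.history["val_loss"][-1])
print("train/val gap:   ", gap)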