Model Training Fundamentals

Course
Fundamentals

Essential workflow for training neural networks in Keras, covering the basic steps from model creation to prediction.

Author: Remi Genet
Published: 2025-04-03

Training Your First Model

Section 1.40 - The Training Workflow

Four Essential Steps

  1. Create the model
  2. Compile it with optimizer and loss
  3. Fit to training data
  4. Make predictions
Complete Training Example
import numpy as np
from keras import Sequential, layers

# Generate sample data
X_train = np.random.normal(size=(1000, 20))  # 1000 samples, 20 features
y_train = (X_train.sum(axis=1) > 0).astype(int)  # Binary classification

# Step 1: Create model
model = Sequential([
    layers.Dense(64, activation='relu', input_shape=(20,)),
    layers.Dense(32, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])

# Step 2: Compile
model.compile(
    optimizer='adam',               # Algorithm to update weights
    loss='binary_crossentropy',    # How to measure errors
    metrics=['accuracy']           # What to report during training
)

# Step 3: Fit
history = model.fit(
    X_train, y_train,
    epochs=10,           # Number of training passes
    batch_size=32,       # Samples per gradient update
    validation_split=0.2 # 20% of data for validation
)

# Step 4: Predict
X_new = np.random.normal(size=(100, 20))  # New data
predictions = model.predict(X_new)         # Get model outputs
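Since the output layer uses a sigmoid, model.predict returns probabilities in [0, 1] rather than hard labels. A minimal sketch for converting them to 0/1 classes (the 0.5 cutoff is a common default, not a requirement):

# Threshold sigmoid outputs to get hard 0/1 class labels (0.5 is an assumed cutoff)
class_labels = (predictions > 0.5).astype(int)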

Section 1.41 - Key Parameters Explained

Compilation Options

model.compile(
    optimizer='adam',    # Common choices: 'adam', 'sgd', 'rmsprop'
    loss='mse',         # For regression: 'mse', 'mae'
                        # For classification: 'binary_crossentropy', 'categorical_crossentropy'
    metrics=['accuracy'] # What to track: 'accuracy', 'mae', custom metrics
)
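The string shortcuts above create default-configured objects behind the scenes. When you need finer control, for example over the learning rate, compile accepts explicit Keras objects instead; a minimal sketch:

from keras import optimizers

model.compile(
    optimizer=optimizers.Adam(learning_rate=1e-3),  # same as 'adam', but with an explicit learning rate
    loss='binary_crossentropy',
    metrics=['accuracy']
)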

Training Parameters

model.fit(
    X_train, y_train,        
    epochs=10,              # More epochs = more training (sometimes too much)
    batch_size=32,          # Smaller = more updates per epoch but noisier gradients
    validation_split=0.2,   # Hold out data to check performance
    shuffle=True,           # Mix data between epochs
    verbose=1              # 0=silent, 1=progress bar, 2=one line per epoch
)
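model.fit returns a History object whose history dictionary records one value per epoch for the loss and every metric, on both the training and validation data. A short sketch for inspecting it, assuming you captured the return value as in the complete example above:

# 'history' is the object returned by model.fit above
train_loss = history.history['loss']     # training loss, one entry per epoch
val_loss = history.history['val_loss']   # validation loss (requires validation_split or validation_data)
print(f"Final epoch: train loss {train_loss[-1]:.4f}, val loss {val_loss[-1]:.4f}")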

Section 1.42 - Quick Tips

Common Gotchas

  1. Input Shape:
    • input_shape must match your feature dimensions
    • It excludes the batch dimension: for data shaped (batch_size, features), pass input_shape=(features,)
  2. Data Types:
    • Convert inputs to float32: X_train = X_train.astype('float32')
    • Convert classification targets to int32: y_train = y_train.astype('int32')
  3. Validation:
    • Always use validation data to check for overfitting
    • Use either validation_split or a separate validation_data=(X_val, y_val), as shown in the sketch below
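A minimal sketch tying the last two gotchas together (X_val and y_val are assumed to come from a held-out split you prepared yourself):

# Cast to the dtypes Keras expects
X_train = X_train.astype('float32')
y_train = y_train.astype('int32')

# Option A: let Keras hold out part of the training data
model.fit(X_train, y_train, epochs=10, validation_split=0.2)

# Option B: supply a separate validation set explicitly
model.fit(X_train, y_train, epochs=10, validation_data=(X_val, y_val))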
Note

We’ll dive deeper into:

  • Loss function selection
  • Optimizer tuning
  • Batch size effects
  • Validation strategies
  • Overfitting prevention

in later sections of the course.

Section 1.43 - Task-Specific Templates

Regression (Predict Numbers)

model = Sequential([
    layers.Dense(64, activation='relu', input_shape=(n_features,)),
    layers.Dense(1)  # No activation for regression
])
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
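As a quick end-to-end check, the regression template can be trained on synthetic data; a minimal sketch (n_features and the linear target are illustrative assumptions):

import numpy as np
from keras import Sequential, layers

n_features = 10
model = Sequential([
    layers.Dense(64, activation='relu', input_shape=(n_features,)),
    layers.Dense(1)  # No activation for regression
])
model.compile(optimizer='adam', loss='mse', metrics=['mae'])

# Synthetic linear target, purely illustrative
X = np.random.normal(size=(500, n_features)).astype('float32')
y = (X @ np.random.normal(size=(n_features,))).astype('float32')

model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2)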

Binary Classification (Yes/No)

model = Sequential([
    layers.Dense(64, activation='relu', input_shape=(n_features,)),
    layers.Dense(1, activation='sigmoid')  # Output between 0 and 1
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

Multiclass Classification (Categories)

model = Sequential([
    layers.Dense(64, activation='relu', input_shape=(n_features,)),
    layers.Dense(n_classes, activation='softmax')  # One probability per class
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
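One detail worth flagging: 'categorical_crossentropy' expects one-hot encoded targets of shape (n_samples, n_classes). If your labels are plain integer class IDs, Keras provides 'sparse_categorical_crossentropy' as a drop-in alternative; a minimal sketch:

# For integer labels (0 .. n_classes-1), skip the one-hot encoding:
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',  # accepts integer class IDs directly
    metrics=['accuracy']
)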
Important

These templates provide starting points; you'll learn to customize them as we progress through the course.
