From Traditional Models to Deep Learning

Remi Genet

Course

Fundamentals

Foundations of machine learning and econometric modeling, introducing deep learning as a flexible function approximation paradigm for financial problems.

Author

Remi Genet

Published

2025-04-03

From Standard Models to Neural Networks

Section 1.1 - Linear Regression: The Simplest Machine Learning Model

What It Does

Linear regression predicts a number (e.g., tomorrow's stock price) as a weighted sum of input features (e.g., P/E ratio, volatility).

Mathematical Form
For input features \[ \mathbf{x} = [x_1, \dots, x_n] \] and weights \[ \boldsymbol{\theta} = [\theta_0, \theta_1, \dots, \theta_n], \] where \(\theta_0\) is the intercept, the prediction is given by: \[ \hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n. \]

How We Find the Weights
The best weights are those that minimize the sum of squared prediction errors on the training data: \[ \hat{\theta} = \operatorname{argmin}_{\theta} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2. \]

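Closed-form solution: \[ \hat{\theta} = (X^\top X)^{-1} X^\top y, \] where \(X\) is the data matrix with a column of 1s for the intercept. The formula requires \(X^\top X\) to be invertible, which fails when features are perfectly collinear.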
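The closed-form solution can be sketched in a few lines of NumPy. This is a minimal illustration, not a production estimator: the function names `fit_ols` and `predict` and the toy data are hypothetical, and the code assumes \(X^\top X\) is invertible.

```python
import numpy as np

def fit_ols(features, targets):
    """Return [theta_0, theta_1, ..., theta_n] minimizing the sum of squared errors.

    features: array of shape (N, n); targets: array of shape (N,).
    """
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Prepend a column of 1s so that theta_0 acts as the intercept.
    X = np.column_stack([np.ones(len(features)), features])
    # Solve (X^T X) theta = X^T y instead of forming the inverse explicitly.
    return np.linalg.solve(X.T @ X, X.T @ targets)

def predict(theta, features):
    """Compute y_hat = theta_0 + theta_1 x_1 + ... + theta_n x_n for each row."""
    features = np.asarray(features, dtype=float)
    return theta[0] + features @ theta[1:]

# Hypothetical toy data generated exactly from y = 1 + 2*x1 - 3*x2.
toy_features = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
toy_targets = 1.0 + 2.0 * toy_features[:, 0] - 3.0 * toy_features[:, 1]
theta_hat = fit_ols(toy_features, toy_targets)
```

Because the toy targets contain no noise, `theta_hat` recovers the generating weights up to floating-point error. Solving the linear system is generally preferred over computing the inverse because it is numerically more stable.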

Section 1.2 - Decision Trees: Learning Simple Rules

How It Works Step-by-Step

Start: All data in one group (e.g., all historical stock returns).
Find Split: Test all possible feature thresholds (e.g., “P/E ratio < 15?”) to create two groups where predictions are most accurate.
Repeat: Keep splitting subgroups until reaching stopping criteria (max depth or minimum samples).

Example: Predicting stock outperformance

Is P/E ratio < 20?  
 ├─ Yes → Check ROE > 15%  
 │    ├─ Yes → Predict "Outperform"  
 │    └─ No → Predict "Neutral"  
 └─ No → Predict "Underperform"
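Once fitted, a tree is just a set of nested if/else rules. The example tree above can be written directly as a function; the name `predict_outperformance` and its argument names are hypothetical, chosen only for this sketch.

```python
def predict_outperformance(pe_ratio, roe_percent):
    """Apply the two-level example tree to a single stock."""
    if pe_ratio < 20:
        # Left branch: cheap stocks are further split on profitability.
        if roe_percent > 15:
            return "Outperform"
        return "Neutral"
    # Right branch: expensive stocks end in a single leaf.
    return "Underperform"
```

For instance, a stock with a P/E ratio of 12 and an ROE of 18% lands in the "Outperform" leaf, while any stock with a P/E ratio of 20 or more is labeled "Underperform" regardless of its ROE.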

Mathematical Criterion (Classification)
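At each split, maximize the purity gain: \[ \text{Gain} = H(\text{parent}) - \left[\frac{N_{\text{left}}}{N}\, H(\text{left}) + \frac{N_{\text{right}}}{N}\, H(\text{right})\right], \] where \(H\) represents the impurity (e.g., Gini or entropy), \(N\) is the number of samples in the parent group, and \(N_{\text{left}}\), \(N_{\text{right}}\) are the numbers of samples in the two child groups.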
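The gain criterion and the threshold search can be sketched in plain Python for a single feature, using Gini impurity as \(H\). The helper names (`gini`, `split_gain`, `best_threshold`) and the toy data are hypothetical, and testing midpoints between consecutive distinct values is one common choice of candidate thresholds, assumed here for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 (0 means the group is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gain(values, labels, threshold):
    """Purity gain of splitting the group on `value < threshold`."""
    left = [lab for v, lab in zip(values, labels) if v < threshold]
    right = [lab for v, lab in zip(values, labels) if v >= threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted

def best_threshold(values, labels):
    """Test midpoints between consecutive distinct values; keep the highest gain."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(candidates, key=lambda t: split_gain(values, labels, t))

# Hypothetical P/E ratios and outcome labels, for illustration only.
pe_ratios = [8, 12, 14, 22, 30, 35]
labels = ["up", "up", "up", "down", "down", "down"]
threshold = best_threshold(pe_ratios, labels)
```

On this toy data the two classes separate perfectly between 14 and 22, so the selected threshold is the midpoint 18 and the gain equals the parent impurity of 0.5. A full tree learner would repeat this search over every feature and recurse into each child group.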

Section 1.3 - GARCH Models: Handling Time-Dependent Variance

Why We Need It

Financial returns often exhibit volatility clustering (calm vs. turbulent periods). GARCH models capture this time-varying variance.

Model Equations

Return at time \(t\): \[ r_t = \mu + \varepsilon_t, \quad \varepsilon_t \sim \mathcal{N}(0, \sigma_t^2). \]

Volatility dynamics (GARCH(1,1)): \[ \sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2. \]

Calibration Process

1. Initialize parameters \(\omega\), \(\alpha\), \(\beta\).
2. Compute the volatility series \(\{\sigma_t^2\}\) using past \(\varepsilon\)'s.
3. Adjust parameters to maximize the likelihood of observing the returns.
4. Repeat until convergence (no closed-form solution exists).
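Steps 1 and 2 of the calibration, plus the quantity that step 3 optimizes, can be sketched in plain Python. The function names and toy returns are hypothetical. Starting the recursion at the mean squared residual is an assumption made here, since the text does not say how \(\sigma_1^2\) is initialized. In practice a numerical optimizer (for example `scipy.optimize.minimize`, not used in this sketch) would repeatedly call the likelihood function to carry out steps 3 and 4.

```python
import math

def garch_variances(returns, mu, omega, alpha, beta):
    """Run sigma_t^2 = omega + alpha * eps_{t-1}^2 + beta * sigma_{t-1}^2."""
    residuals = [r - mu for r in returns]
    # Assumption: initialize the recursion at the mean squared residual.
    variance = sum(e * e for e in residuals) / len(residuals)
    variances = [variance]
    for prev_eps in residuals[:-1]:
        variance = omega + alpha * prev_eps ** 2 + beta * variance
        variances.append(variance)
    return residuals, variances

def negative_log_likelihood(returns, mu, omega, alpha, beta):
    """Gaussian negative log-likelihood of the returns; smaller is better."""
    residuals, variances = garch_variances(returns, mu, omega, alpha, beta)
    return 0.5 * sum(
        math.log(2 * math.pi * var) + eps ** 2 / var
        for eps, var in zip(residuals, variances)
    )

# Hypothetical daily returns and starting parameters, for illustration only.
toy_returns = [0.01, -0.02, 0.015, -0.005, 0.03]
toy_residuals, toy_variances = garch_variances(toy_returns, 0.0, 1e-5, 0.1, 0.8)
toy_nll = negative_log_likelihood(toy_returns, 0.0, 1e-5, 0.1, 0.8)
```

Maximizing the likelihood is equivalent to minimizing `negative_log_likelihood` over \(\mu\), \(\omega\), \(\alpha\), \(\beta\), typically under the constraints \(\omega > 0\), \(\alpha \ge 0\), \(\beta \ge 0\) so that every variance stays positive.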

Section 1.4 - Enter Deep Learning

Core Idea

Instead of hand-crafting models (linear terms, tree splits, GARCH lags), let the algorithm learn the feature transformations:

Traditional Approach
\[ y = f(\mathbf{x}), \] where \(f\) is designed by humans (e.g., \[ y = \theta_0 + \theta_1 x_1 + \dots \]).

Deep Learning Approach
\[ y = f(\mathbf{x}; \theta), \] where \(f\) is a learned sequence of nonlinear transformations: \[ h_1 = \varphi(W_1 \mathbf{x} + b_1), \] \[ h_2 = \varphi(W_2 h_1 + b_2), \] \[ \vdots \] \[ \hat{y} = W_{\text{out}} h_n + b_{\text{out}}. \]

Here, \(\varphi\) is the activation function (e.g., ReLU: \(\varphi(z) = \max(0, z)\)).
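The chain of transformations above maps directly onto a short forward pass. The sketch below uses NumPy with ReLU as \(\varphi\); the layer sizes and random weights are hypothetical and untrained, so it only illustrates how a prediction is computed, not how the parameters are learned.

```python
import numpy as np

def relu(z):
    """ReLU activation: max(0, z), applied element-wise."""
    return np.maximum(0.0, z)

def forward(x, hidden_layers, output_layer):
    """Compute y_hat for one input vector x.

    hidden_layers: list of (W, b) pairs; output_layer: (W_out, b_out).
    """
    h = x
    for W, b in hidden_layers:
        h = relu(W @ h + b)  # h_k = phi(W_k h_{k-1} + b_k)
    W_out, b_out = output_layer
    return W_out @ h + b_out  # final layer is linear, with no activation

rng = np.random.default_rng(0)
# Hypothetical sizes: 4 input features, two hidden layers of 8 units, 1 output.
hidden_layers = [
    (rng.normal(size=(8, 4)), np.zeros(8)),
    (rng.normal(size=(8, 8)), np.zeros(8)),
]
output_layer = (rng.normal(size=(1, 8)), np.zeros(1))
x = rng.normal(size=4)
y_hat = forward(x, hidden_layers, output_layer)
```

With no hidden layers the function reduces to \(\hat{y} = W_{\text{out}} \mathbf{x} + b_{\text{out}}\), which is the linear regression of Section 1.1. The hidden layers are what add the learned nonlinear feature transformations.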

Why It Matters for Finance

Handles raw, high-dimensional data (order books, news text).
Discovers complex patterns (nonlinear factor interactions).
Offers flexible architecture design (time series, graphs, etc.).
