From Traditional Models to Deep Learning
From Standard Models to Neural Networks
Section 1.1 - Linear Regression: The Simplest Machine Learning Model
What It Does
Predicts a number (e.g., tomorrow's stock price) as a weighted combination of input features (e.g., P/E ratio, volatility).
Mathematical Form
For input features \(\mathbf{x} = [x_1, \dots, x_n]\) and weights \(\boldsymbol{\theta} = [\theta_0, \theta_1, \dots, \theta_n]\), where \(\theta_0\) is the intercept, the prediction is given by: \[
\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n.
\]
How We Find the Weights
The best weights are those that minimize the prediction error on the training data: \[
\hat{\theta} = \operatorname{argmin}_{\theta} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2.
\]
Closed-form solution: \[ \hat{\theta} = (X^\top X)^{-1} X^\top y, \] where \(X\) is the data matrix with a column of 1s for the intercept and \(X^\top X\) is assumed to be invertible.
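As a minimal sketch of the closed-form solution, the snippet below fits the weights with NumPy. The function name `fit_ols` and the toy data are illustrative assumptions, not part of the original material; the normal equations are solved directly rather than by forming the inverse, which is a common numerical choice.

```python
import numpy as np

def fit_ols(features, targets):
    """Return [theta_0, theta_1, ..., theta_n] minimizing the squared error."""
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Prepend a column of 1s so that theta_0 acts as the intercept.
    X = np.column_stack([np.ones(len(features)), features])
    # Solve (X^T X) theta = X^T y instead of inverting X^T X explicitly.
    return np.linalg.solve(X.T @ X, X.T @ targets)

# Made-up toy data generated exactly from y = 1 + 2*x1 - 3*x2 (no noise).
features = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
targets = 1.0 + 2.0 * features[:, 0] - 3.0 * features[:, 1]

theta_hat = fit_ols(features, targets)
print(np.round(theta_hat, 6))
```

Because the toy targets contain no noise, the recovered weights should match the generating coefficients up to floating-point error.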
Section 1.2 - Decision Trees: Learning Simple Rules
How It Works Step-by-Step
- Start: All data in one group (e.g., all historical stock returns).
- Find Split: Test all possible feature thresholds (e.g., “P/E ratio < 15?”) to create two groups where predictions are most accurate.
- Repeat: Keep splitting subgroups until reaching stopping criteria (max depth or minimum samples).
Example: Predicting stock outperformance
Is P/E ratio < 20?
├─ Yes → Check ROE > 15%
│ ├─ Yes → Predict "Outperform"
│ └─ No → Predict "Neutral"
└─ No → Predict "Underperform"
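The example tree above is just a set of nested rules, which can be written out as a short function. This is an illustrative sketch: the function name `predict_outperformance` and its argument names are assumptions, and the thresholds are copied from the diagram.

```python
def predict_outperformance(pe_ratio, roe_percent):
    """Follow the example tree from the root to a leaf label."""
    if pe_ratio < 20:            # root split: Is P/E ratio < 20?
        if roe_percent > 15:     # second split: ROE > 15%
            return "Outperform"
        return "Neutral"
    return "Underperform"

print(predict_outperformance(pe_ratio=12, roe_percent=18))
print(predict_outperformance(pe_ratio=12, roe_percent=9))
print(predict_outperformance(pe_ratio=28, roe_percent=18))
```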
Mathematical Criterion (Classification)
At each split, maximize the purity gain: \[
\text{Gain} = H(\text{parent}) - \left[\frac{N_{\text{left}}}{N}\, H(\text{left}) + \frac{N_{\text{right}}}{N}\, H(\text{right})\right],
\] where \(H\) is the impurity measure (e.g., Gini or entropy) and \(N = N_{\text{left}} + N_{\text{right}}\) is the number of samples in the parent node.
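The gain criterion can be sketched in a few lines of plain Python, here with Gini impurity as \(H\). The helper names (`gini`, `split_gain`, `best_threshold`), the choice of midpoints as candidate thresholds, and the toy data are assumptions made for illustration; the sketch also assumes the feature has at least two distinct values.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gain(values, labels, threshold):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    left = [y for x, y in zip(values, labels) if x < threshold]
    right = [y for x, y in zip(values, labels) if x >= threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted

def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values; keep the best gain."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(candidates, key=lambda t: split_gain(values, labels, t))

# Made-up toy data: low P/E stocks outperform, high P/E stocks underperform.
pe_ratios = [8, 12, 14, 18, 22, 25, 30, 35]
labels = ["Outperform"] * 4 + ["Underperform"] * 4

threshold = best_threshold(pe_ratios, labels)
print(threshold, split_gain(pe_ratios, labels, threshold))
```

On this toy data the classes separate perfectly, so the best split leaves both children pure and the gain equals the parent impurity.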
Section 1.3 - GARCH Models: Handling Time-Dependent Variance
Why We Need It
Financial returns often exhibit volatility clustering (calm vs. turbulent periods). GARCH models capture this time-varying variance.
Model Equations
Return at time \(t\): \[ r_t = \mu + \varepsilon_t, \quad \varepsilon_t \sim \mathcal{N}(0, \sigma_t^2). \]
Volatility dynamics (GARCH(1,1)): \[ \sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2. \]
Calibration Process
1. Initialize parameters \(\omega\), \(\alpha\), \(\beta\).
2. Compute the volatility series \(\{\sigma_t^2\}\) using past \(\varepsilon_t\)’s.
3. Adjust parameters to maximize the likelihood of observing the returns.
4. Repeat until convergence (no closed-form solution exists).
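The calibration steps can be sketched as follows, using only the Python standard library. Several things here are assumptions for illustration: the function names, the made-up toy returns, starting the recursion at the sample variance, fixing \(\mu\) at the sample mean, and replacing the iterative optimizer with a coarse grid search. In practice a numerical optimizer (for example `scipy.optimize.minimize`) would typically play the role of steps 3 and 4.

```python
import math

def garch_variances(returns, mu, omega, alpha, beta):
    """Step 2: conditional variances from the GARCH(1,1) recursion."""
    residuals = [r - mu for r in returns]
    # Assumption: initialize sigma_0^2 at the sample variance of the residuals.
    variance = sum(e * e for e in residuals) / len(residuals)
    variances = [variance]
    for e_prev in residuals[:-1]:
        variance = omega + alpha * e_prev ** 2 + beta * variance
        variances.append(variance)
    return residuals, variances

def negative_log_likelihood(returns, mu, omega, alpha, beta):
    """Gaussian negative log-likelihood; minimizing it maximizes the likelihood."""
    residuals, variances = garch_variances(returns, mu, omega, alpha, beta)
    return 0.5 * sum(
        math.log(2 * math.pi * s2) + e * e / s2
        for e, s2 in zip(residuals, variances)
    )

# Made-up toy returns, for illustration only.
toy_returns = [0.010, -0.020, 0.015, -0.030, 0.025, -0.005, 0.002, -0.001, 0.030, -0.028]
mu = sum(toy_returns) / len(toy_returns)

# Steps 1, 3, 4 as a coarse grid search over candidate parameters,
# restricted to alpha + beta < 1 (the usual stationarity condition).
grid = [
    (omega, alpha, beta)
    for omega in (1e-5, 5e-5, 1e-4)
    for alpha in (0.05, 0.10, 0.20)
    for beta in (0.60, 0.70, 0.80)
    if alpha + beta < 1
]
best_params = min(grid, key=lambda p: negative_log_likelihood(toy_returns, mu, *p))
best_nll = negative_log_likelihood(toy_returns, mu, *best_params)
print(best_params, round(best_nll, 3))
```

A grid search only picks the best of the listed candidates; it stands in for the "adjust and repeat" loop without claiming to find the true maximum-likelihood estimate.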
Section 1.4 - Enter Deep Learning
Core Idea
Instead of hand-crafting models (linear terms, tree splits, GARCH lags), let the algorithm learn the feature transformations:
Traditional Approach
\[
y = f(\mathbf{x}),
\] where \(f\) is designed by humans (e.g., \(y = \theta_0 + \theta_1 x_1 + \dots\)).
Deep Learning Approach
\[
y = f(\mathbf{x}; \theta),
\] where \(f\) is a learned sequence of nonlinear transformations: \[
h_1 = \varphi(W_1 \mathbf{x} + b_1),
\] \[
h_2 = \varphi(W_2 h_1 + b_2),
\] \[
\vdots
\] \[
\hat{y} = W_{\text{out}} h_n + b_{\text{out}}.
\]
Here, \(\varphi\) is the activation function (e.g., ReLU: \(\varphi(z) = \max(0, z)\)).
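To make the layered computation concrete, here is a minimal NumPy sketch of the forward pass with ReLU activations and a linear output layer. The function names and the hand-picked weights are illustrative assumptions; in a real network the weights would be learned from data rather than set by hand.

```python
import numpy as np

def relu(z):
    """ReLU activation: elementwise max(0, z)."""
    return np.maximum(0.0, z)

def forward(x, hidden_layers, output_layer):
    """Apply h_k = relu(W_k h_{k-1} + b_k) per hidden layer, then a linear output."""
    h = x
    for W, b in hidden_layers:
        h = relu(W @ h + b)
    W_out, b_out = output_layer
    return W_out @ h + b_out

# Hand-picked weights (assumption) so the result can be checked by hand.
x = np.array([1.0, -2.0])
W1 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b1 = np.zeros(3)
W2 = np.array([[1.0, 1.0, 1.0], [-1.0, 0.0, 0.0]])
b2 = np.array([0.0, 0.5])
W_out = np.array([[2.0, 3.0]])
b_out = np.array([0.5])

# h1 = relu([1, -2, -1]) = [1, 0, 0]; h2 = relu([1, -0.5]) = [1, 0]
y_hat = forward(x, [(W1, b1), (W2, b2)], (W_out, b_out))
print(y_hat)
```

Training would then adjust every `W` and `b` to reduce the prediction error, which is the step that replaces hand-designed features.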
Why It Matters for Finance
- Handles raw, high-dimensional data (order books, news text).
- Discovers complex patterns (nonlinear factor interactions).
- Offers flexible architecture design (time series, graphs, etc.).