Keras Matrix Operations: The Building Blocks

Course
Fundamentals
Understanding fundamental matrix operations in Keras that form the basis of all neural network computations, with a focus on practical examples and shape manipulation.
Author

Remi Genet

Published

2025-04-03

Matrix Operations: The Foundation of Neural Networks

Section 1.31 - From NumPy to Keras: Key Differences

Immutability Principle

Unlike NumPy arrays, Keras tensors are immutable:

import numpy as np
import keras
import keras.ops as ops

# NumPy: Mutable
np_array = np.array([[1, 2], [3, 4]])
np_array[0,0] = 10  # Works

# Keras: Immutable
k_tensor = ops.array([[1, 2], [3, 4]])
# k_tensor[0, 0] = 10  # Would raise an error
# Instead, build a new tensor, e.g. by reassembling the pieces:
new_row = ops.concatenate([
    ops.reshape(ops.array([10]), (1, 1)),  # Replacement value as a (1, 1) block
    k_tensor[0:1, 1:]], axis=1)            # Remaining entries of the first row
new_tensor = ops.concatenate([new_row, k_tensor[1:]], axis=0)  # [[10, 2], [3, 4]]
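A less index-heavy way to get the same result is to select values with a boolean mask. This is a minimal sketch, assuming ops.where broadcasts a scalar replacement the way NumPy does:

# Same update expressed with a mask instead of slicing (sketch)
mask = ops.array([[True, False], [False, False]])
updated = ops.where(mask, 10, k_tensor)  # [[10, 2], [3, 4]]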

Basic Arithmetic Operations

Operations create new tensors:

# Create sample tensors
x = keras.random.normal(shape=(32, 10))  # Batch of 32, 10 features
y = keras.random.normal(shape=(32, 10))

# Basic operations
sum_tensor = x + y  # Element-wise addition
diff_tensor = x - y  # Element-wise subtraction
prod_tensor = x * y  # Element-wise multiplication (Hadamard)
div_tensor = x / y  # Element-wise division
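The same element-wise operations exist as functions in keras.ops, the form typically used inside custom layers to keep code backend-agnostic. A small sketch mirroring the block above:

# Functional equivalents of the operators above
sum_tensor = ops.add(x, y)
diff_tensor = ops.subtract(x, y)
prod_tensor = ops.multiply(x, y)
div_tensor = ops.divide(x, y)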

Section 1.32 - Matrix Multiplication: Three Ways

1. Matrix Product (@)

# 2D Case: (batch, features)
x = keras.random.normal(shape=(32, 10))
W = keras.random.normal(shape=(10, 5))
out = x @ W  # Shape: (32, 5)

# 3D Case: (batch, sequence, features)
x = keras.random.normal(shape=(32, 20, 10))  # 20 timesteps
W = keras.random.normal(shape=(10, 5))
out = x @ W  # Shape: (32, 20, 5)
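This product is exactly what a Dense layer computes in its forward pass, plus a broadcast bias. A minimal sketch (W and b here are stand-alone tensors for illustration, not the weights of an actual keras.layers.Dense instance):

# Dense-layer-style computation: out = x @ W + b
x = keras.random.normal(shape=(32, 10))
W = keras.random.normal(shape=(10, 5))
b = keras.random.normal(shape=(5,))
out = x @ W + b  # Shape: (32, 5); b is broadcast across the batch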

2. Comparing dot and matmul Operations

Let’s examine the same operations using both dot and matmul:

Case 1: Vector-Vector (1D-1D)

v1 = keras.random.normal(shape=(5,))
v2 = keras.random.normal(shape=(5,))

# Both return scalar (dot product)
dot_result = ops.dot(v1, v2)     # Scalar output
mat_result = ops.matmul(v1, v2)  # Scalar output

Case 2: Matrix-Vector (2D-1D)

matrix = keras.random.normal(shape=(3, 5))
vector = keras.random.normal(shape=(5,))

# Both treat vector as column vector implicitly
dot_result = ops.dot(matrix, vector)    # Shape: (3,)
mat_result = ops.matmul(matrix, vector) # Shape: (3,)

Case 3: Matrix-Matrix (2D-2D)

A = keras.random.normal(shape=(3, 5))
B = keras.random.normal(shape=(5, 4))

# Both perform standard matrix multiplication
dot_result = ops.dot(A, B)    # Shape: (3, 4)
mat_result = ops.matmul(A, B) # Shape: (3, 4)

Case 4: Batched Operations (3D-2D)

# Batch of matrices
batch = keras.random.normal(shape=(32, 10, 5))  # (batch, seq, features)
weights = keras.random.normal(shape=(5, 4))     # (in_features, out_features)

# Both broadcast the 2D weights across the batch dimension here
dot_result = ops.dot(batch, weights)    # Shape: (32, 10, 4)
mat_result = ops.matmul(batch, weights) # Shape: (32, 10, 4)

Case 5: Complex Broadcasting (3D-3D)

# This is where differences become more apparent
A = keras.random.normal(shape=(32, 10, 5))  # (batch, seq1, features)
B = keras.random.normal(shape=(32, 5, 8))   # (batch, features, seq2)

# Note how the batch dimensions are handled differently
dot_result = ops.dot(A, B)     # Shape: (32, 10, 32, 8)  # Full contraction creates extra dimensions
mat_result = ops.matmul(A, B)  # Shape: (32, 10, 8)      # Batch-wise multiplication

Key Differences:

  1. Simple Cases (1D/2D):
    • Both operations behave identically for vector-vector, matrix-vector, and matrix-matrix multiplication
  2. Higher Dimensional Cases:
    • matmul is designed for batched matrix multiplication with intuitive broadcasting
    • dot follows more general tensor contraction rules, which can create extra dimensions
  3. When to Use Each:
# Use dot for:
# 1. Simple dot products
scalar = ops.dot(vector1, vector2)

# 2. When you need specific tensor contractions
# Sum product over last axis of x1 and second-to-last of x2
result = ops.dot(x1, x2)

# Use matmul for:
# 1. Batched matrix multiplication
batched_result = ops.matmul(batch_input, weights)

# 2. When you want automatic broadcasting
result = ops.matmul(A, B)  # B will be broadcast if needed

Section 1.36 - Einstein Summation (einsum)

A powerful and concise way to express tensor operations:

# 1. Trace of matrix
matrix = keras.random.normal(shape=(5, 5))
trace = ops.einsum('ii', matrix)  # Sum of diagonal elements

# 2. Matrix transpose
matrix = keras.random.normal(shape=(3, 4))
transposed = ops.einsum('ij->ji', matrix)  # Shape: (4, 3)

# 3. Batched matrix multiplication
batch = keras.random.normal(shape=(32, 10, 5))  # Batch of matrices
weights = keras.random.normal(shape=(5, 8))     # Weight matrix
result = ops.einsum('bij,jk->bik', batch, weights)  # Shape: (32, 10, 8)

# 4. Complex tensor contractions
# Example: Attention mechanism computation
queries = keras.random.normal(shape=(32, 10, 64))    # (batch, seq_len, dim)
keys = keras.random.normal(shape=(32, 15, 64))       # (batch, seq_len2, dim)
attention = ops.einsum('bik,bjk->bij', queries, keys)  # Shape: (32, 10, 15)

Why Use einsum?

  1. Clarity: Provides explicit indexing notation
  2. Flexibility: Can express complex operations in one line
  3. Performance: Often optimized by backend for efficiency
  4. Readability: Makes tensor manipulation intentions clear

Common Neural Network Operations with einsum:

# Dense layer without bias
x = keras.random.normal(shape=(32, 10))    # (batch, features)
W = keras.random.normal(shape=(10, 5))     # (in_features, out_features)
y = ops.einsum('bf,fo->bo', x, W)         # (batch, out_features)

# Self-attention
Q = keras.random.normal(shape=(32, 8, 64)) # (batch, seq, dim)
K = keras.random.normal(shape=(32, 8, 64)) # (batch, seq, dim)
V = keras.random.normal(shape=(32, 8, 64)) # (batch, seq, dim)
attention = ops.einsum('bik,bjk->bij', Q, K)  # Attention scores
output = ops.einsum('bij,bjd->bid', attention, V)  # Weighted sum
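Putting the two einsum calls together gives a complete (single-head, unmasked) scaled dot-product attention. A sketch, assuming ops.softmax is available and using the usual 1/sqrt(dim) scaling:

# Scaled dot-product attention in three lines (sketch)
scores = ops.einsum('bik,bjk->bij', Q, K) / (64.0 ** 0.5)  # (32, 8, 8)
weights = ops.softmax(scores, axis=-1)                     # Each row sums to 1
output = ops.einsum('bij,bjd->bid', weights, V)            # (32, 8, 64)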

Section 1.33 - Shape Manipulation: Essential Operations

Stack and Concatenate

# Stack: Add new dimension
x1 = keras.random.normal(shape=(32, 10))
x2 = keras.random.normal(shape=(32, 10))
stacked = ops.stack([x1, x2])  # Shape: (2, 32, 10)
stacked = ops.stack([x1, x2], axis=1)  # Shape: (32, 2, 10)

# Concatenate: Join along existing dimension
concat = ops.concatenate([x1, x2], axis=0)  # Shape: (64, 10)
concat = ops.concatenate([x1, x2], axis=1)  # Shape: (32, 20)

# 3D Example
seq1 = keras.random.normal(shape=(32, 5, 10))  # 5 timesteps
seq2 = keras.random.normal(shape=(32, 3, 10))  # 3 timesteps
# Concatenate sequences
longer_seq = ops.concatenate([seq1, seq2], axis=1)  # Shape: (32, 8, 10)
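Splitting goes the other way. A short sketch, assuming keras.ops.split follows the NumPy convention where a list gives the split indices:

# Recover the two original sequences from longer_seq
first, second = ops.split(longer_seq, [5], axis=1)  # Shapes: (32, 5, 10) and (32, 3, 10)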

Dimension Management

# Add dimension
x = keras.random.normal(shape=(32, 10))
x_expanded = ops.expand_dims(x, axis=1)  # Shape: (32, 1, 10)

# Alternative using None (NumPy style)
x_expanded = x[:, None, :]  # Same result

# Works on higher-rank tensors too: append a trailing singleton dimension
seq = keras.random.normal(shape=(32, 20, 1))  # (batch, timesteps, features)
seq_expanded = seq[..., None]  # Shape: (32, 20, 1, 1)
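The counterpart is ops.squeeze, which removes size-1 dimensions. A small sketch, assuming it behaves like np.squeeze:

# Remove a singleton dimension
x = keras.random.normal(shape=(32, 1, 10))
x_squeezed = ops.squeeze(x, axis=1)  # Shape: (32, 10)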

Section 1.34 - Common Shape Transformations

Reshape and Transpose

# Reshape: Change tensor structure
x = keras.random.normal(shape=(32, 20, 10))
# Flatten all but batch dimension
flat = ops.reshape(x, (32, -1))  # Shape: (32, 200)

# Transpose: Reorder dimensions
x = keras.random.normal(shape=(32, 20, 10))
# Swap sequence and feature dimensions
x_t = ops.transpose(x, (0, 2, 1))  # Shape: (32, 10, 20)
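Reshape and transpose are not interchangeable: reshape keeps the element order and only regroups it, whereas transpose reorders the axes. A quick check on a small tensor (sketch):

# Reshape regroups, transpose reorders
m = ops.reshape(ops.arange(6), (2, 3))  # [[0, 1, 2], [3, 4, 5]]
as_reshape = ops.reshape(m, (3, 2))     # [[0, 1], [2, 3], [4, 5]]
as_transpose = ops.transpose(m)         # [[0, 3], [1, 4], [2, 5]]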

Broadcasting

Keras follows NumPy broadcasting rules:

# Add bias to each feature
x = keras.random.normal(shape=(32, 10))  # Batch data
b = keras.random.normal(shape=(10,))     # Per-feature bias
y = x + b  # b is broadcast to (32, 10)

# 3D case: Add timestep-specific bias
x = keras.random.normal(shape=(32, 20, 10))  # Sequential data
b = keras.random.normal(shape=(20, 1))      # Per-timestep bias
y = x + b  # b is broadcast to (32, 20, 10)
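When shapes are not broadcast-compatible the addition simply fails, so it can help to make the intended alignment explicit. A sketch using ops.broadcast_to to materialize the target shape:

# Make the broadcast explicit
b_full = ops.broadcast_to(b, (32, 20, 10))  # (20, 1) -> (32, 20, 10)
y = x + b_full  # Same result as x + b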

Section 1.35 - Practical Tips

Shape Debugging

Always verify tensor shapes:

# Print shape information
x = keras.random.normal(shape=(32, 20, 10))
print(f"Input shape: {ops.shape(x)}")

# Track shapes through operations
y = ops.dot(x, keras.random.normal(shape=(10, 5)))
print(f"Output shape: {ops.shape(y)}")

Memory Efficiency

Be mindful of how long you keep references to temporary tensors:

# Keeping a named reference holds the intermediate tensor alive
temp = x + y
result = temp * z

# A single expression lets the backend release (or fuse) the temporary as soon as possible
result = (x + y) * z

Historical Note

The immutable-tensor style used by Keras 3's ops API, and enforced by backends such as TensorFlow and JAX, stems from the need for automatic differentiation and parallel computation. While NumPy's mutable arrays are convenient for data processing, immutable tensors enable reliable gradient computation and better GPU utilization.
