Keras Matrix Operations: The Building Blocks
Matrix Operations: The Foundation of Neural Networks
Section 1.31 - From NumPy to Keras: Key Differences
Immutability Principle
Unlike NumPy arrays, Keras tensors are immutable:
import numpy as np
import keras
import keras.ops as ops
# NumPy: Mutable
np_array = np.array([[1, 2], [3, 4]])
np_array[0, 0] = 10  # Works

# Keras: Immutable
k_tensor = ops.array([[1, 2], [3, 4]])
# k_tensor[0, 0] = 10  # Would raise an error

# Instead, create a new tensor:
new_row = ops.concatenate(
    [ops.reshape(ops.array([10]), (1, 1)), k_tensor[0:1, 1:]], axis=1)
new_tensor = ops.concatenate([new_row, k_tensor[1:]], axis=0)
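If your Keras 3 installation exposes ops.scatter_update (worth confirming for your version; this sketch assumes it), the same update can be written more directly as a single functional operation:

updated = ops.scatter_update(
    k_tensor,
    ops.array([[0, 0]]),  # index of the element to replace
    ops.array([10]),      # replacement value
)
# k_tensor is unchanged; updated is a new tensor with 10 at position (0, 0)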
Basic Arithmetic Operations
Operations create new tensors:
# Create sample tensors
x = keras.random.normal(shape=(32, 10))  # Batch of 32, 10 features
y = keras.random.normal(shape=(32, 10))

# Basic operations
sum_tensor = x + y    # Element-wise addition
diff_tensor = x - y   # Element-wise subtraction
prod_tensor = x * y   # Element-wise multiplication (Hadamard)
div_tensor = x / y    # Element-wise division
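For completeness, the same element-wise operations are also available as functions in keras.ops, which can be convenient when composing operations programmatically; a brief sketch of the functional forms:

# Functional equivalents of the operators above; each call returns a new tensor
sum_tensor = ops.add(x, y)
diff_tensor = ops.subtract(x, y)
prod_tensor = ops.multiply(x, y)
div_tensor = ops.divide(x, y)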
Section 1.32 - Matrix Multiplication: Three Ways
1. Matrix Product (@)
# 2D Case: (batch, features)
x = keras.random.normal(shape=(32, 10))
W = keras.random.normal(shape=(10, 5))
out = x @ W  # Shape: (32, 5)

# 3D Case: (batch, sequence, features)
x = keras.random.normal(shape=(32, 20, 10))  # 20 timesteps
W = keras.random.normal(shape=(10, 5))
out = x @ W  # Shape: (32, 20, 5)
2. Comparing dot and matmul Operations
Let’s examine the same operations using both dot and matmul:
Case 1: Vector-Vector (1D-1D)
v1 = keras.random.normal(shape=(5,))
v2 = keras.random.normal(shape=(5,))

# Both return a scalar (dot product)
dot_result = ops.dot(v1, v2)     # Scalar output
mat_result = ops.matmul(v1, v2)  # Scalar output
Case 2: Matrix-Vector (2D-1D)
matrix = keras.random.normal(shape=(3, 5))
vector = keras.random.normal(shape=(5,))

# Both treat the vector as a column vector implicitly
dot_result = ops.dot(matrix, vector)     # Shape: (3,)
mat_result = ops.matmul(matrix, vector)  # Shape: (3,)
Case 3: Matrix-Matrix (2D-2D)
A = keras.random.normal(shape=(3, 5))
B = keras.random.normal(shape=(5, 4))

# Both perform standard matrix multiplication
dot_result = ops.dot(A, B)     # Shape: (3, 4)
mat_result = ops.matmul(A, B)  # Shape: (3, 4)
Case 4: Batched Operations (3D-2D)
# Batch of matrices
batch = keras.random.normal(shape=(32, 10, 5))  # (batch, seq, features)
weights = keras.random.normal(shape=(5, 4))     # (in_features, out_features)

# Key Difference 1: Broadcasting behavior
dot_result = ops.dot(batch, weights)     # Shape: (32, 10, 4)
mat_result = ops.matmul(batch, weights)  # Shape: (32, 10, 4)
Case 5: Complex Broadcasting (3D-3D)
# This is where differences become more apparent
A = keras.random.normal(shape=(32, 10, 5))  # (batch, seq1, features)
B = keras.random.normal(shape=(32, 5, 8))   # (batch, features, seq2)

# Key Difference 2: Handling of batch dimensions
dot_result = ops.dot(A, B)     # Shape: (32, 10, 32, 8) -- note the extra dimensions
mat_result = ops.matmul(A, B)  # Shape: (32, 10, 8) -- batch-wise multiplication
Key Differences:
- Simple Cases (1D/2D):
  - Both operations behave identically for vector-vector, matrix-vector, and matrix-matrix multiplication
- Higher Dimensional Cases:
  - matmul is designed for batched matrix multiplication with intuitive broadcasting
  - dot follows more general tensor contraction rules, which can create extra dimensions
- When to Use Each:
# Use dot for:
# 1. Simple dot products
# (vector1, vector2, x1, x2, etc. below are placeholders for tensors like those in the cases above)
scalar = ops.dot(vector1, vector2)

# 2. When you need specific tensor contractions
# Sum product over the last axis of x1 and the second-to-last axis of x2
result = ops.dot(x1, x2)

# Use matmul for:
# 1. Batched matrix multiplication
batched_result = ops.matmul(batch_input, weights)

# 2. When you want automatic broadcasting
result = ops.matmul(A, B)  # B will be broadcast if needed
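To make the higher-dimensional difference concrete, the Case 5 shapes can be checked directly. This sketch assumes ops.dot follows NumPy's general dot contraction rule, as described above:

# dot contracts the last axis of A with the second-to-last axis of B, keeping
# both batch axes; matmul pairs the batch axes together instead
A = keras.random.normal(shape=(32, 10, 5))
B = keras.random.normal(shape=(32, 5, 8))
print(ops.shape(ops.dot(A, B)))     # Expected: (32, 10, 32, 8)
print(ops.shape(ops.matmul(A, B)))  # Expected: (32, 10, 8)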
Section 1.36 - Einstein Summation (einsum)
A powerful and concise way to express tensor operations:
# 1. Trace of matrix
matrix = keras.random.normal(shape=(5, 5))
trace = ops.einsum('ii', matrix)  # Sum of diagonal elements

# 2. Matrix transpose
matrix = keras.random.normal(shape=(3, 4))
transposed = ops.einsum('ij->ji', matrix)  # Shape: (4, 3)

# 3. Batched matrix multiplication
batch = keras.random.normal(shape=(32, 10, 5))  # Batch of matrices
weights = keras.random.normal(shape=(5, 8))     # Weight matrix
result = ops.einsum('bij,jk->bik', batch, weights)  # Shape: (32, 10, 8)

# 4. Complex tensor contractions
# Example: Attention mechanism computation
queries = keras.random.normal(shape=(32, 10, 64))  # (batch, seq_len, dim)
keys = keras.random.normal(shape=(32, 15, 64))     # (batch, seq_len2, dim)
attention = ops.einsum('bik,bjk->bij', queries, keys)  # Shape: (32, 10, 15)
Why Use einsum?
- Clarity: Provides explicit indexing notation
- Flexibility: Can express complex operations in one line (see the quick check after this list)
- Performance: Often optimized by backend for efficiency
- Readability: Makes tensor manipulation intentions clear
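As a quick check that einsum reproduces the earlier operations, the batched matrix multiplication from above (reusing batch and weights) can be written both ways and compared; a minimal sketch:

# einsum expresses the same batched matrix product as matmul
via_matmul = ops.matmul(batch, weights)
via_einsum = ops.einsum('bij,jk->bik', batch, weights)
# The two results should agree up to floating-point error
print(ops.max(ops.abs(via_matmul - via_einsum)))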
Common Neural Network Operations with einsum:
# Dense layer without bias
x = keras.random.normal(shape=(32, 10))  # (batch, features)
W = keras.random.normal(shape=(10, 5))   # (in_features, out_features)
y = ops.einsum('bf,fo->bo', x, W)        # (batch, out_features)

# Self-attention
Q = keras.random.normal(shape=(32, 8, 64))  # (batch, seq, dim)
K = keras.random.normal(shape=(32, 8, 64))  # (batch, seq, dim)
V = keras.random.normal(shape=(32, 8, 64))  # (batch, seq, dim)
attention = ops.einsum('bik,bjk->bij', Q, K)       # Attention scores
output = ops.einsum('bij,bjd->bid', attention, V)  # Weighted sum
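Note that the self-attention snippet above stops at raw scores. In standard scaled dot-product attention the scores are divided by the square root of the key dimension and passed through a softmax before weighting V; a sketch using ops.softmax:

dim = 64  # key dimension of Q and K
scores = ops.einsum('bik,bjk->bij', Q, K) / (dim ** 0.5)  # Scaled attention scores
attn_weights = ops.softmax(scores, axis=-1)               # Each row sums to 1
output = ops.einsum('bij,bjd->bid', attn_weights, V)      # Shape: (32, 8, 64)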
Section 1.33 - Shape Manipulation: Essential Operations
Stack and Concatenate
# Stack: Add new dimension
x1 = keras.random.normal(shape=(32, 10))
x2 = keras.random.normal(shape=(32, 10))
stacked = ops.stack([x1, x2])          # Shape: (2, 32, 10)
stacked = ops.stack([x1, x2], axis=1)  # Shape: (32, 2, 10)

# Concatenate: Join along an existing dimension
concat = ops.concatenate([x1, x2], axis=0)  # Shape: (64, 10)
concat = ops.concatenate([x1, x2], axis=1)  # Shape: (32, 20)

# 3D Example
seq1 = keras.random.normal(shape=(32, 5, 10))  # 5 timesteps
seq2 = keras.random.normal(shape=(32, 3, 10))  # 3 timesteps
# Concatenate sequences
longer_seq = ops.concatenate([seq1, seq2], axis=1)  # Shape: (32, 8, 10)
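To undo a concatenation, a split along the same axis recovers the original pieces. This sketch assumes keras.ops.split accepts a list of split indices, mirroring NumPy's np.split:

# Split the concatenated sequence back into its two parts at timestep 5
part1, part2 = ops.split(longer_seq, [5], axis=1)
# part1 shape: (32, 5, 10), part2 shape: (32, 3, 10)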
Dimension Management
# Add dimension
x = keras.random.normal(shape=(32, 10))
x_expanded = ops.expand_dims(x, axis=1)  # Shape: (32, 1, 10)

# Alternative using None (NumPy style)
x_expanded = x[:, None, :]  # Same result

# 3D Example: Add feature dimension
seq = keras.random.normal(shape=(32, 20, 1))  # Single feature
seq_expanded = seq[..., None]  # Shape: (32, 20, 1, 1)
Section 1.34 - Common Shape Transformations
Reshape and Transpose
# Reshape: Change tensor structure
x = keras.random.normal(shape=(32, 20, 10))
# Flatten all but the batch dimension
flat = ops.reshape(x, (32, -1))  # Shape: (32, 200)

# Transpose: Reorder dimensions
x = keras.random.normal(shape=(32, 20, 10))
# Swap the sequence and feature dimensions
x_t = ops.transpose(x, (0, 2, 1))  # Shape: (32, 10, 20)
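Reshape and transpose are not interchangeable even when they yield the same shape: reshape keeps elements in row-major order, while transpose reorders the axes themselves. A quick check:

reshaped = ops.reshape(x, (32, 10, 20))   # Same shape as x_t, but elements keep their original order
print(ops.max(ops.abs(reshaped - x_t)))   # Generally nonzero: the two tensors differ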
Broadcasting
Keras follows NumPy broadcasting rules:
# Add bias to each feature
x = keras.random.normal(shape=(32, 10))  # Batch data
b = keras.random.normal(shape=(10,))     # Per-feature bias
y = x + b  # b is broadcast to (32, 10)

# 3D case: Add timestep-specific bias
x = keras.random.normal(shape=(32, 20, 10))  # Sequential data
b = keras.random.normal(shape=(20, 1))       # Per-timestep bias
y = x + b  # b is broadcast to (32, 20, 10)
Section 1.35 - Practical Tips
Shape Debugging
Always verify tensor shapes:
# Print shape information
x = keras.random.normal(shape=(32, 20, 10))
print(f"Input shape: {ops.shape(x)}")

# Track shapes through operations
y = ops.dot(x, keras.random.normal(shape=(10, 5)))
print(f"Output shape: {ops.shape(y)}")
Memory Efficiency
Be mindful of temporary tensors:
z = keras.random.normal(shape=(32, 10))  # A third tensor for illustration

# Less efficient: keeping a named intermediate holds an extra reference
temp = x + y
result = temp * z

# More efficient: a single expression lets the intermediate be freed immediately
result = (x + y) * z
The immutable tensor design in modern frameworks such as Keras, JAX, and TensorFlow stems from the need for automatic differentiation and parallel computation. While NumPy’s mutable arrays are convenient for data processing, immutable tensors enable reliable gradient computation and better GPU utilization.
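As a small illustration, with the TensorFlow backend (an assumption; JAX and PyTorch expose their own autodiff entry points), gradients flow through a chain of purely functional, non-mutating operations:

import tensorflow as tf

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
with tf.GradientTape() as tape:
    tape.watch(x)                      # x is an immutable constant, so watch it explicitly
    y = tf.reduce_sum((x + 1.0) ** 2)  # every step returns a new tensor
grad = tape.gradient(y, x)             # dy/dx = 2 * (x + 1)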