Keras Matrix Operations: The Building Blocks
Matrix Operations: The Foundation of Neural Networks
Section 1.31 - From NumPy to Keras: Key Differences
Immutability Principle
Unlike NumPy arrays, Keras tensors are immutable:
import numpy as np
import keras
import keras.ops as ops
# NumPy: Mutable
np_array = np.array([[1, 2], [3, 4]])
np_array[0,0] = 10 # Works
# Keras: Immutable
k_tensor = ops.array([[1, 2], [3, 4]])
# k_tensor[0,0] = 10 # Would raise error
# Instead, create new tensor:
# Build a new first row containing the updated value, then stack it on the rest
new_row = ops.concatenate(
    [ops.reshape(ops.array([10]), (1, 1)), k_tensor[0:1, 1:]], axis=1)
new_tensor = ops.concatenate([new_row, k_tensor[1:]], axis=0)
Basic Arithmetic Operations
Operations create new tensors:
# Create sample tensors
x = keras.random.normal(shape=(32, 10)) # Batch of 32, 10 features
y = keras.random.normal(shape=(32, 10))
# Basic operations
sum_tensor = x + y # Element-wise addition
diff_tensor = x - y # Element-wise subtraction
prod_tensor = x * y # Element-wise multiplication (Hadamard)
div_tensor = x / y # Element-wise division
Section 1.32 - Matrix Multiplication: Three Ways
1. Matrix Product (@)
# 2D Case: (batch, features)
x = keras.random.normal(shape=(32, 10))
W = keras.random.normal(shape=(10, 5))
out = x @ W # Shape: (32, 5)
# 3D Case: (batch, sequence, features)
x = keras.random.normal(shape=(32, 20, 10)) # 20 timesteps
W = keras.random.normal(shape=(10, 5))
out = x @ W # Shape: (32, 20, 5)
2. Comparing dot and matmul Operations
Let’s examine the same operations using both dot and matmul:
Case 1: Vector-Vector (1D-1D)
v1 = keras.random.normal(shape=(5,))
v2 = keras.random.normal(shape=(5,))
# Both return scalar (dot product)
dot_result = ops.dot(v1, v2) # Scalar output
mat_result = ops.matmul(v1, v2) # Scalar output
Case 2: Matrix-Vector (2D-1D)
matrix = keras.random.normal(shape=(3, 5))
vector = keras.random.normal(shape=(5,))
# Both treat vector as column vector implicitly
dot_result = ops.dot(matrix, vector) # Shape: (3,)
mat_result = ops.matmul(matrix, vector) # Shape: (3,)
Case 3: Matrix-Matrix (2D-2D)
A = keras.random.normal(shape=(3, 5))
B = keras.random.normal(shape=(5, 4))
# Both perform standard matrix multiplication
dot_result = ops.dot(A, B) # Shape: (3, 4)
mat_result = ops.matmul(A, B) # Shape: (3, 4)
Case 4: Batched Operations (3D-2D)
# Batch of matrices
batch = keras.random.normal(shape=(32, 10, 5)) # (batch, seq, features)
weights = keras.random.normal(shape=(5, 4)) # (in_features, out_features)
# Key Difference 1: Broadcasting behavior (with 2D weights, both broadcast over the batch and agree)
dot_result = ops.dot(batch, weights) # Shape: (32, 10, 4)
mat_result = ops.matmul(batch, weights) # Shape: (32, 10, 4)
Case 5: Complex Broadcasting (3D-3D)
# This is where differences become more apparent
A = keras.random.normal(shape=(32, 10, 5)) # (batch, seq1, features)
B = keras.random.normal(shape=(32, 5, 8)) # (batch, features, seq2)
# Key Difference 2: Handling of batch dimensions
dot_result = ops.dot(A, B) # Shape: (32, 10, 32, 8) # Note extra dimensions
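# dot contracts the last axis of A with the second-to-last axis of B,
# pairing each batch entry of A with every batch entry of B, so the batch
# dimension shows up twice in the output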
mat_result = ops.matmul(A, B) # Shape: (32, 10, 8) # Batch-wise multiplication
Key Differences:
- Simple Cases (1D/2D):
- Both operations behave identically for vector-vector, matrix-vector, and matrix-matrix multiplication
- Higher Dimensional Cases:
- matmul is designed for batched matrix multiplication with intuitive broadcasting
- dot follows more general tensor contraction rules, which can create extra dimensions
- When to Use Each:
# Use dot for:
# 1. Simple dot products
scalar = ops.dot(vector1, vector2)
# 2. When you need specific tensor contractions
# Sum product over last axis of x1 and second-to-last of x2
result = ops.dot(x1, x2)
# Use matmul for:
# 1. Batched matrix multiplication
batched_result = ops.matmul(batch_input, weights)
# 2. When you want automatic broadcasting
result = ops.matmul(A, B) # B will be broadcast if needed
Section 1.36 - Einstein Summation (einsum)
A powerful and concise way to express tensor operations:
# 1. Trace of matrix
matrix = keras.random.normal(shape=(5, 5))
trace = ops.einsum('ii', matrix) # Sum of diagonal elements
# 2. Matrix transpose
matrix = keras.random.normal(shape=(3, 4))
transposed = ops.einsum('ij->ji', matrix) # Shape: (4, 3)
# 3. Batched matrix multiplication
batch = keras.random.normal(shape=(32, 10, 5)) # Batch of matrices
weights = keras.random.normal(shape=(5, 8)) # Weight matrix
result = ops.einsum('bij,jk->bik', batch, weights) # Shape: (32, 10, 8)
# 4. Complex tensor contractions
# Example: Attention mechanism computation
queries = keras.random.normal(shape=(32, 10, 64)) # (batch, seq_len, dim)
keys = keras.random.normal(shape=(32, 15, 64)) # (batch, seq_len2, dim)
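# 'bik,bjk->bij': keep batch b and both sequence axes i, j; contract the shared feature axis k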
attention = ops.einsum('bik,bjk->bij', queries, keys) # Shape: (32, 10, 15)
Why Use einsum?
- Clarity: Provides explicit indexing notation
- Flexibility: Can express complex operations in one line
- Performance: Often optimized by backend for efficiency
- Readability: Makes tensor manipulation intentions clear
Common Neural Network Operations with einsum:
# Dense layer without bias
x = keras.random.normal(shape=(32, 10)) # (batch, features)
W = keras.random.normal(shape=(10, 5)) # (in_features, out_features)
y = ops.einsum('bf,fo->bo', x, W) # (batch, out_features)
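# Equivalent to ops.matmul(x, W) or x @ W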
# Self-attention
Q = keras.random.normal(shape=(32, 8, 64)) # (batch, seq, dim)
K = keras.random.normal(shape=(32, 8, 64)) # (batch, seq, dim)
V = keras.random.normal(shape=(32, 8, 64)) # (batch, seq, dim)
attention = ops.einsum('bik,bjk->bij', Q, K) # Attention scores
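# In a full attention layer the scores would be scaled by 1/sqrt(dim) and
# passed through a softmax before the weighted sum below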
output = ops.einsum('bij,bjd->bid', attention, V) # Weighted sum
Section 1.33 - Shape Manipulation: Essential Operations
Stack and Concatenate
# Stack: Add new dimension
x1 = keras.random.normal(shape=(32, 10))
x2 = keras.random.normal(shape=(32, 10))
stacked = ops.stack([x1, x2]) # Shape: (2, 32, 10)
stacked = ops.stack([x1, x2], axis=1) # Shape: (32, 2, 10)
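# stack needs inputs with identical shapes; concatenate only needs matching
# sizes on the axes that are not being joined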
# Concatenate: Join along existing dimension
concat = ops.concatenate([x1, x2], axis=0) # Shape: (64, 10)
concat = ops.concatenate([x1, x2], axis=1) # Shape: (32, 20)
# 3D Example
seq1 = keras.random.normal(shape=(32, 5, 10)) # 5 timesteps
seq2 = keras.random.normal(shape=(32, 3, 10)) # 3 timesteps
# Concatenate sequences
longer_seq = ops.concatenate([seq1, seq2], axis=1) # Shape: (32, 8, 10)
Dimension Management
# Add dimension
x = keras.random.normal(shape=(32, 10))
x_expanded = ops.expand_dims(x, axis=1) # Shape: (32, 1, 10)
# Alternative using None (NumPy style)
x_expanded = x[:, None, :] # Same result
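# The inverse: remove a size-1 dimension with squeeze
x_squeezed = ops.squeeze(x_expanded, axis=1) # Shape: (32, 10)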
# 3D Example: Add a trailing feature dimension
seq = keras.random.normal(shape=(32, 20)) # (batch, timesteps)
seq_expanded = seq[..., None] # Shape: (32, 20, 1)
Section 1.34 - Common Shape Transformations
Reshape and Transpose
# Reshape: Change tensor structure
x = keras.random.normal(shape=(32, 20, 10))
# Flatten all but batch dimension
flat = ops.reshape(x, (32, -1)) # Shape: (32, 200)
# Transpose: Reorder dimensions
x = keras.random.normal(shape=(32, 20, 10))
# Swap sequence and feature dimensions
x_t = ops.transpose(x, (0, 2, 1)) # Shape: (32, 10, 20)
Broadcasting
Keras follows NumPy broadcasting rules:
# Add bias to each feature
x = keras.random.normal(shape=(32, 10)) # Batch data
b = keras.random.normal(shape=(10,)) # Per-feature bias
y = x + b # b is broadcast to (32, 10)
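# Broadcasting aligns shapes from the trailing axes; size-1 axes are stretched to match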
# 3D case: Add timestep-specific bias
x = keras.random.normal(shape=(32, 20, 10)) # Sequential data
b = keras.random.normal(shape=(20, 1)) # Per-timestep bias
y = x + b # b is broadcast to (32, 20, 10)
Section 1.35 - Practical Tips
Shape Debugging
Always verify tensor shapes:
# Print shape information
x = keras.random.normal(shape=(32, 20, 10))
print(f"Input shape: {ops.shape(x)}")
# Track shapes through operations
y = ops.dot(x, keras.random.normal(shape=(10, 5)))
print(f"Output shape: {ops.shape(y)}")Memory Efficiency
Be mindful of temporary tensors:
# Naming the intermediate keeps an extra reference alive longer than needed
temp = x + y
result = temp * z
# Writing it as one expression releases the intermediate right away
# (and a compiled backend may fuse the two operations)
result = (x + y) * z
The immutable tensor design in modern frameworks (Keras, TensorFlow, JAX) stems from the need for automatic differentiation and parallel computation. While NumPy's mutable arrays are convenient for data processing, immutable tensors enable reliable gradient computation and better GPU utilization.
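To make the connection to gradients concrete, here is a minimal sketch, assuming the JAX backend is active (for example, KERAS_BACKEND=jax set before importing Keras); with the TensorFlow or PyTorch backends you would reach for tf.GradientTape or torch.autograd instead. The function and variable names below are purely illustrative. Because every keras.ops call returns a new tensor, the loss is a pure function that jax.grad can differentiate directly:
import jax
import keras
import keras.ops as ops
def mse_loss(W, x, y):
    # Pure function: every op returns a new tensor, nothing is modified in place
    pred = ops.matmul(x, W)
    return ops.mean((pred - y) ** 2)
# jax.grad differentiates with respect to the first argument (W)
grad_fn = jax.grad(mse_loss)
x = keras.random.normal(shape=(32, 10))
y = keras.random.normal(shape=(32, 5))
W = keras.random.normal(shape=(10, 5))
dW = grad_fn(W, x, y) # Same shape as W: (10, 5)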