Keras Matrix Operations: The Building Blocks
Matrix Operations: The Foundation of Neural Networks
Section 1.31 - From NumPy to Keras: Key Differences
Immutability Principle
Unlike NumPy arrays, Keras tensors are immutable:
import numpy as np
import keras
import keras.ops as ops
# NumPy: Mutable
np_array = np.array([[1, 2], [3, 4]])
np_array[0,0] = 10 # Works
# Keras: Immutable
k_tensor = ops.array([[1, 2], [3, 4]])
# k_tensor[0,0] = 10 # Would raise error
# Instead, create new tensor:
# Build a new first row containing the updated value, then stack it on the rest
new_row = ops.concatenate(
    [ops.reshape(ops.array([10]), (1, 1)), k_tensor[0:1, 1:]], axis=1)
new_tensor = ops.concatenate([new_row, k_tensor[1:]], axis=0)
Basic Arithmetic Operations
Operations create new tensors:
# Create sample tensors
x = keras.random.normal(shape=(32, 10)) # Batch of 32, 10 features
y = keras.random.normal(shape=(32, 10))
# Basic operations
sum_tensor = x + y # Element-wise addition
diff_tensor = x - y # Element-wise subtraction
prod_tensor = x * y # Element-wise multiplication (Hadamard)
div_tensor = x / y # Element-wise division
Section 1.32 - Matrix Multiplication: Three Ways
1. Matrix Product (@)
# 2D Case: (batch, features)
x = keras.random.normal(shape=(32, 10))
W = keras.random.normal(shape=(10, 5))
out = x @ W # Shape: (32, 5)
# 3D Case: (batch, sequence, features)
x = keras.random.normal(shape=(32, 20, 10)) # 20 timesteps
W = keras.random.normal(shape=(10, 5))
out = x @ W # Shape: (32, 20, 5)
2. Comparing dot and matmul Operations
Let’s examine the same operations using both dot and matmul:
Case 1: Vector-Vector (1D-1D)
v1 = keras.random.normal(shape=(5,))
v2 = keras.random.normal(shape=(5,))
# Both return scalar (dot product)
dot_result = ops.dot(v1, v2) # Scalar output
mat_result = ops.matmul(v1, v2) # Scalar output
Case 2: Matrix-Vector (2D-1D)
matrix = keras.random.normal(shape=(3, 5))
vector = keras.random.normal(shape=(5,))
# Both treat vector as column vector implicitly
dot_result = ops.dot(matrix, vector) # Shape: (3,)
mat_result = ops.matmul(matrix, vector) # Shape: (3,)
Case 3: Matrix-Matrix (2D-2D)
A = keras.random.normal(shape=(3, 5))
B = keras.random.normal(shape=(5, 4))
# Both perform standard matrix multiplication
dot_result = ops.dot(A, B) # Shape: (3, 4)
mat_result = ops.matmul(A, B) # Shape: (3, 4)
Case 4: Batched Operations (3D-2D)
# Batch of matrices
batch = keras.random.normal(shape=(32, 10, 5)) # (batch, seq, features)
weights = keras.random.normal(shape=(5, 4)) # (in_features, out_features)
# Key Difference 1: Broadcasting behavior (with 2D weights, both broadcast over the batch and agree)
dot_result = ops.dot(batch, weights) # Shape: (32, 10, 4)
mat_result = ops.matmul(batch, weights) # Shape: (32, 10, 4)
Case 5: Complex Broadcasting (3D-3D)
# This is where differences become more apparent
A = keras.random.normal(shape=(32, 10, 5)) # (batch, seq1, features)
B = keras.random.normal(shape=(32, 5, 8)) # (batch, features, seq2)
# Key Difference 2: Handling of batch dimensions
dot_result = ops.dot(A, B) # Shape: (32, 10, 32, 8) # Note extra dimensions
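# dot contracts the last axis of A with the second-to-last axis of B,
# pairing each batch entry of A with every batch entry of B, so the batch
# dimension shows up twice in the output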
mat_result = ops.matmul(A, B) # Shape: (32, 10, 8) # Batch-wise multiplication
Key Differences:
- Simple Cases (1D/2D):
- Both operations behave identically for vector-vector, matrix-vector, and matrix-matrix multiplication
- Higher Dimensional Cases:
- matmul is designed for batched matrix multiplication with intuitive broadcasting
- dot follows more general tensor contraction rules, which can create extra dimensions
- When to Use Each:
# Use dot for:
# 1. Simple dot products
scalar = ops.dot(vector1, vector2)
# 2. When you need specific tensor contractions
# Sum product over last axis of x1 and second-to-last of x2
result = ops.dot(x1, x2)
# Use matmul for:
# 1. Batched matrix multiplication
batched_result = ops.matmul(batch_input, weights)
# 2. When you want automatic broadcasting
result = ops.matmul(A, B) # B will be broadcast if needed
Section 1.36 - Einstein Summation (einsum)
A powerful and concise way to express tensor operations:
# 1. Trace of matrix
matrix = keras.random.normal(shape=(5, 5))
trace = ops.einsum('ii', matrix) # Sum of diagonal elements
# 2. Matrix transpose
matrix = keras.random.normal(shape=(3, 4))
transposed = ops.einsum('ij->ji', matrix) # Shape: (4, 3)
# 3. Batched matrix multiplication
batch = keras.random.normal(shape=(32, 10, 5)) # Batch of matrices
weights = keras.random.normal(shape=(5, 8)) # Weight matrix
result = ops.einsum('bij,jk->bik', batch, weights) # Shape: (32, 10, 8)
# 4. Complex tensor contractions
# Example: Attention mechanism computation
queries = keras.random.normal(shape=(32, 10, 64)) # (batch, seq_len, dim)
keys = keras.random.normal(shape=(32, 15, 64)) # (batch, seq_len2, dim)
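# 'bik,bjk->bij': keep batch b and both sequence axes i, j; contract the shared feature axis k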
attention = ops.einsum('bik,bjk->bij', queries, keys) # Shape: (32, 10, 15)
Why Use einsum?
- Clarity: Provides explicit indexing notation
- Flexibility: Can express complex operations in one line
- Performance: Often optimized by backend for efficiency
- Readability: Makes tensor manipulation intentions clear
Common Neural Network Operations with einsum:
# Dense layer without bias
x = keras.random.normal(shape=(32, 10)) # (batch, features)
W = keras.random.normal(shape=(10, 5)) # (in_features, out_features)
y = ops.einsum('bf,fo->bo', x, W) # (batch, out_features)
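# Equivalent to ops.matmul(x, W) or x @ W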
# Self-attention
Q = keras.random.normal(shape=(32, 8, 64)) # (batch, seq, dim)
K = keras.random.normal(shape=(32, 8, 64)) # (batch, seq, dim)
V = keras.random.normal(shape=(32, 8, 64)) # (batch, seq, dim)
attention = ops.einsum('bik,bjk->bij', Q, K) # Attention scores
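# In a full attention layer the scores would be scaled by 1/sqrt(dim) and
# passed through a softmax before the weighted sum below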
output = ops.einsum('bij,bjd->bid', attention, V) # Weighted sum
Section 1.33 - Shape Manipulation: Essential Operations
Stack and Concatenate
# Stack: Add new dimension
x1 = keras.random.normal(shape=(32, 10))
x2 = keras.random.normal(shape=(32, 10))
stacked = ops.stack([x1, x2]) # Shape: (2, 32, 10)
stacked = ops.stack([x1, x2], axis=1) # Shape: (32, 2, 10)
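# stack needs inputs with identical shapes; concatenate only needs matching
# sizes on the axes that are not being joined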
# Concatenate: Join along existing dimension
concat = ops.concatenate([x1, x2], axis=0) # Shape: (64, 10)
concat = ops.concatenate([x1, x2], axis=1) # Shape: (32, 20)
# 3D Example
seq1 = keras.random.normal(shape=(32, 5, 10)) # 5 timesteps
seq2 = keras.random.normal(shape=(32, 3, 10)) # 3 timesteps
# Concatenate sequences
longer_seq = ops.concatenate([seq1, seq2], axis=1) # Shape: (32, 8, 10)
Dimension Management
# Add dimension
x = keras.random.normal(shape=(32, 10))
x_expanded = ops.expand_dims(x, axis=1) # Shape: (32, 1, 10)
# Alternative using None (NumPy style)
x_expanded = x[:, None, :] # Same result
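# The inverse: remove a size-1 dimension with squeeze
x_squeezed = ops.squeeze(x_expanded, axis=1) # Shape: (32, 10)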
# 3D Example: Add a trailing feature dimension
seq = keras.random.normal(shape=(32, 20)) # (batch, timesteps)
seq_expanded = seq[..., None] # Shape: (32, 20, 1)
Section 1.34 - Common Shape Transformations
Reshape and Transpose
# Reshape: Change tensor structure
x = keras.random.normal(shape=(32, 20, 10))
# Flatten all but batch dimension
flat = ops.reshape(x, (32, -1)) # Shape: (32, 200)
# Transpose: Reorder dimensions
x = keras.random.normal(shape=(32, 20, 10))
# Swap sequence and feature dimensions
x_t = ops.transpose(x, (0, 2, 1)) # Shape: (32, 10, 20)
Broadcasting
Keras follows NumPy broadcasting rules:
# Add bias to each feature
x = keras.random.normal(shape=(32, 10)) # Batch data
b = keras.random.normal(shape=(10,)) # Per-feature bias
y = x + b # b is broadcast to (32, 10)
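# Broadcasting aligns shapes from the trailing axes; size-1 axes are stretched to match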
# 3D case: Add timestep-specific bias
x = keras.random.normal(shape=(32, 20, 10)) # Sequential data
b = keras.random.normal(shape=(20, 1)) # Per-timestep bias
y = x + b # b is broadcast to (32, 20, 10)
Section 1.35 - Practical Tips
Shape Debugging
Always verify tensor shapes:
# Print shape information
x = keras.random.normal(shape=(32, 20, 10))
print(f"Input shape: {ops.shape(x)}")
# Track shapes through operations
y = ops.dot(x, keras.random.normal(shape=(10, 5)))
print(f"Output shape: {ops.shape(y)}")Memory Efficiency
Be mindful of temporary tensors:
# Naming the intermediate keeps an extra reference alive longer than needed
temp = x + y
result = temp * z
# Writing it as one expression releases the intermediate right away
# (and a compiled backend may fuse the two operations)
result = (x + y) * z
The immutable tensor design in modern frameworks (Keras, TensorFlow, JAX) stems from the need for automatic differentiation and parallel computation. While NumPy's mutable arrays are convenient for data processing, immutable tensors enable reliable gradient computation and better GPU utilization.
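To make the connection to gradients concrete, here is a minimal sketch, assuming the JAX backend is active (for example, KERAS_BACKEND=jax set before importing Keras); with the TensorFlow or PyTorch backends you would reach for tf.GradientTape or torch.autograd instead. The function and variable names below are purely illustrative. Because every keras.ops call returns a new tensor, the loss is a pure function that jax.grad can differentiate directly:
import jax
import keras
import keras.ops as ops
def mse_loss(W, x, y):
    # Pure function: every op returns a new tensor, nothing is modified in place
    pred = ops.matmul(x, W)
    return ops.mean((pred - y) ** 2)
# jax.grad differentiates with respect to the first argument (W)
grad_fn = jax.grad(mse_loss)
x = keras.random.normal(shape=(32, 10))
y = keras.random.normal(shape=(32, 5))
W = keras.random.normal(shape=(10, 5))
dW = grad_fn(W, x, y) # Same shape as W: (10, 5)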