Computation Backends & Keras 3
Understanding the ecosystem of deep learning frameworks and how Keras 3 abstracts hardware acceleration through backend engines.
The Stack: From Python to Silicon
Section 1.18 - Core Computation Engines
TensorFlow (Google)
- Developer: Google Brain Team (2015)
- Key Trait: Static computation graphs (define-and-run); see the sketch below
- Strengths:
- Production-grade deployment (TF Serving, TFLite)
- Tight TPU integration
- Weakness: Less flexible for research prototyping
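A minimal sketch of define-and-run, assuming TensorFlow 2.x (scaled_sum is an illustrative function, not a library API): tf.function traces the Python function into a static graph on the first call, and later calls reuse that graph.
import tensorflow as tf

@tf.function  # traced once into a static graph, then reused
def scaled_sum(x, w):
    return tf.reduce_sum(x * w)

x = tf.constant([1.0, 2.0, 3.0])
w = tf.constant([0.5, 0.5, 0.5])
print(scaled_sum(x, w))  # runs the compiled graph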
PyTorch (Meta)
- Developer: Facebook AI Research (2016)
- Key Trait: Dynamic computation graphs (define-by-run); see the sketch below
- Strengths:
- Pythonic debugging experience
- Dominant in academic research
- Weakness: Historically weaker mobile/edge support
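A minimal sketch of define-by-run, assuming a standard PyTorch install: the graph is recorded as ordinary Python executes, so data-dependent control flow and print-style debugging work directly.
import torch

x = torch.randn(3, requires_grad=True)

# The branch taken depends on the data; autograd records whichever path ran.
if x.sum() > 0:
    y = (2 * x).sum()
else:
    y = (x ** 2).sum()

y.backward()
print(x.grad)  # gradients for the branch that actually executed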
JAX (Google)
- Developer: Google Research (2018)
- Key Trait: Functional programming + composable transforms
- Strengths:
- Automatic vectorization (vmap)
- Native support for higher-order gradients (see the sketch below)
- Weakness: Steeper learning curve
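A minimal sketch of composable transforms, assuming JAX is installed (f is an illustrative scalar function): grad composed with itself yields a second derivative, and vmap vectorizes it over a batch without an explicit loop.
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sin(x) * x ** 2

d2f = jax.grad(jax.grad(f))   # higher-order gradient: second derivative of f
batched_d2f = jax.vmap(d2f)   # automatic vectorization over a batch of inputs

xs = jnp.linspace(0.0, 1.0, 5)
print(batched_d2f(xs))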
Section 1.19 - Keras 3: Unified Abstraction Layer
Key Innovation
Keras 3 acts as a backend-agnostic interface:
# Same code runs on TensorFlow, PyTorch, or JAX
import os
os.environ["KERAS_BACKEND"] = "jax"  # must be set before importing keras; default is "tensorflow"

import keras
from keras import layers

model = keras.Sequential([
    layers.Dense(64, activation='relu'),
    layers.Dense(10)
])
model.compile()
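To confirm which engine is actually driving the computation, Keras 3 exposes the current backend name; a quick check, assuming the environment variable was set as above:
import keras
print(keras.config.backend())  # -> "jax"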
Architecture
┌──────────────────────────┐
│ Keras API (Python) │ ← You code here
├──────────────────────────┤
│ Backend Adapter │ ← Converts Keras ops to backend primitives
├───────┬────────┬─────────┤
│ TF │ PyTorch│ JAX │ ← Backend engines
├───────┴────────┴─────────┤
│ XLA/CUDA/C++/ROCm │ ← Hardware-specific optimization
└──────────────────────────┘
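The "Backend Adapter" row corresponds to the keras.ops namespace in Keras 3: operations are written once against it and dispatched to the selected engine. A minimal sketch:
import os
os.environ["KERAS_BACKEND"] = "jax"  # or "tensorflow" / "torch"

from keras import ops

x = ops.ones((2, 3))
y = ops.matmul(x, ops.transpose(x))  # routed to the JAX primitive here,
print(y)                             # or to the TensorFlow/PyTorch equivalent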
Section 1.20 - The Performance Layer
Under the Hood
All frameworks ultimately delegate computation to:
- Optimized C/C++ Kernels:
- BLAS (e.g., Intel MKL, OpenBLAS) for linear algebra
- Custom ops for neural networks (e.g., convolution)
- GPU Acceleration:
- CUDA (NVIDIA) / ROCm (AMD) for parallel computation
- Kernel fusion via XLA (TensorFlow/JAX) or TorchScript (PyTorch); see the sketch below
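A small JAX sketch of the kind of elementwise chain that XLA's fuser can collapse into far fewer kernels (gelu_ish is an illustrative function, not a library API):
import jax
import jax.numpy as jnp

def gelu_ish(x):
    # chain of elementwise ops: multiply, add, tanh, power
    return 0.5 * x * (1.0 + jnp.tanh(0.7978845608 * (x + 0.044715 * x ** 3)))

fused = jax.jit(gelu_ish)     # compiled by XLA on first call; elementwise ops fused
x = jnp.ones((1024, 1024))
print(fused(x).shape)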
Example Stack Trace
keras.layers.Dense(...)        # Python
        ↓
keras.ops.matmul()             # Backend-agnostic op
        ↓
tf.linalg.matmul()             # TensorFlow implementation
        ↓
Eigen::Tensor contraction      # C++/CUDA kernel
Section 1.21 - Why Abstraction Matters
- Portability: Same model code runs on CPU/GPU/TPU
- Vendor Independence: Avoid lock-in to any ecosystem
- Performance: Leverage decades of HPC optimization