Convolutional Layers: From Images to Time Series
Convolutional Operations: A Unified Mathematical Framework
Section 4.9 - The Mathematical Foundation of Convolutions
At its core, a convolution is an operation between two functions that produces a third function expressing how the shape of one is modified by the other. In deep learning, we use discrete convolutions where one function is our input data and the other is our learnable kernel.
The Basic Convolution Operation
For a one-dimensional input signal \(x\) and a kernel \(w\), the convolution operation is defined as:
\[ (x * w)(t) = \sum_{k} x(t - k) \, w(k) \]
In practice, we work with finite, discrete signals. For an input vector \(x \in \mathbb{R}^n\) and a kernel \(w \in \mathbb{R}^k\), the discrete convolution becomes:
\[ y[t] = \sum_{i=0}^{k-1} x[t - i] \, w[i], \]
where \(k\) is the kernel size.
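To make the indexing concrete, here is a minimal NumPy sketch (the signal and kernel values are arbitrary, chosen only for illustration) that evaluates the sum directly and checks it against np.convolve in "valid" mode:

    import numpy as np

    # Arbitrary example signal and kernel, chosen only for illustration.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # input, n = 5
    w = np.array([0.5, 0.25, 0.25])           # kernel, k = 3

    # Direct evaluation of y[t] = sum_i x[t - i] * w[i] at every t where all
    # indices t - i fall inside the signal (the "valid" positions).
    k = len(w)
    y = np.array([sum(x[t - i] * w[i] for i in range(k))
                  for t in range(k - 1, len(x))])

    print(y)                                # manual evaluation
    print(np.convolve(x, w, mode="valid"))  # same result from NumPy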
Section 4.10 - From 1D to Multi-Dimensional Convolutions
While convolutions are often associated with image processing (2D convolutions), the operation generalizes naturally across dimensions:
1D Convolution (Time Series)
Used for temporal data, where the convolution slides over time:
\[ y[t] = \sum_{k} x[t - k] \, w[k] \]
2D Convolution (Images)
For spatial data with input \(X\) and kernel \(W\):
\[ Y[i,j] = \sum_{m} \sum_{n} X[i - m,\, j - n] \, W[m,n] \]
3D Convolution (Videos/Volumes)
Extends to three dimensions for spatio-temporal or volumetric data:
\[ Y[i,j,k] = \sum_{l} \sum_{m} \sum_{n} X[i - l,\, j - m,\, k - n] \, W[l,m,n] \]
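The same sliding-window idea carries over directly to deep learning libraries. The sketch below (assuming PyTorch; the batch size, channel counts, and spatial sizes are arbitrary) shows the corresponding 1D, 2D, and 3D layers and their output shapes for kernel size 5 with no padding:

    import torch
    import torch.nn as nn

    # Illustrative inputs shaped (batch, channels, ...); all sizes are arbitrary.
    x1 = torch.randn(8, 3, 100)          # 1D: 100 time steps
    x2 = torch.randn(8, 3, 64, 64)       # 2D: 64x64 images
    x3 = torch.randn(8, 3, 16, 64, 64)   # 3D: 16-frame clips/volumes

    conv1d = nn.Conv1d(in_channels=3, out_channels=16, kernel_size=5)
    conv2d = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=5)
    conv3d = nn.Conv3d(in_channels=3, out_channels=16, kernel_size=5)

    print(conv1d(x1).shape)  # torch.Size([8, 16, 96])
    print(conv2d(x2).shape)  # torch.Size([8, 16, 60, 60])
    print(conv3d(x3).shape)  # torch.Size([8, 16, 12, 60, 60])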
Section 4.11 - Convolutions in Time Series Analysis
In time series analysis, 1D convolutions serve several crucial purposes:
Moving Average as Convolution
A simple moving average can be expressed as a convolution with a uniform kernel:
\[ w = \left[\frac{1}{k},\, \frac{1}{k},\, \dots,\, \frac{1}{k}\right] \]
The output at each point is the average of the current value and the \(k-1\) preceding values:
\[ y[t] = \frac{1}{k} \sum_{i=0}^{k-1} x[t - i] \]
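As a quick check, here is a minimal NumPy sketch (the series values are arbitrary) showing that convolving with the uniform kernel reproduces the \(k\)-point moving average:

    import numpy as np

    # Arbitrary series, chosen only for illustration.
    x = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
    k = 3
    w = np.ones(k) / k                       # uniform kernel [1/k, ..., 1/k]

    # Convolution with the uniform kernel yields the k-point moving average.
    print(np.convolve(x, w, mode="valid"))   # [ 4.  6.  8. 10.]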
Learnable Temporal Patterns
In neural networks, the kernel weights are learned from data. A 1D convolutional layer with input \(x \in \mathbb{R}^n\) and \(c\) kernels \(w_{(i)} \in \mathbb{R}^k\) produces, for each output channel \(i\), the output:
\[ y_{(i)}[t] = \sigma\Bigl(\sum_{k} x[t - k] \, w_{(i)}[k] + b_{(i)}\Bigr) \]
where:
- \(\sigma\) is a nonlinear activation function,
- \(b_{(i)}\) is a learnable bias term,
- \(i\) ranges from 1 to \(c\) (the number of output channels).
This operation can learn to detect various temporal patterns (a minimal code sketch follows this list):
- Short-term dependencies: captured with small kernel sizes.
- Long-term patterns: captured using dilated convolutions.
- Multi-scale features: achieved using parallel convolutions with different kernel sizes.
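As a concrete illustration, the following is a minimal sketch of such a layer in PyTorch (the channel counts, kernel size, and ReLU activation are arbitrary choices standing in for \(c\), \(k\), and \(\sigma\)):

    import torch
    import torch.nn as nn

    # A 1D convolutional block mirroring the equation above: c learnable kernels,
    # one bias per output channel, and a pointwise nonlinearity.
    class TemporalConvBlock(nn.Module):
        def __init__(self, in_channels=1, out_channels=8, kernel_size=3):
            super().__init__()
            self.conv = nn.Conv1d(in_channels, out_channels, kernel_size)
            self.activation = nn.ReLU()   # plays the role of sigma

        def forward(self, x):
            return self.activation(self.conv(x))

    # Arbitrary input: a batch of 4 univariate series with 50 time steps each.
    x = torch.randn(4, 1, 50)
    y = TemporalConvBlock()(x)
    print(y.shape)   # torch.Size([4, 8, 48]): c = 8 channels, 48 valid positions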
Dilated Convolutions
To capture long-range dependencies without increasing the parameter count, dilated convolutions introduce gaps in the kernel:
\[ y[t] = \sum_{k} x[t - d\, k] \, w[k], \]
where \(d\) is the dilation rate. When the dilation rate is increased exponentially across successive layers, the receptive field grows exponentially with depth while the number of kernel weights per layer stays fixed.
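The sketch below (assuming PyTorch; the kernel size of 3 and the dilation schedule 1, 2, 4, 8 are illustrative choices) stacks dilated layers whose dilation rate doubles at each step, reaching a receptive field of 31 time steps with only three kernel weights per layer:

    import torch
    import torch.nn as nn

    # Arbitrary input: one channel, 100 time steps.
    x = torch.randn(1, 1, 100)

    # Doubling the dilation rate at each layer (1, 2, 4, 8) grows the receptive
    # field to 1 + (3 - 1) * (1 + 2 + 4 + 8) = 31 steps, with only 3 kernel
    # weights per layer; padding = dilation keeps the sequence length unchanged.
    stack = nn.Sequential(*[
        nn.Conv1d(1, 1, kernel_size=3, dilation=d, padding=d)
        for d in (1, 2, 4, 8)
    ])
    print(stack(x).shape)   # torch.Size([1, 1, 100])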
Section 4.12 - Theoretical Properties
Convolutions possess several important properties that make them particularly effective for pattern recognition:
Translation Equivariance: If the input is shifted by \(\delta\), the output shifts by \(\delta\):
\[ \operatorname{Conv}(T_\delta x) = T_\delta \operatorname{Conv}(x), \]
where \(T_\delta\) denotes translation by \(\delta\).
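This property is easy to verify numerically. The sketch below (NumPy; the signal values and shift are arbitrary, and the trailing zeros make np.roll act as a plain translation) shows that shifting then convolving matches convolving then shifting:

    import numpy as np

    # Arbitrary signal padded with trailing zeros so np.roll acts as a plain
    # (non-circular) shift, plus an arbitrary kernel.
    x = np.array([0.0, 1.0, 3.0, 2.0, 0.0, 0.0, 0.0, 0.0])
    w = np.array([0.5, 0.25, 0.25])
    shift = 2

    conv_then_shift = np.roll(np.convolve(x, w), shift)   # T_delta(Conv(x))
    shift_then_conv = np.convolve(np.roll(x, shift), w)   # Conv(T_delta(x))

    print(np.allclose(conv_then_shift, shift_then_conv))  # True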
Local Connectivity: Each output point depends only on a local region of the input, reducing computational complexity.
Parameter Sharing: The same kernel is applied across all positions, dramatically reducing the number of parameters compared to fully connected layers.
These properties make convolutional layers particularly effective for tasks where patterns may appear at different positions in the input sequence, while maintaining both computational and statistical efficiency in learning.