From Ising Models to Recurrent Neural Networks
ML for Science and Engineering — Lecture 16
Joseph Bakarji
We have been building a toolkit for modeling dynamical systems from data.
Every time-stepping scheme is an autoregressive model: the next state is a function of the current one, $x_{k+1} = f(x_k)$.
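As a concrete instance of this claim, a forward-Euler discretization of $\dot{x} = g(x)$ is exactly such a map: $x_{k+1} = x_k + \Delta t \, g(x_k) \equiv f(x_k)$. A minimal sketch with logistic growth $g(x) = x(1-x)$ (the ODE and step size are illustrative choices, not from the lecture):

```python
def f(x, dt=0.1):
    # One forward-Euler step of dx/dt = x(1 - x): an autoregressive map
    return x + dt * x * (1 - x)

x = 0.1
traj = [x]
for _ in range(100):
    x = f(x)       # iterate the map: x_{k+1} = f(x_k)
    traj.append(x)
# Logistic growth saturates at the fixed point x* = 1
```

Iterating the map reproduces the ODE trajectory: the sequence rises monotonically and saturates at the fixed point $x^* = 1$.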
The Ising Model, Hopfield Networks, and Boltzmann Machines
How statistical mechanics inspired the first neural network architectures
A model from statistical mechanics: a 2D grid of spins, each either "up" $(+1)$ or "down" $(-1)$. Neighboring spins want to align. The model was designed to explain ferromagnetism: how local interactions produce global order. The energy of a spin configuration is $E = -J \sum_{\langle i,j \rangle} s_i s_j - h \sum_i s_i$, where:
$s_i \in \{-1, +1\}$ = spin at site $i$
$J > 0$ = coupling (scalar); favors aligned neighbors
$h$ = external magnetic field (scalar)
$\langle i,j \rangle$ = nearest-neighbor pairs only
The Metropolis-Hastings algorithm simulates the Ising model at temperature $T$:

```python
import numpy as np

def ising_step(grid, T, J=1.0):
    """One Monte Carlo sweep: N*N single-spin flip attempts."""
    N = grid.shape[0]
    for _ in range(N * N):
        # Pick a random site and compute the energy change of flipping it
        i, j = np.random.randint(N, size=2)
        s = grid[i, j]
        # Sum of 4 nearest neighbors (periodic boundaries)
        neighbors = (grid[(i-1) % N, j] +
                     grid[(i+1) % N, j] +
                     grid[i, (j-1) % N] +
                     grid[i, (j+1) % N])
        dE = 2 * J * s * neighbors
        # Metropolis criterion: accept downhill moves always,
        # uphill moves with probability exp(-dE/T)
        if dE <= 0 or np.random.rand() < np.exp(-dE / T):
            grid[i, j] = -s
    return grid
```
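A quick sanity check of the sampler: starting from a random lattice at a temperature well below the critical point ($T_c \approx 2.27$ for the 2D model), repeated sweeps should drive the energy down as domains of aligned spins form. A minimal sketch (lattice size and sweep count are illustrative; the sweep function repeats the one above so the snippet runs standalone):

```python
import numpy as np

def ising_step(grid, T, J=1.0):
    # Same Metropolis sweep as above, repeated for self-containment
    N = grid.shape[0]
    for _ in range(N * N):
        i, j = np.random.randint(N, size=2)
        s = grid[i, j]
        neighbors = (grid[(i-1) % N, j] + grid[(i+1) % N, j] +
                     grid[i, (j-1) % N] + grid[i, (j+1) % N])
        dE = 2 * J * s * neighbors
        if dE <= 0 or np.random.rand() < np.exp(-dE / T):
            grid[i, j] = -s
    return grid

def energy(grid, J=1.0):
    # Total energy with h = 0: count each bond once (right + down neighbors)
    right = np.roll(grid, -1, axis=1)
    down = np.roll(grid, -1, axis=0)
    return -J * np.sum(grid * (right + down))

np.random.seed(0)
grid = np.random.choice([-1, 1], size=(24, 24))
E0 = energy(grid)             # random start: energy near zero
for _ in range(50):
    ising_step(grid, T=1.0)   # well below T_c: the lattice orders
E1 = energy(grid)             # strongly negative after annealing
```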
John Hopfield's insight: replace the Ising lattice with a fully connected network. Instead of nearest-neighbor coupling, every neuron connects to every other. The energy landscape has local minima that serve as stored memories.
Same energy form as the Ising model, $E = -\tfrac{1}{2} \sum_{i \neq j} W_{ij} s_i s_j$, but with all-to-all learned weights $W_{ij}$ (symmetric, zero diagonal) instead of the uniform nearest-neighbor coupling $J$.
John Hopfield and Geoffrey Hinton — Nobel Prize in Physics 2024
The network operates as an associative memory: given a corrupted input, it recovers the closest stored pattern.
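The recall dynamics are easy to demonstrate: store patterns with the Hebbian rule $W = \frac{1}{N} \sum_\mu p^\mu (p^\mu)^\top$ (zero diagonal), then iterate $s \leftarrow \mathrm{sign}(W s)$ from a corrupted input. A minimal sketch (pattern count, size, and corruption level are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64

# Store two random +/-1 patterns with the Hebbian rule
patterns = rng.choice([-1, 1], size=(2, N))
W = sum(np.outer(p, p) for p in patterns) / N
np.fill_diagonal(W, 0)  # no self-coupling

# Corrupt the first pattern in 8 positions
probe = patterns[0].copy()
flip = rng.choice(N, size=8, replace=False)
probe[flip] *= -1

# Synchronous sign updates descend the energy toward the stored pattern
for _ in range(5):
    probe = np.where(W @ probe >= 0, 1, -1)
```

After a few updates the overlap $p^0 \cdot s$ rises from its corrupted value (here $64 - 2 \times 8 = 48$) back toward $N$: the network has recovered the memory.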
Ramsauer et al. (2021) showed that with continuous states and a log-sum-exp energy, the Hopfield update rule becomes $\xi^{\text{new}} = X \, \mathrm{softmax}(\beta X^\top \xi)$, where the stored patterns are the columns of $X$ — exactly the attention mechanism of Transformers.
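The retrieval rule is one line of numpy: with a large inverse temperature $\beta$, the softmax is nearly one-hot and a single update snaps a noisy query onto the nearest stored pattern. A minimal sketch (dimensions, $\beta$, and noise level are illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(1)
d, M = 16, 5
X = rng.standard_normal((d, M))   # stored patterns as columns
beta = 8.0                        # inverse temperature (retrieval sharpness)

# Query: a noisy copy of pattern 2
xi = X[:, 2] + 0.1 * rng.standard_normal(d)

# One modern-Hopfield / attention update
xi_new = X @ softmax(beta * (X.T @ xi))
```

One step suffices here: `xi_new` lands essentially on the stored column `X[:, 2]`.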
Geoffrey Hinton and Terrence Sejnowski (1985) extended Hopfield networks with two ingredients: stochastic units, which flip according to the Boltzmann distribution at temperature $T$ rather than deterministically, and hidden units, which let the network capture structure beyond pairwise correlations of the visible variables.
A Restricted Boltzmann Machine (RBM) has no connections within the same layer (bipartite graph): visible units $v$ couple only to hidden units $h$, with energy $E(v, h) = -a^\top v - b^\top h - v^\top W h$.
Goal: maximize the probability the model assigns to real data. The log-likelihood gradient has a beautiful structure: $\frac{\partial \log p(v)}{\partial W_{ij}} = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}$, a difference between correlations measured with the data clamped and correlations in the model's free-running phase.
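The model-phase expectation is intractable, so Hinton's contrastive divergence approximates it with a single Gibbs step starting from the data (CD-1). A minimal numpy sketch for binary units (layer sizes, learning rate, and the toy pattern are illustrative; biases are omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
n_vis, n_hid, lr = 6, 4, 0.1
W = 0.01 * rng.standard_normal((n_vis, n_hid))  # biases omitted for brevity

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy "dataset": a single repeated binary pattern
v0 = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])

for _ in range(100):
    # Positive phase: hidden activations with the data clamped
    ph0 = sigmoid(v0 @ W)
    h0 = (rng.random(n_hid) < ph0).astype(float)
    # Negative phase: one Gibbs step (reconstruct v, then h)
    pv1 = sigmoid(W @ h0)
    v1 = (rng.random(n_vis) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W)
    # CD-1 update: <v h>_data - <v h>_reconstruction
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))

# After training, the on-bits of v0 are reconstructed with higher probability
p_recon = sigmoid(W @ sigmoid(v0 @ W))
```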
Physics-inspired designs gave birth to modern deep learning:

- Ising (1920): phase transitions, statistical mechanics $\rightarrow$ energy-based learning $\rightarrow$ a unifying framework
- Hopfield (1982): energy minimization, associative memory $\rightarrow$ modern Hopfield networks $\rightarrow$ Transformers
- Boltzmann (1985): Gibbs distribution, generative modeling $\rightarrow$ pretraining $\rightarrow$ VAEs, GANs, diffusion models
Learning dynamics in a hidden state space
Instead of modeling $x_{k+1} = f(x_k)$ in the observable space, introduce a hidden state $h_t$ that carries memory of the past: $h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t)$.
The RNN predicts $\hat{x}_{t+1} = W_{hy} h_t$ at each step. The loss sums prediction errors across the full sequence: $\mathcal{L} = \sum_{t=1}^{T} \| \hat{x}_{t+1} - x_{t+1} \|^2$.
To minimize $\mathcal{L}$, we need $\frac{\partial \mathcal{L}}{\partial W_{hh}}$, $\frac{\partial \mathcal{L}}{\partial W_{xh}}$, and $\frac{\partial \mathcal{L}}{\partial W_{hy}}$. But $h_t$ depends on $h_{t-1}$, which depends on $h_{t-2}$, etc. We must unroll the network through time and apply the chain rule.
Apply the chain rule step by step. Consider the gradient at time $t = 4$: $\frac{\partial \mathcal{L}_4}{\partial W_{hh}} = \sum_{k=1}^{4} \frac{\partial \mathcal{L}_4}{\partial h_4} \left( \prod_{j=k+1}^{4} \frac{\partial h_j}{\partial h_{j-1}} \right) \frac{\partial h_k}{\partial W_{hh}}$, where each factor $\frac{\partial h_j}{\partial h_{j-1}} = \mathrm{diag}(\tanh'(z_j)) \, W_{hh}$. Long chains multiply many such Jacobians.
The gradient magnitude at time step $k$ back scales as $\sim (\rho \cdot \overline{\tanh'})^k$:
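This decay is easy to observe numerically: multiply the BPTT Jacobians $\mathrm{diag}(\tanh'(z_j)) W_{hh}$ along a trajectory and track the spectral norm of the product. A minimal sketch (the state size, spectral radius, and horizon are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho, k = 32, 0.7, 30

# Random recurrent matrix rescaled to spectral radius rho < 1
W_hh = rng.standard_normal((n, n))
W_hh *= rho / np.max(np.abs(np.linalg.eigvals(W_hh)))

# Accumulate the product of Jacobians diag(tanh'(z_j)) @ W_hh
h = rng.standard_normal(n)
J = np.eye(n)
norms = []
for _ in range(k):
    z = W_hh @ h
    h = np.tanh(z)
    J = np.diag(1.0 - np.tanh(z) ** 2) @ W_hh @ J
    norms.append(np.linalg.norm(J, 2))
# The gradient contribution from k steps back shrinks roughly like rho**k
```

With $\rho = 0.7$ the product's norm collapses by orders of magnitude within 30 steps; with $\rho > 1$ the same loop shows explosion instead.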
Hochreiter & Schmidhuber (1997) introduced gating mechanisms and a cell state highway: forget, input, and output gates $f_t, i_t, o_t = \sigma(W_{\{f,i,o\}} [h_{t-1}, x_t] + b_{\{f,i,o\}})$ control an additive cell update $c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c [h_{t-1}, x_t] + b_c)$, with $h_t = o_t \odot \tanh(c_t)$. Because $c_t$ is updated additively, gradients can flow through many steps without repeated multiplication by $W_{hh}$.
Source: Wikipedia — LSTM cell diagram
What if you never train the recurrent weights?
A radically different approach: the recurrent dynamics are random and fixed, $h_t = \tanh(W_{\text{res}} h_{t-1} + W_{\text{in}} x_t)$, and only a linear readout $y_t = W_{\text{out}} h_t$ is trained (by ridge regression).
For the reservoir to be useful, it must forget initial conditions (the echo state property); in practice this is encouraged by rescaling $W_{\text{res}}$ to spectral radius $\rho \lesssim 1$.
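A complete echo state network fits in a few lines of numpy: a fixed random reservoir rescaled to spectral radius 0.9, driven by a sine wave, with only a ridge-regression readout trained to predict the next sample. A minimal sketch (reservoir size, input signal, washout length, and regularization are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n_res, rho = 100, 0.9

# Fixed random reservoir, rescaled to spectral radius rho
W = rng.standard_normal((n_res, n_res))
W *= rho / np.max(np.abs(np.linalg.eigvals(W)))
W_in = rng.standard_normal(n_res)

# Drive the reservoir with a sine wave; collect states
u = np.sin(0.1 * np.arange(1000))
h = np.zeros(n_res)
H = np.zeros((len(u) - 1, n_res))
for t in range(len(u) - 1):
    h = np.tanh(W @ h + W_in * u[t])
    H[t] = h
y = u[1:]  # one-step-ahead targets

# Train ONLY the linear readout, by ridge regression
lam = 1e-6
W_out = np.linalg.solve(H.T @ H + lam * np.eye(n_res), H.T @ y)

# Score after a washout period so initial transients don't count
pred = H[100:] @ W_out
mse = np.mean((pred - y[100:]) ** 2)
```

No backpropagation anywhere: the only "training" is one linear solve, which is what makes reservoir computing so cheap.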
Watch how reservoir neurons respond to a sinusoidal input at different spectral radii:
How much can a reservoir remember? (Dambre et al. 2012)
Any physical system with sufficient complexity, nonlinearity, and fading memory can be a reservoir:
| Physical System | Reservoir Mechanism | Reference |
|---|---|---|
| Photonic circuits | Mach-Zehnder modulator + delay feedback | Larger et al. 2012 |
| Mechanical networks | Mass-spring nonlinear coupling | Dion et al. 2018 |
| Quantum systems | Interacting qubits, exponential Hilbert space | Fujii & Nakajima 2017 |
| Biological neurons | Cortical microcircuits (Liquid State Machines) | Maass et al. 2002 |
Pathak et al., Physical Review Letters (2018)
The Kuramoto–Sivashinsky (KS) equation, $u_t + u u_x + u_{xx} + u_{xxxx} = 0$, is a model of spatiotemporal chaos, originally derived for flame-front instabilities (Kuramoto 1978, Sivashinsky 1977).
Simulated KS-like spatiotemporal pattern
| Component | Setting |
|---|---|
| Reservoir size | $N = 5000$ neurons |
| Input | 64-dim spatial discretization |
| Training | Ridge regression only |
| Spectral radius | $\rho \approx 0.9$ |
| Prediction | Output fed back as input |
| Feature | Vanilla RNN | LSTM | ESN |
|---|---|---|---|
| Recurrent weights | Trained (BPTT) | Trained (BPTT) | Fixed random |
| Training cost | $O(N^2 T \cdot \text{epochs})$ | $O(N^2 T \cdot \text{epochs})$ | $O(N^2 T + N^3)$ |
| Gradient issues | Vanishing/exploding | Mitigated (cell highway) | None |
| Memory | Short | Long (gated) | $\leq N$ (hard limit) |
| Interpretability | Low | Low | High (linear readout) |
| Adaptability | Learned features | Learned features | Random features |
| Era | Architecture | Core Idea | Physics Connection |
|---|---|---|---|
| 1980s | Hopfield / Boltzmann | Energy minimization as computation | Ising model, stat mech |
| 1990s | RNN / LSTM | Learned dynamics in hidden space | Dynamical systems, state-space |
| 2000s | Echo State Networks | Random dynamics + linear readout | Edge of chaos, physical reservoirs |
| 2020s | Transformers / SSMs | Attention = Hopfield retrieval | Modern Hopfield energy |
Hopfield & Boltzmann
Hopfield (1982). Neural networks and physical systems with emergent collective computational abilities. PNAS.
Ramsauer et al. (2021). Hopfield networks is all you need. ICLR.
McEliece et al. (1987). The capacity of the Hopfield associative memory. IEEE TIT.
Hinton & Sejnowski (1983). Boltzmann machines. CVPR.
Hinton (2002). Contrastive divergence. Neural Computation.
Hinton et al. (2006). Deep belief nets. Neural Computation.
RNNs & LSTMs
Hochreiter & Schmidhuber (1997). Long short-term memory. Neural Computation.
Cho et al. (2014). GRU encoder-decoder. EMNLP.
Reservoir Computing
Jaeger (2001). Echo state networks. GMD Report 148.
Maass et al. (2002). Liquid state machines. Neural Computation.
Pathak et al. (2018). Predicting spatiotemporal chaos. PRL.
Dambre et al. (2012). Information processing capacity. Scientific Reports.
Gauthier et al. (2021). Next generation RC. Nature Comm.
Modern
Chen et al. (2018). Neural ODEs. NeurIPS.
LeCun et al. (2006). Energy-based learning. MIT Press.