ML for Science and Engineering — Lecture 8
So far we have modeled time series by mapping previous steps to next steps:
Autoregressive models, ARIMA, function fitting on lagged variables.
Today: a fundamentally different approach. Model the derivative:
Discover the differential equation that generates the data.
This is what Newton did. This is what everyone in the sciences has done for quite some time.
Discovering equations from data is not new:
Hudson Bay Company data (1900-1920): hare and lynx pelts traded.
We have two variables $x$ (prey) and $y$ (predator). Assume each derivative can be written as a linear combination of candidate terms:

$$\dot{x} = \sum_j \theta_j^{(x)} \, \phi_j(x, y), \qquad \dot{y} = \sum_j \theta_j^{(y)} \, \phi_j(x, y)$$
We don't know which terms matter. We include many candidates and let the algorithm select the active ones.
The library encodes our hypothesis class: which terms could appear in the equation?
| Library choice | What it can discover |
|---|---|
| $[1, x, x^2, x^3]$ | Polynomial ODEs |
| $[x, \sin(x), \cos(x)]$ | Trigonometric dynamics (e.g. pendulum) |
| $[x, y, xy, x^2, y^2]$ | Coupled 2D systems (e.g. Lotka-Volterra) |
| $[u, u_x, u_{xx}, u \cdot u_x]$ | PDEs (diffusion, Burgers, etc.) |
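As a concrete sketch of the second library row (the function name and term ordering are illustrative choices, not fixed by the method):

```python
import numpy as np

def build_library(X):
    """Candidate features [1, x, y, x^2, y^2, xy] for a 2-state system.

    X: (n, 2) array of state measurements. This particular set of terms is
    one possible hypothesis class; swap in sin/cos or other terms as needed.
    """
    x, y = X[:, 0], X[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x**2, y**2, x * y])

X = np.array([[1.0, 2.0], [3.0, 4.0]])
Phi = build_library(X)   # shape (2, 6): one row per sample, one column per term
```

Each row of `Phi` evaluates every candidate term at one measurement; the regression then decides which columns matter.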
We need $\dot{x}$, but all we have are discrete samples $x(t_1), x(t_2), \dots, x(t_n)$.
Forward difference: $\dot{x}(t_i) \approx \dfrac{x(t_{i+1}) - x(t_i)}{\Delta t}$

Central difference: $\dot{x}(t_i) \approx \dfrac{x(t_{i+1}) - x(t_{i-1})}{2\,\Delta t}$
| Method | Accuracy | Noise sensitivity | Notes |
|---|---|---|---|
| Forward difference | $O(\Delta t)$ | High | Simplest |
| Central difference | $O(\Delta t^2)$ | High | Better accuracy |
| Savitzky-Golay | Depends on window | Low | Smoothing + differentiation |
| Fourier / spectral | Spectral | Medium | Global, periodic signals |
| Total variation | Regularized | Low | For very noisy data |
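The accuracy orders in the table can be checked numerically; a minimal sketch on a signal with a known derivative (the signal and step size are illustrative):

```python
import numpy as np

# Compare forward vs central differences against the exact derivative.
dt = 0.01
t = np.arange(0.0, 2 * np.pi, dt)
x = np.sin(t)                        # true derivative: cos(t)

fwd = (x[1:] - x[:-1]) / dt          # approximates xdot at t[:-1], O(dt)
ctr = (x[2:] - x[:-2]) / (2 * dt)    # approximates xdot at t[1:-1], O(dt^2)

err_fwd = np.max(np.abs(fwd - np.cos(t[:-1])))
err_ctr = np.max(np.abs(ctr - np.cos(t[1:-1])))
```

For this step size the central-difference error is orders of magnitude smaller, matching its $O(\Delta t^2)$ rate.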
Focus on $x$ with a quadratic hypothesis: $\dot{x}(t_i) = \theta_0 + \theta_1 \, x(t_i) + \theta_2 \, x(t_i)^2$
Each data point gives one equation in the same unknowns $\theta_0, \theta_1, \theta_2$.
Stacking all $n$ equations from the data points:

$$\begin{bmatrix} \dot{x}(t_1) \\ \vdots \\ \dot{x}(t_n) \end{bmatrix} = \begin{bmatrix} 1 & x(t_1) & x(t_1)^2 \\ \vdots & \vdots & \vdots \\ 1 & x(t_n) & x(t_n)^2 \end{bmatrix} \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \end{bmatrix}$$
For an arbitrary library $\phi_0, \phi_1, \dots, \phi_m$: $\dot{x}(t_i) = \sum_{j=0}^{m} \theta_j \, \phi_j\big(x(t_i)\big)$
Compact form: $\dot{\mathbf{X}} = \boldsymbol{\Phi}\big(\mathbf{x}(t)\big) \, \boldsymbol{\theta}$. Each $\phi_j$ could be $x$, $x^2$, $\sin(x)$, $xy$, etc.
Minimize the squared error between predicted and observed derivatives:

$$\min_{\boldsymbol{\theta}} \; \big\| \dot{\mathbf{X}} - \boldsymbol{\Phi}\,\boldsymbol{\theta} \big\|_2^2$$

Closed-form solution via the normal equations:

$$\boldsymbol{\theta}^* = \big(\boldsymbol{\Phi}^\top \boldsymbol{\Phi}\big)^{-1} \boldsymbol{\Phi}^\top \dot{\mathbf{X}}$$
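A small numerical check (coefficients and data are made up for illustration): fit a quadratic library by the normal equations and by `lstsq`, and recover the known coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=100)
xdot = 0.5 - 1.0 * x + 2.0 * x**2            # synthetic "derivative" data

Phi = np.column_stack([np.ones_like(x), x, x**2])

# Normal equations (fine for small, well-conditioned libraries):
theta = np.linalg.solve(Phi.T @ Phi, Phi.T @ xdot)

# In practice prefer lstsq, which is more numerically stable:
theta_ls = np.linalg.lstsq(Phi, xdot, rcond=None)[0]
```

Both routes give the same answer here; for large or ill-conditioned libraries, forming $\boldsymbol{\Phi}^\top\boldsymbol{\Phi}$ explicitly squares the condition number, which is why `lstsq` is the safer default.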
When $\mathbf{x} = (x_1, x_2, \dots, x_d)$ is a vector, each component gets its own equation but shares the same library:
Compact matrix form: $\dot{\mathbf{X}} = \boldsymbol{\Phi}(\mathbf{X}) \, \boldsymbol{\Theta}$, with $\dot{\mathbf{X}} \in \mathbb{R}^{n \times d}$, $\boldsymbol{\Phi} \in \mathbb{R}^{n \times p}$, $\boldsymbol{\Theta} \in \mathbb{R}^{p \times d}$.
Where $n$ = data points, $d$ = state dimension, $p$ = library size. Sparsify each column of $\boldsymbol{\Theta}$ independently.
The $\ell_1$ (LASSO) penalty pushes small coefficients to exactly zero, as we saw with regularization: $\min_{\boldsymbol{\theta}} \|\dot{\mathbf{X}} - \boldsymbol{\Phi}\boldsymbol{\theta}\|_2^2 + \lambda \|\boldsymbol{\theta}\|_1$
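A minimal sketch of how the $\ell_1$ penalty zeroes coefficients, using proximal gradient descent (ISTA) on a toy problem; the data, $\lambda$, and iteration count are all illustrative:

```python
import numpy as np

# Toy sparse-regression problem: only two of five candidate terms are active.
rng = np.random.default_rng(0)
n, p = 200, 5
Phi = rng.normal(size=(n, p))
theta_true = np.array([0.0, 2.0, 0.0, -1.5, 0.0])
dXdt = Phi @ theta_true + 0.01 * rng.normal(size=n)

lam = 1.0
step = 1.0 / np.linalg.norm(Phi, 2) ** 2          # 1 / Lipschitz constant
theta = np.zeros(p)
for _ in range(1000):
    grad = Phi.T @ (Phi @ theta - dXdt)           # gradient of 0.5*||residual||^2
    z = theta - step * grad                       # gradient step
    theta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
```

The soft-thresholding step is what produces exact zeros: coefficients whose gradient pull is weaker than $\lambda$ get clamped to zero, while the truly active terms survive with a small shrinkage bias.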
Sequential thresholded least squares (STLSQ): the approach used in the original SINDy paper. Simple, effective, interpretable.
```python
import numpy as np

# X: (n, 2) state measurements, dt: time step, threshold: sparsity knob
dXdt = (X[2:] - X[:-2]) / (2 * dt)  # derivatives (central difference)

# Feature library: [1, x, y, x^2, y^2, xy]
x, y = X[1:-1, 0], X[1:-1, 1]
Phi = np.column_stack([np.ones_like(x), x, y, x**2, y**2, x*y])

Theta = np.linalg.lstsq(Phi, dXdt, rcond=None)[0]  # initial least squares

# Sequential thresholding: zero out small coefficients, refit on the rest
for _ in range(10):
    small = np.abs(Theta) < threshold
    Theta[small] = 0
    for j in range(2):  # one column per state variable
        active = ~small[:, j]
        if active.any():
            Theta[active, j] = np.linalg.lstsq(
                Phi[:, active], dXdt[:, j], rcond=None)[0]
```
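A self-contained sanity check of the thresholding loop above (the system and all parameters are illustrative): simulate a damped linear oscillator with RK4, then rediscover its equations from the trajectory alone.

```python
import numpy as np

# Ground truth: dx/dt = -0.1x + 2y, dy/dt = -2x - 0.1y
def f(s):
    x, y = s
    return np.array([-0.1 * x + 2.0 * y, -2.0 * x - 0.1 * y])

dt, n = 0.001, 10000
X = np.empty((n, 2))
X[0] = [2.0, 0.0]
for i in range(n - 1):                     # RK4 integration
    k1 = f(X[i])
    k2 = f(X[i] + dt / 2 * k1)
    k3 = f(X[i] + dt / 2 * k2)
    k4 = f(X[i] + dt * k3)
    X[i + 1] = X[i] + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# SINDy: central-difference derivatives, library, STLSQ
dXdt = (X[2:] - X[:-2]) / (2 * dt)
x, y = X[1:-1, 0], X[1:-1, 1]
Phi = np.column_stack([np.ones_like(x), x, y, x**2, y**2, x * y])

Theta = np.linalg.lstsq(Phi, dXdt, rcond=None)[0]
threshold = 0.05
for _ in range(10):
    small = np.abs(Theta) < threshold
    Theta[small] = 0
    for j in range(2):
        active = ~small[:, j]
        if active.any():
            Theta[active, j] = np.linalg.lstsq(
                Phi[:, active], dXdt[:, j], rcond=None)[0]
```

On clean simulated data the quadratic terms are thresholded away and the surviving linear coefficients match the ground truth closely.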
SINDy converges, but is the answer correct? Several factors make this harder than it looks: numerical differentiation amplifies measurement noise, the true terms must actually be in the library, correlated library columns make coefficients unstable, and the sparsity threshold must be tuned.
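The noise issue is easy to underestimate; a quick sketch (noise level and signal are illustrative) shows that even 1% measurement noise can dominate a finite-difference derivative:

```python
import numpy as np

rng = np.random.default_rng(1)
dt = 0.01
t = np.arange(0.0, 10.0, dt)
clean = np.sin(t)
noisy = clean + 0.01 * rng.normal(size=t.size)   # 1% measurement noise

# Central differences on clean vs noisy data
d_clean = (clean[2:] - clean[:-2]) / (2 * dt)
d_noisy = (noisy[2:] - noisy[:-2]) / (2 * dt)

true = np.cos(t[1:-1])
err_clean = np.max(np.abs(d_clean - true))
err_noisy = np.max(np.abs(d_noisy - true))
```

Differencing divides the noise by $\Delta t$, so shrinking the time step makes the noisy estimate *worse*, not better; this is why the smoothing-based methods in the table above matter in practice.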
ML for Science and Engineering — Lecture 9
What if our variable depends on space and time? $u(x, t)$
Same idea: we regress $u_t$ on a feature library that now includes spatial derivatives: $u_t = \theta_1 u + \theta_2 u_x + \theta_3 u_{xx} + \theta_4 \, u u_x + \dots$
At each grid point $(x_i, t_j)$, compute all candidate terms:
Each interior grid point gives one equation in the shared coefficients. Stacking them all yields $\mathbf{u}_t = \boldsymbol{\Phi}\big(u, u_x, u_{xx}, \dots\big)\,\boldsymbol{\theta}$; after sparsification, only the terms of the true PDE survive.
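A sketch of the grid-point stacking on a manufactured diffusion solution (two Fourier modes, so that $u$ and $u_{xx}$ are not collinear; all sizes and the library are illustrative):

```python
import numpy as np

# Exact solution of u_t = alpha * u_xx with two sine modes
alpha = 0.5
x = np.linspace(0, 2 * np.pi, 128, endpoint=False)
t = np.linspace(0, 1, 101)
dx, dt = x[1] - x[0], t[1] - t[0]
U = (np.exp(-alpha * t)[:, None] * np.sin(x)[None, :]
     + np.exp(-4 * alpha * t)[:, None] * np.sin(2 * x)[None, :])  # (nt, nx)

# Central differences in t and x, evaluated at interior grid points
Ut  = (U[2:, 1:-1] - U[:-2, 1:-1]) / (2 * dt)
Ux  = (U[1:-1, 2:] - U[1:-1, :-2]) / (2 * dx)
Uxx = (U[1:-1, 2:] - 2 * U[1:-1, 1:-1] + U[1:-1, :-2]) / dx**2
Uc  = U[1:-1, 1:-1]

# One regression row per interior grid point; library [u, u_x, u_xx, u*u_x]
Phi = np.column_stack([Uc.ravel(), Ux.ravel(), Uxx.ravel(), (Uc * Ux).ravel()])
theta = np.linalg.lstsq(Phi, Ut.ravel(), rcond=None)[0]
```

The least-squares fit puts nearly all the weight on the $u_{xx}$ column with coefficient $\approx \alpha$, and the other candidates come out near zero, which is what thresholding would then discard.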
With a richer library $[u, u_x, u_{xx}, u \cdot u_x, u_{xxx}, \dots]$, the same approach discovers many classical PDEs:
| Equation | Surviving features | Physics |
|---|---|---|
| Diffusion: $u_t = \alpha u_{xx}$ | $u_{xx}$ | Heat conduction |
| Advection: $u_t = -c \, u_x$ | $u_x$ | Wave transport |
| Burgers: $u_t = \nu u_{xx} - u \, u_x$ | $u_{xx}$, $u \cdot u_x$ | Shock formation |
| KdV: $u_t = -u \, u_x + \delta \, u_{xxx}$ | $u \cdot u_x$, $u_{xxx}$ | Solitons |
Rudy, Brunton, Proctor & Kutz, Science Advances (2017) — PDE-FIND: the extension of SINDy to partial differential equations.
| Approach | Hypothesis class | Interpretable? | Data needed |
|---|---|---|---|
| Neural network | Any continuous function | No | Lots |
| SINDy | Sparse combination of known terms | Yes | Moderate |
| Symbolic regression | Arbitrary expressions | Yes | Moderate |
| Physics-informed NN | NN constrained by known PDE | Partially | Less |
Brunton, Proctor & Kutz, PNAS (2016) — SINDy | Rudy, Brunton, Proctor & Kutz, Science Advances (2017) — PDE-FIND