Date: February 24, 2025
Topics Covered: Multi-stage integration methods (Runge-Kutta schemes), linear stability analysis, Störmer-Verlet symplectic integrator
This lecture marks a critical transition in our course: we now have all the tools needed to build sophisticated physics-informed neural architectures. We've learned to derive energy-preserving ODEs from least action principles, compute Hamiltonians via Legendre transforms, integrate using discrete gradients, solve for nonlinear stencils with PyTorch, and impose conservation constraints through polynomial reproduction and Noether's theorem. With these foundational techniques in place, we're ready to tackle our ultimate objective: constructing a nonlinear wave equation solver.
The focus today shifts to the crucial question of how to integrate these systems in time. We begin by examining multi-stage integration schemes, particularly the Runge-Kutta (RK) family of methods. These explicit methods are especially important for machine learning applications where gradient evaluations drive the optimization process. We'll see how different RK schemes achieve various orders of accuracy and, critically, how to analyze their stability regions for systems with complex eigenvalues—a key consideration when solving hyperbolic PDEs like the wave equation.
Finally, we introduce the Störmer-Verlet (leapfrog) integrator, a symplectic method that exactly preserves the Hamiltonian structure of our dynamics. This conservation property makes it ideal for long-time integration of energy-conserving systems derived from variational principles. The connection between our discrete gradient methods and symplectic integration provides a beautiful unification of the variational and Hamiltonian perspectives.
Let's recall what we now know how to do:

- Derive energy-preserving ODEs from least action principles,
- Compute Hamiltonians via the Legendre transform,
- Integrate using discrete gradients,
- Solve for nonlinear stencils with PyTorch,
- Impose conservation constraints through polynomial reproduction and Noether's theorem.
With these in hand, we are ready to assume our final form: a nonlinear wave equation solver!
Consider the wave equation:
$$\begin{cases} \partial_{tt} u = c^2 \partial_{xx} u \\ u(x, t=0) = f(x), \quad \partial_t u(x, t=0) = 0 \end{cases}$$For these initial data, d'Alembert's formula gives the solution as two counter-propagating copies of the profile:
$$u(x,t) = \frac{1}{2}\left[f(x + ct) + f(x - ct)\right]$$Goal: Derive a neural architecture that can recover this simple case.
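As a quick sanity check (an illustrative sketch, assuming a Gaussian profile for $f$; not lecture code), we can verify by finite differences that a sum of left- and right-moving copies of $f$ satisfies $\partial_{tt} u = c^2 \partial_{xx} u$:

```python
import numpy as np

# Verify that u(x, t) = f(x + c t) + f(x - c t) satisfies the wave equation
# u_tt = c^2 u_xx, using central finite differences at a sample point.
c = 1.5
f = lambda x: np.exp(-x**2)                      # illustrative smooth profile
u = lambda x, t: f(x + c * t) + f(x - c * t)

x, t, eps = 0.3, 0.7, 1e-4
u_tt = (u(x, t + eps) - 2 * u(x, t) + u(x, t - eps)) / eps**2
u_xx = (u(x + eps, t) - 2 * u(x, t) + u(x - eps, t)) / eps**2
residual = abs(u_tt - c**2 * u_xx)
print(residual)  # ~0 up to finite-difference error
```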
Important Note: In a previous class, we showed that in the absence of the right-most node $(D_+ u_N^2 = D_+^2 u_N)$, the Lagrangian gives the approximation:
$$\nabla_h^2 \approx D_+ D_-$$Our goal is to be as expressive as possible while satisfying conservation constraints.
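Numerically, $D_+ D_-$ is the standard three-point second difference. A quick sketch (illustrative code, not from the lecture) confirming it approximates $\partial_{xx}$ to second order on a smooth function:

```python
import numpy as np

# D_+ D_- applied to grid values gives (u_{i-1} - 2 u_i + u_{i+1}) / h^2,
# a 2nd-order approximation of the 1D Laplacian at interior nodes.
def laplacian_dp_dm(u, h):
    du = np.diff(u) / h          # backward differences D_- u
    return np.diff(du) / h       # forward difference D_+ of those

h = 0.01
x = np.arange(-1, 1 + h / 2, h)
u = np.sin(np.pi * x)
approx = laplacian_dp_dm(u, h)                 # values at x[1:-1]
exact = -np.pi**2 * np.sin(np.pi * x[1:-1])
err = np.max(np.abs(approx - exact))
print(err)  # O(h^2), roughly 1e-3 here
```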
We hypothesize the discrete action:
$$S_h[\theta] = \sum_i \frac{1}{2} \dot{q}_i^2 h - \frac{1}{2} N(D_- q_i; \theta)^2 h$$where:

- $q_i(t) \approx u(x_i, t)$ are the nodal values,
- $D_-$ is the backward difference operator,
- $N(\cdot\,; \theta)$ is the learnable nonlinear stencil,
- $h$ is the grid spacing.
Key Properties:
Euler-Lagrange equation:
$$\boxed{\ddot{q}_i = D_+ \nabla N(D_- q; \theta)}$$This is our learnable wave equation in discrete form.
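To make the discrete form concrete, here is a sketch (illustrative names, not lecture code) of the right-hand side for the linear choice $N(v; \theta) = c\,v$, where the gradient of the potential density $\frac{1}{2}N(v)^2$ is $c^2 v$, so the equation reduces to the standard discrete wave operator $c^2 D_+ D_- q$:

```python
import numpy as np

# Sketch: qdd = D_+ [ grad of (1/2) N(D_- q)^2 ] with N(v) = c * v,
# so the gradient is c^2 v and the RHS reduces to c^2 D_+ D_- q.
def rhs(q, h, c):
    v = np.diff(q) / h           # D_- q
    grad = c**2 * v              # d/dv [ (1/2)(c v)^2 ]
    return np.diff(grad) / h     # D_+ of the gradient

h, c = 0.1, 2.0
q = np.sin(np.linspace(0, 2 * np.pi, 40))
out = rhs(q, h, c)
second_diff = (q[:-2] - 2 * q[1:-1] + q[2:]) / h**2
print(np.allclose(out, c**2 * second_diff))  # linear case = discrete wave eq.
```

A trainable $N$ would replace the linear map while keeping the same $D_+ (\cdot) \circ D_-$ structure, which is what preserves the variational form.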
We can identify the generalized momentum as:
$$p_i(t) = \frac{\partial L}{\partial \dot{q}_i} = \dot{q}_i h$$Applying the Legendre transform:
$$\boxed{H = \underbrace{\sum_i \frac{1}{2} p_i^2 h^{-1}}_{T_\theta(p) \text{ (kinetic)}} + \underbrace{\frac{1}{2} \sum_i N(D_- q; \theta)^2 h}_{V_\theta(q) \text{ (potential)}}}$$We want to solve:
$$\begin{aligned} \frac{dp}{dt} &= -\partial_q V_\theta(q) \\ \frac{dq}{dt} &= \partial_p T_\theta(p) \end{aligned}$$These are the canonical Hamiltonian equations for our learnable wave system.
Some final remarks on time integration:
To choose an integration scheme (so far we've only looked at explicit Euler), we need to understand stability.
Consider a linear system of ODEs:
$$\dot{y} = Ay$$Critical observation: Depending on the eigenvalues $\lambda_i$ of $A$, the ODE will have distinct character:

- $\text{Re}(\lambda_i) < 0$: solutions decay (dissipative dynamics),
- $\text{Re}(\lambda_i) > 0$: solutions grow without bound,
- $\text{Re}(\lambda_i) = 0$ (purely imaginary): solutions oscillate with constant amplitude (the wave-equation case).
Important Note: Why is explicit integration important for ML applications?
In machine learning contexts, we need to evaluate gradients with respect to both state variables and parameters $\theta$. Implicit methods require solving nonlinear systems at each timestep, which is computationally expensive and complicates backpropagation. Explicit methods allow straightforward gradient flow through time integration.
Consider now the nonlinear system: $\ddot{x} = F(x; \theta)$
There are two broad classes of explicit methods:

Multi-stage methods: At each stage, you can make an additional gradient evaluation using points generated in previous stages (the Runge-Kutta family).

Multi-step methods: Use information about the derivative from previous timesteps to predict the next state. Writing such a scheme generically as $\sum_{j \geq 0} a_j x_{n+1-j} = h \sum_{j \geq 0} b_j F(x_{n+1-j})$ with the normalization $a_0 = 1$, the scheme is explicit if $b_0 = 0$, since the unknown $x_{n+1}$ then appears only on the left-hand side.

For simplicity, we'll just use multi-stage methods.
The one-stage scheme, $x_{n+1} = x_n + h F(t_n, x_n)$, is our familiar explicit Euler method.
After the explicit Euler (EE) evaluation in stage 1, add a second stage:
$$\begin{aligned} k_1 &= F(t_n, x_n) \\ k_2 &= F(t_n + \alpha h, x_n + \beta h k_1) \\ x_{n+1} &= x_n + h(a k_1 + b k_2) \end{aligned}$$To choose the coefficients, expand both the scheme and the exact solution in Taylor series and match terms through $\mathcal{O}(h^2)$:
Coefficients must satisfy:
$$\begin{aligned} a + b &= 1 \\ \alpha b &= 1/2 \\ \beta b &= 1/2 \end{aligned}$$Result: Non-unique! Multiple second-order RK schemes exist.
Example: RK2 (Heun's method). Take $a = b = 1/2$; then $\alpha = \beta = 1$:
$$\begin{aligned} k_1 &= F(t_n, x_n) \\ k_2 &= F(t_n + h, x_n + h k_1) \\ x_{n+1} &= x_n + \frac{h}{2}(k_1 + k_2) \end{aligned}$$For the general $s$-stage scheme, coefficients are written compactly as a Butcher tableau:
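As a sanity check (an illustrative sketch), this RK2 scheme applied to $\dot{y} = -y$ exhibits second-order convergence: halving $h$ should reduce the error at $t = 1$ by roughly a factor of 4:

```python
import math

# Heun's RK2 from the equations above, applied to y' = -y, y(0) = 1.
def heun(F, y0, h, n_steps):
    y = y0
    for _ in range(n_steps):
        k1 = F(y)
        k2 = F(y + h * k1)
        y = y + 0.5 * h * (k1 + k2)
    return y

F = lambda y: -y
exact = math.exp(-1.0)                          # true solution at t = 1
err_h  = abs(heun(F, 1.0, 0.01, 100)  - exact)  # step size h
err_h2 = abs(heun(F, 1.0, 0.005, 200) - exact)  # step size h / 2
print(err_h / err_h2)  # ~4, confirming second order
```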
| $c_1$ | $a_{11}$ | $a_{12}$ | $\cdots$ | $a_{1s}$ |
| $c_2$ | $a_{21}$ | $a_{22}$ | $\cdots$ | $a_{2s}$ |
| $\vdots$ | $\vdots$ | $\vdots$ | $\ddots$ | $\vdots$ |
| $c_s$ | $a_{s1}$ | $a_{s2}$ | $\cdots$ | $a_{ss}$ |
| $b_1$ | $b_2$ | $\cdots$ | $b_s$ |
For explicit methods, $a_{ij} = 0$ for $j \geq i$ (strictly lower triangular, i.e., zero diagonal).
The classical fourth-order scheme (RK4) has the tableau:

| 0 | 0 | 0 | 0 | 0 |
| 1/2 | 1/2 | 0 | 0 | 0 |
| 1/2 | 0 | 1/2 | 0 | 0 |
| 1 | 0 | 0 | 1 | 0 |
| 1/6 | 1/3 | 1/3 | 1/6 |
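The tableau translates directly into code. Below is a sketch (illustrative, not lecture code) of a generic explicit RK step driven by $(A, b, c)$, instantiated with the RK4 tableau above; one step on $\dot{y} = y$ should match $e^h$ to $\mathcal{O}(h^5)$:

```python
import math

# Generic explicit Runge-Kutta step from a Butcher tableau (A, b, c).
def rk_step(F, t, x, h, A, b, c):
    s = len(b)
    k = [None] * s
    for i in range(s):
        xi = x + h * sum(A[i][j] * k[j] for j in range(i))  # explicit: j < i
        k[i] = F(t + c[i] * h, xi)
    return x + h * sum(bi * ki for bi, ki in zip(b, k))

# Classical RK4 tableau.
A = [[0, 0, 0, 0],
     [1/2, 0, 0, 0],
     [0, 1/2, 0, 0],
     [0, 0, 1, 0]]
b = [1/6, 1/3, 1/3, 1/6]
c = [0, 1/2, 1/2, 1]

F = lambda t, y: y
h = 0.1
y1 = rk_step(F, 0.0, 1.0, h, A, b, c)
print(abs(y1 - math.exp(h)))  # ~h^5 local error
```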
To understand why this is the default, we need to analyze stability. We'll see that RK4 works for problems with purely imaginary eigenvalues (hyperbolic PDEs like the wave equation).
Consider again $\dot{y} = Ay$. Let's analyze RK2:
Step 1: Write out the RK2 scheme:
$$\begin{aligned} y_{n+1} &= y_n + \frac{h}{2}(k_1 + k_2) \\ k_1 &= A y_n \\ k_2 &= A(y_n + h k_1) = A y_n + h A^2 y_n \end{aligned}$$Step 2: Substitute:
$$y_{n+1} = \left(I + hA + \frac{h^2}{2} A^2\right) y_n$$Step 3: Define amplification matrix:
$$y_{n+1} = Q y_n \quad \text{where} \quad Q = I + hA + \frac{h^2}{2} A^2$$Note: What drives the choices of coefficients in RK is that $Q \approx \exp(hA)$ (Taylor expansion of matrix exponential).
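This amplification matrix is easy to verify numerically. A short sketch (with arbitrary illustrative data) checking that one RK2 step on $\dot{y} = Ay$ equals multiplication by $Q$:

```python
import numpy as np

# One RK2 step on ydot = A y should be exactly y1 = Q y0 with
# Q = I + h A + (h^2 / 2) A^2.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))     # arbitrary test matrix
y0 = rng.standard_normal(3)
h = 0.05

k1 = A @ y0
k2 = A @ (y0 + h * k1)
y1_rk2 = y0 + 0.5 * h * (k1 + k2)

Q = np.eye(3) + h * A + 0.5 * h**2 * (A @ A)
print(np.allclose(y1_rk2, Q @ y0))  # True
```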
Consider an arbitrary scheme where $Q$ is diagonalizable:
$$y_n = Q^n y_0$$If $Q = S \Lambda S^{-1}$ where $\Lambda = \text{diag}(\lambda_1, \ldots, \lambda_N)$, then in the eigenbasis coordinates $w_n = S^{-1} y_n$, component-wise:
$$w_{n,i} = \lambda_i^n w_{0,i}$$Stability condition: The solution is bounded if:
$$\max_i |\lambda_i| \leq 1$$where $\lambda_i$ is the $i$-th eigenvalue of $Q$.
Recall definition: The spectral radius is:
$$\rho(A) = \max\{|\lambda_i| : \lambda_i \text{ is an eigenvalue of } A\}$$Since $Q$ is a polynomial in $A$, each eigenvalue of $Q$ is $g(h\lambda_i)$ with $\lambda_i$ an eigenvalue of $A$ and $g(z) = 1 + z + \frac{1}{2}z^2$. The condition $\rho(Q) \leq 1$ therefore becomes: for every eigenvalue, set $z = h\lambda_i \in \mathbb{C}$ and require
$$|g(z)| = \left| 1 + z + \frac{1}{2} z^2 \right| \leq 1$$The set of all such $z$ forms the stability region in the complex plane.
RK1 (Explicit Euler): Disk of radius 1 centered at $(-1, 0)$ in the complex plane.
RK2: Enlarges the stability region overall, but along the imaginary axis $|g(iy)|^2 = 1 + y^4/4 > 1$ for $y \neq 0$, so RK2 remains (weakly) unstable for purely oscillatory dynamics.
RK4: The standard RK4 stability region includes the imaginary interval $|z| \leq 2\sqrt{2} \approx 2.83$, the largest along the imaginary axis among the classical explicit methods of its order, which is why it's the default choice for Hamiltonian/wave-like systems.
Critical observation: For the wave equation (purely imaginary eigenvalues), we need stability regions that extend along the imaginary axis. RK4 provides the best balance of accuracy and stability for such problems.
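These claims can be checked by evaluating each scheme's stability polynomial $g(z)$ at points $z = h\lambda$ on the imaginary axis (a quick illustrative sketch):

```python
# Stability polynomials of RK1 (explicit Euler), RK2, and RK4.
g1 = lambda z: 1 + z
g2 = lambda z: 1 + z + z**2 / 2
g4 = lambda z: 1 + z + z**2 / 2 + z**3 / 6 + z**4 / 24

print(abs(g1(1j)))  # > 1: explicit Euler unstable on the imaginary axis
print(abs(g2(1j)))  # > 1: RK2 also (weakly) unstable there
print(abs(g4(2j)))  # < 1: RK4 stable up to |z| = 2*sqrt(2) ~ 2.83
print(abs(g4(3j)))  # > 1: beyond that, RK4 is unstable too
```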
Recall that for a canonical Hamiltonian system:
$$\begin{aligned} \dot{p} &= -\partial_q H \\ \dot{q} &= \partial_p H \end{aligned}$$with $\frac{dH}{dt} = 0$ (energy conservation).
Assuming the decomposition:
$$H = \underbrace{T(p)}_{\text{kinetic}} + \underbrace{V(q)}_{\text{potential}}$$Then we want to solve:
$$\begin{aligned} \dot{p} &= -\partial_q V \\ \dot{q} &= \partial_p T \end{aligned}$$Note that this splitting applies only to separable Hamiltonians ($H = T(p) + V(q)$) and assumes no dissipation.
The Störmer-Verlet (also called leapfrog) integrator is a second-order symplectic method that exactly preserves the symplectic structure of Hamiltonian mechanics.
Algorithm:
Step 1: Half-step momentum update:
$$p_{n+1/2} = p_n - \frac{h}{2} \partial_q V(q_n)$$Step 2: Full-step position update:
$$q_{n+1} = q_n + h \partial_p T(p_{n+1/2})$$Step 3: Half-step momentum update:
$$p_{n+1} = p_{n+1/2} - \frac{h}{2} \partial_q V(q_{n+1})$$Key Properties:

- Second-order accurate,
- Explicit for separable Hamiltonians, with effectively one gradient of $V$ per step (the final half-step gradient is reused in the next step),
- Symplectic and time-reversible: it exactly preserves phase-space volume and keeps the energy error bounded over long times.
For kinetic energy $T(p) = \frac{1}{2m} p^2$, we have $\partial_p T = p/m$, so the position update simplifies to:
$$q_{n+1} = q_n + \frac{h}{m} p_{n+1/2}$$Important Note: The leapfrog structure (alternating half-steps) is what gives the method its symplectic property. This makes it ideal for long-time integration of conservative systems like our wave equation.
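As an illustration (a minimal sketch with $m = 1$ and $V(q) = q^2/2$, not lecture code), Störmer-Verlet applied to the harmonic oscillator keeps the energy error small and bounded over many periods rather than drifting:

```python
# Störmer-Verlet for H = p^2/2 + q^2/2, following the three steps above.
dV = lambda q: q          # partial_q V for V = q^2 / 2
dT = lambda p: p          # partial_p T for T = p^2 / 2

q, p, h = 1.0, 0.0, 0.1
H0 = 0.5 * p**2 + 0.5 * q**2
max_drift = 0.0
for _ in range(10_000):
    p_half = p - 0.5 * h * dV(q)      # half-step momentum
    q = q + h * dT(p_half)            # full-step position
    p = p_half - 0.5 * h * dV(q)      # half-step momentum
    max_drift = max(max_drift, abs(0.5 * p**2 + 0.5 * q**2 - H0))
print(max_drift)  # small and bounded: no secular energy growth
```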
This lecture covered: multi-stage integration methods (the Runge-Kutta family and their order conditions), linear stability analysis via amplification matrices and stability regions, and the Störmer-Verlet symplectic integrator for separable Hamiltonian systems.
Key Takeaway: For energy-conserving systems derived from variational principles, symplectic integrators like Störmer-Verlet provide exact conservation of phase space structure and bounded long-time energy behavior. Combined with learnable nonlinear stencils, these methods enable principled construction of physics-informed neural architectures for PDEs with guaranteed geometric properties. The choice of time integrator—whether high-order RK for accuracy or symplectic methods for conservation—depends critically on the eigenvalue structure and conservation requirements of the target system.