Date: February 12, 2025
Topics: Hamiltonian mechanics, symplectic structure, machine learning Hamiltonians, discrete gradient method
This lecture transitions from learning spatial discretizations (finite difference stencils) to learning time integrators that preserve fundamental physical invariants. We address a critical question: In Monday's lecture, we built generic nonlinear FD stencil fitters — they worked most of the time, but why? Today we'll talk about building in energy conservation exactly.
The key insight is that many physical systems are naturally described as Hamiltonian dynamical systems, where energy (the Hamiltonian \(H\)) is automatically conserved. Standard numerical integration methods (forward Euler, Runge-Kutta) violate this conservation, leading to unphysical energy drift or damping over long time horizons. By designing time integrators that respect the symplectic structure of Hamiltonian systems, we can guarantee discrete energy conservation.
This lecture introduces three perspectives on mechanics — Newtonian (forces), Hamiltonian (energy-centric), and Lagrangian (optimization-centric) — and shows how they connect through the Legendre transform. We then develop the discrete gradient method, a geometric integration technique that constructs time integrators preserving energy exactly at the discrete level. This provides the foundation for machine learning continuous Hamiltonians from data while ensuring learned models conserve energy.
Pedagogical note: While the machinery may seem abstract, the payoff is substantial: learned models that automatically respect conservation laws exhibit dramatically improved long-time accuracy and physical realism compared to black-box neural networks.
The simple observation that energy stays constant along trajectories is the foundation for energy conservation in Hamiltonian systems. We make it concrete with the pendulum.
Setup: let \(m\) denote the bob's mass, \(L\) the rod length, \(\theta(t)\) the angle from the vertical, and \(g\) the gravitational acceleration.
Newton's formulation:
$$\begin{aligned} m\ddot{x} &= -mg \sin\theta \\ \text{s.t.} \quad \|x\| &= L \end{aligned}$$Enforce the constraint by switching to the angular coordinate \(\theta\), with arc length \(x = L\theta(t)\):
$$\begin{aligned} mL\ddot{\theta} &= -mg\sin\theta \\ \ddot{\theta} &= -\lambda^2 \sin\theta, \quad \lambda = \sqrt{\frac{g}{L}} \end{aligned}$$Small angle limit: \(\sin\theta \approx \theta\) gives \(\ddot{\theta} = -\lambda^2 \theta\) (harmonic oscillator).
Hamiltonian formulation:
Let \(q = \theta\) and \(p = \dot{\theta}\) (the angular velocity; the mass is normalized away in these coordinates). Then, in the small-angle limit:
$$H = \frac{1}{2}p^2 + \frac{\lambda^2}{2}q^2$$Verify:
$$\begin{aligned} \partial_p H &= p, \quad \partial_q H = \lambda^2 q \\ \frac{dp}{dt} &= -\partial_q H = -\lambda^2 q \\ \frac{dq}{dt} &= \partial_p H = p \end{aligned}$$Recovering \(\ddot{\theta} = \dot{p} = -\lambda^2 q = -\lambda^2 \theta\). ✓
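Before fixing energy drift, it helps to see it. A minimal numerical sketch (the step size, horizon, and initial condition are our own choices): integrating this oscillator with forward Euler makes \(H\) grow steadily instead of staying constant.

```python
import numpy as np

lam = 1.0                                   # lambda = sqrt(g/L)
H = lambda q, p: 0.5 * p**2 + 0.5 * lam**2 * q**2

q, p, k = 1.0, 0.0, 0.01                    # initial condition and step size
E0 = H(q, p)
for _ in range(10_000):                     # integrate to t = 100
    q, p = q + k * p, p - k * lam**2 * q    # forward Euler update
print(H(q, p) / E0)                         # ~ 2.7: energy has grown ~ e-fold
```

For this linear system, each Euler step multiplies the energy by exactly \(1 + k^2\lambda^2\), so the drift compounds exponentially over long horizons.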
Identifying Hamiltonians for nonlinear systems:
For dynamics of the form \(\ddot{\theta} + F(\theta) = 0\) where \(F\) has a simple antiderivative, we can use the trick:
$$\frac{d}{dt}\left[\frac{1}{2}\dot{\theta}^2 + \int^{\theta(t)} F(\phi)\, d\phi\right] = [\ddot{\theta} + F(\theta)]\dot{\theta}$$Important observations:

- Along any trajectory of \(\ddot{\theta} + F(\theta) = 0\), the right-hand side vanishes, so \(H(q, p) = \frac{1}{2}p^2 + \int^{q} F(\phi)\, d\phi\) is conserved: we have identified a Hamiltonian (see the symbolic check below).
- For the pendulum, \(F(\theta) = \lambda^2 \sin\theta\) gives \(H = \frac{1}{2}p^2 - \lambda^2 \cos\theta\), which reduces to the harmonic-oscillator Hamiltonian above for small angles (up to an additive constant).
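As a sanity check, we can verify the identity symbolically for the pendulum forcing (a minimal sympy sketch; the symbol names are ours):

```python
import sympy as sp

t = sp.symbols('t')
lam = sp.symbols('lambda', positive=True)
theta = sp.Function('theta')
phi = sp.symbols('phi')

F = lambda th: lam**2 * sp.sin(th)          # pendulum: F(theta) = lambda^2 sin(theta)
V = sp.integrate(F(phi), phi)               # antiderivative: -lambda^2 cos(phi)

# H = (1/2) theta_dot^2 + V(theta); check dH/dt = (theta'' + F(theta)) theta'
H = sp.Rational(1, 2) * theta(t).diff(t)**2 + V.subs(phi, theta(t))
residual = sp.diff(H, t) - (theta(t).diff(t, 2) + F(theta(t))) * theta(t).diff(t)
print(sp.simplify(residual))                # 0: the identity holds
```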
Phase diagram of the nonlinear pendulum:
The phase portrait \((q, p)\) reveals qualitative features:

- trajectories are level sets of \(H\) and therefore never cross;
- closed orbits (librations) encircle the stable equilibria at \((q, p) = (2\pi n, 0)\);
- saddle points sit at \((q, p) = ((2n+1)\pi, 0)\), joined by separatrices dividing libration from rotation;
- above and below the separatrices, open orbits correspond to the pendulum rotating continuously.
We'd like to be able to recover this structure in an ML model.
We can compactly write Hamiltonian dynamics as:
$$\begin{aligned} \dot{x} &= S(x) \nabla_x H \\ S(x) &= \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}, \quad \nabla_x H = \begin{pmatrix} \partial_q H \\ \partial_p H \end{pmatrix} \end{aligned}$$Note: Some references use \(S = \begin{pmatrix} 0 & -I \\ I & 0 \end{pmatrix}\); the sign convention depends on the ordering of \(q\) and \(p\) within the state vector \(x\).
Recall the Gauss divergence theorem:
$$\int_\Omega \nabla \cdot F = \int_{\partial\Omega} F \cdot dA$$Taking \(F = x\) where \(x \in \mathbb{R}^d\):
$$\int_{\partial\Omega} x \cdot dA = \int_\Omega \nabla \cdot x\, dx = d|\Omega|$$So \(\int_{\partial\Omega} x \cdot dA\) equals the volume of \(\Omega\) times the dimension \(d\); tracking this boundary integral tracks phase-space volume.
Time evolution of phase space volume:
$$\begin{aligned} \frac{d}{dt} \int_{\partial\Omega} x \cdot dA &= \int_{\partial\Omega} \dot{x} \cdot dA \\ &= \int_{\partial\Omega} S\nabla H \cdot dA \\ &= \int_\Omega \nabla \cdot (S\nabla H)\, dx \\ &= \int_\Omega \sum_{i} \left(\partial_{q_i} \partial_{p_i} H - \partial_{p_i} \partial_{q_i} H\right) dx \\ &= 0 \quad \text{(mixed partials commute)} \end{aligned}$$Conclusion: Area (volume) in phase space is conserved. Standard integrators violate exactly this property: forward Euler spuriously expands phase-space volume (energy growth), while overly dissipative schemes contract it (artificial damping).
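We can see the violation concretely: one forward-Euler step for the harmonic oscillator is a linear map of the \((q, p)\) plane, and its Jacobian determinant exceeds 1 (a minimal check; the step size is an arbitrary choice).

```python
import numpy as np

lam, k = 1.0, 0.01
# One forward-Euler step: q1 = q0 + k p0,  p1 = p0 - k lam^2 q0
J = np.array([[1.0, k], [-k * lam**2, 1.0]])
print(np.linalg.det(J))   # = 1 + k^2 lam^2 > 1: phase-space area grows each step
```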
In general, we can consider arbitrary skew-symmetric matrices \(S(x)\):
$$\begin{aligned} \dot{x} &= S(x) \nabla_x H \\ S(x) &= -S(x)^\top \end{aligned}$$\(S\) is often referred to as a Poisson matrix or Poisson structure.
In the continuous setting, we can easily learn a Hamiltonian system using neural networks:
$$\begin{aligned} Q &= NN_1(x), \quad NN_1: \mathbb{R}^N \to \mathbb{R}^{N \times N} \\ S(x) &= Q - Q^\top \\ H &= NN_2(x), \quad NN_2: \mathbb{R}^N \to \mathbb{R} \\ \Rightarrow \dot{x} &= (NN_1(x) - NN_1(x)^\top) \nabla_x NN_2(x) \end{aligned}$$This architecture guarantees \(S(x) = -S(x)^\top\), so energy is conserved in continuous time.
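A minimal JAX sketch of this architecture (the MLP helpers, layer sizes, and initialization are our own illustrative choices): skew-symmetry is enforced by construction, and \(\nabla_x H\) comes from automatic differentiation.

```python
import jax
import jax.numpy as jnp

N = 2  # state dimension, e.g. x = (q, p) for the pendulum

def mlp(params, x):
    # Minimal multilayer perceptron; params is a list of (W, b) pairs.
    for W, b in params[:-1]:
        x = jnp.tanh(W @ x + b)
    W, b = params[-1]
    return W @ x + b

def init_mlp(key, sizes):
    keys = jax.random.split(key, len(sizes) - 1)
    return [(jax.random.normal(k, (m, n)) / jnp.sqrt(n), jnp.zeros(m))
            for k, n, m in zip(keys, sizes[:-1], sizes[1:])]

params1 = init_mlp(jax.random.PRNGKey(0), [N, 32, N * N])  # NN_1: x -> Q
params2 = init_mlp(jax.random.PRNGKey(1), [N, 32, 1])      # NN_2: x -> H

def S(x):
    Q = mlp(params1, x).reshape(N, N)
    return Q - Q.T                          # skew-symmetric by construction

def H(x):
    return mlp(params2, x)[0]               # scalar learned Hamiltonian

def x_dot(x):
    return S(x) @ jax.grad(H)(x)            # dx/dt = S(x) grad H(x)

x = jnp.array([1.0, 0.0])
# dH/dt = grad H . x_dot = grad H . S grad H = 0 by skew-symmetry:
print(jnp.dot(jax.grad(H)(x), x_dot(x)))    # ~ 0 up to float round-off
```

Because \(g^\top (Q - Q^\top) g = 0\) for every vector \(g\), the learned dynamics conserve \(H\) exactly along exact-in-time trajectories; the difficulty, addressed next, is preserving this once we discretize in time.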
Problem: To fit to data, we need to finite difference \(\dot{x} \approx \frac{x^{n+1} - x^n}{k}\), which will either:

- spuriously gain energy (drift, as with forward Euler), or
- spuriously lose energy (damping, as with dissipative implicit schemes).
Solution: To discretely conserve \(H\), we turn to geometric integration theory.
Consider \(\dot{x} = S(x) \nabla H(x)\); for brevity, we often drop the explicit \(x\)-dependence below.
Discrete scheme:
$$\frac{x^{n+1} - x^n}{k} = \tilde{S}(x^{n+1}, x^n) \bar{\nabla} H(x^{n+1}, x^n)$$where:
$$\begin{aligned} \lim_{x^{n+1} \to x^n} \tilde{S}(x^{n+1}, x^n) &= S(x^n) \quad \text{(consistency)} \\ \lim_{x^{n+1} \to x^n} \bar{\nabla} H(x^{n+1}, x^n) &= \nabla H(x^n) \quad \text{(consistency)} \\ (x^{n+1} - x^n) \cdot \bar{\nabla} H(x^{n+1}, x^n) &= H(x^{n+1}) - H(x^n) \quad \text{(discrete gradient property)} \end{aligned}$$Key requirement:
$$\bar{\nabla} H(x^n, x^n) = \nabla H(x^n)$$This gives a wish list for how to build \(\tilde{S}\) and \(\bar{\nabla} H\). The payoff: if \(\tilde{S}\) is skew-symmetric, dotting both sides of the scheme with \(\bar{\nabla} H\) and applying the discrete gradient property gives
$$H(x^{n+1}) - H(x^n) = (x^{n+1} - x^n) \cdot \bar{\nabla} H = k\, \bar{\nabla} H \cdot \tilde{S} \bar{\nabla} H = 0,$$so energy is conserved exactly at every timestep, regardless of the step size \(k\).
Notation:
$$\nabla H = (H_{x_1}, \ldots, H_{x_n}), \quad H_{x_i} = \frac{\partial H}{\partial x_i}$$Define the discrete gradient:
$$\begin{aligned} \bar{\nabla} H(x, y) &= (\bar{H}_{x_1}, \ldots, \bar{H}_{x_n}) \\ \bar{H}_{x_i} &= \int_0^1 H_{x_i}((1-\xi)x + \xi y)\, d\xi \end{aligned}$$Verification: Show \((y - x) \cdot \bar{\nabla} H(x, y) = H(y) - H(x)\):
$$\begin{aligned} (y - x) \cdot \bar{\nabla} H(x, y) &= \int_0^1 \nabla H((1-\xi)x + \xi y) \cdot (y - x)\, d\xi \\ &= \int_0^1 \frac{d}{d\xi} H[(1-\xi)x + \xi y]\, d\xi \\ &= H(y) - H(x), \end{aligned}$$where in the second-to-last step, we used:
$$\frac{d}{d\xi} H[(1-\xi)x + \xi y] = \nabla H \cdot \frac{d}{d\xi}[(1-\xi)x + \xi y] = \nabla H \cdot (y - x)$$This particular construction is known as the mean-value (or average vector field, AVF) discrete gradient; others exist, such as the Gonzalez midpoint and Itoh–Abe coordinate-increment discrete gradients. Different formulations have different computational costs and accuracy properties, but all satisfy the discrete gradient property. A worked integrator for the pendulum is sketched below.
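A minimal sketch of the resulting integrator for the nonlinear pendulum (our own parameter choices; the implicit step is solved by simple fixed-point iteration). For \(H = \frac{1}{2}p^2 - \lambda^2 \cos q\), the mean-value discrete gradient has a closed form, and the scheme conserves \(H\) to solver tolerance at every step.

```python
import numpy as np

lam = 1.0                                          # lambda = sqrt(g/L)
H = lambda q, p: 0.5 * p**2 - lam**2 * np.cos(q)   # nonlinear pendulum Hamiltonian

def dg_step(q0, p0, k, iters=50):
    # One step of x^{n+1} = x^n + k S grad-bar H, with S = [[0, 1], [-1, 0]],
    # solved by fixed-point iteration on (q1, p1).
    q1, p1 = q0, p0
    for _ in range(iters):
        dq = q1 - q0
        # Mean-value discrete gradient, evaluated in closed form for this H:
        #   q-component: lam^2 (cos q0 - cos q1) / (q1 - q0)
        #   p-component: (p0 + p1) / 2
        Hq = (lam**2 * (np.cos(q0) - np.cos(q1)) / dq
              if abs(dq) > 1e-12 else lam**2 * np.sin(q0))
        Hp = 0.5 * (p0 + p1)
        q1 = q0 + k * Hp
        p1 = p0 - k * Hq
    return q1, p1

q, p, k = 2.0, 0.0, 0.1        # large-amplitude libration, moderate step size
E0 = H(q, p)
for _ in range(1000):
    q, p = dg_step(q, p, k)
print(abs(H(q, p) - E0))       # ~ 1e-13: energy conserved to solver tolerance
```

Note the contrast with the forward-Euler experiment above: the energy error here does not grow with the time horizon.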
This lecture established the mathematical framework for learning time integrators that preserve energy:
Key Takeaway: Standard time integrators (Euler, RK4) violate conservation laws at the discrete level, leading to spurious energy drift. Geometric integrators constructed via the discrete gradient method exactly preserve energy at every timestep, dramatically improving long-time accuracy for Hamiltonian systems. This principle extends to learning time integrators from data: by parameterizing both \(S(x)\) and \(H(x)\) with neural networks and using discrete gradients, we can discover novel integration schemes that respect physics.
Next lecture: We'll shift from Hamiltonian to Lagrangian mechanics, introducing functional derivatives and the principle of least action. This provides an optimization-centric view of dynamics and enables elegant treatment of constraints and symmetries via Noether's theorem.