Date: February 17, 2025
Topics: Functional derivatives, Euler-Lagrange equations, principle of least action, Legendre transform, Noether's theorem
This lecture introduces Lagrangian mechanics, the third pillar of classical mechanics alongside Newtonian and Hamiltonian formulations. While all three describe the same physics, they emphasize different aspects and offer complementary tools for analysis and computation.
The Lagrangian approach centers on optimization: physical systems evolve along paths that extremize the action functional $S[u] = \int_0^T L\, dt$, where $L = T - V$ is the difference between kinetic and potential energy. This principle of least action transforms mechanics into a variational problem, making it natural to enforce constraints (via Lagrange multipliers) and identify conserved quantities (via Noether's theorem).
To work with action functionals, we need calculus in infinite-dimensional function spaces. We introduce two approaches: the Fréchet derivative (abstract, general) and the Gâteaux derivative (practical, directional). These lead to the Euler-Lagrange equations, which systematically convert variational principles into differential equations. We then connect Lagrangian and Hamiltonian mechanics via the Legendre transform, showing they are dual perspectives on the same physics.
Finally, Noether's theorem reveals the profound connection between symmetries and conservation laws: every continuous symmetry of the Lagrangian corresponds to a conserved quantity. This explains why momentum, energy, and angular momentum are conserved and guides the design of symmetry-respecting neural network architectures.
Pedagogical context: Understanding the optimization-centric Lagrangian viewpoint is essential for modern physics-informed machine learning. Many recent methods (Physics-Informed Neural Networks, Neural ODEs, Hamiltonian Neural Networks) implicitly use variational principles and functional derivatives to encode physical constraints.
Newtonian Mechanics: $F = m\ddot{x}$
Hamiltonian Mechanics: $\dot{x} = J\nabla H$
Lagrangian Mechanics: $x = \arg\min_{\tilde{x}} S(\tilde{x})$
Lagrangian density:
$$L = \underbrace{T}_{\text{kinetic}} - \underbrace{V}_{\text{potential}}$$For continuous systems (fields):
$$L = \int_\Omega \mathcal{L}[u]\, dx$$where $\mathcal{L}$ is the Lagrangian density.
Action:
$$S = \int_0^T L\, dt = \int_0^T \int_\Omega \mathcal{L}[u]\, dx\, dt$$Informally, the path that a system will evolve through for fixed initial and final states will be the $u(t)$ which minimizes the action:
$$u = \arg\min_{\hat{u}} \int_0^T \int_\Omega \mathcal{L}[u]\, dx\, dt$$Challenge: To make sense of this, we need to minimize expressions like $S$. In standard calculus, we take a derivative of a function $f(x)$, set it to zero to find extremal points. Here though:
A functional maps function spaces to real numbers:
$$S: V \to \mathbb{R}, \quad V \text{ a space of functions}$$i.e., the input space is now infinite-dimensional.
Let $V$ be a function space, $F: V \to \mathbb{R}$. The differential $\delta F[u]$ is defined as follows: for $\delta u \in V$,
$$F[u + \delta u] - F[u] = \delta F[u] + \varepsilon \|\delta u\|$$such that $\lim_{\varepsilon \to 0} \varepsilon \|\delta u\| = 0$.
Interpretation: This is analogous to the total derivative in finite dimensions, but the norm $\|\cdot\|$ is now in a function space (e.g., $L^2$ norm).
Idea: Introduce a scalar parameter and use the vector calculus definition of a directional derivative.
$$\begin{aligned} \delta F[u, \delta u] &= \lim_{\varepsilon \to 0} \frac{F[u + \varepsilon \delta u] - F[u]}{\varepsilon} \\ &= \left.\frac{d}{d\varepsilon} F[u + \varepsilon \delta u]\right|_{\varepsilon = 0} \end{aligned}$$This is analogous to the Gâteaux derivative or directional derivative in infinite dimensions.
Let $K = \frac{1}{2}\int_\Omega \rho u^2\, dx$. Take variation with respect to $u$:
$$\begin{aligned} (\delta_u K, \delta u) &= \lim_{\varepsilon \to 0} \frac{1}{\varepsilon} \int_\Omega \left[\frac{\rho(u + \varepsilon \delta u)^2}{2} - \frac{\rho u^2}{2}\right] dx \\ &= \lim_{\varepsilon \to 0} \frac{1}{\varepsilon} \int_\Omega \rho\left[\frac{u^2}{2} + \varepsilon u \delta u + \frac{\varepsilon^2}{2}(\delta u)^2 - \frac{u^2}{2}\right] dx \\ &= \lim_{\varepsilon \to 0} \int_\Omega \left[\rho u \delta u + \frac{\varepsilon}{2}\rho (\delta u)^2\right] dx \\ &= \int_\Omega \rho u \delta u\, dx \\ &= (\rho u, \delta u) \end{aligned}$$And so we identify:
$$\delta_u K = \rho u \quad \text{(the variation)}$$Denote $\frac{\delta}{\delta u} F[u] = \delta_u F[u]$.
Linearity:
$$\frac{\delta}{\delta u}(\lambda F[u] + \mu G[u]) = \lambda \delta_u F[u] + \mu \delta_u G[u]$$Product rule:
$$\frac{\delta}{\delta u}(F[u] G[u]) = \delta_u F \cdot G + F \cdot \delta_u G$$Chain rule:
$$\frac{\delta}{\delta u}(F[g(u)]) = \frac{\delta F[g(u)]}{\delta g(u)} \cdot \frac{dg(u)}{du}$$Assume a generic functional density:
$$F[u] = \int_\Omega f(x, u, \nabla u)\, dx$$Euler-Lagrange equation:
$$\delta_u F = 0 \quad \Rightarrow \quad \frac{\partial f}{\partial u} - \nabla \cdot \frac{\partial f}{\partial \nabla u} = 0$$Practical note: We can just apply this as a formula and skip all the limits.
For different forms of $f$, we get different equations. For example:
$$F[u] = \int_0^1 f(x, u, \partial_x u, \partial_{xx} u)\, dx$$Then:
$$\delta_u F = \frac{\partial f}{\partial u} - \frac{d}{dx}\frac{\partial f}{\partial(\partial_x u)} + \frac{d^2}{dx^2}\frac{\partial f}{\partial(\partial_{xx} u)}$$For the action functional:
$$S[u] = \int_a^b \int_\Omega \mathcal{L}(x, u, \dot{u})\, dx\, dt$$The path which admits an extremal value of $S$ satisfies:
$$\delta_u S[u] = 0$$Applying the Euler-Lagrange equations:
$$\frac{\partial \mathcal{L}}{\partial u} - \frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot{u}} = 0$$Kinetic energy:
$$T = \frac{1}{2}m v^2 = \frac{1}{2}m L^2 \dot{\theta}^2$$Potential energy:
$$\begin{aligned} U &= mgh \\ h &= L(1 - \cos\theta) \approx \frac{1}{2}L\theta^2 \quad \text{(small angle)} \end{aligned}$$Lagrangian:
$$L = \frac{1}{2}m L^2 \dot{\theta}^2 - \frac{1}{2}mgL\theta^2$$Euler-Lagrange equation:
$$\frac{d}{dt}(mL^2\dot{\theta}) = -mgL\theta$$ $$mL^2\ddot{\theta} = -mgL\theta \quad \Rightarrow \quad \ddot{\theta} = -\frac{g}{L}\theta$$Recovering the harmonic oscillator equation. ✓
A technique to switch between the Lagrangian and Hamiltonian descriptions.
Given a Lagrangian:
$$L(q_1, \ldots, q_N, \dot{q}_1, \ldots, \dot{q}_N)$$The Legendre transform induces the conjugate momenta $p_1, \ldots, p_N$:
$$\begin{cases} H(q_1, \ldots, q_N, p_1, \ldots, p_N) = \sum_i p_i \dot{q}_i - L(q_1, \ldots, q_N, \dot{q}_1, \ldots, \dot{q}_N) \\ p_i = \frac{\partial L}{\partial \dot{q}_i} \end{cases}$$Lagrangian: $L = \frac{1}{2}mL^2\dot{q}^2 - \frac{1}{2}mgLq^2$ (where $q = \theta$)
Step 1: Compute conjugate momentum:
$$p = \frac{\partial L}{\partial \dot{q}} = mL^2\dot{q} \quad \Rightarrow \quad \dot{q} = \frac{p}{mL^2}$$Step 2: Apply Legendre transform:
$$\begin{aligned} H &= p\dot{q} - L \\ &= \frac{p^2}{2mL^2} + \frac{mgLq^2}{2} \end{aligned}$$This recovers the Hamiltonian we derived earlier! ✓
Definition: If a constraint is an equality and a function of coordinates only (no $\dot{q}$'s), it is holonomic:
$$f_i(q, t) = 0$$We can augment the Lagrangian with Lagrange multipliers:
$$\hat{L}(q, \dot{q}, \lambda) = L(q, \dot{q}) - \sum_i \lambda_i f_i(q, t)$$where $\lambda$ is a field (in contrast to the KKT multipliers we considered previously, which were scalar).
Applying Euler-Lagrange to the augmented Lagrangian:
$$\begin{cases} \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_j}\right) = \frac{\partial L}{\partial q_j} + \lambda(t) \frac{\partial f_i}{\partial q_j} \\ f_i(q, t) = 0 \end{cases}$$Informal statement: Symmetries which leave the Lagrangian unchanged each have a corresponding conserved quantity.
Consider the transformations:
$$\begin{aligned} t &\to t' = t + \delta t \\ q &\to q' = q + \delta q \end{aligned}$$Assume $N$ such maps parameterized by small parameters $\varepsilon_r$:
$$\begin{aligned} \delta t &= \sum_r \varepsilon_r T_r \\ \delta q &= \sum_r \varepsilon_r Q_r \end{aligned}$$Noether's Theorem: If the Lagrangian is invariant under these transformations, then:
$$\left(\frac{\partial L}{\partial \dot{q}} \cdot \dot{q} - L\right) T_r - \frac{\partial L}{\partial \dot{q}} \cdot Q_r = \text{const.}$$are conserved quantities.
Transformation:
$$\begin{aligned} t &\to t + \delta t \\ N &= 1, \quad T = 1, \quad Q = 0 \end{aligned}$$Conserved quantity:
$$\frac{\partial L}{\partial \dot{q}} \cdot \dot{q} - L = \text{const.}$$This is the Hamiltonian (total energy)! Energy conservation follows from time-translation symmetry.
Transformation:
$$\begin{aligned} q &\to q + \delta q \\ N &= 1, \quad T = 0, \quad Q = 1 \end{aligned}$$Conserved quantity:
$$\frac{\partial L}{\partial \dot{q}} = \text{const.}$$This is the conjugate momentum! Momentum conservation follows from spatial translation symmetry.
Transformation: Rotation of $\delta\theta$ about a given axis $n$:
$$r \to r + \delta\theta\, n \times r$$ $$N = 1, \quad T = 0, \quad Q = n \times r$$Conserved quantity:
$$\begin{aligned} \text{const.} &= \frac{\partial L}{\partial \dot{q}} \cdot Q \\ &= p \cdot (n \times r) \\ &= n \cdot (r \times p) \\ &= n \cdot L_{\text{ang}} \end{aligned}$$where $L_{\text{ang}} = r \times p$ is angular momentum. Angular momentum conservation follows from rotational symmetry.
Design principle: When machine learning dynamics, we will design architectures which respect this set of symmetries.
Examples:
By building symmetries into the architecture, we:
This lecture established the Lagrangian perspective on mechanics and connected it to machine learning:
Key Takeaway: The Lagrangian formulation transforms mechanics into an optimization problem, making it natural to enforce constraints and identify conserved quantities. Noether's theorem explains why conservation laws exist and provides a blueprint for designing neural network architectures that respect fundamental symmetries. By building symmetries into model architectures, we guarantee conservation laws by construction and dramatically improve generalization.