Date: February 5, 2025
Topics: Constrained optimization, polynomial reproduction, moving least squares, Schur complement method
This lecture establishes the mathematical foundations for learning finite difference (FD) stencils from data while enforcing physical and mathematical constraints. We introduce constrained optimization as a central tool for ensuring that learned numerical methods satisfy desired properties such as conservation laws, consistency, and stability.
The lecture connects three critical ideas: (1) equality-constrained quadratic programming using Lagrange multipliers and the Schur complement, (2) strategies for training FD stencils from noisy data (penalty methods, reduced-space optimization, and Lagrange multiplier methods), and (3) polynomial reproduction as a systematic framework for constructing high-quality stencils that exactly replicate polynomial solutions. The polynomial reproduction framework builds on moving least squares (MLS) and provides a dual optimization perspective: we seek minimal-norm stencil weights that satisfy reproduction constraints.
This material bridges classical numerical analysis (consistency, stability, convergence) with modern machine learning (data-driven stencil discovery, constrained neural networks). Understanding these techniques is essential for designing physics-informed neural networks and learning numerical schemes that respect fundamental physical principles like energy conservation and translation invariance.
Our main tool will be constrained optimization:
$$\begin{aligned} \min_{\theta} \quad & F(x; \theta) \\ \text{s.t.} \quad & G(x; \theta) = 0 \end{aligned}$$Typical use cases: enforcing conservation laws, consistency (e.g., polynomial reproduction), and stability properties on learned numerical schemes.
Consider the canonical problem:
$$\begin{aligned} \min_{x} \quad & \frac{1}{2} x^\top A x + b^\top x + c \\ \text{s.t.} \quad & Bx = d \end{aligned}$$Note: For nonlinear problems, replace \(A\) with the Hessian and \(B\) with the Jacobian.
Solution via Lagrange multipliers:
To solve, introduce Lagrange multiplier \(\lambda\) and form the Lagrangian:
$$\mathcal{L}(x, \lambda) = \frac{1}{2} x^\top A x + b^\top x + c + \lambda^\top (Bx - d)$$The Karush-Kuhn-Tucker (KKT) conditions state that the minimizer satisfies:
$$\begin{aligned} \nabla_x \mathcal{L} &= 0 \\ \nabla_\lambda \mathcal{L} &= 0 \end{aligned}$$Explicitly:
$$\begin{aligned} \nabla_\lambda \mathcal{L} &= Bx - d = 0 \\ \nabla_x \mathcal{L} &= Ax + b + B^\top \lambda = 0 \end{aligned}$$This yields the saddle point problem:
$$\begin{pmatrix} A & B^\top \\ B & 0 \end{pmatrix} \begin{pmatrix} x \\ \lambda \end{pmatrix} = \begin{pmatrix} -b \\ d \end{pmatrix} \tag{KKT}$$Step 1: Multiply the top row of (KKT) by \(BA^{-1}\):
$$BA^{-1}Ax + BA^{-1}B^\top \lambda = -BA^{-1}b$$This simplifies to:
$$Bx + BA^{-1}B^\top \lambda = -BA^{-1}b$$Step 2: Extract the constraint \(Bx = d\) from the second row of (KKT) and substitute:
$$BA^{-1}B^\top \lambda = -d - BA^{-1}b$$Step 3: Define the Schur complement \(S = BA^{-1}B^\top\) and solve:
$$\begin{aligned} \lambda &= -S^{-1}(d + BA^{-1}b) \\ x &= A^{-1}(-B^\top \lambda - b) \end{aligned}$$Important: We will use this Schur complement formula repeatedly as a tool to enforce properties we desire as equality constraints in stencil learning.
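The three steps above can be sketched in NumPy (a minimal example; the helper name `solve_eq_qp` and the toy matrices are ours, chosen for illustration):

```python
import numpy as np

def solve_eq_qp(A, b, B, d):
    """Solve min 1/2 x^T A x + b^T x  s.t.  B x = d via the Schur complement.
    Assumes A symmetric positive definite and B of full row rank."""
    Ainv_b = np.linalg.solve(A, b)             # A^{-1} b
    Ainv_Bt = np.linalg.solve(A, B.T)          # A^{-1} B^T
    S = B @ Ainv_Bt                            # Schur complement S = B A^{-1} B^T
    lam = -np.linalg.solve(S, d + B @ Ainv_b)  # lambda = -S^{-1}(d + B A^{-1} b)
    x = np.linalg.solve(A, -B.T @ lam - b)     # x = A^{-1}(-B^T lambda - b)
    return x, lam

# Toy problem: minimize 1/2 ||x||^2 subject to x1 + x2 = 1.
A = np.eye(2)
b = np.zeros(2)
B = np.array([[1.0, 1.0]])
d = np.array([1.0])
x, lam = solve_eq_qp(A, b, B, d)
# The constrained minimizer is x = (1/2, 1/2).
```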
Force matching (derivative matching):
In homework problems, we often match derivatives (also called force matching in molecular dynamics):
$$\mathcal{L} = \left\| \frac{\hat{u}_j^{n+1} - \hat{u}_j^n}{k} + \frac{1}{h^{|\alpha|}} \sum_{\ell=-p}^{q} S_{\ell}\, u_{j+\ell}^n \right\|^2$$to approximate a differential operator \(D^\alpha\); here \(\ell\) ranges over the stencil offsets (\(p\) nodes to the left of \(x_j\), \(q\) to the right), distinct from the time step \(k\).
Problem: This approach is prone to overfitting for noisy realistic data.
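A minimal NumPy sketch of this loss, assuming a 3-point stencil with periodic boundaries and the "+" sign convention of the formula above (the function name and the heat-equation test data are illustrative, not from the lecture):

```python
import numpy as np

def force_matching_loss(S, u, k, h):
    """Loss || (u^{n+1} - u^n)/k + (1/h^2) sum_l S_l u_{j+l}^n ||^2
    for a 3-point stencil S; u has shape (T, N), periodic in space."""
    du_dt = (u[1:] - u[:-1]) / k                     # discrete time derivative
    Du = (S[0] * np.roll(u[:-1], 1, axis=1)          # stencil at each time level
          + S[1] * u[:-1]
          + S[2] * np.roll(u[:-1], -1, axis=1)) / h**2
    return 0.5 * np.sum((du_dt + Du) ** 2)

# Synthetic data from an explicit heat update u^{n+1} = u^n + (k/h^2)[1,-2,1]*u^n.
N, T = 32, 6
h = 1.0 / N
k = 0.1 * h**2
x = np.arange(N) * h
u = np.empty((T, N))
u[0] = np.sin(2 * np.pi * x)
for n in range(T - 1):
    u[n + 1] = u[n] + (k / h**2) * (np.roll(u[n], 1) - 2 * u[n] + np.roll(u[n], -1))

# Under the "+" convention, the zero-loss stencil for these data is S = -[1, -2, 1].
S_exact = np.array([-1.0, 2.0, -1.0])
```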
Mitigation strategies: (1) pose a constrained fit and enforce the physics residual exactly via Lagrange multipliers, (2) reduced-space optimization (substitute the scheme into the loss), or (3) penalty methods.
Define the reconstruction loss and physics residual:
$$\begin{aligned} \mathcal{L}_R &= \frac{1}{2} \sum_{n,j} |u_j^n - \hat{u}_j^n|^2 \\ r_{n,j} &= u_j^{n+1} - u_j^n - k D_\theta^\alpha u_j^n \end{aligned}$$where \(D_\theta^\alpha\) is a parameterized stencil.
We pose the constrained fit to the solution:
$$\begin{aligned} \min_{u_j^n, \theta} \quad & \mathcal{L}_R \\ \text{s.t.} \quad & r_{n,j} = 0 \quad \forall n, j \end{aligned}$$Drawbacks: there is one Lagrange multiplier per residual \(r_{n,j}\), so the KKT system grows with the full space-time grid, and the resulting saddle-point problem is indefinite and expensive to solve.
Substitute the residual constraint back into \(\mathcal{L}_R\) to optimize directly over the space of FD solutions.
If \(u_j^n = Q(u_j^{n-1})\) is the (potentially nonlinear) update operator:
$$\mathcal{L} = \frac{1}{2} \sum_{n,j} \left| \big(\underbrace{Q \circ \cdots \circ Q}_{n\ \text{times}}\big)(u^0)_j - \hat{u}_j^n \right|^2$$Challenge: for nonlinear or implicit schemes, backpropagation must pass through a linear solve or a Newton/iterative nonlinear solve.
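A minimal reduced-space sketch, assuming an explicit linear 3-point update with periodic boundaries (the stencil value and the reference trajectory are fabricated for illustration):

```python
import numpy as np

def step(u, stencil):
    """One explicit update u_j <- u_j + sum_l stencil_l u_{j+l}, periodic."""
    return u + (stencil[0] * np.roll(u, 1)
                + stencil[1] * u
                + stencil[2] * np.roll(u, -1))

def unrolled_loss(stencil, u0, u_hat):
    """Reduced-space loss: unroll the scheme from u0 and compare every
    intermediate state to the reference trajectory u_hat of shape (n_steps, N)."""
    u, loss = u0.copy(), 0.0
    for n in range(u_hat.shape[0]):
        u = step(u, stencil)
        loss += 0.5 * np.sum((u - u_hat[n]) ** 2)
    return loss

# Fabricated reference data generated by the heat stencil (k/h^2)[1, -2, 1].
N, n_steps = 32, 5
u0 = np.sin(2 * np.pi * np.arange(N) / N)
true_stencil = 0.1 * np.array([1.0, -2.0, 1.0])
u, traj = u0.copy(), []
for _ in range(n_steps):
    u = step(u, true_stencil)
    traj.append(u.copy())
u_hat = np.array(traj)
# unrolled_loss(true_stencil, u0, u_hat) vanishes; perturbed stencils do not.
```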
Process: For iterations \(1, \ldots, \#\): minimize the penalized loss \(\mathcal{L}_R + \frac{\mu}{2} \sum_{n,j} r_{n,j}^2\) without constraints, then increase the penalty parameter \(\mu\) and repeat.
Advantage: Backpropagation doesn't need to go through forward + adjoint solve; best for nonlinear/implicit schemes.
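The penalty idea can be illustrated on the earlier equality-constrained quadratic program: minimize \(F + \frac{\mu}{2}\|Bx - d\|^2\) for increasing \(\mu\). A minimal sketch (the function name `penalty_solution` and the toy matrices are illustrative):

```python
import numpy as np

def penalty_solution(A, b, B, d, mu):
    """Minimizer of 1/2 x^T A x + b^T x + (mu/2)||Bx - d||^2,
    from the stationarity condition (A + mu B^T B) x = -b + mu B^T d."""
    return np.linalg.solve(A + mu * B.T @ B, -b + mu * B.T @ d)

# Toy QP: min 1/2 ||x||^2  s.t.  x1 + x2 = 1, constrained solution (1/2, 1/2).
A = np.eye(2)
b = np.zeros(2)
B = np.array([[1.0, 1.0]])
d = np.array([1.0])
for mu in [1.0, 10.0, 100.0, 1000.0]:
    x_mu = penalty_solution(A, b, B, d, mu)
    # constraint violation |B x_mu - d| shrinks like O(1/mu)
    violation = abs(B @ x_mu - d)[0]
```

The constraint is only satisfied in the limit \(\mu \to \infty\), which is the price paid for avoiding multipliers and adjoint solves.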
Note: For linear problems, methods (2) and (3) are equivalent.
Important considerations: a small data-fitting loss does not by itself guarantee consistency or stability; the learned scheme must still satisfy these classical requirements for convergence.
Consider points \(\bar{X} = \{x_1, \ldots, x_N\} \subseteq \Omega \subseteq \mathbb{R}^d\).
Fill distance (characterizes scattered data):
$$h_{\bar{X}, \Omega} = \sup_{x \in \Omega} \min_{j \leq N} \|x - x_j\|_2$$Interpretation: radius of the largest ball centered in \(\Omega\) that contains no data point.
Separation distance:
$$q_{\bar{X}} = \frac{1}{2} \min_{i \neq j} \|x_i - x_j\|_2$$For our FD stencils so far:
$$h_{\bar{X}, \Omega} = \frac{h}{2}, \quad q_{\bar{X}} = \frac{h}{2} \quad \Rightarrow \quad c_{qu} = \frac{h_{\bar{X}, \Omega}}{q_{\bar{X}}} = 1$$(uniform grid with spacing \(h\): the worst point sits midway between two nodes, and the closest pair of nodes is \(h\) apart)
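Both quantities are easy to compute numerically. A minimal sketch (the function names are ours, and the sup over \(\Omega\) is approximated by a dense sample):

```python
import numpy as np

def separation_distance(X):
    """q = (1/2) min_{i != j} ||x_i - x_j||, for X of shape (N, d)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)          # ignore zero self-distances
    return 0.5 * D.min()

def fill_distance(X, Omega_samples):
    """h = sup_{x in Omega} min_j ||x - x_j||, with the sup approximated
    over a dense sample of Omega of shape (M, d)."""
    D = np.linalg.norm(Omega_samples[:, None, :] - X[None, :, :], axis=-1)
    return D.min(axis=1).max()

# Uniform grid on [0, 1] with spacing h: both distances equal h/2.
h = 0.1
X = np.arange(0.0, 1.0 + 1e-12, h)[:, None]
Omega = np.linspace(0.0, 1.0, 2001)[:, None]
q = separation_distance(X)       # h/2 = 0.05
hfill = fill_distance(X, Omega)  # approximately h/2 = 0.05
```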
Idea: Introduce moving least squares to generalize FDM for scattered data.
For \(x \in \Omega\), define \(S_{f,\bar{X}}(x) = p^*(x)\) where:
$$p^* = \arg\min_{p \in \Pi_m(\mathbb{R}^d)} \sum_{i=1}^N [f(x_i) - p(x_i)]^2 \omega(x, x_i)$$where \(\Pi_m(\mathbb{R}^d)\) denotes polynomials of degree \(\leq m\) in \(\mathbb{R}^d\).
We'll assume \(\omega(x,y) = \Phi_\delta(x-y)\) (radial basis function kernel).
For \(m = 0\) (Shepard's method), the optimal polynomial is a constant, written \(p^*(x) \equiv c(x)\).

Step 1: Take the derivative with respect to \(c(x)\) and set it to zero:
$$0 = \sum_{i=1}^N (f(x_i) - c(x)) \Phi_\delta(x - x_i)$$Step 2: Solve for \(c(x)\):
$$c(x) = \frac{\sum_i \Phi_\delta(x - x_i) f(x_i)}{\sum_i \Phi_\delta(x - x_i)}$$This recovers kernel regression (the Nadaraya-Watson estimator), also known as the Shepard interpolant.
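A minimal 1-D sketch of this construction, taking \(\Phi_\delta\) to be a Gaussian (the kernel choice and the data are illustrative):

```python
import numpy as np

def shepard(x_eval, X, f, delta):
    """Shepard estimate c(x) = sum_i Phi(x - x_i) f(x_i) / sum_i Phi(x - x_i)
    with a Gaussian kernel Phi_delta; x_eval (M,), X (N,) sites, f (N,) values."""
    w = np.exp(-((x_eval[:, None] - X[None, :]) / delta) ** 2)
    return (w * f[None, :]).sum(axis=1) / w.sum(axis=1)

X = np.linspace(0.0, 1.0, 20)
f = np.sin(2 * np.pi * X)
s = shepard(np.array([0.25, 0.5, 0.75]), X, f, delta=0.1)
# Constants are reproduced exactly: shepard(x, X, np.ones_like(X), delta) == 1.
```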
Remark: We can thus search for a minimal norm stencil that has reproduction properties.
Notation (linear algebra heavy), with \(Q = \dim \Pi_m(\mathbb{R}^d)\):

Dimensions: \(P \in \mathbb{R}^{Q \times N}\) collects the basis polynomials evaluated at the data sites, \(P_{\ell i} = p_\ell(x_i)\); \(P(x) \in \mathbb{R}^{Q}\) is the basis evaluated at the query point \(x\); \(W = \operatorname{diag}(\omega(x, x_i)) \in \mathbb{R}^{N \times N}\) is the weight matrix; \(u = (f(x_1), \ldots, f(x_N))^\top \in \mathbb{R}^{N}\) holds the data values; and \(c \in \mathbb{R}^{Q}\) holds the polynomial coefficients.
Step 1: Expand the objective:
$$\mathcal{L} = \frac{1}{2} c^\top P W P^\top c - c^\top P W u + \text{(terms independent of } c\text{)}$$Step 2: Recall calculus identities:
$$\frac{\partial}{\partial x} \frac{1}{2} x^\top A x = \frac{1}{2}(A + A^\top)x, \quad \frac{\partial}{\partial x} y^\top x = y$$Step 3: Take gradient and set to zero:
$$\frac{\partial \mathcal{L}}{\partial c} = PWP^\top c - PWu = 0$$ $$\Rightarrow c = (PWP^\top)^{-1} PWu$$Step 4: The MLS approximation is:
$$\begin{aligned} S_{f,\bar{X}}(x) &= c \cdot P(x) \\ &= P(x)^\top (PWP^\top)^{-1} PWu \end{aligned}$$Interpretation: Find minimal weighted norm stencil weights \(a(x)\) that reproduce all polynomials up to degree \(m\).
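A minimal 1-D NumPy sketch of the primal formula \(S_{f,\bar{X}}(x) = P(x)^\top (PWP^\top)^{-1} PWu\), using a monomial basis and Gaussian weights (the function name and the data are illustrative):

```python
import numpy as np

def mls_eval(x, X, u, m, delta):
    """Primal MLS: c = (P W P^T)^{-1} P W u, value P(x)^T c."""
    P = np.vander(X, m + 1, increasing=True).T     # (m+1, N), P_{l,i} = x_i^l
    W = np.diag(np.exp(-((x - X) / delta) ** 2))   # Gaussian weights Phi_delta
    c = np.linalg.solve(P @ W @ P.T, P @ W @ u)    # polynomial coefficients
    return (x ** np.arange(m + 1)) @ c             # P(x)^T c

X = np.linspace(0.0, 1.0, 15)
u = 3.0 + 2.0 * X          # samples of a degree-1 polynomial
val = mls_eval(0.37, X, u, m=1, delta=0.2)
# Degree-1 data is reproduced up to round-off: val ~ 3 + 2*0.37 = 3.74
```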
The dual problem is \(\min_{a} \frac{1}{2} a^\top W^{-1} a\) subject to \(Pa = P(x)\). The KKT system is:
$$\begin{pmatrix} W^{-1} & P^\top \\ P & 0 \end{pmatrix} \begin{pmatrix} a(x) \\ \lambda \end{pmatrix} = \begin{pmatrix} 0 \\ P(x) \end{pmatrix}$$Step 1: Identify Schur complement:
$$S = P W P^\top$$Step 2: Solve for \(\lambda\):
$$\lambda = -S^{-1} P(x) = -(PWP^\top)^{-1} P(x)$$Step 3: Solve for \(a(x)\):
$$\begin{aligned} a(x) &= -W P^\top \lambda \\ &= W P^\top S^{-1} P(x) \end{aligned}$$Step 4: The MLS approximation is:
$$\begin{aligned} S_{f,\bar{X}}(x) &= u^\top a(x) \\ &= u^\top W P^\top S^{-1} P(x) \\ &= P(x)^\top (PWP^\top)^{-1} PWu \end{aligned}$$This matches the primal formulation! ✓
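The dual route can be checked numerically: compute \(a(x) = WP^\top S^{-1} P(x)\) and verify the polynomial reproduction property (a minimal 1-D sketch with illustrative names and data):

```python
import numpy as np

def mls_stencil(x, X, m, delta):
    """Dual MLS weights a(x) = W P^T (P W P^T)^{-1} P(x)."""
    P = np.vander(X, m + 1, increasing=True).T     # (m+1, N) monomial basis
    W = np.diag(np.exp(-((x - X) / delta) ** 2))   # Gaussian weights
    S = P @ W @ P.T                                # Schur complement
    return W @ P.T @ np.linalg.solve(S, x ** np.arange(m + 1))

X = np.linspace(0.0, 1.0, 15)
x0 = 0.37
a = mls_stencil(x0, X, m=2, delta=0.2)
# Polynomial reproduction up to degree 2: sum_i a_i p(x_i) = p(x0), e.g.
#   np.ones_like(X) @ a ~ 1          (constants)
#   (X ** 2) @ a        ~ x0 ** 2    (quadratics)
```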
Question: Is the polynomial reproduction set non-empty? (i.e., do solutions always exist?)
Suppose \(\Omega \subseteq \mathbb{R}^d\) is compact and satisfies an interior cone condition with angle \(\theta \in (0, \pi/2)\) and radius \(r\). Then, for \(h_{\bar{X}, \Omega}\) sufficiently small, there exist stencil weights \(a_j^*(x)\) for any \(x \in \Omega\) such that:
$$\begin{aligned} &\sum_{j=1}^N a_j^*(x)\, p(x_j) = p(x) \quad \forall p \in \Pi_m(\mathbb{R}^d) && \text{(reproduction)} \\ &\sum_{j=1}^N |a_j^*(x)| \leq \tilde{c}_1 && \text{(stability)} \\ &a_j^*(x) = 0 \ \text{ if } \ \|x - x_j\|_2 > \tilde{c}_2\, h_{\bar{X}, \Omega} && \text{(locality)} \end{aligned}$$where \(\tilde{c}_1\) and \(\tilde{c}_2\) are independent of \(h_{\bar{X}, \Omega}\) and can be explicitly derived.
Implications: the polynomial reproduction constraint set is non-empty for sufficiently dense point sets, so the minimal-norm stencil problem is well posed; combined with a Taylor expansion, reproduction of \(\Pi_m\) with bounded, local weights yields approximation error \(\mathcal{O}(h_{\bar{X},\Omega}^{m+1})\) for \(f \in C^{m+1}(\Omega)\).
This lecture established three key frameworks for learning constrained numerical methods: (1) equality-constrained quadratic programming solved via Lagrange multipliers and the Schur complement, (2) strategies for training FD stencils from noisy data, and (3) polynomial reproduction through moving least squares.
Key Takeaway: Constrained optimization provides the mathematical machinery to design physics-informed learning algorithms that respect fundamental properties like conservation laws and polynomial consistency. The primal-dual formulation reveals that seeking minimal-norm stencils with polynomial reproduction is equivalent to moving least squares approximation, unifying classical numerical analysis with modern data-driven methods.
Next steps: Apply these principles to construct energy-conserving time integrators using Hamiltonian mechanics and symplectic structure (Lecture 7).