Lecture 13: Quasi-Optimality, Error Estimation, and Lax-Milgram Theory

Topics Covered: Quasi-optimality in energy norm, Error estimation in $L^2$ norm via duality argument, Interpolation theory and approximation bounds, Lax-Milgram theory for general elliptic problems

0. Overview

In the previous lecture, we introduced the **Galerkin discretization** of the Poisson problem and established Galerkin orthogonality. Today, we take a major step toward understanding **why finite element methods work so well**: we'll prove rigorous error estimates that show how quickly FEM solutions converge to the true solution as we refine our mesh. These aren't just numerical observations—they're mathematical guarantees.

We begin by generalizing our Galerkin formulation to **abstract bilinear forms**, which will allow us to handle a much broader class of PDEs. The key concept is the **energy norm** induced by the bilinear form, which naturally connects the Galerkin method to the Rayleigh-Ritz energy minimization principle we saw before. We'll prove **quasi-optimality**: the FEM solution is the best approximation in the chosen function space, measured in the energy norm.

But what about the $L^2$ norm, the most natural measure of error? Through an elegant **duality argument** (sometimes called the Aubin-Nitsche trick), we'll show that the $L^2$ error converges at **twice the order** of the energy error ($\mathcal{O}(h^2)$ versus $\mathcal{O}(h)$), a phenomenon called **superconvergence**. The proof hinges on constructing an auxiliary problem and leveraging interpolation theory. Finally, we'll abstract these ideas through **Lax-Milgram theory**, which provides a unified framework for proving existence, uniqueness, and stability for any elliptic bilinear form. This theory is the foundation for learning both function spaces $V_h$ and operators $a(u,v)$ in physics-informed machine learning.

1. Bilinear Forms and Energy Norms

1.1 Generalization to Abstract Bilinear Forms

We now view the Poisson weak form $a(u,v) = (u', v')$ as one example of a **general bilinear form** $a(\cdot, \cdot)$:

Definition: A bilinear form $a: V \times V \to \mathbb{R}$ satisfies:

$$\begin{aligned} a(\alpha u_1 + \beta u_2, \gamma v_1 + \delta v_2) &= \alpha \gamma a(u_1, v_1) + \alpha \delta a(u_1, v_2) \\ &\quad + \beta \gamma a(u_2, v_1) + \beta \delta a(u_2, v_2) \end{aligned}$$

(linear in both arguments)

1.2 Energy Norm and Cauchy-Schwarz

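For bilinear forms, we're interested in those that are symmetric and positive definite, so that they **generate an energy norm** $\|v\|_E = \sqrt{a(v,v)}$ satisfying the Cauchy-Schwarz inequality $|a(u,v)| \leq \|u\|_E \|v\|_E$.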

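Recall from last class: this is precisely the connection between Galerkin and Rayleigh-Ritz. For a symmetric form, the Galerkin solution is the minimizer of the energy functional over $V_h$:

$$\text{Solving } a(u_h,v) = (f,v) \;\; \forall v \in V_h \quad \Leftrightarrow \quad u_h = \arg\min_{v \in V_h} \left[ \tfrac{1}{2}\|v\|_E^2 - (f,v) \right]$$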
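A discrete sanity check of the energy norm, Cauchy-Schwarz, and the Rayleigh-Ritz connection, as a minimal sketch. It assumes the 1D Poisson form $a(u,v) = (u', v')$ on a uniform mesh with $u(0) = 0$ and a free right end, so that $a$ acts on coefficient vectors through the usual piecewise-linear stiffness matrix. The mesh size `n`, the load vector `b`, and all helper names are illustrative choices, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10                      # number of elements (illustrative choice)
h = 1.0 / n

# P1 stiffness matrix for a(u, v) = (u', v') with u(0) = 0 and a free right end.
K = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h
K[-1, -1] = 1.0 / h

def a(u, v):
    """Discrete bilinear form a(u, v) = u^T K v on coefficient vectors."""
    return u @ K @ v

def energy_norm(v):
    """Energy norm ||v||_E = sqrt(a(v, v))."""
    return np.sqrt(a(v, v))

# Cauchy-Schwarz: |a(u, v)| <= ||u||_E ||v||_E for random coefficient vectors.
pairs = [(rng.standard_normal(n), rng.standard_normal(n)) for _ in range(100)]
cauchy_schwarz_holds = all(
    abs(a(u, v)) <= energy_norm(u) * energy_norm(v) * (1.0 + 1e-12)
    for u, v in pairs
)

# Rayleigh-Ritz: the solution of K u = b minimizes F(v) = a(v, v)/2 - b.v.
b = h * np.ones(n)          # hypothetical load vector (f = 1)
b[-1] = h / 2.0
u_h = np.linalg.solve(K, b)

def F(v):
    """Discrete energy functional."""
    return 0.5 * a(v, v) - b @ v

galerkin_minimizes_F = all(
    F(u_h) <= F(u_h + 0.1 * rng.standard_normal(n)) for _ in range(100)
)
print(cauchy_schwarz_holds, galerkin_minimizes_F)
```

Random perturbations only sample the space, so this illustrates the minimization property rather than proving it; the proof is the identity $F(u_h + d) - F(u_h) = \tfrac{1}{2} a(d,d) \geq 0$.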

1.3 Galerkin Orthogonality (Recap)

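Galerkin orthogonality states that the error is $a$-orthogonal to the discrete space: $a(u - u_h, v) = 0$ for all $v \in V_h$. Since $v - u_h \in V_h$, for any $v \in V_h$:

$$\|u - u_h\|_E^2 = a(u - u_h, u - u_h) = a(u - u_h, u - v) \leq \|u - u_h\|_E \|u - v\|_E$$

This is quasi-optimality: the FEM error is bounded by the best approximation error in $V_h$, that is, $\|u - u_h\|_E \leq \inf_{v \in V_h} \|u - v\|_E$.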
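Quasi-optimality can be observed numerically. The sketch below assumes a concrete model problem that the notes do not fix: $-u'' = 1$ on $(0,1)$ with $u(0) = 0$, $u'(1) = 0$, whose exact solution is $u(x) = x - x^2/2$. It solves the piecewise-linear Galerkin system on a uniform mesh and compares the energy error of $u_h$ with that of randomly perturbed members of $V_h$. The mesh size and perturbation scale are arbitrary.

```python
import numpy as np

n = 8                                   # number of elements (illustrative)
h = 1.0 / n
x = np.linspace(0.0, 1.0, n + 1)

# Assumed model problem: -u'' = 1, u(0) = 0, u'(1) = 0, so u'(x) = 1 - x.
K = np.zeros((n + 1, n + 1))
for e in range(n):
    K[e:e + 2, e:e + 2] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
b = np.full(n + 1, h)                   # (1, phi_i) for interior hat functions
b[-1] = h / 2.0                         # half hat at the right end

u_h = np.zeros(n + 1)                   # u_h(0) = 0 is imposed by dropping node 0
u_h[1:] = np.linalg.solve(K[1:, 1:], b[1:])

# Galerkin orthogonality in matrix form: a(u - u_h, phi_i) = (f, phi_i) - (K u_h)_i.
residual = b[1:] - K[1:, 1:] @ u_h[1:]

gauss_pts, gauss_wts = np.polynomial.legendre.leggauss(3)

def energy_error(v):
    """||u' - v'|| for the piecewise-linear function with nodal values v."""
    total = 0.0
    for e in range(n):
        xq = 0.5 * (x[e] + x[e + 1]) + 0.5 * h * gauss_pts
        slope = (v[e + 1] - v[e]) / h
        total += np.sum(0.5 * h * gauss_wts * ((1.0 - xq) - slope) ** 2)
    return np.sqrt(total)

rng = np.random.default_rng(1)
err_galerkin = energy_error(u_h)
err_others = []
for _ in range(200):
    v = u_h.copy()
    v[1:] += 0.05 * rng.standard_normal(n)   # another member of V_h with v(0) = 0
    err_others.append(energy_error(v))

print(err_galerkin, min(err_others), np.abs(residual).max())
```

Every perturbed candidate has a larger energy error than $u_h$, consistent with $\|u - u_h\|_E \leq \|u - v\|_E$ for all $v \in V_h$.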

2. Error in $L^2$ Norm: The Duality Argument

2.1 From Energy Norm to $L^2$ Norm

We know that the Galerkin problem $(G)$ naturally minimizes error in the **energy norm**. What about the **$L^2$ norm**?

We'll show that the $L^2$ error is smaller than the energy error by a full power of $h$, following a **duality argument** (the Aubin-Nitsche trick).

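2.2 The Auxiliary Problem

Let $w$ solve the model problem with the error as right-hand side, with boundary conditions chosen so that the boundary terms in the integration by parts below vanish. Since $V_h$ imposes only $v(0) = 0$, this means:

$$-w'' = u - u_h \ \text{ on } (0,1), \qquad w(0) = 0, \quad w'(1) = 0$$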

2.3 The Duality Argument

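$$\begin{aligned} \|u - u_h\|^2 &= (u - u_h, u - u_h) \\ &= (u - u_h, -w'') \\ &= (u' - u_h', w') + \underbrace{[w'(0)(u - u_h)(0) - w'(1)(u - u_h)(1)]}_{= 0 \text{ by BCs}} \\ &= a(u - u_h, w) \\ &= a(u - u_h, w - v) \quad \forall v \in V_h \quad \text{(Galerkin orthogonality)} \\ &\leq \|u - u_h\|_E \|w - v\|_E \quad \text{(Cauchy-Schwarz)} \end{aligned}$$

Dividing by $\|u - u_h\| = \|-w''\|$: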

$$\|u - u_h\| \leq \frac{\|u - u_h\|_E \|w - v\|_E}{\|u - u_h\|} = \frac{\|u - u_h\|_E \|w - v\|_E}{\|-w''\|}$$

$$\boxed{\|u - u_h\| \leq \|u - u_h\|_E \inf_{v \in V_h} \frac{\|w - v\|_E}{\|w''\|}}$$

2.4 The Key Approximation Property

Everything comes down to showing that $V_h$ can approximate $w$ well enough!
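To see how such an approximation property would be used, suppose (as an assumption at this point; it is the kind of statement an interpolation estimate in the energy norm provides) that some $\pi w \in V_h$ satisfies $\|w - \pi w\|_E \leq C h \|w''\|$. Choosing $v = \pi w$ in the boxed bound then gives

$$\|u - u_h\| \leq \|u - u_h\|_E \, \frac{\|w - \pi w\|_E}{\|w''\|} \leq C h \, \|u - u_h\|_E,$$

so the $L^2$ error gains one factor of $h$ over the energy error.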

3. Interpolation Theory

3.1 Setup

$$V_h = \{v \in C^0[0,1] : v|_{[x_{i-1}, x_i]} \text{ is a linear polynomial}, v(0) = 0\}$$

$$\phi_i(x_j) = \delta_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}$$

3.2 The Interpolant

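The interpolant $\pi u = \sum_j \widehat{\pi u}_j \phi_j \in V_h$ is defined by matching $u$ at the nodes, $\pi u(x_i) = u(x_i)$. The nodal basis property then identifies the coefficients with the nodal values:

$$\begin{aligned} u(x_i) = \pi u(x_i) &= \sum_j \widehat{\pi u}_j \phi_j(x_i) \\ &= \sum_j \widehat{\pi u}_j \delta_{ij} \\ &= \widehat{\pi u}_i \end{aligned}$$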
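In code, the nodal basis and the interpolant can be sketched as follows. This is an illustration only: the helper names `hat` and `interpolate`, the six-node mesh, and the sample function $u(x) = \sin(\pi x / 2)$ are assumptions made for the example. The hat functions are evaluated with `np.interp`, which performs piecewise-linear interpolation of given nodal values.

```python
import numpy as np

def hat(i, nodes, x):
    """Evaluate the piecewise-linear nodal basis function phi_i at the points x."""
    unit = np.zeros(len(nodes))
    unit[i] = 1.0                       # phi_i is 1 at node i and 0 at all others
    return np.interp(x, nodes, unit)

def interpolate(u, nodes):
    """Return the coefficients of pi u and a callable that evaluates pi u."""
    coeffs = u(nodes)                   # coefficient i is the nodal value u(x_i)
    def pi_u(x):
        return sum(c * hat(j, nodes, x) for j, c in enumerate(coeffs))
    return coeffs, pi_u

nodes = np.linspace(0.0, 1.0, 6)        # illustrative mesh
u = lambda s: np.sin(0.5 * np.pi * s)   # illustrative function with u(0) = 0
coeffs, pi_u = interpolate(u, nodes)

# Rows are phi_i evaluated at all nodes; the matrix should be the identity.
kronecker = np.array([hat(i, nodes, nodes) for i in range(len(nodes))])
print(np.allclose(kronecker, np.eye(len(nodes))))
print(np.allclose(pi_u(nodes), u(nodes)))
```

Because $u(0) = 0$ here, the coefficient of $\phi_0$ is zero and $\pi u$ lies in $V_h$ even though the sketch builds a hat function at every node.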

3.3 The Interpolation Error Theorem

Theorem: For all $u \in V$ with $u'' \in L^2$:

$$\boxed{\|u - \pi u\|_E \leq Ch \|u''\|}$$

where $C$ is independent of $h$ and $u$.

This is the key approximation result!

4. Proof of Interpolation Error Bound

4.1 Element-by-Element Analysis

$$\int_{x_{j-1}}^{x_j} (u - \pi u)'^2 \, dx \leq C(x_j - x_{j-1})^2 \int_{x_{j-1}}^{x_j} u''^2 \, dx$$

Let $e = u - \pi u$ be the error. Note that $u'' = e''$ since $\pi u$ is piecewise linear ($(\pi u)'' = 0$ on each element).

4.2 Change of Variables

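$$\int_{x_{j-1}}^{x_j} e'^2 \, dx \leq C(x_j - x_{j-1})^2 \int_{x_{j-1}}^{x_j} e''^2 \, dx$$

To prove this, map the element to the reference interval $[0,1]$ via $x = x_{j-1} + (x_j - x_{j-1})\,y$. Each derivative in $x$ contributes a factor $(x_j - x_{j-1})^{-1}$ and $dx = (x_j - x_{j-1})\,dy$, so the estimate above is equivalent to $\int_0^1 e'^2 \, dy \leq C \int_0^1 e''^2 \, dy$, with $e$ now regarded as a function of $y$ on the reference interval.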

4.3 Using Rolle's Theorem

Rolle's Theorem: If $e$ is continuous on $[a,b]$, differentiable on $(a,b)$, and $e(a) = e(b)$, then there exists at least one point $c \in (a,b)$ such that $e'(c) = 0$.

Since $e(0) = e(1) = 0$ (the interpolant matches $u$ at the element endpoints), there exists $\xi \in (0,1)$ such that:

$$e'(\xi) = 0$$

4.4 The Key Estimate

$$\begin{aligned} |e'(y)|^2 &= \left|\int_\xi^y 1 \cdot e'' \, dx\right|^2 \\ &\leq \left|\int_\xi^y 1^2 \, dx\right| \left|\int_\xi^y (e'')^2 \, dx\right| \\ &= |y - \xi| \left|\int_\xi^y (e'')^2 \, dx\right| \\ &\leq |y - \xi| \int_0^1 (e'')^2 \, dx \end{aligned}$$

4.5 Integration Over the Element

$$\int_0^1 (e'(y))^2 \, dy \leq \int_0^1 |y - \xi| \, dy \cdot \int_0^1 (e'')^2 \, dx$$

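And we're done! Since $\int_0^1 |y - \xi| \, dy = \tfrac{1}{2}\left(\xi^2 + (1 - \xi)^2\right) \leq \tfrac{1}{2}$ for every $\xi \in [0,1]$, the element estimate holds with $C = 1/2$. Summing over all elements with $h = \max_j (x_j - x_{j-1})$ and taking square roots gives $\|u - \pi u\|_E \leq \tfrac{1}{\sqrt{2}} \, h \|u''\|$, which completes the proof. $\quad \checkmark$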
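The interpolation bound can be checked numerically. The sketch below assumes the sample function $u(x) = \sin(\pi x / 2)$ on uniform meshes (an illustrative choice, not from the lecture) and compares the measured energy-norm interpolation error with $C h \|u''\|$, taking $C = 1/\sqrt{2}$, the constant that the argument with $\int_0^1 |y - \xi| \, dy \leq 1/2$ yields after taking square roots.

```python
import numpy as np

def interpolation_energy_error(n):
    """||(u - pi u)'|| on (0,1) for u = sin(pi x / 2) on n uniform elements."""
    x = np.linspace(0.0, 1.0, n + 1)
    h = 1.0 / n
    g, w = np.polynomial.legendre.leggauss(5)
    u_nodes = np.sin(0.5 * np.pi * x)
    total = 0.0
    for e in range(n):
        xq = 0.5 * (x[e] + x[e + 1]) + 0.5 * h * g      # quadrature points
        du = 0.5 * np.pi * np.cos(0.5 * np.pi * xq)     # exact derivative u'
        slope = (u_nodes[e + 1] - u_nodes[e]) / h       # (pi u)' on the element
        total += np.sum(0.5 * h * w * (du - slope) ** 2)
    return np.sqrt(total)

# ||u''|| for u = sin(pi x / 2): (pi/2)^2 * sqrt(integral of sin^2) = (pi/2)^2 / sqrt(2)
norm_u2 = (0.5 * np.pi) ** 2 / np.sqrt(2.0)

ns = [4, 8, 16, 32]
errs = [interpolation_energy_error(n) for n in ns]
bounds = [norm_u2 / (np.sqrt(2.0) * n) for n in ns]     # C h ||u''|| with C = 1/sqrt(2)
rates = [np.log2(errs[i] / errs[i + 1]) for i in range(len(ns) - 1)]

for n, err, bound in zip(ns, errs, bounds):
    print(f"n = {n:3d}   error = {err:.3e}   bound = {bound:.3e}")
print("observed rates:", rates)
```

The measured error stays below the bound and halves with each mesh refinement, which is the first-order behavior the theorem predicts. The gap between error and bound reflects that the constant from this proof is not sharp.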

5. Convergence Rates Summary

5.1 Main Results

(1) Energy norm error (quasi-optimality):

$$\|u - u_h\|_E \leq \|u - v\|_E \quad \forall v \in V_h$$

Picking $v = \pi u \in V_h$:

$$\boxed{\|u - u_h\|_E \leq C_1 h \|u''\| = C_1 h \|f\|}$$

(2) $L^2$ norm error (superconvergence via duality):

$$\boxed{\|u - u_h\| \leq C_2 h^2 \|u''\| = C_2 h^2 \|f\|}$$

Key observation: $L^2$ error converges **twice as fast** (order $h^2$) compared to energy error (order $h$). This is **superconvergence**.
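Both rates can be observed in a small experiment. The sketch below assumes a model problem the notes do not specify: $-u'' = f$ on $(0,1)$ with $u(0) = 0$, $u'(1) = 0$ and exact solution $u(x) = \sin(\pi x / 2)$. It solves the piecewise-linear Galerkin system on uniform meshes, with the load vector and the errors computed by Gauss quadrature, and estimates the convergence orders from successive refinements. Function and variable names are illustrative.

```python
import numpy as np

G, W = np.polynomial.legendre.leggauss(4)   # reference Gauss rule on [-1, 1]

def solve_and_measure(n):
    """P1 Galerkin solve on n uniform elements; returns (L2 error, energy error)."""
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    u_exact = lambda s: np.sin(0.5 * np.pi * s)
    du_exact = lambda s: 0.5 * np.pi * np.cos(0.5 * np.pi * s)
    f = lambda s: (0.5 * np.pi) ** 2 * np.sin(0.5 * np.pi * s)

    K = np.zeros((n + 1, n + 1))
    b = np.zeros(n + 1)
    for e in range(n):
        xq = 0.5 * (x[e] + x[e + 1]) + 0.5 * h * G
        wq = 0.5 * h * W
        phi = np.array([(x[e + 1] - xq) / h, (xq - x[e]) / h])  # local hat functions
        K[e:e + 2, e:e + 2] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
        b[e:e + 2] += phi @ (wq * f(xq))

    u_h = np.zeros(n + 1)                   # essential condition u_h(0) = 0
    u_h[1:] = np.linalg.solve(K[1:, 1:], b[1:])

    l2_sq = 0.0
    en_sq = 0.0
    for e in range(n):
        xq = 0.5 * (x[e] + x[e + 1]) + 0.5 * h * G
        wq = 0.5 * h * W
        uh = u_h[e] * (x[e + 1] - xq) / h + u_h[e + 1] * (xq - x[e]) / h
        duh = (u_h[e + 1] - u_h[e]) / h
        l2_sq += np.sum(wq * (u_exact(xq) - uh) ** 2)
        en_sq += np.sum(wq * (du_exact(xq) - duh) ** 2)
    return np.sqrt(l2_sq), np.sqrt(en_sq)

ns = [8, 16, 32, 64]
results = [solve_and_measure(n) for n in ns]
l2_rates = [np.log2(results[i][0] / results[i + 1][0]) for i in range(len(ns) - 1)]
en_rates = [np.log2(results[i][1] / results[i + 1][1]) for i in range(len(ns) - 1)]

for n, (l2, en) in zip(ns, results):
    print(f"n = {n:3d}   L2 error = {l2:.3e}   energy error = {en:.3e}")
print("L2 rates:", l2_rates)
print("energy rates:", en_rates)
```

Halving $h$ should cut the energy error roughly in half and the $L^2$ error roughly by four, matching the orders $h$ and $h^2$ in the boxed estimates.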

5.2 What's Up Next

Now that we understand classical FEM error analysis, we can extend these ideas to machine learning.

First, an abstraction of what we learned today, so it's clear this isn't just something special about Poisson...

6. Lax-Milgram Theory

6.1 General Framework

There isn't anything special about Poisson—the same process holds for any elliptic bilinear form.

6.2 Lax-Milgram Theorem

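Theorem: Let $V$ be a Hilbert space and let $a: V \times V \to \mathbb{R}$ be a bilinear form that is continuous, $|a(u,v)| \leq M \|u\|_V \|v\|_V$, and coercive, $a(v,v) \geq \alpha \|v\|_V^2$ with $\alpha > 0$. If $L: V \to \mathbb{R}$ is a bounded linear functional with: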

$$|L(v)| \leq \Lambda \|v\|_V$$

Then:

(1) Existence & Uniqueness: The problem

$$a(u,v) = L(v) \quad \forall v \in V$$

has a **unique solution** $u \in V$.

(2) Stability estimate:

$$\boxed{\|u\|_V \leq \frac{\Lambda}{\alpha}}$$

(3) Equivalence to energy minimization (when $a$ is also symmetric):

$$u = \arg\min_{v \in V} F(v)$$

where $F(v) = \frac{1}{2}a(v,v) - L(v)$.
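A finite-dimensional analogue makes statements (1) and (2) concrete. The sketch below is a toy example under stated assumptions: $V = \mathbb{R}^m$ with the Euclidean norm, a hypothetical form $a(u,v) = v^\top A u$ whose matrix is a symmetric positive definite part plus a skew part, and $L(v) = \ell \cdot v$. The form is deliberately nonsymmetric, since existence, uniqueness, and stability do not need symmetry; only the energy-minimization statement does.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 6                                    # dimension of the toy space (illustrative)

# Hypothetical coercive, nonsymmetric form a(u, v) = v^T A u on R^m.
B = rng.standard_normal((m, m))
S = B @ B.T + np.eye(m)                  # symmetric positive definite part
N = rng.standard_normal((m, m))
N = N - N.T                              # skew part: v^T N v = 0 for every v
A = S + N

# Coercivity constant: a(v, v) = v^T S v >= alpha |v|^2 with alpha = lambda_min(S).
alpha = np.linalg.eigvalsh(S).min()

ell = rng.standard_normal(m)             # L(v) = ell . v
Lam = np.linalg.norm(ell)                # |L(v)| <= Lam |v| by Cauchy-Schwarz

# a(u, v) = L(v) for all v is the linear system A u = ell.
u = np.linalg.solve(A, ell)

print("||u||       =", np.linalg.norm(u))
print("Lam / alpha =", Lam / alpha)
```

The computed solution satisfies $\|u\| \leq \Lambda / \alpha$, and the system is uniquely solvable because coercivity rules out a nontrivial null space of `A`.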

6.3 Proof Sketch
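One piece of the argument fits in a single line. As a sketch, assuming $a$ is coercive with constant $\alpha > 0$, meaning $a(v,v) \geq \alpha \|v\|_V^2$ for all $v \in V$, test the equation $a(u,v) = L(v)$ with $v = u$:

$$\alpha \|u\|_V^2 \leq a(u,u) = L(u) \leq \Lambda \|u\|_V$$

Dividing by $\|u\|_V$ (when $u \neq 0$) gives the stability estimate $\|u\|_V \leq \Lambda / \alpha$. Uniqueness follows the same way: the difference $d$ of two solutions satisfies $a(d,v) = 0$ for all $v \in V$, so $\alpha \|d\|_V^2 \leq a(d,d) = 0$ and $d = 0$.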

7. The General FEM Playbook

8. Summary

This lecture covered:

- Bilinear forms and energy norms as abstractions of Galerkin methods
- Quasi-optimality in energy norm: FEM gives best approximation in $V_h$
- Duality argument (Aubin-Nitsche trick) for $L^2$ error estimates
- Interpolation theory: constructing $\pi u$ and bounding approximation error
- Convergence rates: $\mathcal{O}(h)$ in energy norm, $\mathcal{O}(h^2)$ in $L^2$ norm
- Lax-Milgram theory: general framework for elliptic problems
- FEM playbook: systematic approach to error analysis for any elliptic operator

Key Takeaway: The power of finite element methods lies not just in their flexibility for complex geometries, but in their **provable optimal approximation properties**. The Lax-Milgram framework shows that for any elliptic bilinear form satisfying continuity and coercivity, we obtain existence, uniqueness, stability, and convergence with quantifiable error bounds. The duality argument reveals superconvergence in $L^2$ norm—convergence twice as fast as in the energy norm. This theoretical foundation is essential for physics-informed machine learning: when we learn function spaces $V_h$ or operators $a(u,v)$, we need to ensure these learned components preserve the mathematical structure (symmetry, coercivity, continuity) that guarantees convergence.