Ömer Yüksel


Understanding Itô's lemma through numerical simulation

Most explanations of Itô's lemma rely on intuitive hand-waving or focus only on expected values, without demonstrating the underlying mathematical mechanics. Even ChatGPT, which is generally good at explaining complex textbook concepts, struggles here, presumably because its training data contains these same problematic explanations.

That is because understanding the rigorous proofs requires a background in real analysis, martingale theory, stochastic integration, and measure theory that many STEM graduates (myself included) encounter only in specialized graduate coursework, if at all.

This post takes an empirical approach instead: numerical simulation using Monte Carlo methods to demonstrate Itô's lemma from quadratic variation through geometric Brownian motion. When rigorous proofs are inaccessible, seeing the formula "work" in practice provides the next best kind of confidence in its validity. The complete code for all simulations shown in this post is available on GitHub.

Quadratic variation

Let's start with the fundamental concept that makes Itô calculus different from ordinary calculus: quadratic variation. In ordinary calculus, the squared infinitesimal changes $(dx)^2$ are so small that they vanish as we take finer and finer partitions. But with Brownian motion, the sum of squared increments converges to the time interval itself.

To demonstrate this, I've written some code that simulates Brownian motion and calculates its quadratic variation:

import numpy as np

def generate_quadratic_variation(T=1.0, N_steps=1000):
    """
    Generate a single path of Brownian motion and calculate its quadratic variation

    Parameters:
    - T: Time horizon
    - N_steps: Number of points in partition

    Returns:
    - dt: Time step size
    - qv: Quadratic variation
    """
    dt = T / N_steps

    # Generate a single Brownian motion path
    dB = np.random.normal(0, np.sqrt(dt), N_steps)

    # Calculate quadratic variation
    qv = np.sum(dB**2)

    return dt, qv

Running this code with different partition sizes, we see a pattern:

N        dt        QV        Error
100      1.00e-02  0.827306  0.172694
1000     1.00e-03  0.967506  0.032494
10000    1.00e-04  1.004778  0.004778
100000   1.00e-05  1.001046  0.001046
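A driver along these lines produces the table above (a sketch with the helper inlined so it stands alone; the print formatting is my assumption):

```python
import numpy as np

for N_steps in [100, 1000, 10000, 100000]:
    dt = 1.0 / N_steps
    # Brownian increments over [0, 1] and their realized quadratic variation
    dB = np.random.normal(0, np.sqrt(dt), N_steps)
    qv = np.sum(dB**2)
    print(f"{N_steps:>7d}  {dt:.2e}  {qv:.6f}  {abs(qv - 1.0):.6f}")
```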

As we increase the number of points in our partition, the quadratic variation converges to 1.0 (our time horizon T). This isn't a coincidence but a fundamental property of Brownian motion.

This is just a single path. To see how the error shrinks as the time step gets smaller, let's look at the statistical properties across 100 independent trials (note that the number of trials is distinct from N, the number of steps per path):

N        dt        mean(QV)  std(QV)
100      1.00e-02  0.999603  0.143813
1000     1.00e-03  1.001703  0.043909
10000    1.00e-04  0.999985  0.014373
100000   1.00e-05  1.000197  0.004347

The standard deviation shrinks in proportion to $1/\sqrt{N}$, the square root of the number of partition points, as the Central Limit Theorem predicts. This property of Brownian motion - that its quadratic variation over $[0, T]$ equals $T$ - is what makes Itô calculus necessary and different from ordinary calculus.
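The multi-trial statistics can be reproduced with a loop of this shape (a sketch; the trial count matches the text, but the formatting is my assumption):

```python
import numpy as np

n_trials = 100
for N_steps in [100, 1000, 10000, 100000]:
    dt = 1.0 / N_steps
    # One realized quadratic variation per trial
    qvs = np.array([np.sum(np.random.normal(0, np.sqrt(dt), N_steps)**2)
                    for _ in range(n_trials)])
    print(f"{N_steps:>7d}  {dt:.2e}  {qvs.mean():.6f}  {qvs.std():.6f}")
```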

Itô's lemma

Now that we've established the behavior of quadratic variation, we can explore Itô's Lemma itself. This lemma tells us how to compute the differential of a function of a stochastic process.

In ordinary calculus, if we have a function $f(t,X)$ and $X$ changes with time according to some process, we use the chain rule:

$$df = \frac{\partial f}{\partial t}dt + \frac{\partial f}{\partial x}dx$$

But when $X$ follows a stochastic process like Brownian motion, we need an extra term:

$$df = \frac{\partial f}{\partial t}dt + \frac{\partial f}{\partial x}dx + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}(dx)^2$$

That last term is the Itô correction, and it appears because the quadratic variation of Brownian motion doesn't vanish as $dt$ approaches zero.
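This bookkeeping is often summarized by the heuristic Itô multiplication rules, which state which products of differentials survive in the limit:

$$(dt)^2 = 0, \qquad dt \, dB = 0, \qquad (dB)^2 = dt$$

The last rule is precisely the quadratic variation result from the previous section: for Brownian motion, $(dB)^2$ behaves like $dt$ rather than vanishing.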

Numerical verification methodology

Here we compare two discretization schemes for stochastic differential equations:

  1. Itô-corrected scheme: This approach uses the complete Itô formula to derive the discretized increments, including the correction term $\frac{1}{2}\frac{\partial^2 f}{\partial x^2}(dx)^2$. We accumulate these increments step by step along the path to approximate the solution.
  2. Naive scheme: This approach applies ordinary calculus rules that would be correct for smooth processes, omitting the Itô correction term. It is "naive" because it incorrectly assumes the standard chain rule of calculus applies to Brownian motion.

We test both discretization schemes against the true analytical value of $f(t,X)$. A correct scheme, i.e. the correct formulation of $df$, should satisfy:

$$\int_{0}^{1} df = f(1) - f(0)$$

Verification 1: quadratic function

Let's verify this numerically with a simple function $f(t,X) = X^2 + \sin(t)$, where $X=B$ (Brownian motion).

Recall the formulas for the differential:

Itô's lemma: $$df = \frac{\partial f}{\partial t}dt + \frac{\partial f}{\partial B}dB + \frac{1}{2}\frac{\partial^2 f}{\partial B^2}dt = \cos(t)dt + 2B \, dB + dt$$

Naive differential: $$df = \frac{\partial f}{\partial t}dt + \frac{\partial f}{\partial B}dB = \cos(t)dt + 2B \, dB$$

First, we partition the time period into discrete time steps. Then, for each time step, we:

  1. Sample a Brownian increment $dB$
  2. Compute the increment of $f$ according to each of the two formulas above

Afterwards we integrate (i.e. sum) the naive and Itô increments to get the final result, and check how close each running total is to $f$ at the final time step.

Below is the code generating data for a single path:

import numpy as np

def demonstrate_ito_lemma(T, N_steps):
    """
    Demonstrate Itô's Lemma for f(t,B_t) = B_t² + sin(t)
    where B_t is standard Brownian motion
    """

    dt = T/N_steps
    t = np.linspace(0, T, N_steps+1)

    # Pre-calculate trig functions
    cos_t = np.cos(t[:-1])  # We need cos(t) only for N_steps steps
    sin_t = np.sin(t)       # We need sin(t) for N_steps+1 points

    # Generate Brownian path
    dB = np.random.normal(0, np.sqrt(dt), N_steps)

    # Note: Arrays B and f have size N_steps+1 because they include 
    # the initial state at t=0 while dB has size N_steps (the 
    # increments between consecutive time points)
    B = np.cumsum(np.concatenate([[0], dB]))

    # True value (exact solution), known at all N_steps+1 points
    f_true = B**2 + sin_t

    # Initialize the integrated paths at the true initial value
    f_ito = np.zeros(N_steps+1)
    f_naive = np.zeros(N_steps+1)
    f_ito[0] = f_true[0]
    f_naive[0] = f_true[0]

    # Calculate increments
    increments_ito = cos_t*dt + 2*B[:-1]*dB + dt
    increments_naive = cos_t*dt + 2*B[:-1]*dB

    # Use cumsum to evolve paths
    f_ito[1:] = f_ito[0] + np.cumsum(increments_ito)
    f_naive[1:] = f_naive[0] + np.cumsum(increments_naive)

    return t, B, f_true, f_ito, f_naive
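A driver that repeats this simulation and aggregates the final values might look like the following (a sketch; the path generation is inlined in compact form, and the aggregation and formatting are my assumptions based on the reported table):

```python
import numpy as np

def run_trials(n_trials=100, T=1.0, N_steps=100_000):
    """Repeat the f(t,B) = B^2 + sin(t) experiment and summarize final values."""
    dt = T / N_steps
    t = np.linspace(0, T, N_steps + 1)
    true_vals, ito_vals, naive_vals = [], [], []
    for _ in range(n_trials):
        dB = np.random.normal(0, np.sqrt(dt), N_steps)
        B = np.cumsum(np.concatenate([[0], dB]))
        # f(0) = 0 here, so the summed increments are the final values directly
        true_vals.append(B[-1]**2 + np.sin(T))
        ito_vals.append(np.sum(np.cos(t[:-1])*dt + 2*B[:-1]*dB + dt))
        naive_vals.append(np.sum(np.cos(t[:-1])*dt + 2*B[:-1]*dB))
    true_vals = np.array(true_vals)
    ito_vals = np.array(ito_vals)
    naive_vals = np.array(naive_vals)
    for name, vals in [("True", true_vals), ("Itô", ito_vals), ("Naive", naive_vals)]:
        mae = "N/A" if name == "True" else f"{np.abs(vals - true_vals).mean():.6f}"
        print(f"{name:<6} mean={vals.mean():.6f} std={vals.std():.6f} MAE={mae}")
    return true_vals, ito_vals, naive_vals
```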

Statistics for 100 trials with 100,000 time steps over the time period $[0,1]$:

Method               Mean      Std       Mean Abs Error
True                 1.960321  1.673978  N/A
Integration (Itô)    1.960386  1.673497  0.003710
Integration (naive)  0.960386  1.673497  0.999935

The naive approach (ignoring the Itô correction) has a mean absolute error of nearly 1.0, while the Itô approach has an error of only 0.0037. The gap is caused by the missing Itô correction term $\frac{1}{2}\frac{\partial^2 f}{\partial x^2}(dx)^2$ in the naive approach, which has a theoretical value of exactly $T = 1$ in this case.

Verification 2: geometric Brownian motion

Finally, let's apply Itô's Lemma to something with practical importance: geometric Brownian motion (GBM), which is widely used to model stock prices in finance.

In the log space, GBM follows:

Itô's lemma (correct): $$d(\log S) = \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma dB$$

Naive approach (incorrect): $$d(\log S) = \mu dt + \sigma dB$$

The difference is the $-\sigma^2/2$ term, which is the Itô correction. The naive approach applies ordinary calculus rules and misses this correction term, leading to systematic bias in the model. I've left out the code for brevity; it is similar to the previous case, and the primary difference is the increment calculation.
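The log-space increment calculation might look like the following (a sketch with my own parameter names; the post's actual code may differ):

```python
import numpy as np

def gbm_log_increments(mu=0.1, sigma=0.3, T=1.0, N_steps=100_000):
    """Integrate d(log S) with and without the Itô correction."""
    dt = T / N_steps
    dB = np.random.normal(0, np.sqrt(dt), N_steps)
    # Itô-corrected log increments: (mu - sigma^2/2) dt + sigma dB
    log_ito = np.sum((mu - 0.5 * sigma**2) * dt + sigma * dB)
    # Naive log increments: mu dt + sigma dB (misses the -sigma^2/2 dt term)
    log_naive = np.sum(mu * dt + sigma * dB)
    # Exact solution in log space: log(S_T/S_0) = (mu - sigma^2/2) T + sigma B_T
    log_true = (mu - 0.5 * sigma**2) * T + sigma * dB.sum()
    return log_true, log_ito, log_naive
```

Because the log-space increments are linear in $dB$, the Itô-corrected sum matches the exact solution to floating-point precision, which is why the table reports a zero error for it.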

Let's verify this numerically for $\mu=0.1$, $\sigma=0.3$ over the time period $[0,1]$:

Method               Mean      Std       Mean Abs Error
True (log space)     0.058349  0.302257  N/A
Integration (Itô)    0.058349  0.302257  0.000000
Integration (naive)  0.103349  0.302257  0.045000

The naive approach systematically overestimates the mean by 77% (0.103349 vs. 0.058349). This 0.045 bias matches the theoretical Itô correction of $\sigma^2/2 = 0.3^2/2 = 0.045$.

In the original space (not log space), the difference becomes even more visible:

Method               Mean      Std       Mean Abs Error
True                 1.108515  0.328547  N/A
Integration (Itô)    1.108515  0.328547  0.000000
Integration (naive)  1.159537  0.343669  0.051023
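The original-space numbers are consistent with simply exponentiating the log-space integrals (a sketch assuming $S_0 = 1$; the post's actual code may differ):

```python
import numpy as np

def gbm_original_space(mu=0.1, sigma=0.3, T=1.0, N_steps=100_000, S0=1.0):
    """Integrate log S both ways, then exponentiate back to price space."""
    dt = T / N_steps
    dB = np.random.normal(0, np.sqrt(dt), N_steps)
    log_ito = np.sum((mu - 0.5 * sigma**2) * dt + sigma * dB)
    log_naive = np.sum(mu * dt + sigma * dB)
    # Exponentiation turns the additive log-space bias into a multiplicative one
    return S0 * np.exp(log_ito), S0 * np.exp(log_naive)
```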

Conclusion

We've verified several key properties of stochastic calculus through these numerical simulations:

  1. The quadratic variation of Brownian motion over a time interval T equals T
  2. Itô's Lemma correctly accounts for this non-vanishing quadratic variation
  3. The Itô correction term is crucial for unbiased modeling of processes like Geometric Brownian Motion

These numerical simulations verify key stochastic calculus properties without requiring measure theory. The empirical approach demonstrates why Itô's correction term is necessary for unbiased modeling, which is particularly valuable for practitioners who need to apply these concepts without absorbing the full mathematical framework.