Mixing of the No-U-Turn Sampler and the Geometry of Gaussian Concentration

The No-U-Turn Sampler (NUTS) is the go-to method for sampling in probabilistic programming languages like Stan, PyMC3, NIMBLE, Turing, and NumPyro. However, due to its recursive architecture, even proving its reversibility has been an ongoing challenge. For a concise proof based on interpreting NUTS as an auxiliary variable method, check out: this paper.

Today, I’m excited to share a new paper with Stefan Oberdörster, where we prove that the mixing time of NUTS scales as d^{1/4}, up to logarithmic factors, when initialized in the concentration region of the canonical Gaussian measure. This is the first mixing time guarantee for NUTS and the scaling is expected to be sharp. What’s even more interesting? This result doesn’t rely on any assumptions about path length, thanks to NUTS’ ability to adapt it locally.

But there’s more: we discovered a surprising sensitivity in NUTS to step size, which can create a bottleneck in mixing time. For certain problematic step sizes, NUTS can miss U-turns and fall into looping behavior, potentially leading to an infinite leapfrog path. Check out Figures 4(b) and 5(a) for a numerical demonstration, along with a proposed solution: randomizing the time grid in Figure 6.

For reproducibility: GitHub Repo

Key insight is that concentration of measure leads to:
• Uniformity in NUTS’ U-turn condition & orbit selection with a no-looping step size.
• Uniformity in NUTS’ index selection with step size O(d^{-1/4}).

The proof builds on our recently introduced coupling framework coupling framework, extending it to handle a more general class of accept/reject chains. NUTS is then interpreted as an accept/reject chain, where the accept chain shows more uniform behavior, making it easier to analyze.

We’ve also identified a list of open problems in the paper, and I’m looking forward to tackling some of them during my sabbatical year at the Flatiron Institute.