I’m pleased to share a new paper with Siddharth Mitra and Andre Wibisono (both at Yale), titled:
This work develops a framework for tail-sensitive convergence analysis of unadjusted Hamiltonian Monte Carlo (HMC) in Kullback–Leibler (KL) and Rényi divergence. These divergences measure relative density mismatch and therefore govern Metropolis acceptance behavior: small errors in the tails—where the target density is tiny—can dominate acceptance probabilities, even when total variation distance is small.
Why tail-sensitive divergences matter
Most theoretical analyses of Markov chain mixing time focus on metrics such as total variation distance or Wasserstein distance. While these are useful in many settings, they do not directly control relative density mismatch in the tails of the target distribution. For Metropolis-adjusted Markov chains, including Metropolis-adjusted HMC, acceptance probabilities depend on ratios of densities rather than additive differences in probability mass.
As a result, good performance requires controlling tail behavior. Divergences such as KL and Rényi, which measure multiplicative discrepancies between densities, are therefore the natural quantities to control in this context.
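For reference, and to make the contrast concrete (these are the standard definitions, not anything specific to our paper): the Metropolis–Hastings acceptance probability is a ratio of densities, while KL and Rényi divergences average exactly such ratios,

$$
A(x, y) = \min\!\left\{1, \frac{\pi(y)\, q(y, x)}{\pi(x)\, q(x, y)}\right\},
\qquad
\mathrm{KL}(\mu \,\|\, \pi) = \int \log\frac{d\mu}{d\pi}\, d\mu,
\qquad
\mathcal{R}_\alpha(\mu \,\|\, \pi) = \frac{1}{\alpha - 1} \log \int \left(\frac{d\mu}{d\pi}\right)^{\!\alpha} d\pi, \quad \alpha > 1.
$$

By contrast, total variation distance equals $\tfrac{1}{2} \int \lvert \tfrac{d\mu}{d\pi} - 1 \rvert \, d\pi$, an additive quantity that can be small even when the ratio $\tfrac{d\mu}{d\pi}$ is enormous on a low-probability tail region.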
What we prove
Our main results provide a framework for lifting Wasserstein convergence guarantees (often easier to establish) into KL and Rényi convergence bounds for unadjusted HMC. This is achieved by introducing and analyzing one-shot couplings that establish a regularization property of the unadjusted HMC transition kernel.
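In schematic form (my paraphrase here; the explicit constants are in the paper), the regularization property says that a single application of the unadjusted HMC kernel $P$ converts Wasserstein closeness into closeness in a strong divergence:

$$
D\big(\mu P \,\|\, \nu P\big) \;\le\; \varphi\big(W_2(\mu, \nu)\big),
$$

where $D$ is the KL or a Rényi divergence, $W_2$ is the 2-Wasserstein distance, and $\varphi$ is an explicit increasing function depending on the step size and the target. Taking $\nu$ to be the invariant distribution of $P$ gives the smoothing statement in the first bullet below.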
Concretely, we show that:
- A single step of unadjusted HMC has a smoothing effect in strong divergences, transforming even rough initial distributions into ones that are closer in KL or Rényi divergence to the invariant distribution.
- From this regularization, we derive mixing-time bounds and asymptotic bias estimates for unadjusted HMC in both KL and Rényi divergence.
- These bounds quantify relative density mismatch and yield principled warm-start guarantees for subsequent Metropolis-adjusted chains.
A key ingredient in the analysis is the use of one-shot couplings. When combined with existing Wasserstein contractivity results (for example, under strong log-concavity assumptions), this yields end-to-end convergence estimates in divergences that directly govern algorithmic performance.
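Schematically, and again suppressing constants: if the unadjusted kernel contracts in Wasserstein distance at some rate $\rho < 1$ (as it does, for instance, under strong log-concavity with a suitable step size), then $k$ contracting steps followed by one regularizing step give

$$
W_2\big(\mu P^k, \pi_h\big) \le \rho^k\, W_2(\mu, \pi_h)
\quad\Longrightarrow\quad
D\big(\mu P^{k+1} \,\|\, \pi_h\big) \le \varphi\big(\rho^k\, W_2(\mu, \pi_h)\big),
$$

where $\pi_h$ is the invariant distribution of the unadjusted kernel. The right-hand side tends to zero as $k$ grows, which is what produces mixing-time bounds in KL and Rényi divergence; the remaining gap between $\pi_h$ and the true target is the discretization bias discussed below.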
Practical implications
Warm starts for adjusted samplers.
A primary application of our results is the generation of rigorous warm starts for Metropolis-adjusted Markov chains. In practice, it is often difficult to construct a warm initial distribution directly. Unadjusted HMC can instead be run until its distribution is close to the target in KL or Rényi divergence, after which a Metropolis-adjusted chain started from that point runs efficiently with good acceptance rates.
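As a minimal illustration of this workflow (a toy sketch, not the algorithm or the parameter choices analyzed in the paper), one can run unadjusted HMC as a warm-up phase and only then switch on the Metropolis correction. The Gaussian target, step size, and trajectory length below are all hypothetical choices for the sake of the example:

```python
import numpy as np

def leapfrog(x, p, grad_logpi, eps, L):
    """Leapfrog integration of Hamiltonian dynamics for the potential -log pi."""
    x, p = x.copy(), p.copy()
    p += 0.5 * eps * grad_logpi(x)          # initial half step for momentum
    for i in range(L):
        x += eps * p                        # full step for position
        if i < L - 1:
            p += eps * grad_logpi(x)        # full step for momentum
    p += 0.5 * eps * grad_logpi(x)          # final half step for momentum
    return x, p

def hmc_step(x, logpi, grad_logpi, eps, L, rng, adjust):
    """One HMC transition; adjust=False gives unadjusted HMC."""
    p0 = rng.standard_normal(x.shape)
    x_new, p_new = leapfrog(x, p0, grad_logpi, eps, L)
    if not adjust:
        return x_new
    # Metropolis correction: accept with probability min(1, exp(-Delta H)).
    log_ratio = (logpi(x_new) - 0.5 * p_new @ p_new) - (logpi(x) - 0.5 * p0 @ p0)
    return x_new if np.log(rng.uniform()) < log_ratio else x

# Toy target: standard Gaussian in d dimensions (purely illustrative).
d = 10
logpi = lambda z: -0.5 * z @ z
grad_logpi = lambda z: -z
rng = np.random.default_rng(0)

x = 5.0 * np.ones(d)                        # deliberately cold start, far in the tails
for _ in range(200):                        # warm-up: unadjusted HMC, no accept/reject
    x = hmc_step(x, logpi, grad_logpi, eps=0.2, L=10, rng=rng, adjust=False)

samples = []
for _ in range(1000):                       # main phase: Metropolis-adjusted HMC
    x = hmc_step(x, logpi, grad_logpi, eps=0.2, L=10, rng=rng, adjust=True)
    samples.append(x)
```

The point is only the structure of the workflow: unadjusted steps to get warm in a strong divergence, adjusted steps thereafter to remove the remaining discretization bias.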
Discretization bias.
By working explicitly in KL and Rényi divergence, our results clarify how discretization bias (the discrepancy between the invariant distribution of an unadjusted sampler and the true target) affects strong notions of convergence. This provides a more complete picture of how unadjusted integrators behave in practice, beyond what is visible in weaker norms.
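In the notation of the schematic bounds above, the quantity at stake here is the divergence between the invariant distribution $\pi_h$ of the unadjusted kernel and the true target $\pi$,

$$
\mathrm{bias}(h) \;=\; D\big(\pi_h \,\|\, \pi\big),
$$

measured in KL or Rényi divergence; this is the direction relevant to warm starts, since it controls the density ratio $d\pi_h / d\pi$. The asymptotic bias estimates mentioned above quantify how this term shrinks with the step size $h$.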
Broader relevance
Tail-sensitive divergences also play an important role in:
- information contraction along Markov chains,
- high-dimensional Bayesian inference,
- Rényi-divergence-based analyses in differential privacy.
On the analysis
Achieving these results required a delicate probabilistic analysis. It is fair to say that this project became a technical tour de force. Siddharth Mitra took the lead on the key probabilistic ideas, developing new arguments that substantially extend earlier total-variation convergence results for unadjusted HMC. The shift from additive to tail-sensitive, multiplicative control was challenging, but ultimately one of the most rewarding aspects of the project.