## Workshop on hypercontractivity and log-Sobolev inequalities in quantum information theory

A couple weeks ago I attended the workshop on “Hypercontractivity and Log Sobolev Inequalities in Quantum Information Theory” organized by Patrick Hayden, Christopher King, Ashley Montanaro and Mary Beth Ruskai at the Banff International Research Station (BIRS).

The pavilion in which workshop talks were held

BIRS holds 5-day workshops in mathematics and applications throughout the year, itself a small component of the Banff centre, which describes itself as the “largest arts and creativity incubator on the planet”. Sounds attractive? The centre appears to be remarkably successful, managing to attract not only the brightest mathematicians like us(!), but also a range of talented artists from the world all over, reputable international conferences, and more. The setting certainly doesn’t hurt (see the pictures — and, by the way, that view was from the center’s beautiful library!), nor does the Canadian hospitality — with hearty food provided daily in a cafeteria with impressive views on the surrounding mountains. It was my first visit and in spite of the somewhat coldish weather (“much warmer than usual”, as they say…you bet! When “usual” means -20 Celsius, it’s still not quite clear what improvement you’re getting). I enjoyed the place, which occupies some sort of in-between a seclusive retreat in the mountains (think Dagstuhl) and a bustling research centre (think the Simons Institute in Berkeley). Good balance.

Aerial view of the Banff center (in summer!)

The goal of this post is to give a brief introduction to the (first half of the) topic of the workshop, “Hypercontractivity and Log Sobolev Inequalities”. (I hope to talk a bit about the second half, “in Quantum Information theory”, in the next post.) Log-Sobolev inequalities, be they quantum or classical, meant next to nothing to me before attending — I had heard of both, and was somewhat familiar with the uses of hypercontractivity in Boolean function analysis, but log-Sobolev inequalities were much more mysterious. The workshop was successful in bringing different perspectives to bear on these inequalities, their relationship, and applications (videos available online!). For my own benefit I’ll write down some very basic facts on these inequalities, where they come from, and what is the motivation for studying them. What I write will undoubtedly appear extremely naive to even moderate experts, and for much more on the subject I highly recommend a survey by Diaconis and Saloff-Coste, the lecture notes by Guionnet and Zegarlinski, and the (quals-fueled) survey by Biswal. In particular I refer the reader to these resources for proofs of the unproven assertions made in this post (any errors introduced being mine of course), as well as appropriate credit and references for the results mentioned below.

Markov semi-groups. The fundamental objects underlying both the hypercontractive and log-Sobolev inequalities are a space of functions ${\mathcal{F}}$ (think continuous real-valued functions defined on a nice topological space, ${\mathbb{R}^d}$, or your favorite graph) together with, first a Markov semi-group ${(P_t)}$ defined on that space, and second, a measure ${\mu}$ on the same space that is invariant under ${P_t}$, i.e.

$\displaystyle \mu(P_t f)\,=\,\mu(f) \ \ \ \ \ (1)$

for all ${f\in\mathcal{F}}$. Now, even the notion of a “Markov semi-group” can be intimidating (to me), but it really corresponds to the most simple process. A Markov semi-group is a family of maps ${(P_t):\mathcal{F}\rightarrow\mathcal{F}}$ indexed by the positive reals ${t\in\mathbb{R}_+}$ that transforms functions in a way that applying the map for time ${t}$, and then time ${s}$, is equivalent to applying it for time ${t+s}$: ${P_{t+s}=P_tP_s}$. Simple examples are e.g. any random walk on the domain of ${\mathcal{F}}$, with ${P_t}$ denoting averaging over inputs obtained by taking a “${t}$-step” walk, or a diffusion process on real functions, such as convolving with a Gaussian of variance ${t}$ or evolving ${f}$ under the heat equation for time ${t}$. Some concrete examples that I’ll refer to throughout:

Examples

Random walks on regular graphs. Let ${G=(V,E)}$ be a regular undirected graph with degree ${d}$, and ${A}$ its normalized adjacency matrix: ${A_{i,j} = 1/d}$ if ${\{i,j\}\in E}$ and ${A_{i,j}=0}$ otherwise. Let ${\mathcal{F} = {\mathbb R}^V}$. If ${f\in \mathcal{F}}$ and ${v\in V}$, then ${(Gf)(v) = E_{u\sim v}[f(u)]}$, where the expectation is taken over a random neighbor ${u}$ of ${v}$ in ${G}$. If ${f}$ is a distribution over ${V}$, then after ${t}$ steps of the discrete-time random walk on ${G}$ the distribution is given by ${G^t f}$. The standard way to make this into a continuous-time random walk is to wait at a vertex ${v}$ for an amount of time distributed as an exponential with mean ${1}$, and then move to a random neighbor. In other words, the number of steps taken in time ${t}$ is distributed as a Poisson point process ${N(t)}$ with rate ${1}$. Starting in an initial distribution ${f}$, the distribution after time ${t}$ is given by

$\displaystyle P_t f = \sum_{k=0}^\infty \Pr(N(t) = k) G^k f = \sum_{k=0}^\infty \frac{t^k}{k!} e^{-t}G^k f = e^{t(G-I)}f.$

The same construction extends to any irreducible Markov chain ${K(x,y)\geq 0}$, ${\sum_y K(x,y)=1}$, for which the associated semigroup is ${P_t f = e^{t(K-I)}f}$.

Ornstein-Uhlenbeck process. Here we take ${\mathcal{F}}$ the set of continuous functions from ${\mathbb{R}}$ to itself, and

$\displaystyle P_tf(x) = \int_{\mathbb{R}} f(e^{-t} x + \sqrt{1-e^{-2t}} y) \frac{e^{-y^2}/2}{\sqrt{2\pi}} dy.$

This describes the evolution of a particle submitted to Brownian motion under the influence of friction. It is a continuous, Gaussian analogue of the Bonami-Beckner noise operator ${T_\rho}$ on the hypercube that crops up in computer science applications, ${T_\rho f(x) = E_{y\sim x} [f(y)]}$ where ${y}$ is obtained from ${x\in\{0,1\}^n}$ by flipping each of its coordinates independently with probability ${p=1-\rho}$. Time-independent evolution. Schrödinger’s equation dictates that a quantum state ${\rho}$ suject to a fixed Hamiltonian ${H}$ will evolve as ${\rho\mapsto \rho(t) = e^{-iHt}\rho e^{iHt}}$ (where I set ${\hbar =1}$ for simplicity). It is easy to check that ${P_t:\rho\mapsto \rho(t)}$ is a Markov semi-group. Note however this semi-group acts on a different kind of space than the previous examples: ${\rho}$ is an operator on a Hilbert space ${\mathcal{H}}$, rather than a function on a certain domain. Dealing with such examples requires non-commutative analogues of the classical theory to which I hope to return in the next post.

Ergodicity. Now that we have a cast of players, let’s see what kind of questions we’d like to answer about them. The most basic question is that of convergence: given a Markov semi-group ${(P_t)}$ defined on a function space ${\mathcal{F}}$, does the sequence ${(P_t f)}$ converge, and if so at what rate? The latter part of this question requires one to specify a distance measure on ${\mathcal{F}}$. For this we will use the ${L^p}$ norms, defined as ${\|f\|_p = (\int |f|^p d\mu)^{1/p}}$ for some measure ${\mu}$ on the domain of functions in ${\mathcal{F}}$. Important special cases are ${p=1,p=2}$ and ${p=\infty}$. Here the measure ${\mu}$ will always be an invariant measure for ${P_t}$, i.e. it satisfies (1). There does not always exist an invariant measure, and there can be more than one; in practice though ${(P_t)}$ and ${\mu}$ are often defined together and for the purposes of this post I will assume there is a unique invariant ${\mu}$ (for the example of a semi-group defined from a Markov chain, this corresponds to the chain being irreducible, and ${\mu}$ is its unique stationary measure). The question then becomes, does ${(P_t f)}$ converge towards its expectation ${\mu(P_t f) = \mu(f)}$ (by invariance), and if so at what speed? This is precisely the question that log-Sobolev inequalities will let us answer.

One says that ${(P_t f)}$ is ${p}$-ergodic if ${\int |P_t f-\mu(f)|^p d\mu \rightarrow_{t\rightarrow\infty} 0}$ for all ${f\in\mathcal{F}}$, i.e. convergence holds in the ${p}$-norm. Note this is equivalent to requiring ${\|P_t-\mu\|_{p,\infty} \rightarrow_{t\rightarrow \infty} 0}$, where ${\|\cdot\|_{p,q}}$ denotes the operator norm from ${L^p}$ to ${L^q}$. Since ${\|f\|_p \leq \|f\|_{p'}}$ for all ${f}$ and any ${p\leq p'}$, the notion of ${p}$-ergodicity becomes stronger as ${p}$ increases. We will see three different ways to obtain bounds on the rate of convergence: the spectral gap inequality, equivalent to ${2}$-ergodicity; Sobolev inequalities and ultra-contractivity, which imply ${\infty}$-ergodicity, and log-Sobolev inequalities and hypercontractivity, which are related to ${p}$-ergodicity for intermediate ${p}$.

Before getting started there is one last quantity associated with any Markov semi-group that we need to introduce, the Liouvillian ${\mathcal{L}}$. The Liouvillian is defined through its action on functions in ${\mathcal{F}}$ as

$\displaystyle \mathcal{L}f = \lim_{t\rightarrow 0^+}\, \frac{P_t-Id}{t}f.$

Equivalently, the Liouvillian can be defined through the functional equation ${\partial_t(P_t f) = \mathcal{L} (P_t f)}$: it plays the role of infinitesimal generator for the semi-group ${(P_t)}$. For the example of the random walk the Liouvillian is simply ${\mathcal{L} = G-I}$. For the Ornstein-Uhlenbeck process one finds ${\mathcal{L}(f)(x)=f''(x)-xf'(x)}$, and for the Bonami-Beckner noise operator ${\mathcal{L} = (1-\rho) (H-I)}$ where ${H}$ is the adjacency matrix of the hypercube. Finally in the example of (time-independent) Schrödinger evolution we have ${\partial_t(\rho(t)) = i [\rho(t),H]}$, so that the Liouvillian is ${\mathcal{L}(\rho) = -i[H,\rho]}$.

Spectral gap inequality and ${L^2}$-ergodicity. A first family of inequalities are spectral gap inequalities, which turn out to be equivalent to ${L^2}$-ergodicity. The spectral gap of an invariant measure ${\mu}$ for the semigroup generated by ${\mathcal{L}}$ is defined as the largest ${\lambda>0}$ (if it exists) such that

$\displaystyle \mu\big(f (-\mathcal{L}) f\big)\,\geq\,\lambda\, \mu(f^2) \ \ \ \ \ (2)$

holds for all ${f}$ such that ${\mu(f)=0}$. In the example of the Markov chain, ${\mathcal{L} = K-I}$ and the inequality can be rewritten as ${E_{x\sim y}[ |f(x)-f(y)|^2] \geq \lambda E_{x,y}[|f(x)-f(y)|^2]}$, where the first expectation is taken over ${x\sim \mu}$ and ${y\sim K(x,\cdot)}$, and the second expectation is over independent ${x,y\sim \mu}$. In particular for random walks on graphs we recover the standard definition of the Laplacian ${L=-\mathcal{L}}$.

Applying (2) to ${P_t f}$ we see that it is equivalent to requiring ${\partial_t \mu((P_tf)^2) \leq -2\lambda \mu((P_tf)^2)}$, which using ${P_0=Id}$ implies the bound ${\mu((P_t f)^2) \leq e^{-2\lambda t} \mu(f^2)}$ for all ${f}$ such that ${\mu(f)=0}$. Applying the appropriate translation the same inequality can be rewritten as

$\displaystyle \mu((P_t f - \mu(f))^2) \leq e^{-2\lambda t} \mu((f-\mu(f))^2),$

which is precisely a ${L^2}$ convergence bound. Conversely, by differentiating the above inequality we recover (2), so that the two are equivalent.

Rather than ${L^2}$ convergence one would often like to derive bounds on convergence in ${L^1}$, or total variation, distance. One way to do this is to use Hölder’s inequality as

$\displaystyle \begin{array}{rcl} \|P_t-\mu\|_{1,\infty}^2&=&\sup_f \|P_t f - \mu(f)\|_1^2 \\ &\leq& \Big(\sup_f \frac{1}{\mu(f)}\Big)\sup_f \mu((P_t f - \mu(f))^2)\\ &\leq& \Big(\sup_f \frac{1}{\mu(f)}\Big)e^{-2\lambda t}. \end{array}$

Thus we get an exponentially decaying bound, but the prefactor could be very large; for instance in the case of a random walk on a graph it would typically equal the number of vertices. Our goal is to do better by using log-Sobolev inequalities. But before investigating those, let’s first see what a Sobolev (without the log!) inequality is.

Sobolev inequalities and ${L^\infty}$-ergodicity. Sobolev inequalities are the “high-${p}$” generalizations of the spectral gap inequality. ${\mu}$ satisfies a Sobolev inequality if there is a ${p>2}$ and constants ${a>0}$, ${b\geq 0}$ such that for all ${f}$ with ${\mu(f)=0}$,

$\displaystyle \|f\|_p^2 \leq a \mu(f (-\mathcal{L}) f) + b \mu(f^2).$

Sobolev inequalities imply the following form of ergodicity:

$\displaystyle \|P_t f - \mu(f)\|_\infty \leq c\big(\frac{a}{2et}+b\big)^\gamma \|f-\mu(f)\|_1,$

where ${c,\gamma}$ are constants depending on ${p}$ only (and ${c\rightarrow \infty}$ as ${p\rightarrow 2}$). If one can take ${b=0}$ (which is often the case) directly gives a polynomial bound on ${\|P_t - \mu\|_{1,\infty}}$ which is incomparable to the one we derived from the spectral gap inequality: on the one hand it does not have the large prefactor, but on the other it only gives polynomial decay in ${t}$. This form of contractivity, in the ${1\rightarrow\infty}$ operator norm, is called “ultracontractivity”.

An important drawback of Sobolev inequalities is that they do not “tensor” well, leading to dimension-dependent bounds. What this means is that if ${P_t}$ acts on functions of a single variable in a certain way, and we let ${P_t^{\otimes n}}$ act on ${n}$-variable functions in the natural way, then the resulting semigroup will in general not satisfy a Sobolev inequality with constants ${a,b}$ independent of ${n}$, even if ${P_t}$ itself did. This contrasts with the spectral gap inequality: it is not hard to see that the spectral gap of ${P_t^{\otimes n}}$ is the same as that of ${P_t}$. This makes Sobolev inequalities particularly hard to derive in the infinite-dimensional settings that originally motivated the introduction of log-Sobolev inequalities (I hope to return to this in the next post).

Log-Sobolev inequalities and ${L^p}$-ergodicity. We finally arrive at log-Sobolev inequalities. We’ll say that ${\mu}$ satisfies a log-Sobolev inequality if there exists constants ${c>0}$, ${d\geq 0}$ such that for all non-negative ${f}$,

$\displaystyle \mu\Big(f^2 \log \frac{f^2}{\|f\|_2^2}\Big) \leq c\mu(f (-\mathcal{L}) f) + d \mu(f^2). \ \ \ \ \ (3)$

Often it is required that ${d=0}$, and the log-Sobolev constant is ${\alpha=1/c}$ where ${c}$ is the smallest positive real for which (3) holds for all ${f}$. A log-Sobolev inequality can be understood as a “weakest” Sobolev inequality, obtained as ${p\rightarrow 2}$; indeed, one can show that if ${\mu}$ satisfies a Sobolev inequality then it also satisfies a log-Sobolev inequality. Gross related the log-Sobolev inequality to hypercontractivity by showing that it is equivalent to the statement that for all ${t>0}$ and all ${2\leq q \leq 1 + e^{4\alpha t} }$ the bound

$\displaystyle \|P_t-\mu\|_{2,q} \leq 1$

holds: ${(P_t)}$ is hypercontractive for the ${2\rightarrow q}$ operator norm if and only if the log-Sobolev constant satisfies ${\alpha \geq \ln(q-1/(4t))}$. One can also show that ${\alpha \leq \lambda/2}$ always holds. But a direct use of the log-Sobolev inequality can give us a better convergence bound than the one we derived from the spectral gap: a more careful use of Hölder’s inequality than the one we made earlier easily leads to the bound

$\displaystyle \|P_t-\mu\|_{1,\infty}^2\,\leq\, 2\log\Big(\sup_f \frac{1}{\mu(f)}\Big) e^{-2\alpha t},$

a marked improvement over the prefactor in the bound derived from ${\lambda}$ alone (provided ${\alpha}$ and ${\lambda}$ are of the same order, which turns out to often be the case).

Aside from this improvement a key point making log-Sobolev inequalities so useful is that, contrary to Sobolev inequalities, they tensor perfectly: if ${\mu_i}$ satisfies a log-Sobolev inequality with coefficients ${(c_i,d_i)}$ for ${i=1,2}$ then ${\mu_1 \otimes \mu_2}$ satisfies one with coefficients ${c=\max(c_1,c_2)}$ and ${d=d_1+d_2}$. Thus for example the standard approach for proving hypercontractivity for the Bonami noise operator on the ${n}$-dimensional hypercube: first, prove the “two-point inequality” for the single-variable case, ${\|T_\rho\|_{p,q} \leq 1}$ for any ${\rho \leq \sqrt{(p-1)/(q-1)}}$. This by itself is a nontrivial statement, but it boils down to a direct calculation. Then use Gross’ equivalence to deduce that ${T_\rho}$ satisfies a log-Sobolev inequality with the appropriate constant ${\alpha}$. Deduce that the ${n}$-dimensional noise operator satisfies a log-Sobolev inequality with the same value of ${\alpha}$, and conclude hypercontractivity of ${T_\rho}$ for general ${n}$: ${\|T_\rho\|_{2,q}\leq 1}$ for all ${2\leq q \leq 1+\rho^{-2}}$.

I hope this got you interested in the topic. I wish I had thought of browsing through the surveys mentioned at the start of the post before the BIRS workshop…Have a look, they’re quite interesting. Next time I’ll try to recall why the second part of the workshop title, “and quantum information theory”…