## Foundations of randomness, Day 2

Back to the fundamentals of randomness and our workshop in Stellenbosch – I want to discuss the remaining two days of talks!

The second day started off with a whirlwind tour of the TCS approach to pseudo-randomness by David Zuckerman. David surveyed known constructions of pseudo-random generators and extractors; he also gave a very comprehensive overview of the different types of sources that have been considered in this area and what kind of extraction (deterministic, seeded, two-source) is known to be possible (or not) for each type of source.

I’ll refer you to David’s slides for details. Two particular sources caught my attention. First David mentioned small-space sources, for which there are good deterministic extractors but so far only in the regime of high (polynomial) min-entropy. This type of source seems quite relevant in practice, and it is natural to ask whether it is possible to do better using device-independent quantum extraction. How could we leverage the fact that the source is generated using bounded space? A second interesting category of sources are non-oblivious bit-fixing sources, which are sources such that some of the bits are uniformly random, and others deterministic, but such that which bits are of which type can be chosen adversarially by the source based on the values taken by previous bits. Together with Santha-Vazirani sources these are the two examples David gave of natural sources from which deterministic extraction is not possible. But we do know that it is possible to extract from a single Santha-Vazirani source, of bias arbitrarily close to ${1/2}$, using a device-independent quantum procedure. So what about non-oblivious bit-fixing sources — can it be done there as well?

Work on device-independent randomness extraction from the quantum information community has so far focused on a small set of sources — uniform seeds of limited length, arbitrary Santha-Vazirani sources, and more recently general min-entropy sources — but I see no reason why these are the only cases of interest; the restricted focus seems to be mostly for historical reasons. In pseudo-randomness different types of sources are often motivated by applications to derandomization, lower bounds, or combinatorial constructions. In quantum information, rather than “beating the classical” we should focus on the scenario that are best motivated by the relevant applications, which so far come mostly from cryptography. Here device-independent randomness certification procedures seem most relevant in their role as “decouplers” (see the discussion on “random to whom” from Valerio’s talk). What kind of structure are we willing to assume on the kind of information an adversary may have kept on a particular source? This question seems particularly relevant in light of recent works on device-independent two-source extraction.

David’s talk was followed with a talk by one of his students, Eshan Chattopadhyay, on their recent breakthrough construction of a two-source extractor for poly-logarithmic min-entropy. It is hard to do any justice to such an intricate (and beautiful!) result, and instead I will point you to Eshan’s slides or the great talk that David gave on the same topic on TCS+ just a few weeks ago.

For the last talk of the morning session Henry Yuen treated us to an (arguably :-)) even more formidable treat: infinite randomness expansion! Henry presented the main steps that led him and Matthew Coudron to their beautiful result showing how, starting from just a few uniformly random bits, one could bootstrap a quantum device-independent expansion procedure and generate as many uniformly random bits as one desires — yes, that’s an ${\infty}$! (I realize a bit late that I never properly explained what “device-independent” even means… if I have any “classical” readers left, for context for Henry’s talk see a blog post of Henry’s for all the required background; for the quantum crypto motivation see the viewpoint by Roger Colbeck; for the computer scientist perhaps the introduction to this paper is readable.)

In Henry and Matt’s protocol the length of the initial seed only needs to be at least some universal constant. The number of random bits available to start with will govern the security parameter (the distance from uniform) of the bits produced, but aside from that any small initial seed will do. The protocol uses two pairs of devices (those can be arbitrarily correlated, e.g. share randomness or quantum entanglement, but are not allowed to communicate directly) which take turns in expanding the current string of bits by an exponential factor. The key observation required to argue that the protocol works is that each expansion step, not only increases the number of bits available, but simultaneously “purifies” them, in the sense that the bits produced will have high entropy even conditioned on any information, classical or quantum, in the possession of the other pair of devices. Given that this other pair generated the input bits used for the expansion in the first place this is not a trivial observation, and is  essentially the content of the powerful “equivalence lemma” of Chung, Shi and Wu discussed in the post describing Valerio Scarani’s talk. The lemma guarantees that the bits produced by a device-independent procedure are random even from the point of view of an adversary who shares arbitrary correlations with the seed (including knowing it perfectly).

The result is quite impressive: if a few uniformly random bits are available, then it is automatically guaranteed that infinitely many just-as-good (in some respects even better) random bits can be generated! So much for our derandomization problems… The next step is to ask if uniformly random bits are even needed: how about weaker sources of randomness, such as Santha-Vazirani sources or general min-entropy sources? The task of device-independent extraction from the former was taken on by Colbeck and Renner (with much follow-up work); for the latter there is the work of Chung, Shi and Wu, which unfortunately still requires a number of devices that scales with the length of the source as well as the security parameter. More to be done!

The afternoon brought two more talks, with a very different focus but equally stimulating. The first was delivered by Ruediger Schack, who presented “The QBist perspective on randomness and laws of nature”. Ruediger’s talk followed the one by Carl Hoefer on the previous day in challenging our attitudes towards randomness and the meaning of probabilities, but Ruediger took us in quite a different direction. One of the points he made is that Bell’s theorem can be interpreted as giving us the following mutually exclusive alternatives about the world: either locality does not hold, and we must accept that far-away objects can have a nonlocal (instantaneous) influences on one another, or we insist on keeping locality — as Einstein did — but ascribe a different “meaning” to probabilistic statements, or predictions, that arise from the Born rule. (Note that in the first option in “nonlocal influence” I am including the kind of non-signaling “influence” which arises from measuring half of an entangled pair.)

This is where QBism — for “Quantum Bayenism” — comes in. According to Ruediger, the solution it offers to this conundrum goes through the assertion that, contrary to e.g. the Copenhagen interpretation, “probabilities are not determined by real properties”. This allows one to “restore locality” while not having recourse to local hidden variables (as Bohmian mechanics would) either. Thus in QBism “there is no Born rule”, in the sense that the rule does not describe an intrinsic probabilistic fact about the world; rather, probabilities are seen as the reflection of an agent’s subjective belief and have no objective existence. The fact that, if Alice measures her half of an EPR pair in the computational basis and obtains a ${|0\rangle}$, then Bob will with certainty observe a ${|0\rangle}$ as well, if he measures in the same basis, is not a “fact” about their systems; rather it is a prescription for how a rational Bob should update his belief as to the outcomes of an experiment he could perform, were he to be informed of the situation at Alice’s side.

The main conceptual tool used to formulate this interpretation of probabilities is the Dutch book used as a guide to how an agent (observer) should update its probabilities. I admit I find this interpretation a bit hard to digest — I cannot help but think of betting as a highly irrational procedure, and in general I would be hard-pressed to bet on any precise number or probability; the Born rule makes very precise predictions and ascribing them to an agent’s subjective belief feels like a bit of a stretch. Quite possibly it is also a problem of language, and I am not an expert on QBism (even the “usual” Bayesianism is far from obvious to me). Ruediger’s slides are very clear and give a good introduction for anyone interested in digging further.

The last talk of the day was given by Carl Miller, on “The extremes of quantum random number generation”. Carl explained how two very different principles enabled him and Yaoyun Shi to derive two independent lower bounds, each of interest in a separate regime, on the amount of randomness that can be generated from a pair of quantum devices in the sequential scenario: the devices are used repeatedly in sequence and one is interested in the min-entropy rate of one of the device’s outputs, as a function of the average observed violation of (in this case) the CHSH inequality: as the violation increases to the maximum the rate will improve to 1, and the question is how high a rate can be maintained is the violation is be bounded away from the optimum.

Carl’s first principle of measurement disturbance leads to a good rate in the high error regime, i.e. when the observed violation deviates significantly from optimum, say it is around ${0.8}$ (for an optimum of ${\approx .85}$). The general idea is that a violation of the CHSH inequality necessarily implies the use of “non-classical” measurements, which do not commute (in fact a maximal violation requires anti-commuting measurements), by Alice’s device. Two non-commuting measurements do not have a common eigenstate, hence whatever the state of the device at least one of Alice’s measurements will perturb it — the outcome of that measurement on the state cannot be deterministic, and randomness is produced.

Of course this is just the intuition, and making it quantitative is more challenging. In their paper Carl and Yaoyun formalize it by introducing a new uncertainty relation (see this very recent survey on the topic for much more) expressed in terms of Renyi ${\alpha}$-entropy for ${\alpha > 1}$. The proof relies on strong convexity of the underlying Schatten-normed space of matrices, and Carl gave some geometric intuition for the uncertainty relation. The use of ${\alpha}$-Renyi entropy is important for the inductive step; it allows to bypass the lack of good chain-rule like arguments for the min-entropy.

The second principle introduced in Carl’s talk is self-testing. This is the idea that a device demonstrating a close-to-optimal violation of the CHSH inequality must have a certain rigid structure: not only does it generate random bits, but the device’s outcomes must be produced in a very specific way. Specifically the device’s state and measurements must be equivalent, up to local isometries, to those used in the “canonical” optimal strategy for the CHSH game. This principle is well-known (see e.g. the paper by McKague for a proof), but usually one only expects it to be of use in the “tiny error” regime, i.e. when the observed violation is of order ${opt - \varepsilon}$ for small epsilon. Carl manages to squeeze the techniques further, and by carefully fine-tuning the argument is able to get good bounds on the randomness produced when the violation is large but still a constant away from optimum, say ${0.83}$ or so. In that regime the bound is better than the one obtained from the first principle; I’ll refer you to Carl’s slides for the precise curves.

After Carl’s talk, the mind full of new ideas we left for a little walk to the Lanzerac wine estate, where we were shown around the wine cellar — including the mandatory wine
tasting of course, with the highlight being the local pinotage — and finished off the day with a nice dinner on the terrasse: it’s late October in South Africa, summer is coming!

## Nonlocal games and operator spaces: a survey

Over the past few months, together with Carlos Palazuelos I wrote a survey article on “Nonlocal Games and Operator Space Theory”, and we just uploaded it to the arXiv. The survey is an invited contribution to a special issue of JMP on Operator Algebras and Quantum Information Theory which will hopefully appear soon.

### Exquisite(ly painful) memories

We had a lot of fun writing the survey. Teaming up with an expert in functional analysis behind probably more than half of the impressive range of results in quantum information theory that the “operator space approach” has so far led to gave me a great opportunity to solidify some of the mathematical material I have labored through in recent years.

It all started during a memorable summer of 2009 spent visiting Harry Buhrman’s group at CWI in Amsterdam. At the time the breakthrough by Perez-Garcia, Wolf, Palazuelos (yes, the same Palazuelos!), Villanueva and Junge demonstrating the existence of tripartite Bell inequalities with unbounded violation was all fresh. The result generated a lot of attention, first because it is a great and unexpected result (contrast with the fact that the maximum violation of bipartite inequalities is uniformly bounded, by Grothendieck’s constant), and second because the proof technique was completely novel, and seemed to rely crucially on “deep arguments from operator space theory”, a relatively young theory (barely younger than quantum information, but then time flows more slowly in mathematics) developed by operator algebraists starting with the work of Ruan in the late 1980s.

I spent probably a full month banging my head against the paper. After a month, I had gained a good grasp of just one thing: when they say deep arguments, they mean deep. Note the comment added for v2 of the paper on arXiv: “Substantial changes in the presentation to make the paper more accessible for a non-specialized reader”. You bet: that comment was already there in summer 2009, and I guarantee it didn’t make me feel any better! I printed the paper, borrowed all referenced books from the library, and tried. Then put it down, out of despair. Then tried again. Then put it down. Picked it up again. To the hell with it, put it down, tried to find my own proof. Failed. Back to the paper. It was a wild chase…I kept thinking I was starting to see how to go about things, only to fail — deeply.

Cutting a long story short, we did get a couple of things out of that summer. The paper by Perez-Garcia et al. [PGWPVJ] covers a lot of ground. The main result is an existential proof of a family of tripartite Bell correlation inequalities, or equivalently a family of three-player XOR games, such that the maximum advantage of quantum players sharing entanglement over classical players using only shared randomness, as measured by the ratio of the corresponding maximum biases (bias = deviation of acceptance probability from ${1/2}$), is unbounded: it grows with the size of the games. A second result proved in [PGWPVJ] is that achieving any unbounded violation requires the players to share a relatively complex entangled state: GHZ states ${|\psi\rangle= \frac{1}{\sqrt{d}} \sum_i |i\rangle|i\rangle|i\rangle}$ (of any dimension ${d}$), arguably a natural generalization of bipartite maximally entangled states, can only lead to a violation bounded by a universal constant.

After much suffering we were able to obtain generalizations and simplifications of both results — stripping out virtually all the operator space theory! First, with Jop Briet, Harry Buhrman and Troy Lee we extended the second result and showed that the maximum violation remains bounded for a larger class of states, including what we called “clique-wise entanglement” — combinations of EPR pairs, GHZ states, etc., as well as “Schmidt states”, states of the form ${|\psi\rangle = \sum_i \alpha_i |i\rangle|i\rangle|i\rangle}$ for arbitrary coefficients ${\alpha_i}$. While this may seem innocuous, it turns out the extension to Schmidt states solves (via a reduction proved in [PGWPVJ]) an open question in operator algebras from the 1970s! The observation won us a fancy-sounding publication (“All Schatten spaces endowed with the Schur product are Q-algebras”) in a fancy-sounding journal (“Journal of Functional Analysis”). The result also has an interesting, completely classical consequence for the direct sum problem: it implies that there exists a three-player XOR game ${G}$ whose bias is bounded away from ${1}$ by a constant, but the bias of the XOR game obtained by taking the direct sum of ${k}$ copies of ${G}$ (the game is played ${k}$ times in parallel and the parity of all answers is checked against the correct parity) remains bounded away from ${0}$, for all ${k}$ — a strong counterexample to parallel repetition (for the direct sum) of three-player games.

Second (and with a couple years’ more effort), with Jop Briet we managed to re-derive the main result of [PGWPVJ], again with some improvements: while still probabilistic our construction is explicit (we give a simple distribution on games such that the large violation holds with high probability), and the dependence of the violation on the game size as well as the entanglement dimension are greatly improved. Arguably our construction borrows its key ideas from whatever we could squeeze out of [PGWPVJ], but there are no more operator spaces! The most fancy tool used in the construction is a tail bound for second-order Gaussian chaos due to Latala.

### The operator space connection

So, does operator space theory really provide a natural framework for the study of Bell inequalities, or does the heavy machinery systematically hide observations that could be obtained by easier means? While the range of results obtained by Perez-Garcia et al. using their techniques (including large violations for bipartite  (non-XOR) inequalities and applications to quantum Shannon theory) seems to indicate that the tools are useful, it does not preclude that similar results could be obtained without having to resort to such high levels of obfuscation.

Since then I have grown more and more fond of the approach, to the point where I would in some cases agree that it provides the “right” point of view on a question about Bell inequalities (or nonlocal games) — a conclusion I would never have been able to reach without the invaluable help of David, Carlos, Marius, and my co-authors in these endeavors, Jop and Oded, whom I gratefully acknowledge! An example where I would say that the operator space viewpoint is undeniably the “right” one arises in work with Oded Regev on “Quantum XOR games” that grew out of our exploration of non-commutative extensions of Grothendieck’s inequality. For me a highlight of this work has been a derivation of the operator space Grothendieck inequality based on elementary techniques strongly inspired by the use of embezzlement states in quantum nonlocality.

These results, and many more, are described in the survey. While the JMP special issue is targeted at mathematicians, we tried to make the survey as self-contained as possible, and my hope is that it provides an easily approachable introduction to the delights of tensor norms, completely bounded maps and the associated “o.s.s.” for the interested quantum information theorist with little or no background in ${C^*}$-algebras or operator space theory.

### An example

I’d like to end with a bird’s-eye view of the operator space point of view on entangled games by taking an example from the survey, an upper bound on the largest possible ratio between the maximum success probabilities achievable by quantum entangled and classical players respectively in a two-player game, as a function of the number of answers per player in the game. For a formal treatment I refer to Proposition 4.5 in the survey; here I will stay at the level of symbols, hoping to carry some of the flavor across, but please keep in mind that the exposition below is not fully accurate.

#### Games as tensors and values as norms

The functional analytic approach views a game as a tensor ${G}$ on the space ${({\mathbb C}^N\otimes {\mathbb C}^K)\otimes ({\mathbb C}^N\otimes {\mathbb C}^K)}$. Here ${N}$ and ${K}$ are respectively the number of questions and answers to and from each player, and each coefficient ${G_{x,y}^{a,b}}$ of the tensor corresponds to the associated payoff to the players in the game. The main thrust of the approach is that different value we are interested in — the classical value, or maximum success probability of classical players in the game, and the entangled value, or maximum success probability of quantum players sharing entanglement — are directly related to the norm of ${G}$ when the underlying spaces are normed in the appropriate way, as Banach spaces (for the classical value) or operator spaces (for the entangled value).

Let’s consider the classical value first. To capture it we’ll turn the vector space ${({\mathbb C}^N\otimes {\mathbb C}^K)\otimes ({\mathbb C}^N\otimes {\mathbb C}^K)}$ into the Banach space ${\ell_1^{N}(\ell_\infty^{K})\otimes_\epsilon \ell_1^{N}(\ell_\infty^{K})}$. What does this mean? Let’s first focus on one of the factors, ${\ell_1^{N}(\ell_\infty^{K})}$. We’re using the ${\ell_\infty}$ norm for the space ${{\mathbb C}^K}$ associated to answers, and the ${\ell_1}$ norm for the space ${{\mathbb C}^N}$ associated to questions. This choice is consistent with the interpretation of the game tensor as an object dual to the player’s strategies: glossing over considerations of non-negativity, a (classical) strategy is a collection of numbers ${\{f_x^a\}}$ such that

$\displaystyle \big\|\{f_x^a\}\big\|_{\ell_\infty^N(\ell_1^K)} = \max_x\,\sum_a |f_x^a| \leq 1.$

Once a norm has been defined on the spaces associated with each player’s strategy, it remains to put them together into a norm on the full game tensor, which lives on the tensor product of the two spaces. In general (including here) there will not be a unique way to norm the algebraic tensor product of two Banach spaces in a way that is consistent with the two Banach norms. (Indeed, the study of all possible such norms was pioneered by Grothendieck.) For our purposes it turns out that the smallest such norm, the injective norm ${\|\cdot\|_\epsilon}$, is the right choice: the norm ${\|G\|_{\ell_1^{N}(\ell_\infty^{K})\otimes_\epsilon \ell_1^{N}(\ell_\infty^{K})}}$ exactly coincides with the maximum success probability of classical players in game ${G}$. This is an easy calculation from the definition of the injective norm, but I’ll skip it here.

So that’s for classical strategies. Quantum strategies are a bit more complicated: instead of sequences of numbers ${\{f_x^a\}}$ they are represented by sequences of (positive semidefinite) matrices ${\{A_x^a\}}$, of any dimension, such that ${\sum_a A_x^a = \mathbb{I}}$ for all ${x}$. This is where operator spaces come in. An operator space is a structure that can be built on top of a Banach space by defining not just one norm but a sequence of matrix norms, on matrices of arbitrary dimension ${d=1,2,\ldots,}$ with entries in the Banach space. Here the most natural operator space structure on ${\ell_\infty^N(\ell_1^K)}$ replaces ${\ell_\infty^N}$ by ${M_N}$ (${N\times N}$ matrices with the operator norm) and ${\ell_1^K}$ by ${S_1^K}$, whose operator space structure is a bit hard to define directly but can be obtained naturally via duality with ${M_K}$ (${S_1^K}$ does not correspond to the Schatten-${1}$ norm; note that here we need an operator space, which provides much more structure than a Banach space). Once again I’ll skip the details: you’ll have to believe me that the resulting operator space structure, that I’ll continue writing as ${\ell_1^N(\ell_\infty^K)}$, precisely captures the notion of quantum strategy, i.e. a family of POVMs. (The “precisely” here is somewhat blunt; in particular there are subtleties involved with e.g. dealing with the requirement that POVM elements be positive semidefinite, but it is possible to work around those.)

Next we need to norm the tensor product, again as an operator space. As it turns out, the injective norm that worked so well for classical strategies has a direct operator space analogue, the minimal norm, which perfectly captures the quantum value of the game tensor! (I won’t show why, but just as for the injective norm it’s a simple calculation from the definition.) I’ll write ${\|G\|_{\ell_1^{N}(\ell_\infty^{K})\otimes_{min} \ell_1^{N}(\ell_\infty^{K})}}$ for this norm, and from now on it’ll serve as a proxy for the entangled value of the game ${G}$.

That’s a lot of “coincidences”, isn’t it? A norm on the tensor product of Banach spaces matches the classical value of a game tensor, while the natural operator space analogue of that norm matches the quantum value. Now you may start to see why someone well-versed in the theory of such norms might see them as the “right” way to look at classical, and quantum, strategies. More details on this point in Section 2 of the survey.

#### A bound on the largest violation

But let’s move on. With the basic normed spaces in place the task of relating the classical and entangled values of a game can be formulated as follows: it amounts to estimating the norm of the identity map (as an aside, almost all results in functional analysis, or at least those that connect to the study of nonlocal games, that I have encountered seem to reduce to studying the “norm of the identity”… a fact that still makes me smile every time I stumble upon it: why isn’t the identity trivial?)

$\displaystyle id:\,\ell_1^{N}(\ell_\infty^{K})\otimes_{\epsilon} \ell_1^{N}(\ell_\infty^{K})\rightarrow \ell_1^{N}(\ell_\infty^{K})\otimes_{min} \ell_1^{N}(\ell_\infty^{K}).$

Indeed, as we have seen the norm of the game tensor when seen as an object on the first space corresponds to its classical value, while the norm of the same tensor considered as an object on the second space matches its entangled value. Any upper bound on the norm on the identity is an upper bound on the largest (multiplicative) advantage that can be gained by quantum players, while any lower bound will imply the existence of a game for which the advantage can be just as large.

Our strategy for bounding the norm of ${id}$ is to decompose it as a sequence of three simpler maps, on which it is easy to obtain norm estimates. We’ll start from the right-hand side, and consider first the arrow

$\displaystyle \ell_1^{NK}\otimes_{min} \ell_1^{NK}\rightarrow \ell_1^{N}(\ell_\infty^{K})\otimes_{min} \ell_1^{N}(\ell_\infty^{K}).$

If the map is taken to be the identity, the condition ${\max_{x} \|\sum_a A_x^a\|\leq 1}$ only gives us ${\max_{x,a} \|\sum_a A_x^a\|\leq 1}$, thus the norm of the identity is ${1}$. We can do better: consider the Fourier transform

$\displaystyle \mathcal{F}: \,\{A_{x}^a\}_{x,a}\mapsto \Big\{ E_{x,k} = \frac{1}{\sqrt{K}}\sum_a e^{\frac{-2i\pi ka}{K}} A_{x}^a\Big\}_k, x=1,\ldots,N.$

It is not hard to verify that if for every ${x}$, ${\{A_x^a\}_a}$ is a POVM then for every ${x,k}$ it will hold that ${\| E_{x,k} \|\leq K^{-1/2}}$: accounting for both players we saved a factor ${K^{-1}}$.

The next arrow we consider is

$\displaystyle \ell_1^{NK}\otimes_{\epsilon} \ell_1^{NK}\rightarrow \ell_1^{NK}\otimes_{min} \ell_1^{NK}.$

This is a key step: we are switching from the ${\epsilon}$ to the ${min}$ norm, from classical to quantum. But note how now the spaces are just ${\ell_1}$: only questions, no answers — but there are signs, and what we are looking at is precisely the setting of XOR games, i.e. Grothendieck’s inequality! Therefore for this map we can simply take the identity, and its norm will be at most Grothendieck’s constant ${K_G}$.

Finally, for the last arrow we consider

$\displaystyle \ell_1^{N}(\ell_\infty^{K})\otimes_{\epsilon} \ell_1^{N}(\ell_\infty^{K})\rightarrow \ell_1^{NK}\otimes_{\epsilon} \ell_1^{NK}.$

In order for the three arrows to compose to ${id}$ we should take this map to be the inverse of the Fourier transform ${\mathcal{F}}$ used in the first step. But note that now we are working with the ${\epsilon}$ norm, or in other words classical strategies. Thus it suffices to bound

$\displaystyle {\max_x\sum_a K^{-1/2}|\sum_{k} e^{\frac{2i\pi ak}{K}} t_{x,k}|}$

for ${t_{x,k}}$ such that ${\max_{x,k} |t_{x,k}|\leq 1}$, and by Parseval this is easily seen to be at most ${K}$. Hence our first map has norm ${K^2}$.

There we are! We managed to write the identity ${id}$, which maps the game tensor from an object acting on classical strategies to one acting on quantum strategies, as the composition of three maps with norms at most ${K^2}$, ${K_G}$ and ${K^{-1}}$ respectively. As a direct consequence, we’ve proven that the largest violation achievable by quantum strategies is at most ${K_G\cdot K}$, i.e. it is at most a constant times the number of possible answers in the game (warning: some constant factors missing, from e.g. glossing over complex vs real vs non-negative issues in the above).

What I like about this proof is that, from the point of view of operator spaces the above sketch is very natural (at least if I am to believe Carlos…), almost an exercise: as already mentioned, estimating the norm of the identity from one space to the other is the bread and butter of functional analysts, and they certainly have more than one trick up their sleeve. Unwinding the argument to frame it as a rounding procedure that transforms quantum entangled strategies into classical ones reveals some surprising tricks (e.g. the Fourier transform, which is used here to perform the actual quantum-to-classical rounding at the optimal step, the middle arrow) that one might not have thought of based solely on the usual “quantum toolbox”.

I’ll refer you to the survey for much more! All comments are very welcome and much appreciated; even typos: there is still plenty of time to fix those before the survey makes its way to the JMP presses.

Posted in Publications, Quantum, Science | Tagged , , | 3 Comments

## Foundations of randomness, remainder of Day 1

Today I will cover the remaining talks from the first day of the workshop. Valerio’s opening talk was followed by a stimulating blackboard talk by Renato Renner, who asked the question “Is the existence of randomness an axiom of quantum physics”? Randomness is usually viewed as entering quantum mechanics through the Born rule, which is stated as an axiom describing the outcome distribution of any given measurement on a quantum state; there is no standard derivation of the rule from other axioms. This is deeply unsatisfying, as the rule appears completely arbitrary, and indeed many attempts have been made at deriving it from more basic structural constraints such as the Hilbert space formalism underlying Gleason’s theorem, or assumptions based on decision theory (Deutsch) or logic (Finkelstein).

Renato and his student Daniela Frauchiger have a proposal whose main thrust consists in demarcating what could be called the “physical substrate” of a given theory (such as quantum mechanics) from its “information-theoretic substrate”. In short, the physical substrate of a theory governs which experiments are “physical”, or possible; it is a completely deterministic set of rules. The information-theoretic substrate provides an additional layer which considers assignments of probabilities to different experiments allowed by the physical substrate, and specifies rules that these probabilities should obey.

Let me try to give a little more detail, as much as I can recall from the talk. Let’s call the framework suggested by Renato the “FR proposal”. The first component in the proposal is a definition of the physical substrate of a theory. In the FR proposal a theory bears on objects called stories: a story is any finite statement describing an experiment, where “experiment” should be interpreted very loosely: essentially, any sequence of events with a set-up, evolution, and outcome. For instance, a story could be that a pendulum is initialized in a certain position, left to swing freely, and after time ${t}$ is observed and found to be in a certain position. Or it could specify that a quantum particle is initialized in a certain state, say ${\ket{+}=\frac{1}{\sqrt{2}}(\ket{0}+\ket{1})}$, then measured in the computational basis ${\{\ket{0},\ket{1}\}}$, and that the outcome ${\ket{0}}$ is observed. Thus a story refers to a physical system and makes claims about observable quantities of the system.

Thus the physical substrate of the theory is a set of rules that completely determine which stories are “physical”, i.e. allowed by the theory, and which are not. For instance, the second story described above is a valid story of quantum mechanics: it is certainly possible that the outcome ${\ket{0}}$ is obtained when the measurement described is performed. On the other hand quantum mechanics would rule out a story such as “the state ${\ket{0}}$ is measured in the computational basis and the outcome ${\ket{1}}$ is obtained”: this is simply not allowed. Note that up to this point the theory makes no claim at all on the “probability” of a story occurring; it just states whether the story is possible or not.

So much for the physical substrate. How do probabilities come in? This is the goal of the information-theoretic substrate of the theory. To introduce probabilities we consider mappings from stories to ${[0,1]}$. Any such mapping can be thought of as a measure of an observer’s state of knowledge, or uncertainty, about any particular story (much more on this and other interpretations of probabilities will be discussed in later talks, such as Carl Hoefer’s coming up next). A priori the mapping could be arbitrary, and the role of the information-theoretic substrate of the theory is to specify which mappings are allowed, or “reasonable”. For instance, in the measurement example from earlier most observers would assign that particular story a probability of ${50\%}$, which is what the Born rule predicts. But without taking the rule for granted, on what basis could we assert that ${50\%}$ is the only “reasonable” probability that can be assigned to the story — why not ${10\%}$, or even ${100\%}$?

The main claim that Renato put forward is that the Born rule can be derived as a consequence of the following two broad postulates: (I apologize for the description of the postulates being rather vague; there are points about the formulation that I am unclear about myself!)

1. The repetition postulate: experiments can be repeated, and if so lead to the same outcomes.
2. The symmetry postulate: certain experiments have symmetries under which the outcome of the experiment is invariant. For example, an experiment which involves flipping three different coins in sequence and reporting the majority outcome is not affected by the order in which the coins are flipped.

The idea is that these postulates should be much easier to accept than the Born rule: certainly, if we want to say anything meaningful about certain probabilities being reasonable or not we ought to enforce some form of compatibility rules, and the two above sound rather reasonable themselves. The first seems a prerequisite to even make sense of probabilities, and the second follows the tradition started by de Finetti of basing the emergence of probability on indistinguishability of the outcomes of an experiment under certain symmetries of the experiment.

Unfortunately there does not yet seem to be a preprint available that describes these ideas in detail; hopefully it is coming soon. I’m quite curious to see how the proposal, that I find rather appealing, can be formalized.

After Renato’s talk — and an excellent lunch — we were treated to an intriguing talk on “Chance laws and quantum randomness” by Carl Hoefer. The talk addressed questions that we computer scientists or physicists rarely discuss openly (and indeed many of us might reserve for dreamy Sunday nights or inebriated Friday evenings). In Carl’s own words: “When we ascribe a primitive (irreducible) chance propensity, or postulate intrinsically random/chancy fundamental laws of nature, do we know what we’re saying?”. Or, less poetically put (but still in Carl’s words — Carl, thanks for making it accessible to us!): What does a claim such as “${\Pr(\text{spin-z-up} | \text{spin-x-up-earlier}) = 0.5}$” mean, and how is it different from “${\Pr(\text{spin-z-up} | \text{spin-x-up-earlier}) = 0.7}$”?

Let the question a chance to sink in — a good answer is not that obvious! It already arose in Renato’s talk, where probabilities were introduced as the reflection of an agent’s subjective belief. Well, this is the first of a number of possible answers to the question that Carl took head on and debunked, one slide at a time. (I am not sure how much I agree with Carl’s arguments here, but at least he showed us that there was certainly room for debate!) Other possible answers considered by Carl include probability as the reflection of an objective frequency for the occurrence of a certain event, or simply a “primitive fact” that would require no explanation. Needless to say, he found none of these interpretations satisfactory. It’s a good exercise to find arguments against each of them; see Carl’s slides for elements of answer (see also the article on Interpretations of probability on the Stanford Encyclopaedia of Philosophy for an extensive discussion).

A Galton board

So what is the speaker’s suggestion? I doubt my summary will make it much justice, but here goes: according to Carl probabilities, or rather “chance laws”, arise as a form of statistical averaging; in particular probabilistic statements can be meaningful irrespective of whether nature is intrinsically deterministic or irreducibly probabilistic. Specifically, Carl observes that many physical processes tend to generate a distribution on their observable outcomes that is robust to initial conditions. For instance, a Galton board will lead to the same Gaussian distribution for the location of the ball, irrespective of the way it is thrown in (except perhaps for an exceptional set of initial conditions of measure zero). Whether the initial conditions are chosen deterministically or probabilistically we can meaningfully state that “there is ${x\%}$ chance that the ball will exit the board at position ${z}$”.

Carl concluded his talk by tying it to the Bohmian interpretation of quantum mechanics — almost, though perhaps not quite, going as far as to frame deterministic Bohmian mechanics as the most reasonable way to interpret probability in quantum mechanics. I found the discussion fascinating: I had never considered Bohmian mechanics (or, for that matter, the question of the origin of probabilities in quantum mechanics) seriously, and Carl made it sound, if not natural, at least not wholly unreasonable! In particular he made a convincing case for the possibility of ascribing meaning to “probability” in a fully deterministic world, without having to resort to the need for “subjective beliefs” or related rabbits such as high sensitivity to initial conditions — indeed, here it is quite the opposite.

The last talk of the day was delivered on the whiteboard, by Marcin Pawlowski. Marcin gave an intuitive derivation of the quantum bound on device-independent randomness generation from the CHSH inequality, the famous (at least for those of us versed in these things… see Appendix A.1 here for a derivation)

$\displaystyle H_\infty(A|X)\geq 1-\log_2\Big(1+\sqrt{2-\frac{I^2}{4}}\Big) ,$

based solely on a monogamy bound due to Toner. Toner introduced a “two-out-of-three” variant of the CHSH game in which there are three players Alice, Bob and Charlie, but the game is played either between Alice and Bob, or Alice and Charlie, chosen uniformly at random by the referee. Although in principle the maximum success probability of the players in this game could be as high as that of the two-player CHSH (since in the end only one instance of the game is played, the third player being ignored), Toner showed that the fact that Alice did not know a priori whom of Bob and Charlie she was going to play with places strong constraints on her ability to generate non-local correlations with at least one of them. Quantitatively, he showed that the players’ maximum success probability in this game is precisely ${3/4}$, whether the players are allowed entanglement or not: the presence of the third player completely eliminates the quantum advantage that can be gained in the usual two-player CHSH game!

The derivation presented by Marcin is simple but enlightening (good exercise!). It could be considered the “proof from the book” for this problem, as it makes the intuition fully explicit: the reason Alice’s outcomes are unpredictable, from the point of view of Eve (=Charlie), is that they are strongly correlated with Bob’s outcomes (up to the Tsirelson bound); monogamy implies that correlations with Eve’s outcomes could not simultaneously be as strong.

Typical view from a Stellenbosch winery

That’s it for Day 1 – time to relax, enjoy a good bottle of local wine, and get ready for the second day!

Posted in Conferences, device independence, Quantum, Science, Talks | | 1 Comment

## Foundations of randomness, Day 1: Scarani

As mentioned in the previous post, a highlight of the project I participated in over the past five weeks was the three-day workshop that we organized at STIAS towards the end of October.

The purpose of the workshop was to bring together a small group of computer scientists, physicists, mathematicians and philosophers around the broad theme of “the nature of randomness”. Each of our respective fields brings its own angle on the problem, with a specific language for formulating associated questions, judging which are most important, and a set of techniques to approach them. In computer science we use entropy, complexity theory and pseudo-randomness; physicists have access to randomness through the fundamental laws of nature; philosophers introduce subtle distinctions between probability, chance, and intrinsic or subjective randomness. The presence of any such framework is a prerequisite to progress: it enables to formulate research problems, develop appropriate methodologies, and obtain results. But any framework also sets implicit boundaries and runs a risk of turning into a dogma, according to which “valid” or “important” questions and results are judged. From my perspective, if there is any respect in which I would dare deem the workshop a success it is precisely in the barriers it helped bring down. The wide-ranging talks and participants forced me to take a broad perspective on randomness and question some of the most deeply ingrained “certainties” that underlie my research (such as a firm — though baseless — belief that the Copenhagen interpretation is the only reasonable interpretation of quantum mechanics worthy of a practicing scientist). This questioning was only made possible thanks to the outstanding effort put in by all participant to deliver both accessible and stimulating talks as well as to actively engage with each other; I cannot thank them enough for making the workshop such an enlightening and interactive experience.

It seems impossible to do any justice to the ideas developed in each of the talks by discussing them here, and in doing so I am bound to misrepresent many of the speaker’s thoughts. For my own benefit I will still make an attempt at listing a few “take-away” messages I wish to keep; obviously all idiocies are mine, while any remaining insight should be attributed to the proper speaker.

The opening talk was delivered by Valerio Scarani, who started us off with a personal take on the history, and future, of “Quantum randomness and the device-independent claim“. The “irreducible presence” (much more on this “irreducibility” will be discussed in later talks!) of randomness in quantum mechanics is usually attributed to the Born rule, which states that the probability of obtaining outcome ${x}$ when measuring state ${|\psi\rangle}$ in a basis containing a unit vector ${|e_x\rangle}$ labeled by ${x}$ is given by the squared modulus of the inner product ${\langle e_x |\psi\rangle|^2}$. Valerio pointed out the (well-known, but not to me) fact that the introduction of this precise quantitative formulation of the rule came almost as an afterthought to Born; indeed the use of the square appears as a footnote to the main text in Born’s paper.

Extract from Born’s paper

(Incidentally, see this paper by Aaronson for interesting consequences of instead using a ${p}$-th norm for ${p\neq 2}$ in defining the rule.) I was somewhat puzzled by this: Born’s paper is dated 1926, a time when quantum mechanics was already a well-established theory, on which scores of physicists relied to do what physicists do — make calculations, formulate predictions, perform experiments, and repeat! But how can one make predictions if there is no rule governing which outcomes are to be obtained? My confusion likely stems from a basic misrepresentation of how physicists work (as may already be clear from the preceding description!); after clarifications I can offer two explanations: First, quantum mechanics was mostly used to compute properties of eigenstates of a given Hamiltonian, for whom properties which commute with the Hamiltonian always take on determined values, so that no probabilities are needed. Second, it is quite rarely the case (or at least was in the 1930s) that the outcome distribution of a fine-grained measurement on a single quantum mechanical system is accessible to experiment; instead only statistical properties of larger systems — such as the energy — are accessible, and the law giving the expectation value taken by a particular observable, ${\langle\psi|O|\psi\rangle}$, can be stated and used without bothering as to what it would mean to “measure in the eigenbasis of ${O}$” and obtain this or that measurement outcome: only the average has any experimental relevance. In any case, to close this parenthesis it is interesting to note that even as we now think of probabilities, and the “squared-modulus rule”, as what makes the specificity of quantum mechanics, the success of the theory initially had (and probably still has) little to do with it. In fact, it may have had quite the opposite effect, and induced a serious amount of confusion…

Indeed Einstein’s immediate reaction to Born’s paper is immortalized in his famous letter to Born from November 1926:

Quantum mechanics calls for a great deal of respect. But some inner voice tells me that this is not the right track. The theory offers a lot, but it hardly brings us closer to the Old One’s secret. For my part at least, I am convinced He doesn’t throw dice.

And thus started one of the most obscure periods in human thought… A famous anecdote along the route (though again, this was news to me) is given by von Neumann’s valiant attempt at establishing a rigorous “no-go” theorem for “hidden variables”, thereby proving the unavoidable resort to some form of uncertainty. Unfortunately it turned out that von Neumann’s theorem placed unreasonable assumptions on the hidden variable models to which it applied, making it all but inapplicable. Pressing ahead, skipping over Einstein, Podolsky and Rosen’s unfortunate account of (non-)locality in quantum mechanics, it is only with Bell’s 1964 paper demonstrating that the existence of local hidden variable theories for quantum mechanics could be experimentally tested that the debate was finally placed on firm — or, at least, in principle decidable — grounds. Indeed the major contribution of Bell’s work was to establish observable consequences for the existence of any hidden variable model for quantum mechanics — under the important assumption, to be further discussed in the talk by Ruediger Schack, that the model be local. Bell’s result had an immediate impact, prompting one of the most interesting experimental challenges in the history of physics. This is because there was no consensus as to what the outcome of the experiment “should” be, and proponents of a breakdown of quantum mechanics (at least in the regime concerned in Bell’s test and its subsequent refinement by Clauser et al.) stood on an equal footing with advocates of fundamental laws of nature that are non-local and probabilistic (of course either camp was dwarfed by the much larger number of physicists who simply considered the question irrelevant for the practice of their work… luckily for the progress of science, which would not have benefited much from stalling for two decades). And indeed early experiments went both ways, with “conclusive” results pointing in either direction obtained on the West and later East coasts of the US. The quest was brought to a climactic conclusion with Aspect’s experiments in Orsay in 1981-82, which conclusively handed the podium over to the Bell camp (how conclusively, however, you can judge from the amount of — justified — attention generated by the most recent experiments in Delft).

The history is fascinating (anyone can suggest a good, entertaining, informed and opinionated account?, but we should move on. Following the speaker’s lead, let me scoot ahead to modern times and the birth of device independence. This is usually attributed to the 1998 “UFO” by Mayers and Yao, who were the first to make a fundamental observation: that a complete quantum mechanical description of a black-box device, including its state and the measurements it performs on the state, can, in some cases, be inferred from the classicalinput-output measurement statistics it provides — provided one is willing to make one structural assumption on the device: that it has a certain bipartite structure, i.e. is composed of two systems on which certain (unknown) measurements can be independently performed (in their paper Mayers and Yao call this a “conjugate coding source”). The reason I qualify this result of “UFO” is that, although the motivation behind it is clear (obtain a proof of security for BB’84 that would be robust to imperfections in Alice’s photon generation device), the technique has no precedent. It is certainly natural to attempt to certify Alice’s device by performing some kind of tomography, but from there to seeing how a bipartite structure could be leveraged to obtain the result in such strong terms! (Note that, without a priori information on either the state or the measurements being performed, “tomography” cannot lead to any meaningful conclusion.) Certainly the authors must have been influenced by their intense ongoing work in proving security of QKD, and the intuition obtained from the security proofs. Still — quite a feat!

A second respect in which the Mayers and Yao paper can be qualified of “UFO” is that in spite of the conceptual breakthrough that we now recognize it for it was all but ignored following the years after its publication. One reason may be that it was published in the most obscure of conferences (proceedings of FOCS, anyone?). A more reasonable explanation is that the math in the paper is hard to follow: the notation is rather heavy and little intuition is given. From a QKD practitioner’s point of view, this is completely foreign territory; the techniques used bear very little resemblance to the techniques used at the time to prove security of QKD protocols and analyze their performance. Finally the result does suffer from one important drawback, which is that it is non-robust: Mayers and Yao were only able to characterize devices which perfectly reproduce the required correlations, but did not give any statement that tolerates ${\varepsilon}$‘s.

The second birth (in terms of explosion of attention and interest) of device independence can be attributed to the 2005 paper by Barrett, Hardy and Kent. Although I cannot speak for them my impression is that the main motivation for their work was foundational; indeed the central question they ask is the extent to which the correlations that are checked by the users in all QKD protocols are sufficient by themselves: that is, can quantum mechanics be thrown out, and security derived only from these correlations? After all, from the point of view of the users the protocol is completely classical, and its security can also be defined classically, as the maximum probability with which an eavesdropper can “guess” the key that is eventually produced. Thus even if an implementation of the protocol may rely on quantum mechanical devices, its mode of operation, and security, can be formulated purely using statements about classical random variables. Barrett et al. showed that security could indeed be established in a model that is broader than quantum mechanics, assuming only the existence of joint probability distributions describing the input-output behavior of the devices and a potentail adversary, and the assumption that these distributions satisfy the no-signaling principle: outputs of either of the three systems do not depend on inputs chosen by any other party. Their work builds upon a strong line of research in non-locality (e.g. the introduction of the famous “PR boxes” by Popescu and Rohrlich a decade earlier), and it had one immense merit, which helps explain its impact: by completely “stripping out” all the messy details that obscured existing security proofs for QKD they managed to restore the intuition behind the security of QKD (see Ekert’s paper for a beautifully succinct formulation of that intuition), showing how that intuition could be formalized and lead to an actual proof of security — and this, under arguably stronger terms than the existing ones! Thus the link between Bell violation and security was finally expressed in its purest form. The beautiful paper was immediately followed by an intense line of works exploring its consequences, deriving device-independent proofs of security for QKD which achieved better and better rates under weaker and weaker assumptions (for instance on noise tolerance, or on the types of attacks allowed of the adversary). This is a very long story, with much more to say (check out the fourth slide from Valerio’s talk!), but once again it is time to move on.

In the second part of his talk Valerio gave us his four main “concerns” regarding device independence. In Valerio’s opinion these “concerns” are important assumptions that underlie device-independent cryptography — assumptions that are reasonable as they are recognized as such, but could undermine the field if not made explicit and discussed upfront.

The four concerns are the following:

1. No-signaling (the physicist’s concern): It is usually claimed that device-independent protocols are secure “provided the no-signaling condition is satisfied”. But what does this mean? That no information can travel in-between the two devices within the amount of time that elapses between the moment at which the choice of basis is made, and the moment at which the device produces an outcome. There is a fundamental ambiguity here: how is the time of “basis choice” defined? Does it correspond to the moment when the experimenter first sets her mind on a particular setting (whether and how much “liberty” she really has in this choice is a different issue, the so-called “free will loophole”)? Or does it correspond to the moment at which the device is “informed” of that choice, as the experimenter presses the appropriate button? Or is it really only when the information reaches the photon polarizer located inside the device? The same questions can be asked for the time of “outcome production”: when exactly is the outcome determined? When the photon hits the receptor? When the outcome is displayed on the device? When the experimenter stares at it? Once more this is a subtle issue, related to the measurement problem in quantum mechanics and which components of the system are modeled as quantum and potentially in coherent superposition, and which are classical and “decohered”. Going from the one to the other requires an application of the Born rule, and it is not clear at what level this should take place.
2. Input randomness (the information-theorist’s concern): This is an important point, which I think is now well-understood but was certainly not when procedures for device-independent randomness certification were first being discussed. The question is how the randomness present in the input to the devices should be quantified — succinctly, “random to whom”? The crucial distinction is as follows: the input should be random-to-device, whereas the output should be random-to-adversary. That it is possible to transform the one to the other is, to a great extent, the miracle of device-independent randomness certification: even if the inputs are completely known to a third party, and even if that party is quantum and shares entanglement with the devices, as long as the inputs are chosen independently from the devices (very concretely, their joint state, when all other parties are traced out, is in tensor product form), the outputs produced are independent from the adversary (same formalization, but now tracing out the devices). For those interested in this issue I highly recommend having a look at the “equivalence lemma” of Chung, Shi and Wu (Lemmas 2.2 and 5.1 here).
3. Detection loophole (the hacker’s concern): arguably this is the most openly discussed of Valerio’s four concerns, and I won’t get into it too much. The problem is that current state-of-the-art photon receptors have low efficiency, and more often than not fail to detect an incoming photon. Thus in order to claim that the statistics observed in an experiment demonstrate the violation of a Bell inequality one has to resort to the “fair sampling assumption”, which states that no-detection events occur independently from the basis choice provided to the device. For most experiments this assumption seems reasonable, but in the context of device independence, where one often claims that the devices “have been prepared by the adversary”, it is certainly not so. Nevertheless we may hope that continued experimental progress (or different measurement techniques, such as the use of diamonds in the Delft experiment) will eventually eliminate this loophole.
4. (In)determinism (the philosopher’s concern): what if randomness simply doesn’t exist? Two of the three most widely debated attempts at making sense of probabilities in quantum mechanics, the many-worlds interpretation and Bohmian mechanics (the third being the Copenhagen interpretation), posit determinism at some level: in many-worlds, there is determinism for a being who can observe all universes; in Bohmian mechanics there is determinism for a “nonlocal” being who has access to all hidden variables. As Valerio pointed out, however, these need not be serious issues, as what we really care is that the randomness in the outputs produced, or the security of the key, are measured with respect to a being in our universe: indeed it should not be of any concern that the random bits are “determined” by some hidden variables if access to these hidden variables requires prior knowledge about the whole state of the universe.

There are a lot of valuable take-home messages from the talk. Valerio ended his talk by making the observation that while the violation of Bell inequalities implies the presence of randomness the converse is not necessarily true… This in turn prompts me to ask what really are the fundamental assumptions that are necessary and sufficient to guarantee randomness. Device independence, as it is now understood, relies on locality: if it can be assumed that certain events take place at positions that are isolated in space-time then certified fountains of randomness follow. But what if we are unsatisfied with the assumption — how necessary is it really? An interesting variation is to certify randomness based on the contextuality of quantum mechanics; this is a challenging setting both from an experimental and theoretical points of view and I expect it to receive more attention in the future. But there are possibilities that take us in completely different directions; as an obvious example pseudo-randomness offers the possibility of certifying randomness based on computational assumptions. Are we on the right track — what if we were to take the irreducible presence of randomness as an assumption, what consequences would follow? (See Yevgeniy Dodis’ talk, to be discussed later, for some consequences in cryptography).

Posted in Conferences, device independence, Talks | | 2 Comments

## The nature of randomness and fundamental physical limits of secrecy

The entrance to STIAS

I spent the past five weeks at the Stellenbosch Institute for Advanced Study (STIAS), a research institute located in beautiful Stellenbosch, 40km North-West of Cape Town. I was participating in a program (“program” may be an ambitious term for a five-participant endeavour — let’s call it a “project” instead), a project then, organized around the theme of The nature of randomness and fundamental physical limits of secrecy. The project was initiated by Artur Ekert, who formulated its theme and secured the support of STIAS. It was a great experience, and I’d like to write a few posts about it. To set the stage I’ll get us started by writing a few things about Stellenbosch and STIAS. The following posts will be more scientifically focused; in particular I’ll try to cover some of the stimulating talks that we were lucky to hear as part of a three-day workshop that was held a couple weeks ago and formed the organ point of the prorgram — no, the project.

Stellenbosch university

STIAS is a very interesting place. It is a small research institute nested within the large and airy campus of Stellenbosch University, known locally as “Maties” (don’t ask), one of the best universities in Africa. The spirit of STIAS is comparable to that of the IAS in Princeton, although it is far smaller: there is a single two-floor building in which I counted 22 offices; as far as I know aside from the director and administrative staff there is no permanent member. The scientific thrust of the institute is provided by its “fellows”, who typically spend a ~4-month period there (our project was thus shorter than average). Fellows are simply individuals who spend some research time at the institute, either on leave from their own institute or as part of a research project carried out with local collaborators. Concretely the institute provides office space, administrative support, and a stipend to cover local expenses. Much beyond this however STIAS provides a wonderful, relaxed and supportive environment in which to carry out research; a place which could easily top anyone’s list for best location to carry out a sabbatical leave.

Favorite study spot

Concretely, STIAS has two main attractions. The first is the local fauna — the STIAS fellows. During my stay among us could be found two philosophers, two biologists, a writer, two sociologists and three historians; not to mention the small group of eccentric physicists who insisted on dragging a whiteboard back and forth between the coffee machine, the sofas and the patio. The second attraction are the lunches. The institute staff cultivates its own little garden and prepares wonderful quiches, pies, salads, and, obviously, cakes, panna cotta, mousses and other culinary treats daily. Bring the two together and you end up with a unique epicurean experience, with the group of fellows gathering every day around a leisurely but stimulating lunch fuelled as much by the delicious food as by t
he wide-ranging conversation. (Note how the recipe closely parallels, for all I can tell, that of the IAS. Admittedly STIAS doesn’t have the cookies — but it does
offer an extensive wine & cheese reception following the “fellow’s seminar” held every
Thursday).

Lunch at STIAS

To set a broader stage, it may be worth mentioning that, as I am sure its director Hendrik Dreyer would not contradict, the culture of STIAS is not unaffected by Stellenbosch’s status as the heart of South African wineland. Indeed the surrounding hills are planted with the second oldest (1679!) vines of South Africa, right after the nearby Constantia and home of the famous pinotage variety of grapes, a cross-product of Pinot and Hermitage which brings out wines of a spirit that could be compared to the best frmo Burgundy. The city itself is rather quaint, populated by a mix of locals, tourists and students. The latter ensure the presence of a healthy coffee culture — and here I cannot help but mention the blue crane and butterfly, purveyor of my daily morning cortado and a highlight of my stay here. The surrounding landscape, dominated by beautiful mountains culminating in wide plateaus, is filled with wineries competing for the best view, lunch spot, and of course, wine tasting. Having been wine-educated in the best snobbish French tradition I was truly impressed by the quality of local wines; most of those I tried were quite balanced and subtle. Moreover, they are enjoyed as wine should: without pretension, and with good food and company! A pleasure that is not limited by the very affordable prices, with the good bottles being priced similarly to the daily liter of red in France, Italy or Spain, well under 10 euro. (I do realize that the notion of “affordable” is a complicated one here, South Africa being one of the countries in the world with the highest income inequality. At a political level our stay was dominated by intense student protests across South African universities in reaction to a sudden hike in registration fees decided by the government, which eventually had to back down.)

But enough tourism, let’s get to the science. In the next post we’ll start exploring “The nature of randomness and fundamental physical limits of secrecy”!

## Hypercontractivity in quantum information theory

In this post I’d like to briefly mention some connections between log-Sobolev inequalities, and more specifically hypercontractivity, with quantum mechanics and quantum information. To anyone interested I recommend the video from Christopher King’s excellent overview talk at the BIRS workshop. The talk includes a nice high-level discussion of the origins of QFT (roughly 25 minutes in) that I won’t get into here.

Existence and uniqueness of ground states in quantum field theories. The very introduction of (classical!) hypercontractivity, before the name was even coined, was motivated by stability questions in quantum field theory. This approach apparently dates back to a paper by Nelson from 1966, “A quartic interaction in two dimensions” (not available online! If anyone has a copy…my source on this are these notes by Barry Simon; see section 4). Nelson’s motivation was to establish existence and uniqueness of the ground state of an interacting boson field theory that arose as the natural quantization of a certain popular classical field theory. Let ${H}$ be the resulting Hamiltonian. ${H}$ acts on the Hilbert space associated with the state space of a bosonic field with ${n}$ degrees of freedom, the symmetric Fock space ${\mathcal{F}_n}$.

The main technical reason that make bosonic systems much easier to study than fermionic ones is that bosonic creation and annihilation operators commute. In the present context this commutativity manifests itself through the existence of an isomorphism, first pointed out by Bargmann, between ${\mathcal{F}_n}$ and the space ${L^2({\mathbb R}^n,\gamma^n)}$, where ${\gamma}$ is the Gaussian measure on ${n}$. The connection arises by mapping states in ${\mathcal{F}_n}$ to functions on local observables that lie in the algebra generated by the bosonic elementary creation and annihilation operators. Since these operators commute (when they act on distinct particles), the space of such functions can be identified with the space of functions on the joint spectra of the operators; doing this properly (as Bagmann did) leads to ${L^2({\mathbb R}^n,\gamma^n)}$, with the creation and annihilation operators ${c_i}$ and ${c_i^*}$ mapped to ${\frac{\partial}{\partial x_i}}$ and its conjugate ${(2x_i-\frac{\partial}{\partial x_i}}$ respectively.

The simplest “free field” bosonic Hamiltonian is given by the number operator

$\displaystyle H_0 \,=\, \sum_i c_i^* c_i, \ \ \ \ \ (1)$

which under Bargmann’s isomorphism is mapped to ${\sum_i 2x_i\frac{\partial}{\partial x_i} - \Delta}$. Up to some mild scaling I’m not sure I understand, this is precisely the Liouvillian associated with the Ornstein-Uhlenbeck noise operator that we saw in the previous post!

Here is how hypercontractivity comes in. ${H_0}$ is Hermitian, and using Bargmann’s isomorphism we can identify ${e^{-t H_0}}$ with a positive operator on ${L^2({\mathbb R}^n,\gamma^n)}$. Performing this first step can already provide interesting results as, in the more general case of a Hamiltonian ${H}$ acting on a field with infinitely any degrees of freedom, it lets one argue for the existence of a ground state of ${H}$ by applying the (infinite-dimensional extension of the) Perron-Frobenius theorem to the positive operator ${e^{-tH}}$. But now suppose we already know e.g. ${H\geq 0}$, so that ${H}$ does have a ground state (there is a lower bound on its spectrum), and we are interested in the existence of a ground state for ${H+V}$, where ${V}$ is a small perturbation. This scenario arises for instance when building an interacting field theory as the perturbation of a non-interacting theory, the situation that Nelson was originally interested in. The question of existence of a ground state for ${H+V}$ reduces to showing that the operator ${e^{-t(H+V)}}$ is bounded in norm, for some ${t>0}$. Using ${\|e^{A+B}\|\leq \|e^A e^B\|}$ it suffices to bound, for any ${|\varphi\rangle}$,

$\displaystyle \|e^{-tV}e^{-tH}|\varphi\rangle\|_2 \leq \|e^{-t V}\|_4 \|e^{-tH}|\varphi\rangle\|_4,$

where the inequality follows from Hölder. Assuming ${\|e^{-t V}\|_4}$ is bounded (let’s say this is how we measure “small perturbation” — in the particular case of the quantum field Nelson was interested in this is precisely the natural normalization on the perturbation), proving a bound on ${\|e^{-t(H+V)}\|}$ reduces to showing that there is a constant ${C}$ such that

$\displaystyle \|e^{-tH}|\varphi\rangle\|_4\leq C \||\varphi\rangle\|_2 \ \ \ \ \ (2)$

for any ${|\varphi\rangle}$, i.e. ${e^{-tH}}$ is hypercontractive (or at least hyperbounded) in the ${2\rightarrow 4}$ operator norm. This is precisely what Nelson did, for ${H}$ equal to the number operator (1), i.e. he proved an estimate of the form~(2) for the Ornstein-Uhlenbeck process, later obtaining optimal bounds: hypercontractivity was born! (As Barry Simons recalls in his notes the term “hypercontractive” was only coined a little later, in a paper of his and Hoeg-Krohn.)

Fermionic Hamiltonians and non-commutative ${L^p}$ spaces. This gives us motivation for studying hypercontractivity of operators on ${L^2({\mathbb R}^n,\gamma^n)}$: to establish stability estimates for the existence of ground states of bosonic Hamiltonians. But things start to get even more interesting when one considers fermionic Hamiltonians. Fermions are represented on the antisymmetric Fock space, and for that space there is no isomorphism similar to Bargmann’s. His isomorphism is made possible by thinking of states as functions acting on observables; if the observables commute we obtain a well-defined space of functions acting on the joint spectrum of the observables, leading to the identification with a space of functions on ${{\mathbb R}^n}$, where ${n}$ is the number of degrees of freedom of the system.

But the fermionic creation and annihilation operators anti-commute: they can’t be simultaneously diagonalized. So states are naturally functions on the non-commutative algebra ${\mathcal{A}}$ generated by these operators. Apparently Segal was the first to explore this path explicitly, and he used it as motivation to introduce non-commutative integration spaces ${L^p(\mathcal{A})}$. (As an aside, I find it interesting how the physics suggested the creation of such a beautiful mathematical theory. I used to believe that the opposite happened more often — first the mathematicians develop structures, then the physicists find uses for them — but I’m increasingly realizing that historically it’s quite the opposite that tends to happen, and this is especially true in all things quantum mechanical!) For the case of fermions the canonical anti-commutation relations make the algebra generated by the creation and annihilation operators into a Clifford algebra ${\mathcal{C}}$, and the question of existence and uniqueness of ground states now suggests us to explore the hypercontractivity properties of the associated semigroup (third example in the previous post) in ${L^p(\mathcal{C})}$. This approach was carried out in beautiful work of Gross; the hypercontractivity estimate used by Gross was generalized, with optimal bounds, by Carlen and Lieb. (I recommend the introduction to their paper for more details on how hypercontractivity comes in, and Gross’ paper for more details on the correspondance between the original fermionic Hamiltonian and the semigroup for which hyperocntractivity is needed.)

Quantum information theory. As Christopher King discussed in his talk at BIRS, aside from their use in quantum field theory non-commutative spaces and hypercontractivity are playing an increasingly important role in quantum information theory. This direction was pioneered in work of Kastoryano and Temme, who showed that, just as classical hypercontractivity and its connection with log-Sobolev inequalities has proven extremely beneficial for the fine study of convergence properties of Markov semi-groups in the “classical” commutative setting (cf. my previous post), hypercontractivity estimates and log-Sobolev inequalities for quantum channels could lead to much improved bounds (compared with estimates based on the spectral gap) on the mixing time of the associated semi-groups. In particular Kastoryano and Temme analyzed the depolarizing channel and obtained the exact value for the associated log-Sobolev constant, extending the classical results on the Bonami-Beckener noise operator that I described in the previous post. The depolarizing channel is already very important for applications, including the analysis of mixing time of dissipative systems or certain quantum algorithms such as the quantum metropolis algorithm.

For more applications and recent developments (including a few outliers…), see the workshop videos!

Posted in Conferences, Quantum | | 7 Comments

## Workshop on hypercontractivity and log-Sobolev inequalities in quantum information theory

A couple weeks ago I attended the workshop on “Hypercontractivity and Log Sobolev Inequalities in Quantum Information Theory” organized by Patrick Hayden, Christopher King, Ashley Montanaro and Mary Beth Ruskai at the Banff International Research Station (BIRS).

The pavilion in which workshop talks were held

BIRS holds 5-day workshops in mathematics and applications throughout the year, itself a small component of the Banff centre, which describes itself as the “largest arts and creativity incubator on the planet”. Sounds attractive? The centre appears to be remarkably successful, managing to attract not only the brightest mathematicians like us(!), but also a range of talented artists from the world all over, reputable international conferences, and more. The setting certainly doesn’t hurt (see the pictures — and, by the way, that view was from the center’s beautiful library!), nor does the Canadian hospitality — with hearty food provided daily in a cafeteria with impressive views on the surrounding mountains. It was my first visit and in spite of the somewhat coldish weather (“much warmer than usual”, as they say…you bet! When “usual” means -20 Celsius, it’s still not quite clear what improvement you’re getting). I enjoyed the place, which occupies some sort of in-between a seclusive retreat in the mountains (think Dagstuhl) and a bustling research centre (think the Simons Institute in Berkeley). Good balance.

Aerial view of the Banff center (in summer!)

The goal of this post is to give a brief introduction to the (first half of the) topic of the workshop, “Hypercontractivity and Log Sobolev Inequalities”. (I hope to talk a bit about the second half, “in Quantum Information theory”, in the next post.) Log-Sobolev inequalities, be they quantum or classical, meant next to nothing to me before attending — I had heard of both, and was somewhat familiar with the uses of hypercontractivity in Boolean function analysis, but log-Sobolev inequalities were much more mysterious. The workshop was successful in bringing different perspectives to bear on these inequalities, their relationship, and applications (videos available online!). For my own benefit I’ll write down some very basic facts on these inequalities, where they come from, and what is the motivation for studying them. What I write will undoubtedly appear extremely naive to even moderate experts, and for much more on the subject I highly recommend a survey by Diaconis and Saloff-Coste, the lecture notes by Guionnet and Zegarlinski, and the (quals-fueled) survey by Biswal. In particular I refer the reader to these resources for proofs of the unproven assertions made in this post (any errors introduced being mine of course), as well as appropriate credit and references for the results mentioned below.

Markov semi-groups. The fundamental objects underlying both the hypercontractive and log-Sobolev inequalities are a space of functions ${\mathcal{F}}$ (think continuous real-valued functions defined on a nice topological space, ${\mathbb{R}^d}$, or your favorite graph) together with, first a Markov semi-group ${(P_t)}$ defined on that space, and second, a measure ${\mu}$ on the same space that is invariant under ${P_t}$, i.e.

$\displaystyle \mu(P_t f)\,=\,\mu(f) \ \ \ \ \ (1)$

for all ${f\in\mathcal{F}}$. Now, even the notion of a “Markov semi-group” can be intimidating (to me), but it really corresponds to the most simple process. A Markov semi-group is a family of maps ${(P_t):\mathcal{F}\rightarrow\mathcal{F}}$ indexed by the positive reals ${t\in\mathbb{R}_+}$ that transforms functions in a way that applying the map for time ${t}$, and then time ${s}$, is equivalent to applying it for time ${t+s}$: ${P_{t+s}=P_tP_s}$. Simple examples are e.g. any random walk on the domain of ${\mathcal{F}}$, with ${P_t}$ denoting averaging over inputs obtained by taking a “${t}$-step” walk, or a diffusion process on real functions, such as convolving with a Gaussian of variance ${t}$ or evolving ${f}$ under the heat equation for time ${t}$. Some concrete examples that I’ll refer to throughout:

Examples

Random walks on regular graphs. Let ${G=(V,E)}$ be a regular undirected graph with degree ${d}$, and ${A}$ its normalized adjacency matrix: ${A_{i,j} = 1/d}$ if ${\{i,j\}\in E}$ and ${A_{i,j}=0}$ otherwise. Let ${\mathcal{F} = {\mathbb R}^V}$. If ${f\in \mathcal{F}}$ and ${v\in V}$, then ${(Gf)(v) = E_{u\sim v}[f(u)]}$, where the expectation is taken over a random neighbor ${u}$ of ${v}$ in ${G}$. If ${f}$ is a distribution over ${V}$, then after ${t}$ steps of the discrete-time random walk on ${G}$ the distribution is given by ${G^t f}$. The standard way to make this into a continuous-time random walk is to wait at a vertex ${v}$ for an amount of time distributed as an exponential with mean ${1}$, and then move to a random neighbor. In other words, the number of steps taken in time ${t}$ is distributed as a Poisson point process ${N(t)}$ with rate ${1}$. Starting in an initial distribution ${f}$, the distribution after time ${t}$ is given by

$\displaystyle P_t f = \sum_{k=0}^\infty \Pr(N(t) = k) G^k f = \sum_{k=0}^\infty \frac{t^k}{k!} e^{-t}G^k f = e^{t(G-I)}f.$

The same construction extends to any irreducible Markov chain ${K(x,y)\geq 0}$, ${\sum_y K(x,y)=1}$, for which the associated semigroup is ${P_t f = e^{t(K-I)}f}$.

Ornstein-Uhlenbeck process. Here we take ${\mathcal{F}}$ the set of continuous functions from ${\mathbb{R}}$ to itself, and

$\displaystyle P_tf(x) = \int_{\mathbb{R}} f(e^{-t} x + \sqrt{1-e^{-2t}} y) \frac{e^{-y^2}/2}{\sqrt{2\pi}} dy.$

This describes the evolution of a particle submitted to Brownian motion under the influence of friction. It is a continuous, Gaussian analogue of the Bonami-Beckner noise operator ${T_\rho}$ on the hypercube that crops up in computer science applications, ${T_\rho f(x) = E_{y\sim x} [f(y)]}$ where ${y}$ is obtained from ${x\in\{0,1\}^n}$ by flipping each of its coordinates independently with probability ${p=1-\rho}$. Time-independent evolution. Schrödinger’s equation dictates that a quantum state ${\rho}$ suject to a fixed Hamiltonian ${H}$ will evolve as ${\rho\mapsto \rho(t) = e^{-iHt}\rho e^{iHt}}$ (where I set ${\hbar =1}$ for simplicity). It is easy to check that ${P_t:\rho\mapsto \rho(t)}$ is a Markov semi-group. Note however this semi-group acts on a different kind of space than the previous examples: ${\rho}$ is an operator on a Hilbert space ${\mathcal{H}}$, rather than a function on a certain domain. Dealing with such examples requires non-commutative analogues of the classical theory to which I hope to return in the next post.

Ergodicity. Now that we have a cast of players, let’s see what kind of questions we’d like to answer about them. The most basic question is that of convergence: given a Markov semi-group ${(P_t)}$ defined on a function space ${\mathcal{F}}$, does the sequence ${(P_t f)}$ converge, and if so at what rate? The latter part of this question requires one to specify a distance measure on ${\mathcal{F}}$. For this we will use the ${L^p}$ norms, defined as ${\|f\|_p = (\int |f|^p d\mu)^{1/p}}$ for some measure ${\mu}$ on the domain of functions in ${\mathcal{F}}$. Important special cases are ${p=1,p=2}$ and ${p=\infty}$. Here the measure ${\mu}$ will always be an invariant measure for ${P_t}$, i.e. it satisfies (1). There does not always exist an invariant measure, and there can be more than one; in practice though ${(P_t)}$ and ${\mu}$ are often defined together and for the purposes of this post I will assume there is a unique invariant ${\mu}$ (for the example of a semi-group defined from a Markov chain, this corresponds to the chain being irreducible, and ${\mu}$ is its unique stationary measure). The question then becomes, does ${(P_t f)}$ converge towards its expectation ${\mu(P_t f) = \mu(f)}$ (by invariance), and if so at what speed? This is precisely the question that log-Sobolev inequalities will let us answer.

One says that ${(P_t f)}$ is ${p}$-ergodic if ${\int |P_t f-\mu(f)|^p d\mu \rightarrow_{t\rightarrow\infty} 0}$ for all ${f\in\mathcal{F}}$, i.e. convergence holds in the ${p}$-norm. Note this is equivalent to requiring ${\|P_t-\mu\|_{p,\infty} \rightarrow_{t\rightarrow \infty} 0}$, where ${\|\cdot\|_{p,q}}$ denotes the operator norm from ${L^p}$ to ${L^q}$. Since ${\|f\|_p \leq \|f\|_{p'}}$ for all ${f}$ and any ${p\leq p'}$, the notion of ${p}$-ergodicity becomes stronger as ${p}$ increases. We will see three different ways to obtain bounds on the rate of convergence: the spectral gap inequality, equivalent to ${2}$-ergodicity; Sobolev inequalities and ultra-contractivity, which imply ${\infty}$-ergodicity, and log-Sobolev inequalities and hypercontractivity, which are related to ${p}$-ergodicity for intermediate ${p}$.

Before getting started there is one last quantity associated with any Markov semi-group that we need to introduce, the Liouvillian ${\mathcal{L}}$. The Liouvillian is defined through its action on functions in ${\mathcal{F}}$ as

$\displaystyle \mathcal{L}f = \lim_{t\rightarrow 0^+}\, \frac{P_t-Id}{t}f.$

Equivalently, the Liouvillian can be defined through the functional equation ${\partial_t(P_t f) = \mathcal{L} (P_t f)}$: it plays the role of infinitesimal generator for the semi-group ${(P_t)}$. For the example of the random walk the Liouvillian is simply ${\mathcal{L} = G-I}$. For the Ornstein-Uhlenbeck process one finds ${\mathcal{L}(f)(x)=f''(x)-xf'(x)}$, and for the Bonami-Beckner noise operator ${\mathcal{L} = (1-\rho) (H-I)}$ where ${H}$ is the adjacency matrix of the hypercube. Finally in the example of (time-independent) Schrödinger evolution we have ${\partial_t(\rho(t)) = i [\rho(t),H]}$, so that the Liouvillian is ${\mathcal{L}(\rho) = -i[H,\rho]}$.

Spectral gap inequality and ${L^2}$-ergodicity. A first family of inequalities are spectral gap inequalities, which turn out to be equivalent to ${L^2}$-ergodicity. The spectral gap of an invariant measure ${\mu}$ for the semigroup generated by ${\mathcal{L}}$ is defined as the largest ${\lambda>0}$ (if it exists) such that

$\displaystyle \mu\big(f (-\mathcal{L}) f\big)\,\geq\,\lambda\, \mu(f^2) \ \ \ \ \ (2)$

holds for all ${f}$ such that ${\mu(f)=0}$. In the example of the Markov chain, ${\mathcal{L} = K-I}$ and the inequality can be rewritten as ${E_{x\sim y}[ |f(x)-f(y)|^2] \geq \lambda E_{x,y}[|f(x)-f(y)|^2]}$, where the first expectation is taken over ${x\sim \mu}$ and ${y\sim K(x,\cdot)}$, and the second expectation is over independent ${x,y\sim \mu}$. In particular for random walks on graphs we recover the standard definition of the Laplacian ${L=-\mathcal{L}}$.

Applying (2) to ${P_t f}$ we see that it is equivalent to requiring ${\partial_t \mu((P_tf)^2) \leq -2\lambda \mu((P_tf)^2)}$, which using ${P_0=Id}$ implies the bound ${\mu((P_t f)^2) \leq e^{-2\lambda t} \mu(f^2)}$ for all ${f}$ such that ${\mu(f)=0}$. Applying the appropriate translation the same inequality can be rewritten as

$\displaystyle \mu((P_t f - \mu(f))^2) \leq e^{-2\lambda t} \mu((f-\mu(f))^2),$

which is precisely a ${L^2}$ convergence bound. Conversely, by differentiating the above inequality we recover (2), so that the two are equivalent.

Rather than ${L^2}$ convergence one would often like to derive bounds on convergence in ${L^1}$, or total variation, distance. One way to do this is to use Hölder’s inequality as

$\displaystyle \begin{array}{rcl} \|P_t-\mu\|_{1,\infty}^2&=&\sup_f \|P_t f - \mu(f)\|_1^2 \\ &\leq& \Big(\sup_f \frac{1}{\mu(f)}\Big)\sup_f \mu((P_t f - \mu(f))^2)\\ &\leq& \Big(\sup_f \frac{1}{\mu(f)}\Big)e^{-2\lambda t}. \end{array}$

Thus we get an exponentially decaying bound, but the prefactor could be very large; for instance in the case of a random walk on a graph it would typically equal the number of vertices. Our goal is to do better by using log-Sobolev inequalities. But before investigating those, let’s first see what a Sobolev (without the log!) inequality is.

Sobolev inequalities and ${L^\infty}$-ergodicity. Sobolev inequalities are the “high-${p}$” generalizations of the spectral gap inequality. ${\mu}$ satisfies a Sobolev inequality if there is a ${p>2}$ and constants ${a>0}$, ${b\geq 0}$ such that for all ${f}$ with ${\mu(f)=0}$,

$\displaystyle \|f\|_p^2 \leq a \mu(f (-\mathcal{L}) f) + b \mu(f^2).$

Sobolev inequalities imply the following form of ergodicity:

$\displaystyle \|P_t f - \mu(f)\|_\infty \leq c\big(\frac{a}{2et}+b\big)^\gamma \|f-\mu(f)\|_1,$

where ${c,\gamma}$ are constants depending on ${p}$ only (and ${c\rightarrow \infty}$ as ${p\rightarrow 2}$). If one can take ${b=0}$ (which is often the case) directly gives a polynomial bound on ${\|P_t - \mu\|_{1,\infty}}$ which is incomparable to the one we derived from the spectral gap inequality: on the one hand it does not have the large prefactor, but on the other it only gives polynomial decay in ${t}$. This form of contractivity, in the ${1\rightarrow\infty}$ operator norm, is called “ultracontractivity”.

An important drawback of Sobolev inequalities is that they do not “tensor” well, leading to dimension-dependent bounds. What this means is that if ${P_t}$ acts on functions of a single variable in a certain way, and we let ${P_t^{\otimes n}}$ act on ${n}$-variable functions in the natural way, then the resulting semigroup will in general not satisfy a Sobolev inequality with constants ${a,b}$ independent of ${n}$, even if ${P_t}$ itself did. This contrasts with the spectral gap inequality: it is not hard to see that the spectral gap of ${P_t^{\otimes n}}$ is the same as that of ${P_t}$. This makes Sobolev inequalities particularly hard to derive in the infinite-dimensional settings that originally motivated the introduction of log-Sobolev inequalities (I hope to return to this in the next post).

Log-Sobolev inequalities and ${L^p}$-ergodicity. We finally arrive at log-Sobolev inequalities. We’ll say that ${\mu}$ satisfies a log-Sobolev inequality if there exists constants ${c>0}$, ${d\geq 0}$ such that for all non-negative ${f}$,

$\displaystyle \mu\Big(f^2 \log \frac{f^2}{\|f\|_2^2}\Big) \leq c\mu(f (-\mathcal{L}) f) + d \mu(f^2). \ \ \ \ \ (3)$

Often it is required that ${d=0}$, and the log-Sobolev constant is ${\alpha=1/c}$ where ${c}$ is the smallest positive real for which (3) holds for all ${f}$. A log-Sobolev inequality can be understood as a “weakest” Sobolev inequality, obtained as ${p\rightarrow 2}$; indeed, one can show that if ${\mu}$ satisfies a Sobolev inequality then it also satisfies a log-Sobolev inequality. Gross related the log-Sobolev inequality to hypercontractivity by showing that it is equivalent to the statement that for all ${t>0}$ and all ${2\leq q \leq 1 + e^{4\alpha t} }$ the bound

$\displaystyle \|P_t-\mu\|_{2,q} \leq 1$

holds: ${(P_t)}$ is hypercontractive for the ${2\rightarrow q}$ operator norm if and only if the log-Sobolev constant satisfies ${\alpha \geq \ln(q-1/(4t))}$. One can also show that ${\alpha \leq \lambda/2}$ always holds. But a direct use of the log-Sobolev inequality can give us a better convergence bound than the one we derived from the spectral gap: a more careful use of Hölder’s inequality than the one we made earlier easily leads to the bound

$\displaystyle \|P_t-\mu\|_{1,\infty}^2\,\leq\, 2\log\Big(\sup_f \frac{1}{\mu(f)}\Big) e^{-2\alpha t},$

a marked improvement over the prefactor in the bound derived from ${\lambda}$ alone (provided ${\alpha}$ and ${\lambda}$ are of the same order, which turns out to often be the case).

Aside from this improvement a key point making log-Sobolev inequalities so useful is that, contrary to Sobolev inequalities, they tensor perfectly: if ${\mu_i}$ satisfies a log-Sobolev inequality with coefficients ${(c_i,d_i)}$ for ${i=1,2}$ then ${\mu_1 \otimes \mu_2}$ satisfies one with coefficients ${c=\max(c_1,c_2)}$ and ${d=d_1+d_2}$. Thus for example the standard approach for proving hypercontractivity for the Bonami noise operator on the ${n}$-dimensional hypercube: first, prove the “two-point inequality” for the single-variable case, ${\|T_\rho\|_{p,q} \leq 1}$ for any ${\rho \leq \sqrt{(p-1)/(q-1)}}$. This by itself is a nontrivial statement, but it boils down to a direct calculation. Then use Gross’ equivalence to deduce that ${T_\rho}$ satisfies a log-Sobolev inequality with the appropriate constant ${\alpha}$. Deduce that the ${n}$-dimensional noise operator satisfies a log-Sobolev inequality with the same value of ${\alpha}$, and conclude hypercontractivity of ${T_\rho}$ for general ${n}$: ${\|T_\rho\|_{2,q}\leq 1}$ for all ${2\leq q \leq 1+\rho^{-2}}$.

I hope this got you interested in the topic. I wish I had thought of browsing through the surveys mentioned at the start of the post before the BIRS workshop…Have a look, they’re quite interesting. Next time I’ll try to recall why the second part of the workshop title, “and quantum information theory”…

Posted in Conferences, Talks | Tagged , , | 2 Comments