## Foundations of randomness, remainder of Day 1

Today I will cover the remaining talks from the first day of the workshop. Valerio’s opening talk was followed by a stimulating blackboard talk by Renato Renner, who asked the question “Is the existence of randomness an axiom of quantum physics?” Randomness is usually viewed as entering quantum mechanics through the Born rule, which is stated as an axiom describing the outcome distribution of any given measurement on a quantum state; there is no standard derivation of the rule from other axioms. This is deeply unsatisfying, as the rule appears completely arbitrary, and indeed many attempts have been made at deriving it from more basic structural constraints, such as the Hilbert space formalism underlying Gleason’s theorem, or assumptions based on decision theory (Deutsch) or logic (Finkelstein).

Renato and his student Daniela Frauchiger have a proposal whose main thrust consists in demarcating what could be called the “physical substrate” of a given theory (such as quantum mechanics) from its “information-theoretic substrate”. In short, the physical substrate of a theory governs which experiments are “physical”, or possible; it is a completely deterministic set of rules. The information-theoretic substrate provides an additional layer which considers assignments of probabilities to different experiments allowed by the physical substrate, and specifies rules that these probabilities should obey.

Let me try to give a little more detail, as much as I can recall from the talk. Let’s call the framework suggested by Renato the “FR proposal”. The first component in the proposal is a definition of the physical substrate of a theory. In the FR proposal a theory bears on objects called stories: a story is any finite statement describing an experiment, where “experiment” should be interpreted very loosely: essentially, any sequence of events with a set-up, evolution, and outcome. For instance, a story could be that a pendulum is initialized in a certain position, left to swing freely, and after time ${t}$ is observed and found to be in a certain position. Or it could specify that a quantum particle is initialized in a certain state, say ${\ket{+}=\frac{1}{\sqrt{2}}(\ket{0}+\ket{1})}$, then measured in the computational basis ${\{\ket{0},\ket{1}\}}$, and that the outcome ${\ket{0}}$ is observed. Thus a story refers to a physical system and makes claims about observable quantities of the system.

Thus the physical substrate of the theory is a set of rules that completely determine which stories are “physical”, i.e. allowed by the theory, and which are not. For instance, the second story described above is a valid story of quantum mechanics: it is certainly possible that the outcome ${\ket{0}}$ is obtained when the measurement described is performed. On the other hand quantum mechanics would rule out a story such as “the state ${\ket{0}}$ is measured in the computational basis and the outcome ${\ket{1}}$ is obtained”: this is simply not allowed. Note that up to this point the theory makes no claim at all on the “probability” of a story occurring; it just states whether the story is possible or not.

So much for the physical substrate. How do probabilities come in? This is the goal of the information-theoretic substrate of the theory. To introduce probabilities we consider mappings from stories to ${[0,1]}$. Any such mapping can be thought of as a measure of an observer’s state of knowledge, or uncertainty, about any particular story (much more on this and other interpretations of probabilities will be discussed in later talks, such as Carl Hoefer’s coming up next). A priori the mapping could be arbitrary, and the role of the information-theoretic substrate of the theory is to specify which mappings are allowed, or “reasonable”. For instance, in the measurement example from earlier most observers would assign that particular story a probability of ${50\%}$, which is what the Born rule predicts. But without taking the rule for granted, on what basis could we assert that ${50\%}$ is the only “reasonable” probability that can be assigned to the story — why not ${10\%}$, or even ${100\%}$?

The main claim that Renato put forward is that the Born rule can be derived as a consequence of the following two broad postulates: (I apologize for the description of the postulates being rather vague; there are points about the formulation that I am unclear about myself!)

1. The repetition postulate: experiments can be repeated, and if so lead to the same outcomes.
2. The symmetry postulate: certain experiments have symmetries under which the outcome of the experiment is invariant. For example, an experiment which involves flipping three different coins in sequence and reporting the majority outcome is not affected by the order in which the coins are flipped.
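
The three-coin example in the symmetry postulate can be checked mechanically. The following sketch is my own toy illustration (not from the talk): for every possible assignment of outcomes to the three coins, the reported majority is the same under all six orders in which the coins could be flipped.

```python
from itertools import product, permutations

def majority(bits):
    """Majority vote over three coin outcomes (1 = heads)."""
    return 1 if sum(bits) >= 2 else 0

# For every outcome assignment, the majority is invariant under flip order.
for outcomes in product([0, 1], repeat=3):
    results = {majority([outcomes[i] for i in order])
               for order in permutations(range(3))}
    assert len(results) == 1
print("majority is invariant under flip order")
```

Trivial as it is, this is exactly the kind of invariance the postulate asks an assignment of probabilities to respect.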

The idea is that these postulates should be much easier to accept than the Born rule: certainly, if we want to say anything meaningful about certain probabilities being reasonable or not we ought to enforce some form of compatibility rules, and the two above sound rather reasonable themselves. The first seems a prerequisite to even make sense of probabilities, and the second follows the tradition started by de Finetti of basing the emergence of probability on indistinguishability of the outcomes of an experiment under certain symmetries of the experiment.

Unfortunately there does not yet seem to be a preprint available that describes these ideas in detail; hopefully it is coming soon. I’m quite curious to see how the proposal, that I find rather appealing, can be formalized.

After Renato’s talk — and an excellent lunch — we were treated to an intriguing talk on “Chance laws and quantum randomness” by Carl Hoefer. The talk addressed questions that we computer scientists or physicists rarely discuss openly (and indeed many of us might reserve for dreamy Sunday nights or inebriated Friday evenings). In Carl’s own words: “When we ascribe a primitive (irreducible) chance propensity, or postulate intrinsically random/chancy fundamental laws of nature, do we know what we’re saying?”. Or, less poetically put (but still in Carl’s words — Carl, thanks for making it accessible to us!): What does a claim such as “${\Pr(\text{spin-z-up} | \text{spin-x-up-earlier}) = 0.5}$” mean, and how is it different from “${\Pr(\text{spin-z-up} | \text{spin-x-up-earlier}) = 0.7}$”?

Give the question a chance to sink in — a good answer is not that obvious! It already arose in Renato’s talk, where probabilities were introduced as the reflection of an agent’s subjective belief. Well, this is the first of a number of possible answers to the question that Carl took head on and debunked, one slide at a time. (I am not sure how much I agree with Carl’s arguments here, but at least he showed us that there was certainly room for debate!) Other possible answers considered by Carl include probability as the reflection of an objective frequency for the occurrence of a certain event, or simply a “primitive fact” that would require no explanation. Needless to say, he found none of these interpretations satisfactory. It’s a good exercise to find arguments against each of them; see Carl’s slides for elements of an answer (see also the article on Interpretations of probability in the Stanford Encyclopedia of Philosophy for an extensive discussion).

A Galton board

So what is the speaker’s suggestion? I doubt my summary will do it much justice, but here goes: according to Carl, probabilities, or rather “chance laws”, arise as a form of statistical averaging; in particular probabilistic statements can be meaningful irrespective of whether nature is intrinsically deterministic or irreducibly probabilistic. Specifically, Carl observes that many physical processes tend to generate a distribution on their observable outcomes that is robust to initial conditions. For instance, a Galton board will lead to the same Gaussian distribution for the location of the ball, irrespective of the way it is thrown in (except perhaps for an exceptional set of initial conditions of measure zero). Whether the initial conditions are chosen deterministically or probabilistically we can meaningfully state that “there is an ${x\%}$ chance that the ball will exit the board at position ${z}$”.
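
This robustness can be illustrated with a toy simulation of my own devising (not Carl’s actual example): replace the board’s chaotic bounce dynamics with the doubling map, so that each bounce is a fully deterministic function of the ball’s initial position, and sweep a fine grid of initial conditions. The exit histogram comes out binomial — approximately Gaussian — even though no randomness is ever injected.

```python
import math

N_ROWS = 12           # rows of pegs on the board
N_BALLS = 4096 * 50   # one ball per point of a fine grid of initial positions

def exit_bin(x0, n_rows=N_ROWS):
    """Trace one ball deterministically: at each row it bounces right iff the
    current binary digit of its initial position is 1, digits being read off
    with the doubling map (a toy stand-in for the board's chaotic dynamics)."""
    x, rights = x0, 0
    for _ in range(n_rows):
        if x >= 0.5:
            rights += 1
        x = (2.0 * x) % 1.0   # shift to the next binary digit
    return rights             # exit position = number of right bounces

counts = [0] * (N_ROWS + 1)
for i in range(N_BALLS):
    counts[exit_bin((i + 0.5) / N_BALLS)] += 1

# The exit histogram matches Binomial(12, 1/2), i.e. approximately Gaussian,
# even though every individual trajectory is fully deterministic.
for k, c in enumerate(counts):
    print(k, c / N_BALLS, round(math.comb(N_ROWS, k) / 2 ** N_ROWS, 4))
```

The grid of initial positions is structured, not random; the Gaussian shape emerges purely from the coarse-grained statistics of the deterministic dynamics, which is (as I understand it) the heart of Carl’s point.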

Carl concluded his talk by tying it to the Bohmian interpretation of quantum mechanics — almost, though perhaps not quite, going as far as to frame deterministic Bohmian mechanics as the most reasonable way to interpret probability in quantum mechanics. I found the discussion fascinating: I had never considered Bohmian mechanics (or, for that matter, the question of the origin of probabilities in quantum mechanics) seriously, and Carl made it sound, if not natural, at least not wholly unreasonable! In particular he made a convincing case for the possibility of ascribing meaning to “probability” in a fully deterministic world, without having to resort to the need for “subjective beliefs” or related rabbits such as high sensitivity to initial conditions — indeed, here it is quite the opposite.

The last talk of the day was delivered on the whiteboard, by Marcin Pawlowski. Marcin gave an intuitive derivation of the quantum bound on device-independent randomness generation from the CHSH inequality, the famous bound (at least for those of us versed in these things… see Appendix A.1 here for a derivation)

$\displaystyle H_\infty(A|X)\geq 1-\log_2\Big(1+\sqrt{2-\frac{I^2}{4}}\Big) ,$

based solely on a monogamy bound due to Toner. Toner introduced a “two-out-of-three” variant of the CHSH game in which there are three players, Alice, Bob and Charlie, but the game is played either between Alice and Bob, or Alice and Charlie, chosen uniformly at random by the referee. Although in principle the maximum success probability of the players in this game could be as high as that of the two-player CHSH (since in the end only one instance of the game is played, the third player being ignored), Toner showed that the fact that Alice did not know a priori which of Bob and Charlie she was going to play with places strong constraints on her ability to generate non-local correlations with at least one of them. Quantitatively, he showed that the players’ maximum success probability in this game is precisely ${3/4}$, whether the players are allowed entanglement or not: the presence of the third player completely eliminates the quantum advantage that can be gained in the usual two-player CHSH game!
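
To get a feel for the min-entropy bound displayed above, it is worth evaluating it at the two extreme CHSH values (a quick numerical check of my own, taking the formula as stated): at the classical value ${I=2}$ no randomness is certified, while at the Tsirelson bound ${I=2\sqrt{2}}$ a full bit is.

```python
import math

def min_entropy_bound(I):
    """Certified min-entropy of Alice's output as a function of the CHSH
    value I (between 2, the classical bound, and 2*sqrt(2), Tsirelson's)."""
    # max(..., 0.0) clamps the tiny negative value that floating-point
    # rounding produces when I is exactly 2*sqrt(2)
    return 1 - math.log2(1 + math.sqrt(max(2 - I ** 2 / 4, 0.0)))

print(min_entropy_bound(2.0))               # 0.0: no certified randomness
print(min_entropy_bound(2 * math.sqrt(2)))  # 1.0: one full certified bit
```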

The derivation presented by Marcin is simple but enlightening (good exercise!). It could be considered the “proof from the book” for this problem, as it makes the intuition fully explicit: the reason Alice’s outcomes are unpredictable, from the point of view of Eve (=Charlie), is that they are strongly correlated with Bob’s outcomes (up to the Tsirelson bound); monogamy implies that correlations with Eve’s outcomes could not simultaneously be as strong.

Typical view from a Stellenbosch winery

That’s it for Day 1 – time to relax, enjoy a good bottle of local wine, and get ready for the second day!


## Foundations of randomness, Day 1: Scarani

As mentioned in the previous post, a highlight of the project I participated in over the past five weeks was the three-day workshop that we organized at STIAS towards the end of October.

The purpose of the workshop was to bring together a small group of computer scientists, physicists, mathematicians and philosophers around the broad theme of “the nature of randomness”. Each of our respective fields brings its own angle on the problem, with a specific language for formulating associated questions, judging which are most important, and a set of techniques to approach them. In computer science we use entropy, complexity theory and pseudo-randomness; physicists have access to randomness through the fundamental laws of nature; philosophers introduce subtle distinctions between probability, chance, and intrinsic or subjective randomness. The presence of any such framework is a prerequisite to progress: it enables us to formulate research problems, develop appropriate methodologies, and obtain results. But any framework also sets implicit boundaries and runs a risk of turning into a dogma, according to which “valid” or “important” questions and results are judged. From my perspective, if there is any respect in which I would dare deem the workshop a success it is precisely in the barriers it helped bring down. The wide-ranging talks and participants forced me to take a broad perspective on randomness and question some of the most deeply ingrained “certainties” that underlie my research (such as a firm — though baseless — belief that the Copenhagen interpretation is the only reasonable interpretation of quantum mechanics worthy of a practicing scientist). This questioning was only made possible thanks to the outstanding effort put in by all participants to deliver both accessible and stimulating talks as well as to actively engage with each other; I cannot thank them enough for making the workshop such an enlightening and interactive experience.

It seems impossible to do any justice to the ideas developed in each of the talks by discussing them here, and in doing so I am bound to misrepresent many of the speakers’ thoughts. For my own benefit I will still make an attempt at listing a few “take-away” messages I wish to keep; obviously all idiocies are mine, while any remaining insight should be attributed to the proper speaker.

The opening talk was delivered by Valerio Scarani, who started us off with a personal take on the history, and future, of “Quantum randomness and the device-independent claim”. The “irreducible presence” (much more on this “irreducibility” will be discussed in later talks!) of randomness in quantum mechanics is usually attributed to the Born rule, which states that the probability of obtaining outcome ${x}$ when measuring state ${|\psi\rangle}$ in a basis containing a unit vector ${|e_x\rangle}$ labeled by ${x}$ is given by the squared modulus of the inner product, ${|\langle e_x |\psi\rangle|^2}$. Valerio pointed out the (well-known, but not to me) fact that the introduction of this precise quantitative formulation of the rule came almost as an afterthought to Born; indeed the use of the square appears as a footnote to the main text in Born’s paper.

Extract from Born’s paper
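
To make the rule concrete: for the state ${(|0\rangle+|1\rangle)/\sqrt{2}}$ measured in the computational basis, the squared-modulus rule assigns each outcome probability one half. Here is a plain-Python sketch of my own (purely for illustration, nothing from the talk):

```python
from math import sqrt

plus = [1 / sqrt(2), 1 / sqrt(2)]              # |+> = (|0> + |1>)/sqrt(2)
basis = {"0": [1.0, 0.0], "1": [0.0, 1.0]}     # computational basis vectors

def inner(u, v):
    """<u|v> for real amplitude vectors."""
    return sum(a * b for a, b in zip(u, v))

# Born rule: Pr(outcome x) = |<e_x|psi>|^2
probs = {x: inner(e, plus) ** 2 for x, e in basis.items()}
print(probs)   # both outcomes get probability ~0.5
```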

(Incidentally, see this paper by Aaronson for interesting consequences of instead using a ${p}$-th norm for ${p\neq 2}$ in defining the rule.) I was somewhat puzzled by this: Born’s paper is dated 1926, a time when quantum mechanics was already a well-established theory, on which scores of physicists relied to do what physicists do — make calculations, formulate predictions, perform experiments, and repeat! But how can one make predictions if there is no rule governing which outcomes are to be obtained? My confusion likely stems from a basic misrepresentation of how physicists work (as may already be clear from the preceding description!); after clarifications I can offer two explanations. First, quantum mechanics was mostly used to compute properties of eigenstates of a given Hamiltonian, for which observables that commute with the Hamiltonian always take on determined values, so that no probabilities are needed. Second, it is quite rarely the case (or at least was in the 1930s) that the outcome distribution of a fine-grained measurement on a single quantum mechanical system is accessible to experiment; instead only statistical properties of larger systems — such as the energy — are accessible, and the law giving the expectation value taken by a particular observable, ${\langle\psi|O|\psi\rangle}$, can be stated and used without bothering as to what it would mean to “measure in the eigenbasis of ${O}$” and obtain this or that measurement outcome: only the average has any experimental relevance. In any case, to close this parenthesis it is interesting to note that even as we now think of probabilities, and the “squared-modulus rule”, as what makes quantum mechanics distinctive, the success of the theory initially had (and probably still has) little to do with it. In fact, it may have had quite the opposite effect, and induced a serious amount of confusion…

Indeed Einstein’s immediate reaction to Born’s paper is immortalized in his famous letter to Born from November 1926:

Quantum mechanics calls for a great deal of respect. But some inner voice tells me that this is not the right track. The theory offers a lot, but it hardly brings us closer to the Old One’s secret. For my part at least, I am convinced He doesn’t throw dice.

And thus started one of the most obscure periods in human thought… A famous anecdote along the route (though again, this was news to me) is given by von Neumann’s valiant attempt at establishing a rigorous “no-go” theorem for “hidden variables”, thereby proving the unavoidable resort to some form of uncertainty. Unfortunately it turned out that von Neumann’s theorem placed unreasonable assumptions on the hidden variable models to which it applied, making it all but inapplicable. Pressing ahead, skipping over Einstein, Podolsky and Rosen’s unfortunate account of (non-)locality in quantum mechanics, it is only with Bell’s 1964 paper demonstrating that the existence of local hidden variable theories for quantum mechanics could be experimentally tested that the debate was finally placed on firm — or, at least, in principle decidable — grounds. Indeed the major contribution of Bell’s work was to establish observable consequences for the existence of any hidden variable model for quantum mechanics — under the important assumption, to be further discussed in the talk by Ruediger Schack, that the model be local. Bell’s result had an immediate impact, prompting one of the most interesting experimental challenges in the history of physics. This is because there was no consensus as to what the outcome of the experiment “should” be, and proponents of a breakdown of quantum mechanics (at least in the regime concerned in Bell’s test and its subsequent refinement by Clauser et al.) stood on an equal footing with advocates of fundamental laws of nature that are non-local and probabilistic (of course either camp was dwarfed by the much larger number of physicists who simply considered the question irrelevant for the practice of their work… luckily for the progress of science, which would not have benefited much from stalling for two decades). 
And indeed early experiments went both ways, with “conclusive” results pointing in either direction obtained on the West and later East coasts of the US. The quest was brought to a climactic conclusion with Aspect’s experiments in Orsay in 1981-82, which conclusively handed the podium over to the Bell camp (how conclusively, however, you can judge from the amount of — justified — attention generated by the most recent experiments in Delft).

The history is fascinating (can anyone suggest a good, entertaining, informed and opinionated account?), but we should move on. Following the speaker’s lead, let me scoot ahead to modern times and the birth of device independence. This is usually attributed to the 1998 “UFO” by Mayers and Yao, who were the first to make a fundamental observation: that a complete quantum mechanical description of a black-box device, including its state and the measurements it performs on the state, can, in some cases, be inferred from the classical input-output measurement statistics it provides — provided one is willing to make one structural assumption on the device: that it has a certain bipartite structure, i.e. is composed of two systems on which certain (unknown) measurements can be independently performed (in their paper Mayers and Yao call this a “conjugate coding source”). The reason I qualify this result as a “UFO” is that, although the motivation behind it is clear (obtain a proof of security for BB’84 that would be robust to imperfections in Alice’s photon generation device), the technique has no precedent. It is certainly natural to attempt to certify Alice’s device by performing some kind of tomography, but from there to seeing how a bipartite structure could be leveraged to obtain the result in such strong terms! (Note that, without a priori information on either the state or the measurements being performed, “tomography” cannot lead to any meaningful conclusion.) Certainly the authors must have been influenced by their intense ongoing work in proving security of QKD, and the intuition obtained from the security proofs. Still — quite a feat!

A second respect in which the Mayers and Yao paper can be qualified as a “UFO” is that in spite of the conceptual breakthrough that we now recognize it for, it was all but ignored in the years following its publication. One reason may be that it was published in the most obscure of conferences (proceedings of FOCS, anyone?). A more reasonable explanation is that the math in the paper is hard to follow: the notation is rather heavy and little intuition is given. From a QKD practitioner’s point of view, this is completely foreign territory; the techniques used bear very little resemblance to the techniques used at the time to prove security of QKD protocols and analyze their performance. Finally the result does suffer from one important drawback, which is that it is non-robust: Mayers and Yao were only able to characterize devices which perfectly reproduce the required correlations, but did not give any statement that tolerates ${\varepsilon}$‘s.

The second birth (in terms of explosion of attention and interest) of device independence can be attributed to the 2005 paper by Barrett, Hardy and Kent. Although I cannot speak for them, my impression is that the main motivation for their work was foundational; indeed the central question they ask is the extent to which the correlations that are checked by the users in all QKD protocols are sufficient by themselves: that is, can quantum mechanics be thrown out, and security derived only from these correlations? After all, from the point of view of the users the protocol is completely classical, and its security can also be defined classically, as the maximum probability with which an eavesdropper can “guess” the key that is eventually produced. Thus even if an implementation of the protocol may rely on quantum mechanical devices, its mode of operation, and security, can be formulated purely using statements about classical random variables. Barrett et al. showed that security could indeed be established in a model that is broader than quantum mechanics, assuming only the existence of joint probability distributions describing the input-output behavior of the devices and a potential adversary, and that these distributions satisfy the no-signaling principle: outputs of any of the three systems do not depend on inputs chosen by any other party. Their work builds upon a strong line of research in non-locality (e.g. the introduction of the famous “PR boxes” by Popescu and Rohrlich a decade earlier), and it had one immense merit, which helps explain its impact: by completely “stripping out” all the messy details that obscured existing security proofs for QKD they managed to restore the intuition behind the security of QKD (see Ekert’s paper for a beautifully succinct formulation of that intuition), showing how that intuition could be formalized and lead to an actual proof of security — and this, under arguably stronger terms than the existing ones!
Thus the link between Bell violation and security was finally expressed in its purest form. The beautiful paper was immediately followed by an intense line of works exploring its consequences, deriving device-independent proofs of security for QKD which achieved better and better rates under weaker and weaker assumptions (for instance on noise tolerance, or on the types of attacks allowed to the adversary). This is a very long story, with much more to say (check out the fourth slide from Valerio’s talk!), but once again it is time to move on.
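
The no-signaling principle and the PR boxes mentioned above are easy to write down concretely. Here is a small sketch of my own, using the standard definition of the Popescu-Rohrlich box, checking that the box lets no information pass between the parties while nevertheless winning the CHSH game perfectly:

```python
from itertools import product

def pr_box(a, b, x, y):
    """The Popescu-Rohrlich box: p(a,b|x,y) = 1/2 iff a XOR b = x AND y,
    and 0 otherwise."""
    return 0.5 if (a ^ b) == (x & y) else 0.0

# No-signaling: Alice's marginal p(a|x) does not depend on Bob's input y
# (and symmetrically for Bob, by the symmetry of the box).
for a, x in product([0, 1], repeat=2):
    marginals = {sum(pr_box(a, b, x, y) for b in (0, 1)) for y in (0, 1)}
    assert len(marginals) == 1   # same marginal whether y = 0 or y = 1

# ...yet the box wins CHSH with probability 1, beyond the quantum
# maximum of cos^2(pi/8) ~ 0.854 (inputs x, y uniform, weight 1/4 each)
win = sum(0.25 * pr_box(a, b, x, y)
          for x, y in product([0, 1], repeat=2)
          for a, b in product([0, 1], repeat=2)
          if (a ^ b) == (x & y))
print(win)  # 1.0
```

This is exactly the sense in which Barrett et al. could define security against a super-quantum adversary: only the joint distributions and the no-signaling constraint appear, never a Hilbert space.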

In the second part of his talk Valerio gave us his four main “concerns” regarding device independence. In Valerio’s opinion these “concerns” are important assumptions that underlie device-independent cryptography — assumptions that are reasonable as long as they are recognized as such, but could undermine the field if not made explicit and discussed upfront.

The four concerns are the following:

1. No-signaling (the physicist’s concern): It is usually claimed that device-independent protocols are secure “provided the no-signaling condition is satisfied”. But what does this mean? That no information can travel in-between the two devices within the amount of time that elapses between the moment at which the choice of basis is made, and the moment at which the device produces an outcome. There is a fundamental ambiguity here: how is the time of “basis choice” defined? Does it correspond to the moment when the experimenter first sets her mind on a particular setting (whether and how much “liberty” she really has in this choice is a different issue, the so-called “free will loophole”)? Or does it correspond to the moment at which the device is “informed” of that choice, as the experimenter presses the appropriate button? Or is it really only when the information reaches the photon polarizer located inside the device? The same questions can be asked for the time of “outcome production”: when exactly is the outcome determined? When the photon hits the receptor? When the outcome is displayed on the device? When the experimenter stares at it? Once more this is a subtle issue, related to the measurement problem in quantum mechanics and which components of the system are modeled as quantum and potentially in coherent superposition, and which are classical and “decohered”. Going from the one to the other requires an application of the Born rule, and it is not clear at what level this should take place.
2. Input randomness (the information-theorist’s concern): This is an important point, which I think is now well-understood but was certainly not when procedures for device-independent randomness certification were first being discussed. The question is how the randomness present in the input to the devices should be quantified — succinctly, “random to whom”? The crucial distinction is as follows: the input should be random-to-device, whereas the output should be random-to-adversary. That it is possible to transform the one to the other is, to a great extent, the miracle of device-independent randomness certification: even if the inputs are completely known to a third party, and even if that party is quantum and shares entanglement with the devices, as long as the inputs are chosen independently from the devices (very concretely, their joint state, when all other parties are traced out, is in tensor product form), the outputs produced are independent from the adversary (same formalization, but now tracing out the devices). For those interested in this issue I highly recommend having a look at the “equivalence lemma” of Chung, Shi and Wu (Lemmas 2.2 and 5.1 here).
3. Detection loophole (the hacker’s concern): arguably this is the most openly discussed of Valerio’s four concerns, and I won’t get into it too much. The problem is that current state-of-the-art photon receptors have low efficiency, and more often than not fail to detect an incoming photon. Thus in order to claim that the statistics observed in an experiment demonstrate the violation of a Bell inequality one has to resort to the “fair sampling assumption”, which states that no-detection events occur independently from the basis choice provided to the device. For most experiments this assumption seems reasonable, but in the context of device independence, where one often claims that the devices “have been prepared by the adversary”, it is certainly not so. Nevertheless we may hope that continued experimental progress (or different measurement techniques, such as the use of diamonds in the Delft experiment) will eventually eliminate this loophole.
4. (In)determinism (the philosopher’s concern): what if randomness simply doesn’t exist? Two of the three most widely debated attempts at making sense of probabilities in quantum mechanics, the many-worlds interpretation and Bohmian mechanics (the third being the Copenhagen interpretation), posit determinism at some level: in many-worlds, there is determinism for a being who can observe all universes; in Bohmian mechanics there is determinism for a “nonlocal” being who has access to all hidden variables. As Valerio pointed out, however, these need not be serious issues, as what we really care about is that the randomness in the outputs produced, or the security of the key, are measured with respect to a being in our universe: indeed it should not be of any concern that the random bits are “determined” by some hidden variables if access to these hidden variables requires prior knowledge about the whole state of the universe.

There are a lot of valuable take-home messages from the talk. Valerio ended his talk by making the observation that while the violation of Bell inequalities implies the presence of randomness the converse is not necessarily true… This in turn prompts me to ask what really are the fundamental assumptions that are necessary and sufficient to guarantee randomness. Device independence, as it is now understood, relies on locality: if it can be assumed that certain events take place at positions that are isolated in space-time then certified fountains of randomness follow. But what if we are unsatisfied with the assumption — how necessary is it really? An interesting variation is to certify randomness based on the contextuality of quantum mechanics; this is a challenging setting from both experimental and theoretical points of view and I expect it to receive more attention in the future. But there are possibilities that take us in completely different directions; as an obvious example pseudo-randomness offers the possibility of certifying randomness based on computational assumptions. Are we on the right track — what if we were to take the irreducible presence of randomness as an assumption, what consequences would follow? (See Yevgeniy Dodis’ talk, to be discussed later, for some consequences in cryptography.)


## The nature of randomness and fundamental physical limits of secrecy

The entrance to STIAS

I spent the past five weeks at the Stellenbosch Institute for Advanced Study (STIAS), a research institute located in beautiful Stellenbosch, 40km North-West of Cape Town. I was participating in a program (“program” may be an ambitious term for a five-participant endeavour — let’s call it a “project” instead), a project then, organized around the theme of The nature of randomness and fundamental physical limits of secrecy. The project was initiated by Artur Ekert, who formulated its theme and secured the support of STIAS. It was a great experience, and I’d like to write a few posts about it. To set the stage I’ll get us started by writing a few things about Stellenbosch and STIAS. The following posts will be more scientifically focused; in particular I’ll try to cover some of the stimulating talks that we were lucky to hear as part of a three-day workshop that was held a couple weeks ago and formed the organ point of the program — no, the project.

Stellenbosch university

STIAS is a very interesting place. It is a small research institute nested within the large and airy campus of Stellenbosch University, known locally as “Maties” (don’t ask), one of the best universities in Africa. The spirit of STIAS is comparable to that of the IAS in Princeton, although it is far smaller: there is a single two-floor building in which I counted 22 offices; as far as I know aside from the director and administrative staff there is no permanent member. The scientific thrust of the institute is provided by its “fellows”, who typically spend a ~4-month period there (our project was thus shorter than average). Fellows are simply individuals who spend some research time at the institute, either on leave from their own institute or as part of a research project carried out with local collaborators. Concretely the institute provides office space, administrative support, and a stipend to cover local expenses. Much beyond this however STIAS provides a wonderful, relaxed and supportive environment in which to carry out research; a place which could easily top anyone’s list for best location to carry out a sabbatical leave.

Favorite study spot

Concretely, STIAS has two main attractions. The first is the local fauna — the STIAS fellows. During my stay our ranks included two philosophers, two biologists, a writer, two sociologists and three historians; not to mention the small group of eccentric physicists who insisted on dragging a whiteboard back and forth between the coffee machine, the sofas and the patio. The second attraction is the lunches. The institute staff cultivates its own little garden and daily prepares wonderful quiches, pies, salads, and, obviously, cakes, panna cotta, mousses and other culinary treats. Bring the two together and you end up with a unique epicurean experience, with the group of fellows gathering every day around a leisurely but stimulating lunch fuelled as much by the delicious food as by the wide-ranging conversation. (Note how the recipe closely parallels, for all I can tell, that of the IAS. Admittedly STIAS doesn’t have the cookies — but it does offer an extensive wine & cheese reception following the “fellows’ seminar” held every Thursday.)

Lunch at STIAS

To set a broader stage, it may be worth mentioning that, as I am sure its director Hendrik Dreyer would not contradict, the culture of STIAS is not unaffected by Stellenbosch’s status as the heart of the South African wineland. Indeed the surrounding hills are planted with the second-oldest (1679!) vines in South Africa, right after those of nearby Constantia, and are home to the famous pinotage variety of grape, a cross of Pinot and Hermitage that brings out wines of a spirit comparable to the best from Burgundy. The city itself is rather quaint, populated by a mix of locals, tourists and students. The latter ensure the presence of a healthy coffee culture — and here I cannot help but mention the blue crane and butterfly, purveyor of my daily morning cortado and a highlight of my stay here. The surrounding landscape, dominated by beautiful mountains culminating in wide plateaus, is filled with wineries competing for the best view, lunch spot, and of course, wine tasting. Having been wine-educated in the best snobbish French tradition I was truly impressed by the quality of the local wines; most of those I tried were quite balanced and subtle. Moreover, they are enjoyed as wine should be: without pretension, and with good food and company! A pleasure that is not limited by the very affordable prices, with the good bottles priced similarly to the daily liter of red in France, Italy or Spain, well under 10 euro. (I do realize that the notion of “affordable” is a complicated one here, South Africa being one of the countries with the highest income inequality in the world. At a political level our stay was dominated by intense student protests across South African universities in reaction to a sudden hike in registration fees decided by the government, which eventually had to back down.)

But enough tourism, let’s get to the science. In the next post we’ll start exploring “The nature of randomness and fundamental physical limits of secrecy”!

## Hypercontractivity in quantum information theory

In this post I’d like to briefly mention some connections of log-Sobolev inequalities, and more specifically hypercontractivity, with quantum mechanics and quantum information. To anyone interested I recommend the video of Christopher King’s excellent overview talk at the BIRS workshop. The talk includes a nice high-level discussion of the origins of QFT (roughly 25 minutes in) that I won’t get into here.

Existence and uniqueness of ground states in quantum field theories. The very introduction of (classical!) hypercontractivity, before the name was even coined, was motivated by stability questions in quantum field theory. This approach apparently dates back to a paper by Nelson from 1966, “A quartic interaction in two dimensions” (not available online! If anyone has a copy… my source on this is these notes by Barry Simon; see Section 4). Nelson’s motivation was to establish the existence and uniqueness of the ground state of an interacting boson field theory that arose as the natural quantization of a certain popular classical field theory. Let ${H}$ be the resulting Hamiltonian. ${H}$ acts on the Hilbert space associated with the state space of a bosonic field with ${n}$ degrees of freedom, the symmetric Fock space ${\mathcal{F}_n}$.

The main technical reason that makes bosonic systems much easier to study than fermionic ones is that bosonic creation and annihilation operators commute. In the present context this commutativity manifests itself through the existence of an isomorphism, first pointed out by Bargmann, between ${\mathcal{F}_n}$ and the space ${L^2({\mathbb R}^n,\gamma^n)}$, where ${\gamma}$ is the Gaussian measure on ${\mathbb R}$. The connection arises by mapping states in ${\mathcal{F}_n}$ to functions on local observables that lie in the algebra generated by the bosonic elementary creation and annihilation operators. Since these operators commute (when they act on distinct particles), the space of such functions can be identified with the space of functions on the joint spectra of the operators; doing this properly (as Bargmann did) leads to ${L^2({\mathbb R}^n,\gamma^n)}$, with the creation and annihilation operators ${c_i}$ and ${c_i^*}$ mapped to ${\frac{\partial}{\partial x_i}}$ and its conjugate ${(2x_i-\frac{\partial}{\partial x_i})}$ respectively.

The simplest “free field” bosonic Hamiltonian is given by the number operator

$\displaystyle H_0 \,=\, \sum_i c_i^* c_i, \ \ \ \ \ (1)$

which under Bargmann’s isomorphism is mapped to ${\sum_i 2x_i\frac{\partial}{\partial x_i} - \Delta}$. Up to some mild scaling I’m not sure I understand, this is precisely the Liouvillian associated with the Ornstein-Uhlenbeck noise operator that we saw in the previous post!
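The correspondence can be sanity-checked symbolically. Below is a small sketch (my own, not from the post’s sources) for ${n=1}$: with respect to the Gaussian weight ${e^{-x^2}}$, the physicist Hermite polynomials ${H_k}$ are eigenfunctions of the mapped operator ${2x\frac{d}{dx} - \frac{d^2}{dx^2}}$ with eigenvalue ${2k}$, the equally spaced spectrum one expects of a number operator (up to the scaling just mentioned).

```python
import sympy as sp

x = sp.symbols('x')

def mapped_number_operator(f):
    # Image of the number operator under Bargmann's isomorphism, for n = 1:
    # f -> 2x f' - f''
    return sp.expand(2 * x * sp.diff(f, x) - sp.diff(f, x, 2))

# Physicist Hermite polynomials (orthogonal w.r.t. e^{-x^2}) are eigenfunctions,
# with the equally spaced eigenvalues 2k:
for k in range(5):
    Hk = sp.hermite(k, x)
    assert sp.simplify(mapped_number_operator(Hk) - 2 * k * Hk) == 0
```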

Here is how hypercontractivity comes in. ${H_0}$ is Hermitian, and using Bargmann’s isomorphism we can identify ${e^{-t H_0}}$ with a positive operator on ${L^2({\mathbb R}^n,\gamma^n)}$. Performing this first step can already provide interesting results as, in the more general case of a Hamiltonian ${H}$ acting on a field with infinitely many degrees of freedom, it lets one argue for the existence of a ground state of ${H}$ by applying the (infinite-dimensional extension of the) Perron-Frobenius theorem to the positive operator ${e^{-tH}}$. But now suppose we already know e.g. ${H\geq 0}$, so that ${H}$ does have a ground state (there is a lower bound on its spectrum), and we are interested in the existence of a ground state for ${H+V}$, where ${V}$ is a small perturbation. This scenario arises for instance when building an interacting field theory as a perturbation of a non-interacting theory, the situation that Nelson was originally interested in. The question of the existence of a ground state for ${H+V}$ reduces to showing that the operator ${e^{-t(H+V)}}$ is bounded in norm, for some ${t>0}$. Using ${\|e^{A+B}\|\leq \|e^A e^B\|}$ it suffices to bound, for any ${|\varphi\rangle}$,

$\displaystyle \|e^{-tV}e^{-tH}|\varphi\rangle\|_2 \leq \|e^{-t V}\|_4 \|e^{-tH}|\varphi\rangle\|_4,$

where the inequality follows from Hölder. Assuming ${\|e^{-t V}\|_4}$ is bounded (let’s say this is how we measure “small perturbation” — in the particular case of the quantum field Nelson was interested in this is precisely the natural normalization on the perturbation), proving a bound on ${\|e^{-t(H+V)}\|}$ reduces to showing that there is a constant ${C}$ such that

$\displaystyle \|e^{-tH}|\varphi\rangle\|_4\leq C \||\varphi\rangle\|_2 \ \ \ \ \ (2)$

for any ${|\varphi\rangle}$, i.e. ${e^{-tH}}$ is hypercontractive (or at least hyperbounded) in the ${2\rightarrow 4}$ operator norm. This is precisely what Nelson did, for ${H}$ equal to the number operator (1): he proved an estimate of the form (2) for the Ornstein-Uhlenbeck process, later obtaining optimal bounds. Hypercontractivity was born! (As Barry Simon recalls in his notes, the term “hypercontractive” was only coined a little later, in a paper of his with Høegh-Krohn.)

Fermionic Hamiltonians and non-commutative ${L^p}$ spaces. This gives us motivation for studying hypercontractivity of operators on ${L^2({\mathbb R}^n,\gamma^n)}$: to establish stability estimates for the existence of ground states of bosonic Hamiltonians. But things start to get even more interesting when one considers fermionic Hamiltonians. Fermions are represented on the antisymmetric Fock space, and for that space there is no isomorphism similar to Bargmann’s. His isomorphism is made possible by thinking of states as functions acting on observables; if the observables commute we obtain a well-defined space of functions acting on the joint spectrum of the observables, leading to the identification with a space of functions on ${{\mathbb R}^n}$, where ${n}$ is the number of degrees of freedom of the system.

But the fermionic creation and annihilation operators anti-commute: they can’t be simultaneously diagonalized. So states are naturally functions on the non-commutative algebra ${\mathcal{A}}$ generated by these operators. Apparently Segal was the first to explore this path explicitly, and he used it as motivation to introduce the non-commutative integration spaces ${L^p(\mathcal{A})}$. (As an aside, I find it interesting how the physics suggested the creation of such a beautiful mathematical theory. I used to believe that things more often went the other way — first the mathematicians develop structures, then the physicists find uses for them — but I’m increasingly realizing that historically the physics tends to come first, and this is especially true in all things quantum mechanical!) For the case of fermions the canonical anti-commutation relations make the algebra generated by the creation and annihilation operators into a Clifford algebra ${\mathcal{C}}$, and the question of existence and uniqueness of ground states now suggests exploring the hypercontractivity properties of the associated semigroup (the third example in the previous post) in ${L^p(\mathcal{C})}$. This approach was carried out in beautiful work of Gross; the hypercontractivity estimate used by Gross was generalized, with optimal bounds, by Carlen and Lieb. (I recommend the introduction of their paper for more details on how hypercontractivity comes in, and Gross’ paper for more details on the correspondence between the original fermionic Hamiltonian and the semigroup for which hypercontractivity is needed.)

Quantum information theory. As Christopher King discussed in his talk at BIRS, aside from their use in quantum field theory, non-commutative ${L^p}$ spaces and hypercontractivity are playing an increasingly important role in quantum information theory. This direction was pioneered in work of Kastoryano and Temme, who showed that, just as classical hypercontractivity and its connection with log-Sobolev inequalities have proven extremely beneficial for the fine study of convergence properties of Markov semi-groups in the “classical” commutative setting (cf. my previous post), hypercontractivity estimates and log-Sobolev inequalities for quantum channels can lead to much improved bounds (compared with estimates based on the spectral gap) on the mixing time of the associated semi-groups. In particular Kastoryano and Temme analyzed the depolarizing channel and obtained the exact value of the associated log-Sobolev constant, extending the classical results on the Bonami-Beckner noise operator that I described in the previous post. The depolarizing channel is already very important for applications, including the analysis of the mixing time of dissipative systems and of certain quantum algorithms such as the quantum Metropolis algorithm.

For more applications and recent developments (including a few outliers…), see the workshop videos!


## Workshop on hypercontractivity and log-Sobolev inequalities in quantum information theory

A couple weeks ago I attended the workshop on “Hypercontractivity and Log Sobolev Inequalities in Quantum Information Theory” organized by Patrick Hayden, Christopher King, Ashley Montanaro and Mary Beth Ruskai at the Banff International Research Station (BIRS).

The pavilion in which workshop talks were held

BIRS holds 5-day workshops in mathematics and its applications throughout the year; it is itself a small component of the Banff Centre, which describes itself as the “largest arts and creativity incubator on the planet”. Sounds attractive? The centre appears to be remarkably successful, managing to attract not only the brightest mathematicians like us(!), but also a range of talented artists from all over the world, reputable international conferences, and more. The setting certainly doesn’t hurt (see the pictures — and, by the way, that view was from the centre’s beautiful library!), nor does the Canadian hospitality, with hearty food provided daily in a cafeteria with impressive views of the surrounding mountains. It was my first visit and, in spite of the somewhat coldish weather (“much warmer than usual”, as they say… you bet! When “usual” means -20 Celsius, it’s still not quite clear what improvement you’re getting), I enjoyed the place, which occupies a middle ground between a secluded retreat in the mountains (think Dagstuhl) and a bustling research centre (think the Simons Institute in Berkeley). A good balance.

Aerial view of the Banff center (in summer!)

The goal of this post is to give a brief introduction to the (first half of the) topic of the workshop, “Hypercontractivity and Log Sobolev Inequalities”. (I hope to talk a bit about the second half, “in Quantum Information theory”, in the next post.) Log-Sobolev inequalities, be they quantum or classical, meant next to nothing to me before attending — I had heard of both, and was somewhat familiar with the uses of hypercontractivity in Boolean function analysis, but log-Sobolev inequalities were much more mysterious. The workshop was successful in bringing different perspectives to bear on these inequalities, their relationship, and applications (videos available online!). For my own benefit I’ll write down some very basic facts on these inequalities, where they come from, and what is the motivation for studying them. What I write will undoubtedly appear extremely naive to even moderate experts, and for much more on the subject I highly recommend a survey by Diaconis and Saloff-Coste, the lecture notes by Guionnet and Zegarlinski, and the (quals-fueled) survey by Biswal. In particular I refer the reader to these resources for proofs of the unproven assertions made in this post (any errors introduced being mine of course), as well as appropriate credit and references for the results mentioned below.

Markov semi-groups. The fundamental objects underlying both the hypercontractive and log-Sobolev inequalities are a space of functions ${\mathcal{F}}$ (think continuous real-valued functions defined on a nice topological space, ${\mathbb{R}^d}$, or your favorite graph) together with, first, a Markov semi-group ${(P_t)}$ defined on that space, and second, a measure ${\mu}$ on the same space that is invariant under ${P_t}$, i.e.

$\displaystyle \mu(P_t f)\,=\,\mu(f) \ \ \ \ \ (1)$

for all ${f\in\mathcal{F}}$. Now, even the notion of a “Markov semi-group” can be intimidating (to me), but it really corresponds to a very simple process. A Markov semi-group is a family of maps ${P_t:\mathcal{F}\rightarrow\mathcal{F}}$ indexed by the positive reals ${t\in\mathbb{R}_+}$ that transforms functions in such a way that applying the map for time ${t}$, and then for time ${s}$, is equivalent to applying it for time ${t+s}$: ${P_{t+s}=P_tP_s}$. Simple examples are any random walk on the domain of ${\mathcal{F}}$, with ${P_t}$ denoting averaging over inputs obtained by taking a “${t}$-step” walk, or a diffusion process on real functions, such as convolving with a Gaussian of variance ${t}$ or evolving ${f}$ under the heat equation for time ${t}$. Some concrete examples that I’ll refer to throughout:

Examples

Random walks on regular graphs. Let ${G=(V,E)}$ be a regular undirected graph with degree ${d}$, and (abusing notation) let ${G}$ also denote its normalized adjacency matrix: ${G_{i,j} = 1/d}$ if ${\{i,j\}\in E}$ and ${G_{i,j}=0}$ otherwise. Let ${\mathcal{F} = {\mathbb R}^V}$. If ${f\in \mathcal{F}}$ and ${v\in V}$, then ${(Gf)(v) = E_{u\sim v}[f(u)]}$, where the expectation is taken over a uniformly random neighbor ${u}$ of ${v}$ in ${G}$. If ${f}$ is a distribution over ${V}$, then after ${t}$ steps of the discrete-time random walk on ${G}$ the distribution is given by ${G^t f}$. The standard way to turn this into a continuous-time random walk is to wait at a vertex ${v}$ for an amount of time distributed as an exponential with mean ${1}$, and then move to a random neighbor. In other words, the number of steps taken in time ${t}$ is distributed as a Poisson point process ${N(t)}$ with rate ${1}$. Starting from an initial distribution ${f}$, the distribution after time ${t}$ is given by

$\displaystyle P_t f = \sum_{k=0}^\infty \Pr(N(t) = k) G^k f = \sum_{k=0}^\infty \frac{t^k}{k!} e^{-t}G^k f = e^{t(G-I)}f.$

The same construction extends to any irreducible Markov chain ${K(x,y)\geq 0}$, ${\sum_y K(x,y)=1}$, for which the associated semigroup is ${P_t f = e^{t(K-I)}f}$.
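As a quick numerical sketch (my own) of this construction, here is the continuous-time walk on the 4-cycle, checking the semigroup property ${P_{s+t}=P_sP_t}$, the invariance of the uniform distribution, and convergence to it:

```python
import numpy as np
from scipy.linalg import expm

# Normalized adjacency matrix of the 4-cycle (degree d = 2)
n = 4
G = np.zeros((n, n))
for i in range(n):
    G[i, (i + 1) % n] = G[i, (i - 1) % n] = 0.5

def P(t):
    # Continuous-time random walk semigroup P_t = e^{t(G - I)}
    return expm(t * (G - np.eye(n)))

# Semigroup property: P_{s+t} = P_s P_t
assert np.allclose(P(3.0), P(1.0) @ P(2.0))

# The uniform distribution is invariant, and the walk converges to it
mu = np.full(n, 1.0 / n)
f = np.array([1.0, 0.0, 0.0, 0.0])   # walk started at vertex 0
assert np.allclose(mu @ P(1.0), mu)
assert np.allclose(P(50.0) @ f, mu)
```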

Ornstein-Uhlenbeck process. Here we take ${\mathcal{F}}$ the set of continuous functions from ${\mathbb{R}}$ to itself, and

$\displaystyle P_tf(x) = \int_{\mathbb{R}} f(e^{-t} x + \sqrt{1-e^{-2t}}\, y) \frac{e^{-y^2/2}}{\sqrt{2\pi}}\, dy.$

This describes the evolution of a particle submitted to Brownian motion under the influence of friction. It is a continuous, Gaussian analogue of the Bonami-Beckner noise operator ${T_\rho}$ on the hypercube that crops up in computer science applications, ${T_\rho f(x) = E_{y\sim x} [f(y)]}$ where ${y}$ is obtained from ${x\in\{0,1\}^n}$ by flipping each of its coordinates independently with probability ${p=1-\rho}$.

Time-independent evolution. Schrödinger’s equation dictates that a quantum state ${\rho}$ subject to a fixed Hamiltonian ${H}$ will evolve as ${\rho\mapsto \rho(t) = e^{-iHt}\rho e^{iHt}}$ (where I set ${\hbar =1}$ for simplicity). It is easy to check that ${P_t:\rho\mapsto \rho(t)}$ is a Markov semi-group. Note however that this semi-group acts on a different kind of space than the previous examples: ${\rho}$ is an operator on a Hilbert space ${\mathcal{H}}$, rather than a function on a certain domain. Dealing with such examples requires non-commutative analogues of the classical theory, to which I hope to return in the next post.
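The Bonami-Beckner operator is easy to write down explicitly for small ${n}$. Here is a sketch (my own code, using this post’s convention of flipping each coordinate with probability ${1-\rho}$) that checks only generic averaging properties: ${T_\rho}$ preserves the mean and contracts the sup norm.

```python
import itertools
import numpy as np

def noise_matrix(rho, n):
    # Matrix of T_rho on functions {0,1}^n -> R, flipping each
    # coordinate independently with probability p = 1 - rho
    p = 1.0 - rho
    pts = list(itertools.product([0, 1], repeat=n))
    T = np.empty((len(pts), len(pts)))
    for i, x in enumerate(pts):
        for j, y in enumerate(pts):
            flips = sum(a != b for a, b in zip(x, y))
            T[i, j] = p**flips * (1 - p)**(n - flips)
    return T

n, rho = 3, 0.7
T = noise_matrix(rho, n)
f = np.random.default_rng(0).standard_normal(2**n)

assert np.allclose(T.sum(axis=1), 1.0)          # rows are probability vectors
assert np.isclose((T @ f).mean(), f.mean())     # the mean is preserved
assert np.abs(T @ f).max() <= np.abs(f).max()   # contraction in sup norm
```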

Ergodicity. Now that we have a cast of players, let’s see what kind of questions we’d like to answer about them. The most basic question is that of convergence: given a Markov semi-group ${(P_t)}$ defined on a function space ${\mathcal{F}}$, does the sequence ${(P_t f)}$ converge, and if so at what rate? The latter part of this question requires one to specify a distance measure on ${\mathcal{F}}$. For this we will use the ${L^p}$ norms, defined as ${\|f\|_p = (\int |f|^p d\mu)^{1/p}}$ for some measure ${\mu}$ on the domain of functions in ${\mathcal{F}}$. Important special cases are ${p=1,p=2}$ and ${p=\infty}$. Here the measure ${\mu}$ will always be an invariant measure for ${P_t}$, i.e. it satisfies (1). There does not always exist an invariant measure, and there can be more than one; in practice though ${(P_t)}$ and ${\mu}$ are often defined together and for the purposes of this post I will assume there is a unique invariant ${\mu}$ (for the example of a semi-group defined from a Markov chain, this corresponds to the chain being irreducible, and ${\mu}$ is its unique stationary measure). The question then becomes, does ${(P_t f)}$ converge towards its expectation ${\mu(P_t f) = \mu(f)}$ (by invariance), and if so at what speed? This is precisely the question that log-Sobolev inequalities will let us answer.

One says that ${(P_t f)}$ is ${p}$-ergodic if ${\int |P_t f-\mu(f)|^p d\mu \rightarrow_{t\rightarrow\infty} 0}$ for all ${f\in\mathcal{F}}$, i.e. convergence holds in the ${p}$-norm. Note this is equivalent to requiring ${\|P_t-\mu\|_{p,\infty} \rightarrow_{t\rightarrow \infty} 0}$, where ${\|\cdot\|_{p,q}}$ denotes the operator norm from ${L^p}$ to ${L^q}$. Since ${\|f\|_p \leq \|f\|_{p'}}$ for all ${f}$ and any ${p\leq p'}$, the notion of ${p}$-ergodicity becomes stronger as ${p}$ increases. We will see three different ways to obtain bounds on the rate of convergence: the spectral gap inequality, equivalent to ${2}$-ergodicity; Sobolev inequalities and ultra-contractivity, which imply ${\infty}$-ergodicity, and log-Sobolev inequalities and hypercontractivity, which are related to ${p}$-ergodicity for intermediate ${p}$.

Before getting started there is one last quantity associated with any Markov semi-group that we need to introduce, the Liouvillian ${\mathcal{L}}$. The Liouvillian is defined through its action on functions in ${\mathcal{F}}$ as

$\displaystyle \mathcal{L}f = \lim_{t\rightarrow 0^+}\, \frac{P_t-Id}{t}f.$

Equivalently, the Liouvillian can be defined through the functional equation ${\partial_t(P_t f) = \mathcal{L} (P_t f)}$: it plays the role of infinitesimal generator for the semi-group ${(P_t)}$. For the example of the random walk the Liouvillian is simply ${\mathcal{L} = G-I}$. For the Ornstein-Uhlenbeck process one finds ${\mathcal{L}(f)(x)=f''(x)-xf'(x)}$, and for the Bonami-Beckner noise operator ${\mathcal{L} = (1-\rho) (H-I)}$ where ${H}$ is the adjacency matrix of the hypercube. Finally in the example of (time-independent) Schrödinger evolution we have ${\partial_t(\rho(t)) = i [\rho(t),H]}$, so that the Liouvillian is ${\mathcal{L}(\rho) = -i[H,\rho]}$.
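For the Schrödinger example, the claim ${\mathcal{L}(\rho)=-i[H,\rho]}$ can be verified by a finite difference. A small numerical sketch (mine, with an arbitrary random Hamiltonian):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
H = (A + A.conj().T) / 2                    # a Hermitian Hamiltonian
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
rho = B @ B.conj().T
rho /= np.trace(rho)                        # a density matrix

def P(t, rho):
    # Schroedinger evolution: P_t(rho) = e^{-iHt} rho e^{iHt}
    U = expm(-1j * H * t)
    return U @ rho @ U.conj().T

# Finite-difference approximation of the Liouvillian at t -> 0+
t = 1e-6
finite_diff = (P(t, rho) - rho) / t
liouvillian = -1j * (H @ rho - rho @ H)     # -i[H, rho]
assert np.allclose(finite_diff, liouvillian, atol=1e-4)
```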

Spectral gap inequality and ${L^2}$-ergodicity. A first family of inequalities are spectral gap inequalities, which turn out to be equivalent to ${L^2}$-ergodicity. The spectral gap of an invariant measure ${\mu}$ for the semigroup generated by ${\mathcal{L}}$ is defined as the largest ${\lambda>0}$ (if it exists) such that

$\displaystyle \mu\big(f (-\mathcal{L}) f\big)\,\geq\,\lambda\, \mu(f^2) \ \ \ \ \ (2)$

holds for all ${f}$ such that ${\mu(f)=0}$. In the example of the Markov chain, ${\mathcal{L} = K-I}$ and the inequality can be rewritten as ${E_{x\sim y}[ |f(x)-f(y)|^2] \geq \lambda E_{x,y}[|f(x)-f(y)|^2]}$, where the first expectation is taken over ${x\sim \mu}$ and ${y\sim K(x,\cdot)}$, and the second expectation is over independent ${x,y\sim \mu}$. In particular for random walks on graphs we recover the standard definition of the Laplacian ${L=-\mathcal{L}}$.

Applying (2) to ${P_t f}$ we see that it is equivalent to requiring ${\partial_t \mu((P_tf)^2) \leq -2\lambda \mu((P_tf)^2)}$, which using ${P_0=Id}$ implies the bound ${\mu((P_t f)^2) \leq e^{-2\lambda t} \mu(f^2)}$ for all ${f}$ such that ${\mu(f)=0}$. Applying the appropriate translation the same inequality can be rewritten as

$\displaystyle \mu((P_t f - \mu(f))^2) \leq e^{-2\lambda t} \mu((f-\mu(f))^2),$

which is precisely a ${L^2}$ convergence bound. Conversely, by differentiating the above inequality we recover (2), so that the two are equivalent.
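This ${L^2}$ decay bound is easy to check numerically. A sketch (my own) for the continuous-time walk on the 4-cycle, whose Liouvillian is ${\mathcal{L}=G-I}$ and whose spectral gap is the smallest nonzero eigenvalue of ${-\mathcal{L}}$:

```python
import numpy as np
from scipy.linalg import expm

# Continuous-time walk on the 4-cycle: Liouvillian L = G - I
n = 4
G = np.zeros((n, n))
for i in range(n):
    G[i, (i + 1) % n] = G[i, (i - 1) % n] = 0.5
L = G - np.eye(n)

mu = np.full(n, 1.0 / n)                    # invariant (uniform) measure
lam = np.sort(np.linalg.eigvalsh(-L))[1]    # spectral gap: smallest nonzero eigenvalue

f = np.random.default_rng(1).standard_normal(n)
for t in [0.1, 1.0, 3.0]:
    Ptf = expm(t * L) @ f
    lhs = mu @ (Ptf - mu @ f) ** 2          # mu((P_t f - mu(f))^2)
    rhs = np.exp(-2 * lam * t) * (mu @ (f - mu @ f) ** 2)
    assert lhs <= rhs + 1e-12
```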

Rather than ${L^2}$ convergence one would often like to derive bounds on convergence in ${L^1}$, or total variation, distance. One way to do this is to use Hölder’s inequality as

$\displaystyle \begin{array}{rcl} \|P_t-\mu\|_{1,\infty}^2&=&\sup_f \|P_t f - \mu(f)\|_1^2 \\ &\leq& \Big(\sup_f \frac{1}{\mu(f)}\Big)\sup_f \mu((P_t f - \mu(f))^2)\\ &\leq& \Big(\sup_f \frac{1}{\mu(f)}\Big)e^{-2\lambda t}. \end{array}$

Thus we get an exponentially decaying bound, but the prefactor could be very large; for instance in the case of a random walk on a graph it would typically equal the number of vertices. Our goal is to do better by using log-Sobolev inequalities. But before investigating those, let’s first see what a Sobolev (without the log!) inequality is.

Sobolev inequalities and ${L^\infty}$-ergodicity. Sobolev inequalities are the “high-${p}$” generalizations of the spectral gap inequality. ${\mu}$ satisfies a Sobolev inequality if there is a ${p>2}$ and constants ${a>0}$, ${b\geq 0}$ such that for all ${f}$ with ${\mu(f)=0}$,

$\displaystyle \|f\|_p^2 \leq a \mu(f (-\mathcal{L}) f) + b \mu(f^2).$

Sobolev inequalities imply the following form of ergodicity:

$\displaystyle \|P_t f - \mu(f)\|_\infty \leq c\big(\frac{a}{2et}+b\big)^\gamma \|f-\mu(f)\|_1,$

where ${c,\gamma}$ are constants depending on ${p}$ only (and ${c\rightarrow \infty}$ as ${p\rightarrow 2}$). If one can take ${b=0}$ (which is often the case) this directly gives a polynomial bound on ${\|P_t - \mu\|_{1,\infty}}$ which is incomparable to the one we derived from the spectral gap inequality: on the one hand it does not have the large prefactor, but on the other it only gives polynomial decay in ${t}$. This form of contractivity, in the ${1\rightarrow\infty}$ operator norm, is called “ultracontractivity”.

An important drawback of Sobolev inequalities is that they do not “tensor” well, leading to dimension-dependent bounds. What this means is that if ${P_t}$ acts on functions of a single variable in a certain way, and we let ${P_t^{\otimes n}}$ act on ${n}$-variable functions in the natural way, then the resulting semigroup will in general not satisfy a Sobolev inequality with constants ${a,b}$ independent of ${n}$, even if ${P_t}$ itself did. This contrasts with the spectral gap inequality: it is not hard to see that the spectral gap of ${P_t^{\otimes n}}$ is the same as that of ${P_t}$. This makes Sobolev inequalities particularly hard to derive in the infinite-dimensional settings that originally motivated the introduction of log-Sobolev inequalities (I hope to return to this in the next post).

Log-Sobolev inequalities and ${L^p}$-ergodicity. We finally arrive at log-Sobolev inequalities. We’ll say that ${\mu}$ satisfies a log-Sobolev inequality if there exist constants ${c>0}$, ${d\geq 0}$ such that for all non-negative ${f}$,

$\displaystyle \mu\Big(f^2 \log \frac{f^2}{\|f\|_2^2}\Big) \leq c\mu(f (-\mathcal{L}) f) + d \mu(f^2). \ \ \ \ \ (3)$

Often it is required that ${d=0}$, and the log-Sobolev constant is ${\alpha=1/c}$ where ${c}$ is the smallest positive real for which (3) holds for all ${f}$. A log-Sobolev inequality can be understood as a “weakest” Sobolev inequality, obtained as ${p\rightarrow 2}$; indeed, one can show that if ${\mu}$ satisfies a Sobolev inequality then it also satisfies a log-Sobolev inequality. Gross related the log-Sobolev inequality to hypercontractivity by showing that it is equivalent to the statement that for all ${t>0}$ and all ${2\leq q \leq 1 + e^{4\alpha t} }$ the bound

$\displaystyle \|P_t-\mu\|_{2,q} \leq 1$

holds: ${(P_t)}$ is hypercontractive for the ${2\rightarrow q}$ operator norm if and only if the log-Sobolev constant satisfies ${\alpha \geq \ln(q-1)/(4t)}$. One can also show that ${\alpha \leq \lambda/2}$ always holds. But a direct use of the log-Sobolev inequality can give us a better convergence bound than the one we derived from the spectral gap: a more careful use of Hölder’s inequality than the one we made earlier easily leads to the bound

$\displaystyle \|P_t-\mu\|_{1,\infty}^2\,\leq\, 2\log\Big(\sup_f \frac{1}{\mu(f)}\Big) e^{-2\alpha t},$

a marked improvement over the prefactor in the bound derived from ${\lambda}$ alone (provided ${\alpha}$ and ${\lambda}$ are of the same order, which turns out to often be the case).

Aside from this improvement, a key point making log-Sobolev inequalities so useful is that, contrary to Sobolev inequalities, they tensor perfectly: if ${\mu_i}$ satisfies a log-Sobolev inequality with coefficients ${(c_i,d_i)}$ for ${i=1,2}$ then ${\mu_1 \otimes \mu_2}$ satisfies one with coefficients ${c=\max(c_1,c_2)}$ and ${d=d_1+d_2}$. Hence the standard approach for proving hypercontractivity of the Bonami noise operator on the ${n}$-dimensional hypercube: first, prove the “two-point inequality” for the single-variable case, ${\|T_\rho\|_{p,q} \leq 1}$ for any ${\rho \leq \sqrt{(p-1)/(q-1)}}$. This by itself is a nontrivial statement, but it boils down to a direct calculation. Then use Gross’ equivalence to deduce that ${T_\rho}$ satisfies a log-Sobolev inequality with the appropriate constant ${\alpha}$. Deduce that the ${n}$-dimensional noise operator satisfies a log-Sobolev inequality with the same value of ${\alpha}$, and conclude hypercontractivity of ${T_\rho}$ for general ${n}$: ${\|T_\rho\|_{2,q}\leq 1}$ for all ${2\leq q \leq 1+\rho^{-2}}$.
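The two-point inequality itself can be explored numerically. In the sketch below (mine, not from the post) I use the standard Fourier description of the one-bit operator, ${T_\rho(a+bx)=a+\rho b x}$ for ${x\in\{-1,1\}}$, and check ${\|T_\rho f\|_4 \leq \|f\|_2}$ for random ${f}$ whenever ${\rho\leq\sqrt{1/3}}$:

```python
import numpy as np

def pnorm(vals, p):
    # L^p norm with respect to the uniform measure on {-1, 1}
    return np.mean(np.abs(vals) ** p) ** (1.0 / p)

rng = np.random.default_rng(2)
p, q = 2.0, 4.0
for _ in range(1000):
    a, b = rng.standard_normal(2)
    rho = rng.uniform(0, np.sqrt((p - 1) / (q - 1)))
    f = np.array([a + b, a - b])                 # values of f at x = +1, -1
    Tf = np.array([a + rho * b, a - rho * b])    # T_rho damps the degree-1 part
    assert pnorm(Tf, q) <= pnorm(f, p) + 1e-12
```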

I hope this got you interested in the topic. I wish I had thought of browsing through the surveys mentioned at the start of the post before the BIRS workshop… Have a look, they’re quite interesting. Next time I’ll try to recall how the second part of the workshop title, “in Quantum Information Theory”, comes in…


## Where on earth?

Tough life.

(Bonus points for getting the precise location where the picture was taken from.)

More scientific content to come…


## ITCS 2015

It might’ve been there for a while (it definitely should :-)), but I only just noticed: the list of papers accepted to ITCS 2015 is out! There were 45 accepts out of 159 submissions, a record rate for ITCS and a level of selectivity, 28%, comparable to that of STOC or FOCS (which typically accept slightly under twice as many papers, and have less than twice as many submissions). I had the pleasure and honor of serving on this year’s program committee (how else do you think I’d get a paper in?), and I was truly impressed by the overall quality of submissions. We could easily have accepted twice as many papers and still have claimed a spot for ITCS as a top-tier conference in the theory of computation. If so, why didn’t we? Giving an answer requires thinking through what purpose we want ITCS, and more generally our constellation of theory (or theory-friendly) conferences, to serve. The question is being increasingly actively debated, though it has already been on the table for a while (and even featured on this blog…); I think it is an important one, and I’d like to share some thoughts.

STOC and FOCS have given rhythm to any decent theorist’s academic year since time immemorial (well, almost — as Dick Lipton reminded us in his Knuth Prize lecture at FOCS, some of us do remember when the first editions of the theory conferences made their appearance). Their respective Spring and Fall deadlines are so deeply entrenched that I was recently asked by a graduate student how one could come up with a new 6-month project every 6 months and make sure to wrap it up by the next deadline. Of course in practice most people work on many projects in parallel, and decide which have a decent chance of being brought to a (reasonable?) state of completion by deadline time a few weeks (hours?) before the deadline. Nevertheless, these “flagship” conferences are there to constantly remind us of our twice-yearly sacrificial duty. The longer one has been going through the process, the harder it is to imagine any changes to the model: Wouldn’t any reduction in the rhythm affect our productivity and the speed at which ideas spread in the community? Conversely, wouldn’t any increase in rhythm lead to a serious risk of exhaustion? Isn’t the current system just right?

Of course not — indeed, I don’t know anyone who’d claim the current model is good. But I don’t know any two researchers who’d agree on how to change it either. Any substantial proposal seems to immediately run into such a large amount of resistance — in the form of a barrage of “what about…?” and “what if…?” — that it is clear any change will have to be extremely gradual. The difficulty is exacerbated by the additional level of inertia imposed by the professional societies, ACM and IEEE, that administer STOC and FOCS respectively. In this respect the recent proposal by Boaz Barak and Omer Reingold had a good chance behind it: Boaz and Omer put forward a very concrete proposal that carefully avoided any changes to the backbone structure of STOC and FOCS, preserving submission deadlines, independent committee evaluation, rough numbers of accepted papers, etc., and instead focused on the format of the actual event, in substance proposing a single yearly week-long “theoryfest” to replace the current twice-yearly 3-day gatherings. Unfortunately, from the echoes I’ve had of Saturday’s meeting at FOCS it looks like even this moderate change is proving too much for the community to take, and the background grumble is likely to overtake the energetic spike proposed by Boaz and Omer.

ITCS, in contrast, should be largely free of such accumulated inertia. The conference series is still in its blooming youth: the coming edition, to be held in January at the Weizmann Institute in Israel, will be the 6th instantiation of the series. It is not attached to any professional society (though it does seem to be loosely affiliated with ACM’s SIGACT), and is instead run by a steering committee made up of prominent members of the theory community. Since its inception ITCS has been in permanent soul-searching mode: is there space for yet another theory conference? Wouldn’t the debate over a possible merger of STOC and FOCS indicate quite the opposite, that we theorists already have too many venues in which to express ourselves?

I would argue that there is, and that searching for this space can play an important role in defining the future of theory. Irrespective of changes to the format of STOC or FOCS, ITCS can continue to occupy a special niche: the “conceptual” niche (I don’t like the word — it is too overloaded — but I can’t think of a better one). ITCS is not there to mop up the excess production that didn’t make it into its big sisters. It is different, not only in its selection process but also, and just as importantly, in the format of the event itself. It should affirm this difference by continuing to build the reputation it has started to earn as a small, friendly, innovative conference in which discussion plays a prominent part, the program is light and single-track, and the presentations lead to lively exchanges among the participants. Not everyone needs to go to ITCS every year — but, given the chance, most would go, and have a pleasant experience. You might not get to learn about all the recent hot results: theoryfest is made for that. But you’ll feel the pulse of the community and encounter emerging research directions you might otherwise have missed (for instance, because you went to the other parallel session, the one with talks on your favorite area…).

There is a broader picture to keep in mind with respect to the place of theory within the CS landscape at large. By its nature our field is bound to have its relevance regularly questioned: are we looking at the “right” problems? And even if we are, are the kinds of answers we bring relevant? Or has theory lost all contact with applications? ITCS could play an important role in addressing these questions. As a “conceptual” conference it is there precisely to bring to light promising avenues and novel directions in which our theoretical ideas and concepts may find stimulation and application. The vitality of the conference will be a sign that theory is able to continually re-invent itself. Many of us strongly believe that the language we speak is universal; the question then is only whether we’re able to use that language effectively and bring it to bear on the problems that our peers across the sciences care about.

If I am to judge ITCS’s vitality by the quality of this year’s submissions, I have no reason to be worried about the future of theory. As already mentioned, the crop was excellent, and decisions were difficult. I was also impressed by how seriously the “mission” of ITCS was taken during the reviewing process. The PC chair, Tim Roughgarden, put a strong emphasis on selecting papers that fit the single sentence in the ITCS call for papers that seeks to distinguish it from other conferences:

ITCS (previously known as ICS) seeks to promote research that carries a strong conceptual message (e.g., introducing a new concept or model, opening a new line of inquiry within traditional or cross-interdisciplinary areas, or introducing new techniques or new applications of known techniques).

This is not to say that strong submissions should be rejected if their “conceptual message” is not apparent. That would be a mistake, for who are we to judge the presence, and even more so the future importance, of a new concept? Only time will tell. But given the large number of strong submissions, difficult decisions had to be made, and in many cases the discussion did seriously take into account what we perceived as the conceptual novelty, or appeal, of a submission. The debate was often extensive; I can assure you that most submissions received very careful attention. I wish the authors, or even the community at large, could get precise feedback on these discussions, which had the benefit of raising awareness, at least among PC members, of the above quote from the CFP.

We also had some discussions about the format of the conference; because these discussions happened after the formal publication of the CFP, I’m not sure we can expect any big changes to the format this year, but I hope they may lead to improvements in later years. Some of the suggestions floated around included having invited talks, varying the length of contributed talks, and holding a poster session, possibly preceded by short 3-4 minute “advertisements” of the posters by their authors. Personally I think the last is an excellent idea, and I hope it gets implemented in a later edition.

To get back to my opening comment — why did we accept so few papers, if the submissions were so good? Well, if you’re angry your paper got rejected, I can only promise that the accepted ones were at least as good — come and check them out! Indeed, in my mind (I am not pretending to speak for the whole PC here!) the main reason is this: ITCS will be single track, it will be relaxed and friendly, and all presentations will be of high quality. We had to make decisions; some of them were right, some were wrong, and many could have gone either way. But we had to make them, and we made them for you. So we hope to see you in January 2015 in Israel, and that you’ll enjoy hearing the talks as much as we enjoyed reading and debating the papers!