## What it is that we do

This post is a follow-up on some somewhat off-hand comments that I made earlier regarding the notion of truth in a “proof-based” discipline such as pure mathematics or theoretical computer science. Since the former is easier to circumscribe and also has a larger literature available on it, for the purposes of the post I will discuss my emerging take on truth in mathematics; what I say applies to TCS as well. (I wasn’t able to find satisfactory writings on the practice of computer science, even more broadly interpreted than just “theory”; any pointers on this are very welcome.) I obviously don’t claim any originality here; I suspect that to some the points I make might be interesting while to others they could feel passé–in the latter case, please help me make progress in the comments!

The question is more interesting than it may seem. First, let’s set aside the Platonist answer that mathematics is about proving statements that apply to abstract objects. None of this makes sense: there is no such thing as an abstract object (What is an “abstract circle?” Euclid’s definition of an abstract circle exists, but that definition has nothing abstract about it; it is plain English—or well, Greek, which is no better). Neither is there anything fundamentally “true” about the statements that mathematicians make about these objects. This last point is made clear in Voevodsky’s retelling of the story that brought him to start his research program on univalent foundations: see in particular the paragraph starting with “The primary challenge that needed to be addressed” here. While a subset of mathematicians has been carefully examining the logical foundations of mathematics and pursuing a program of formalizing these foundations with the goal of automating verification, two facts are historically evident: (1) this formalization has always come after the fact, i.e. first the theorem is proven, and only afterwards is there some attempt at modifying the format of the “proof” so as to reach some abstract “higher standard”; (2) what exactly that higher standard is remains a moving target. In other words, mathematicians do whatever it is that they do first, and only after that does some subset of them look back on what has been done and make an attempt to reformulate it in a way that seems more “rigorous” by the standards of the day. And even once this process has been completed (or rather, iterated over a few times), no absolute truth has been obtained; merely, one has gained higher confidence that some set of formal rules can be applied in a certain order to derive one statement from another. In summary, the logical answer “I define as true any statement that can be derived by said rules starting from said axioms” will not satisfy us.
(My point is not to criticize this enterprise; only to note that this is not what mathematics is about. I don’t expect any mathematician would disagree.)

Once this point has been cleared we have opened up the possibility for a much richer interpretation of what mathematics is about, what it is that mathematicians do, and what is so special about the role played by the notion of truth in this enterprise.

## The Greeks

Mathematicians have been around since Greek antiquity. I highly recommend the wonderful book The Shaping of Deduction in Greek Mathematics by Reviel Netz, a historian at Stanford. The book provides a fantastic historical introduction to the birth of modern mathematics, as practiced in the Western world. In this book Netz gives a detailed account of the practice of Greek mathematics that is based on a sometimes almost comically literal reading of the original texts. Netz is not a mathematician, and he takes the Greek writings at face value, without preconceptions as to any deeper meaning, mathematical or otherwise. To see how fantastic an enterprise this might be I highly recommend taking the time to read through a few paragraphs of Euclid’s Elements, see e.g. here. (I wasn’t able to find an online version that has clear figures; Netz’ book contains many reproductions and he spends a great deal of time examining their significance as a companion to the text.) Wait, and they called this a proof?

What I find most amazing about Netz’ work is that his book makes it absolutely clear that what Greek mathematics did is invent a very special world, with its own set of rules and specifications as to how things should be presented, what counts as valid and what does not, etc.; most importantly, this set of rules and specifications is absolutely arbitrary. There is nothing universally true about any of the statements made in Euclid’s book; what there is, however, and which is just as important, is a form of self-contained, self-perpetuating consistency. This is the Greek mathematician’s most important discovery, and most enduring legacy: a system of thought, based on the use of formal rules, such that users of that system of thought can easily and unequivocally agree on the same statements. Netz contrasts this with the political discourse that the Greeks were so famously fond of; while in rhetoric endless arguments can be made in favor of or against any given statement, in mathematics once an argument has been made there is no discussing it; either the argument is correct and accepted by all or it is flawed and all will agree on the presence of a flaw. By saying that mathematics does not establish universal truths we do not take away any of the strength of its evident success as a system of thought.

I insist that in the previous sentence by “mathematics” I mean “Greek mathematics.” There is nothing universal about the formal “proof system” used by the Greeks, and in fact that system is arguably quite different from the one used today (contrast a proof in Euclid’s Elements with one in e.g. Kerodon (see figure below); clearly either school would reject the other’s papers!). There is nothing absolutely perfect or even fool-proof about it either; when I write that “either the argument is correct or it is flawed” I do not dismiss the existence of a substantial grey zone in which there might be, and generally is, discussion. (Netz in his presentation makes it plainly obvious that Greek proofs had many “gaps” and that completely arbitrary choices are made as to what is valid and what is not.) The point is that this grey zone is orders of magnitude smaller than in other disciplines, and this is what makes mathematics special.

Greek mathematics has its limitations. In particular, it seems the Greeks were not able to expand their explorations much beyond geometry, possibly because they did not have a formal way of re-using proofs. Indeed, only propositions from Euclid’s Elements were part of the general corpus of “truths” that could be used without justification; any other proof could only appeal to either a statement from the Elements or a statement proved within the same text (as opposed to a statement made by another mathematician). The current system has a much higher level of re-usability, allowing it to reach much deeper. It is possible that current attempts at formalization will provide yet another layer to this by enabling the re-use of statements without the mathematician herself even being cognizant of the statement that is being re-used (e.g. the proof assistant would figure out that there is some theorem somewhere that could be useful).

## The moderns

Once one has taken this somewhat distanced look at the practice of Greek mathematicians it becomes clear that there is no reason for contemporary practice to escape similar scrutiny. As arbitrary as the rules that govern the former might seem now, just as arbitrary will our own rules seem later. The observation prompts us to re-evaluate the notions of proof and truth as follows: the goal of a mathematical proof is to convince the mathematician; a statement is true whenever the mathematician is convinced. I emphasize that the mathematician in this sentence is a human being, not a machine.

Propositions in Euclid’s Elements are true because Euclid’s text convinced all other Greek mathematicians that they were true; the same holds for Voevodsky’s univalent foundations. To speak about my personal experience, the point was carried home vividly about a decade ago at the end of an entire afternoon spent sitting by the sea near Marseille in the South of France together with an illustrious French operator algebraist. For the whole afternoon I had been trying to explain, of all things, the Clauser-Horne-Shimony-Holt (CHSH) inequality formulated as a nonlocal game—an elementary construction on which half of my research is based (see this post for example, although we certainly didn’t get that far). To no avail! Alice, Bob, games, the mathematician would have none of it. How could it be so hard? The message I was trying to carry across is bread and butter to quantum information theorists; moreover, the language of functional analysis was not foreign to me either, so I believed I had more than it should take to make the explanations clear; in my mind I had a clear theorem, together with an airtight proof, to communicate. Not so: I ended the afternoon defeated, and the point hammered in: as much as we may think that our language is formal, our reasoning rational, our goals clear, as soon as we step out of our self-referential cocoon we have to face the evidence: not so; there is nothing clear, formal, self-evident, in the mathematical truths we take as such. (If you’re not convinced yet, look at the pictures.)
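For what it’s worth, the game in question is simple enough to state and check by brute force: a referee draws questions x, y in {0,1} uniformly at random, the two isolated players answer a, b in {0,1}, and they win if a XOR b equals x AND y. The little sketch below (the function name is my own, not from any referenced source) enumerates all deterministic classical strategies and confirms the standard fact that classical players win with probability at most 3/4, a bound that entangled players can beat.

```python
from itertools import product

def chsh_classical_value():
    """Brute-force the classical value of the CHSH game.

    Referee samples x, y in {0,1} uniformly; players answer a, b in {0,1};
    they win iff a XOR b == x AND y. For the classical value it suffices
    to consider deterministic strategies: a function per player.
    """
    best = 0.0
    for f in product([0, 1], repeat=2):      # Alice answers a = f[x]
        for g in product([0, 1], repeat=2):  # Bob answers b = g[y]
            wins = sum((f[x] ^ g[y]) == (x & y)
                       for x in (0, 1) for y in (0, 1))
            best = max(best, wins / 4)
    return best

print(chsh_classical_value())  # 0.75
```

Entangled players, measuring a shared EPR pair, can win with probability cos²(π/8) ≈ 0.854; it is this gap that makes the game a useful “test of quantumness.”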

After having read through the viewpoint of a historian on Greek mathematics, it is interesting to come back to the present and read up on the modern mathematician’s own account of their practice. Having delighted in Hardy’s A Mathematician’s Apology in my student years I was naturally attracted to Harris’ explicitly referential book Mathematics Without Apologies. The book aims to provide a personal answer to precisely the same question as Netz’ (“What do pure mathematicians do, and why do they do it?”—first line on back cover), with the essential difference that Harris’ book is written by a contemporary mathematician about the practice of modern mathematics. The perspective taken is thus very different: Harris does not waste a drop of ink examining the material products of modern mathematics (e.g. printed articles), which naturally from his point of view do not have the least interest besides their mathematical significance; nor does he question the formal system that enables mathematics (he does discuss foundational issues, but these are themselves part of the formal system).

Harris delights in telling us stories about the practice of mathematics from the inside: the way one acquires prestige (he calls it “charisma”), how mathematicians perceive when a mathematical statement is interesting, what drives a mathematician to work on one problem or another, the influence of large research programs such as the Langlands program, etc. Like Netz, Harris agrees that the proof itself, while an important product of the mathematician’s practice, is not a goal in itself. The goal is to introduce new objects, find relations between them, and build structures that support this exploration (Harris quotes extensively from Grothendieck, a builder of structures if ever there was one). Crucially, the mathematician’s goal is not at all to find truth: her goal is to find beauty. Harris (un-apologetically!) defends the mathematician’s position as playing a privileged and essential role in our (Western) society: mathematics is the rare, if not unique, discipline where thought is valued for itself: not for its consequences or potential applications, nor even for some kind of universal validity or truth that it might reach — for itself, the sheer beauty of it. No apologies. While one might think that the arts are in a similar position, Harris points out that in contemporary discussions about art the notion of beauty has all but disappeared; instead, one talks of society, ecology, politics—all of this is good and important, but it is not about beauty. Mathematics is about exploring the beauty of human thought. This exploration is carried out within the very special, narrow and arbitrary corner that mathematical practice has delineated for itself since the Greeks got us started. And yet it does not matter that it is narrow and arbitrary; what matters is that it is purely and uniquely human.

I don’t have a conclusion to give to this post. I suppose it is the kind of digression that one is naturally inclined to make while on sabbatical. (I was delighted to see Harris reference his time at the Institut Henri Poincaré in Paris extensively and spend multiple pages gossiping about Ed Frenkel, then a holder of the same FSMP chair I am now occupying.) Certainly it helps to gain a sense of what it is that one is doing. While these explorations destroyed a number of simple comforting myths about what it is that I do every day, in the end I find the void that lies beneath the surface much more appealing; I feel privileged and comforted in my desire to make use of that privilege.

## Post-scriptum

In addition to the two books referenced in the post there are a couple of articles that I found helpful. I am listing them here because I found such writings non-trivial to come by, and so they might be helpful to anyone interested in the topic. I welcome additional references.

• Barton, Ethnomathematics and Philosophy. This is Barton’s Ph.D. thesis, in which he studies the social emergence of mathematics not as inevitable, but as cultural, yet without negating what is so special about it.
• Wallet and Neuwirth, Enquête sur les modes d’existence des êtres mathématiques (“Inquiry into the modes of existence of mathematical beings”; unfortunately in French only it seems). The authors build on Netz’ work and others to examine the notion of mathematical proof within Latour’s framework of “modes of existence,” which Latour mostly applied to experimental sciences.

## Lecture notes on the Mahadev verification protocol

As announced earlier I am currently teaching a course on “interactive proofs with quantum devices” in Paris. The course is proceeding apace, even though the recent lockdown order in France means that we had to abandon our beautiful auditorium at the Institut Henri Poincaré and retreat behind the fanciful Zoom backgrounds whose pretension is a sad reminder of what our summers used to be (Banff, anyone?). A possible upside is that more people may be able to attend the now-online course; if you are interested see the course page for info. (Things are actually fairly good here; in spite of the apparently strict restrictions on daily outings (max 1h, 1km) you can count on the French to bend things their way; most shops are closed but the streets remain quite busy.)

We just finished a sequence of four lectures on the Mahadev “classical verification of quantum computation” protocol. In the process of preparing the lectures I arrived at a presentation of the protocol that is fairly self-contained, so I decided to compile the associated lecture notes as a stand-alone group of 4 lectures that is available here. The notes are aimed at beginning graduate students who have taken a first course in quantum computing and in complexity and who wish to gain a concrete understanding of the inner workings of the result; the notes are a bit lengthy, but this is in part because they take the time to introduce related concepts and slowly build up to the main result. Overall, my hope is that these should be relatively easy to read and provide a good technical introduction to the Mahadev result on classical verification, including an almost complete analysis of her protocol (there are a few explicitly declared shortcuts that help simplify the presentation without hiding any important aspects). For additional background you can also consult the full 10-week notes. Comments on the notes are welcome; I’m afraid they most likely contain numerous typos so if you find any please feel free to correct them directly on the associated overleaf document.

The last three lectures of the course will be devoted to the problem of testing under spatial assumptions, building up to an introduction to the proof of MIP* = RE. If all goes well I’ll aim to prepare some stand-alone notes for that part as well.


## It happens to everyone…but it’s not fun

A recent post on this blog concerned the posting of our paper MIP*=RE on the arXiv and gave a personal history of the sequence of works that led to the result. Quite unfortunately (dramatically?) a few weeks after the initial posting of the paper (and the blog post) John Wright discovered an important error in the proof of a key result in this sequence: my paper Three-player entangled XOR games are NP-hard to approximate, published in 2016 in a special issue of the SIAM Journal on Computing dedicated to selected papers from the FOCS’13 conference. While I did not mention this paper directly in the previous blog post, its main result, a proof of soundness of the Raz-Safra low-degree test against entangled-player strategies, is a key ingredient in the proof of the quantum low-degree test, itself a key ingredient in the MIP*=RE paper. (Strictly speaking the latter paper relies on an extension of my result to two-player games obtained in a follow-up with Natarajan. Since that paper re-used the flawed part of my earlier analysis in a black-box manner it is similarly impacted.) So then…?

## Scientific aspects

I’ll start with the science. The result MIP*=RE, to the best of our knowledge, remains correct. In order to remove the dependence of the proof on the flawed paper we extended the soundness analysis of Babai, Fortnow and Lund (BFL)’s multilinearity test against entangled provers from my paper with Ito to the case of multivariate polynomials of low individual degree. (This extension, for the case of classical provers, is already mentioned in the BFL paper.) We just posted a self-contained analysis of that test on the arXiv here and updated the MIP*=RE paper to account for the replacement (see v2). The latter paper is currently under review; on this I will simply say that, as for all mathematical works, it is advisable to wait until the refereeing process is complete before declaring confidence in the validity of the result. For a more in-depth description of the changes made I refer to the introduction of the new paper.

Our analysis of the “low individual-degree test” mentioned in the preceding paragraph can be used to recover the main result of my SICOMP’16 paper in a weakened form. Since the proof is different and does not directly fix the error I have decided to withdraw the paper from SICOMP. For more details on the error itself and its consequences for other works, such as the quantum low-degree test and the quantum games PCP, I refer to the short note I wrote to accompany the withdrawal of the paper. The one-sentence summary is that essentially all subsequent results expressed in terms of “high” complexity classes such as QMA-EXP, NEEXP, etc., still hold, while “scaled-down” results on the hardness of entangled games can only be recovered by allowing a substantial weakening of parameters. In particular, the quantum low-degree test holds in its scaled-up version (testing exp(n) EPR pairs using poly(n) resources), but the scaled-down version requires polylog(n) communication to test n EPR pairs, instead of O(log n) as claimed.

## Personal aspects

In addition to notifying researchers in the area of the bug, my goal in writing this blog post is to help me exorcise the demon of having a large mistake in one of my papers. In doing so I was inspired by Scott Aaronson’s blog post on a similar topic. (I’ll admit that even just linking explicitly to his post helps reassure myself, an effect which I believe was one of Scott’s aims in writing the post. So, thanks Scott, and allow me to pass it on!) The faulty paper is not based on a minor back-of-the-envelope observation; in fact it is one that I was quite proud of. The mistake in it is not small either; it’s a mistake that I cannot find any excuse for having made. Yet here I am: after having spent the past 6 months trying to find an alternative proof, I now strongly believe that the problem cannot be solved using the kind of techniques that I had imagined could do so. Whether the theorem statement is true or not, I don’t know; but at the moment I am unable to prove it. I have to accept that there is a bug.

As painful as it is I realize that I am writing this post from a relatively comfortable position. Who knows if I would have been able to do the same had we not been able to recover a full proof of MIP*=RE. Moreover, after having banged my head against the problem for 6 months straight (COVID helping, walls were never far) I am now able to see my failure in a more positive light: the story I told in the previous blog post is not yet closed; there is an open challenge for me to solve. It is a very personal challenge; having spent the past 6 months delineating it I have accumulated sufficient grounds on which to believe that it is an interesting one. I feel grateful for this.

## Social aspects

After the scientific and the personal aspects, let me end with the sociological. This is a semi-tangent but it is a good opportunity to discuss a topic that we scientists, possibly even more so those of us working in the “hard sciences” (as the French call “proof-based” disciplines), are insufficiently sensitized to. This is the topic of how science is made, and of what the reality is of this “absolute truth” that we claim to discover and establish in our mathematical results.

My paper was posted on the arXiv in 2013, it was accepted to the FOCS conference and published in its proceedings the same year, and it appeared in the journal SICOMP in 2016. Both publications were refereed. Since its posting the paper has been cited 47 times (Google Scholar) and its main result is used in an essential way in at least half a dozen papers (my best guess). Seven years later a big hole has been found in the proof. How did the “truth value” of my result evolve in the process? Was it always wrong, or was there a time when it had truth, in whatever appropriate sense of the word?

I realize that these questions can be given trivial answers—I know what an axiom is and what a proof system is. Yet I am trying to push myself, and my reader, to look a little deeper. An analogy might help. The situation brought to mind a book by French philosopher of science Bruno Latour, called (in its English translation) Laboratory Life: The Construction of Scientific Facts. This is a wonderful book, which goes well beyond the classic misconceptions from Popper or even Kuhn; it should be mandatory reading for every scientist. In one of the early chapters of the book Latour makes a detailed study of how subsequent citations can collectively enshrine an initial claim based entirely on the citers’ conscious or unconscious biases in making use of the citation (i.e. in complete independence from any ground “truth” or “importance” of the cited work). An entertaining example of this can be found in this article, which dissects the claim that “The myth from the 1930s that spinach is a rich source of iron was due to misleading information in the original publication: a malpositioned decimal point gave a 10-fold overestimate of iron content.” The example, pursued in great depth in the article, shows very well how, one citation at a time, the (spoiler: unjustified) claim is given more and more credibility until it eventually becomes a fact: from initial citations written in a tentative tone (“according to Z, it could be that…”) to more assertive citations (“Z has shown that”) by more and more well-known researchers in highly-read journals to pure fact (citation above). I highly recommend the article!

It is easy to dismiss this story as being the result of “sloppy” authors misrepresenting a “soft” claim whose truth value is not well-determined in the first place, being a statement about the world rather than about some hypothetical mathematical universe. Yet I believe that it is worth taking the time to examine with an open mind what exactly, if anything, distinguishes a claim about the iron content of spinach from the main “theorem” of my paper. From its initial posting on the arXiv to its presentation in a conference and its journal publication to the multiple citations it received through its use in subsequent works, and including multiple other considerations such as my own credibility (itself the result of so many other considerations) and the result’s baseline “believability”, when was the logical statement itself evaluated? Does it matter? Did the unchallenged existence of the result for 7 years impact the course of science? Or was it a mistake that was bound to be discovered and has no lasting consequences?

These are questions for the reader that can be (and are probably better) asked in contexts other than the limited one of my result. Indeed there is a much broader point to all this, one that I only meant to raise in an indirect manner. It is impossible to disregard the fact that our scientific work is grounded in cultural and societal effects, but we may disagree on the impact that this grounding has. We owe it to ourselves and to our readers (broadly interpreted—from colleagues to funding agencies to the broader public) to refuse to hide behind the thin veil of “hard science”, mathematics or logic, and educate ourselves as to what it is that we really are doing.


## Announcing a short course in Paris

This coming academic year I am on sabbatical, in Paris. It’s certainly a funny year to be on sabbatical. (It’s a funny year to be doing anything, isn’t it? Or is “funny” not the appropriate word…Yet I can’t find any other way to look at it that doesn’t send me straight into the abyss. So, let it be “funny”—knowing that, no, I’m not actually laughing right now.) On the one hand, I am lucky to have escaped the incessant debates on the format of teaching, how many people per square foot are allowed in each building on campus, what distance I should stay from my students were I to attempt to meet them in person, and so many other similar decisions that have come to take up a larger and larger fraction of our professional lives (not to mention of course the incommensurate challenges that many are facing at the personal and familial level). On the other hand, the situation makes it much harder to meet others and engage in new collaborations, one of the goals of my sabbatical. I’ll see how it plays out; I’ll be sure to write more on this blog as time progresses.

During the sabbatical I am being hosted successively by different French institutions. For the first 6 months I had the good fortune of being awarded a “chair” from the “Fondation Sciences Mathématiques de Paris” (FSMP), a private foundation which supports, in very general terms, the development of the mathematics community in Paris, from the organization of general-public conferences to the support of research collaborations. My only formal obligation during these 6 months is to give 20 hours of lectures on a theme of my choosing. The goal that I elected for the course is to provide an in-depth introduction to two major works in quantum complexity and cryptography of the past few years: first, Mahadev’s 2018 result on classical verification of quantum computation (a result for which I already shared my enthusiasm here); second, my result MIP*=RE with Ji, Natarajan, Wright and Yuen on the power of quantum multi-prover interactive proof systems, which I mentioned in the previous post, and its consequences. For more about the course, including a tentative breakdown of lectures and some resources, see the course webpage.

While at the time of writing the course is still scheduled to start as an in-person meeting (to take place in a very large layered amphitheater with ample space for social distancing), there is no telling how the situation, and regulations, will evolve in the near future. To accommodate participants who are unable or prefer not to travel in person, all lectures starting with the first one will be recorded. In addition I will post course materials, including lecture notes, here. The purpose of this post is to advertise the course: participants from everywhere are welcome to watch the recorded videos, read the notes, and write to me with any questions or suggestions. In particular I plan to outsource the proof-reading of the notes via overleaf and I welcome any participant’s interest in helping with that; draft notes for the first lecture are already available here. Anyone is welcome to make direct corrections, or add inline comments pointing to issues that may need my attention.

The program that I chose is ambitious, and we will see how far we get. My goal is to start slow, so as to remain inclusive with respect to varying backgrounds in computer science, mathematics or physics. At first I will give complete definitions, state and prove simple lemmas, etc., in order to establish a common language. As time progresses I expect that things will become a little more high-level, less self-contained, and more technical. Depending on your background and interests, you may find the first few lectures, or the last few ones, more interesting. Teaching the course will certainly be beneficial for me because I believe that there is a strong unity behind the two works I chose to present. I hope to make that unity apparent by presenting them together. Moreover, both works introduce new techniques that leave many avenues open; I hope that a “clean” presentation will help me, and others, build on them.

A side benefit of an “un-necessary” course such as this one is that it contributes to bringing a certain community together. (By “un-necessary” I mean that the course will not be required for any curriculum; if it did not take place, as long as it was replaced by other research-level activities its absence would not be felt.) COVID-19 unfortunately turns that opportunity into a challenge. It is because of it that I insist, regulations allowing, on having the course take place in person: as much as we are getting used to Zoom, and as well as it may be working as a replacement for many aspects of our interactive lives, from in-person classes to conferences to research collaborations, a scientific event such as this one, with sustained involvement by a small set of participants coming from distant backgrounds, is probably one of the more challenging ones to make work online. I hope it doesn’t come to that. Even if it does, one of the lessons learned from the Spring 2020 semester on quantum computing at the Simons Institute in Berkeley, which was interrupted halfway through due to the pandemic, is that having an initial in-person phase was of great help in cementing future online interactions. So, I hope that I am able to lecture on Tuesday; after that, we will see.


## A Masters project

In a previous post I reported on the beautiful recent result by Natarajan and Wright showing the astounding power of multi-prover interactive proofs with quantum provers sharing entanglement: in symbols, ${\text{NEEXP} \subseteq \text{MIP}^\star}$. In this post I want to report on follow-up work with Ji, Natarajan, Wright, and Yuen, that we just posted to arXiv. This time however I will tell the story from a personal point of view, with all the caveats that this implies: the “hard science” will be limited (but there could be a hint as to how “science”, to use a big word, “progresses”, to use an ill-defined one), the story is far too long, and it might be mostly of interest to me only. It’s a one-sided story, but that has to be. (In particular below I may at times attribute credit in the form “X had this idea”. This is my recollection only, and it is likely to be inaccurate. Certainly I am ignoring a lot of important threads.) I wrote this because I enjoyed recollecting some of the best moments in the story just as much as some of the hardest; it is fun to look back and find meanings in ideas that initially appeared disconnected. Think of it as an example of how different lines of work can come together in unexpected ways; a case for open-ended research. It’s also an antidote against despair that I am preparing for myself: whenever I feel I’ve been stuck on a project for far too long, I’ll come back to this post and ask myself if it’s been 14 years yet — if not, then press on.

It likely comes as a surprise to me only that I am no longer fresh out of the cradle. My academic life started in earnest some 14 years ago, when in the Spring of 2006 I completed my Masters thesis in Computer Science under the supervision of Julia Kempe, at Orsay in France. I had met Julia the previous term: her class on quantum computing was, by far, the best-taught and most exciting course in the Masters program I was attending, and she had gotten me instantly hooked. Julia agreed to supervise my thesis, and suggested that I look into some interesting recent result by Stephanie Wehner that linked the study of entanglement and nonlocality in quantum mechanics to complexity-theoretic questions about interactive proof systems (specifically, this was Stephanie’s paper showing that ${\text{XOR-MIP}^\star \subseteq \text{QIP}(2)}$).

At the time the topic was very new. It had been initiated the previous year with a beautiful paper by Cleve et al. (that I have recommended to many a student since!). It was a perfect fit for me: the mathematical aspects of complexity theory and quantum computing connected to my undergraduate background, while the relative concreteness of quantum mechanics (it is a physical theory after all) spoke to my desire for real-world connection (not “impact” or even “application” — just “connection”). Once I got myself up to speed in the area (which consisted of three papers: the two I already mentioned, together with a paper by Kobayashi and Matsumoto where they studied interactive proofs with quantum messages), Julia suggested looking into the “entangled-prover” class ${\text{MIP}^\star}$ introduced in the aforementioned paper by Cleve et al. Nothing was known about this class! Nothing besides the trivial inclusion of single-prover interactive proofs, IP, and the containment in…ALL, the trivial class that contains all languages.
Yet the characterization MIP=NEXP of its classical counterpart by Babai et al. in the 1990s had led to one of the most productive lines of work in complexity of the past few decades, through the PCP theorem and its use from hardness of approximation to efficient cryptographic schemes. Surely, studying ${\text{MIP}^\star}$ had to be a productive direction? In spite of its well-established connection to classical complexity theory, via the formalism of interactive proofs, this was a real gamble. The study of entanglement from the complexity-theoretic perspective was entirely new, and bound to be fraught with difficulty; very few results were available and the existing lines of work, from the foundations of nonlocality to more recent endeavors in device-independent cryptography, provided little other starting point than strong evidence that even the simplest examples came with many unanswered questions. But my mentor was fearless, and far from a novice at breaking ground in new areas, having done pioneering work in areas ranging from quantum random walks to Hamiltonian complexity through adiabatic computation. Surely this would lead to something?

It certainly did. More sleepless nights than papers, clearly, but then the opposite would only indicate dullness. Julia’s question led to far more unexpected consequences than I, or I believe she, could have imagined at the time. I am writing this post to celebrate, in a personal way, the latest step in 15 years of research by dozens of researchers: today my co-authors and I uploaded to the quant-ph arXiv what we consider a complete characterization of the power of entangled-prover interactive proof systems by proving the equality ${\text{MIP}^\star = \text{RE}}$, the class of all recursively enumerable languages (a complete problem for RE is the halting problem). Without going too much into the result itself (if you’re interested, we have a long introduction waiting for you), and since this is a personal blog, I will continue on with some personal thoughts about the path that got us there.

When Julia & I started working on the question, our main source of inspiration was the results by Cleve et al. showing that the nonlocal correlations of entanglement had interesting consequences when seen through the lens of interactive proof systems in complexity theory. Since the EPR paper a lot of work in understanding entanglement had already been accomplished in the Physics community, most notably by Bell, Mermin, and Peres, and more recently the works in device-independent quantum cryptography by Acin, Pironio, Scarani and many others stimulated by Ekert’s proposal for quantum key distribution and Mayers and Yao’s idea for “device-independent cryptography”. By then we certainly knew that “spooky action-at-a-distance” did not entail any faster-than-light communication, and indeed was not really “action-at-a-distance” in the first place but merely “correlation-at-a-distance”. What Cleve et al. recognized is that these “spooky correlations-at-a-distance” were sufficiently special so as to not only give numerically different values in “Bell inequalities”, the tool invented by Bell to evidence nonlocality in quantum mechanics, but also have some potentially profound consequences in complexity theory. In particular, examples such as the “Magic Square game” demonstrated that enough correlation could be gained from entanglement so as to defeat basic proof systems whose soundness relied only on the absence of communication between the provers, an assumption that until then had been wrongly equated with the assumption that any computation performed by the provers could be modeled entirely locally. I think that the fallacy of this implicit assumption came as a surprise to complexity theorists, who may still not have entirely internalized it. Yet the perfect quantum strategy for the Magic Square game provides a very concrete “counter-example” to the soundness of the “clause-vs-variable” game for 3SAT.
Indeed this game, a reformulation by Aravind and Cleve-Mermin of a Bell Inequality discovered by Mermin and Peres in 1990, can be easily re-framed as a 3SAT system of equations that is not satisfiable and yet is such that the associated two-player clause-vs-variable game has a perfect quantum strategy. It is this observation, made in the paper by Cleve et al., that gave the first strong hint that the use of entanglement in interactive proof systems could make many classical results in the area go awry.
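The obstruction behind the Magic Square game can be checked directly. Below is a small numpy sketch (my own illustration, not code from any of the papers mentioned here): the Mermin–Peres square is a 3x3 grid of two-qubit Pauli observables in which every row multiplies to ${+I}$ while the columns multiply to ${+I, +I, -I}$. Multiplying all six constraints shows that no fixed ${\pm 1}$ table can satisfy them, which a brute-force search over all ${2^9}$ assignments confirms, yet commuting quantum observables do.

```python
import numpy as np
from itertools import product as iproduct

# Single-qubit Pauli matrices
I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])
Y = 1j * X @ Z  # Y = iXZ

kron = np.kron
# The Mermin-Peres magic square: a 3x3 grid of two-qubit observables.
# Observables within each row, and within each column, pairwise commute.
square = [
    [kron(X, I2), kron(I2, X), kron(X, X)],
    [kron(I2, Z), kron(Z, I2), kron(Z, Z)],
    [kron(X, Z), kron(Z, X), kron(Y, Y)],
]

I4 = np.eye(4)
# Each row multiplies to +I ...
for row in square:
    assert np.allclose(row[0] @ row[1] @ row[2], I4)
# ... while the columns multiply to +I, +I, -I.
for j, sign in enumerate([1, 1, -1]):
    col = [square[i][j] for i in range(3)]
    assert np.allclose(col[0] @ col[1] @ col[2], sign * I4)

# No classical table of nine fixed +-1 values meets all six sign constraints:
ok = any(
    all(s[3 * i] * s[3 * i + 1] * s[3 * i + 2] == 1 for i in range(3))
    and all(s[j] * s[j + 3] * s[j + 6] == sgn
            for j, sgn in [(0, 1), (1, 1), (2, -1)])
    for s in iproduct([1, -1], repeat=9)
)
assert not ok
```

The product of the three row constraints forces the product of all nine signs to be ${+1}$, while the column constraints force it to be ${-1}$; entangled provers evade the contradiction because the nine quantum observables have no joint assignment of outcomes.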

By importing the study of non-locality into complexity theory Cleve et al. immediately brought it into the realm of asymptotic analysis. Complexity theorists don’t study fixed objects, they study families of objects that tend to have a uniform underlying structure and whose interesting properties manifest themselves “in the limit”. As a result of this new perspective focus shifted from the study of single games or correlations to infinite families thereof. Some of the early successes of this translation include the “unbounded violations” that arose from translating asymptotic separations in communication complexity to the language of Bell inequalities and correlations (e.g. this paper). These early successes attracted the attention of some physicists working in foundations as well as some mathematical physicists, leading to a productive exploration that combined tools from quantum information, functional analysis and complexity theory.

The initial observations made by Cleve et al. had pointed to ${\text{MIP}^\star}$ as a possibly interesting complexity class to study. Rather amazingly, nothing was known about it! They had shown that under strong restrictions on the verifier’s predicate (it should be an XOR of two answer bits), a collapse took place: by the work of Hastad, XOR-MIP equals NEXP, but ${\text{XOR-MIP}^\star}$ is included in EXP. This seemed very fortuitous (the inclusion is proved via a connection with semidefinite programming that seems tied to the structure of XOR-MIP protocols): could entanglement induce a collapse of the entire, unrestricted class? We thought (at this point mostly Julia thought, because I had no clue) that this ought not to be the case, and so we set ourselves to show that the equality ${\text{MIP}^\star=\text{NEXP}}$, that would directly parallel Babai et al.’s characterization MIP=NEXP, holds. We tried to show this by introducing techniques to “immunize” games against entanglement: modify an interactive proof system so that its structure makes it “resistant” to the kind of “nonlocal powers” that can be used to defeat the clause-vs-variable game (witness the Magic Square). This was partially successful, and led to one of the papers I am most proud of — I am proud of it because I think it introduced elementary techniques (such as the use of the Cauchy-Schwarz inequality — inside joke — more seriously, basic things such as “prover-switching”, “commutation tests”, etc.) that are now routine manipulations in the area. The paper was a hard sell! It’s good to remember the first rejections we received. They were not unjustified: the main point of criticism was that we were only able to establish a hardness result for exponentially small completeness-soundness gap. A result for such a small gap in the classical setting follows directly from a very elementary analysis based on the Cook-Levin theorem. 
So then why did we have to write so many pages (and so many applications of Cauchy-Schwarz!) to arrive at basically the same result (with a ${^\star}$)?

Eventually we got lucky and the paper was accepted to a conference. But the real problem, of establishing any non-trivial lower bound on the class ${\text{MIP}^\star}$ with constant (or, in the absence of any parallel repetition theorem, inverse-polynomial) completeness-soundness gap, remained. By that time I had transitioned from a Masters student in France to a graduate student in Berkeley, and the problem (pre-)occupied me during some of the most difficult years of my Ph.D. I distinctly remember spending my first year entirely thinking about this (oh and sure, that systems class I had to pass to satisfy the Berkeley requirements), and then my second year — yet, getting nowhere. (I checked the arXiv to make sure I’m not making this up: two full years, no posts.) I am forever grateful to my fellow student Anindya De for having taken me out of the cycle of torture by knocking on my door with one of the most interesting questions I have studied, that led me into quantum cryptography and quickly resulted in an enjoyable paper. It was good to feel productive again! (Though the paper had fun reactions as well: after putting it on the arXiv we quickly heard from experts in the area that we had solved an irrelevant problem, and that we better learn about information theory — which we did, eventually leading to another paper, etc.) The project had distracted me and I set interactive proofs aside; clearly, I was stuck.

About a year later I visited IQC in Waterloo. I don’t remember in what context the visit took place. What I do remember is a meeting in the office of Tsuyoshi Ito, at the time a postdoctoral scholar at IQC. Tsuyoshi asked me to explain our result with Julia. He then asked a very pointed question: the bedrock for the classical analysis of interactive proof systems is the “linearity test” of Blum-Luby-Rubinfeld (BLR). Is there any sense in which we could devise a quantum version of that test?

What a question! This was great. At first it seemed fruitless: in what sense could one argue that quantum provers apply a “linear function”? Sure, quantum mechanics is linear, but that is beside the point. The linearity is a property of the prover’s answers as a function of their question. So what to make of the quantum state, the inherent randomness, etc.?

It took us a few months to figure it out. Once we got there however, the answer was relatively simple — the prover should be making a question-independent measurement that returns a linear function that it applies to its question in order to obtain the answer returned to the verifier — and it opened the path to our subsequent paper showing that the inclusion of NEXP in ${\text{MIP}^\star}$ indeed holds. Tsuyoshi’s question about linearity testing had allowed us to make the connection with PCP techniques; from there to MIP=NEXP there was only one step to make, which is to analyze multi-linearity testing. That step was suggested by my Ph.D. advisor, Umesh Vazirani, who was well aware of the many pathways towards the classical PCP theorem (indeed a lot of the activity that led to the proof of the theorem took place in Berkeley, with many of Umesh’s current or former students making substantial contributions). It took a lot of technical work, yet conceptually a single question from my co-author had sufficed to take me out of a 3-year slumber.
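As an aside, for readers who have not seen it, the classical BLR test itself is remarkably simple. Here is a minimal sketch (my own illustration; the helper names are mine): a function ${f: \{0,1\}^n \rightarrow \{0,1\}}$ is queried at random points ${x, y}$ and at ${x \oplus y}$, and accepted only if ${f(x) \oplus f(y) = f(x \oplus y)}$ in every trial.

```python
import random

def blr_test(f, n, trials=100):
    """Classical BLR linearity test over GF(2)^n: accept f only if the check
    f(x) ^ f(y) == f(x ^ y) passes on random x, y in every trial."""
    for _ in range(trials):
        x = random.getrandbits(n)
        y = random.getrandbits(n)
        if f(x) ^ f(y) != f(x ^ y):
            return False
    return True

# A linear function (parity of a fixed subset of bits) always passes ...
mask = 0b1011
linear = lambda x: bin(x & mask).count("1") % 2
assert blr_test(linear, 4)

# ... while a non-linear function fails some trial with high probability.
assert not blr_test(lambda x: 1 if x == 0 else 0, 4)
```

The point of the test is its soundness: any ${f}$ that passes with high probability must be close to some linear function, which is exactly the kind of structural guarantee the quantum analogue had to reproduce.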

This was in 2012, and I thought we were done. For some reason the converse inclusion, of ${\text{MIP}^\star}$ in NEXP, seemed to resist our efforts, but surely it couldn’t resist much longer. Navascues et al. had introduced a hierarchy of semidefinite programs that seemed to give the right answer (technically they could only show convergence to a relaxation, the commuting value, but that seemed like a technicality; in particular, the values coincide when restricted to finite-dimensional strategies, which is all we computer scientists cared about). There were no convergence bounds on the hierarchy, yet at the same time commutative SDP hierarchies were being used to obtain very strong results in combinatorial optimization, and it seemed like it would only be a matter of time before someone came up with an analysis of the quantum case. (I had been trying to solve a related “dimension reduction problem” with Oded Regev for years, and we were making no progress; yet it seemed someone ought to!)

In Spring 2014, during an open questions session at a workshop at the Simons Institute in Berkeley, Dorit Aharonov suggested that I ask about the possible inclusion of QMA-EXP, the exponential-sized-proofs analogue of QMA, in ${\text{MIP}^\star}$. This would be a stronger result than the inclusion of NEXP (under assumptions); wouldn’t it also be a more natural “fully quantum” analogue of MIP=NEXP? Dorit’s suggestion was motivated by research on the “quantum PCP theorem”, which aims to establish similar hardness results in the realm of the local Hamiltonian problem; see e.g. this post for the connection. I had no idea how to approach the question — I also didn’t really believe the answer could be positive — but what can you do, if Dorit asks you something… So I reluctantly went to the board and asked the question. Joe Fitzsimons was in the audience, and he immediately picked it up! Joe had the fantastic idea of using quantum error-correction, or more specifically secret-sharing, to distribute a quantum proof among the provers. His enthusiasm overcame my skepticism, and we eventually showed the desired inclusion. Maybe ${\text{MIP}^\star}$ was bigger than ${\text{NEXP}}$ after all.

Our result, however, had a similar deficiency as the one with Julia, in that the completeness-soundness gap was exponentially small. Obtaining a result with a constant gap took a couple more years of work and the fantastic energy and insights of a Ph.D. student at MIT, Anand Natarajan. Anand is the first person I know of to have had the courage to dive into the most technical aspects of the analysis of the aforementioned results, while also bringing in the insights of a “true quantum information theorist” that were supported by Anand’s background in Physics and upbringing in the group of Aram Harrow at MIT. (In contrast I think of myself more as a “raw” mathematician; I don’t really understand quantum states other than as psd matrices…not that I understand math either of course; I suppose I’m some kind of a half-baked mish-mash.) Anand had many ideas but one of the most beautiful ones led to what he poetically called the “Pauli braiding test”, a “truly quantum” analogue of the BLR linearity test that amounts to doing two linearity tests in conjugate bases and piecing the results together into a robust test for ${n}$-qubit entanglement (I wrote about our work on this here).

At approximately the same time Zhengfeng Ji had another wonderful idea, that was in some sense orthogonal to our work. (My interpretation of) Zhengfeng’s idea is that one can see an interactive proof system as a computation (verifier-prover-verifier) and use Kitaev’s circuit-to-Hamiltonian construction to transform the entire computation into a “quantum CSP” (in the same sense that the local Hamiltonian problem is a quantum analogue of classical constraint satisfaction problems (CSP)) that could then itself be verified by a quantum multi-prover interactive proof system…with exponential gains in efficiency! Zhengfeng’s result implied an exponential improvement in complexity compared to the result by Julia and myself, showing inclusion of NEEXP, instead of NEXP, in ${\text{MIP}^\star}$. However, Zhengfeng’s technique suffered from the same exponentially small completeness-soundness gap as we had, so that the best lower bound on ${\text{MIP}^\star}$ per se remained NEXP.

Both works led to follow-ups. With Natarajan we promoted the Pauli braiding test into a “quantum low-degree test” that allowed us to show the inclusion of QMA-EXP into ${\text{MIP}^\star}$, with constant gap, thereby finally answering the question posed by Aharonov 4 years after it was asked. (I should also say that by then all results on ${\text{MIP}^\star}$ started relying on a sequence of parallel repetition results shown by Bavarian, Yuen, and others; I am skipping this part.) In parallel, with Ji, Fitzsimons, and Yuen we showed that Ji’s compression technique could be “iterated” an arbitrary number of times. In fact, by going back to “first principles” and representing verifiers uniformly as Turing machines we realized that the compression technique could be used iteratively to (up to small caveats) give a new proof of the fact (first shown by Slofstra using an embedding theorem for finitely presented groups) that the zero-gap version of ${\text{MIP}^\star}$ contains the halting problem. In particular, the entangled value is uncomputable! This was not the first time that uncomputability crops up in a natural problem in quantum computing (e.g. the spectral gap paper), yet it still surprises when it shows up. Uncomputable! How can anything be uncomputable!

As we were wrapping up our paper Henry Yuen realized that our “iterated compression of interactive proof systems” was likely optimal, in the following sense. Even a mild improvement of the technique, in the form of a slower closing of the completeness-soundness gap through compression, would yield a much stronger result: undecidability of the constant-gap class ${\text{MIP}^\star}$. It was already known, by work of Navascues et al., Fritz, and others, that such a result would have consequences that, if not outright surprising, certainly seemed to take us out of our depth. In particular, undecidability of any language in ${\text{MIP}^\star}$ would imply a negative resolution to a series of equivalent conjectures in functional analysis, from Tsirelson’s problem to Connes’ Embedding Conjecture through Kirchberg’s QWEP conjecture. While we liked our result, I don’t think that we believed it could resolve any conjecture(s) in functional analysis.

So we moved on. At least I moved on, I did some cryptography for a change. But Anand Natarajan and his co-author John Wright did not stop there. They had the last major insight in this story, which underlies their recent STOC best paper described in the previous post. Briefly, they were able to combine the two lines of work, by Natarajan & myself on low-degree testing and by Ji et al. on compression, to obtain a compression that is specially tailored to the existing ${\text{MIP}^\star}$ protocol for NEXP and compresses that protocol without reducing its completeness-soundness gap. This let them recover Ji’s result that ${\text{MIP}^\star}$ contains NEEXP, but this time with constant gap! The result received well-deserved attention. In particular, it is the first in this line of works to not suffer from any caveats (such as a closing gap, or randomized reductions, or some kind of “unfair” tweak on the model that one could attribute the gain in power to), and it implies an unconditional separation between MIP and ${\text{MIP}^\star}$.

As they were putting the last touches on their result, suddenly something happened, which is that a path towards a much bigger result opened up. What Natarajan & Wright had achieved is a one-step gapless compression. In our iterated compression paper we had observed that iterated gapless compression would lead to ${\text{MIP}^\star=\text{RE}}$, implying negative answers to the aforementioned conjectures. So then?

I suppose it took some more work, but in some way all the ideas had been laid out in the previous 15 years of work in the complexity of quantum interactive proof systems; we just had to put it together. And so a decade after the characterization QIP = PSPACE of single-prover quantum interactive proof systems, we have arrived at a characterization of quantum multiprover interactive proof systems, ${\text{MIP}^\star = \text{RE}}$. With one author in common between the two papers: congratulations Zhengfeng!

Even though we just posted a paper, in a sense there is much more left to do. I am hopeful that our complexity-theoretic result will attract enough interest from the mathematical community, and especially operator algebraists, for whom CEP is a central problem, that some of them will be willing to devote time to understanding the result. I also recognize that much effort is needed on our own side to make it accessible in the first place! I don’t doubt that eventually complexity theory will not be needed to obtain the purely mathematical consequences; yet I am hopeful that some of the ideas may eventually find their way into the construction of interesting mathematical objects (such as, who knows, a non-hyperlinear group).

That was a good Masters project…thanks Julia!

Posted in meta, QPCP, Quantum, Science | Tagged , , , | 28 Comments

## Randomness and interaction? Entanglement ups the game!

[05/25/19 Update: Kevin Hartnett has a nice article at Quanta explaining Natarajan & Wright’s result in slightly more layman terms than I’d be able to…see here: Computer Scientists Expand the Frontier of Verifiable Knowledge]

The study of entanglement through the lens of interactive proof systems has been one of the most productive applications of complexity theory to the physical sciences that I know of. Last week Anand Natarajan and John Wright, postdoctoral scholars at Caltech and MIT respectively, added a major stone to this line of work. Anand & John (hereafter “NW”) establish the following wild claim: it is possible for a classical polynomial-time verifier to decide membership in any language in non-deterministic doubly exponential time by asking questions to two infinitely powerful, but untrusted, provers sharing entanglement. In symbols, NEEXP ${\subseteq}$ MIP${^\star}$! (The last symbol is for emphasis — no, we don’t have an MIP${^\star}$! class — yet.)

What is amazing about this result is the formidable gap between the complexity of the verifier and the complexity of the language being verified. We know since the 90s that the use of interaction and randomness can greatly expand the power of polynomial-time verifiers, from NP to PSPACE (with a single prover) and NEXP (with two provers). As a result of the work of Natarajan and Wright, we now know that yet an additional ingredient, the use of entanglement between the provers, can be leveraged by the verifier — the same verifier as in the previous results, a classical randomized polynomial-time machine — to obtain an exponential increase in its verification power. Randomness and interaction brought us one exponential; entanglement gives us another.

To gain intuition for the result consider first the structure of a classical two-prover one-round interactive proof system for non-deterministic doubly exponential time, with exponential-time verifier. Cutting some corners, such a protocol can be obtained by “scaling up” a standard two-prover protocol for non-deterministic singly exponential time. In the protocol, the verifier would sample a pair of exponential-length questions ${(X,Y)}$, send ${X}$ to one prover and ${Y}$ to the other, receive answers ${A}$ and ${B}$, and perform an exponential-time computation that verifies some predicate about ${(X,Y,A,B)}$.

How can entanglement help design an exponentially more efficient protocol? At first it may seem like a polynomial-time verifier has no way to even get started: if it can only communicate polynomial-length messages with the provers, how can it leverage their power? And indeed, if the provers are classical, it can’t: it is known that even with a polynomial number of provers, and polynomially many rounds of interaction, a polynomial-time verifier cannot decide any language beyond NEXP.

But the provers in the NW protocol are not classical. They can share entanglement. How can the verifier exploit this to its advantage? The key property that is needed is known as the rigidity of entanglement. In words, rigidity is the idea that by verifying the presence of certain statistical correlations between the provers’ questions and answers the verifier can determine precisely (up to a local change of basis) the quantum state and measurements that the provers must have been using to generate their answers. The most famous example of rigidity is the CHSH game: as already shown by Werner and Summers in 1982, the CHSH game can only be optimally, or even near-optimally, won by measuring a maximally entangled state using two mutually unbiased bases for each player. No other state or measurements will do, unless they trivially imply an EPR pair and mutually unbiased bases (such as a state that is the tensor product of an EPR pair with an additional entangled state).
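To make the numbers concrete, here is a small numpy sketch (my own illustration, not code from the papers) computing the CHSH correlation value of the canonical optimal strategy: an EPR pair, with one player measuring in the ${Z}$ and ${X}$ bases and the other in bases rotated by 45 degrees. The strategy achieves Tsirelson's bound of ${2\sqrt{2} \approx 2.83}$, strictly above the value 2 attainable classically; rigidity says this is essentially the only way to get there.

```python
import numpy as np

# EPR pair (|00> + |11>)/sqrt(2)
psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)

X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])

# Optimal observables: Alice uses Z, X; Bob uses (Z+X)/sqrt2, (Z-X)/sqrt2
A = [Z, X]
B = [(Z + X) / np.sqrt(2), (Z - X) / np.sqrt(2)]

def corr(a, b):
    """Expectation <psi| a (x) b |psi> of a product of +-1-valued observables."""
    return float(np.real(psi @ np.kron(a, b) @ psi))

# CHSH value: <A0 B0> + <A0 B1> + <A1 B0> - <A1 B1>
S = corr(A[0], B[0]) + corr(A[0], B[1]) + corr(A[1], B[0]) - corr(A[1], B[1])
print(S)  # 2*sqrt(2), vs. at most 2 for any classical (local) strategy
```

Each of the four correlators evaluates to ${1/\sqrt{2}}$ in magnitude, and the signs line up so that all four contribute positively to the sum.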

Rigidity gives the verifier control over the provers’ use of their entanglement. The simplest use of this is for the verifier to force the provers to share a certain number ${N}$ of EPR pairs and measure them to obtain identical uniformly distributed ${N}$-bit strings. Such a test for ${N}$ EPR pairs can be constructed from ${N}$ CHSH games. In a paper with Natarajan we give a more efficient test that only requires questions and answers of length that is poly-logarithmic in ${N}$. Interestingly, the test is built on classical machinery — the low-degree test — that plays a central role in the analysis of some classical multi-prover proof systems for NEXP.

At this point we have made an inch of progress: it is possible for a polynomial-time (in ${n=\log N}$) verifier to “command” two quantum provers sharing entanglement to share ${N=2^n}$ EPR pairs, and measure them in identical bases to obtain identical uniformly random ${N}$-bit strings. What is this useful for? Not much — yet. But here comes the main insight in NW: suppose we could similarly force the provers to generate, not identical uniformly random strings, but a pair of ${N}$-bit strings ${(X,Y)}$ that is distributed as a pair of questions from the verifier in the aforementioned interactive proof system for NEEXP with exponential-time (in ${n}$) verifier. Then we could use a polynomial-time (in ${n}$) verifier to “command” the provers to generate their exponentially-long questions ${(X,Y)}$ by themselves. The provers would then compute answers ${(A,B)}$ as in the NEEXP protocol. Finally, they would prove to the verifier, using a polynomial interaction, that ${(A,B)}$ is a valid pair of answers to the pair of questions ${(X,Y)}$ — indeed, the latter verification is an NEXP problem, hence can be verified using a protocol with polynomial-time verifier.

Sounds crazy? Yes. But they did it! Of course there are many issues with the brief summary above — for example, how does the verifier even know the questions ${X,Y}$ sampled by the provers? The answer is that it doesn’t need to know the entire question; only that it was sampled correctly, and that the quadruple ${(X,Y,A,B)}$ satisfies the verification predicate of the exponential-time verifier. This can be verified using a polynomial-time interactive proof.

Diving in, the most interesting insight in the NW construction is what they call “introspection”. What makes multi-prover proof systems powerful is the ability for the verifier to send correlated questions to the provers, in a way such that each prover has only partial information about the other’s question — informally, the verifier plays a variant of prisoner’s dilemma with the provers. In particular, any interesting distribution ${(X,Y)}$ will have the property that ${X}$ and ${Y}$ are not fully correlated. For a concrete example think of the “planes-vs-lines” distribution, where ${X}$ is a uniformly random plane and ${Y}$ a uniformly random line in ${X}$. The aforementioned test for ${N}$ EPR pairs can be used to force both provers to sample the same uniformly random plane ${X}$. But how does the verifier ensure that one of the provers “forgets” parts of the plane, to only remember a uniformly random line ${Y}$ that is contained in it? NW’s insight is that the information present in a quantum state — such as the prover’s half-EPR pairs — can be “erased” by commanding the prover to perform a measurement in the wrong basis — a basis that is mutually unbiased with the basis used by the other prover to obtain its share of the query. Building on this idea, NW develop a battery of delicate tests that provide the verifier the ability to control precisely what information gets distributed to each prover. This allows a polynomial-time verifier to perfectly simulate the local environment that the exponential-time verifier would have created for the provers in a protocol for NEEXP, thus simulating the latter protocol with exponentially less resources.
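The “erasure by measuring in the wrong basis” phenomenon is easy to simulate on a single EPR pair. The numpy sketch below (my own toy illustration of the idea, not NW's actual test) computes joint outcome distributions: when both parties measure in the computational (${Z}$) basis the outcomes are perfectly correlated, while if one party instead measures in the mutually unbiased Hadamard (${X}$) basis, its outcome is uniform and carries no information about the other's ${Z}$-basis result.

```python
import numpy as np

psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)  # EPR pair (|00> + |11>)/sqrt(2)

e0, e1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])      # Z (computational) basis
p0, p1 = (e0 + e1) / np.sqrt(2), (e0 - e1) / np.sqrt(2)  # X (Hadamard) basis

def joint(basis_a, basis_b):
    """2x2 table of outcome probabilities when the two parties measure
    their halves of the EPR pair in the given orthonormal bases."""
    return np.array([[abs(np.kron(a, b) @ psi) ** 2 for b in basis_b]
                     for a in basis_a])

# Both measure in Z: outcomes are perfectly correlated.
print(joint([e0, e1], [e0, e1]))  # [[0.5, 0.0], [0.0, 0.5]]
# One party measures in X instead: all four outcome pairs equally likely,
# so its outcome reveals nothing about the other's Z-basis result.
print(joint([p0, p1], [e0, e1]))  # [[0.25, 0.25], [0.25, 0.25]]
```

NW's tests use (a heavily scaled-up version of) exactly this effect to make one prover "forget" the part of its sampled question that the other prover is not supposed to learn.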

One of the aspects of the NW result I like best is that they showed how the “history state barrier” could be overcome. Previous works attempting to establish strong lower bounds on the class MIP${^\star}$, such as the paper by Yuen et al., rely on a compression technique that requires the provers to share a history state of the computation performed by a larger protocol. Unfortunately, history states are very non-robust, and as a result such works only succeeded in developing protocols with vanishing completeness-soundness gap. NW entirely bypass the use of history states, and this allows them to maintain a constant gap.

Seven years ago Tsuyoshi Ito and I showed that MIP${}^\star$ contains NEXP. At the time, we thought this may be the end of the story — although it seemed challenging, surely someone would eventually prove a matching upper bound. Natarajan and Wright have defeated this expectation by showing that MIP${^\star}$ contains NEEXP. What next? NEEEXP? The halting problem? I hope to make this the topic of a future post.

Posted in CHSH, QPCP, Quantum, Uncategorized | Tagged | 8 Comments

## The cryptographic leash

This post is meant as a companion to an introductory post I wrote on the blog of Caltech’s IQIM (Institute for Quantum Information and Matter), of which I am a member. The post describes a “summer cluster” on quantum computation that I co-organized with Andrew Childs, Ignacio Cirac, and Umesh Vazirani at the Simons Institute in Berkeley over the past couple of months. The IQIM post also describes one of the highlights of the workshop we organized as part of this program: the recent result by Mahadev on classical verification of quantum computation. The present post is a continuation of that one, so I would encourage you to read it first. In this post my goal is to give additional detail on Mahadev’s result. For the real thing you should of course read the paper (you may want to start by watching the author’s beautiful talk at our workshop). What follows is my attempt at an introduction, in great part written for the sake of clarifying my own understanding. I am indebted to Urmila for multiple conversations in which she indefatigably answered my questions and cleared my confusions — of course, any remaining inaccuracies in this post are entirely mine.

The result

Let’s start by recalling Mahadev’s result. She shows that from any quantum computation, specified by a polynomial-size quantum circuit ${C}$, it is possible to efficiently compute a classical-verifier quantum-prover protocol, i.e.~a prescription for the actions of a classical probabilistic polynomial-time verifier interacting with a quantum prover, that has the following properties. For simplicity, assume that ${C}$ produces a deterministic outcome ${o(C)\in\{0,1\}}$ when it is executed on qubits initialized in the state ${| 0 \rangle}$ (any input can be hard-coded in the circuit). At the end of the protocol, the verifier always makes one of three possible decisions: “reject”; “accept, 0”; “accept, 1”. The completeness property states that for any circuit ${C}$ there is an “honest” behavior for the prover that can be implemented by a polynomial-time quantum device and that will result in the verifier making the decision “accept, ${o(C)}$”, where ${o(C)}$ is the correct outcome, with probability ${1}$. The soundness property states that for any behavior of the quantum prover in the protocol, either the probability that the verifier returns the outcome “accept, ${1-o(C)}$” is negligibly small, or the quantum prover has the ability to break a post-quantum cryptographic scheme with non-negligible advantage. Specifically, the proof of the soundness property demonstrates that a prover that manages to mislead the verifier into making the wrong decision (for any circuit) can be turned into an efficient attack on the learning with errors (LWE) problem (with superpolynomial noise ratio).

The fact that the protocol is only sound against computationally bounded provers sets it apart from previous approaches, which increased the power of the verifier by giving her access to a miniature quantum computer, but established soundness against computationally unbounded provers. The magic of Mahadev’s result is that she manages to leverage this sole assumption, computational boundedness of the prover, to tie a very tight “leash” around its neck, by purely classical means. My use of the word “leash” is not innocent: informally, it seems that the cryptographic assumption allows Mahadev to achieve the kind of feats that were previously known, for classical verifiers, in the model where there are two quantum provers sharing entanglement. I am not sure how far the analogy extends, and would like to explore it further; this has already started with a collaboration with Brakerski, Christiano, Mahadev and Vazirani that led to a single-prover protocol for certifiable randomness expansion. Nevertheless, the main question left open by Mahadev’s work remains whether the computational assumption is even necessary: could a similar result hold, where the honest prover can perform the required actions in quantum polynomial-time, but the protocol remains sound against arbitrarily powerful provers? (Experts will have recognized that the existence of a protocol where the honest prover is as powerful as PSPACE follows from the classical results that BQP is in PSPACE, and that PSPACE=IP. Unfortunately, we currently don’t expect even a supercharged AWS cloud to be able to implement PSPACE-complete computations.)

Encoding computation in ground states

Let’s get to business: how does this work? Fix a quantum circuit ${C}$ that the verifier is interested in. Assume the description of ${C}$ is known to both the verifier and the prover. As earlier, assume further that when ${C}$ is executed on a state initialized to ${| 0 \rangle}$ a measurement of the output qubit of the circuit returns either the outcome ${0}$ or the outcome ${1}$, deterministically. The verifier wishes to determine which case holds.

The first step that the verifier performs is a classical polynomial-time reduction from this circuit output decision problem to the following Hamiltonian energy decision problem. In the Hamiltonian energy decision problem the input is the description of a pair of classical polynomial-time randomized circuits. The first circuit, ${S}$, takes as input a random string ${r}$, and returns a string ${\theta\in\{X,Z\}^n}$. The second circuit, ${V}$, takes as input a string ${\theta\in\{X,Z\}^n}$ of the kind returned by the first circuit, as well as an ${n}$-bit string ${a\in\{0,1\}^n}$, and returns a “decision bit” ${b\in \{0,1\}}$. The goal of the verifier is to distinguish between the following two cases. Either there exists an ${n}$-qubit state ${\rho}$ such that, when a string ${\theta}$ is sampled according to ${S}$ (choosing a uniformly random ${r}$ as input) and the ${n}$ qubits of ${\rho}$ are measured in the bases specified by ${\theta}$ (i.e.~the ${i}$-th qubit is measured in the computational basis in case ${\theta_i=Z}$, and in the Hadamard basis in case ${\theta_i=X}$), the resulting ${n}$-bit outcome ${a}$ satisfies ${V(\theta,a)=1}$ with probability at least ${3/4}$. Or, for any state ${\rho}$, the same procedure results in ${V(\theta,a)=1}$ with probability at most ${2/3}$.
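To make the two-circuit formulation concrete, here is a toy numpy sketch. It is not from Mahadev’s paper: the instance, including the particular choices of ${S}$ and ${V}$ and the witness state, is entirely made up for illustration. Here ${S}$ asks for both of ${n=2}$ qubits to be measured in the same basis, ${V}$ checks that the two outcomes agree, and an EPR pair passes the test with certainty.

```python
import numpy as np

rng = np.random.default_rng(0)
H = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)  # Hadamard gate

def apply_1q(U, psi, i, n):
    """Apply a single-qubit unitary U to qubit i of an n-qubit state vector."""
    psi = psi.reshape([2] * n)
    psi = np.moveaxis(np.tensordot(U, np.moveaxis(psi, i, 0), axes=1), 0, i)
    return psi.reshape(-1)

def measure(psi, theta):
    """Measure each qubit in the basis named by theta ('Z' or 'X')."""
    n = len(theta)
    # Rotate X-basis qubits into the computational basis, then sample.
    for i, b in enumerate(theta):
        if b == 'X':
            psi = apply_1q(H, psi, i, n)
    p = np.abs(psi) ** 2
    k = rng.choice(len(p), p=p / p.sum())
    return [(k >> (n - 1 - i)) & 1 for i in range(n)]

# Toy instance (n = 2): S picks ZZ or XX uniformly; V checks correlation.
S = lambda r: 'ZZ' if r == 0 else 'XX'
V = lambda theta, a: int(a[0] == a[1])

epr = np.array([1., 0., 0., 1.]) / np.sqrt(2)     # witness state
accept = [V(S(r), measure(epr, S(r))) for r in rng.integers(2, size=200)]
print(np.mean(accept))   # 1.0: EPR outcomes agree in both the Z and X bases
```

An EPR pair gives perfectly correlated outcomes under both ${Z\otimes Z}$ and ${X\otimes X}$ measurements, so the acceptance probability is exactly ${1}$, comfortably above the ${3/4}$ threshold.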

I called this problem the Hamiltonian energy decision problem because the circuits ${S}$ and ${V}$ implicitly specify a Hamiltonian, whose minimal energy the verifier aims to approximate. Note that the Hamiltonian is not required to be local, and furthermore it may involve an average of exponentially many terms (as many as there are random strings ${r}$). The problem is still in QMA, because the verifier is efficient. It is not hard to show that the problem is QMA-hard. What the formulation above buys us, compared to using the usual QMA-complete formulation of the local Hamiltonian problem, is the constant energy gap — which comes at the cost of exponentially many terms and loss of locality. (Open question: I would like to know if it is possible to achieve a constant gap with only one of these caveats: local with exponentially many terms, or nonlocal with polynomially many terms.) Of course here we only care that the problem is BQP-hard, and that the witness can be computed by a BQP prover; this is indeed the case. We also don’t really care that there is a constant gap – the soundness of the final protocol could be amplified by other means – but it is convenient that we are able to assume it.

The reduction that achieves this is a combination of Kitaev’s history state construction with some gadgetry from perturbation theory and an amplification trick. The first step reduces the verification that ${C}$ returns outcome ${1}$ (resp. ${0}$) on input ${| 0 \rangle}$ to the verification that a local Hamiltonian ${H}$ (computed from ${C}$) has ground state energy exponentially close to ${0}$ (resp. at least some positive inverse polynomial). The second step consists in applying perturbation theory to reduce to the case where ${H}$ is a weighted linear combination of terms of the form ${X_iX_j}$ and ${Z_iZ_j}$, where ${X_i}$, ${Z_j}$ are the Pauli ${X}$ and ${Z}$ operators on the ${i}$-th and ${j}$-th qubit respectively. The final step is an amplification trick that produces a nonlocal Hamiltonian, each of whose terms is a tensor product of single-qubit ${X}$ and ${Z}$ observables, with ground state energy either less than ${1/4}$ or larger than ${1/3}$ (when the Hamiltonian is scaled to be non-negative with norm at most ${1}$).

These steps are fairly standard. The first two are combined in a paper by Fitzsimons and Morimae to obtain a protocol for “post-hoc” verification of quantum computation: the prover prepares the ground state of an ${XZ}$ local Hamiltonian whose energy encodes the outcome of the computation, and sends it to the verifier one qubit at a time; the verifier only needs to perform single-qubit ${X}$ and ${Z}$ measurements to estimate the energy. The last step, amplification, is described in a paper with Natarajan, where we use it to obtain a multi-prover interactive proof system for QMA.

For the remainder of this post, I take the reduction for granted and focus on the core of Mahadev’s result, a verification protocol for the following problem: given a Hamiltonian of the form described in the previous paragraph, decide whether the ground state energy of ${H}$ is smaller than ${1/4}$, or larger than ${1/3}$.
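As a sanity check on what such an instance looks like, here is a tiny (hypothetical, two-qubit) Hamiltonian whose terms are tensor products of ${X}$ and ${Z}$ observables, rescaled so that ${0\leq H\leq \ensuremath{\mathop{\rm Id}\nolimits}}$, together with a direct numpy computation of its ground energy. This is purely an illustration of the decision problem, not part of the protocol; for this instance the ground state is an EPR pair and the energy lands below the ${1/4}$ threshold.

```python
import numpy as np

X = np.array([[0., 1.], [1., 0.]])
Z = np.array([[1., 0.], [0., -1.]])

# Toy 2-qubit instance: each term is a tensor product of X/Z observables.
XX = np.kron(X, X)
ZZ = np.kron(Z, Z)
H = (np.eye(4) - (XX + ZZ) / 2) / 2   # shifted/scaled so that 0 <= H <= Id

e0 = np.linalg.eigvalsh(H)[0]          # ground state energy
print("ground energy:", e0)            # 0.0 here: below the 1/4 threshold
```

The ground state is the EPR pair, which is the unique state satisfying both ${X\otimes X}$ and ${Z\otimes Z}$ correlations, so the ground energy is exactly ${0}$.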

Stitching distributions into a qubit

In fact, for the sake of presentation I’ll make one further drastic simplification, which is that the verifier’s goal has been reduced to verifying a claim the prover makes about a single-qubit state. Specifically, suppose that the prover claims that it has the ability to prepare a state ${| \psi \rangle}$ such that ${\langle \psi |X| \psi \rangle = E_X}$, and ${\langle \psi |Z| \psi \rangle=E_Z}$, for real parameters ${E_X,E_Z}$. In other words, that the Hamiltonian ${H = \frac{1}{2}(X+Z)}$ has minimal energy at most ${\frac{1}{2}(E_X+E_Z)}$. How can one verify this claim? (Of course we could do it analytically... but that approach would break apart as soon as expectation values on larger sets of qubits are considered.)
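For this single-qubit toy case, the analytic check alluded to in the parenthetical is a few lines of numpy (purely illustrative): diagonalize ${H=\frac12(X+Z)}$ and read off the honest prover’s expectation values, which come out to ${E_X=E_Z=-1/\sqrt{2}}$.

```python
import numpy as np

X = np.array([[0., 1.], [1., 0.]])
Z = np.array([[1., 0.], [0., -1.]])
Hm = (X + Z) / 2                      # the toy Hamiltonian H = (X+Z)/2

evals, evecs = np.linalg.eigh(Hm)     # eigenvalues in ascending order
psi = evecs[:, 0]                     # ground state
E_X = psi @ X @ psi                   # <psi|X|psi>
E_Z = psi @ Z @ psi                   # <psi|Z|psi>
print(evals[0], E_X, E_Z)             # all equal to -1/sqrt(2)
```

Note that the minimum energy ${-1/\sqrt{2}}$ is indeed ${\frac12(E_X+E_Z)}$ for the honest state, matching the prover’s claim.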

We could ask the prover to measure in the ${X}$ basis, or the ${Z}$ basis, repeatedly on identical copies of ${| \psi \rangle}$, and report the outcomes. But how do we know that all these measurements were performed on the same state, and that the prover didn’t choose e.g. ${| \psi \rangle=| 1 \rangle}$ to report the ${Z}$-basis outcomes, and ${| \psi \rangle=| - \rangle}$ to report the ${X}$-basis outcomes? We need to find a way to prevent the prover from measuring a different state depending on the basis it is asked for — as well as to ensure the measurement is performed in the right basis.

Committing to a qubit

The key idea in Mahadev’s protocol is to use cryptographic techniques to force the prover to “commit” to the state ${| \psi \rangle}$ in a way that, once the commitment has been performed, the prover no longer has the liberty to “decide” which measurement it performs on the committed qubit (unless it breaks the cryptographic assumption).

I described the commitment scheme in the companion post here. For convenience, let me quote from that post. Recall that the scheme is based on a pair of trapdoor permutations ${f_0,f_1:\{0,1\}^n \rightarrow \{0,1\}^n}$ that is claw-free. Informally, this means that it is hard to produce any pair ${(x_0,x_1)}$ such that ${f_0(x_0)=f_1(x_1)}$.

The commitment phase of the protocol works as follows. Starting from a state ${| \psi \rangle=\alpha| 0 \rangle+\beta| 1 \rangle}$ of its choice, the prover is supposed to perform the following steps. First, the prover creates a uniform superposition over the common domain of ${f_0}$ and ${f_1}$. Then it evaluates either function, ${f_0}$ or ${f_1}$, in an additional register, by controlling on the qubit of ${| \psi \rangle}$. Finally, the prover measures the register that contains the image of ${f_0}$ or ${f_1}$. This achieves the following sequence of transformations:

$\displaystyle \begin{array}{rcl} \alpha| 0 \rangle+\beta| 1 \rangle &\mapsto& (\alpha| 0 \rangle + \beta| 1 \rangle) \otimes \Big(2^{-n/2} \sum_{x\in\{0,1\}^n} | x \rangle\Big) \\ &\mapsto & 2^{-n/2} \sum_x \big(\alpha | 0 \rangle| x \rangle| f_0(x) \rangle + \beta | 1 \rangle| x \rangle| f_1(x) \rangle\big)\\ &\mapsto & \big(\alpha| 0 \rangle| x_0 \rangle+\beta| 1 \rangle| x_1 \rangle\big)| y \rangle\;, \end{array}$

where ${y\in\{0,1\}^n}$ is the measured image. The string ${y}$ is the prover’s commitment string, which it reports to the verifier.
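The three transformations above can be simulated classically for a toy function pair. The pair below (${f_0}$ the identity, ${f_1}$ a fixed XOR shift) is emphatically not claw-free — the claw is visible by inspection — so it has no cryptographic value; it only checks the bookkeeping: after measuring the image register, exactly two preimages survive, carrying the original amplitudes ${\alpha}$ and ${\beta}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
s = 0b1011                        # secret shift; the claw is x1 = x0 ^ s
f0 = lambda x: x                  # toy, INSECURE pair, illustration only
f1 = lambda x: x ^ s

alpha, beta = 0.6, 0.8            # the qubit |psi> = alpha|0> + beta|1>

# Amplitudes indexed by (control bit, x, image) after controlled evaluation.
amps = {}
for x in range(2 ** n):
    amps[(0, x, f0(x))] = alpha / 2 ** (n / 2)
    amps[(1, x, f1(x))] = beta / 2 ** (n / 2)

# Measure the image register: sample y with the Born-rule probabilities.
p_y = {}
for (b, x, y), a in amps.items():
    p_y[y] = p_y.get(y, 0.0) + abs(a) ** 2
ys = sorted(p_y)
y = int(rng.choice(ys, p=[p_y[t] for t in ys]))

# Post-measurement state: alpha|0,x0> + beta|1,x1> with f0(x0) = f1(x1) = y.
post = {(b, x): a for (b, x, yy), a in amps.items() if yy == y}
norm = sum(abs(a) ** 2 for a in post.values()) ** 0.5
post = {k: a / norm for k, a in post.items()}
print(post)   # two entries: (0, y) with 0.6 and (1, y ^ s) with 0.8
```

The surviving support is exactly ${\{(0,x_0),(1,x_1)\}}$ with ${x_0=y}$ and ${x_1=y\oplus s}$, as in the last line of the displayed transformation.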

The intuition for this commitment procedure is that it introduces asymmetry between prover and verifier: the prover knows ${y}$ (it had to report it to the verifier) but not ${x_0}$ and ${x_1}$ (this is the claw-free assumption on the pair ${(f_0,f_1)}$), which seems to prevent it from recovering the original state ${| \psi \rangle}$, since it does not have the ability to “uncompute” ${x_0}$ and ${x_1}$. In contrast, the verifier can use the trapdoor information to recover both preimages.

In a little more detail, how is this used? Note that at this point, from the verifier’s point of view the only information that has been received is the prover’s commitment string ${y}$. In general there are multiple ways a prover could have come up with a value ${y}$: for example, by selecting an ${x\in\{0,1\}^n}$ and returning ${y=f_0(x)}$. Or, by directly selecting an arbitrary string ${y\in\{0,1\}^n}$. At this stage of the protocol, all of these strategies look fine.

Let’s modify the commitment phase by adding a little test. With some probability, the verifier, upon receiving the commitment string ${y}$, decides to challenge the prover by asking it to report a valid preimage of ${y}$, either under ${f_0}$ or under ${f_1}$ (at the prover’s choice). Since both ${f_0}$ and ${f_1}$ are presumed to be hard to invert, the only way the prover can answer this challenge is if it already “knows” a valid preimage — or at a minimum, if it has a superposition over preimages that it can measure when tested. Thus the fact that the prover is required to succeed in the commitment test, when it is performed, guarantees that after the prover has returned the commitment string ${y}$ we may without loss of generality assume that the prover’s state can be written as

$\displaystyle | \tilde{\psi} \rangle=\tilde{\alpha}| 0,x_0 \rangle| \phi_0 \rangle+\tilde{\beta}| 1,x_1 \rangle| \phi_1 \rangle\;, \ \ \ \ \ (1)$

where we have purposefully spelled out the two possible preimages that the prover could return if challenged. Note that aside from the fact that it gives the ability to obtain ${x_0}$ or ${x_1}$, this format does not make any assumption on ${| \tilde{\psi} \rangle}$; in particular the register containing the preimage can be entangled with other private registers of the prover.

We have defined a four-message commitment protocol: the verifier sends the security parameters to the prover; the prover sends a commitment string ${y}$ back; an optional one-round preimage test is executed. Now is the time to give a first definition for the single qubit to which the prover has “committed” by returning ${y}$. This committed qubit is the state ${\sigma}$ that we ultimately aim to show has the claimed expectation values under ${X}$ and ${Z}$ measurements.

Let ${\sigma}$ be the qubit obtained from ${| \tilde{\psi} \rangle}$ by erasing ${x_0}$ and ${x_1}$ (which is possible given knowledge of ${y}$) and returning the first qubit of the resulting state. (Later we will slightly modify this definition, but it is a good placeholder to get us started.) Note that the verifier does not know the state ${\sigma}$; in fact, strictly speaking ${\sigma}$ is not present on the prover’s workspace either. The point is that ${\sigma}$ exists, and this is all we need. Our remaining task is to find a way for the verifier to extract from the prover measurement outcomes that are distributed as would be a measurement of ${\sigma}$ in the ${X}$ or ${Z}$ basis, without the prover having the ability to deviate. If the verifier can do this, for a basis of her choice, she can choose a basis at random, estimate the expectation value, and check the prover’s claim (the values ${E_X}$ or ${E_Z}$).

As already mentioned, the key point that we’ll use in order to achieve this is that at the end of the commitment phase, the verifier has obtained some leverage over the prover: given ${y}$ and the trapdoor information, the verifier can recover both ${x_0}$ and ${x_1}$. In contrast, the prover, while it holds the state ${| \tilde{\psi} \rangle}$, is not able to freely operate on it. Without the trapdoor, it can no longer uncompute ${x_0}$ and ${x_1}$ to recover the initial state ${| \psi \rangle}$, and so it can’t obviously apply, say, the unitary on ${| \tilde{\psi} \rangle}$ that would amount to performing a single-qubit rotation on ${| \psi \rangle}$.

Measuring in the computational basis

We need to explain how the verifier extracts measurement outcomes in the X (Hadamard) or Z (computational) basis from the prover. For each basis there is a small sub-protocol. At the end of the sub-protocol the verifier records a single bit, which it takes to be the outcome of a measurement of the committed qubit, ${\sigma}$, in the corresponding basis. We call this bit the verifier’s “decoded bit” for that basis.

The protocol for extracting the outcome of a measurement in the computational basis is straightforward. Recall that by definition the prover’s state after the commitment phase has ended is the state ${| \tilde{\psi} \rangle}$ in (1). Moreover, recall that we made a choice of basis for the prover’s space such that when the prover is challenged for a preimage of ${y}$, it measures the first ${(n+1)}$ qubits of ${| \tilde{\psi} \rangle}$ in the computational basis and returns the outcome. Now observe that the first bit of this outcome is ${0}$ with probability ${|\tilde{\alpha}|^2}$, and ${1}$ with probability ${|\tilde{\beta}|^2}$. This is exactly the distribution of the outcome of a measurement of the committed qubit ${\sigma}$ in the computational basis, by definition! Thus to extract a measurement outcome in the computational basis the verifier simply executes the preimage test and records the first bit returned by the prover as the decoded bit.

Extracting a measurement outcome in the Hadamard basis is more delicate. Recall the form of the prover’s state in (1). Given our definition of the committed qubit ${\sigma}$, the natural way to obtain a measurement of ${\sigma}$ in the Hadamard basis, starting from ${| \tilde{\psi} \rangle}$, is to first erase the register containing ${x_0}$ and ${x_1}$, and then perform a Hadamard measurement of the first qubit. But even an honest prover cannot accomplish this, as it does not have the trapdoor information that would allow it to erase ${x_0}$ and ${x_1}$ (of course we purposefully set things up this way). What the prover can do, however, is measure all ${n}$ qubits of the register containing ${x_0}$ and ${x_1}$ in the Hadamard basis. The result of this measurement is an ${n}$-bit string ${d\in\{0,1\}^n}$. The corresponding post-measurement state is, up to global phase,

$\displaystyle \tilde{\alpha}| 0 \rangle| \phi_0 \rangle+(-1)^{d\cdot(x_0+ x_1)}\tilde{\beta}| 1 \rangle| \phi_1 \rangle\;,$

where the addition ${x_0+x_1}$ is taken bitwise, modulo ${2}$. Observe that this state is almost exactly the committed qubit ${\sigma}$ — except for a phase flip, ${Z^{d\cdot(x_0\oplus x_1)}}$, applied on the first qubit. If the prover measures the remaining qubit in the Hadamard basis, the phase flip leads to a bit flip on the outcome ${m}$ of the measurement. So the verifier can ask the prover to report both ${d}$ and ${m}$; if she records the decoded bit ${b=m\oplus d\cdot (x_0\oplus x_1)}$ then this bit matches the outcome of a measurement of ${\sigma}$ in the Hadamard basis.
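One can check the decoding rule ${b=m\oplus d\cdot(x_0\oplus x_1)}$ numerically. The sketch below (toy values, honest prover, and with the ${| \phi_0 \rangle,| \phi_1 \rangle}$ registers omitted) enumerates the prover’s possible Hadamard outcomes ${d}$ on the preimage register and verifies that the verifier’s decoded bit has exactly the distribution of a direct Hadamard measurement of ${\alpha| 0 \rangle+\beta| 1 \rangle}$.

```python
import numpy as np

def dot(d, x):                  # bitwise inner product modulo 2
    return bin(d & x).count("1") % 2

n = 4
x0, x1 = 0b0110, 0b1011          # hypothetical preimages (the "claw")
alpha, beta = 0.6, 0.8           # committed qubit alpha|0> + beta|1>

# Exact distribution of the decoded bit b = m XOR d.(x0 xor x1),
# averaged over the prover's Hadamard outcome d on the x-register.
pr_b = [0.0, 0.0]
for d in range(2 ** n):
    p_d = (alpha ** 2 + beta ** 2) / 2 ** n     # d is uniform for this state
    phase = (-1) ** dot(d, x0 ^ x1)             # residual phase flip Z^{d.(x0+x1)}
    for m in (0, 1):
        amp = (alpha + (-1) ** m * phase * beta) / np.sqrt(2)
        b = m ^ dot(d, x0 ^ x1)                 # verifier's decoding
        pr_b[b] += p_d * amp ** 2

# Direct Hadamard measurement of alpha|0> + beta|1>, for comparison:
ideal = [(alpha + beta) ** 2 / 2, (alpha - beta) ** 2 / 2]
print(pr_b, ideal)   # the two distributions coincide exactly
```

The phase flip and the bit flip cancel under the decoding, so the two distributions agree for every value of ${d}$, not just on average.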

This completes the description of the measurement sub-protocol for the Hadamard basis. It is clear that an honest prover, performing the actions described above, will lead the verifier to record the correct outcome. Now of course in general the prover may act in an arbitrary way! It could report any values for ${(m,d)}$: the verifier accepts the reported outcomes on faith. How could this possibly work out? There is magic in Mahadev’s proof.

Malicious provers

Let’s assume, as we already have, that the prover is arbitrary but that, if tested in the commitment phase, it succeeds with certainty. According to the discussion around (1) this implies that at the end of the commitment phase the prover holds a state of the form ${| \tilde{\psi} \rangle}$. Moreover, by definition, when asked for a computational basis measurement the prover measures the first ${(n+1)}$ qubits of ${| \tilde{\psi} \rangle}$ in the computational basis and reports the outcome; the verifier records the first bit as its decoded bit.

As we already argued, our earlier definition of the committed qubit ${\sigma}$ ensures that the verifier’s decoded bit for the case of a computational basis measurement matches the outcome of a measurement of ${\sigma}$ in the computational basis. Unfortunately for the case of a Hadamard basis measurement we are in trouble. Since the prover may in principle report an arbitrary pair ${(m,d)}$ there is no chance to argue that this matches (in distribution) the outcome of a measurement of ${\sigma}$ in the Hadamard basis. To find a state that is consistent with the verifier’s decoded bit in both bases we need to change our definition of the committed qubit to take into account the prover’s action in the case it is asked for a Hadamard measurement.

Recall that the main leverage that the verifier has over the prover is that, while the prover does have the possibility of reporting arbitrary outcomes ${(m,d)}$, it does not have control over the verifier’s decoding, i.e.~the operation ${b\leftarrow m\oplus (d\cdot(x_0+ x_1))}$. Let’s work a little bit and spell out the distribution of the verifier’s Hadamard basis decoded bit, ${b}$. Towards this it is convenient to think of the prover in the following way: the prover first applies an arbitrary unitary “attack” ${U}$ on ${| \tilde{\psi} \rangle}$, then “honestly” measures the first ${(n+1)}$ qubits in the Hadamard basis, and finally reports the ${(n+1)}$-bit outcome ${(m,d)}$. An arbitrary ${(n+1)}$-bit-outcome measurement can always be expressed in this way. With this setup we can write the probability that the decoded bit is some value ${b\in\{0,1\}}$ as

$\displaystyle \Pr(b) \,=\, \sum_{d\in\{0,1\}^n} \langle \tilde{\psi} | U^\dagger H \big((X^{d\cdot(x_0+x_1)} | b \rangle\!\langle b |X^{d\cdot(x_0+x_1)}) \otimes | d \rangle\!\langle d |\big)HU| \tilde{\psi} \rangle\;. \ \ \ \ \ (2)$

Before we can proceed we should say a little more about the computational assumptions that are placed on the pair of functions ${(f_0,f_1)}$. Earlier we mentioned this pair of functions should be claw-free, but in fact a little more is needed — though all requirements can ultimately be met by a construction based on the Learning With Errors problem. Rather than state the exact assumptions, I will mention two important consequences. The first is that the pair of functions is “collapsing”, a notion introduced by Unruh in his investigations of collision-resistance against quantum attacks. In our context this property implies that it is computationally hard to distinguish between an arbitrary superposition over preimages, as in ${| \tilde{\psi} \rangle}$, and the “collapsed” state obtained by measuring the control register (the first qubit). The second is that for any ${n}$-bit string ${d}$ that can be obtained as the outcome of an arbitrary, but computationally efficient, measurement on the collapsed state, the bit ${d\cdot(x_0+x_1)}$ is computationally indistinguishable from uniform. (This is analogous to a “hardcore bit” property, since ${x_0+x_1}$ encodes information about both preimages simultaneously, and such information should not be accessible if the pair ${(f_0,f_1)}$ is claw-free.)

These two assumptions taken together justify the following two modifications to the expression for ${\Pr(b)}$ in (2), that will lead to a computationally indistinguishable distribution. First, we can “collapse” the first ${(n+1)}$ qubits of ${| \tilde{\psi} \rangle}$ by measuring them in the computational basis. Second, we can replace the bit ${d\cdot(x_0+x_1)}$ by a uniformly random bit ${r}$. Using that ${\sum_d | d \rangle\!\langle d |=\ensuremath{\mathop{\rm Id}\nolimits}}$, the expression simplifies to

$\displaystyle {\Pr}^{'}(b) \,=\, \frac{1}{4}\sum_{r\in\{0,1\}} \langle \tilde{\psi} | (Z^r \otimes \ensuremath{\mathop{\rm Id}\nolimits})\, U^\dagger\, (Z^r \otimes \ensuremath{\mathop{\rm Id}\nolimits})\, H \big( | b \rangle\!\langle b | \otimes \ensuremath{\mathop{\rm Id}\nolimits}\big) H\, (Z^{r} \otimes \ensuremath{\mathop{\rm Id}\nolimits})\, U\, (Z^{r} \otimes \ensuremath{\mathop{\rm Id}\nolimits}) | \tilde{\psi} \rangle\;, \ \ \ \ \ (3)$

where the outermost ${Z^r}$ were inserted thanks to the first assumption (the collapsing property), and the innermost ${Z^r}$ come from commuting ${X^r}$ past the Hadamard. I should clarify that obtaining (3) formally requires more care. In particular, I made use of computational indistinguishability in an expression that involves a quantity that is hard to compute (the parity ${x_0+x_1}$). This is illegal, and to work around the difficulty Mahadev has to introduce some additional ingenious manipulations that I am skipping here.

Note the key effect that the random ${Z^r}$ operator has in (3): it effectively trivializes the action of the prover’s “attack” ${U}$ on the first qubit with respect to the computational basis. Thus the result of this argument is that we have managed to argue that the verifier’s decoded bit ${b}$ associated with the Hadamard basis is computationally indistinguishable from the outcome of a Hadamard measurement on the state

$\displaystyle \sigma' = \mbox{\rm Tr}_E\big((I \otimes U_I) | \tilde{\psi} \rangle\langle \tilde{\psi} | (I\otimes U_I)^\dagger+ (X\otimes U_X) | \tilde{\psi} \rangle\langle \tilde{\psi} | (X \otimes U_X)^\dagger \big) \;,$

where we expanded the first qubit of the unitary ${U}$ as ${U = I\otimes U_I + X \otimes U_X + Z\otimes U_Z + XZ \otimes U_{XZ}}$, and ${E}$ represents all registers except the first qubit. Note that the second term involves an ${X}$ on the first qubit, which has no effect on a measurement in the Hadamard basis. Thus, ${\sigma'}$ can be updated to a state ${\sigma''}$ where we have “erased” the ${X}$ operator on the first qubit. Moreover, by definition, a measurement of the first (and only) qubit of ${\sigma''}$ in the computational basis yields an outcome distributed exactly as it would on ${\sigma}$. In particular, it is consistent with the verifier’s decoded bit in the computational basis measurement protocol.

We are done! The state ${\sigma''}$ is a well-defined single-qubit state such that the distribution of decoded bits recorded by the verifier for either basis is computationally indistinguishable from the distribution of outcomes of a measurement of ${\sigma''}$ in the same basis. Note that ${\sigma''}$ may not “exist” at any point of the protocol. But this is beside the point: as long as ${\sigma''}$ is a well-defined quantum state, and the verifier correctly records decoded measurement outcomes, this eventually leads to a valid certificate for the prover’s claim that the XZ Hamiltonian that encodes the computation has low enough energy.

Phew. Catch your breath, read this post again (and please do ask for clarifications as needed), and then move on to the beautiful paper, whose introduction already has more depth than I could provide here, and whose body fills in all the remaining gaps. (This includes how to deal with states that are more than a single qubit, an issue that my presentation of the single-qubit case may make seem more thorny than it is — in fact, it is possible to express the argument given here in a way that makes it relatively straightforward to extend to multiple qubits, though there are some technical issues, explained in Mahadev’s paper.) And then – use the idea to prove something!


## UCSD Spring school on Quantum Computation

A couple months from now Dorit Aharonov, David Gosset and myself will be giving a short 3.5-day “Spring School” that is meant to be an introduction to recent topics in quantum computing, directed at young researchers in theoretical computer science at large. The school is organized by Shachar Lovett at the University of California, San Diego, from March 19th to 22nd (these dates coincide with Spring break at many universities). Shachar has been organizing such schools successfully for multiple years now (last year’s school was taught by Boaz Barak and David Steurer on Sums of Squares), and we hope that this year’s will be just as fun (and instructive). The (free) registration deadline is on February 1st, and we have limited funds available to support travel for a few needy students; apply here by February 1st.

The past two years, and possibly even more so the coming couple years, may well be remembered as the moment when quantum computing entered the mainstream. Most of us have heard of IBM’s quantum computer in the cloud, of Google’s push toward “quantum supremacy”, and of Microsoft’s bet on naturally fault-tolerant qubits. Some of us might also have encountered a few of the dozens of startups promising everything from quantum hardware to quantum learning, that seem to be appearing out of nowhere, raising capital in just a few months.

It is an interesting question, better left for wiser times, whether these events will be remembered as the initial sparks of a revolution in computing, or as the height of a “quantum bubble”. Bubble or no bubble, quantum information science is here to stay: current developments make topics such as the computational power of small-scale quantum computers and the possibilities for testing quantum mechanics all the more exciting, while quantum cryptography, the theory of quantum error-correction, and the ever-increasing applications of “quantum techniques” to problems from theoretical computer science do not hinge on the success of current experiments.

As a teaser, our plan for the school is roughly as follows. Each day will have about 6 hours of lectures, a couple hours of informal “TA sessions” (to learn a language, one needs to practice it!), and some time for social interaction. This is a fairly heavy schedule, but if these are the 3.5 days you are going to spend learning about quantum information in your career, we want them to be useful. What this means is that we’ll simultaneously aim to cover the basics, so as to establish a common language, while quickly zooming in to a selection of the most interesting questions, such as the power of alternate models of quantum computation, the theory of quantum error-correction and fault-tolerance, or problems in quantum testing and quantum delegation.

In a little more detail, and although you should not treat this as contractual information, here is a sketch of our program for the school:

Day 1: Introduction to quantum information: one qubit, ${n}$ qubits, the quantum circuit model, simple algorithms and computational speed-ups in the query model. Introduction to quantum complexity: the class QMA and the local Hamiltonian problem.

Day 2: Protocols for delegating quantum computations. The adiabatic model of computation and its equivalence to the circuit model. Quantum error-correction, stabilizer codes, and fault tolerance.

Day 3: Restricted models of computation (shallow circuits, commuting circuits). Testing quantum systems. The quantum PCP conjecture and connection to many-body entanglement. Multi-prover interactive proofs with multiple provers sharing entanglement. Quantum linearity testing.

Day 4: More restricted models of computation. Quantum optimization algorithms. Stoquastic Hamiltonians. Quantum Monte-Carlo and simulation.

If you’re not an expert in quantum information, a lot of these topics might not make much sense a priori. This is why you should come! Our goal in these 3.5 days is to summarize what we believe ought to be the highlights of a couple semesters’ worth of graduate courses in quantum information. Aside from the basics in the first day, each lecture will cover a topic of current interest, giving you the ability to understand the importance of recent progress, and start thinking about some of the more TCS-friendly problems. Towards this, we’ll highlight as many open problems as we can think of (and fit in the allotted time), and allow ample time for questions, discussions, and hands-on exercise sessions. Join us: register here!

## A beginner’s guide to PC chairing

I recently had the privilege to serve as program committee (PC) chair for the yearly conference on quantum cryptography, QCRYPT’17 (note: for obvious public-relations reasons all names of conferences, PC members and authors in this post have been replaced by entirely fictional aliases). Although I had served on the PC of QCRYPT (and other conferences and workshops in quantum information and theoretical computer science) before, this was my first experience as chair. Since I am not aware of any publicly available resources that would help prepare one for this task, I thought I would share selected aspects of my experience here.

It is easiest to organize the discussion in chronological order: I will go through all the steps (and missteps), from the initial invitation email to the final notification to authors (probably safer to stop there – the next step would drag me into a discussion of the authors’ reaction, the consequences of which even the use of aliases may not save me from).

## Lights:

You just received that glowing email — “Dear XX, would you be interested to serve as PC chair for ConfY’18?”. Followed by the obligatory series of flattering comments. Such an honor… who would refuse? But don’t jump on that reply-send button just now. Here are a few points to take into consideration before making the decision.

First off, carefully consider the reviewing schedule. The dates of the conference are likely decided already, giving you a good sense of when the submission and notification deadlines will fall. The period in-between represents two to four months of your working life. Are you ready to give them up? I estimate that most days within that period you will have to allocate one to two hours’ work to the PC reviewing process (the load is not evenly spread: during the reviewing phase, it depends how many papers you assign yourself to review; during the discussion phase, it depends on how active you are, whether there is an in-person meeting, etc.). This is a serious commitment, comparable in load to taking on an additional 12-week teaching job. So if you’re already planning on teaching two courses during the same period – think twice.

A second point to consider discussing upfront with the steering committee (SC) is your “mission”. The SC probably has its own idea of the scope of the conference (there might even be a charter), how many papers they would like to be accepted, what justifies a “ConfY-worthy paper”, etc. How rigid are they going to be regarding these points? How much interference can you expect — do you have full latitude in deciding final acceptances (you should)? How flexible is the final number of accepts?

Last but not least, make sure this is something you want to do. How good is ConfY? Does it serve a specific purpose that you value? How often have you attended, or served on the PC? Do you feel competent to make decisions across all areas covered by the conference? Check the past couple years’ accepts. Many conferences are broader than we think, just because when we attend we tend to unconsciously apply a selective bias towards those talks for which we can at least parse the title. This time you’ll have to understand the contents of every single one of the submitted (let alone accepted) papers. So again, is this something you *want* to do?

## Camera,

Selecting the PC. Now that the fatal decision has been made, my first piece of advice is all too simple: seek advice. Your first task is to form a PC. This is clearly the most influential decision you will make, both in terms of the quality and content of the final program, as well as the ease with which you and your “team” will get there. Choosing a PC is a delicate balancing act. A good mix of seniority and young blood is needed: seniority for the experience, the perspective, and the credibility; young blood for the energy, the taste for novelty, the muscle power. It is a good idea to involve a PC member from the previous installment of the conference; this may in particular help with the more difficult cases of resubmission.

I was fortunate to receive multiple recommendations from the SC, past conference chairs, and colleagues. While you obviously want to favor diversity and broad representation of topic areas, I also recommend selecting PC members with whom you have a personal connection. My experience has been that the amount of effort any one person is willing to put into the PC process varies hugely. It is inevitable that some PC members will eventually drift away. The more of a connection you have with them, the easier it will be to handle unresponsiveness or divergences of opinion.

The most important comment I will make, and one I wish I had been more keenly aware of, is: know your PC. You will eventually select a team of researchers with complementary qualities, not only in terms of the areas that they are familiar with but also in more human terms: some will be good at responding to “quick opinion” calls on difficult papers, while others will have insightful comments about the overall balance of papers in the emerging list of accepted papers, or generate thoughtful advice on the handling of the more tricky cases, etc. At multiple points in the process you will need help; it is crucial to know the right person to turn to, lest you waste precious days or make ill-informed decisions.

With a list of names in hand, you are ready to send out invitations. (Before doing so, consider forming a rough schedule for the reviewing process. This will be needed for PC members to decide whether they will be sufficiently available during the requisite periods.) In my experience this part went smoothly. About ${75\%}$ of those on my initial list accepted the invitation (thanks!!). Filling in the remaining slots took a little more time. A small tip: if a researcher does not respond to your invitation within a reasonable time, or is slow to decide whether to join or not, don’t push too hard: while you need a strong PC, you also need a responsive PC. It is not a good idea to start off in a situation where someone is “doing you a favor” by accepting the invitation as a result of some heavy arm-twisting.

Drafting a CFP. The second main item on your pre-submission agenda is the drafting of a call for papers (CFP). This may be done in cooperation with the SC. CFPs from previous years can serve as a starting point. Check with last year’s PC chair whether they were happy with the wording used, or if they have a posteriori recommendations: did they omit an important area of interest? Were the submission instructions, including formatting guidelines, clear?

A good CFP balances out two conflicting desiderata: first, it should make your life easier by ensuring that submissions follow a reasonably homogeneous format, and are presented in a way that facilitates the reviewing process; second, it should not place an unreasonable burden on the authors who, as we all know, have better things to do (and will read the instructions, if they ever read them, no earlier than 23:59 in any timezone – making an overly precise CFP a sure recipe for disaster).

One place where precision is needed is in the formulation of the requirements for rigor and completeness. Are full proofs expected, or will a short 3-page abstract suffice? Or should it be both – a short abstract clearly presenting the main ideas, together with a full paper providing precise complete proofs? Be warned that, whatever the guidelines, they will be stretched, forcing you into quick judgment calls as to whether a submission fits the CFP guidelines.

You should also pay attention to the part of the CFP that concerns the scope of the conference: although for all I know this is all but ignored by most authors, and varies little from year to year, it does play an important role in carving out an inch of originality and specificity for the conference.

Another item on the CFP is the “key dates” that will bound the time available for the reviewing process: the submission deadline and the notification date. Here again there are conflicting requirements: the submission date should be as late as possible (to ensure accepted papers are as fresh as possible by the time the conference takes place), the reviewing phase as long as possible (you’re going to need it…), and the notification as early as possible (so there is time to compile proceedings, when they exist, and for authors to make travel arrangements). In my experience as a PC member the time allocated for reviewing almost invariably felt too long – yes, I did write too long. Whatever time is allocated for the reviewing phase invariably ends up divided into ${\sim 90\%}$ procrastination and ${\sim 20\%}$ actual reviewing effort (obviously the actual reviewing gets under way too late for it to be completed by the reviewing deadline, which typically gets overstretched by some ${\sim 10\%}$). I suggest that a good calendar allocates a month for collecting reviews, and a month for discussion. This is tight but sufficient, and will ensure that everyone remains engaged throughout. A month for reviewing allows a week for going through papers and identifying those for which external help should be sought; 2-3 weeks for actual reviewing; and a week for collecting reviews, putting the scores together, and scrambling through the last-minute calls for help. Similarly, a month of discussion allows a week for score homogenization, two weeks to narrow down to the ${20\%}$ (say) borderline papers, and a final week to make those tough ultimate decisions. Tight, but feasible. Remember: however much time you allocate will be taken up!

Now, as good a calendar as you may have come up with, plan for delays. In my case I typically informed PC members that “reviews have to be completed by April 29th” and “the discussion phase will start on May 1st”. The “hidden” three days in-between the two dates were more than needed to track down missing reviews. Don’t ask PC members to complete a task the day you need the task completed, as it simply won’t happen: people have busy schedules, operate in different (and sometimes shifting) timezones, and have other deadlines to deal with. To respect your PC you ought to give them a precise calendar that you will follow, so they are able to plan ahead; but you also need to allow for the unavoidable time conflicts, last-minute no-shows, and other unpredictable events.

One last item before you break off. To set up the submissions webpage you’ll need to decide on a review management software. I (quite mistakenly) didn’t give much thought to this. As PC member I had had a decent experience with easychair, and was under the impression that it was the most commonly used software – and would therefore be easiest to work with for the PC. Even though things went, on the whole, fairly smoothly, I had more than one occasion to regret the decision. The topic would deserve a blog post in itself, and I won’t expand on it here. Just make sure you carefully consider how easy the software will make different parts of the reviewing process, such as computing statistics, tracking missing reviews, ordering papers based on various criteria, allowing an efficient tagging system to keep track of memos or tentative decisions, handling communication with authors (including possibly the compilation of proceedings), etc.

## Action!

Alright, so you went for a stroll and enjoyed your most leisurely conference submission deadline ever – as PC chair, you’re probably not allowed to submit – but the bell has rung, the submission server closed…now it’s your turn!

The last few hours. Actually, maybe this wasn’t quite your most leisurely submission deadline after all. I was advised to elect a “midnight anywhere on earth” deadline, as this supposedly made the guideline easier to comprehend for everyone. Not only do I now have strong evidence that I am not the only one to find this denomination absurdly confusing – where on earth is this place anyways, anywhere on earth?? – but I would in any case strongly suggest setting a deadline that falls at a reasonable time in the PC chair’s timezone. You will get email from people unable to access the submission server (for whatever reason), unsure whether their submission fits the guidelines, asking whether they can get an extension, etc. It is more helpful if you can deal with such emails as they arrive, rather than the next day.

Paper bidding. Before reviewing can get under way you need to assign papers to PC members. And before you can assign papers, PC members need to express their preferences. The resulting allocation is critical. It determines how many headaches you will face later on: how many papers will have low confidence reviews, how many closely related papers will have been reviewed by disjoint sets of PC members, how many papers will live on the enthusiastic scores of expert subreviewers. I found this phase challenging. An automatic assignment can be completed in milliseconds, but doesn’t take into account related submissions or expertise of PC members aside from their declared preference, which is a single noisy bit of information. I highly recommend (I realize I am “highly recommending” a lot of things for a first-timer – I only wish I had been told some of these ahead of time!) taking the process very seriously, and spending enough time to review, and tweak, the automatic assignment before it is made final.
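
As a toy illustration of why the assignment deserves manual attention, here is a minimal greedy-assignment sketch. All names, the preference scale, and the load-balancing rule are hypothetical; real review management software uses more sophisticated matching, and this is only meant to show the kind of trade-off (preference vs. load) being optimized:

```python
from collections import defaultdict

# toy preference matrix: pref[member][paper] in {0 (conflict), 1 (low), 2, 3 (high)};
# all names and the scale are hypothetical
pref = {
    "alice": {1: 3, 2: 1, 3: 2},
    "bob":   {1: 2, 2: 3, 3: 0},
    "carol": {1: 1, 2: 2, 3: 3},
}
reviews_per_paper = 2
load = defaultdict(int)
assignment = defaultdict(list)

# greedy: for each paper, pick the highest-preference members with the lightest load
for paper in (1, 2, 3):
    ranked = sorted((m for m in pref if pref[m][paper] > 0),
                    key=lambda m: (-pref[m][paper], load[m]))
    for m in ranked[:reviews_per_paper]:
        assignment[paper].append(m)
        load[m] += 1

print(dict(assignment))
```

Note what even this tiny example ignores: related submissions ending up with disjoint reviewer sets, and actual expertise beyond the single declared preference bit — exactly the blind spots worth fixing by hand.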

Refereeing. Each PC member now has a healthy batch of papers assigned, and a deadline by which to submit reviews. What kind of guidelines can you give to make the process as smooth as possible? Discrepancies in scores are always an issue: whichever reviewing software you use, it is bound to produce some kind of score-based ranking; this initial ranking, although it will change during the discussion phase, induces a huge bias in final decisions (this effect is exacerbated for conferences, such as QCRYPT, where there is no in-person meeting). I don’t have a magic solution to this, but establishing clear guidelines in terms of the significance and expected proportion for each numerical score helps. I eventually found it necessary to prod outliers to modify their scores. This is one of the things easychair did not make particularly easy, forcing me to download data in Excel format and run some basic home-made scripts on the spreadsheet.
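
For concreteness, here is the kind of basic home-made script I have in mind for flagging score discrepancies. The data format and the spread threshold are hypothetical; in practice the tuples would be exported from the reviewing software:

```python
from collections import defaultdict

# (paper_id, reviewer, score) tuples, e.g. exported from the reviewing
# software; the names, score scale, and threshold below are hypothetical
reviews = [
    (1, "A", 2), (1, "B", -2), (1, "C", 1),
    (2, "A", 1), (2, "D", 1),  (2, "E", 0),
]

by_paper = defaultdict(list)
for paper, reviewer, score in reviews:
    by_paper[paper].append((reviewer, score))

# flag papers with a large spread in scores for discussion
for paper, rs in sorted(by_paper.items()):
    scores = [s for _, s in rs]
    spread = max(scores) - min(scores)
    if spread >= 3:
        print(f"paper {paper}: scores {scores} (spread {spread}) -- needs discussion")
```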

Aside from scoring, it is useful to include precise guidelines on the use of sub-referees and conflicts of interest (COIs). I allowed sub-refereeing but insisted that the final opinion should be the PC member’s. (It is not ok to copy-paste a sub-review while barely having gone through it!) Unfortunately sub-reviewers tend to be experts, and experts tend to be overly enthusiastic: watch out for papers that received three high scores, each with high confidence: easychair will rank those right at the top, but they may well be worth a second look.

Regarding COIs, I did not set overly strict rules (with the idea that “everyone knows when it is appropriate to declare a COI”), and regretted it. It is simply too uncomfortable to realize at a late stage that this very enthusiastic review was written by a PC member who happens to be a close collaborator of one of the authors, but chose not to disclose the COI. Do you discard the review? I don’t know. It depends: maybe the source of the COI played a role in the PC member’s vocal defense of the paper, and maybe not. Better not let it happen. The point is not that even a weak COI should forbid reviewing, but rather that COIs should be made explicit. As long as everyone states their position, things are in the open and can be taken into account.

Discussion. With all the reviews in (dream on… some reasonable fraction of the reviews in) begins the second phase of the reviewing process, the discussion phase. Success of this phase rests almost entirely on engagement of the PC chair and a few dedicated, dynamic PC members. Among the PCs I have sat on, the most satisfying were those where the chair visibly spent large amounts of energy stimulating online discussion. This is no trivial task: we all lead busy lives, and it is easy to let things slip; papers with high scores get in, low scores get out; a few days to discuss the few in the middle and we’ll be done…not so! Unfortunately, the initial ranking is bound to be abysmal. It is necessary to work to straighten things up. Some basic tricks apply: search for papers with high discrepancies in scores, low confidence, missing, very short, or uninformative reviews, etc. It is useful to individually prod PC members to keep the discussion going. This is a place where the “know your PC” recommendation comes in: for each submission, you need to be able to identify who will be able to clarify the arguments in favor and against the paper; who will have the technical expertise to clarify the relationship between papers X and Y, etc. It’s an exhausting, but highly rewarding process: I learned a lot by listening to my colleagues and trying to grasp at times rather subtle – and opinionated – arguments that could reach quite far from my expertise.

Decisions! The discussion has been going on for a couple weeks, and already there is little time left: it is time to start making decisions. Proceeding in phases seems popular, and effective. It helps to progressively sharpen the acceptance threshold. As long as there are too many papers in play it is very hard to get a sense of where the boundary will lie; typically, far more papers will have positive scores and enthusiastic proponents than can ultimately be accepted.

However much ahead of time you get started, the real decisions will take place in the last few days. I found it helpful to set a clear calendar for the process, marking days when decisions would be made, identifying clear categories (accept, accept?, discuss!, etc.), and setting explicit targets for each phase (accept X/reject Y many more papers, etc.), even if I wasn’t always able to meet them. It is also important that the PC as a whole be aware of the target number of papers that is to be accepted. I have frequently been on PCs where the chair gave us the information that “we will accept all great papers”, only to learn later that a hard limit had (of course) been set. Conversely, I’ve also been extremely annoyed at last-minute decisions along the lines of: well, we accepted about as many papers as we could, but there are 4 undecided cases left, and, well, they’re all really good, so why don’t we just stretch the program a bit and accept all 4 at the last minute. To me this is the PC not doing its job… be prepared to make difficult decisions! Make it clear to the PC (and to yourself) what your goal is. Is it to serve the authors, the conference attendees, the advancement of science – all of the above (good luck)?

## Post-mortem

This was fun. Exhausting, but fun. Of course not all authors (or PC members) were happy. There will be complaints. And some of them will be justified: there is no perfect allocation. Mistakes happen. We did our best!

Some tasks lie down the road. Put a program together. Help select a best (student) paper. Gather statistics for the business meeting. But first things first: take a deep breath. This was fun.


## Pauli braiding

[7/9/17 Update: Following a suggestion by Oded Regev I upgraded Section 1 from “probabilistic functions” to “matrix-valued functions”. This hopefully makes it a more useful, and interesting, mid-point between the classical analysis of BLR and the non-abelian extension discussed afterwards. I also fixed a bunch of typos — I apologize for the many remaining ones. The pdf has also been fixed.]

Last week Anand Natarajan from MIT presented our joint work on “A Quantum Linearity Test for Robustly Verifying Entanglement” at the STOC’17 conference in Montreal. Since we first posted our paper on the quant-ph arXiv, Anand and I discovered that the test and its analysis could be reformulated in a more general framework of tests for group relations, and rounding of approximate group representations to exact group representations. This reformulation is stimulated by a beautiful paper by Gowers and Hatami on “Inverse and stability theorems for approximate representations of finite groups”, which was first pointed out to me by William Slofstra. The purpose of this post is to present the Gowers-Hatami result as a natural extension of the Blum-Luby-Rubinfeld linearity test to the non-abelian setting, with application to entanglement testing. (Of course Gowers and Hatami are well aware of this — though maybe not of the application to entanglement tests!) My hope in doing so is to make our result more accessible, and hopefully draw some of my readers from theoretical computer science into a beautiful area.

I will strive to make the post self-contained and accessible, with no quantum information background required — indeed, most of the content is purely — dare I say elegantly — mathematical. In the interest of being precise (and working out better parameters for our result than appear in our paper) I include essentially full proofs, though I may allow myself to skip a line or two in some of the calculations.

Given the post remains rather equation-heavy, here is a pdf with the same contents; it may be more convenient to read.

I am grateful to Anand, and Oded Regev and John Wright, for helpful comments on a preliminary version of this post.

1. Linearity testing

The Blum-Luby-Rubinfeld linearity test provides a means to certify that a function ${f:{\mathbb Z}_2^n\rightarrow\{\pm1 \}}$ is close to a linear function. The test can be formulated as a two-player game:

BLR linearity test:

• (a) The referee selects ${a,b\in{\mathbb Z}_2^n}$ uniformly at random. He sends the pair ${(a,b)}$ to one player, and either ${a}$, ${b}$, or ${a+b}$ (chosen uniformly at random) to the other.
• (b) The first player replies with two bits, and the second player with a single bit. The referee accepts if and only if the players’ answers satisfy the natural consistency constraint: the second player’s bit should equal the first player’s answer for the string it received (or, in case the second player received ${a+b}$, the product of the first player’s two bits).
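
To make this concrete, here is a minimal simulation of the game for a deterministic shared strategy. The helper names are mine, and the referee’s check is one natural reading of the consistency constraint (matching answers on ${a}$ and ${b}$, and the product of the first player’s bits on ${a+b}$):

```python
import random

n = 8
rng = random.Random(1)

def blr_game_pass_rate(f, f_pair, trials=3000):
    """Simulate the two-player BLR game for a deterministic shared strategy.

    f(x): the second player's answer (+-1) on a single string x.
    f_pair(a, b): the first player's pair of answers on the query (a, b).
    """
    passes = 0
    for _ in range(trials):
        a = tuple(rng.randint(0, 1) for _ in range(n))
        b = tuple(rng.randint(0, 1) for _ in range(n))
        ab = tuple((x + y) % 2 for x, y in zip(a, b))
        u, v = f_pair(a, b)
        # the referee sends a, b, or a+b to the second player, each with
        # probability 1/3, and applies the consistency check
        which = rng.randrange(3)
        if which == 0:
            passes += (f(a) == u)
        elif which == 1:
            passes += (f(b) == v)
        else:
            passes += (f(ab) == u * v)
    return passes / trials

# a linear function, played honestly, passes with certainty
S = (1, 0, 1, 1, 0, 0, 0, 0)
f = lambda x: (-1) ** sum(s * xi for s, xi in zip(S, x))
honest_pair = lambda a, b: (f(a), f(b))
print(blr_game_pass_rate(f, honest_pair))  # -> 1.0
```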

This test, as all others considered here, treats both players symmetrically. This allows us to restrict our attention to the case of players who both apply the same strategy, an assumption I will systematically make from now on.

Blum et al.’s result states that any strategy for the players in the linearity test must provide answers chosen according to a function that is close to linear. In this section I will provide a slight “matrix-valued” extension of the BLR result, that follows almost directly from the usual Fourier-analytic proof but will help clarify the extension to the non-abelian case.

1.1. Matrix-valued strategies

The “classical” analysis of the BLR test starts by modeling an arbitrary strategy for the players as a pair of functions ${f:{\mathbb Z}_2^n\rightarrow \{\pm 1\}}$ (for the second player, who receives a single string as query) and ${f':{\mathbb Z}_2^n \times {\mathbb Z}_2^n \rightarrow \{\pm 1\}\times\{\pm 1\}}$ (for the first player, who receives a pair of strings as query). In doing so we are making an assumption: that the players are deterministic. More generally, we should allow “probabilistic strategies”, which can be modeled via “probabilistic functions” ${f:{\mathbb Z}_2^n \times \Omega \rightarrow \{\pm 1\}}$ and ${f':{\mathbb Z}_2^n \times {\mathbb Z}_2^n \times\Omega\rightarrow \{\pm 1\}\times\{\pm 1\}}$ respectively, where ${(\Omega,\mu)}$ is an arbitrary probability space which plays the role of shared randomness between the players. Note that the usual claim that “probabilistic strategies are irrelevant because they can succeed no better than deterministic strategies” is somewhat moot here: the point is not to investigate success probabilities — it is easy to pass the BLR test with probability ${1}$ — but rather derive structural consequences from the assumption that a certain strategy passes the test. In this respect, enlarging the kinds of strategies we consider valid can shed new light on the strengths, and weaknesses, of the test.

Thus, and with an eye towards the “quantum” analysis to come, let us consider an even broader set of strategies, which I’ll refer to as “matrix-valued” strategies. A natural matrix-valued analogue of a function ${f:{\mathbb Z}_2^n \rightarrow \{\pm 1\}}$ is ${F:{\mathbb Z}_2^n \rightarrow O_d({\mathbb C})}$, where ${O_d({\mathbb C})}$ is the set of ${d\times d}$ Hermitian matrices that square to identity (equivalently, have all eigenvalues in ${\{\pm 1\}}$); these matrices are called “observables” in quantum mechanics. Similarly, we may generalize a function ${f':{\mathbb Z}_2^n \times {\mathbb Z}_2^n \rightarrow \{\pm 1 \} \times \{\pm 1\}}$ to a function ${F':{\mathbb Z}_2^n \times {\mathbb Z}_2^n \rightarrow O_d({\mathbb C}) \times O_d({\mathbb C})}$. Here we’ll impose an additional requirement: any pair ${(B,C)}$ in the range of ${F'}$ should be such that ${B}$ and ${C}$ commute. The latter condition is important so that we can make sense of the function as a strategy for the provers: we should be able to ascribe a probability distribution on outcomes ${(a,(b,c))}$ to any query ${(x,(y,z))}$ sent to the players. This is achieved by defining

$\displaystyle \Pr\big((F(x), F'(y,z))=(a,(b,c))\big)\,=\,\frac{1}{d}\,\mathrm{Tr}\big( F(x)^aF'(y,z)_1^b F'(y,z)_2^c\big), \ \ \ \ \ (1)$

where for any observable ${O}$ we denote ${O^{+1}}$ and ${O^{-1}}$ the projections on the ${+1}$ and ${-1}$ eigenspaces of ${O}$, respectively (so ${O=O^{+1}-O^{-1}}$ and ${O^{+1}+O^{-1}=I}$). The condition that ${F'(y,z)_1}$ and ${F'(y,z)_2}$ commute ensures that this expression is always non-negative; moreover it is easy to check that for all ${(x,(y,z))}$ it specifies a well-defined probability distribution on ${\{\pm 1\}\times (\{\pm1\}\times \{\pm1\})}$ . Observe also that in case ${d=1}$ we recover the classical deterministic case, for which with our notation ${f(x)^a = 1_{f(x)=a}}$. If all ${F(x)}$ and ${F'(y,z)}$ are simultaneously diagonal matrices we recover the probabilistic case, with the role of ${\Omega}$ (the shared randomness) played by the rows of the matrices (hence the normalization of ${1/d}$; we will see later how to incorporate the use of non-uniform weights).
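
These claims are easy to check numerically. The following sketch (in numpy, with helper names of my own choosing) samples an arbitrary observable ${F(x)}$ and a commuting pair ${(B,C)}$ playing the role of ${F'(y,z)}$, and verifies that the expression in (1) yields non-negative numbers summing to ${1}$:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

def rand_observable(basis=None):
    # Hermitian with eigenvalues +-1: U diag(+-1) U*
    if basis is None:
        basis, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    signs = rng.choice([1.0, -1.0], size=d)
    return basis @ np.diag(signs) @ basis.conj().T

def proj(O, s):
    # projection on the +1 or -1 eigenspace of observable O
    return (np.eye(d) + s * O) / 2

F = rand_observable()
# B and C commute: they are diagonal in a common basis U
U, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
B, C = rand_observable(U), rand_observable(U)

# the probabilities from (1): Tr(F^a B^b C^c) / d over all outcomes (a, b, c)
probs = [np.trace(proj(F, a) @ proj(B, b) @ proj(C, c)).real / d
         for a in (1, -1) for b in (1, -1) for c in (1, -1)]
print(min(probs) >= -1e-12, abs(sum(probs) - 1) < 1e-9)
```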

With these notions in place we establish the following simple lemma, which states the only consequence of the BLR test we will need.

Lemma 1 Let ${n}$ be an integer, ${\varepsilon\geq 0}$, and ${F:{\mathbb Z}_2^n\rightarrow O_d({\mathbb C})}$ and ${F':{\mathbb Z}_2^n \times {\mathbb Z}_2^n \rightarrow O_d({\mathbb C})\times O_d({\mathbb C})}$ a matrix-valued strategy for the BLR test, such that players determining their answers according to this strategy (specifically, according to (1)) succeed in the test with probability at least ${1-\varepsilon}$. Then

$\displaystyle \mathop{\mathbb E}_{x,y\in {\mathbb Z}_2^n}\,\frac{1}{d}\, \Re\,\mathrm{Tr}\big( F(x)F(y)F(x+y)\big) \,\geq\, 1-O(\varepsilon).$

Introducing a normalized inner product ${\langle A,B\rangle_f = d^{-1} \mathrm{Tr}(AB^*)}$ on the space of ${d\times d}$ matrices with complex entries (the ${^*}$ designates the conjugate-transpose), the conclusion of the lemma is that ${\mathop{\mathbb E}_{x,y\in {\mathbb Z}_2^n} \,\Re\,\langle F(x)F(y),\,F(x+y)\rangle_f \,\geq\, 1-O(\varepsilon)}$.

Proof: Success with probability ${1-\varepsilon}$ in the test implies the three conditions

$\displaystyle \begin{array}{rcl} &&\mathop{\mathbb E}_{x,y\in {\mathbb Z}_2^n} \, \langle F'(x,y)_1,F(x)\rangle_f \,\geq\, 1-3\varepsilon,\\ &&\mathop{\mathbb E}_{x,y\in {\mathbb Z}_2^n} \, \langle F'(x,y)_2,F(y)\rangle_f \,\geq\, 1-3\varepsilon,\\ &&\mathop{\mathbb E}_{x,y\in {\mathbb Z}_2^n} \, \langle F'(x,y)_1F'(x,y)_2,F(x+y)\rangle_f \,\geq\, 1-3\varepsilon. \end{array}$

To conclude, use the triangle inequality as

$\displaystyle \begin{array}{rcl} &\mathop{\mathbb E}_{x,y\in {\mathbb Z}_2^n} & \big\|F(x)F(y)-F(x+y) \big\|_f^2 \\ & &\qquad\leq \,3\Big(\mathop{\mathbb E}_{x,y\in {\mathbb Z}_2^n} \, \big\|(F(x)-F'(x,y)_1)F(y) \big\|_f^2\\ && \qquad\qquad + \mathop{\mathbb E}_{x,y\in {\mathbb Z}_2^n} \, \big\|(F(y)-F'(x,y)_2)F'(x,y)_1 \big\|_f^2\\ &&\qquad\qquad+\mathop{\mathbb E}_{x,y\in {\mathbb Z}_2^n} \, \big\|F'(x,y)_1F'(x,y)_2-F(x+y) \big\|_f^2\Big), \end{array}$

where ${\|A\|_f^2 = \langle A,A\rangle_f}$ denotes the dimension-normalized Frobenius norm. Expanding each squared norm and using the preceding conditions and ${F(x)^2=I}$ for all ${x}$ proves the lemma. $\Box$

1.2. The BLR theorem for matrix-valued strategies

Before stating a BLR theorem for matrix-valued strategies we need to define what it means for such a function ${G: {\mathbb Z}_2^n \rightarrow O_d({\mathbb C})}$ to be linear. Consider first the case of probabilistic functions, i.e. ${G}$ such that all ${G(x)}$ are diagonal in the same basis. Any such ${G}$ whose every diagonal entry is of the form ${\chi_{S}(x) = (-1)^{S \cdot x}}$, for some ${S\in\{0,1\}^n}$ which may depend on the row/column number, will pass the BLR test. This shows that we cannot hope to force ${G}$ to be a single linear function; we must allow “mixtures”. Formally, call ${G}$ linear if ${G(x) = \sum_S \chi_S(x) P_S}$ for some decomposition ${\{P_S\}}$ of the identity, i.e. the ${P_S}$ are pairwise orthogonal projections such that ${\sum_S P_S=I}$. Note that this indeed captures the probabilistic case; in fact, up to a basis change it is essentially equivalent to it. Thus the following may come as a surprise.

Theorem 2 Let ${n}$ be an integer, ${\varepsilon\geq 0}$, and ${F:{\mathbb Z}_2^n \rightarrow O_d({\mathbb C})}$ such that

$\displaystyle \mathop{\mathbb E}_{x,y\in {\mathbb Z}_2^n} \, \Re\,\langle F(x)F(y),F(x+y)\rangle_f \,\geq\, 1-\varepsilon. \ \ \ \ \ (2)$

Then there exists a ${d'\geq d}$, an isometry ${V:{\mathbb C}^d\rightarrow{\mathbb C}^{d'}}$, and a linear ${G:{\mathbb Z}_2^n \rightarrow O_{d'}({\mathbb C})}$ such that

$\displaystyle \mathop{\mathbb E}_{x\in{\mathbb Z}_2^n} \,\big\| F(x) - V^* G(x)V\big\|_f^2 \,\leq\, 2\,\varepsilon.$

Note the role of ${V}$ here, and the lack of control on ${d'}$ (more on both aspects later). Even if ${F}$ is a deterministic function ${f}$, i.e. ${d=1}$, the function ${G}$ returned by the theorem may be matrix-valued. In this case the isometry ${V}$ is simply a unit vector ${v\in {\mathbb C}^{d'}}$, and expanding out the squared norm in the conclusion of the theorem yields the equivalent conclusion

$\displaystyle \sum_S (v^* P_S v)\,\Big(\mathop{\mathbb E}_{x} f(x)\, \chi_S(x) \Big) \,\geq\, 1-\varepsilon,$

where we expanded ${G(x) = \sum_S \chi_S(x) P_S}$ using our definition of a linear matrix-valued function. Note that ${\{ v^* P_S v\}}$ defines a probability distribution on ${\{0,1\}^n}$. Thus by an averaging argument there must exist an ${S}$ such that ${f(x)=\chi_S(x)}$ for a fraction at least ${1-\varepsilon/2}$ of all ${x\in{\mathbb Z}_2^n}$: the usual conclusion of the BLR theorem is recovered.

Proof: The proof of the theorem follows the classic Fourier-analytic proof of Bellare et al. Our first step is to define the isometry ${V}$. For a vector ${u\in {\mathbb C}^d}$, define

$\displaystyle V u = \sum_S \hat{F}(S) u \otimes e_S \in {\mathbb C}^d \otimes {\mathbb C}^{2^n},$

where ${\hat{F}(S) = \mathop{\mathbb E}_{x} \chi_S(x) F(x)}$ is the matrix-valued Fourier coefficient of ${F}$ at ${S}$ and ${\{e_S\}_{S\in\{0,1\}^n}}$ an arbitrary orthonormal basis of ${{\mathbb C}^{2^n}}$. An easily verified extension of Parseval’s formula shows ${\sum_S \hat{F}(S)^2 = I}$ (recall ${F(x)^2=I}$ for all ${x}$), so that ${V^*V = I}$: ${V}$ is indeed an isometry.
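
The matrix-valued Parseval formula is easy to verify numerically. Here is a small sketch (helper names are mine) that samples a random observable-valued ${F}$ on ${{\mathbb Z}_2^3}$, computes its matrix-valued Fourier coefficients, and checks ${\sum_S \hat{F}(S)^2 = I}$:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
n, d = 3, 3
xs = list(product([0, 1], repeat=n))

def rand_observable():
    # Hermitian with eigenvalues +-1
    U, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    return U @ np.diag(rng.choice([1.0, -1.0], size=d)) @ U.conj().T

F = {x: rand_observable() for x in xs}

def chi(S, x):
    return (-1) ** sum(s * xi for s, xi in zip(S, x))

# matrix-valued Fourier coefficient: F_hat(S) = E_x chi_S(x) F(x)
F_hat = {S: sum(chi(S, x) * F[x] for x in xs) / len(xs) for S in xs}

# matrix Parseval: sum_S F_hat(S)^2 = I, using F(x)^2 = I for all x
total = sum(F_hat[S] @ F_hat[S] for S in xs)
print(np.allclose(total, np.eye(d)))  # -> True
```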

Next, define the linear probabilistic function ${G}$ by ${G(x) = \sum_S \chi_S(x) P_S}$, where ${P_S = I \otimes e_Se_S^*}$ forms a partition of identity. We can evaluate

$\displaystyle \begin{array}{rcl} &\mathop{\mathbb E}_{x} \,\langle F(x),V^*G(x)V \rangle_f &= \mathop{\mathbb E}_{x} \sum_{S}\,\langle F(x),\, \chi_S(x) \hat{F}(S)^2 \rangle_f \\ &&= \mathop{\mathbb E}_{x,y} \,\langle F(x+y),\,F(x)F(y) \rangle_f, \end{array}$

where the last equality follows by expanding the Fourier coefficients and noticing the appropriate cancellation. Together with (2), this proves the theorem. $\Box$
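
The correlation identity in the last step can also be checked numerically: both sides equal ${\sum_S \mathrm{Tr}(\hat{F}(S)^3)/d}$. A small sketch (helper names mine) comparing the Fourier-side expression against a direct evaluation of ${\mathop{\mathbb E}_{x,y} \langle F(x+y), F(x)F(y)\rangle_f}$:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
n, d = 2, 3
xs = list(product([0, 1], repeat=n))

def rand_observable():
    # Hermitian with eigenvalues +-1
    U, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    return U @ np.diag(rng.choice([1.0, -1.0], size=d)) @ U.conj().T

F = {x: rand_observable() for x in xs}
chi = lambda S, x: (-1) ** sum(s * xi for s, xi in zip(S, x))
F_hat = {S: sum(chi(S, x) * F[x] for x in xs) / len(xs) for S in xs}

# Fourier side: E_x <F(x), V* G(x) V>_f = sum_S Tr(F_hat(S)^3) / d
lhs = sum(np.trace(F_hat[S] @ F_hat[S] @ F_hat[S]) for S in xs).real / d

# direct side: E_{x,y} <F(x+y), F(x) F(y)>_f, with <A,B>_f = Tr(A B*) / d
add = lambda x, y: tuple((a + b) % 2 for a, b in zip(x, y))
rhs = np.mean([np.trace(F[add(x, y)] @ (F[x] @ F[y]).conj().T).real
               for x in xs for y in xs]) / d

print(np.isclose(lhs, rhs))  # -> True
```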

At the risk of sounding yet more pedantic, it might be useful to comment on the relation between this proof and the usual argument. The main observation in Bellare et al.’s proof is that approximate linearity, expressed by (2), implies a lower bound on the sum of the cubes of the Fourier coefficients of ${f}$. Together with Parseval’s formula, this bound implies the existence of a large Fourier coefficient, which identifies a close-by linear function.

The proof I gave decouples the argument. Its first step, the construction of the isometry ${V}$, depends on ${F}$ but does not use anything regarding approximate linearity; it only uses Parseval’s formula to argue that the isometry is well-defined. A noteworthy feature of this step is that the function ${G}$ on the extended space is always well-defined as well: given a function ${F}$, it is always possible to consider the linear matrix-valued function which “samples ${S}$ according to ${\hat{F}(S)^2}$” and then returns ${\chi_S(x)}$. The second step of the proof evaluates the correlation of ${F}$ with the “pull-back” of ${G}$, and observes that this correlation is precisely our measure of the approximate linearity of ${F}$, concluding the proof without ever having to explicitly identify a large Fourier coefficient.

1.3. The group-theoretic perspective

Let’s re-interpret the proof we just gave using group-theoretic language. A linear function ${g: {\mathbb Z}_2^n\rightarrow\{\pm 1\}}$ is, by definition, a mapping which respects the additive group structure on ${{\mathbb Z}_2^n}$, namely it is a representation. Since ${G=({\mathbb Z}_2^n,+)}$ is an abelian group, it has ${|G|=2^n}$ irreducible ${1}$-dimensional representations, given by the characters ${\chi_S}$. As such, the linear function defined in the proof of Theorem 2 is nothing but a list of all irreducible representations of ${G}$.

The condition (2) derived in the proof of the theorem can be interpreted as the condition that ${F}$ is an “approximate representation” of ${G}$. Let’s make this a general definition. For ${d}$-dimensional matrices ${A,B}$ and ${\sigma}$ such that ${\sigma}$ is positive semidefinite, write

$\displaystyle \langle A,B\rangle_\sigma = \mathrm{Tr}(AB^* \sigma),$

where we use ${B^*}$ to denote the conjugate-transpose. The following definition considers arbitrary finite groups (not necessarily abelian).

Definition 3 Given a finite group ${G}$, an integer ${d\geq 1}$, ${\varepsilon\geq 0}$, and a ${d}$-dimensional positive semidefinite matrix ${\sigma}$ with trace ${1}$, an ${(\varepsilon,\sigma)}$-representation of ${G}$ is a function ${f: G \rightarrow U_d({\mathbb C})}$, the unitary group of ${d\times d}$ matrices, such that

$\displaystyle \mathop{\mathbb E}_{x,y\in G} \,\Re\big(\big\langle f(x)^*f(y) ,f(x^{-1}y) \big\rangle_\sigma\big) \,\geq\, 1-\varepsilon, \ \ \ \ \ (3)$

where the expectation is taken under the uniform distribution over ${G}$.

The condition (3) in the definition is very closely related to Gowers’s ${U^2}$ norm

$\displaystyle \|f\|_{U^2}^4 \,=\, \mathop{\mathbb E}_{xy^{-1}=zw^{-1}}\, \big\langle f(x)f(y)^* ,f(z)f(w)^* \big\rangle_\sigma.$

While a large Gowers norm implies closeness to an affine function, we are interested in testing linear functions, and the condition (3) will arise naturally from our calculations in the next section.

If ${G=({\mathbb Z}_2^n,+)}$, the product ${xy^{-1}}$ should be written additively as ${x-y=x+y}$, so that the condition (2) is precisely the statement that ${F}$ is an ${(\varepsilon,\sigma)}$-representation of ${G}$, where ${\sigma = d^{-1}I}$. Theorem 2 can thus be reformulated as stating that for any ${(\varepsilon,\sigma)}$-representation ${F}$ of the abelian group ${G=({\mathbb Z}_2^n,+)}$ there exists an isometry ${V:{\mathbb C}^d \rightarrow {\mathbb C}^d\otimes {\mathbb C}^{2^n}}$ and an exact representation ${g}$ of ${G}$ on ${{\mathbb C}^d \otimes {\mathbb C}^{2^n}}$ such that ${F}$ is well-approximated by the “pull-back” ${V^*gV}$ of ${g}$ to ${{\mathbb C}^d}$. In the next section I will make the words in quotes precise and generalize the result to the case of arbitrary finite groups.

2. Approximate representations of non-abelian groups

2.1. The Gowers-Hatami theorem

In their paper Gowers and Hatami consider the problem of “rounding” approximate group representations to exact representations. I highly recommend the paper, which gives a thorough introduction to the topic, including multiple motivations. Here I will state and prove a slightly more general, but quantitatively weaker, variant of their result inspired by the somewhat convoluted analysis of the BLR test given in the previous section.

Theorem 4 (Gowers-Hatami) Let ${G}$ be a finite group, ${\varepsilon\geq 0}$, and ${f:G\rightarrow U_d({\mathbb C})}$ an ${(\varepsilon,\sigma)}$-representation of ${G}$. Then there exists a ${d'\geq d}$, an isometry ${V:{\mathbb C}^d\rightarrow {\mathbb C}^{d'}}$, and a representation ${g:G\rightarrow U_{d'}({\mathbb C})}$ such that

$\displaystyle \mathop{\mathbb E}_{x\in G}\, \big\| f(x) - V^*g(x)V \big\|_\sigma^2\, \leq\, 2\,\varepsilon.$

Gowers and Hatami limit themselves to the case of ${\sigma = d^{-1}I_d}$, which corresponds to the dimension-normalized Frobenius norm. In this scenario they additionally obtain a tight control of the dimension ${d'}$, and show that one can always take ${d' = (1+O(\varepsilon))d}$ in the theorem. I will give a much shorter proof than theirs (the proof is implicit in their argument) that does not seem to allow one to recover this estimate. (It is possible to adapt their proof to keep control of ${d'}$ even in the case of general ${\sigma}$, but I will not explain this here.) Essentially the same proof as the one sketched below has been extended to some classes of infinite groups by De Chiffre, Ozawa and Thom in a recent preprint.

Note that, contrary to the BLR theorem, where the “embedding” is not strictly necessary (if ${\varepsilon}$ is small enough we can identify a single close-by linear function), as noted by Gowers and Hatami, Theorem 4 does not in general hold with ${d'=d}$. The reason is that it is possible for ${G}$ to have an approximate representation in some dimension ${d}$ but no exact representation of the same dimension: to obtain an example, take any group ${G}$ whose non-trivial irreducible representations all have large enough dimension, and create an approximate representation in, e.g., dimension one less by “cutting off” one row and column from an exact representation. The dimension normalization induced by the norm ${\|\cdot\|_\sigma}$ will barely notice this, but it will be impossible to “round” the approximate representation obtained to an exact one without modifying the dimension.

The necessity for the embedding helps distinguish the Gowers-Hatami result from other extensions of the linearity test to the non-abelian setting, such as the work by Ben-Or et al. on non-Abelian homomorphism testing (I thank Oded Regev for pointing me to the paper). In that paper the authors show that a function ${f:G\rightarrow H}$, where ${G}$ and ${H}$ are finite non-abelian groups, which satisfies ${\Pr( f(x)f(y)=f(xy) ) \geq 1-\varepsilon}$, is ${O(\varepsilon)}$-close to a homomorphism ${g:G\rightarrow H}$. The main difference with the setting for the Gowers-Hatami result is that since ${H}$ is finite, Ben-Or et al. use the Kronecker ${\delta}$ function as distance on ${H}$. This allows them to employ combinatorial arguments, and provide a rounding procedure that does not need to modify the range space (${H}$). In contrast, here the unitary group is infinite.

The main ingredient needed to extend the analysis of the BLR test is an appropriate notion of Fourier transform over non-abelian groups. Given an irreducible representation ${\rho: G \rightarrow U_{d_\rho}({\mathbb C})}$, define

$\displaystyle \hat{f}(\rho) \,=\, \mathop{\mathbb E}_{x\in G} \,f(x) \otimes \overline{\rho(x)}. \ \ \ \ \ (4)$

In case ${G}$ is abelian, we always have ${d_\rho=1}$, the tensor product reduces to an ordinary product, and (4) reduces to the usual definition of a Fourier coefficient. The only property we will need of irreducible representations is that they satisfy the relation

$\displaystyle \sum_\rho \,d_\rho\,\mathrm{Tr}(\rho(x)) \,=\, |G|\delta_{xe}\;, \ \ \ \ \ (5)$

for any ${x\in G}$. Note that plugging in ${x=e}$ (the identity element in ${G}$) yields ${\sum_\rho d_\rho^2= |G|}$.
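The relation (5) can be checked by hand on the smallest non-abelian group, ${S_3}$, whose irreducible representations are the trivial representation, the sign representation, and the ${2}$-dimensional standard representation (obtained by restricting the permutation action on ${{\mathbb R}^3}$ to the sum-zero subspace). A numerical sketch, with helper names of my choosing:

```python
import numpy as np
from itertools import permutations

def sign(p):
    # sign of a permutation = (-1)^{number of inversions}
    s = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                s = -s
    return s

def perm_matrix(p):
    P = np.zeros((3, 3))
    for i, pi in enumerate(p):
        P[pi, i] = 1
    return P

# orthonormal basis of the sum-zero subspace of R^3 (carries the standard rep)
B = np.linalg.qr(np.array([[1., 0.], [0., 1.], [-1., -1.]]))[0]

def std_rep(p):
    # restriction of the permutation matrix to the invariant subspace
    return B.T @ perm_matrix(p) @ B

# check (5): sum_rho d_rho Tr(rho(x)) = |G| if x = e, and 0 otherwise
for x in permutations(range(3)):
    total = 1 * 1 + 1 * sign(x) + 2 * np.trace(std_rep(x))
    expected = 6.0 if x == (0, 1, 2) else 0.0
    assert np.isclose(total, expected)
```

For the identity the sum is ${1+1+2\cdot 2 = 6 = |S_3|}$; for a transposition it is ${1-1+2\cdot 0 = 0}$, and for a ${3}$-cycle ${1+1+2\cdot(-1)=0}$, as (5) predicts.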

Proof: As in the proof of Theorem 2, our first step is to define an isometry ${V:{\mathbb C}^d \rightarrow {\mathbb C}^d \otimes (\oplus_\rho {\mathbb C}^{d_\rho} \otimes {\mathbb C}^{d_\rho})}$ by

$\displaystyle V :\;u \in {\mathbb C}^d \,\mapsto\, \bigoplus_\rho \,d_\rho^{1/2} \sum_{i=1}^{d_\rho} \,\big(\hat{f}(\rho) (u\otimes e_i)\big) \otimes e_i,$

where the direct sum ranges over all irreducible representations ${\rho}$ of ${G}$ and ${\{e_i\}}$ is the canonical basis. Note what ${V}$ does: it “embeds” any vector ${u\in {\mathbb C}^d}$ into a direct sum, over irreducible representations ${\rho}$, of a ${d}$-dimensional vector of ${d_\rho\times d_\rho}$ matrices. Each (matrix) entry of this vector can be thought of as the Fourier coefficient of the corresponding entry of the vector ${f(x)u}$ associated with ${\rho}$. If ${G={\mathbb Z}_2^n}$ and ${f}$ takes values that are ${d\times d}$ Hermitian unitaries this recovers the isometry defined in the proof of Theorem 2. And indeed, the fact that ${V}$ is an isometry again follows from the appropriate extension of Parseval’s formula:

$\displaystyle \begin{array}{rcl} & V^* V &= \sum_\rho d_\rho \sum_i (I\otimes e_i^*) \hat{f}(\rho)^*\hat{f}(\rho) (I\otimes e_i)\\ &&= \mathop{\mathbb E}_{x,y}\, f(x)^*f(y) \sum_\rho d_\rho \sum_i (e_i^* \rho(x)^T \overline{\rho(y)} e_i)\\ &&= \sum_\rho \frac{d_\rho^2}{|G|}I = I, \end{array}$

where for the second line we used the definition (4) of ${\hat{f}(\rho)}$ and for the third we used (5) and the fact that ${f}$ takes values in the unitary group.

Following the same steps as in the proof of Theorem 2, we next define

$\displaystyle g(x) = \bigoplus_\rho \,\big(I_d \otimes I_{d_\rho} \otimes \rho(x)\big),$

a direct sum over all irreducible representations of ${G}$ (hence itself a representation). Let’s first compute the “pull-back” of ${g}$ by ${V}$: following a similar calculation as above, for any ${x\in G}$,

$\displaystyle \begin{array}{rcl} & V^*g(x) V &= \sum_{\rho} d_\rho \sum_{i,j} (I\otimes e_i^*)\hat{f}(\rho)^* \hat{f}(\rho)(I\otimes e_j) \,\big( e_i^* \rho(x) e_j \big) \\ && = \mathop{\mathbb E}_{z,y}\, f(z)^*f(y) \sum_{\rho} d_\rho \sum_{i,j} (e_i^* \rho(z)^T \overline{\rho(y)} e_j) \big( e_i^* \rho(x) e_j \big) \\ && = \mathop{\mathbb E}_{z,y}\, f(z)^*f(y) \sum_{\rho} d_\rho \mathrm{Tr}\big( \rho(z)^T \overline{\rho(y)} {\rho(x)^T} \big) \\ && = \mathop{\mathbb E}_{z,y}\, f(z)^*f(y) \sum_{\rho} d_\rho \mathrm{Tr}\big( \rho(z^{-1}y x^{-1}) \big) \\ && = \mathop{\mathbb E}_{z}\, f(z)^*f(zx) , \end{array}$

where the last equality uses (5). It then follows that

$\displaystyle \begin{array}{rcl} &\mathop{\mathbb E}_{x}\, \big\langle f(x), V^*g(x) V \big\rangle_\sigma &= \mathop{\mathbb E}_{x,z} \mathrm{Tr}\big( f(x) f(zx)^* f(z)\sigma\big). \end{array}$

This relates correlation of ${f}$ with ${V^*gV}$ to the quality of ${f}$ as an approximate representation and proves the theorem. $\Box$

2.2. Application: the Weyl-Heisenberg group

In quantum information we care a lot about the Pauli group. For our purposes it will be sufficient (and much more convenient, as it allows us to avoid some trouble with complex conjugation) to consider the Weyl-Heisenberg group ${H}$, or “Pauli group modulo complex conjugation”, which is the ${8}$-element group ${\{\pm \sigma_I,\pm \sigma_X,\pm \sigma_Z,\pm \sigma_W\}}$ whose multiplication table matches that of the ${2\times 2}$ matrices

$\displaystyle \sigma_X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},\qquad \sigma_Z= \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \ \ \ \ \ (6)$

${\sigma_I = \sigma_X^2 = \sigma_Z^2}$ and ${\sigma_W=\sigma_X\sigma_Z=-\sigma_Z\sigma_X}$. This group has four ${1}$-dimensional representations, uniquely specified by the image of ${\sigma_X}$ and ${\sigma_Z}$ in ${\{\pm 1\}}$, and a single irreducible ${2}$-dimensional representation, given by the matrices defined above. We can also consider the “${n}$-qubit Weyl-Heisenberg group” ${H^{(n)}}$, the matrix group generated by ${n}$-fold tensor products of the ${8}$ matrices identified above. The irreducible representations of ${H^{(n)}}$ are easily computed from those of ${H}$; for us the only thing that matters is that the only irreducible representation which satisfies ${\rho(-I)=-\rho(I)}$ has dimension ${2^n}$ and is given by the defining matrix representation (in fact, it is the only irreducible representation in dimension larger than ${1}$).
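The basic group facts just stated — anti-commutation of ${\sigma_X}$ and ${\sigma_Z}$, and closure of the ${8}$ signed matrices under multiplication — are quick to verify numerically (a sketch; the variable names are mine):

```python
import numpy as np

sI = np.eye(2)
sX = np.array([[0., 1.], [1., 0.]])
sZ = np.array([[1., 0.], [0., -1.]])
sW = sX @ sZ

# anti-commutation: sigma_X sigma_Z = - sigma_Z sigma_X
assert np.allclose(sX @ sZ, -(sZ @ sX))
# sigma_I = sigma_X^2 = sigma_Z^2
assert np.allclose(sX @ sX, sI) and np.allclose(sZ @ sZ, sI)

# the 8 signed matrices are closed under multiplication (a group)
H = [s * M for s in (1, -1) for M in (sI, sX, sZ, sW)]
assert len(H) == 8
for A in H:
    for B in H:
        assert any(np.allclose(M, A @ B) for M in H)
```

The defining ${2\times 2}$ matrices used here are exactly the unique irreducible representation with ${\rho(-I)=-\rho(I)}$ mentioned above (for ${n=1}$).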

With the upcoming application to entanglement testing in mind, I will state a version of Theorem 4 tailored to the group ${H^{(n)}}$ and a specific choice of presentation for the group relations. Towards this we first need to recall the notion of Schmidt decomposition of a bipartite state (i.e. unit vector) ${\psi \in {\mathbb C}^d \otimes {\mathbb C}^d}$. The Schmidt decomposition states that any such vector can be written as

$\displaystyle \psi \,=\, \sum_i \,\sqrt{\lambda_i}\, u_i \otimes v_i, \ \ \ \ \ (7)$

for some orthonormal bases ${\{u_i\}}$ and ${\{v_i\}}$ of ${{\mathbb C}^d}$ (the “Schmidt vectors”) and non-negative coefficients ${\sqrt{\lambda_i}}$ (the “Schmidt coefficients”). The decomposition can be obtained by “reshaping” ${\psi = \sum_{i,j} \psi_{i,j} e_i \otimes e_j}$ into a ${d\times d}$ matrix ${K=(\psi_{i,j})_{1\leq i,j\leq d}}$ and performing the singular value decomposition. To ${\psi}$ we associate the (uniquely defined) positive semidefinite matrix

$\displaystyle \sigma \,=\, KK^* \,=\, \sum_i \lambda_i\,u_iu_i^*\; ; \ \ \ \ \ (8)$

note that ${\sigma}$ has trace ${1}$. The matrix ${\sigma}$ is called the reduced density of ${\psi}$ (on the first system).
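The reshaping recipe can be spelled out in a few lines of numpy (a sketch, with names of my choosing): the singular values of ${K}$ are the Schmidt coefficients, and the eigenvalues of ${\sigma = KK^*}$ are their squares.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4

# random unit vector in C^d tensor C^d
psi = rng.normal(size=d * d) + 1j * rng.normal(size=d * d)
psi /= np.linalg.norm(psi)

# reshape psi into a d x d matrix K; singular values = Schmidt coefficients
K = psi.reshape(d, d)
coeffs = np.linalg.svd(K, compute_uv=False)
assert np.isclose(np.sum(coeffs ** 2), 1.0)  # psi is a unit vector

# reduced density matrix sigma = K K*, trace 1, eigenvalues lambda_i
sigma = K @ K.conj().T
assert np.isclose(np.trace(sigma).real, 1.0)
assert np.allclose(np.sort(np.linalg.eigvalsh(sigma)), np.sort(coeffs ** 2))
```

The Schmidt vectors ${\{u_i\}}$ and ${\{v_i\}}$ are the left and right singular vectors of ${K}$, respectively.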

Corollary 5 Let ${n,d}$ be integers, ${\varepsilon \geq 0}$, ${\psi \in {\mathbb C}^d \otimes {\mathbb C}^d}$ a unit vector, ${\sigma}$ the positive semidefinite matrix associated to ${\psi}$ as in (8), and ${f: \{X,Z\}\times \{0,1\}^n \rightarrow U({\mathbb C}^d)}$. For ${a,b\in\{0,1\}^n}$ let ${X(a)=f(X,a)}$, ${Z(b)=f(Z,b)}$, and assume ${X(a)^2=Z(b)^2=I_d}$ for all ${a,b}$ (we call such operators, unitaries with eigenvalues in ${\{\pm 1\}}$, observables). Suppose that the following inequalities hold: consistency

$\displaystyle \mathop{\mathbb E}_a \, \psi^* \big(X(a) \otimes X(a)\big) \psi \,\geq\,1-\varepsilon,\qquad \mathop{\mathbb E}_b \, \psi^* \big(Z(b) \otimes Z(b) \big)\psi\,\geq\,1-\varepsilon, \ \ \ \ \ (9)$

linearity

$\displaystyle \mathop{\mathbb E}_{a,a'} \,\big\|X(a)X(a')-X(a+a')\big\|_\sigma^2 \leq \varepsilon,\qquad\mathop{\mathbb E}_{b,b'}\, \big\|Z(b)Z(b')-Z(b+b')\big\|_\sigma^2 \leq \varepsilon, \ \ \ \ \ (10)$

and anti-commutation

$\displaystyle \mathop{\mathbb E}_{a,b} \,\big\| X(a)Z(b)-(-1)^{a\cdot b} Z(b)X(a)\big\|_\sigma^2\,\leq\,\varepsilon. \ \ \ \ \ (11)$

Then there exists a ${d'\geq d}$, an isometry ${V:{\mathbb C}^d\rightarrow {\mathbb C}^{d'}}$, and a representation ${g:H^{(n)}\rightarrow U_{d'}({\mathbb C})}$ such that ${g(-I)=-I_{d'}}$ and

$\displaystyle \mathop{\mathbb E}_{a,b}\, \big\| X(a)Z(b) - V^*g(\sigma_X(a)\sigma_Z(b))V \big\|_\sigma^2 \,=\, O(\varepsilon).$

Note that the conditions (10) and (11) in the corollary are very similar to the conditions required of an approximate representation of the group ${H^{(n)}}$; in fact it is easy to convince oneself that their exact analogues suffice to imply all the group relations. The reason for including only those relations is that they are the ones we will be able to test; see the next section. Condition (9) is needed to derive the conditions of Theorem 4 from (10) and (11), and is also testable; see the proof.
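For the ideal ${n}$-qubit Pauli observables ${\sigma_X(a) = \otimes_i \sigma_X^{a_i}}$ and ${\sigma_Z(b) = \otimes_i \sigma_Z^{b_i}}$, the linearity and anti-commutation relations hold exactly (i.e. with ${\varepsilon=0}$); a quick numerical check (helper names mine):

```python
import numpy as np
from functools import reduce
from itertools import product

sX = np.array([[0., 1.], [1., 0.]])
sZ = np.array([[1., 0.], [0., -1.]])

def X(a):
    # sigma_X(a): tensor product with sigma_X at positions where a_i = 1
    return reduce(np.kron, [sX if ai else np.eye(2) for ai in a])

def Z(b):
    return reduce(np.kron, [sZ if bi else np.eye(2) for bi in b])

n = 2
strings = list(product([0, 1], repeat=n))
for a, b in product(strings, repeat=2):
    s = tuple(x ^ y for x, y in zip(a, b))
    # linearity: X(a) X(b) = X(a+b), and likewise for Z
    assert np.allclose(X(a) @ X(b), X(s))
    assert np.allclose(Z(a) @ Z(b), Z(s))
    # anti-commutation: X(a) Z(b) = (-1)^{a.b} Z(b) X(a)
    phase = (-1) ** (sum(x * y for x, y in zip(a, b)) % 2)
    assert np.allclose(X(a) @ Z(b), phase * (Z(b) @ X(a)))
```

The corollary asserts a converse: observables satisfying these relations only approximately, in the state-dependent norm ${\|\cdot\|_\sigma}$, are still close to an exact representation after an isometry.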

Proof: To apply Theorem 4 we need to construct an ${(\varepsilon,\sigma)}$-representation ${f}$ of the group ${H^{(n)}}$. Using that any element of ${H^{(n)}}$ has a unique representative of the form ${\pm \sigma_X(a)\sigma_Z(b)}$ for ${a,b\in\{0,1\}^n}$, we define ${f(\pm \sigma_X(a)\sigma_Z(b)) = \pm X(a)Z(b)}$. Next we need to verify (3). Let ${x,y\in H^{(n)}}$ be such that ${x=\sigma_X(a_x)\sigma_Z(b_x)}$ and ${y=\sigma_X(a_y)\sigma_Z(b_y)}$ for ${n}$-bit strings ${(a_x,b_x)}$ and ${(a_y,b_y)}$ respectively. Up to phase, we can exploit successive cancellations to decompose ${(f(x)f(y)^*-f(xy^{-1}))\otimes I}$ as

$\displaystyle \begin{array}{rcl} &&\big( X(a_x)Z(b_x)X(a_y)Z(b_y) -(-1)^{a_y\cdot b_x} X(a_x+a_y) Z(b_x+b_y)\big)\otimes I \\ &&\qquad = X(a_x)Z(b_x)X(a_y)\big (Z(b_y)\otimes I - I\otimes Z(b_y)\big)\\ && \qquad\qquad+ X(a_x)\big(Z(b_x)X(a_y) - (-1)^{a_y\cdot b_x} X(a_y)Z(b_x)\big)\otimes Z(b_y)\\ && \qquad\qquad+(-1)^{a_y\cdot b_x} \big( X(a_x)X(a_y)\otimes Z(b_y)\big) \big( Z(b_x)\otimes I - I\otimes Z(b_x)\big)\\ && \qquad\qquad+ (-1)^{a_y\cdot b_x} \big( X(a_x)X(a_y)\otimes Z(b_y)Z(b_x) - X(a_x+a_y)\otimes Z(b_x+b_y)\big)\\ && \qquad\qquad+ (-1)^{a_y\cdot b_x} \big( X(a_x+a_y)\otimes I \big)\big(I\otimes Z(b_x+b_y) - Z(b_x+b_y)\otimes I\big). \end{array}$

(It is worth staring at this sequence of equations for a little bit. In particular, note the “player-switching” that takes place in the 2nd, 4th and 6th lines; this is used as a means to “commute” the appropriate unitaries, and is the reason for including (9) among the assumptions of the corollary.) Evaluating each term on the vector ${\psi}$, taking the squared Euclidean norm, and then the expectation over uniformly random ${a_x,a_y,b_x,b_y}$, the inequality ${\| AB\psi\| \leq \|A\|\|B\psi\|}$ and the assumptions of the corollary let us bound each term in the resulting summation by ${O({\varepsilon})}$. Using that ${\| (A\otimes I) \psi\| = \|A\|_\sigma}$ by definition, we obtain the bound

$\displaystyle \mathop{\mathbb E}_{x,y}\,\big\|f(x)f(y)^* - f(xy^{-1})\big\|_\sigma^2 \,=\, O({\varepsilon}).$

We are thus in a position to apply Theorem 4, which gives an isometry ${V}$ and exact representation ${g}$ such that

$\displaystyle \mathop{\mathbb E}_{a,b}\,\Big\| X(a)Z(b)- \frac{1}{2}V^*\big( g(\sigma_X(a)\sigma_Z(b)) - g(-\sigma_X(a)\sigma_Z(b))\big)V\Big\|_\sigma^2 \,=\, O({\varepsilon}). \ \ \ \ \ (12)$

Using that ${g}$ is a representation, ${g(-\sigma_X(a)\sigma_Z(b)) = g(-I)g(\sigma_X(a)\sigma_Z(b))}$. It follows from (12) that ${\|g(-I) + I \|_\sigma^2 = O({\varepsilon})}$, so we may restrict the range of ${V}$ to the subspace where ${g(-I)=-I}$ without introducing much additional error. $\Box$

3. Entanglement testing

Our discussion so far has barely touched upon the notion of entanglement. Recall the Schmidt decomposition (7) of a unit vector ${\psi \in {\mathbb C}^d\otimes {\mathbb C}^d}$, and the associated reduced density matrix ${\sigma}$ defined in (8). The state ${\psi}$ is called entangled if this matrix has rank larger than ${1}$; equivalently, if there is more than one non-zero coefficient ${\lambda_i}$ in (7). The Schmidt rank of ${\psi}$ is the rank of ${\sigma}$, i.e. the number of non-zero terms in (7). It is a crude, but convenient, measure of entanglement; in particular it provides a lower bound on the local dimension ${d}$. A useful observation is that the Schmidt rank is invariant under local unitary operations: these may affect the Schmidt vectors ${\{u_i\}}$ and ${\{v_i\}}$, but not the number of non-zero terms.

3.1. A certificate for high-dimensional entanglement

Among all entangled states in dimension ${d}$, the maximally entangled state ${\phi_d}$ is the one which maximizes entanglement entropy, defined as the Shannon entropy of the distribution induced by the squares of the Schmidt coefficients:

$\displaystyle \phi_d \,=\, \frac{1}{\sqrt{d}} \sum_{i=1}^d\, e_i\otimes e_i,$

with entropy ${\log d}$. The following lemma gives a “robust” characterization of the maximally entangled state in dimension ${d=2^n}$ as the unique common eigenvalue-${1}$ eigenvector of all operators of the form ${\sigma_P \otimes \sigma_P}$, where ${\sigma_P}$ ranges over the elements of the unique ${2^n}$-dimensional irreducible representation of the Weyl-Heisenberg group ${H^{(n)}}$, i.e. the Pauli matrices (taken modulo ${\sqrt{-1}}$).

Lemma 6 Let ${\varepsilon\geq 0}$, ${n}$ an integer, ${d=2^n}$, and ${\psi\in {\mathbb C}^d \otimes {\mathbb C}^d}$ a unit vector such that

$\displaystyle \mathop{\mathbb E}_{a,b}\, \psi^* \big(\sigma_X(a) \sigma_Z(b)\otimes \sigma_X(a) \sigma_Z(b) \big) \psi \geq 1-\varepsilon. \ \ \ \ \ (13)$

Then ${|\psi^*\phi_{2^n}|^2 \geq 1-\varepsilon}$. In particular, ${\psi}$ has Schmidt rank at least ${(1-\varepsilon) 2^n}$.

Proof: Consider the case ${n=1}$. The “swap” matrix

$\displaystyle S = \frac{1}{4}\big(\sigma_I \otimes \sigma_I + \sigma_X \otimes \sigma_X + \sigma_Z \otimes \sigma_Z + \sigma_W \otimes \sigma_W\big)$

is the rank-one orthogonal projection onto the vector ${\phi_2 = (e_1\otimes e_1 + e_2\otimes e_2)/\sqrt{2}}$ (a.k.a. “EPR pair”), which is its unique eigenvalue-${1}$ eigenvector. Thus ${\psi^* S \psi \geq 1-\varepsilon}$ implies ${|\psi^* \phi_2|^2 \geq 1-\varepsilon}$. The same argument for general ${n}$ shows ${|\psi^* \phi_{2^n}|^2 \geq 1-\varepsilon}$. Any unit vector ${u}$ of Schmidt rank at most ${r}$ satisfies ${|u^* \phi_{2^n}|^2 \leq r2^{-n}}$, concluding the proof. $\Box$
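One can check numerically that the matrix ${S}$ defined in the proof is precisely the rank-one projection onto ${\phi_2}$:

```python
import numpy as np

sI = np.eye(2)
sX = np.array([[0., 1.], [1., 0.]])
sZ = np.array([[1., 0.], [0., -1.]])
sW = sX @ sZ

# S = (I x I + X x X + Z x Z + W x W) / 4
S = (np.kron(sI, sI) + np.kron(sX, sX)
     + np.kron(sZ, sZ) + np.kron(sW, sW)) / 4

# EPR pair phi_2 = (e_1 x e_1 + e_2 x e_2) / sqrt(2)
phi2 = np.array([1., 0., 0., 1.]) / np.sqrt(2)

assert np.allclose(S, np.outer(phi2, phi2))  # rank-one projection onto phi_2
assert np.allclose(S @ S, S)                 # S is a projection
assert np.allclose(S @ phi2, phi2)           # phi_2 is its eigenvalue-1 vector
```

In particular ${\psi^* S \psi = |\psi^*\phi_2|^2}$ for any ${\psi}$, which is the inequality used in the proof.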

Lemma 6 provides an “experimental road-map” for establishing that a bipartite system is in a highly entangled state:

• (i) Select a random ${\sigma_P = \pm\sigma_X(a)\sigma_Z(b) \in H^{(n)}}$;
• (ii) Measure both halves of ${\psi}$ using ${\sigma_P}$;
• (iii) Check that the outcomes agree.

To explain the connection between the above “operational test” and the lemma I should review what a measurement in quantum mechanics is. For our purposes it is enough to talk about binary measurements (i.e. measurements with two outcomes, ${+1}$ and ${-1}$). Any such measurement is specified by a pair of orthogonal projections, ${M_+}$ and ${M_-}$, on ${{\mathbb C}^d}$ such that ${M_++M_- = I_d}$. The probability of obtaining outcome ${\pm}$ when measuring ${\psi}$ is ${\|M_\pm \psi\|^2}$. We can represent a binary measurement succinctly through the observable ${M=M_+-M_-}$. (In general, an observable is a Hermitian matrix which squares to identity.) It is then the case that if an observable ${M}$ is applied on the first half of a state ${\psi\in{\mathbb C}^d\otimes{\mathbb C}^d}$, and another observable ${N}$ is applied on the second half, then the probability of agreement, minus the probability of disagreement, between the outcomes obtained is precisely ${\psi^*(M\otimes N)\psi}$, a number which lies in ${[-1,1]}$. Thus the condition that the test described above accepts with probability ${1-\varepsilon}$ when performed on a state ${\psi}$ is precisely equivalent to the assumption (13) of Lemma 6.
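The identity relating the agreement statistics to ${\psi^*(M\otimes N)\psi}$ is easy to verify numerically. The following sketch (with a random state and random observables; helper names mine) computes the probability of agreement minus disagreement from the four joint outcomes and compares it to the bilinear form:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 2

def random_observable(d):
    # Hermitian unitary M = M_+ - M_- with random +-1 eigenvalues
    U = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))[0]
    return U @ np.diag(rng.choice([1.0, -1.0], size=d)) @ U.conj().T

M, N = random_observable(d), random_observable(d)
psi = rng.normal(size=d * d) + 1j * rng.normal(size=d * d)
psi /= np.linalg.norm(psi)

def proj(M, s):
    # projection M_+ (s = +1) or M_- (s = -1) associated with observable M
    return (np.eye(len(M)) + s * M) / 2

# P(agree) - P(disagree), summed over the four joint outcomes (s, t)
corr = sum(s * t * np.linalg.norm(np.kron(proj(M, s), proj(N, t)) @ psi) ** 2
           for s in (1, -1) for t in (1, -1))
assert np.isclose(corr, (psi.conj() @ np.kron(M, N) @ psi).real)
```

Applied with ${M=N=\sigma_P}$ and ${\psi = \phi_{2^n}}$, this is exactly the quantity bounded in assumption (13) of Lemma 6.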

Even though this provides a perfectly fine test for entanglement in principle, practitioners in the foundations of quantum mechanics know all too well that their opponents — e.g. “quantum-skeptics” — will not be satisfied with such an experiment. In particular, who is to guarantee that the measurement performed in step (ii) is really ${\sigma_P\otimes\sigma_P}$, as claimed? At the very least, doesn’t this already implicitly assume that the measured system has dimension ${2^n}$?

This is where the notion of device independence comes in. Briefly, in this context the idea is to obtain the same conclusion (a certificate of high-dimensional entanglement) without any assumption on the measurement performed: the only information to be trusted is classical data (statistics generated by the experiment), but not the operational details of the experiment itself.

This is where Corollary 5 enters the picture. Reformulated in the present context, the corollary provides a means to verify that arbitrary measurements “all but behave” as Pauli measurements, provided they generate the right statistics. To explain how this can be done we need to provide additional “operational tests” that can be used to certify the assumptions of the corollary.

3.2. Testing the Weyl-Heisenberg group relations

Corollary 5 makes three assumptions about the observables ${X(a)}$ and ${Z(b)}$: that they satisfy approximate consistency (9), linearity (10), and anti-commutation (11). In this section I will describe two (somewhat well-known) tests that allow one to certify these relations based only on the fact that the measurements generate statistics which pass the tests.

Linearity test:

• (a) The referee selects ${W\in\{X,Z\}}$ and ${a,a'\in\{0,1\}^n}$ uniformly at random. He sends ${(W,a,a')}$ to one player and ${(W,a)}$, ${(W,a')}$, or ${(W,a+a')}$ to the other.
• (b) The first player replies with two bits, and the second with a single bit. The referee accepts if and only if the players’ answers are consistent.

As always in this note, the test treats both players symmetrically. As a result we can (and will) assume that the players’ strategy is symmetric, and is specified by a permutation-invariant state ${\psi\in {\mathbb C}^d \otimes {\mathbb C}^d}$ and a measurement for each question: an observable ${W(a)}$ associated to questions of the form ${(W,a)}$, and a more complicated four-outcome measurement ${\{W^{a,a'}\}}$ associated with questions of the form ${(W,a,a')}$ (it will not be necessary to go into the details of the formalism for such measurements).

The linearity test described above is identical to the BLR linearity test described earlier, except for the use of the basis label ${W\in\{X,Z\}}$. The lemma below is a direct analogue of Lemma 1, which extends the analysis to the setting of players sharing entanglement. The lemma was first proved in a joint paper with Ito, where we used an extension of the linearity test, Babai et al.’s multilinearity test, to show the inclusion of complexity classes NEXP${\subseteq}$MIP${^*}$.

Lemma 7 Suppose that a family of observables ${\{W(a)\}}$, for ${W\in\{X,Z\}}$ and ${a\in\{0,1\}^n}$, generates outcomes that succeed in the linearity test with probability ${1-\varepsilon}$ when applied on a bipartite state ${\psi\in{\mathbb C}^d\otimes {\mathbb C}^d}$. Then the following hold: approximate consistency

$\displaystyle \mathop{\mathbb E}_a \, \psi^* \big(X(a) \otimes X(a)\big) \psi \,\geq\,1-O(\varepsilon),\qquad \mathop{\mathbb E}_b \, \psi^* \big(Z(b) \otimes Z(b) \big)\psi\,\geq\,1-O(\varepsilon),$

and linearity

$\displaystyle \mathop{\mathbb E}_{a,a'} \,\big\|X(a)X(a')-X(a+a')\big\|_\sigma^2 = O(\varepsilon),\qquad\mathop{\mathbb E}_{b,b'}\, \big\|Z(b)Z(b')-Z(b+b')\big\|_\sigma^2 \,=\, O({\varepsilon}).$

Testing anti-commutation is slightly more involved. We will achieve this by using a two-player game called the Magic Square game. This is a fascinating game, but just as for the linearity test I will treat it superficially and only recall the part of the analysis that is useful for us (see e.g. the paper by Wu et al. for a description of the game and a proof of Lemma 8 below).

Lemma 8 (Magic Square) The Magic Square game is a two-player game with nine possible questions (with binary answers) for one player and six possible questions (with two-bit answers) for the other player which has the following properties. The distribution on questions in the game is uniform. Two of the questions to the first player are labelled ${X}$ and ${Z}$ respectively. For any strategy for the players that succeeds in the game with probability at least ${1-\varepsilon}$ using a bipartite state ${\psi\in{\mathbb C}^d\otimes {\mathbb C}^d}$ and observables ${X}$ and ${Z}$ for questions ${X}$ and ${Z}$ respectively, it holds that

$\displaystyle \big\|\big( (XZ+ZX)\otimes I_d \big)\psi\big\|^2 \,=\, O\big(\sqrt{\varepsilon}\big). \ \ \ \ \ (14)$

Moreover, there exists a strategy which succeeds with probability ${1}$ in the game, using ${\psi=\phi_4}$ and Pauli observables ${\sigma_X \otimes I_2}$ and ${\sigma_Z\otimes I_2}$ for questions ${X}$ and ${Z}$ respectively.

Based on the Magic Square game we devise the following “anti-commutation test”.

Anti-commutation test:

• (a) The referee selects ${a,b\in\{0,1\}^n}$ uniformly at random under the condition that ${a\cdot b=1}$. He plays the Magic Square game with both players, with the following modifications: if the question to the first player is ${X}$ or ${Z}$ he sends ${(X,a)}$ or ${(Z,b)}$ instead; in all other cases he sends the original label of the question in the Magic Square game together with both strings ${a}$ and ${b}$.
• (b) Each player provides answers as in the Magic Square game. The referee accepts if and only if the players’ answers would have been accepted in the game.

Using Lemma 8 it is straightforward to show the following.

Lemma 9 Suppose a strategy for the players succeeds in the anti-commutation test with probability at least ${1-\varepsilon}$, when performed on a bipartite state ${\psi \in {\mathbb C}^d \otimes {\mathbb C}^d}$. Then the observables ${X(a)}$ and ${Z(b)}$ applied by the player upon receipt of questions ${(X,a)}$ and ${(Z,b)}$ respectively satisfy

$\displaystyle \mathop{\mathbb E}_{a,b:\,a\cdot b=1} \,\big\| X(a)Z(b)-(-1)^{a\cdot b} Z(b)X(a)\big\|_\sigma^2\,=\,O\big(\sqrt{\varepsilon}\big). \ \ \ \ \ (15)$

3.3. A robust test for high-dimensional entangled states

We are ready to state, and prove, our main theorem: a test for high-dimensional entanglement that is “robust”, meaning that a success probability within an additive constant of the optimal value suffices to certify that the underlying state is within constant distance of the target state, in this case a tensor product of ${n}$ EPR pairs. Although arguably a direct “quantization” of the BLR result, this is the first known test achieving constant robustness; all previous ${n}$-qubit tests required a success probability that is inverse-polynomially (in ${n}$) close to the optimum in order to provide any meaningful conclusion.

${n}$-qubit Pauli braiding test: With probability ${1/2}$ each,

• (a) Execute the linearity test.
• (b) Execute the anti-commutation test.

Theorem 10 Suppose that a family of observables ${W(a)}$, for ${W\in\{X,Z\}}$ and ${a\in\{0,1\}^n}$, and a state ${\psi\in{\mathbb C}^d\otimes {\mathbb C}^d}$, generate outcomes that pass the ${n}$-qubit Pauli braiding test with probability at least ${1-\varepsilon}$. Then ${d\geq (1-O(\sqrt{\varepsilon}))2^n}$.

As should be apparent from the proof it is possible to state a stronger conclusion for the theorem, which includes a characterization of the observables ${W(a)}$ and the state ${\psi}$ up to local isometries. For simplicity I only recorded the consequence on the dimension of ${\psi}$.

Proof: Using Lemma 7 and Lemma 9, success with probability ${1-\varepsilon}$ in the test implies that conditions (9), (10) and (11) in Corollary 5 are all satisfied, up to error ${O(\sqrt{\varepsilon})}$. (In fact, Lemma 9 only implies (11) for strings ${a,b}$ such that ${a\cdot b=1}$. The condition for strings such that ${a\cdot b=0}$ follows from the other conditions.) The conclusion of the corollary is that there exists an isometry ${V}$ such that the observables ${X(a)}$ and ${Z(b)}$ satisfy

$\displaystyle \mathop{\mathbb E}_{a,b}\, \big\| X(a)Z(b) - V^*g(\sigma_X(a)\sigma_Z(b))V \big\|_\sigma^2 \,=\, O(\sqrt{\varepsilon}).$

Using again the consistency relations (9) that follow from part (a) of the test together with the above we get

$\displaystyle \mathop{\mathbb E}_{a,b}\, \psi^* (V\otimes V)^* \big( \sigma_X(a)\sigma_Z(b)\otimes \sigma_X(a)\sigma_Z(b)\big)(V\otimes V)\psi \,=\, 1-O(\sqrt{\varepsilon}).$

Applying Lemma 6, ${(V\otimes V)\psi}$ has Schmidt rank at least ${(1-O(\sqrt{\varepsilon}))2^n}$. But ${V}$ is a local isometry, which cannot increase the Schmidt rank. $\Box$