**The result**

Let’s start by recalling Mahadev’s result. She shows that from any quantum computation, specified by a polynomial-size quantum circuit $C$, it is possible to efficiently compute a *classical-verifier quantum-prover protocol*, i.e. a prescription for the actions of a classical probabilistic polynomial-time verifier interacting with a quantum prover, that has the following properties. For simplicity, assume that $C$ produces a deterministic outcome $o(C) \in \{0,1\}$ when it is executed on qubits initialized in the state $|0\rangle$ (any input can be hard-coded in the circuit). At the end of the protocol, the verifier always makes one of three possible decisions: “reject”; “accept, 0”; “accept, 1”. The *completeness* property states that for any circuit $C$ there is an “honest” behavior for the prover that can be implemented by a polynomial-time quantum device and that will result in the verifier making the decision “accept, $o(C)$”, where $o(C)$ is the correct outcome, with probability . The *soundness* property states that for any behavior of the quantum prover in the protocol, either the probability that the verifier returns the wrong outcome “accept, $1-o(C)$” is negligibly small, or the quantum prover has the ability to break a post-quantum cryptographic scheme with non-negligible advantage. Specifically, the proof of the soundness property demonstrates that a prover that manages to mislead the verifier into making the wrong decision (for any circuit) can be turned into an efficient attack on the learning with errors (LWE) problem (with superpolynomial noise ratio).

The fact that the protocol is only sound against computationally bounded provers sets it apart from previous approaches, which increased the power of the verifier by giving her access to a miniature quantum computer, but established soundness against computationally unbounded provers. The magic of Mahadev’s result is that she manages to leverage this sole assumption, computational boundedness of the prover, to tie a very tight “leash” around its neck, by purely classical means. My use of the word “leash” is not innocent: informally, it seems that the cryptographic assumption allows Mahadev to achieve the kind of feats that were previously known, for classical verifiers, in the model where there are two quantum provers sharing entanglement. I am not sure how far the analogy extends, and would like to explore it further; this has already started with a collaboration with Brakerski, Christiano, Mahadev and Vazirani that led to a single-prover protocol for certifiable randomness expansion. Nevertheless, the main question left open by Mahadev’s work remains whether the computational assumption is even necessary: could a similar result hold, where the honest prover can perform the required actions in quantum polynomial-time, but the protocol remains sound against arbitrarily powerful provers? (Experts will have recognized that the existence of a protocol where the honest prover is as powerful as PSPACE follows from the classical results that BQP is in PSPACE, and that PSPACE=IP. Unfortunately, we currently don’t expect even a supercharged AWS cloud to be able to implement PSPACE-complete computations.)

**Encoding computation in ground states**

Let’s get to business: how does this work? Fix a quantum circuit $C$ that the verifier is interested in. Assume the description of $C$ is known to both the verifier and the prover. As earlier, assume further that when $C$ is executed on a state initialized to $|0\rangle$, a measurement of the output qubit of the circuit returns either the outcome $0$ or the outcome $1$, deterministically. The verifier wishes to determine which case holds.

The first step that the verifier performs is a classical polynomial-time reduction from this *circuit output decision problem* to the following *Hamiltonian energy decision problem*. In the Hamiltonian energy decision problem the input is the description of a pair of classical polynomial-time randomized circuits. The first circuit, $S$, takes as input a random string $r$, and returns a string $h \in \{0,1\}^n$. The second circuit, $V$, takes as input a string $h$ of the kind returned by the first circuit, as well as an $n$-bit string $a$, and returns a “decision bit” $V(h,a) \in \{0,1\}$. The goal of the verifier is to distinguish between the following two cases. Either there exists an $n$-qubit state $\rho$ such that, when a string $h$ is sampled according to $S$ (choosing a uniformly random $r$ as input) and the $n$ qubits of $\rho$ are measured in the bases specified by $h$ (i.e. the $i$-th qubit is measured in the computational basis in case $h_i = 0$, and in the Hadamard basis in case $h_i = 1$), the resulting $n$-bit outcome $a$ satisfies $V(h,a) = 1$ with probability at least some constant threshold. Or, for any state $\rho$, the same procedure results in $V(h,a) = 1$ with probability at most a strictly smaller constant.
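To make the acceptance procedure concrete, here is a minimal sketch in plain Python that computes the exact acceptance probability for a tiny instance. Everything specific in it is an assumption chosen for illustration: the names `S`, `V` and `acceptance_probability`, the two-qubit instance, and the particular sampling and decision rules are not taken from Mahadev’s paper.

```python
import itertools
import math

def apply_hadamard(state, qubit, n):
    """Apply a Hadamard gate to `qubit` (0 = leftmost) of an n-qubit state vector."""
    new = [0j] * len(state)
    shift = n - 1 - qubit
    s = 1 / math.sqrt(2)
    for idx, amp in enumerate(state):
        bit = (idx >> shift) & 1
        flipped = idx ^ (1 << shift)
        # H|0> = (|0> + |1>)/sqrt(2) and H|1> = (|0> - |1>)/sqrt(2)
        new[idx] += s * amp * (-1 if bit else 1)
        new[flipped] += s * amp
    return new

def acceptance_probability(state, n, S, V, num_r_bits):
    """Average, over uniform seeds r, of Pr[V(h, a) = 1] when the state is
    measured in the bases h = S(r) (0 = computational, 1 = Hadamard)."""
    seeds = list(itertools.product([0, 1], repeat=num_r_bits))
    total = 0.0
    for r in seeds:
        h = S(r)
        rotated = list(state)
        for i in range(n):
            if h[i] == 1:
                rotated = apply_hadamard(rotated, i, n)
        for idx, amp in enumerate(rotated):
            a = tuple((idx >> (n - 1 - i)) & 1 for i in range(n))
            total += abs(amp) ** 2 * V(h, a)
    return total / len(seeds)

# Hypothetical toy instance: measure both qubits in the same basis and
# accept when the two outcomes agree.
def S(r):
    return (r[0], r[0])

def V(h, a):
    return 1 if a[0] == a[1] else 0

bell = [1 / math.sqrt(2), 0, 0, 1 / math.sqrt(2)]   # (|00> + |11>)/sqrt(2)
product = [1, 0, 0, 0]                              # |00>
p_bell = acceptance_probability(bell, 2, S, V, 1)
p_prod = acceptance_probability(product, 2, S, V, 1)
```

In this toy instance the entangled state is accepted with certainty while the product state is accepted with probability 3/4, which is the kind of constant gap between the two cases that the decision problem asks the verifier to detect.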

I called this problem the Hamiltonian energy decision problem because the two circuits implicitly specify a Hamiltonian, whose minimal energy the verifier aims to approximate. Note that the Hamiltonian is not required to be local, and furthermore it may involve an average of exponentially many terms (as many as there are random strings). The problem is still in QMA, because the verifier is efficient. It is not hard to show that the problem is QMA-hard. What the formulation above buys us, compared to using the usual QMA-complete formulation of the local Hamiltonian problem, is the constant energy gap, which comes at the cost of exponentially many terms and loss of locality. (Open question: I would like to know if it is possible to achieve a constant gap with only one of these caveats: local with exponentially many terms, or nonlocal with polynomially many terms.) Of course here we only care that the problem is BQP-hard, and that the witness can be computed by a BQP prover; this is indeed the case. We also don’t really care that there is a constant gap (the soundness of the final protocol could be amplified by other means), but it is convenient that we are able to assume it.

The reduction that achieves this is a combination of Kitaev’s history state construction with some gadgetry from perturbation theory and an amplification trick. The first step reduces the verification that $C$ returns outcome $1$ (resp. $0$) on input $|0\rangle$ to the verification that a local Hamiltonian $H$ (computed from $C$) has ground state energy exponentially close to $0$ (resp. at least some positive inverse polynomial). The second step consists in applying perturbation theory to reduce to the case where $H$ is a weighted linear combination of terms of the form $X_iX_j$ and $Z_iZ_j$, where $X_i$, $Z_j$ are the Pauli $X$ and $Z$ operators on the $i$-th and $j$-th qubit respectively. The final step is an amplification trick that produces a nonlocal Hamiltonian, each term of which is a tensor product of single-qubit $X$ and $Z$ observables, and which has ground state energy either below one constant threshold or above a second, strictly larger one (when the Hamiltonian is scaled to be non-negative with norm at most $1$).

These steps are fairly standard. The first two are combined in a paper by Fitzsimons and Morimae to obtain a protocol for “post-hoc” verification of quantum computation: the prover prepares the ground state of an $XZ$ local Hamiltonian whose energy encodes the outcome of the computation, and sends it to the verifier one qubit at a time; the verifier only needs to perform single-qubit $X$ and $Z$ measurements to estimate the energy. The last step, amplification, is described in a paper with Natarajan, where we use it to obtain a multi-prover interactive proof system for QMA.

For the remainder of this post, I take the reduction for granted and focus on the core of Mahadev’s result, a verification protocol for the following problem: given a Hamiltonian $H$ of the form described above, decide whether the ground state energy of $H$ is smaller than $a$, or larger than $b$, for two thresholds $a < b$ separated by a constant gap.

**Stitching distributions into a qubit**

In fact, for the sake of presentation I’ll make one further drastic simplification, which is that the verifier’s goal has been reduced to verifying the existence of a *single-qubit* state $\rho$, whose existence is claimed by the prover. Specifically, suppose that the prover claims that it has the ability to prepare a state $\rho$ such that $\mathrm{Tr}(X\rho) = E_X$, and $\mathrm{Tr}(Z\rho) = E_Z$, for real parameters $E_X, E_Z$. In other words, that the Hamiltonian $H = \frac{1}{2}(X+Z)$ has minimal energy at most $\frac{1}{2}(E_X+E_Z)$. How can one verify this claim? (Of course we could do it analytically... but that approach would break apart as soon as expectation values on larger sets of qubits are considered.)
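For a single qubit the claim can indeed be checked by hand. The following sketch computes the $X$ and $Z$ expectation values of a pure state and the corresponding energy. It assumes, purely for illustration, the Hamiltonian $\frac{1}{2}(X+Z)$; the function names and the choice of state are likewise hypothetical.

```python
import math

def expectations(alpha0, alpha1):
    """Return (<X>, <Z>) for the normalized pure state alpha0|0> + alpha1|1>."""
    exp_z = abs(alpha0) ** 2 - abs(alpha1) ** 2
    exp_x = 2 * (alpha0.conjugate() * alpha1).real
    return exp_x, exp_z

def energy(alpha0, alpha1):
    """Energy under the illustrative (assumed) Hamiltonian H = (X + Z) / 2."""
    exp_x, exp_z = expectations(alpha0, alpha1)
    return 0.5 * (exp_x + exp_z)

# State with Bloch vector (sin(theta), 0, cos(theta)); theta = 5*pi/4 points
# opposite to the (1, 0, 1) direction and therefore minimizes (X + Z) / 2.
theta = 5 * math.pi / 4
ground = (complex(math.cos(theta / 2)), complex(math.sin(theta / 2)))
E_X, E_Z = expectations(*ground)
min_energy = energy(*ground)

# For comparison, the state |0> has <X> = 0, <Z> = 1, hence energy 1/2.
energy_zero = energy(1 + 0j, 0j)
```

Here both expectation values come out to $-1/\sqrt{2}$, and so does the energy, which matches the smallest eigenvalue of $\frac{1}{2}(X+Z)$.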

We could ask the prover to measure in the $X$ basis, or the $Z$ basis, repeatedly on identical copies of $\rho$, and report the outcomes. But how do we know that all these measurements were performed on the same state, and that the prover didn’t choose e.g. one state to report the $Z$-basis outcomes, and a different state to report the $X$-basis outcomes? We need to find a way to prevent the prover from measuring a different state depending on the basis it is asked for, as well as to ensure the measurement is performed in the right basis.

**Committing to a qubit**

The key idea in Mahadev’s protocol is to use cryptographic techniques to force the prover to “commit” to the state in a way that, once the commitment has been performed, the prover no longer has the liberty to “decide” which measurement it performs on the committed qubit (unless it breaks the cryptographic assumption).

I described the commitment scheme in the companion post here. For convenience, let me quote from that post. Recall that the scheme is based on a pair of trapdoor permutations $(f_0,f_1)$ that is *claw-free*. Informally, this means that it is hard to produce any pair $(x_0,x_1)$ such that $f_0(x_0) = f_1(x_1)$.

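The commitment phase of the protocol works as follows. Starting from a state $|\psi\rangle = \alpha_0|0\rangle + \alpha_1|1\rangle$ of its choice, the prover is supposed to perform the following steps. First, the prover creates a uniform superposition over the common domain $\mathcal{X}$ of $f_0$ and $f_1$. Then it evaluates either function, $f_0$ or $f_1$, in an additional register, by controlling on the qubit of $|\psi\rangle$. Finally, the prover measures the register that contains the image of $f_0$ or $f_1$. This achieves the following sequence of transformations:

$$|\psi\rangle = \alpha_0|0\rangle + \alpha_1|1\rangle \;\mapsto\; |\mathcal{X}|^{-1/2}\sum_{x\in\mathcal{X}} \big(\alpha_0|0\rangle + \alpha_1|1\rangle\big)|x\rangle \;\mapsto\; |\mathcal{X}|^{-1/2}\sum_{x\in\mathcal{X}} \big(\alpha_0|0\rangle|x\rangle|f_0(x)\rangle + \alpha_1|1\rangle|x\rangle|f_1(x)\rangle\big) \;\mapsto\; \big(\alpha_0|0\rangle|x_0\rangle + \alpha_1|1\rangle|x_1\rangle\big)|y\rangle,$$

where $y$ is the measured image and $x_0$, $x_1$ are its two preimages, $f_0(x_0) = f_1(x_1) = y$. The string $y$ is the prover’s *commitment string*, which it reports to the verifier.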

The intuition for this commitment procedure is that it introduces asymmetry between prover and verifier: the prover knows $y$ (it had to report it to the verifier) but not $x_0$ and $x_1$ (this is the claw-free assumption on the pair $(f_0,f_1)$), which seems to prevent it from recovering the original state $|\psi\rangle$, since it does not have the ability to “uncompute” $x_0$ and $x_1$. In contrast, the verifier can use the trapdoor information to recover both preimages.
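The honest prover’s commitment can be simulated classically for a toy pair of permutations. The sketch below is illustrative only: the pair `f0`, `f1` (the identity and an XOR with a fixed string) is an assumption and is emphatically *not* claw-free, since anyone can find claws; it only reproduces the structure of the states involved. All names are hypothetical.

```python
import math
import random

N_BITS = 3
SECRET = 0b101  # stands in for the trapdoor; in this toy pair it is not hidden

def f0(x):
    return x

def f1(x):
    return x ^ SECRET

def commit(alpha0, alpha1, rng):
    """Simulate the honest commitment on alpha0|0> + alpha1|1>.
    Returns (y, state), where state maps (b, x) to an amplitude."""
    norm = 1 / math.sqrt(2 ** N_BITS)
    # Superposition over x of alpha0|0,x,f0(x)> + alpha1|1,x,f1(x)>
    joint = {}
    for x in range(2 ** N_BITS):
        joint[(0, x, f0(x))] = alpha0 * norm
        joint[(1, x, f1(x))] = alpha1 * norm
    # Measure the image register: sample y according to the Born rule
    probs = {}
    for (b, x, y), amp in joint.items():
        probs[y] = probs.get(y, 0.0) + abs(amp) ** 2
    images = sorted(probs)
    y = rng.choices(images, weights=[probs[v] for v in images])[0]
    # Keep the branches consistent with y and renormalize
    scale = 1 / math.sqrt(probs[y])
    state = {(b, x): amp * scale for (b, x, img), amp in joint.items() if img == y}
    return y, state

def invert_with_trapdoor(y):
    """Verifier side: recover both preimages of y."""
    return y, y ^ SECRET

rng = random.Random(0)
y, state = commit(0.6, 0.8, rng)
x0, x1 = invert_with_trapdoor(y)
```

After the measurement the simulated state is supported on exactly two basis strings, $(0, x_0)$ and $(1, x_1)$, carrying the original amplitudes. With a genuinely claw-free pair the prover could not compute both strings from $y$, while the verifier still could.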

In a little more detail, how is this used? Note that at this point, from the verifier’s point of view the only information that has been received is the prover’s commitment string $y$. In general there are multiple ways a prover could have come up with a value $y$: for example, by selecting an $x$ and returning $y = f_0(x)$. Or, by directly selecting an arbitrary string $y$. At this stage of the protocol, any of these strategies looks fine.

Let’s modify the commitment phase by adding a little test. With some probability, the verifier, upon receiving the commitment string $y$, decides to challenge the prover by asking it to report a valid preimage of $y$, either under $f_0$ or under $f_1$ (at the prover’s choice). Since both $f_0$ and $f_1$ are presumed to be hard to invert, the only way the prover can answer this challenge is if it already “knows” a valid preimage, or at a minimum, if it has a superposition on preimages that it can measure when tested. Thus the fact that the prover is required to succeed in the commitment test, when it is performed, guarantees that after the prover has returned the commitment string $y$ we may without loss of generality assume that the prover’s state can be written as

$$|\tilde\psi\rangle = \tilde\alpha_0|0\rangle|x_0\rangle|\phi_0\rangle + \tilde\alpha_1|1\rangle|x_1\rangle|\phi_1\rangle, \qquad (1)$$

where we have purposefully spelled out the two possible preimages that the prover could return if challenged. Note that aside from the fact that it gives the ability to obtain $x_0$ or $x_1$, this format does not make any assumption on $|\tilde\psi\rangle$; in particular the register containing the preimage can be entangled with other private registers of the prover.

We have defined a four-message commitment protocol: the verifier sends the security parameters to the prover; the prover sends a commitment string $y$ back; an optional one-round preimage test is executed. Now is the time to give a first definition for the single qubit to which the prover has “committed” by returning $y$. This *committed qubit* is the state that we ultimately aim to show has the claimed expectation values under $X$ and $Z$ measurements.

Let $\sigma$ be the qubit obtained from the prover’s state $|\tilde\psi\rangle$ in (1) by erasing $x_0$ and $x_1$ (which is possible given knowledge of $y$ and the trapdoor information) and returning the first qubit of the resulting state. (Later we will slightly modify this definition, but it is a good placeholder to get us started.) Note that the verifier does not know the state $\sigma$; in fact, strictly speaking $\sigma$ is not present on the prover’s workspace either. The point is that $\sigma$ exists, and this is all we need. Our remaining task is to find a way for the verifier to extract from the prover measurement outcomes that are distributed as would be a measurement of $\sigma$ in the $X$ or $Z$ basis, without the prover having the ability to deviate. If the verifier can do this, for a basis of her choice, she can choose a basis at random, estimate the expectation value, and check it against the value claimed by the prover for that basis.

As already mentioned, the key point that we’ll use in order to achieve this is that at the end of the commitment phase, the verifier has obtained some leverage over the prover: given $y$ and the trapdoor information, the verifier can recover both $x_0$ and $x_1$. In contrast, the prover, while it holds the state $|\tilde\psi\rangle$, is not able to freely operate on it. Without the trapdoor, it can no longer uncompute $x_0$ and $x_1$ to recover the initial state $|\psi\rangle$, and so it can’t obviously apply, say, the unitary on $|\tilde\psi\rangle$ that would amount to performing a single-qubit rotation on $|\psi\rangle$.

**Measuring in the computational basis**

We need to explain how the verifier extracts measurement outcomes in the X (Hadamard) or Z (computational) basis from the prover. For each basis there is a small sub-protocol. At the end of the sub-protocol the verifier records a single bit, which she takes to be the outcome obtained by a measurement of the committed qubit, $\sigma$, in the corresponding basis. We call this bit the verifier’s “decoded bit” for that basis.

The protocol for extracting the outcome of a measurement in the computational basis is straightforward. Recall that by definition the prover’s state after the commitment phase has ended is the state $|\tilde\psi\rangle$ in (1). Moreover, recall that we made a choice of basis for the prover’s space such that when the prover is challenged for a preimage of $y$, it measures the first $n+1$ qubits of $|\tilde\psi\rangle$ (the committed bit followed by the $n$-bit preimage) in the computational basis and returns the outcome. Now observe that the first bit of this outcome is $0$ with probability $|\tilde\alpha_0|^2$, and $1$ with probability $|\tilde\alpha_1|^2$. This is exactly the distribution of the outcome of a measurement of the committed qubit in the computational basis, by definition! Thus to extract a measurement outcome in the computational basis the verifier simply executes the preimage test and records the first bit returned by the prover as the decoded bit.

**Measuring in the Hadamard basis**

Extracting a measurement outcome in the Hadamard basis is more delicate. Recall the form of the prover’s state in (1). Given our definition of the committed qubit $\sigma$, the natural way to obtain a measurement of $\sigma$ in the Hadamard basis, starting from $|\tilde\psi\rangle$, is to first erase the register containing $x_0$ and $x_1$, and then perform a Hadamard measurement of the first qubit. But even an honest prover cannot accomplish this, as it does not have the trapdoor information that would allow it to erase $x_0$ and $x_1$ (of course we purposefully set things up this way). What the prover *can* do, however, is measure all qubits of the register containing $x_0$ and $x_1$ in the Hadamard basis. The result of this measurement is an $n$-bit string $d$, where $n$ is the length of the preimages. The corresponding post-measurement state is, up to global phase,

$$\tilde\alpha_0|0\rangle|\phi_0\rangle + (-1)^{d\cdot(x_0+x_1)}\,\tilde\alpha_1|1\rangle|\phi_1\rangle,$$

where the addition is taken bitwise, modulo $2$. Observe that this state is almost exactly the committed qubit, except for a phase flip, $Z^{d\cdot(x_0+x_1)}$, applied on the first qubit. If the prover measures the remaining qubit in the Hadamard basis, obtaining an outcome $u$, the phase flip leads to a bit flip on the outcome of the measurement. So the verifier can ask the prover to report both $d$ and $u$; if she records the decoded bit $b = u \oplus d\cdot(x_0+x_1)$ then this bit matches the outcome of a measurement of $\sigma$ in the Hadamard basis.
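The decoding rule can be checked numerically in the honest case. The sketch below assumes an honest prover holding $\alpha_0|0\rangle|x_0\rangle + \alpha_1|1\rangle|x_1\rangle$ with no extra registers, enumerates every Hadamard-basis outcome $(u,d)$, and accumulates the probability of each decoded bit $u \oplus d\cdot(x_0+x_1)$. The function names, amplitudes and preimages are arbitrary illustrative choices.

```python
import itertools

def dot(a, b):
    """Inner product modulo 2 of two bit tuples."""
    return sum(x & y for x, y in zip(a, b)) % 2

def hadamard_outcome_probs(state, width):
    """Outcome distribution when every qubit is measured in the Hadamard basis.
    `state` maps bit tuples of length `width` to amplitudes."""
    norm = 2 ** (-width / 2)
    probs = {}
    for z in itertools.product([0, 1], repeat=width):
        amp = sum(a * (-1) ** dot(z, w) for w, a in state.items()) * norm
        probs[z] = abs(amp) ** 2
    return probs

def decoded_bit_distribution(alpha0, alpha1, x0, x1):
    """Distribution of the verifier's decoded bit u XOR d.(x0 + x1)."""
    state = {(0,) + x0: alpha0, (1,) + x1: alpha1}
    xor = tuple(a ^ b for a, b in zip(x0, x1))
    dist = [0.0, 0.0]
    for z, p in hadamard_outcome_probs(state, 1 + len(x0)).items():
        u, d = z[0], z[1:]
        dist[u ^ dot(d, xor)] += p
    return dist

alpha0, alpha1 = 0.6, 0.8
dist = decoded_bit_distribution(alpha0, alpha1, (0, 1, 1), (1, 1, 0))
# Direct Hadamard measurement of alpha0|0> + alpha1|1>, for comparison
expected = [abs(alpha0 + alpha1) ** 2 / 2, abs(alpha0 - alpha1) ** 2 / 2]
```

The two distributions coincide: the correction $d\cdot(x_0+x_1)$, which only the verifier can compute, exactly undoes the phase flip introduced by measuring the preimage register.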

This completes the description of the measurement sub-protocol for the Hadamard basis. It is clear that an honest prover, performing the actions described above, will lead the verifier to record the correct outcome. Now of course in general the prover may act in an arbitrary way! It could report any values for $(d,u)$: the verifier accepts any outcomes on faith. How could this possibly work out? There is magic in Mahadev’s proof.

**Malicious provers**

Let’s assume, as we already have, that the prover is arbitrary but that, if tested in the commitment phase, it succeeds with certainty. According to the discussion around (1) this implies that at the end of the commitment phase the prover holds a state of the form $|\tilde\psi\rangle$. Moreover, by definition, when asked for a computational basis measurement the prover measures the first $n+1$ qubits of $|\tilde\psi\rangle$ in the computational basis and reports the outcome; the verifier records the first bit as its decoded bit.

As we already argued, our earlier definition of the committed qubit $\sigma$ ensures that the verifier’s decoded bit for the case of a computational basis measurement matches the outcome of a measurement of $\sigma$ in the computational basis. Unfortunately for the case of a Hadamard basis measurement we are in trouble. Since the prover may in principle report an arbitrary pair $(d,u)$ there is no chance to argue that this matches (in distribution) the outcome of a measurement of $\sigma$ in the Hadamard basis. To find a state that is consistent with the verifier’s decoded bit in both bases we need to change our definition of the committed qubit to take into account the prover’s action in the case it is asked for a Hadamard measurement.

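Recall that the main leverage that the verifier has over the prover is that, while the prover does have the possibility of reporting arbitrary outcomes $(d,u)$, it *does not* have control over the verifier’s decoding, i.e. the operation $(d,u) \mapsto u \oplus d\cdot(x_0+x_1)$. Let’s work a little bit and spell out the distribution of the verifier’s Hadamard basis decoded bit, $b = u \oplus d\cdot(x_0+x_1)$. Towards this it is convenient to think of the prover in the following way: the prover first applies an arbitrary unitary “attack” $U$ on $|\tilde\psi\rangle$, then “honestly” measures the first $n+1$ qubits in the Hadamard basis, and finally reports the $(n+1)$-bit outcome $(u,d)$. An arbitrary $(n+1)$-bit-outcome measurement can always be expressed in this way. With this setup we can write the probability that the decoded bit is some value $b$ as

$$\Pr(b) = \sum_{d\in\{0,1\}^n} \langle\tilde\psi|\,U^\dagger H^{\otimes(n+1)}\big(|b\oplus d\cdot(x_0+x_1)\rangle\langle b\oplus d\cdot(x_0+x_1)| \otimes |d\rangle\langle d|\big) H^{\otimes(n+1)} U\,|\tilde\psi\rangle, \qquad (2)$$

where $H^{\otimes(n+1)}$ denotes the Hadamard transform on the first $n+1$ qubits (and the identity on all remaining registers).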

Before we can proceed we should say a little more about the computational assumptions that are placed on the pair of functions $(f_0,f_1)$. Earlier we mentioned this pair of functions should be claw-free, but in fact a little more is needed, though all requirements can ultimately be met by a construction based on the Learning With Errors problem. Rather than state the exact assumptions, I will mention two important consequences. The first is that the pair of functions is “collapsing”, a notion introduced by Unruh in his investigations of collision-resistance against quantum attacks. In our context this property implies that it is computationally hard to distinguish between an arbitrary superposition over preimages, as in $|\tilde\psi\rangle$, and the “collapsed” state obtained by measuring the control register (the first qubit). The second is that for any $n$-bit string $d$ that can be obtained as the outcome of an arbitrary, but computationally efficient, measurement on the collapsed state, the bit $d\cdot(x_0+x_1)$ is computationally indistinguishable from uniform. (This is analogous to a “hardcore bit” property, since $d\cdot(x_0+x_1)$ encodes information about both preimages simultaneously, and such information should not be accessible if the pair $(f_0,f_1)$ is claw-free.)

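These two assumptions taken together justify the following two modifications to the expression for $\Pr(b)$ in (2), that will lead to a computationally indistinguishable distribution. First, we can “collapse” the first $n+1$ qubits of $|\tilde\psi\rangle$ by measuring them in the computational basis. Second, we can replace the bit $d\cdot(x_0+x_1)$ by a uniformly random bit $r$. Using that XOR-ing the outcome with $r$ amounts to applying $X^r$ before the projection, and that $HX^r = Z^rH$, the expression simplifies to

$$\Pr(b) \approx_c \frac{1}{4}\sum_{r,s\in\{0,1\}} \langle\tilde\psi|\,Z^s U^\dagger Z^r H^{\otimes(n+1)}\big(|b\rangle\langle b| \otimes I\big) H^{\otimes(n+1)} Z^r U Z^s\,|\tilde\psi\rangle, \qquad (3)$$

where $\approx_c$ denotes computational indistinguishability and all $Z$ operators act on the first qubit; the outermost $Z^s$ were inserted thanks to the first assumption (the collapsing property), and the innermost $Z^r$ come from commuting $X^r$ past the Hadamard. I should clarify that obtaining (3) formally requires more care. In particular, I made use of computational indistinguishability in an expression that involves a quantity that is hard to compute (the parity $d\cdot(x_0+x_1)$). This is illegal, and to work around the difficulty Mahadev has to introduce some additional ingenious manipulations that I am skipping here.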

Note the key effect that the random $Z^r$ operator has in (3): it effectively trivializes the action of the prover’s “attack” on the first qubit *with respect to the computational basis*. Thus the result of this argument is that we have managed to argue that the verifier’s decoded bit associated with the Hadamard basis is *computationally indistinguishable* from the outcome of a Hadamard measurement on the state

where we expanded the first qubit of the unitary as , and represents all registers except the first qubit. Note that the second term involves an $X$ on the first qubit, which has no effect on a measurement in the Hadamard basis. Thus, the committed qubit $\sigma$ can be updated to a state $\sigma'$ where we have “erased” the $X$ operator on the first qubit. Moreover, by definition, a measurement of the first (and only) qubit of $\sigma'$ in the computational basis yields an outcome distributed exactly as it would on $\sigma$. In particular, it is consistent with the verifier’s decoded bit in the computational basis measurement protocol.
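Two elementary single-qubit facts underlie this part of the argument, and both are easy to verify numerically: averaging over a random $Z$ conjugation removes the off-diagonal entries of a density matrix while leaving computational-basis statistics unchanged, and conjugating by $X$ leaves Hadamard-basis statistics unchanged. The sketch below checks both on an arbitrary example state; the helper names and the state are illustrative choices, not part of the protocol.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def conjugate_by(u, rho):
    """Return u * rho * u^dagger."""
    u_dag = [[u[j][i].conjugate() for j in range(2)] for i in range(2)]
    return matmul(matmul(u, rho), u_dag)

X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

def z_twirl(rho):
    """Average of Z^r rho Z^r over a uniformly random bit r."""
    flipped = conjugate_by(Z, rho)
    return [[(rho[i][j] + flipped[i][j]) / 2 for j in range(2)] for i in range(2)]

def computational_probs(rho):
    return [rho[0][0].real, rho[1][1].real]

def hadamard_probs(rho):
    """Probabilities of the |+> and |-> outcomes."""
    total = rho[0][0] + rho[1][1]
    cross = rho[0][1] + rho[1][0]
    return [((total + cross) / 2).real, ((total - cross) / 2).real]

def density(alpha0, alpha1):
    amps = [alpha0, alpha1]
    return [[amps[i] * amps[j].conjugate() for j in range(2)] for i in range(2)]

rho = density(0.6, 0.8)
collapsed = z_twirl(rho)
x_conjugated = conjugate_by(X, rho)
```

Here `collapsed` is diagonal with the same diagonal as `rho`, and `x_conjugated` has the same Hadamard-basis distribution as `rho`, even though its computational-basis distribution is swapped.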

We are done! The state $\sigma'$ is a well-defined single-qubit state such that the distribution of decoded bits recorded by the verifier for either basis is computationally indistinguishable from the distribution of outcomes of a measurement of $\sigma'$ in the same basis. Note that $\sigma'$ may not “exist” at any point of the protocol. But this is beside the point: as long as $\sigma'$ is a well-defined quantum state, and the verifier correctly records decoded measurement outcomes, this eventually leads to a valid certificate for the prover’s claim that the XZ Hamiltonian that encodes the computation has low enough energy.

Phew. Catch your breath, read this post again (and please do ask for clarifications as needed), and then move on to the beautiful paper, whose introduction already has more depth than I could provide here, and whose body fills in all the remaining gaps. (This includes how to deal with states that are more than a single qubit, an issue that my presentation of the single-qubit case may make seem more thorny than it is — in fact, it is possible to express the argument given here in a way that makes it relatively straightforward to extend to multiple qubits, though there are some technical issues, explained in Mahadev’s paper.) And then – use the idea to prove something!

The past two years, and possibly even more so the coming couple years, may well be remembered as the moment when quantum computing entered the mainstream. Most of us have heard of IBM’s quantum computer in the cloud, of Google’s effort in , and of Microsoft’s naturally fault-tolerant \href. Some of us might also have encountered a few of the dozens of startups promising everything from quantum hardware to quantum learning, that seem to be appearing out of nowhere, raising capital in just a few months.

It is an interesting question, better left for wiser times, whether these events will be remembered as the initial sparks of a revolution in computing, or as the height of a “quantum bubble”. Bubble or no bubble, quantum information science is here to stay: while current developments make topics such as the computational power of small-scale quantum computers, or the possibilities for testing quantum mechanics, all the more exciting, other areas (quantum cryptography, the theory of quantum error-correction, the ever-increasing applications of “quantum techniques” to problems from theoretical computer science) do not hinge on the success of current experiments.

By way of a teaser, our plan for the school is roughly as follows. Each day will have about 6 hours of lecture, a couple hours of informal “TA sessions” (to learn a language, one needs to practice it!), and some time for social interaction. This is a fairly heavy schedule, but if these are the 3.5 days you are going to spend learning about quantum information in your career, we want them to be useful. What this means is that we’ll simultaneously aim to cover the basics, so as to establish a common language, while quickly zooming in to a selection of the most interesting questions, such as the power of alternate models of quantum computation, the theory of quantum error-correction and fault-tolerance, or problems in quantum testing and quantum delegation.

In a little more detail, and although you should not treat this as contractual information, here is a sketch of our program for the school:

**Day 1:** Introduction to quantum information: one qubit, $n$ qubits, the quantum circuit model, simple algorithms and computational speed-ups in the query model. Introduction to quantum complexity: the class QMA and the local Hamiltonian problem.

**Day 2:** Protocols for delegating quantum computations. The adiabatic model of computation and its equivalence to the circuit model. Quantum error-correction, stabilizer codes, and fault tolerance.

**Day 3:** Restricted models of computation (shallow circuits, commuting circuits). Testing quantum systems. The quantum PCP conjecture and connection to many-body entanglement. Multi-prover interactive proofs with provers sharing entanglement. Quantum linearity testing.

**Day 4:** More restricted models of computation. Quantum optimization algorithms. Stoquastic Hamiltonians. Quantum Monte-Carlo and simulation.

If you’re not an expert in quantum information, a lot of these topics might not make much sense a priori. This is why you should come! Our goal in these 3.5 days is to summarize what we believe ought to be the highlights of a couple semesters’ worth of graduate courses in quantum information. Aside from the basics in the first day, each lecture will cover a topic of current interest, giving you the ability to understand the importance of recent progress, and start thinking about some of the more TCS-friendly problems. Towards this, we’ll highlight as many open problems as we can think of (and fit in the allotted time), and allow ample time for questions, discussions, and hands-on exercise sessions. Join us: register here!

It is easiest to organize the discussion in chronological order: I will go through all the steps (and missteps), from the initial invitation email to the final notification to authors (probably safer to stop there; the next step would drag me into a discussion of the authors’ reaction, the consequences of which even the use of aliases may not save me from).

You just received that glowing email: “Dear XX, would you be interested to serve as PC chair for ConfY’18?”. Followed by the obligatory series of flattering comments. Such an honor… who would refuse? But don’t jump on that reply-send button just yet. Here are a few points to take into consideration before making the decision.

First off, carefully consider the reviewing schedule. The dates of the conference are likely decided already, giving you a good sense of when the submission and notification deadlines will fall. The period in-between represents two to four months of your working life. Are you ready to give them up? I estimate that most days within that period you will have to allocate one to two hours’ work to the PC reviewing process (the load is not evenly spread: during the reviewing phase, it depends how many papers you assign yourself to review; during the discussion phase, it depends on how active you are, whether there is an in-person meeting, etc.). This is a serious commitment, comparable in load to taking on an additional 12-week teaching job. So if you’re already planning on teaching two courses during the same period – think twice.

A second point to consider discussing upfront with the steering committee (SC) is your “mission”. The SC probably has its own idea of the scope of the conference (there might even be a charter), how many papers they would like to be accepted, what justifies a “ConfY-worthy paper”, etc. How rigid are they going to be regarding these points? How much interference can you expect — do you have full latitude in deciding final acceptances (should be)? How flexible is the final number of accepts?

Last but not least, make sure this is something you *want* to do. How good is ConfY? Does it serve a specific purpose that you value? How often have you attended, or served on the PC? Do you feel competent to make decisions across all areas covered by the conference? Check the past couple years’ accepts. Many conferences are broader than we think, just because when we attend we tend to unconsciously apply a selective bias towards those talks for which we can at least parse the title. This time you’ll have to understand the contents of every single one of the submitted (let alone accepted) papers. So again, is this something you *want* to do?

**Selecting the PC.** Now that the fatal decision has been made, my first piece of advice is all too simple: *seek advice*. Your first task is to form a PC. This is clearly the most influential decision you will make, both in terms of the quality and content of the final program, as well as the ease with which you and your “team” will get there. Choosing a PC is a delicate balancing act. A good mix of seniority and young blood is needed: seniority for the experience, the perspective, and the credibility; young blood for the energy, the taste for novelty, the muscle power. It is a good idea to involve a PC member from the previous installment of the conference; this may in particular help with the more difficult cases of resubmission.

I was fortunate to receive multiple recommendations from the SC, past conference chairs, and colleagues. While you obviously want to favor diversity and broad representation of topic areas, I also recommend selecting PC members with whom one has a personal connection. My experience has been that the amount of effort any one person is willing to put into the PC process varies hugely. It is inevitable that some PC members will eventually drift away. The more connection you have to them the easier it will be to handle unresponsiveness or divergences of opinion.

The more important comment I will make, one which I wish I had been more keenly aware of, is to *know your PC*. You will eventually select a team of researchers with complementary qualities, not only in terms of the areas that they are familiar with but also in more human terms: some will be good at responding to “quick opinion” calls on difficult papers, while others will have insightful comments about the overall balance of papers in the emerging list of accepted papers, or generate thoughtful advice on the handling of the more tricky cases, etc. At multiple points in the process you will need help; it is crucial to know the right person to turn to, lest you waste precious days or make ill-informed decisions.

With a list of names in hand, you are ready to send out invitations. (Before doing so, consider forming a rough schedule for the reviewing process. This will be needed for PC members to decide whether they will be sufficiently available during the requisite periods.) In my experience this part went smoothly. About of those on my initial list accepted the invitation (thanks!!). Filling in the remaining slots took a little more time. A small tip: if a researcher does not respond to your invitation within a reasonable delay, or is slow to decide whether to join or not, don’t push too hard: while you need a strong PC, you also need a responsive PC. It is not a good idea to start off in a situation where someone is “doing you a favor” by accepting the invitation as a result of some heavy arm-twisting.

**Drafting a CFP.** The second main item on your pre-submission agenda is the drafting of a call for papers (CFP). This may be done in cooperation with the SC. CFP from previous years can serve as a starting point. Check with last year’s PC chair if they were happy with the wording used, or if they have a posteriori recommendations: did they omit an important area of interest? Were the submission instructions, including formatting guidelines, clear?

A good CFP balances out two conflicting desiderata: first, it should make your life easier by ensuring that submissions follow a reasonably homogeneous format, and are presented in a way that facilitates the reviewing process; second, it should not place an unreasonable burden on the authors who, as we all know, have better things to do (and will read the instructions, if they ever read them, no earlier than 23:59 in any timezone – making an overly precise CFP a sure recipe for disaster).

One place where precision *is* needed is in the formulation of the requirements for rigor and completeness. Are full proofs expected, or will a short 3-page abstract suffice? Or should it be both – a short abstract clearly presenting the main ideas, together with a full paper providing precise complete proofs? Be warned that, whatever the guidelines, they will be stretched, forcing you into quick judgment calls as to whether a submission fits the CFP guidelines.

You should also pay attention to the part of the CFP that concerns the scope of the conference: although for all I know this is all but ignored by most authors, and varies little from year to year, it does play an important role in carving out an inch of originality and specificity for the conference.

Another item on the CFP is the “key dates” that will bound the time available for the reviewing process: the submission deadline and the notification date. Here again there are conflicting requirements: the submission date should be as late as possible (to ensure accepted papers are as fresh as possible by the time the conference takes place), the reviewing phase as long as possible (you’re going to need it…), and the notification as early as possible (so there is time to compile proceedings, when they exist, and for authors to make travel arrangements). In my experience as PC member the time allocated for reviewing almost invariably felt too long – yes, I did write *too long*. However much time is allocated for the reviewing phase invariably ends up divided into procrastination and actual reviewing effort (obviously the actual reviewing gets under way too late for it to be completed by the reviewing deadline, which typically gets overstretched by some ). I suggest that a good calendar should allocate a month for collecting reviews, and a month for discussion. This is tight but sufficient, and will ensure that everyone remains engaged throughout. A month for reviewing allows a week for going through papers and identifying those for which external help should be sought; 2-3 weeks for actual reviewing; and a week for collecting reviews, putting the scores together, and scrambling through the last-minute calls for help. Similarly, a month of discussion would allow a week for score homogenization, two weeks to narrow down on the (say) borderline papers, and a final week to make those tough ultimate decisions. Tight, but feasible. Remember: however much time you allocate, will be taken up!

Now, as good a calendar as you may have come up with, *plan for delays*. In my case I typically informed PC members that “reviews have to be completed by April 29th” and “the discussion phase will start on May 1st”. The “hidden” three days in-between the two dates were more than needed to track down missing reviews. Don’t ask PC members to complete a task the day you need the task completed, as it simply won’t happen: people have busy schedules, operate in different (and sometimes shifting) timezones, and have other deadlines to deal with. To respect your PC you ought to give them a precise calendar that you will follow, so they are able to plan ahead; but you also need to allow for the unavoidable time conflicts, last-minute no-shows, and other unpredictable events.

One last item before you break off. To set up the submissions webpage you’ll need to decide on a reviewing management software. I (quite mistakenly) didn’t give much thought to this. As PC member I had had a decent experience with easychair, and was under the impression that it was the most commonly used software – and would therefore be easiest to work with for the PC. Even though things went, on the whole, fairly smoothly, I had more than one occasion to regret the decision. The topic would deserve a blog post in itself, and I won’t expand here. Just make sure you carefully consider how easy the software will make different parts of the reviewing process, such as computing statistics, tracking missing reviews, ordering papers based on various criteria, allowing an efficient tagging system to keep track of memos or tentative decisions, handling communication with authors (including possibly the compilation of proceedings), etc.

Alright, so you went for a stroll and enjoyed your most leisurely conference submission deadline ever – as PC chair, you’re probably not allowed to submit – but the bell has rung, the submission server closed…now it’s your turn!

**The last few hours.** Actually, maybe this wasn’t quite your most leisurely submission deadline after all. I was advised to elect a “midnight anywhere on earth” deadline, as this supposedly made the guideline easier to comprehend for everyone. Not only do I now have strong evidence that I am not the only one to find this denomination absurdly confusing – where on earth is this place anyways, anywhere on earth?? – I would in any case strongly suggest setting a deadline that falls at a reasonable time in the PC chair’s timezone. You *will* get emails from people unable to access the submission server (for whatever reason), unsure whether their submission fits the guidelines, asking whether they can get an extension, etc. It is more helpful if you can deal with such emails as they arrive, rather than the next day.

**Paper bidding.** Before reviewing can get under way you need to assign papers to PC members. And before you can assign papers, PC members need to express their preferences. The resulting allocation is critical. It determines how many headaches you will face later on: how many papers will have low confidence reviews, how many closely related papers will have been reviewed by disjoint sets of PC members, how many papers will live on the enthusiastic scores of expert *sub*reviewers. I found this phase challenging. An automatic assignment can be completed in milliseconds, but doesn’t take into account related submissions or expertise of PC members aside from their declared preference, which is a single noisy bit of information. I highly recommend (I realize I am “highly recommending” a lot of things for a first-timer – I only wish I had been told some of these ahead of time!) taking the process very seriously, and spending enough time to review, and tweak, the automatic assignment before it is made final.

**Refereeing.** Each PC member now has a healthy batch of papers assigned, and a deadline by which to submit reviews. What kind of guidelines can you give to make the process as smooth as possible? Discrepancies in scores are always an issue: whichever reviewing software you use, it is bound to produce some kind of score-based ranking; this initial ranking, although it will change during the discussion phase, induces a huge bias in final decisions (this effect is exacerbated for conferences, such as QCRYPT, where there is no in-person meeting). I don’t have a magic solution to this, but establishing clear guidelines in terms of the significance and expected proportion for each numerical score helps. I eventually found it necessary to prod outliers to modify their scores. This is one of the things easychair did not make particularly easy, forcing me to download data in Excel format and run some basic home-made scripts on the spreadsheet.
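For what it is worth, the kind of home-made script I have in mind is tiny. Here is a minimal sketch in Python, assuming the reviews have been exported as (paper, reviewer, score) rows. The data, the score scale, the function names and the threshold are all made up for illustration, and none of this is a feature of any reviewing software.

```python
from statistics import mean

# Hypothetical export: one (paper_id, reviewer, score) row per review.
reviews = [
    ("P1", "alice", 3), ("P1", "bob", -2), ("P1", "carol", 2),
    ("P2", "alice", 1), ("P2", "bob", 0), ("P2", "dave", 1),
    ("P3", "bob", -3), ("P3", "carol", 2), ("P3", "dave", 3),
]

def scores_by_paper(rows):
    """Group the scores of all reviews by paper."""
    grouped = {}
    for paper, _, score in rows:
        grouped.setdefault(paper, []).append(score)
    return grouped

def flag_discrepancies(rows, threshold=4):
    """Papers whose highest and lowest scores differ by at least `threshold`."""
    grouped = scores_by_paper(rows)
    return sorted(p for p, s in grouped.items() if max(s) - min(s) >= threshold)

def reviewer_bias(rows):
    """Average, per reviewer, of (own score - mean score of the paper)."""
    paper_mean = {p: mean(s) for p, s in scores_by_paper(rows).items()}
    offsets = {}
    for paper, reviewer, score in rows:
        offsets.setdefault(reviewer, []).append(score - paper_mean[paper])
    return {r: mean(o) for r, o in offsets.items()}

if __name__ == "__main__":
    print("papers to look at again:", flag_discrepancies(reviews))
    for reviewer, bias in sorted(reviewer_bias(reviews).items(), key=lambda kv: kv[1]):
        print(f"{reviewer}: {bias:+.2f}")
```

A reviewer whose bias is far from zero is not necessarily wrong, of course; the point is only to know whom to prod, and about which papers.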

Aside from scoring, it is useful to include precise guidelines on the use of sub-referees and conflicts of interest (COIs). I allowed sub-refereeing but insisted that the final opinion should be the PC member’s. (It is not ok to copy-paste a sub-review while barely having gone through it!) Unfortunately sub-reviewers tend to be experts, and experts tend to be overly enthusiastic: watch out for papers that received three high scores, each with high confidence: easychair will rank those right at the top, but they may well be worth a second look.

Regarding COIs, I did not set overly strict rules (with the idea that “everyone knows when it is appropriate to declare a COI”), and regretted it. It is simply too uncomfortable to realize at a late stage that this very enthusiastic review was written by a PC member who happens to be a close collaborator of one of the authors, but chose not to disclose the COI. Do you discard the review? I don’t know. It depends: maybe the source of the COI played a role in the PC member’s vocal defense of the paper, and maybe not. Better not let it happen. It is not necessarily that even weak COI should forbid reviewing, but rather that COIs should be made explicit. As long as everyone states their position, things are in the open and can be taken into account.

**Discussion.** With all the reviews in (dream on… some reasonable fraction of the reviews in) begins the second phase of the reviewing process, the discussion phase. Success of this phase rests almost entirely on engagement of the PC chair and a few dedicated, dynamic PC members. Among PCs I have sat on the most satisfying were ones where the chair visibly spent large amounts of energy in the stimulation of online discussion. This is no trivial task: we all lead busy lives, and it is easy to let things slip; papers with high scores get in, low scores get out; a few days to discuss the few in the middle and we’ll be done…not so! Unfortunately, the initial ranking is bound to be *abysmal*. It is necessary to work to straighten things up. Some basic tricks apply: search for papers with high discrepancies in scores, low confidence, missing, very short, or uninformative reviews, etc. It is useful to individually prod PC members to keep the discussion going. This is a place where the “know your PC” recommendation comes in: for each submission, you need to be able to identify who will be able to clarify the arguments in favor and against the paper; who will have the technical expertise to clarify the relationship between papers X and Y, etc. It’s an exhausting, but highly rewarding process: I learned a lot by listening to my colleagues and trying to grasp at times rather subtle – and opinionated – arguments that could reach quite far from my expertise.

**Decisions!** The discussion has been going on for a couple of weeks, and you have little time left: it is time to start making decisions. Proceeding in phases seems popular, and effective. It helps to progressively sharpen the acceptance threshold. As long as there are too many papers in play it is very hard to get a sense of where the boundary will lie; typically, far more papers will have positive scores and enthusiastic proponents than can ultimately be accepted.

However much ahead of time you get started, the real decisions will take place in the last few days. I found it helpful to set a clear calendar for the process, marking days when decisions would be made, identifying clear categories (accept, accept?, discuss!, etc.), and setting explicit targets for each phase (accept X/reject Y many more papers, etc.), even if I wasn’t always able to meet them. It is also important that the PC as a whole be aware of the target number of papers that is to be accepted. I have frequently been on PCs where the chair told us that “we will accept all great papers”, only to learn later that a hard limit had (of course) been set. Conversely, I’ve also been extremely annoyed at last-minute decisions along the lines of, well, we accepted about as many as we could, but there are 4 undecided cases left, and, well, they’re all really good, so why don’t we just stretch the program a bit and accept all 4 at the last minute. To me this is the PC not doing its job… be prepared to make difficult decisions! Make it clear to the PC (and to yourself) what your goal is. Is it to serve the authors, the conference attendees, the advancement of science – all of the above (good luck)?

This was fun. Exhausting, but fun. Of course not all authors (or PC members) were happy. There will be complaints. And some of them will be justified: there is no perfect allocation. Mistakes happen. We did our best!

Some tasks lie down the road. Put a program together. Help select a best (student) paper. Gather statistics for the business meeting. But first things first: take a deep breath. This was fun.

Last week Anand Natarajan from MIT presented our joint work on “A Quantum Linearity Test for Robustly Verifying Entanglement” at the STOC’17 conference in Montreal. Since we first posted our paper on the quant-ph arXiv, Anand and I discovered that the test and its analysis could be reformulated in a more general framework of tests for group relations, and rounding of approximate group representations to exact group representations. This reformulation is stimulated by a beautiful paper by Gowers and Hatami on “Inverse and stability theorems for approximate representations of finite groups”, which was first pointed out to me by William Slofstra. The purpose of this post is to present the Gowers-Hatami result as a natural extension of the Blum-Luby-Rubinfeld linearity test to the non-abelian setting, with application to entanglement testing. (Of course Gowers and Hatami are well aware of this – though maybe not of the application to entanglement tests!) My hope in doing so is to make our result more accessible, and hopefully draw some of my readers from theoretical computer science into a beautiful area.

I will strive to make the post self-contained and accessible, with no quantum information background required — indeed, most of the content is purely — dare I say elegantly — mathematical. In the interest of being precise (and working out better parameters for our result than appear in our paper) I include essentially full proofs, though I may allow myself to skip a line or two in some of the calculations.

Given the post remains rather equation-heavy, here is a pdf with the same contents; it may be more convenient to read.

I am grateful to Anand, and Oded Regev and John Wright, for helpful comments on a preliminary version of this post.

**1. Linearity testing**

The Blum-Luby-Rubinfeld linearity test provides a means to certify that a function is close to a linear function. The test can be formulated as a two-player game:

**BLR linearity test:**

- (a) The referee selects $a,b\in\mathbb{Z}_2^n$ uniformly at random. He sends the pair $(a,b)$ to one player, and either $a$, $b$, or $a+b$ (chosen uniformly at random) to the other.
- (b) The first player replies with two bits, and the second player with a single bit. The referee accepts if and only if the players’ answers satisfy the natural consistency constraint.
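To make the test concrete, here is a small sketch in Python that computes the exact acceptance probability of a deterministic strategy by enumerating all queries. Strings in $\mathbb{Z}_2^n$ are encoded as integers and addition is bitwise XOR. I am assuming that the "natural consistency constraint" means the following: the second player's bit must equal the first player's corresponding bit when the query is $a$ or $b$, and the XOR of the first player's two bits when the query is $a+b$. I also assume that both players answer using the same function `f`; the helper names are mine.

```python
def blr_acceptance(f, n):
    """Exact acceptance probability in the two-player BLR test when the first
    player answers (f(a), f(b)) and the second player answers f(c)."""
    accepted = 0
    total = 0
    for a in range(2 ** n):
        for b in range(2 ** n):
            first = (f(a), f(b))
            # The three possible queries to the second player, each paired
            # with the bit the referee expects (assumed consistency rule).
            checks = ((a, first[0]), (b, first[1]), (a ^ b, first[0] ^ first[1]))
            for query, expected in checks:
                accepted += int(f(query) == expected)
                total += 1
    return accepted / total

def parity(mask):
    """The linear function x -> <mask, x> mod 2."""
    return lambda x: bin(x & mask).count("1") % 2

def majority(x):
    """Majority of the bits of x (not linear for n = 3)."""
    return 1 if bin(x).count("1") >= 2 else 0

if __name__ == "__main__":
    print("parity  :", blr_acceptance(parity(0b101), 3))
    print("majority:", blr_acceptance(majority, 3))
```

With identical deterministic strategies the first two checks always pass, so only the query $a+b$ can reveal a deviation from linearity: a linear function is accepted with certainty, and the acceptance probability never drops below $2/3$.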

This test, as all others considered here, treats both players symmetrically. This allows us to restrict our attention to the case of players who both apply the same strategy, an assumption I will systematically make from now on.

Blum et al.’s result states that any strategy for the players in the linearity test must provide answers chosen according to a function that is close to linear. In this section I will provide a slight “matrix-valued” extension of the BLR result, that follows almost directly from the usual Fourier-analytic proof but will help clarify the extension to the non-abelian case.

**1.1. Matrix-valued strategies**

The “classical” analysis of the BLR test starts by modeling an arbitrary strategy for the players as a pair of functions $f:\mathbb{Z}_2^n\to\{\pm 1\}$ (for the second player, who receives a single string as query) and $f':\mathbb{Z}_2^n\times\mathbb{Z}_2^n\to\{\pm 1\}\times\{\pm 1\}$ (for the first player, who receives a pair of strings as query). In doing so we are making an assumption: that the players are deterministic. More generally, we should allow “probabilistic strategies”, which can be modeled via “probabilistic functions” $f:\mathbb{Z}_2^n\times\Omega\to\{\pm 1\}$ and $f':\mathbb{Z}_2^n\times\mathbb{Z}_2^n\times\Omega\to\{\pm 1\}\times\{\pm 1\}$ respectively, where $\Omega$ is an arbitrary probability space which plays the role of shared randomness between the players. Note that the usual claim that “probabilistic strategies are irrelevant because they can succeed no better than deterministic strategies” is somewhat moot here: the point is not to investigate success probabilities (it is easy to pass the BLR test with probability $1$) but rather to derive structural consequences from the assumption that a certain strategy passes the test. In this respect, enlarging the kinds of strategies we consider valid can shed new light on the strengths, and weaknesses, of the test.

Thus, and with an eye towards the “quantum” analysis to come, let us consider an even broader set of strategies, which I’ll refer to as “matrix-valued” strategies. A natural matrix-valued analogue of a function $f:\mathbb{Z}_2^n\to\{\pm 1\}$ is $F:\mathbb{Z}_2^n\to O_d(\mathbb{C})$, where $O_d(\mathbb{C})$ is the set of $d\times d$ Hermitian matrices that square to identity (equivalently, have all eigenvalues in $\{\pm 1\}$); these matrices are called “observables” in quantum mechanics. Similarly, we may generalize a function $f'$ to a function $F':\mathbb{Z}_2^n\times\mathbb{Z}_2^n\to O_d(\mathbb{C})\times O_d(\mathbb{C})$. Here we’ll impose an additional requirement: any pair $(A,B)$ in the range of $F'$ should be such that $A$ and $B$ commute. The latter condition is important so that we can make sense of the function as a strategy for the provers: we should be able to ascribe a probability distribution on outcomes to any query sent to the players. This is achieved by defining, for a query $(a,b)$ to the first player and $c\in\{a,b,a+b\}$ to the second, with $(A,B)=F'(a,b)$ and answers $u,v,w\in\{\pm 1\}$,

$$\Pr\big((u,v),\,w\;\big|\;(a,b),\,c\big) \,=\, \frac{1}{d}\,\mathrm{Tr}\big(A^{u}\,B^{v}\,F(c)^{w}\big), \qquad (1)$$

where for any observable $O$ we denote $O^{+1}$ and $O^{-1}$ the projections on the $+1$ and $-1$ eigenspaces of $O$, respectively (so $O=O^{+1}-O^{-1}$ and $O^{+1}+O^{-1}=\mathrm{Id}$). The condition that $A$ and $B$ commute ensures that this expression is always non-negative; moreover it is easy to check that for all queries $(a,b)$ and $c$ it specifies a well-defined probability distribution on $\{\pm 1\}^3$. Observe also that in case $d=1$ we recover the classical deterministic case, for which with our notation $f(a)^{w}=1_{f(a)=w}$. If all $F(a)$ and $F'(a,b)$ are simultaneously diagonal matrices we recover the probabilistic case, with the role of $\Omega$ (the shared randomness) played by the rows of the matrices (hence the normalization by $1/d$; we will see later how to incorporate the use of non-uniform weights).

With these notions in place we establish the following simple lemma, which states the only consequence of the BLR test we will need.

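**Lemma 1.** Let $n$ be an integer, $\varepsilon\geq 0$, and $F$ and $F'$ a matrix strategy for the BLR test, such that players determining their answers according to this strategy (specifically, according to (1)) succeed in the test with probability at least $1-\varepsilon$. Then

$$\mathbb{E}_{a,b\in\mathbb{Z}_2^n}\;\frac{1}{d}\,\Re\,\mathrm{Tr}\big(F(a)F(b)F(a+b)\big)\,\geq\,1-O(\varepsilon). \qquad (2)$$

Introducing a normalized inner product $\langle A,B\rangle=\frac{1}{d}\mathrm{Tr}(AB^*)$ on the space of $d\times d$ matrices with complex entries (the $*$ designates the conjugate-transpose), the conclusion of the lemma is that $\mathbb{E}_{a,b}\,\Re\langle F(a+b),F(a)F(b)\rangle\geq 1-O(\varepsilon)$.

*Proof:* Write $(A,B)=F'(a,b)$. Each of the three possible queries to the second player is chosen with probability $1/3$, and an inner product such as $\langle A,F(a)\rangle$ equals the probability that the corresponding answers agree minus the probability that they disagree. Success with probability $1-\varepsilon$ in the test thus implies the three conditions

$$\mathbb{E}_{a,b}\,\langle A,F(a)\rangle\geq 1-O(\varepsilon),\qquad \mathbb{E}_{a,b}\,\langle B,F(b)\rangle\geq 1-O(\varepsilon),\qquad \mathbb{E}_{a,b}\,\langle AB,F(a+b)\rangle\geq 1-O(\varepsilon).$$

To conclude, use the triangle inequality as

$$\big\|F(a)F(b)-F(a+b)\big\|_f^2\,\leq\,3\Big(\big\|(F(a)-A)F(b)\big\|_f^2+\big\|A(F(b)-B)\big\|_f^2+\big\|AB-F(a+b)\big\|_f^2\Big),$$

where $\|A\|_f^2=\langle A,A\rangle$ denotes the dimension-normalized Frobenius norm. Expanding each squared norm and using the preceding conditions and $F(a)^2=\mathrm{Id}$ for all $a$ proves the lemma.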

**1.2. The BLR theorem for matrix-valued strategies**

Before stating a BLR theorem for matrix-valued strategies we need to define what it means for such a function to be *linear*. Consider first the case of probabilistic functions, i.e. $F$ such that all $F(a)$ are diagonal, in the same basis. Any such $F$ whose every diagonal entry is of the form $(-1)^{u\cdot a}$ for some $u\in\mathbb{Z}_2^n$ *which may depend on the row/column number* will pass the BLR test. This shows that we cannot hope to force $F$ to be a single linear function, we must allow “mixtures”. Formally, call $F$ linear if $F(a)=\sum_u(-1)^{u\cdot a}P_u$ for some decomposition $\{P_u\}$ of the identity, i.e. the $P_u$ are pairwise orthogonal projections such that $\sum_u P_u=\mathrm{Id}$. Note that this indeed captures the probabilistic case; in fact, up to a basis change it is essentially equivalent to it. Thus the following may come as a surprise.

Note the role of $V$ here, and the lack of control on $d'$ (more on both aspects later). Even if $F$ is a deterministic function $f$, i.e. $d=1$, the function $G$ returned by the theorem may be matrix-valued. In this case the isometry $V$ is simply a unit vector $v\in\mathbb{C}^{d'}$, and expanding out the squared norm in the conclusion of the theorem yields the equivalent conclusion

where we expanded using our definition of a linear matrix-valued function. Note that defines a probability distribution on . Thus by an averaging argument there must exist an such that for a fraction at least of all : the usual conclusion of the BLR theorem is recovered.

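*Proof:* The proof of the theorem follows the classic Fourier-analytic proof of Bellare et al. Our first step is to define the isometry $V$. For a vector $x\in\mathbb{C}^d$, define

$$Vx\,=\,\sum_{u\in\mathbb{Z}_2^n}\big(\hat{F}(u)\,x\big)\otimes e_u\;\in\;\mathbb{C}^d\otimes\mathbb{C}^{2^n},$$

where $\hat{F}(u)=\mathbb{E}_a(-1)^{u\cdot a}F(a)$ is the matrix-valued Fourier coefficient of $F$ at $u$ and $\{e_u\}$ an arbitrary orthonormal basis of $\mathbb{C}^{2^n}$. An easily verified extension of Parseval’s formula shows $\sum_u\hat{F}(u)^2=\mathrm{Id}$ (recall $F(a)^2=\mathrm{Id}$ for all $a$), so that $V^*V=\mathrm{Id}$: $V$ is indeed an isometry.

Next, define the linear probabilistic function $G$ by $G(a)=\sum_u(-1)^{u\cdot a}P_u$, where $P_u=\mathrm{Id}\otimes e_ue_u^*$ forms a partition of identity. With $\langle A,B\rangle=\frac{1}{d}\mathrm{Tr}(AB^*)$ the normalized inner product, we can evaluate

$$\mathbb{E}_a\,\langle F(a),V^*G(a)V\rangle\,=\,\mathbb{E}_a\sum_u(-1)^{u\cdot a}\,\langle F(a),\hat{F}(u)^2\rangle\,=\,\mathbb{E}_{a,b}\,\langle F(a+b),F(a)F(b)\rangle,$$

where the last equality follows by expanding the Fourier coefficients and noticing the appropriate cancellation. Together with (2), this proves the theorem.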

At the risk of sounding yet more pedantic, it might be useful to comment on the relation between this proof and the usual argument. The main observation in Bellare et al.’s proof is that approximate linearity, expressed by (2), implies a lower bound on the sum of the *cubes* of the Fourier coefficients of . Together with Parseval’s formula, this bound implies the existence of a large Fourier coefficient, which identifies a close-by linear function.
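For readers who like to see such identities checked by brute force, here is a small sketch in Python for the scalar case, i.e. a deterministic $f:\mathbb{Z}_2^n\to\{\pm 1\}$. It verifies numerically that the linearity correlation $\mathbb{E}_{a,b}\,f(a)f(b)f(a+b)$ equals the sum of the cubes of the Fourier coefficients, and that Parseval's formula holds. The encoding of strings as integers and the function names are my own choices.

```python
def character(u, a):
    """The character (-1)^{u . a} of Z_2^n, with strings encoded as integers."""
    return -1 if bin(u & a).count("1") % 2 else 1

def fourier_coefficients(f, n):
    """All Fourier coefficients E_a[(-1)^{u . a} f(a)], indexed by u."""
    size = 2 ** n
    return [sum(character(u, a) * f(a) for a in range(size)) / size
            for u in range(size)]

def linearity_correlation(f, n):
    """E_{a,b}[f(a) f(b) f(a+b)], where addition in Z_2^n is XOR."""
    size = 2 ** n
    total = sum(f(a) * f(b) * f(a ^ b) for a in range(size) for b in range(size))
    return total / size ** 2

def majority_pm(x):
    """Majority of the bits of x, written as a +/-1 valued function."""
    return -1 if bin(x).count("1") >= 2 else 1

if __name__ == "__main__":
    n = 3
    coeffs = fourier_coefficients(majority_pm, n)
    print("sum of squares:", sum(c ** 2 for c in coeffs))
    print("sum of cubes  :", sum(c ** 3 for c in coeffs))
    print("correlation   :", linearity_correlation(majority_pm, n))
```

Since the squares of the coefficients sum to one, the sum of cubes is at most the largest coefficient, which is how a large correlation forces a large Fourier coefficient in the usual argument.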

The proof I gave decouples the argument. Its first step, the construction of the isometry depends on , but does not use anything regarding approximate linearity. It only uses Parseval’s formula to argue that the isometry is well-defined. A noteworthy feature of this step is that the function on the extended space is always well-defined as well: given a function , it is always possible to consider the linear matrix-valued function which “samples according to ” and then returns . The second step of the proof evaluates the correlation of with the “pull-back” of , and observes that this correlation is precisely our measure of “approximate linearity” of , concluding the proof without having had to explicitly notice that there existed a large Fourier coefficient.

**1.3. The group-theoretic perspective**

Let’s re-interpret the proof we just gave using group-theoretic language. A linear function $g:\mathbb{Z}_2^n\to\{\pm 1\}$ is, by definition, a mapping which respects the additive group structure on $\mathbb{Z}_2^n$, namely it is a representation. Since $\mathbb{Z}_2^n$ is an abelian group, it has $2^n$ irreducible $1$-dimensional representations, given by the characters $\chi_u:a\mapsto(-1)^{u\cdot a}$. As such, the linear function $G$ defined in the proof of Theorem 2 is nothing but a list of all irreducible representations of $\mathbb{Z}_2^n$.

The condition (2) derived in the proof of the theorem can be interpreted as the condition that is an “approximate representation” of . Let’s make this a general definition. For -dimensional matrices and such that is positive semidefinite, write

where we use to denote the conjugate-transpose. The following definition considers arbitrary finite groups (not necessarily abelian).

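**Definition 3.** Given a finite group $G$, an integer $d\geq 1$, $\varepsilon\geq 0$, and a $d$-dimensional positive semidefinite matrix $\sigma$ with trace $1$, an $(\varepsilon,\sigma)$-representation of $G$ is a function $f:G\to U_d(\mathbb{C})$, the unitary group of $d\times d$ matrices, such that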

where the expectation is taken under the uniform distribution over .

The condition (3) in the definition is very closely related to Gowers’s norm

While a large Gowers norm implies closeness to an affine function, we are interested in testing linear functions, and the condition (3) will arise naturally from our calculations in the next section.

If , the product should be written additively as , so that the condition (2) is precisely that is an -representation of , where . Theorem 2 can thus be reformulated as stating that for any -approximate representation of the abelian group there exists an isometry and an exact representation of on such that is well-approximated by the “pull-back” of to . In the next section I will make the words in quotes precise and generalize the result to the case of arbitrary finite groups.

**2. Approximate representations of non-abelian groups**

**2.1. The Gowers-Hatami theorem**

In their paper Gowers and Hatami consider the problem of “rounding” approximate group representations to exact representations. I highly recommend the paper, which gives a thorough introduction to the topic, including multiple motivations. Here I will state and prove a slightly more general, but quantitatively weaker, variant of their result inspired by the somewhat convoluted analysis of the BLR test given in the previous section.

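**Theorem 4 (Gowers-Hatami).** Let $G$ be a finite group, $\varepsilon\geq 0$, and $f:G\to U_d(\mathbb{C})$ an $(\varepsilon,\sigma)$-representation of $G$. Then there exists a $d'\geq d$, an isometry $V:\mathbb{C}^d\to\mathbb{C}^{d'}$, and a representation $g:G\to U_{d'}(\mathbb{C})$ such that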

Gowers and Hatami limit themselves to the case of , which corresponds to the dimension-normalized Frobenius norm. In this scenario they in addition obtain a tight control of the dimension , and show that one can always take in the theorem. I will give a much shorter proof than theirs (the proof is implicit in their argument) that does not seem to allow one to recover this estimate. (It is possible to adapt their proof to keep a control of even in the case of general , but I will not explain this here.) Essentially the same proof as the one sketched below has been extended to some classes of infinite groups by De Chiffre, Ozawa and Thom in a recent preprint.

Note that, contrary to the BLR theorem, where the “embedding” is not strictly necessary (if is small enough we can identify a single close-by linear function), as noted by Gowers and Hatami Theorem 4 does not in general hold with . The reason is that it is possible for to have an approximate representation in some dimension , but no exact representation of the same dimension: to obtain an example of this, take any group that has all non-trivial irreducible representations of large enough dimension, and create an approximate representation in e.g. dimension one less by “cutting off” one row and column from an exact representation. The dimension normalization induced by the norm will barely notice this, but it will be impossible to “round” the approximate representation obtained to an exact one without modifying the dimension.

The necessity for the embedding helps distinguish the Gowers-Hatami result from other extensions of the linearity test to the non-abelian setting, such as the work by Ben-Or et al. on non-Abelian homomorphism testing (I thank Oded Regev for pointing me to the paper). In that paper the authors show that a function , where and are finite non-abelian groups, which satisfies , is -close to a homomorphism . The main difference with the setting for the Gowers-Hatami result is that since is finite, Ben-Or et al. use the Kronecker function as distance on . This allows them to employ combinatorial arguments, and provide a rounding procedure that does not need to modify the range space (). In contrast, here the unitary group is infinite.

The main ingredient needed to extend the analysis of the BLR test is an appropriate notion of Fourier transform over non-abelian groups. Given an irreducible representation , define

In case is abelian, we always have , the tensor product is a product, and (4) reduces to the usual definition of Fourier coefficient. The only properties we will need of irreducible representations is that they satisfy the relation

for any . Note that plugging in (the identity element in ) yields .

*Proof:* As in the proof of Theorem 2 our first step is to define an isometry by

where the direct sum ranges over all irreducible representations of and is the canonical basis. Note what does: it “embeds” any vector into a direct sum, over irreducible representations , of a -dimensional vector of matrices. Each (matrix) entry of this vector can be thought of as the Fourier coefficient of the corresponding entry of the vector associated with . If and ranges over this recovers the isometry defined in the proof of Theorem 2. And indeed, the fact that is an isometry again follows from the appropriate extension of Parseval’s formula:

where for the second line we used the definition (4) of and for the third we used (5) and the fact that takes values in the unitary group.

Following the same steps as in the proof of Theorem 2, we next define

a direct sum over all irreducible representations of (hence itself a representation). Let’s first compute the “pull-back” of by : following a similar calculation as above, for any ,

where the last equality uses (5). It then follows that

This relates correlation of with to the quality of as an approximate representation and proves the theorem.

**2.2. Application: the Weyl-Heisenberg group**

In quantum information we care a lot about the Pauli group. For our purposes it will be sufficient (and much more convenient, allowing us to avoid some trouble with complex conjugation) to consider the Weyl-Heisenberg group , or “Pauli group modulo complex conjugation”, which is the -element group whose multiplication table matches that of the matrices

and . This group has four -dimensional representations, uniquely specified by the image of and in , and a single irreducible -dimensional representation, given by the matrices defined above. We can also consider the “-qubit Weyl-Heisenberg group” , the matrix group generated by -fold tensor products of the matrices identified above. The irreducible representations of are easily computed from those of ; for us the only thing that matters is that the only irreducible representation which satisfies has dimension and is given by the defining matrix representation (in fact, it is the only irreducible representation in dimension larger than ).
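As a sanity check on these counts, here is a short sketch in Python that generates the single-qubit group by brute force. I am assuming that the generators are the standard real Pauli matrices $X=\begin{pmatrix}0&1\\1&0\end{pmatrix}$ and $Z=\begin{pmatrix}1&0\\0&-1\end{pmatrix}$; the function names are mine. The number of conjugacy classes equals the number of irreducible representations, so finding $8$ elements and $5$ classes is consistent with four $1$-dimensional representations and a single $2$-dimensional one, since $1+1+1+1+2^2=8$.

```python
def matmul(p, q):
    """Product of two 2x2 integer matrices stored as ((a, b), (c, d))."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

IDENTITY = ((1, 0), (0, 1))
X = ((0, 1), (1, 0))   # assumed generator: Pauli X
Z = ((1, 0), (0, -1))  # assumed generator: Pauli Z

def generate_group(generators):
    """Close {identity} under right multiplication by the generators."""
    group = {IDENTITY}
    frontier = [IDENTITY]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = matmul(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

def conjugacy_classes(group):
    """Set of conjugacy classes; all elements here are real orthogonal
    matrices, so the inverse is the transpose."""
    def inverse(g):
        return ((g[0][0], g[1][0]), (g[0][1], g[1][1]))
    return {
        frozenset(matmul(matmul(h, g), inverse(h)) for h in group)
        for g in group
    }

if __name__ == "__main__":
    group = generate_group([X, Z])
    print("group order      :", len(group))
    print("conjugacy classes:", len(conjugacy_classes(group)))
```

The same closure computation also confirms that $X$ and $Z$ anti-commute, which is the relation singled out in the text as characterizing the irreducible representation of dimension larger than one.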

With the upcoming application to entanglement testing in mind, I will state a version of Theorem 4 tailored to the group and a specific choice of presentation for the group relations. Towards this we first need to recall the notion of *Schmidt decomposition* of a bipartite state (i.e. unit vector) . The Schmidt decomposition states that any such vector can be written as

for some orthonormal bases and of (the “Schmidt vectors”) and non-negative coefficients (the “Schmidt coefficients”). The decomposition can be obtained by “reshaping” into a matrix and performing the singular value decomposition. To we associate the (uniquely defined) positive semidefinite matrix

note that has trace . The matrix is called the *reduced density* of (on the first system).

Corollary 5Let be integer, , a unit vector, the positive semidefinite matrix associated to as in (8), and . For let , , and assume for all (we call such operators, unitaries with eigenvalues in , observables). Suppose that the following inequalities hold: consistency

Then there exists a , an isometry , and a representation such that and

Note that the conditions (10) and (11) in the corollary are very similar to the conditions required of an approximate representation of the group ; in fact it is easy to convince oneself that their exact analogues suffice to imply all the group relations. The reason for including only those relations is that they are the ones that it will be possible to test; see the next section for this. Condition (9) is necessary to derive the conditions of Theorem 4 from (10) and (11), and is also testable; see the proof.

*Proof:* To apply Theorem 4 we need to construct an -representation of the group . Using that any element of has a unique representative of the form for , we define . Next we need to verify (3). Let be such that and for -bit strings and respectively. Up to phase, we can exploit successive cancellations to decompose as

(It is worth staring at this sequence of equations for a little bit. In particular, note the “player-switching” that takes place in the 2nd, 4th and 6th lines; this is used as a means to “commute” the appropriate unitaries, and is the reason for including (9) among the assumptions of the corollary.) Evaluating each term on the vector , taking the squared Euclidean norm, and then the expectation over uniformly random , the inequality and the assumptions of the theorem let us bound the overlap of each term in the resulting summation by . Using by definition, we obtain the bound

We are thus in a position to apply Theorem 4, which gives an isometry and exact representation such that

Using that is a representation, . It follows from (12) that , so we may restrict the range of to the subspace where without introducing much additional error.

**3. Entanglement testing**

Our discussion so far has barely touched upon the notion of entanglement. Recall the Schmidt decomposition (7) of a unit vector , and the associated reduced density matrix defined in (8). The state is called *entangled* if this matrix has rank larger than ; equivalently, if there is more than one non-zero coefficient in (7). The *Schmidt rank* of is the rank of , the number of non-zero terms in (7). It is a crude, but convenient, measure of entanglement; in particular it provides a lower bound on the local dimension . A useful observation is that the Schmidt rank is invariant under local unitary operations: these may affect the Schmidt vectors and , but not the number of non-zero terms.

**3.1. A certificate for high-dimensional entanglement**

Among all entangled states in dimension , the *maximally entangled state* is the one which maximizes entanglement entropy, defined as the Shannon entropy of the distribution induced by the squares of the Schmidt coefficients:

with entropy . The following lemma gives a “robust” characterization of the maximally entangled state in dimension as the unique common eigenvalue- eigenvector of all operators of the form , where ranges over the elements of the unique -dimensional irreducible representation of the Weyl-Heisenberg group , i.e. the Pauli matrices (taken modulo ).

*Proof:* Consider the case . The “swap” matrix

squares to identity and has a unique eigenvalue- eigenvector, the vector (a.k.a. “EPR pair”). Thus implies . The same argument for general shows . Any unit vector of Schmidt rank at most satisfies , concluding the proof.

Lemma 6 provides an “experimental road-map” for establishing that a bipartite system is in a highly entangled state:

- (i) Select a random ;
- (ii) Measure both halves of using ;
- (iii) Check that the outcomes agree.

To explain the connection between the above “operational test” and the lemma I should review what a measurement in quantum mechanics is. For our purposes it is enough to talk about binary measurements (i.e. measurements with two outcomes, and ). Any such measurement is specified by a pair of orthogonal projections, and , on such that . The probability of obtaining outcome when measuring is . We can represent a binary measurement succinctly through the *observable* . (In general, an observable is a Hermitian matrix which squares to identity.) It is then the case that if an observable is applied on the first half of a state , and another observable is applied on the second half, then the probability of agreement, minus the probability of disagreement, between the outcomes obtained is precisely , a number which lies in . Thus the condition that the test described above accepts with probability when performed on a state is precisely equivalent to the assumption (13) of Lemma 6.
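The identity between “probability of agreement minus probability of disagreement” and the expectation of the tensor product of the two observables can be checked directly. The sketch below is only an illustration under my own choices: it uses a single qubit per player, restricts to the real Pauli observables X and Z, and the helper names (`projectors`, `agreement_bias`, `accept_probability`) are hypothetical.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def projectors(obs):
    """Projections onto the +1 and -1 eigenspaces of an observable."""
    ident = np.eye(obs.shape[0], dtype=complex)
    return (ident + obs) / 2, (ident - obs) / 2

def agreement_bias(psi, A, B):
    """P(outcomes agree) - P(outcomes disagree), computed outcome by outcome."""
    bias = 0.0
    for sign_a, P_a in zip((+1, -1), projectors(A)):
        for sign_b, P_b in zip((+1, -1), projectors(B)):
            prob = np.vdot(psi, np.kron(P_a, P_b) @ psi).real
            bias += sign_a * sign_b * prob
    return bias

def expectation(psi, A, B):
    """<psi| A (x) B |psi>."""
    return np.vdot(psi, np.kron(A, B) @ psi).real

def accept_probability(psi):
    """Pick X or Z at random, measure both halves, accept if the outcomes agree."""
    return np.mean([(1 + expectation(psi, W, W)) / 2 for W in (X, Z)])

epr = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
product = np.array([1, 0, 0, 0], dtype=complex)  # the unentangled state |00>
```

With these definitions the EPR pair passes the one-qubit test with certainty, while the product state |00> only agrees half of the time in the X basis and is accepted with probability 3/4.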

Even though this provides a perfectly fine test for entanglement in principle, practitioners in the foundations of quantum mechanics know all too well that their opponents (e.g. “quantum-skeptics”) will not be satisfied with such an experiment. In particular, who is to guarantee that the measurement performed in step (ii) is really , as claimed? At the very least, doesn’t this already implicitly assume that the measured system has dimension ?

This is where the notion of *device independence* comes in. Briefly, in this context the idea is to obtain the same conclusion (a certificate of high-dimensional entanglement) *without* any assumption on the measurement performed: the only information to be trusted is classical data (statistics generated by the experiment), but not the operational details of the experiment itself.

This is where Corollary 5 enters the picture. Reformulated in the present context, the corollary provides a means to *verify* that arbitrary measurements “all but behave” as Pauli measurements, provided they generate the right statistics. To explain how this can be done we need to provide additional “operational tests” that can be used to certify the assumptions of the corollary.

**3.2. Testing the Weyl-Heisenberg group relations**

Corollary 5 makes three assumptions about the observables and : that they satisfy approximate consistency (9), linearity (10), and anti-commutation (11). In this section I will describe two (somewhat well-known) tests that allow us to certify these relations based only on the fact that the measurements generate statistics which pass the tests.
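As a sanity check, the honest strategy satisfies the exact versions of all three relations. The following sketch assumes the standard conventions, which are not spelled out here: the observable for a bit string is the tensor product of X (or Z) on the positions where the string has a one, and the shared state is a tensor product of EPR pairs. The helper names `pauli` and `epr_pairs` are mine.

```python
import numpy as np
from functools import reduce
from itertools import product

X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])

def pauli(W, a):
    """Tensor product of W on the positions where the bit string a equals 1."""
    return reduce(np.kron, [W if bit else np.eye(2, dtype=int) for bit in a])

def epr_pairs(n):
    """n EPR pairs, ordered as (all qubits of player 1) (x) (all qubits of player 2)."""
    d = 2 ** n
    return np.eye(d).reshape(d * d) / np.sqrt(d)

n = 2
psi = epr_pairs(n)
ident = np.eye(2 ** n)
strings = list(product((0, 1), repeat=n))

ok_consistency = ok_linearity = ok_anticommutation = True
for a in strings:
    for W in (X, Z):
        A = pauli(W, a)
        # Consistency: applying A on either half of the state gives the same vector.
        ok_consistency &= np.allclose(np.kron(A, ident) @ psi, np.kron(ident, A) @ psi)
    for b in strings:
        a_plus_b = tuple(x ^ y for x, y in zip(a, b))
        for W in (X, Z):
            # Linearity: W(a) W(b) = W(a + b).
            ok_linearity &= np.array_equal(pauli(W, a) @ pauli(W, b), pauli(W, a_plus_b))
        # (Anti-)commutation: X(a) Z(b) = (-1)^{a.b} Z(b) X(a).
        sign = (-1) ** sum(x * y for x, y in zip(a, b))
        ok_anticommutation &= np.array_equal(
            pauli(X, a) @ pauli(Z, b), sign * (pauli(Z, b) @ pauli(X, a)))
```

All three flags come out true for exact Pauli observables; the tests in this section certify approximate versions of the same relations for arbitrary observables.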

**Linearity test:**

- (a) The referee selects and uniformly at random. He sends to one player and , , or to the other.
- (b) The first player replies with two bits, and the second with a single bit. The referee accepts if and only if the players’ answers are consistent.

As always in this note, the test treats both players simultaneously. As a result we can (and will) assume that the players’ strategy is symmetric, and is specified by a permutation-invariant state and a measurement for each question: an observable associated to questions of the form , and a more complicated four-outcome measurement associated with questions of the form (It will not be necessary to go into the details of the formalism for such measurements).

The linearity test described above is identical to the BLR linearity test described earlier, but for the use of the basis label . The lemma below is a direct analogue of Lemma 1, which extends the analysis to the setting of players sharing entanglement. The lemma was first introduced in a joint paper with Ito, where we used an extension of the linearity test, Babai et al.’s multilinearity test, to show the inclusion of complexity classes NEXP ⊆ MIP*.

Lemma 7. Suppose that a family of observables for and , generates outcomes that succeed in the linearity test with probability , when applied on a bipartite state . Then the following hold: approximate consistency

and linearity

Testing anti-commutation is slightly more involved. We will achieve this by using a two-player game called the Magic Square game. This is a fascinating game, but just as for the linearity test I will treat it superficially and only recall the part of the analysis that is useful for us (see e.g. the paper by Wu et al. for a description of the game and a proof of Lemma 8 below).

Lemma 8 (Magic Square). The Magic Square game is a two-player game with nine possible questions (with binary answers) for one player and six possible questions (with two-bit answers) for the other player which has the following properties. The distribution on questions in the game is uniform. Two of the questions to the first player are labelled and respectively. For any strategy for the players that succeeds in the game with probability at least using a bipartite state and observables and for questions and respectively, it holds that

Moreover, there exists a strategy which succeeds with probability in the game, using and Pauli observables and for questions and respectively.
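For readers who have not seen the game, the perfect strategy is usually described via a 3×3 array of two-qubit Pauli observables, often called the Mermin-Peres square. The sketch below uses one standard filling of the square; the particular arrangement of the cells is my assumption and need not match the labelling used in the lemma. It checks the properties that make the strategy work: observables on a common row or column commute, every row multiplies to the identity, and the columns multiply to the identity except for the last one, which multiplies to minus the identity.

```python
import numpy as np
from itertools import combinations

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# One standard filling of the square (assumed arrangement).
square = [
    [np.kron(X, I2), np.kron(I2, X), np.kron(X, X)],
    [np.kron(I2, Z), np.kron(Z, I2), np.kron(Z, Z)],
    [np.kron(X, Z), np.kron(Z, X), np.kron(Y, Y)],
]
rows = square
cols = [[square[r][c] for r in range(3)] for c in range(3)]

def line_product(line):
    """Product of the three observables on a row or a column."""
    A, B, C = line
    return A @ B @ C

def pairwise_commute(line):
    return all(np.allclose(A @ B, B @ A) for A, B in combinations(line, 2))

lines_commute = all(pairwise_commute(line) for line in rows + cols)
# Each product is +I or -I, so the sign can be read off from the normalized trace.
row_signs = [int(round(np.trace(line_product(line)).real / 4)) for line in rows]
col_signs = [int(round(np.trace(line_product(line)).real / 4)) for line in cols]

# The cells holding X (x) I and Z (x) I share neither a row nor a column, and anti-commute.
XI, ZI = square[0][0], square[1][1]
cells_anticommute = np.allclose(XI @ ZI, -(ZI @ XI))
```

No assignment of fixed ±1 values to the nine cells can reproduce these row and column signs, which is why classical players cannot win with certainty, while two players who each hold one half of two EPR pairs and measure these observables can.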

Based on the Magic Square game we devise the following “anti-commutation test”.

**Anti-commutation test:**

- (a) The referee selects uniformly at random under the condition that . He plays the Magic Square game with both players, with the following modifications: if the question to the first player is or he sends or instead; in all other cases he sends the original label of the question in the Magic Square game together with both strings and .
- (b) Each player provides answers as in the Magic Square game. The referee accepts if and only if the players’ answers would have been accepted in the game.

Using Lemma 8 it is straightforward to show the following.

Lemma 9. Suppose a strategy for the players succeeds in the anti-commutation test with probability at least , when performed on a bipartite state . Then the observables and applied by the player upon receipt of questions and respectively satisfy

**3.3. A robust test for high-dimensional entangled states**

We are ready to state, and prove, our main theorem: a test for high-dimensional entanglement that is “robust”, meaning that success probabilities that are a constant close to the optimal value suffice to certify that the underlying state is within a constant distance from the target state — in this case, a tensor product of EPR pairs. Although arguably a direct “quantization” of the BLR result, this is the first test known which achieves constant robustness — all previous -qubit tests required success that is inverse polynomially (in ) close to the optimum in order to provide any meaningful conclusion.

**-qubit Pauli braiding test:** With probability each,

- (a) Execute the linearity test.
- (b) Execute the anti-commutation test.

Theorem 10. Suppose that a family of observables , for and , and a state , generate outcomes that pass the -qubit Pauli braiding test with probability at least . Then .

As should be apparent from the proof it is possible to state a stronger conclusion for the theorem, which includes a characterization of the observables and the state up to local isometries. For simplicity I only recorded the consequence on the dimension of .

*Proof:* Using Lemma 7 and Lemma 9, success with probability in the test implies that conditions (9), (10) and (11) in Corollary 5 are all satisfied, up to error . (In fact, Lemma 9 only implies (11) for strings such that . The condition for strings such that follows from the other conditions.) The conclusion of the corollary is that there exists an isometry such that the observables and satisfy

Using again the consistency relations (9) that follow from part (a) of the test together with the above we get

Applying Lemma 6, has Schmidt rank at least . But is a local isometry, which cannot increase the Schmidt rank.

Before jumping to unitary correlation matrices, let’s, rather pedantically, introduce vector correlation matrices. Most of you are already familiar with this simple object: a vector correlation matrix is an Hermitian matrix with complex entries such that there exists an integer and unit vectors such that for all . In other words: a Gram matrix with diagonal entries equal to .

A natural question is, given a vector correlation matrix , what is the minimal dimension in which there exist vectors achieving the specified correlations? Clearly , the dimension of the span of the vectors; moreover the identity matrix implies that is sometimes necessary.

If we allow -approximations, we can do better: the Johnson-Lindenstrauss lemma implies that is sufficient (and necessary) to find unit vectors such that for each . And if we only require the approximation to hold on the average over the choice of and , then no dependence on is necessary: suffices.
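To see the vector case in action, here is a minimal numerical sketch of dimension reduction by a random Gaussian map followed by renormalization. Everything specific in it is my choice for illustration: real rather than complex vectors, the sizes `n`, `d`, `k`, and the fixed random seed. It is not tuned to the bound in the lemma.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 1000, 400  # number of vectors, original dimension, target dimension

# Unit vectors in dimension d and their Gram (vector correlation) matrix.
V = rng.normal(size=(n, d))
V /= np.linalg.norm(V, axis=1, keepdims=True)
C = V @ V.T

# Random Gaussian map to dimension k, then renormalize so that the diagonal stays 1.
G = rng.normal(size=(d, k)) / np.sqrt(k)
W = V @ G
W /= np.linalg.norm(W, axis=1, keepdims=True)
C_low = W @ W.T

max_err = np.abs(C - C_low).max()   # worst-case entry, the Johnson-Lindenstrauss regime
avg_err = np.abs(C - C_low).mean()  # error on average over the entries
```

The map `G` is chosen without looking at the vectors, which is the sense in which such a rounding is “oblivious”.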

This is all good and well. Now onto the interesting stuff!

Define a unitary correlation matrix to be an Hermitian matrix with complex entries such that there exists an integer and unitary matrices such that for all . Considering block matrices shows that the set of unitary correlation matrices is convex.
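The block-matrix remark can be illustrated numerically. In the sketch below I assume the usual normalization, in which each entry is the trace of the product of one unitary's adjoint with another, divided by the dimension; the helper names are hypothetical. Taking direct sums of unitaries of dimensions `d1` and `d2` then yields the convex combination of the two correlation matrices with weights `d1/(d1+d2)` and `d2/(d1+d2)`.

```python
import numpy as np

def random_unitary(d, rng):
    """A random unitary, obtained from the QR decomposition of a complex Gaussian matrix."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    Q, _ = np.linalg.qr(G)
    return Q

def correlation_matrix(unitaries):
    """C[i, j] = Tr(U_i^dagger U_j) / d (assumed normalization)."""
    d = unitaries[0].shape[0]
    return np.array([[np.trace(U.conj().T @ V) / d for V in unitaries]
                     for U in unitaries])

def direct_sum(A, B):
    """Block-diagonal matrix with blocks A and B."""
    a, b = A.shape[0], B.shape[0]
    out = np.zeros((a + b, a + b), dtype=complex)
    out[:a, :a] = A
    out[a:, a:] = B
    return out

rng = np.random.default_rng(1)
n, d1, d2 = 4, 2, 6
Us = [random_unitary(d1, rng) for _ in range(n)]
Vs = [random_unitary(d2, rng) for _ in range(n)]

C1, C2 = correlation_matrix(Us), correlation_matrix(Vs)
C_sum = correlation_matrix([direct_sum(U, V) for U, V in zip(Us, Vs)])
weight = d1 / (d1 + d2)
min_eig = np.linalg.eigvalsh((C1 + C1.conj().T) / 2).min()
```

Here `C_sum` equals `weight * C1 + (1 - weight) * C2`, and each correlation matrix is Hermitian and positive semidefinite with unit diagonal.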

By forgetting the unitary structure of the we see that a unitary correlation matrix is automatically a vector correlation matrix; in particular it is positive semidefinite with all diagonal entries equal to . While the latter is a characterization of vector correlation matrices, however, as soon as (and not before) there exist vector correlation matrices that are not unitary correlation matrices. This is not completely trivial to see, and appears in a paper by Dykema and Juschenko; it is a nice exercise to work out. Now for the main question:

(): Dimension reduction for unitaries. Let and be given. Does there exist an explicit such that for every unitary correlation matrix there are -dimensional unitaries such that

While the analogous question for vectors is trivial for , and a fundamental result in geometry for , extremely little is known on the question for unitaries. Virtually the only general statement that can be made is that, at least, some bound exists. This follows by a simple compactness argument, but does not yield any meaningful bound on the growth of as a function of and . In fact no explicit bound, however large, is known to hold in general. Let’s explore the problem a bit.

A nice feature of question () is that it is reasonably robust, in the sense that different natural formulations of the question can be shown equivalent, up to simple variations on the precise scaling of . For example, one can relax the constraint of being unitary to the sole requirement that the matrices have all singular values at most . At the opposite end of the spectrum one can consider a more structured problem which considers correlations between projection matrices (so all eigenvalues are or ). Both these variants can be shown equivalent to the unitary case via some simple reductions.

The one variant which makes a substantial difference is the case of correlation matrices with real entries. A beautiful result of Tsirelson shows that any extremal real correlation matrix can be realized exactly, by Hermitian matrices having all eigenvalues , in dimension , and this bound is tight; relatively precise bounds of the form are known for small enough . (Note that even though projection matrices are Hermitian, and thus give rise to real correlations, Tsirelson’s result does not imply a positive answer for the case of projections as the dimension- matrices recovered via Tsirelson’s construction will in general be Hermitian, but not projectors, even when the original matrices were.)

**Quantum games. **One can arrive at question () by asking about the minimal dimension of near-optimal strategies in a quantum two-player game. Experts will immediately see the connection, and I will not elaborate on this. Roughly, the easy observation is that correlations that are achievable by entangled players in a nonlocal game take the form

where is a unit vector in (the entanglement), is a complex matrix that can be computed from , and “observables”, i.e. Hermitian operators that square to identity describing the players’ measurement operators. (A more general formulation considers projections, rather than observables.) In case is the so-called “maximally entangled state”, and we recover precisely an entry from a correlation matrix. (The case of a general state gives rise to a slight variant of question , to which I am not sure whether it is equivalent or not.)

Arriving at the question from this “physical” angle, it seems like it “ought” to have a reasonable answer: certainly, if one fixes the size of the game, and an approximation error , then there must exist some dimension that suffices to implement an -optimal strategy. No such result is known. If anything, existing signs seem to point in the negative direction: for instance, Slofstra very recently showed that there exists a fixed, constant-sized game such that the optimal winning probability of can only be achieved in the limit of infinite dimension (but it does seem to be the case that, for this game, -optimal strategies can be found in dimension ). Note that this result implies that the set of correlation matrices of projections is not closed.

**Connes’ conjecture.** A different, though related, way to arrive at question () is via the famous “Connes embedding conjecture” in the theory of algebras. Connes’ embedding conjecture states, rather informally, that any separable factor (i.e. a von Neumann algebra with trivial center that is infinite-dimensional as a vector space, but has a finite faithful trace) embeds into a suitable ultrapower of the hyperfinite factor . Kirchberg showed that the conjecture is equivalent to the following statement.

Theorem. The validity of Connes’ conjecture for some factor is equivalent to the following: For all , and unitaries there is a and unitaries , such that

where is the trace on .

This formulation is close to question (), except for two important differences: first, we assume that the target correlations are achievable in finite dimension . This makes the problem easier, and would make it trivial if we were not to introduce a second important difference, which is that we ask for explicit bounds on . As a result I do not know of any formal implication between () and Connes’ conjecture, in either direction.

**Graph limits. **Finally, for the combinatorialist let me mention an analogous (though, as far as I can tell, not directly related) question, formulated by Aldous and Lyons in the context of their study of limits of bounded-degree graphs. The distance between two finite graphs of the same constant degree (but not necessarily the same number of vertices) can be measured via the sampling distance : , where denotes the total variation distance between the distributions on rooted -neighborhoods obtained by sampling a random vertex from (resp. ) and considering the sub-graph induced on all vertices at distance at most from the sampled vertex. With this notion in place, Question 10.1 in Aldous and Lyons’ paper on unimodular random networks asks the following:

(Aldous-Lyons:) For every there is an integer such that for every (finite) graph there is a graph on vertices such that .

On page 1458 the authors mention that the validity of their conjecture for the special class of Cayley graphs would imply that all finitely generated groups are sofic (very roughly, can be embedded into finite-dimensional permutation groups). Even though we do not know of an example of a group that is not sofic, this would be a very surprising result. In particular, it would imply Connes’s Embedding Conjecture for group von Neumann algebras, since the latter is known to hold for sofic groups.

Unfortunately this is going to be one of the shortest, most boring developments in musical history: there is too little to say! I could describe multiple failed attempts. In particular, naïve attempts at dimension reduction, inspired by Johnson-Lindenstrauss or other standard techniques, or incremental “gradient-descent” type of progressive block diagonalization procedures, all seem doomed to fail.

Aside from Tsirelson’s result for real correlation matrices, the one case for which we were able to find a cute proof is the case of permutation correlation matrices, where each is assumed to be a permutation matrix. The fact that permutations are sparse seems to make it easier to operate on them by “shifting entries around”; unitaries have a more rigid structure. The proof uses a simple combinatorial argument, with the heaviest hammer being Hall’s theorem guaranteeing the existence of a perfect matching, which is used to simultaneously re-organize the “” entries in a subset of the permutation matrices while preserving all correlations. The upper bound on we obtain is of order , which may be the right order.

More is known in terms of negative results, i.e. lower bounds on . Such bounds abound in the theory of nonlocal games, where they go by the name of “dimension witness”. The best known results I am aware of imply that should grow at least like , which is good for very small , and also , which holds for smaller than a universal constant (the two bounds are obtained from different families of correlations; see here for the former and here for the latter). An interesting consequence of the (proof of) the second bound, which appears in joint work with Natarajan, is that even an -approximation on average (over the entries of C) requires large dimension. This implies that no “oblivious” rounding technique, as in the Johnson-Lindenstrauss lemma, will work: such a technique would guarantee small approximation error on average independently of .

There has been a lot of progress recently on lower bounds, stimulated by works on quantum non-local games. This includes a beautiful framework of games for checking “non-commutative” analogues of linear equations over , developed by Cleve and Mittal and Ji; extensions of the framework to testing finitely presented groups by Slofstra; a development of approaches based on operator systems by Paulsen and co-authors, and many others. But no upper bounds! Get to work: things can’t remain this way.

When we wrote the survey three summers ago, the latest word on the CSP-qPCP (see Conjecture 1.3 here for a precise formulation) had been given in a paper by Brandao and Harrow. BH showed, using information-theoretic arguments, that the constraint graphs associated with constant-gap QMA-hard instances of the local Hamiltonian problem had to satisfy “non-expansion” requirements seemingly at odds with the expansion properties of graphs associated with what are often considered the hardest instances of classical CSPs. Intuitively, their argument uses the monogamy of quantum correlations to argue that highly expanding constraint graphs place such strong demands on entanglement that there is always a product state whose energy is not far from the minimum. Although not strictly a no-go result, their theorem indicates that QMA-hard instances must be based on constraint graphs with markedly different spectral properties than those associated with the hardest instances of classical CSP.

For the time being it seems like any proof, or disproof, of the conjecture remains out of reach. Instead of focusing directly on qPCP, it may be more fruitful to develop the objects that are expected to play an important role in the proof, such as (quantum) low-density parity check codes (qLDPC) and (quantum) locally testable codes (qLTC). Two recent works make progress on this front.

The no low-energy trivial states (NLTS) conjecture was proposed by Freedman and Hastings as a “complexity-free” analogue of CSP-qPCP. The NLTS conjecture states that there exist local Hamiltonians such that all low-energy (within an additive constant, times the norm of the Hamiltonian, from the minimum) states are “non-trivial”, in the sense that they cannot be generated by a constant-depth quantum circuit applied on a product state. Equivalently, all states that are the output of a constant-depth quantum circuit must have energy a constant above the minimum. NLTS Hamiltonians are good candidates for qPCP as they provide local Hamiltonians for which many obvious classical certificates for the minimal energy of the Hamiltonian (such as the description of a small circuit which generates a low-energy state) are essentially ruled out.

An earlier version of the Eldar-Harrow manuscript claimed a construction of NLTS Hamiltonians, but the paper was recently updated, and the claim retracted. The current manuscript establishes a moderately weaker (though strictly incomparable) result, that the authors call NLETS, for “no low-*error* trivial states”. The main result of EH is a relatively simple, explicit construction of a family of local Hamiltonians that have no trivial “ground state -impostor”. An -impostor is a state that has the same reduced density matrix as a ground state on a fraction of the qubits, but may differ arbitrarily on the remaining fraction. Using that the Hamiltonian is local, impostors necessarily have low energy, but the converse is not true, so that NLETS rules out triviality for a more restricted class of states than NLTS. For that restricted class of states, however, the non-triviality established by EH is stronger than required by NLTS: they show that no -impostor can even be well-approximated (within inverse-polynomial trace distance) by logarithmic-depth, instead of just constant-depth, quantum circuits.

Let’s see if I can give some basic intuition on their construction; for anything substantial see the paper, which gives many angles on the result. Consider first a classical repetition code encoding bit into bits. This can be made into a locally testable code by enforcing pairwise equality of bits along the edges of a constant-degree expanding graph on vertex set . Now allow me a little leap of faith: imagine there existed a magic quantum analogue of this classical repetition code, where equality between pairs of qubits is enforced not only in the (computational) basis, but also in the (Hadamard) basis. Of course such a thing does not exist: the constraints would force *any* pair of qubits (linked by the expander) to form an EPR pair, a requirement that strongly violates monogamy. But let’s *imagine*. Then I claim that we would essentially be done. Why? We need two more observations.

The first key observation made by EH is that any ground state of this imaginary code would have the following property: if you measure all qubits of the state in the same basis, either or , then for at least one of the two possible choices the measurement outcomes will be distributed according to a distribution on -bit strings that places a large (constant) weight on at least two well-isolated (separated by at least the minimum distance) subsets of the Hamming cube. Note that this does not hold of the classical repetition code: the distribution obtained by measuring the all- codeword is, well, concentrated. But if we were to measure the associated quantum state in the Hadamard basis, we would get a very spread-out distribution, with constant mass on two sets that are at distance apart (I realize the equation I wrote is not quite correct! Don’t think too hard about it; obviously my “magical quantum repetition code” does not exist). The reason the distribution obtained in at least one of the two bases must be spread out is the uncertainty principle: if the distribution is localized in the basis it must be delocalized in the basis, and vice-versa. And the reason it should be concentrated on isolated clumps is that we are measuring a codeword, which, for our magic example, can only lead to outcomes that are supported on the set .

To conclude we need the second observation, which is that trivial states do *not* have this property: measuring a trivial state in any product basis will always lead to a highly expanding distribution, which in particular cannot have large mass on well-isolated subsets. This is obviously true for product states, and requires a bit of work to be carried through logarithmically many layers of a quantum circuit; indeed this is where the main technical work of the paper lies.

So the argument is complete…except for the fact that the required magic quantum repetition code does not exist! Instead, EH find a good make-do by employing a beautiful construction of quantum LDPC codes due to Tillich and Zemor, the “hypergraph product”. The hypergraph product takes as input any pair of classical linear codes and returns a quantum “product” CSS code whose locality, distance and rate properties can be related to those of the original codes. The toric code can be cast as an example of a hypergraph product code; see Section 3 in the paper for explanations. Unfortunately, the way the distance of the product code scales with other parameters prevents TZ from obtaining good enough qLDPC for the CSP-qPCP; they can “only” obtain codes with constant weight and constant rate, but distance .

In the context of NL(E)TS, and even more so qPCP, however, distance may not be the most relevant parameter. EH’s main construction is obtained as the hypergraph product of two expander-based repetition codes, which as a code only has logarithmic distance; nevertheless they are able to show that the robustness derived from the repetition code, together with the logarithmic distance, are enough to separate -impostors from logarithmic-depth trivial states.

Quantum low-density parity-check codes (qLDPC) already made a showing in the previous sections. These families of codes are of much broader interest than their possible role in a forthcoming proof of qPCP, and constructions are being actively pursued. For classical codes the situation is largely satisfactory, and there are constructions that simultaneously achieve constant rate and linear distance with constant-weight parity checks. For quantum codes less is known. If we insist on constant-weight stabilizers then the best distance is (e.g. Freedman et al.), a notch above the TZ construction mentioned earlier. The most local construction that achieves linear distance requires stabilizers of weight (e.g. Bravyi and Hastings).

A recent paper by Hastings makes progress on constructions of qLDPC – assuming a geometrical conjecture on the volume of certain surfaces defined from lattices in . Assuming the conjecture, Hastings shows the existence of qLDPC with distance and logarithmic-weight stabilizers, a marked improvement over the state of the art. Although as discussed earlier even linear-distance, constant-weight, qLDPC would not imply the CSP-qPCP nor NLTS (the resulting Hamiltonian may still have low-energy eigenstates that are not at a small distance from codewords), by analogy with the classical case (and basic intuition!), constructions of such objects should certainly facilitate any attempt at a proof of the conjectures. Moreover, qLDPC suffice for the weaker NLETS introduced by EH, as the latter only makes a statement about -impostors, i.e. states that are at a constant distance from codewords. To obtain the stronger implication to NLTS, the proper notion is that of local testability: errors should be detected by a fraction of parity checks proportional to the distance of the error from the closest codeword (and not just *some* parity check).

Hastings’ construction follows the topological approach to quantum error correcting codes pioneered by Freedman and Kitaev. Although the latter introduced codes whose properties depend on the surface they are embedded in, as best I can tell the formal connection between homology and error correction is made in a comprehensive paper by Bombin and Martin-Delgado. The advantage of this approach is that properties of the code, including rate and distance, can be tied to geometric properties of the underlying homology, reducing the construction of good codes to that of manifolds with the right properties.

In addition to the (conjectural) construction of good qLDPC, almost as an afterthought Hastings provides an unconditional construction of a quantum locally testable code (qLTC), albeit one which encodes two qubits only. Let’s try to visualize this, starting from the helpful warm-up provided by Hastings, a high-dimensional, entangled, locally-testable code…which encodes zero qubits (the code space is one-dimensional). Of course this is trivial, but it’s a warm-up!

The simplest instance to visualize has six physical qubits. To follow the forthcoming paragraphs, take a piece of paper and draw a large tetrahedron. If you didn’t mess up your tetrahedron should have six edges: these are your qubits. Now the parity checks are as follows. Each of the four faces specifies an -stabilizer which acts on the three edges forming the face. Each of the four vertices specifies a -stabilizer which acts on the three edges that touch the vertex. The resulting eight operators pairwise commute, and they specify a unique (entangled) state in the -dimensional physical space.
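If you would rather let a computer draw the tetrahedron, the claims in this paragraph can be verified in a few lines. The sketch below only tracks the supports of the checks, and it assumes the usual CSS convention: the face checks are of X type, the vertex checks of Z type, and an X-type and a Z-type check commute exactly when their supports overlap on an even number of qubits. The function name `gf2_rank` is mine.

```python
from itertools import combinations

vertices = range(4)
edges = list(combinations(vertices, 2))  # 6 edges = 6 qubits
faces = list(combinations(vertices, 3))  # 4 triangular faces

# Support of each check, as a set of edge (qubit) indices.
face_checks = [{i for i, e in enumerate(edges) if set(e) <= set(f)} for f in faces]
vertex_checks = [{i for i, e in enumerate(edges) if v in e} for v in vertices]

def gf2_rank(supports, n):
    """Rank over GF(2) of the indicator vectors of the given supports."""
    rows = [sum(1 << i for i in s) for s in supports]
    rank = 0
    for bit in range(n):
        pivot = next((r for r in rows if (r >> bit) & 1), None)
        if pivot is None:
            continue
        # Clear this bit everywhere (the pivot row itself becomes zero).
        rows = [r ^ pivot if (r >> bit) & 1 else r for r in rows]
        rank += 1
    return rank

overlaps = [len(f & v) for f in face_checks for v in vertex_checks]
all_commute = all(o % 2 == 0 for o in overlaps)

n_qubits = len(edges)
independent_checks = gf2_rank(face_checks, n_qubits) + gf2_rank(vertex_checks, n_qubits)
code_dimension = 2 ** (n_qubits - independent_checks)
```

Each family of four checks has exactly one redundancy (the product of all four is the identity, since every edge lies in two faces and touches two vertices), leaving six independent checks on six qubits and hence a one-dimensional code space.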

Next we’d like to understand “local” testability. This means that if we fix a set of edges, and act on each of them using an error, then the resulting operator should violate (anti-commute) with a fraction of -stabilizers that is proportional to the *reduced* weight of the error, i.e. its distance to the closest operator which commutes with all -stabilizers. To see which stabilizers “detect” the error , we recall that and which overlap at an even number of locations commute. Therefore a stabilizer will detect if and only if it lies in its *boundary* : the set of vertices which touch an odd number of edges in . This is our syndrome; it has a certain cardinality. To conclude we need to argue that can be modified into a set with no boundary, , and such that is as small as possible – ideally, it should involve at most as many edges as the size of the boundary . Here is how Hastings does it: for each vertex in the boundary, introduce an edge that links it to some fixed vertex – say the top-most one in your tetrahedron. Let be the resulting set of edges. Then you can check (on the picture!) that is boundary-less. Since we added at most as many edges as vertices in the boundary (if the top-most vertex was part of the boundary it doesn’t contribute any edge), we have proven local testability with respect to errors; errors are similar.
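The picture argument can also be checked exhaustively, since the tetrahedron has only 64 sets of edges. In the sketch below, “introducing an edge” is implemented as toggling it (a symmetric difference, i.e. addition modulo 2), which is my reading of the argument; the choice of vertex 0 as the fixed top-most vertex and the name `close_up` are likewise arbitrary.

```python
from itertools import combinations

vertices = range(4)
edges = list(combinations(vertices, 2))
apex = 0  # the fixed ("top-most") vertex

def boundary(edge_set):
    """Vertices touching an odd number of edges in edge_set: the syndrome."""
    return {v for v in vertices if sum(v in e for e in edge_set) % 2 == 1}

def close_up(edge_set):
    """Toggle the edge from each boundary vertex (other than the apex) to the apex."""
    added = {tuple(sorted((v, apex))) for v in boundary(edge_set) if v != apex}
    return set(edge_set) ^ added, added

results = []
for size in range(len(edges) + 1):
    for E in combinations(edges, size):
        closed, added = close_up(E)
        results.append((len(boundary(closed)) == 0, len(added) <= len(boundary(E))))

always_closed = all(is_closed for is_closed, _ in results)
few_edges_added = all(is_small for _, is_small in results)
```

Both flags are true for every one of the 64 edge sets: the modified set never has a boundary, and the number of toggled edges never exceeds the size of the syndrome.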

This was all in three dimensions. The wonderful thing is that the construction generalizes in a “straightforward” way to higher dimensions. Consider an $n$-element universe $U$. Qubits are all subsets of $U$ of size $n/2$; there are exponentially many of these. $Z$-stabilizers are defined for each $(n/2-1)$-element subset; each acts on all of its $n/2$-element supersets. Symmetrically, $X$-stabilizers are defined for each $(n/2+1)$-element set; each acts on all of its $n/2$-element subsets. Thus the code is local: each stabilizer has weight $n/2+1$, which is logarithmic in the number of qubits. It remains to check local testability; this follows using precisely the same argument as above (minus the picture…).
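The subset description lends itself to a direct check. The sketch below is a hedged illustration, not code from the paper: it assumes a universe of `n` elements with qubits indexed by the `k`-element subsets, “lower” checks indexed by `(k-1)`-element subsets acting on their supersets, and “upper” checks indexed by `(k+1)`-element subsets acting on their subsets. The function name `simplex_code` and the parameter names are hypothetical. A lower check and an upper check share either zero or exactly two qubits, so they always commute.

```python
from itertools import combinations
from math import comb

def simplex_code(n, k):
    """Qubits and check supports for the subset construction on an n-element universe."""
    universe = range(n)
    qubits = [frozenset(s) for s in combinations(universe, k)]
    lower = [frozenset(s) for s in combinations(universe, k - 1)]
    upper = [frozenset(s) for s in combinations(universe, k + 1)]
    # frozenset "<" is the proper-subset test
    lower_checks = [{q for q in qubits if a < q} for a in lower]
    upper_checks = [{q for q in qubits if q < b} for b in upper]
    return qubits, lower_checks, upper_checks

def checks_commute(lower_checks, upper_checks):
    """Opposite-type checks commute iff their supports overlap on an even number of qubits."""
    return all(len(lo & up) % 2 == 0 for lo in lower_checks for up in upper_checks)

n, k = 6, 3
qubits, lower_checks, upper_checks = simplex_code(n, k)
weights = {len(c) for c in lower_checks + upper_checks}

print(len(qubits) == comb(n, k), weights, checks_commute(lower_checks, upper_checks))
# True {4} True
```

Calling `simplex_code(4, 2)` recovers the tetrahedron: six qubits, four vertex checks and four face checks, each of weight three.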

This first construction encodes zero qubits. How about getting a couple? Hastings gives a construction which achieves this and remains (poly-logarithmically) locally testable. The idea, very roughly, is to make a toric code by combining together two copies of the code described above. The number of encoded qubits will become non-trivial and local testability will remain. Unfortunately, just as for the toric code, the distance of the resulting code only scales as the square root of the number of qubits. To construct his code Hastings uses a slightly different cellulation than the above-described one. I am not sure precisely why the change is needed, and I defer to the paper for more details. (Leverrier, Tillich and Zémor had earlier provided a construction, based on the TZ hypergraph product, with linear rate, square root distance, and local testability up to the minimal distance, i.e. for all errors of reduced weight at most on the order of $\sqrt{n}$.)

Although the geometric picture takes some effort to grasp, I find these constructions fascinating. Given the Brandão-Harrow objections to using the most “straightforward” expander constructions to achieve CSP-qPCP, or even NLTS, it seems logical to start looking for combinatorial structures that have more subtle properties and lie at the delicate boundary where both robustness (in terms of testability) and entanglement (non-triviality of ground states) can co-exist without challenging monogamy.

]]>Fifteen Caltech students, with a roughly equal mix of physics/CS/EE backgrounds, followed the course till the end (we started at ~20). We had a great time, but integration with the online course proved more challenging than I anticipated. Let me say why, in the hope that my experience could be useful to others (including myself, if I repeat the course).

The EdX content was released in 10 weekly updates, on Tuesdays. Since on-campus classes took place Tuesdays and Thursdays, I asked Caltech students to review the material (videos+lecture notes+quizzes) made available online on a given Tuesday by the following Tuesday’s class. I would then be able to structure the class under the assumption that the students had at least some minimal familiarity with the week’s concepts. This would allow for a more relaxed, “conversational” mode: I would be able to react to difficulties encountered by the students, and engage them in the exploration of more advanced topics. That was the theory. Some of it worked out, but not quite as well as I had hoped, and this for a variety of reasons:

- **There was a large discrepancy in the students’ level of preparation**. Some had gone through lecture notes in detail, watched all videos, and completed all quizzes. Although some aspects of the week’s material might still puzzle them, they had a good understanding of the basics. But other students had barely pulled up the website, so that they didn’t even really know what topics were covered in a given week. This meant that, if I worked under the assumption that students already had a reasonable grasp of the material, I would lose half the class; whereas if I assumed they had not seen it at all I would put half the class to sleep. As an attempted remedy I enforced some minimal familiarity with the online content by requiring that weekly EdX quizzes be turned in each Tuesday before class. But these quizzes were not hard, and the students could (and did) get away with a very quick scan through the material.
- As with all students, but, I hear, even more so here, **Caltech undergraduates generally (i) do not show up in class, and (ii) if perchance they happen to land in the right classroom, they certainly won’t participate**. In an attempt to *encourage* attendance I made homeworks due right before the Tuesday 10:30am class, the idea being that students would probably turn in homeworks at the last minute, but then they would at least be ready for class. Bad idea: as a result, students ended up spending the night on the homework, dropping it off at 10:29:59… only to skip class so as to catch up on sleep! Slightly over half of the registered students attended any given class, a small group of 8-10 on average. This made it harder to keep participation up. On the whole it still went pretty well, and with a little patience, and insistence, I think I eventually managed to instill a reasonably relaxed atmosphere, where students would naturally raise questions, submit suggestions, etc. But we did not reach the stage of all-out participation I had envisioned.
- **The material was not easy**. This is partially a result of my inexperience in teaching quantum information; as all bad teachers do I had under-estimated the effort it takes to learn the basics of kets, bras and other “elementary” manipulations, especially when one has but an introductory course in undergraduate linear algebra as background. Given this, I am truly amazed that the 15 students actually survived the class; they had to put in *a lot* of work. Lucky for me there are bright undergraduates around!

We ended the course with projects, on which the students did amazingly well. In groups of 2-3 they read one or more papers in quantum cryptography, all on fairly advanced topics we had not covered in class (such as relativistic bit commitment, quantum homomorphic encryption, quantum bitcoin, and more!), wrote up a short survey paper outlining some criticisms and ideas they had about what they had read, and gave an invariably excellent course presentation. From my perspective, this was certainly a highlight of the course.

Given these observations on what went wrong (or at least sub-optimally), here are a few thoughts on how the course could be improved, mostly for my own benefit (I hope to put some of these to good practice in a year or two!). This should be obvious, but: **the main hurdle in designing a “flipped classroom” is to figure out how to work with the online content**:

- First there is a scheduling difficulty. Some students complained that by having to go through the weekly videos and lecture notes *prior* to the discussion of the material in class they simultaneously had to face two weeks’ worth of content at any given time. Scheduling of online material was decided based on other constraints, and turned out to be highly sub-optimal: each week was released on a Tuesday, which was also the first day of class, so that it was unreasonable to ask the students to review the material before that week’s classes, pushing it to the next week and resulting in the aforementioned overlap. A much better schedule would have been to e.g. release material online on Friday, and then have class on Tuesdays and Thursdays. This would have led to a larger overlap between the online release and the in-class discussion of each week’s material, and less schizophrenia.
- Then comes the problem of “complementarity”. What can be done in class that does not replicate, but instead enriches, the online material? This is made all the more difficult by the inevitable heterogeneity in the students’ preparation. An effort has to be made to limit this by finding ways to enforce the students’ learning of the material. For instance, each class could be kick-started by a small presentation by one of the students, based on one of the online problems, or even by re-explaining (or, explaining better!) one of the week’s more challenging videos. This should be made in a way that the students find it valuable, both for the presenter and the listeners; I don’t want the outcome to be that no one shows up for class.
- Student-led discussions usually work best. They love to expose their ideas to each other, and improve upon them. This forces them to be active, and creative. The best moments in the class were when the discussion really picked up and the students bounced suggestions off each other. The existence of the online material should facilitate this, by giving a common basis on which to base the discussion. My approach this time wasn’t optimal, but based on the experience I think it is possible to do something truly engaging. But it won’t work by itself; one really has to design incentive-based schemes to get the process going.

Success of the online course is rather hard to judge. At the end of the course there were about 8000 officially registered students. Of these, EdX identified ~500 as “active learners” over the last few weeks (dropping from ~1500 over the first few weeks, as is to be expected). I think an active learner is roughly someone who has at least watched some parts of a video, answered a quiz or problem, participated in a forum, etc.

About 100 students pursued an official certificate, which means that they paid ~$50 to have their success in the course officially registered. I couldn’t figure out how many students have actually “passed” the class, but I expect the number to be around 200: most of the certified students plus a few others who didn’t want to pay for the certificate but still turned in most homeworks. This is a fair number for a challenging specialized course; I am pretty happy with it. The high initial enrollment numbers, together with anecdotal evidence from people who got in touch directly, indicate that there certainly is demand for the topic. The most active students in the course definitely “wanted in”, and we had lots of good questions on the forum. And, many, many typos were fixed!

How satisfied were the students with the course? We ran an “exit survey”, but I don’t have the results yet; I can write about them later (hoping that a significant enough number of students bother to fill in the survey). We also had pre- and mid-course surveys. Some of the more interesting questions had to do with how students learn. In my opinion this is the main challenge in designing a MOOC: how to make it *useful*? Will the students learn anything by watching videos? Anecdotal evidence (but also serious research, I hear) suggests not. Reading the lecture notes? Maybe, but that requires time and dedication – basically, to be an assiduous learner already. Just as with “in-the-classroom” learning, it is the problem-solving that students are brought to engage in that can make a difference. Students like to be challenged; they need to be given an active role. In the mid-course survey many of the positive comments had to do with “Julia lab” assignments that were part of the course, and for which the students had to do some simple coding that let them experiment with properties of qubits, incompatible measurements, etc. In the pre-course survey students also indicated a marked preference for learning via solving problems rather than by watching videos.

So **a good online MOOC should be one that actively engages the student’s problem-solving skills**. But this is not easy! Much harder than recording a video in front of a tablet & webcam. Even though I was repeatedly told about it beforehand, I learned the lesson the hard way: homework questions have to be vetted *very thoroughly.* There is no end to a student’s creativity in misinterpreting a statement – let alone 1000 students’. Multiple-choice questions may sound straightforward, but they’re not: one has to be very careful that there is exactly one straight correct answer, while at the same time not making it too obvious which is that answer; when one has a solution in mind it is easy not to realize that other proposed supposedly wrong solutions could in fact be interpreted as correct. The topic of cryptography makes this particularly tricky: we want the students to reason, be creative, devise attacks, but the multiple-choice format limits us in this ability. Luckily we had a very dedicated, and creative, team of TAs, both in Delft and at Caltech, and by working together they compiled quite a nice set of problems; I hope they get used and re-used.

It’s too early (or too late) for conclusions. This was a first, and I hope there’ll be a second. The medium is a challenge, but it’s worth reaching out: we teach elite topics to elite students at elite institutions, but so many more have the drive, interest, and ability to learn the material that it would be irresponsible to leave them out. MOOCs may not be the best way to expand the reach of our work, but it is one way…to be improved!

It was certainly a lot of fun. I owe **a huge thank you to all the students**, in the classroom and online, who suffered through the course. I hope you learned a lot! Second in line were **the TAs**, at Caltech as well as Delft, who did impressive work, coping simultaneously with the heavy online and offline duties. They came up with a great set of resources. Last but not least, behind the scenes, the **video production** and **online learning teams**, from Delft and Caltech, without whose support none of this would have been made possible. Thanks!

There are many possible measures against which to evaluate the experience. An easy one is raw numbers. Online there are a bit over 7,200 students enrolled. But how many are “active”? The statistics tools provided by EdX report 1995 “active” students last week – under what definition of “active”? EdX also reports that 1003 students “watched a video”, and 861 “tried a problem”. What is an active student who neither watched a video nor tried a problem – they clicked on a link? In any case, the proportion seems high; from what I heard a typical experience is that about 2-5% of registered students will complete any given online course. Out of 7,000, this would bring the number of active students by the end of the course to a couple hundred, a number I would certainly consider a marked success, given the specialized topic and technically challenging material.

At Caltech there are 20 students enrolled in CS/Phys 120. Given the size of our undergraduate population I also consider this to be a rather high number (but the hard drop deadline has not passed yet!). It’s always a pleasure to see our undergraduates’ eagerness to dive into any exciting topic of research that is brought to their attention. I don’t know the enrollment for TU Delft, but they have a large program in quantum information so the numbers are probably at least twice as high.

Numbers are numbers. How about enthusiasm? You saw the word cloud we collected in Week 0. Here is one from Week 2 (“What does “entanglement” evoke in you right now?”; spot the “love” and “love story”; unfortunately only 1% of responses for either!). Some of the students speak up when prompted for simple feedback such as this, but the vast majority remain otherwise silent, so that involvement is hard to measure. We do have a few rather active participants in the discussion forums, and it’s been a pleasure to read and try to answer their creative questions each day – dear online learners, if you read this, thanks for your extremely valuable comments and feedback, which help make the course better for everyone! It’s amazing how even as we were learning qubits some rather insightful questions, and objections, were raised. It’s clear that people are coming to the course from a huge range of backgrounds, prompting the most unexpected reactions.

A similar challenge arises in the classroom. Students range from the freshmen with no background in quantum information (obviously), nor in quantum mechanics or computer science, to more advanced seniors (who form the bulk of the class) to graduate students in Caltech’s Institute for Quantum Information and Matter (IQIM). How to capture everyone’s attention, interest, imagination? The topic of cryptography helps: there is so much to be fascinated with. I started the course by discussing the problem of quantum money, which has the advantage of easily capturing one’s imagination, and for which there is a simple quantum scheme with a clear advantage over classical (cherry on top, the scheme is based on the famous “BB84 states” that will play a major role in the class via their use for quantum key distribution). So newcomers to quantum information could learn about qubits, kets and bras, while others could fend off their impatience by imagining new schemes for public-coin quantum money.

This is not an easy line to tread however, and given the composition of the class I eventually decided to err on the side of caution. Don’t repeat it, but this is my first time ever teaching a full class on quantum information, and the basic concepts, not to mention the formalism, can be quite tricky to pick up. So we’re going to take it slow, and we’ll see how far we get. My hope is that the “flipped classroom” format should help needy but motivated students keep afloat by making all the online material available before it is discussed in class. Since the online course has only been going on for a couple weeks I can’t yet report on how well this will work out; my initial impression is that it is not a given that the in-class students actually do spend enough time with the online material. I have yet to find the proper way to incentivize this: quizzes? rewards? The best reward should be that they manage to follow the course.

In the coming weeks we’ll start making our way towards quantum key distribution and its analysis. Entanglement, measuring uncertainty, privacy amplification, BB84 and Ekert, and device independence. Quite a program, and it’s certainly nice to attempt it in such pleasant company!

]]>Note the top contender: let’s see if we live up to their expectations!

It’s been a fun first week. We released “Week 0” of the material a month ahead of the official start date, so that those students not yet familiar with the basics of quantum information (what is a qubit?) would have enough time to digest the fundamental notions we’ll be using throughout. (Out of the ~5500 registered students, ~1250 are currently marked by EdX as “active” and 560 have watched at least one video. Keep it coming!)

An unexpected benefit of opening up the platform ahead of time is that it is giving us the time to experiment with (read: debug) some of the tools we plan to use throughout. A first is EdX’s own possibilities for interaction with the students, an example of which is pictured above (“Use the space below to enter a few words that best characterize your expectations for this class”). But we’re also using a couple add-ons:

The first is a system replacing EdX’s discussion forums, called Askalot. Cute name – how can you resist. The main benefit of Askalot is that it provides more structure to the discussions, which can be characterized as question/bug report/social discussion/etc, can be up/down-voted, marked as correct/invalidated by the instructors, etc. The students are having fun already, introducing themselves, complaining about bugs in the quizzes, and, of course, about Askalot itself! (Thanks go to Ivan Srba, one of the creators of Askalot, for being extremely responsive and fixing a host of minor bugs overnight – not to mention throwing in the extra feature requested by the students.)

A second is called DALITE. The idea of DALITE is to encourage students to provide an explanation justifying their answer to a question. Indeed, one of the drawbacks of the online platform is the need for automatic grading of assignments, which greatly limits how the student can be engaged in the problem-solving exercise, mostly limited to multiple-choice or simple numeric answers. DALITE (which grew out of, and is still, a serious research project in online learning) introduces a fun twist: the student is asked to type in a “rationale” for her choice. Of course there is no way we could grade such rationales. But here is the idea: once the student has entered her explanation, she is shown the rationale provided by another student (or the instructor) for a different answer, and asked whether she would like to reconsider her decision. The student can choose to change her mind, or stick with her initial choice; she is asked to explain why. It’s really fun to watch the answers provided (“What would happen if quantum mechanics allowed cloning of arbitrary states?”), the changes of mind that take place, and the rationale that incentivized said change of mind. (Thanks to Sameer Bhatnagar for helping us set up the tool and explaining its many possibilities, and to Joseph Williams for suggesting its use in the first place.)

We’ll see how these pan out in the longer run. I’m definitely eager to experiment with ways to make the MOOC experience a better learning experience for the students. I’ll let you know how it goes. Suggestions welcome!

PS: 47% of students 25 and under, 78% from outside the US, is good, but 16% female is not…come on!

]]>

**Why?**

More often than not the first question I get asked is – why? *Why* did you decide to do this?? As a teacher you’re used to torturing a couple dozen students each semester… but now, not only do you want to ramp it up and torture thousands at a time, but you also want to torture *yourself*???

Let’s see. I can find at least two reasons. The first is rather selfish: I’m curious. I want to give it a try. My interest in the possibilities of online education, as is the case for many of us, started around 2011, during the “big boom” that followed the success of Sebastian Thrun and Peter Norvig’s Artificial Intelligence MOOC at Stanford. It is through discussing the experience of my Ph.D. advisor, Umesh Vazirani, who taught one of the early MOOCs (and still the only one on quantum computing), that I got more seriously interested in the medium. As with everything he does, Umesh took the course to heart, and taught it passionately, striving to pack his signature “nuggets” of insight into the rigid format of a 10-minute video meant to be accessed “massively online”. Umesh had great hopes for the medium, and his course was wildly successful. It has certainly been cited as their main source of education in quantum information by many a Caltech Ph.D. applicant!

I have the impression that the initial explosion of interest in MOOCs that took place in 2012-2014 has subsided somewhat, and the medium may be going through a phase of soul-searching: given that it is now clear MOOCs will not cure all the world’s ills, can they at least be useful for *something*? The “online education” medium certainly has its own challenges. As a friend working in the psychology of learning put it somewhat bluntly, “students don’t learn *anything* from videos – don’t waste your time” (too late, Joseph – I had started recording already). There is some truth to this (though as long as we don’t forget the 2x button there’s still a chance the students might condescend to skimming through some fraction of the videos in a quick hunt for hints towards the pset solutions).

The point though is that, broadly interpreted, “online education” should have much more potential for engaging the students than simply dumping video clips on them. This is what I’m interested in exploring: the extent to which it is possible to set up a stimulating, useful interactive process for the students, with some necessary “downtime” spent reading notes and working problems on their own, and some “uptime” spent watching videos but also participating in forums, thinking through stimulating questions, checking out other students’ answers, etc.

The second reason is perhaps less self-centered. I think we picked the right topic. A two-word title, each word of which is bound to associate with a whole range of exciting, if nebulous, concepts in the mind of any young apprentice-scientist: what is not to like? Quantum cryptography has recently (and somewhat justifiably) been drawing a lot of attention, from the expanding range of start-ups to the NSA through Chinese rockets. But surprisingly few resources are available to whoever seeks a serious introduction to the subject. While great books on quantum computing and quantum information are popping up, none gives much attention to quantum cryptography. There is no book on quantum cryptography, few if any lecture notes (I searched, and the only consistent notes I could find are from a course taught by Unruh in 2014), few surveys (see this recent one by Broadbent and Schaffner, which however remains at a very high level), and a scientific literature that is not easy to navigate.

**What?**

So this is what Stephanie & myself set out to do: share our enthusiasm for quantum cryptography with you, young enthusiasts! What will the course be about? If you care you should watch the promo video, sign up, and take the course!

We hope to use the ten weeks we have to take the absolute beginner in quantum information and cryptography to the level where she has a solid conceptual *and* mathematical understanding of quantum cryptography. A large chunk of the course will be devoted to the description and security analysis of protocols for quantum key distribution (QKD). This includes the thorny notions associated with measuring security with respect to quantum adversaries, and the recent paradigm of device independence. Beyond QKD, we also aim to give a broad overview of other feats of quantum cryptography, including primitives in two-party cryptography such as bit commitment or coin flipping, the noisy storage model, position-based cryptography, and delegated computation. Unfortunately we will not have enough time to cover quantum attacks on classical cryptosystems or post-quantum cryptography, leaving space for a companion “quantum cryptanalysis” course – volunteers?

A word of caution: a solid background in linear algebra will be required for the course. Or at least, a will to put in the serious effort that will be required to follow the material. Making the course accessible does not mean dumbing it down, and the less mathematically inclined might find it challenging. The upshot, though, is that whoever sticks around will find it an intellectually rewarding experience, as the course will bring you to a stage where you’re basically ready to start doing research in the area. Not to scare everyone away: the course will start slow, with a “Week 0” of background material rolling out now, and the first couple weeks devoted to an introduction to the important notions of quantum information. My a priori assessment is that any 3rd or 4th year Caltech undergrad should have no problem taking the course. We’ll see how it plays out, but you should certainly try!

I will be teaching the course “inverted classroom” style at Caltech (as will Stephanie in Delft). The students will be asked to watch the videos, and more generally go through the online content, ahead of regular class meetings. In class, depending on the level of the students and their understanding of the material we will provide additional explanations or – as I hope will be possible – go further and develop our own projects extending the online material in directions of interest to the students.

This is a new experience for me, and I already stumbled through some of the beginners’ mistakes. But it’s fun! And, I hope, worth it. I’ll keep you updated. If anyone has experience (as I’m sure some of you do), unsolicited advice is more than welcome!

]]>