## Foundations of randomness, Day 3

Last day! Wednesday started with a bang. Yevgeniy Dodis spent the whole Tuesday evening (our workshop dinner at Lanzerac!) complaining that we weren’t leaving him enough time to prepare for his talk…only to spend an equal amount of time during his talk complaining that he wasn’t going to be able to cover half of the material he had prepared! (To view things positively, this says something about the level of interaction during the talks — Yevgeniy was flooded with questions; quite a show of force for a post-workshop dinner morning.)

### Dodis – Randomness in cryptography

I regret not having saved a picture of the table Yevgeniy drew on the whiteboard for his talk. There were four columns: no randomness, weak randomness, local or public randomness, and perfect randomness. For each column Yevgeniy listed four cryptographic scenario: soundness (for which classes of computational problems are there sound interactive proof systems), authentication (secure implementation of primitives such as message authentication or digital signatures), privacy (tasks such as bit commitment, secret sharing, or even public- or private-key encryption) and extraction (is deterministic extraction of uniform randomness possible). The scenario are listed in increasing order of difficulty, and the question is how far each kind of randomness lets us go down the list.

The first and last columns are straightforward to fill in. In the first, corresponding to a world with no randomness, there is very little one can do. In terms of soundness it is possible to verify languages in NP, but no more; cryptographic tasks are clearly impossible (determinism does not allow for uncertainty — at least for this talk!), and certainly extraction does not make much sense. If, on the other hand, perfect randomness is available many feats can be accomplished. It is a milestone of complexity theory that all languages in PSPACE have interactive proof systems; the hard work of cryptographers over the past couple decades demonstrates an impressive range of cryptographic tasks that can be implemented with or without computational assumptions. And of course extraction is not a problem.

Things become interesting with the second column. Suppose given access to a source of randomness taken from a family of sources which have a guaranteed min-entropy rate, but from which it is not possible to deterministically extract uniformly random bits (such as Santha-Vazirani sources). Can such a source still be used to implement non-trivial cryptographic tasks?

Here Yevgeniy drew our attention to an interesting distinction between authentication and privacy. Indeed, the latter is much more demanding in terms of the “quality” of the randomness used. For an adversary to break an authentication task, such as a message authentication code (MAC), it is required that the adversary be able to forge a certain object — e.g. a fake signature for an unknown message for which another valid signature is known. In contrast, for the adversary to break a privacy task, such as encryption, it is sufficient for the adversary to distinguish between two distributions — encryptions of ${0}$ and encryptions of ${1}$. Thus at a high level it is not too surprising that it should be possible to implement authentication tasks in a way that is secure as long as the sources of randomness used to implement the task has a guaranteed min-entropy rate. Privacy tasks, on the other hand, seem to require much more structure from the random source. In fact, if one thinks about it the task of secure encryption is very closely related to that of extraction, and one may readily expect that privacy can only be achieved if the family of sources allows for deterministic extraction. This connection is made formal in a paper by Bosley and Dodis.

Returning to his second column, Yevgeniy introduced the notion of an ${(t,\delta)}$-expressive source, which informally is a class of sources such that for any two efficiently computable functions ${f}$ and ${g}$, if ${f}$ and ${g}$ differ on a significant (at least ${2^{-t}}$) fraction of inputs (measured under the uniform distribution) then there is a source in the family such that the expected statistical distance, when evaluated on that source, between ${f}$ and ${g}$ is at least ${\delta}$. That is, a family is expressive if it “separates” pairs of functions that differ significantly. Now the main results are:

1. Many common families of sources are expressive — including all families from which deterministic extraction is possible, but also e.g. Santha-Vazirani sources and various generalizations thereof.
2. No privacy task can be achieved if the only available randomness is taken from an (a priori unknown) source from an expressive family. Proof sketch: consider an encryption scheme. Of course encryptions of ${0}$ and encryptions of ${1}$ should be different most of the time (otherwise decryption will fail). Therefore by the definition of an ${(t,\delta)}$-expressive family, for say ${t=0}$ or ${t=1}$, there should be a source in the family under which both types of encryption can be efficiently distinguished — contradicting security.
3. Some authentication tasks, including MACs and digital signature schemes, can be implemented from some specific expressive families, such as Santha-Vazirani sources or their generalization block min-entropy sources.

The proof of the second point above is trivial, but the result is quite interesting, highlighting the importance of making the right definitions! I am not familiar at all with this area, but it seems that the notion of expressive family of sources is a meaningful one to have in mind. Now I only wish Yevgeniy had time to say something about the third column (public vs private randomness)…unfortunately it will have to be for another time.

### Fawzi – Exams

The following talk was given by Omar Fawzi, soberly entitled “Exams” (last day…it had to come!). Omar considers the following scenario. Suppose an exam consists of ${n}$ possible questions ${X_i}$, and a student has some information, or knowledge, ${B}$ based on which she attempts to answer the questions that are asked to her. Suppose the student achieves a certain score when asked about a random subset of ${k}$ of the ${n}$ possible questions — technically, given indices ${\{i_1,\ldots,i_k\}}$ the student produces guesses ${A_{i_1}(B),\ldots,A_{i_k}(B)}$ such that ${A_{i_j}(B) = X_j}$ for a fraction ${\Delta}$ of indices ${j\in\{1,\ldots,k\}}$, on average over the (uniformly random) choice of the ${k}$ indices and the student’s randomness (including the random variable ${B}$ and the map ${A}$).

Can we deduce that the student simultaneously “knows about” a fraction ${\Delta}$ of all possible questions in the exam — if she was asked about all ${n}$ possible questions at once, would she typically achieve the same score as when asked only ${k}$ of them? To make this a little more precise and see why it is not obvious, consider the following simple scenario. The correct answers ${X_i}$ are either all ${0}$, or all ${1}$, each with probability ${1/2}$. The student’s information ${B}$ consists of two bits; the first bit ${B_1 = X_i}$ (for all ${i}$), and the second bit ${B_2}$ is an independent coin flip. The student’s strategy is to always answer ${B_1\oplus B_2}$. Note that this student is not “optimal”, as she has “good days” (${B_2=0}$) on which she gets everything right, and “bad days” on which she scores ${0}$. Our task is not to extract an optimal student, but to reproduce this particular student’s behavior at the scale of the whole exam: we want a strategy that gets the whole exam right half the days, and the whole exam wrong the remaining time — not a strategy that gets half the exam right on every day! So you see that the most natural “rounding” procedure, which would decompose the set of ${n}$ questions into ${(n/k)}$ groups of ${k}$ and look up the student’s answers to each group, will not work. One needs to take into account the randomness in ${B}$ and in the student’s strategy to extract a global strategy that is consistent with the local behavior.

The scenario is reminiscent of what are known as “direct product tests” in the PCP literature, and there is a long sequence of works on the topic, see e.g. Dinur and Reingold, Impagliazzo et al. and Dinur and Steurer. As far as I could tell there are differences that make the problems incomparable: for instance, direct product tests attempt to match two consistent local views to a global object, whereas here the exam is guaranteed to pre-exist; direct product tests consider complete agreement instead of fractional agreement ${\Delta}$ here; etc. But the problems are similar enough that you’d expect techniques used to approach the one could be useful for the other as well. For anyone interested, see Omar’s slides for a more precise formulation of the problem he considers.

The situation is even more interesting in the quantum setting. Suppose the exam remains classical (though one could consider quantum exams as well…), but the student’s information ${B}$ is a quantum state. Then all we know is that for every set of ${k}$ questions the student has a measurement on ${B}$ that produces answers with a certain success rate. Can these measurements be combined in a way to provide answers to all questions in the exam simultaneously? Since any measurement will necessarily affect the student’s state we can’t simply repeatedly measure ${B}$, and monogamy prevents us from taking many copies of the state (which couldn’t all simultaneously be correlated with ${X}$ in the appropriate way). The problem has a very similar flavor to the one faced when proving security of randomness extractors against quantum side information. Again there are subtle differences, but one may expect some of the techniques developed in that area (such as the work of Koenig and Renner on sampling-based extractors) to be useful here.

The day ended with two afternoon talks each highlighting a number of open questions in device-independent randomness certification, and some leeway that the speakers have recently achieved. This left us with a solid set of take-home problems to think about!

### Pironio – Some open questions on DI randomness generation

The first talk was given by Stefano Pironio, who picked three of his favorite open questions in device-independent randomness, and had time to tell us about two of them in some detail.

Stefano’s first question asks whether it is possible to construct deterministic extractors for quantum sources. Although it is unclear whether this would have any application, it is a very natural question. The main observation is that the “raw” random bits that are produced in a device-independent randomness extraction procedure (say, the outputs of Alice’s device in a protocol which sequentially repeats the CHSH game) not only have a guaranteed min-entropy rate, but they also have additional structure that comes from the way they are produced. What this structure is, and whether it is sufficient to guarantee deterministic extraction, is precisely the content of the problem. Stefano showed us a specific inequality that he could show always holds, and hoped would suffice: if ${A_i\in\{-1,1\}}$ is the ${i}$-th bit output by the source, then the average of all correlators

$\displaystyle \frac{1}{2^n}\Big| \,\sum_{S\subseteq [n]}\, \big|\langle \Pi_{i\in S} A_i \rangle\big|\, \Big| \,\leq\, 2^{-\delta n},$

for some small ${\delta}$ depending on the CHSH violation observed. In fact Stefano claimed the inequality is already enough to show, via the probabilistic method, that a deterministic extractor should exist; thus it remains to find one…This is an appealing question, and I wouldn’t be surprised if it can be solved by the pseudorandomness experts out there!

The second problem Stefano presented has to do with the relation between randomness and entanglement. Suppose that the devices used in a randomness expansion procedure share ${n}$ “ebits”, or entangled bits. How many bits of randomness can be generated? This is both of theoretical and practical interest: entanglement is hard to come by, and of course we want to get the most out of whatever we have. All known protocols generate ${O(n)}$ random bits from ${n}$ ebits, but there is no reason for this to be a hard limit; intuitively entanglement is required for testing the devices, but less directly so for generating randomness (see also the next talk by Acin). And indeed, Stefano showed us that it is possible to go further: he has a protocol which generates ${n^{1+c}}$ randomn bits, for some small ${c}$, using ${n}$ ebits.

The last question, on which we unfortunately did not have time to go into, is of obtaining guarantees on the randomness that hold against “post-quantum” adversaries, limited only by the no-signaling principle. This is a fascinating problem on which very little is known. There are no-go results for privacy amplification in this setting, but they only apply in rather stringent scenario; moreover they pose no immediate obstacle to guaranteeing entropy, rather than uniform randomness. The problem is challenging because the general non-signaling framework gives us very little to work with: a family of conditional distributions, but no such thing as a state, a measurement, a post-measurement state, etc.; very few tools are available. On the other hand it makes the problem mathematically very clean, almost “classical” (no more Hilbert spaces, no more quantum!). I only wish I had more intuition for it!

### Acin –  How much randomness can be certified in quantum and general non-signalling theories?

In the last talk of the workshop Antonio Acin asked the question — “How much randomness can be certified in quantum and general non-signaling theories?” — and gave us some rather counter-intuitive answers. As already mentioned in Stefano’s talk, the prevailing intuition is that the amount of randomness that can be extracted is directly related to the amount of “quantumness” present in the devices, where “quantumness” would be directly related to entanglement, since entanglement is clearly the “non-classical” feature exploited here. But this need not be so: even though entanglement is necessary for certifying the device, it is not a priori required that it serve as a resource that gets depleted as more and more randomness gets produced.

As a first result Toni showed how more randomness could be extracted from a single measurement, on a single entangled EPR pair, by considering general POVM measurements instead of projective ones. It is known (and not hard to show) that making POVM measurements on an EPR pair can only yield at most one bit of randomness per site. Toni introduced a four-outcome Bell inequality such that a maximal violation of the inequality requires the use of a non-projective four-outcome measurement on one half of the EPR pair, and a state such that each of the four outcomes has equal probability ${1/4}$ when the measurement is performed — thus yielding two uniformly random bits from a single qubit!

This is unexpected, but perhaps not groundbreaking: one bit, two bits, ok. But then Toni dramatically scaled things up. His second result is that by performing repeated measurements on a single, constant-size entangled state (I think in his scheme this is again just a pair of qubits) one could generate an infinite sequence of random bits, containing an amount of entropy which certifiably goes to infinity — provided the requisite Bell violation is guaranteed.

If you’re wondering about the relationship between this result and the one presented in the talk by Henry Yuen on infinite randomness expansion, there are some important differences. First of all Toni’s result applies under the assumption that the device maximally violates a certain Bell inequality, but there are no bounds on the accuracy required: thus in principle testing for a sufficiently-close-to-optimal violation could require an arbitrarily large number of runs; in Henry’s scheme the testing is fully accounted for. On the other hand Henry’s scheme requires the devices to have an unbounded number of EPR pairs available, whereas the main point of Toni’s scheme is that a single pair can be re-used over and over again.

The result is a proof of concept (another limitation is that the Bell inequality requires to perform a measurement on one of the devices that has an exponential, in the number of random bits produced, number of possible settings — the other device being measured sequentially using measurements with two settings), but I again find it rather unexpected. The whole area of device independence is turning out to be a great testbed for many of our (bad) intuitions about quantum mechanics and nonlocality!

Toni ended his talk with a third intriguing result, relating algorithmic randomness to information-theoretic randomness as we usually understand it. The main observation is the following. Suppose a classical Turing machine is used to prepare a sequence of qubits where each qubit is either initialized in one of the two possible computational basis states, or each qubit is initialized in one of the two possible basis states in the conjugate basis, the Hadamard basis. Then there is an algorithmic procedure that always succeeds in distinguishing which is the case! Note that the Turing machine may very well be producing a sequence that for all practical matters appears fully random, so that one can for instance show that no procedure which makes a single choice of basis and measures all qubits in that basis will be able to tell if it got the right basis or not. Instead the trick is to use both bases: measure half the qubits in the computational basis, and half in the Hadamard basis. In this case, one of the sequences obtained will be uniformly random for sure, and the other will be the sequence produced by the Turing machine — and as it turns out, it is possible to distinguish the two algorithmically. See the paper for more details, and further results along those lines.

The day ended with a very brief “open problems session”: Toni wrote down some of the questions that were raised during the workshop and we all commit to solve by the next iteration. I’ll leave you with a picture of the board. There are certainly many other questions we could ask, some of which are interspersed in these blog posts; feel free to suggest more!