## FOCS 2012

*The conference hotel*

FOCS 2012 just took place in New Brunswick, and it was a lot of fun. Instead of commenting on the food, which was already debated enough on site, here are a few randomly selected highlights from the talks.

Constructive Discrepancy Minimization by Walking on the Edges (paper). Raghu Meka was the uncontested star of the conference, with five papers, four of which he presented himself, at one point running from the “track A” room to the “track B” one in the middle of a session in order to give two of his talks!

This particular paper was my personal favorite. The goal is to find an algorithmic variant of Spencer’s famous “Six Standard Deviations Suffice” theorem, which can be formulated as follows. Suppose we are given ${n}$ subsets ${S_j}$ of ${\{1,\ldots,n\}}$. Then there exists a choice of signs ${x_i\in\{\pm 1\}}$ such that, for each ${j}$, ${|\sum_{i\in S_j} x_i | \leq 6\sqrt{n}}$. The main result of Shachar Lovett and Raghu Meka is a randomized polynomial-time algorithm that, given the sets ${S_j}$, finds a sequence ${(x_i)}$ whose discrepancy ${\max_j|\sum_{i\in S_j} x_i |}$ is bounded by a constant times ${\sqrt{n}}$ (though the constant may be bigger than ${6}$…).

Their algorithm is very nice. Consider the polytope ${\mathcal{P}\subseteq {\mathbb R}^n}$ that is bounded by two different types of faces: the “cube” faces, which enforce ${|x_i|\leq 1}$, and the “discrepancy” faces, of the form ${|\sum_{i\in S_j} x_i |\leq 100\sqrt{n}}$. The algorithm is a random walk that proceeds as follows. Start at the origin. At every step, move by a random Gaussian amount: ${\mathcal{N}(0,1)}$ along each of the ${n}$ coordinate axes. Whenever the walk hits a face (or gets very close to one), it continues inside the subspace defined by that face.

How does one analyze this walk? The main point is to show that many “cube” faces will be hit before too many “discrepancy” faces are. Indeed, every time we hit a “cube” face we are happy, as we have found a coordinate ${x_i}$ that can be set to ${\pm 1}$ (instead of having a fractional value). Provided we can show that, say, half of the ${x_i}$ can be determined in this way in a reasonable amount of time, we can then recursively apply the algorithm to the remaining coordinates to eventually determine all of them.
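As a toy illustration of the walk, here is my own Python sketch; the step size, the stand-in discrepancy bound, the step cap, and the crude handling of frozen coordinates are my illustrative choices, not the parameters or the exact projection step from the paper.

```python
import math
import random

def edge_walk(sets, n, step=0.02, disc_bound=None, max_steps=20000, seed=0):
    """Toy sketch of the Lovett-Meka edge walk (parameters are illustrative)."""
    rng = random.Random(seed)
    if disc_bound is None:
        disc_bound = 8 * math.sqrt(n)  # stand-in for the 100*sqrt(n) of the text
    x = [0.0] * n
    frozen = [False] * n   # coordinates fixed at +-1 ("cube" faces hit)
    tight = set()          # indices of "discrepancy" faces hit
    basis = []             # orthonormalized normals of the tight faces
    for _ in range(max_steps):
        if all(frozen):
            break
        # Gaussian step on the free coordinates...
        g = [0.0 if frozen[i] else rng.gauss(0, 1) for i in range(n)]
        # ...projected orthogonally to the tight discrepancy faces
        for b in basis:
            d = sum(gi * bi for gi, bi in zip(g, b))
            g = [gi - d * bi for gi, bi in zip(g, b)]
        for i in range(n):
            if frozen[i]:
                continue               # frozen coordinates stay at +-1
            x[i] += step * g[i]
            if abs(x[i]) >= 1.0:       # hit a "cube" face: round and freeze
                x[i] = math.copysign(1.0, x[i])
                frozen[i] = True
        for j, S in enumerate(sets):   # hit a "discrepancy" face: stay on it
            if j in tight:
                continue
            if abs(sum(x[i] for i in S)) >= disc_bound:
                tight.add(j)
                v = [1.0 if i in S else 0.0 for i in range(n)]
                for b in basis:        # Gram-Schmidt before storing
                    d = sum(vi * bi for vi, bi in zip(v, b))
                    v = [vi - d * bi for vi, bi in zip(v, b)]
                nrm = math.sqrt(sum(vi * vi for vi in v))
                if nrm > 1e-9:
                    basis.append([vi / nrm for vi in v])
    return x, frozen
```

On small instances the walk does exactly what the analysis predicts: most coordinates get frozen at ${\pm 1}$ long before any discrepancy face is reached.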

*Fall colors in New Brunswick*

The difficulty is that, if we hit too many “discrepancy” faces first, then there is a risk the random walk will get stuck without ever determining enough of the ${x_i}$. (One also needs an argument showing that the walk does keep hitting faces. This follows from the fact that the ${\ell_2}$ norm of the vector describing the walk’s current position keeps increasing, in expectation, at every step of the random walk.) To show this does not happen, it suffices to observe that the “discrepancy” faces are much farther away than the “cube” faces: indeed, if ${x}$ is a random Gaussian vector of norm ${\|x\|^2 = n}$, then ${\mathrm{E}[|x_i|]=\sqrt{2/\pi}}$, but ${\mathrm{E}[|\sum_{i\in S_j} x_i|] \approx \sqrt{|S_j|}}$ (up to a small constant). Of course one needs a more precise analysis of the probability that the constraint specified by either type of face becomes tight, but that’s the main idea — beautifully simple!
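As a quick numerical sanity check of this distance heuristic (my own experiment, not from the paper), one can sample standard Gaussian vectors and compare the two expectations:

```python
import math
import random

# Empirically compare E|x_i| with E|sum_{i in S} x_i| for a standard Gaussian
# vector: the discrepancy constraint sits roughly sqrt(|S|) times farther away.
rng = random.Random(1)
n, S, trials = 400, range(100), 2000
coord_avg = disc_avg = 0.0
for _ in range(trials):
    x = [rng.gauss(0, 1) for _ in range(n)]
    coord_avg += abs(x[0])                    # one "cube" constraint
    disc_avg += abs(sum(x[i] for i in S))     # one "discrepancy" constraint
coord_avg /= trials
disc_avg /= trials
# coord_avg should be near sqrt(2/pi) ~ 0.80,
# disc_avg near sqrt(2*|S|/pi) ~ 7.98 for |S| = 100
```

So a face of the form ${|\sum_{i\in S_j} x_i|\leq 100\sqrt{n}}$ is indeed far out of reach for a typical Gaussian step, while the cube faces are close.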

Finding Correlations in Subquadratic Time (paper). Greg Valiant talked about some very nice algorithmic work on finding isolated pairs of correlated vectors. Suppose we are given ${n}$ random ${d}$-dimensional boolean vectors ${x_1,\ldots,x_n}$ (each coordinate is chosen to be ${0}$ or ${1}$ independently with probability ${1/2}$), all but two of which were chosen independently. The remaining two vectors, say ${x_{j_1},x_{j_2}}$, are ${r}$-correlated, in the sense that for every ${i\in\{1,\ldots,d\}}$, ${\Pr((x_{j_1})_i=(x_{j_2})_i)=1/2+r}$ for some small (constant) ${r}$. How much time does one need to identify the hidden ${r}$-correlated pair ${(x_{j_1},x_{j_2})}$? It is not obvious at first how one could avoid computing all pairwise Hamming distances, which would naively take ${\Theta(n^2 d)}$ time. Previous approaches, based e.g. on hashing, gave algorithms running in time ${O(n^{2-\Omega(r)})}$. Greg gave an algorithm that runs in time ${O(n^{1.62\ldots}\mathrm{poly}(1/r))}$ (provided ${d}$ is large enough; for ${d}$ smaller than ${n^{0.61}}$ or so the runtime becomes a bit worse, but remains subquadratic).

The main idea is simple: Let ${X}$ be the ${d\times n}$ matrix having ${(i,j)}$-th entry ${(-1)^{(x_j)_i}}$. Then the matrix ${X^TX}$ allows one to easily spot vectors that are ${r}$-correlated: most off-diagonal entries will have absolute value concentrated around ${\sqrt{d}}$, while the two entries corresponding to the inner product of the ${r}$-correlated pair will have a much larger absolute value ${\approx 2rd}$. The obvious problem, of course, is that computing ${X^TX}$ requires at least ${n^2}$ operations. Note however that the entries corresponding to the correlated vectors are much larger than any other entry: thinking of ${r}$ as a constant and of ${d}$ as polynomial in ${n}$, even a substantially smaller gap would still be noticeable. The idea is then to bundle the columns of ${X}$ into large groups (of size roughly ${n^{1/3}}$ each), and to apply the previous approach to the matrix ${Z}$ whose columns are the sums (over the reals) of the columns of ${X}$ in each group. The size of a group is chosen so that it is still possible to spot the pair of columns of ${Z}$ that contains the ${r}$-correlated pair. Computing ${Z^T Z}$ is much less expensive than computing ${X^TX}$, and this leads to the improvement in the runtime.
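The bundling idea can be sketched as follows (my own toy reconstruction in Python, not the paper’s exact procedure: the group size is a free parameter here, and after locating the two winning groups I simply brute-force over their members):

```python
import random

def find_correlated_pair(X, group_size):
    """Toy version of the bundling idea: X is a list of n columns, each a
    list of +-1 entries of length d (illustrative reconstruction only)."""
    n, d = len(X), len(X[0])
    groups = [list(range(s, min(s + group_size, n)))
              for s in range(0, n, group_size)]
    # Z's columns are the real-valued sums of each group's columns
    Z = [[sum(X[j][i] for j in grp) for i in range(d)] for grp in groups]
    # locate the pair of groups with the largest inner product in Z^T Z
    best, pair = -1.0, None
    for a in range(len(Z)):
        for b in range(a + 1, len(Z)):
            ip = abs(sum(za * zb for za, zb in zip(Z[a], Z[b])))
            if ip > best:
                best, pair = ip, (a, b)
    # brute-force only inside the two winning groups
    cand = groups[pair[0]] + groups[pair[1]]
    best, ans = -1.0, None
    for u in range(len(cand)):
        for v in range(u + 1, len(cand)):
            ip = abs(sum(p * q for p, q in zip(X[cand[u]], X[cand[v]])))
            if ip > best:
                best, ans = ip, (cand[u], cand[v])
    return ans
```

The point of the sketch is the counting: the expensive pairwise scan happens over ${n/\text{group\_size}}$ bundled columns rather than ${n}$ originals, and the planted pair’s signal ${\approx 2rd}$ still dominates the noise of a whole group.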

What if the initial dimension ${d}$ is not large enough? The solution is to artificially increase it by adding rows obtained as the XOR of randomly chosen subsets of the pre-existing rows. This operation will degrade the correlation between the ${r}$-correlated vectors, but the approach outlined above is robust enough that one can afford to make ${r}$ much smaller.
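The reason the degradation is controlled is the piling-up lemma: if the pair agrees in each coordinate with probability ${1/2+r}$, the XOR of ${k}$ coordinates agrees with probability ${1/2+2^{k-1}r^k}$. A quick empirical check of this, with ${k=2}$ (my own snippet, not code from the paper):

```python
import random

def xor_rows(col, row_pairs):
    # Append, for each pair of row indices, the XOR of the two entries
    # (columns are 0/1 vectors here).
    return col + [col[i] ^ col[j] for i, j in row_pairs]

rng = random.Random(3)
d, r, extra = 500, 0.4, 20000
x = [rng.randrange(2) for _ in range(d)]
# y agrees with x in each coordinate with probability 1/2 + r
y = [xi if rng.random() < 0.5 + r else 1 - xi for xi in x]
pairs = [(rng.randrange(d), rng.randrange(d)) for _ in range(extra)]
x2, y2 = xor_rows(x, pairs), xor_rows(y, pairs)
agree = sum(a == b for a, b in zip(x2[d:], y2[d:])) / extra
# agree should be close to 1/2 + 2*r**2 = 0.82 for r = 0.4
```

So each XOR of two rows turns correlation ${r}$ into ${2r^2}$; the algorithm can absorb this because its runtime degrades only polynomially in ${1/r}$.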

Greg also described an application to the following problem: given ${n}$ arbitrary unit vectors in ${{\mathbb R}^d}$, find a pair of vectors whose inner product is at least ${(1-\varepsilon)}$ as large as the largest inner product, among all pairs. See his paper for details!

*Coffee break*

The dynamics of influence systems (paper). Bernard Chazelle gave a beautifully illustrated talk on “The dynamics of influence systems”. Although the main thread of reasoning was a bit hard for me to follow (maybe the pretty pictures were too distracting), Chazelle made his main point clearly. Informally, an influence system is a set of ${n}$ agents evolving in ${{\mathbb R}^d}$. At every step, each agent is influenced only by its neighbors (one must specify some notion of a neighborhood graph). Based on what these neighbors have to say, and on its own decision process, the agent decides where to move at the next step. The main point is that the system is restricted with respect to who influences whom (only neighbors), but very liberal as to what kinds of update rules are allowed. In particular, all agents can use different rules: this is analogous to a system of ${n}$ physical particles with local interactions, but in which each particle obeys its own personal set of physical laws.
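To make the setup concrete, here is a classical toy example of an influence system, the one-dimensional Hegselmann–Krause dynamics (a standard special case, not Chazelle’s general model): each agent’s neighbors are the agents within unit distance, and its update rule is to move to their average.

```python
def hk_step(positions, radius=1.0):
    """One step of 1-D Hegselmann-Krause dynamics: every agent moves to the
    average position of its neighbors (all agents within `radius`, itself
    included)."""
    new = []
    for x in positions:
        nbrs = [y for y in positions if abs(y - x) <= radius]
        new.append(sum(nbrs) / len(nbrs))
    return new

# The dynamics settle very quickly: agents merge into clusters that no
# longer move (a fixed point, i.e. a periodic state of period one).
xs = [0.0, 0.5, 1.0, 5.0, 5.5]
for _ in range(10):
    xs = hk_step(xs)
```

Even this very simple special case shows the flavor of the result: the orbit collapses into a periodic (here, stationary) configuration after a short transient.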

Chazelle’s main result (informally) is that any such system very quickly evolves to a periodic state. Hence the “take-home” message: however complex we believe our decision-making process to be, however much we think we learn from past mistakes, history is bound to repeat itself 🙂

Geometric Complexity Theory V (paper). Ketan Mulmuley’s talk on GCT V was very punchy and energizing; he managed to throw in a surprising number of jokes (and grand claims :-)). He started off with a fundamental classification problem in algebraic geometry (I will not describe the problem…), presenting it as a special case of a more general class of well-studied problems. The general problem is known to be “wild”, while the special case is only known to be “hopeless”, or “impossible”. “Wild” has a precise meaning in algebraic geometry, and Mulmuley described it as the analogue of “NP-complete” in complexity theory. So Mulmuley asked: what should the analogue of “hopeless/impossible” in TCS be? (In algebraic geometry it means the problem has been studied since Hilbert by the best experts, who have not been able to make any substantial progress on it :-))

Mulmuley presented his main result a bit provocatively. He introduced the “Noether normalization lemma” (NNL), and stated the following results: (1) if “determinant identity testing” (DIT) has a black-box derandomization, then NNL is in P, and (2) if the Generalized Riemann Hypothesis (GRH) is true, then NNL is in the third level of PH. (Currently, the best known bound is that NNL is in EXPSPACE.) (1) and (2) are formal theorems, but Mulmuley then used them to argue that derandomizing DIT might be harder than proving GRH! This seems a bit of a stretch (who is to say GRH does not imply that NNL is in P as well?), and Mulmuley admitted he didn’t have any formal argument supporting that claim, though I’m sure he has more evidence than he gave us. In any case, (1) is a strong indication that derandomizing DIT may be even harder than it is currently perceived to be, as NNL has already resisted the assaults of mathematicians for over a century.

Lower bounds on Information Complexity via Zero-Communication Protocols and Applications (paper). Virginie Lerays gave a nice talk on a paper I already briefly discussed, so I will not repeat my comments (but still wanted to point it out, since I really like the paper :-))

How to construct quantum random functions (paper). Mark Zhandry presented his result establishing the security of pseudorandom functions against “quantum superposition attacks”, under the assumption that there exist one-way functions resistant to similar attacks. The starting point for his work is the observation that the security of standard constructions of the former from the latter, such as the GGM construction, is proved through a reduction based on a simple hybrid argument. That hybrid argument relies heavily on the fact that the attacker can only query the function at a polynomial number of values. A quantum adversary, however, may make queries in superposition, and hence potentially gain information about all values of the function from just one query (though of course he can’t simply recover all values: at some point he has to measure). This breaks the hybrid argument, and Zhandry’s main result is a new technique to overcome this issue.

Zhandry presented the following as his main technical claim. Let ${\mathcal{D}}$ be a fixed distribution over a certain finite set ${S}$. Define two “random” functions ${f}$ and ${g}$ from ${\{0,1\}^n}$ to ${S}$ as follows. For ${f}$, each ${f(x)}$ is chosen independently at random from ${\mathcal{D}}$. For ${g}$, we first draw ${m}$ independent samples from ${\mathcal{D}}$, and then choose each ${g(x)}$ uniformly at random among those ${m}$ samples. Then, as long as ${m=\mathrm{poly}(n,q)}$ is large enough, a quantum algorithm making at most ${q}$ queries cannot distinguish ${f}$ from ${g}$.
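Classically, the two oracles in the claim can be sketched as follows (my own illustration, with hypothetical names; the content of the theorem is of course the quantum indistinguishability, which this snippet does not address):

```python
import random

def make_oracles(sample, m, seed=0):
    """Build the two oracles of the claim. `sample(rng)` draws once from the
    distribution D. f answers each fresh input with an independent draw from
    D; g first fixes m draws and answers each fresh input with a uniformly
    random one of those m values. Both are consistent across repeated
    queries, i.e. they behave as functions."""
    rng = random.Random(seed)
    pool = [sample(rng) for _ in range(m)]  # the m samples defining g
    f_vals, g_vals = {}, {}
    def f(x):
        if x not in f_vals:
            f_vals[x] = sample(rng)
        return f_vals[x]
    def g(x):
        if x not in g_vals:
            g_vals[x] = rng.choice(pool)
        return g_vals[x]
    return f, g
```

The striking part of the claim is that ${g}$ takes at most ${m}$ distinct values while ${f}$ is fully random, yet a ${q}$-query quantum algorithm cannot tell them apart once ${m=\mathrm{poly}(n,q)}$.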

From a brief glance at the paper, Zhandry seems to have a more general theorem that relates how well a ${q}$-query quantum algorithm can distinguish two oracles, whose outputs are chosen independently according to two distributions ${\mathcal{D}_1}$ and ${\mathcal{D}_2}$, to how well it can distinguish two sets of samples from those distributions. A priori the algorithm has more power in the case of oracles, since it can query them at points of its choosing, but Zhandry shows that this extra power is in fact not helpful when the number of queries remains small.