## Quantum marginals

As I mentioned in the last post, last week Mathias Christandl, Michael Walter and Andreas Winter were organizing a workshop entitled “Quantum Marginals” as part of the semester-long programme Mathematical Challenges in Quantum Information at the Newton Institute. In his introductory talk Michael emphasized that the quantum marginal problem can be approached from a wide array of areas of mathematics and computer science, ranging from information theory to complexity theory through representation theory, algebraic geometry, random matrix theory, convex analysis and more. In this post I will first attempt to explain what the quantum marginal problem is, and then describe two or three of these connections, following some of the talks held during the workshop.

The classical marginal problem

The marginal problem can already be stated as an interesting problem on classical distributions. Broadly speaking, the question is the following: which families of distributions, defined on possibly intersecting sets of variables, can be realized as the marginals of a joint underlying distribution? More precisely, for integers ${n}$ and ${d}$ suppose given a family of distributions ${\{p_{S_i}\}}$ defined on ${\{1,\ldots,d\}^{S_i}}$ respectively, where ${S_i \subset\{1,\ldots, n\}}$. When does there exist a distribution ${p}$ on ${\{1,\ldots,d\}^n}$ such that for every ${i}$, ${p_{S_i}}$ is obtained as the marginal of ${p}$ on the variables in ${S_i}$?

While the problem is trivial if ${|S_i|=1}$ for every ${i}$, this seems to be essentially the only easy case. Consider for instance the problem when each ${|S_i|\leq 2}$. Clearly, a necessary condition for the existence of ${p}$ is that marginals sharing a same variable are consistent. But this is not sufficient: for instance, one can derive additional constraints by expanding out an expression such as ${\text{Var}(x_1+x_2+x_3)\geq 0}$ in terms of pairwise correlations only, placing further constraints on the marginals. In fact, it is easy to show that the decision problem for ${|S_i|\leq 2}$ is NP-hard; I’ll leave the simple reduction from any two-local constraint satisfaction problem as an exercise.

The marginal problem has a long history (Klyachko traces it back to work of Hoefling in the 40s), but it seems that very few general statements are known. I found it hard to find modern references on the topic; maybe this is because, as the foreword to the proceedings of a (somewhat) recent conference on the topic put it,

“The subject matter of these conference proceedings comes in many guises. Some view it as the study of probability distributions with fixed marginals; those coming to the subject from probabilistic geometry see it as the study of copulas; experts in real analysis think of it as the study of doubly stochastic measures; functional analysts think of it as the study of Markov operators; and statisticians say it is the study of possible dependence relations between pairs of random variables. All are right since all these topics are isomorphic.”

Among all these points of view the closest to us is probably that taken by statisticians, who think more generally of the marginal problem as one of causal inference: given empirical observations of the correlations between certain subsets of a collection of random variables, can one reconstruct links of causal effect between the random variables (for instance by constructing a graphical model that reproduces the observed correlations)? The latter is a topic of very active research in machine learning research, where one tries to identify the most relevant variables based only on partial information about their joint distribution (motivating scenario for this abound in e.g. biology).

The quantum marginal problem

The quantum marginal problem, in turn, asks about the following: given a set of density matrices ${\rho_{S_i}}$ on ${\otimes_{j\in S_i} {\mathbb C}^d}$ (i.e. ${d^{|S_i|}\times d^{|S_i|}}$ positive semidefinite matrices with trace ${1}$), where as before each ${S_i\subset\{1,\ldots,n\}}$, when does there exist a joint density matrix ${\rho}$ on ${({\mathbb C}^d)^{\otimes n}}$ having all the ${\rho_{S_i}}$ as its reduced densities (i.e. the partial trace ${{\mbox{\rm Tr}}_{\overline{S_i}}(\rho) = \rho_{S_i}}$)? While in classical scenario the assumption that only certain marginal distributions are observed may seem like an artificial constraint imposed by so-called “experimental conditions”, in quantum theory it arises very naturally, as we are used to the fact that certain observables can be jointly measured (e.g. measurements on isolated particles) while others cannot (e.g. position and momentum of the same electron).

A good deal of motivation for studying the quantum marginal problem comes from a particular variant called the ${N}$-representability problem. The latter can be traced back to some observations made by quantum chemists in the 40s, though its importance was first properly understood in the 60s, when it was first explicitly stated as a major open problem by Coulson (see Chapter 1 of the book Reduced Density Matrices for more historical background). In modern terms, the ${N}$-representability problem asks the following: given a single two-body reduced density matrix ${\rho_{12}}$ on ${{\mathbb C}^d\otimes {\mathbb C}^d}$, when does there exist an ${N}$-body density ${\rho}$ on ${({\mathbb C}^d)^{\otimes N}}$ that is supported on the antisymmetric subspace of ${({\mathbb C}^2)^{\otimes N}}$ and has ${\rho_{12}}$ as reduced density on any two systems? (Note that the condition that ${\rho}$ is antisymmetric is important to make the problem nontrivial, as otherwise of course any single reduced density can be realized as the partial trace of a larger density matrix.)

The relevance of the ${N}$-representability problem for quantum chemistry relates to the fact that many elementary chemical systems can be characterized as a system of ${N}$ identical particles subject to permutation-invariant two-body interactions. The particles will often be fermions, which are subject to the Pauli exclusion principle and thus follow the fermionic exchange rule: the state of a system obtained by exchanging two fermions picks up a minus sign (hence the state must be ${0}$ as soon as here are two identical particles — apparently, the idea of relating the exclusion principle to antisymmetry dates back to Heisenberg and Dirac). This explains the condition of antisymmetry placed on ${\rho}$. Next, the assumption that the ${N}$ particles are subject to permutation-invariant two-body interactions implies that the Hamiltonian of the system can be expressed as ${H = \sum_{i,j} H_{i,j}}$, where here all ${H_{i,j}}$ terms are identical (except they act on different pairs of particles ${i,j}$). This permutation-invariance then implies that the energy of the system can be computed as ${\min_{\rho_{12}} {\mbox{\rm Tr}}(H_{12}\rho_{12})}$, where particles ${1}$ and ${2}$ are arbitrarily chosen representatives. This is a huge bounty for chemistry: one can apparently compute the minimal energy of a complex system of ${N}$ interacting particles by solving a constant-size problem!

Of course there is a hitch. Looking at it more closely, we only shifted the difficulty, from optimizing over a simple but exponentially large set (${N}$-particle antisymmetric density matrices) to that of a small but highly complex set (${2}$-particle density matrices that can be realized as the reduced density of an antisymmetric ${N}$-particle density matrix). As it turns out, there is not much hope for getting around the difficulty: just a few years ago the ${N}$-representability problem was shown QMA-hardby Yi-Kai Liu, this in spite of encouraging numerical evidence from the chemists, who have successfully developed a range of practical algorithms for specific instances of the problem.

The quantum marginal problem itself gained prominence in the quantum information literature in the early 2000s, stimulated by questions about the type of entanglement that can arise in bipartite, and more generally multipartite, quantum states. (The discovery that entanglement becomes more and more constrained as the number of particles, or sub-systems, increases, nowadays identified as themonogamy of entanglement, has had — and continues to have — major repercussions in all areas in which quantum mechanics plays a role, from condensed matter physics to quantum computing through cryptography and the physics of black holes.) Soon after Higuchi and Sudbery and Linden et al. discovered explicit examples of incompatible sets of reduced density matrices for three-qubit systems, Higuchi et al. and Bravyi were apparently the first to give a complete characterization of the set of single-qubit reduced density matrices that can arise from a single pure ${n}$-qubit state: the condition is that the smallest eigenvalue of each reduced density is at most the sum of the smallest eigenvalues of all other reduced densities. Note how there already arises a crucial distinction with the classical setting, where a “pure” (deterministic) distribution on ${n}$ bits always gives rise to “pure” marginals, and vice-versa. In contrast, in the quantum case “maximal knowledge of the whole does not necessarily include the maximal knowledge of its parts.” (Schrödinger; quote stolen from a nice review on the topic by Klyachko), and the problem of determining even which single-particle densities can be obtained from a global pure state is highly non-trivial (of course if one allows the global state to be mixed as well then here as in the classical case one has to include at least some two-particle reduced densities for the problem to become non-trivial).

Physical motivations aside, the quantum marginal problem is fundamentally one of convex geometry: to each marginal density one can associate a convex set of global densities that reduce to it, and the question is to characterize the cases when the intersection of those sets is non-empty. It is, however, such a complex and rich question that, depending on the type of settings one wishes to focus on (two systems, large dimension? qubits, many systems? generic scenario or highly specific constraints? constrained or unconstrained global states? etc.) the problem is best approached by (and of interest to) different areas of mathematics. I will give three examples taken from talks given during the workshop.

Constraints on the spectrum: representation theory

Strong connections between the quantum marginal problem and algebraic geometry and representation theory already arise when one considers the following variant of the two-body scenario: suppose ${\rho_{AB}}$ is a density matrix on ${{\mathbb C}^d\otimes {\mathbb C}^d}$ with marginals ${\rho_A}$ on the first copy of ${{\mathbb C}^d}$ and ${\rho_B}$ on the second. Which vectors of eigenvalues ${\vec{\lambda}_A}$ for ${\rho_A}$, ${\vec{\lambda}_B}$ for ${\rho_B}$, and ${\vec{\lambda}_{AB}}$ for the global density on ${\rho_{AB}}$, are compatible? As mentioned above, this is fundamentally a question of geometry: by considering all possible “lifts” of the ${\vec{\lambda}_A}$, ${\vec{\lambda}_B}$ respectively we can view all three sets as convex subsets of (the unit simplex in) ${{\mathbb R}^{d^2}}$, and the question is when these sets intersect.

In his talk Aram Harrow sketched how one could approach this question using the representation theory of the symmetric group and Schur-Weyl duality (see also this paper by Klyachko). Because the spectrum of the reduced densities is invariant under the action of the unitary group on either subsystem, we can decompose the space ${{\mathbb C}^d\otimes {\mathbb C}^d}$ on which ${\rho_{AB}}$ lives as a direct sum of the tensor product of irreducible representations corresponding to this action: ${{\mathbb C}^d\otimes {\mathbb C}^d=\oplus_{\lambda,\mu} H_{\lambda} \otimes H_{\mu} \otimes {\mathbb C}^{n_{\lambda}n_\mu}}$, where ${\lambda}$ (resp. ${\mu}$) ranges over labels for irreducible representations of the unitary group on ${H_A}$ (resp. ${H_B}$), ${n_\lambda}$ (resp. ${n_\mu}$) is the corresponding multiplicity, and ${\rho_{AB}}$ is block-diagonal in this decomposition. Since the tensor product of the unitary groups acting on the two copies of ${{\mathbb C}^d}$ is a subgroup of the full unitary group, each factor ${H_{\lambda} \otimes H_{\mu}}$ itself decomposes into a direct sum of irreducible representations ${H_\nu}$ for the full unitary group. The multiplicity with which ${H_\nu}$ appears in the decomposition of ${H_{\lambda} \otimes H_{\mu}}$ is called the Kronecker coefficient ${g(\lambda,\mu,\nu)}$. And now comes the magic: it turns out — although unfortunately Aram didn’t have time to explain this — that compatible sets of eigenvalues for ${\rho_{AB}}$, ${\rho_A}$ and ${\rho_B}$ bear a tight connection to those triples ${(\lambda,\mu,\nu)}$ for which the Kronecker coefficient is non-zero; somehow there is a way to read the latter as precisely the eigenvalues themselves (up to some normalization), though I admit how this works goes beyond my understanding of the situation (Klyachko refers to this paper of his for the resolution of a closely related problem on the spectrum of sums of Hermitian matrices, and to Heckman’s theorem for the present problem, but I am not familiar with either).

Aram also showed how one could throw a little Schur-Weyl duality into the picture: assuming the density ${\rho_{AB}}$ is invariant under permutation of the two subsystems (hence ${\rho_A=\rho_B}$), we can consider not only the action of the unitary group on ${H_A}$ and ${H_B}$ but also that of the group of permutations of the subsystems (${\mathcal{S}_2}$ in the case of two subsystems). Since this action commutes with that of the unitary group on individual factors one gets a more refined decomposition of ${{\mathbb C}^d\otimes {\mathbb C}^d}$ into irreducible representations, in which for instance all the multiplicities are ${0}$ or ${1}$. Again, studying which representations occur leads to further constraints on the possible marginal spectra.

The generic case: random matrix theory

What can we say about the marginal problem for the case of a “generic” state? Here by generic we simply mean a state generated according to a natural random distribution. Guillaume Aubrun and Stanislaw Szarek gave talks presenting results, obtained with Deping Ye (see also the short exposition), in which they investigated the following scenario: choose a pure state on ${{\mathbb C}^d \otimes {\mathbb C}^d \otimes {\mathbb C}^s}$ at random (according to the Haar measure); how entangled is its reduced state on ${{\mathbb C}^d \otimes {\mathbb C}^d}$? They can show the existence of a very sharp threshold: there is a constant ${c>0}$ such that, for any ${\epsilon>0}$, if ${s<(c-\epsilon)d^{3/2}}$ then with high probability (as ${d\rightarrow\infty}$) the state is entangled; if ${s>(1+\epsilon)cd^{3/2}}$ then it is separable. Note that the extremes make sense: when ${s}$ is small we do expect there to be entanglement between the two copies of ${{\mathbb C}^d}$ (e.g. if ${s=0}$ the state is a random pure state, which will be close to maximally entangled), whereas if ${s}$ becomes very large we expect the reduced density on ${{\mathbb C}^d\otimes {\mathbb C}^d}$ to be close to the totally mixed state, and the state is separable.

How to prove such a result? Guillaume and Stanislaw took the time to give us a very detailed sketch of the proof, and I highly recommend watching their respective talks. In the first talk Guillaume proved the “easy” part, which is the large ${s}$ regime, and in the second talk Stanislaw showed how Guillaume’s arguments could be refined to obtain a sharp threshold.

The main idea is the following. Let ${\mathcal{S}}$ be the set of separable states in ${{\mathbb C}^n = {\mathbb C}^d\otimes {\mathbb C}^d}$, and ${\rho\mapsto\|\rho\|_{\mathcal{S}}}$ denote the gauge of ${\mathcal{S}}$, the largest ${t>0}$ such that ${t\rho\in\mathcal{S}}$. Our goal can be reformulated as a two-step process: 1) show that ${\|\rho\|_{\mathcal{S}}}$ exhibits a sharp threshold: there is a value ${s_0}$ such that, if ${s\geq (1+\epsilon)s_0}$ (resp. ${s\leq (1+\epsilon)s_0}$) then ${\|\rho\|_{\mathcal{S}}\ll 1}$, i.e. the state is separable (resp. ${\|\rho\|_{\mathcal{S}}\gg 1}$, i.e. the state is entangled), with high probability, and 2) compute the expectation ${\textsc{E}[\|\rho\|_{\mathcal{S}}]}$, as a function of ${s}$, and determine as precisely as possible the ${s_0}$ for which it is close to ${1}$ (by point 1) we already know there will be a sharp transition).

Let’s focus on 1), as similar ideas are used for 2). Concentration of the function ${\rho\mapsto\|\rho\|_{\mathcal{S}}}$ around its median follows from “standard” arguments in concentration of measure, so the interesting part consists in showing the existence of an ${s_0}$ such that, for ${s}$ larger (resp. smaller) than ${s_0}$ the median is much smaller (resp. much larger) than ${1}$. The starting point for this consists in, first, arguing that the distribution of ${\rho - I/n}$ is very well approximated by that of a random matrix ${A}$ taken from the Gaussian Unitary Ensemble (GUE) (entries are i.i.d complex Gaussians, with, in this case, standard deviation ${1/(n\sqrt{s})}$), and second, relating ${\textsc{E}[\|A\|_{\mathcal{S}}]}$ to the mean width ${w(\mathcal{S}_0)}$, where ${\mathcal{S}_0 = \mathcal{S}-\{I/n\}}$. The mean width intuitively measures precisely what its name indicates: the average maximum distance (measured using the distance derived from the Hilbert-Schmidt inner product used to define the polar ${\mathcal{S}^\circ_0}$ of ${\mathcal{S}_0}$, which plays an important role in the remainder of the argument) of elements of ${\mathcal{S}_0}$ from its centroid. In order to estimate this mean width some fairly involved arguments from convex geometry, involving the use of the polar ${\mathcal{S}^\circ_0}$, kick in, but I shall refer you to the talks for more detail.

I found the structure of the argument very elegant because it starts with what, at first, looks (at least to me) like a completely intractable question — when is a random state separable?; uses concentration of measure to reformulate it as what now looks like a straightforward calculation of expectation — compute ${\textsc{E}[\|\rho\|_{\mathcal{S}}]}$; relates this expectation to a geometric parameter of ${\mathcal{S}}$, its mean width, which is fairly intuitive; and finally, uses a sequence of tools from the analysis of convex bodies to estimate the width (note this last step is the only one in which the precise nature of the set of separable states comes into place, the others applying to any “nice enough” convex body). And the result is surprisingly sharp: the fact that the threshold ${s_0}$ for ${s}$ is tight at least up to multiplicative ${1+o(1)}$ factors implies that simply giving one more qubit — just one, out of ${s}$, which can be arbitrarily large — to the environment marks the transition from states that are entangled with high probability, to states that are separable with high probability.

Causal inference: information theory

In his talk David Gross drew a larger picture around the quantum marginal problem by framing it in the context of causal inference. I already mentioned this classical problem: given a restricted set of observations (= some marginal distributions), can one reconstruct links of causation between the variables? Here we may as well assume that the marginals are a priori consistent with a global distribution (in fact they could be consistent with many global distributions); rather the goal is to infer properties of any matching global distribution just from the marginals themselves. Typically, one may desire to construct a directed acyclic graph whose nodes are the variables and edges indicate causation: the distribution of a given node should be independent of its ancestors when one conditions on the parents.

Unfortunately conditional independence imposes complicated (quadratic) constraints on the marginals, leading to a computationally hard inference problem. So instead of thinking of the random variables and their distributions as the unknowns of the problem, David advocated the use of the mutual information between subsets of variables as the fundamental unknown. A major advantage of this approach is that it turns conditional independence into a straightforward equality: ${I(X:Y|Z)=0}$. Moreover, it is easy to relax the condition to ${I(X:Y|Z)\leq \epsilon}$, adding robustness to the scenario while preserving linearity. Note however that, although the inequalities themselves are simple to write down, deciding whether a given set of inequalities is satisfiable or not remains a potentially hard problem, as in order to combine different inequalities to check compatibility one may need to expand each of the mutual information terms to take into account more and more variables, eventually requiring the introduction of all ${2^n}$ subset entropies. This approach is also a relaxation of the original problem, in the sense that not all information from the marginal distributions is captured in the entropies. Nevertheless, David demonstrated that in some interesting cases the marginal entropies suffice to distinguish whether a matching global distribution with some interesting property exists or not (his example was the question of whether a distribution on three variables requires the existence of a common (hidden) “ancestor”, or whether the three variables could have been pairwise generated from three independent ancestors).

Putting issues of causality aside, this approach suggests a similar relaxation of the marginal problem: before asking whether a set of marginals is consistent, one could start by asking the (in general weaker) question of whether the induced set of marginal entropies is consistent. Interestingly, one can apply this idea to the study of entropic versions of the Bell inequalities. Such study was apparently pioneered by Braunstein and Caves, who obtained the following “entropic CHSH inequality” :

$\displaystyle H(A_0B_0)+H(A_0B_1)+H(A_1B_0)-H(A_1B_1) \geq -2.$

(Note the mysterious mirror symmetry with the CHSH inequality itself… see the explanations in David Gross’s paper for explanations on its origin.) An advantage of this approach, explored in David’s paper in the setting of a different Bell inequality, is that it naturally allows one to introduce additional restrictions on the correlations between the random variables; he and co-authors consider e.g. the condition that they are generated using a certain limited amount of shared randomness.

Unfortunately David did not get to discussing the possibility of deriving entropic analogues of Tsirelson inequalities. Presumably these would still quantify the Shannon entropy of the outcome distribution of the result of applying arbitrary observables on an arbitrary state, with the additional restriction that some of the observables must be made on different subsystems. Or one could imagine inequalities involving the von Neumann entropy of the state itself, although I don’t see how the possibility for using different settings would be taken into account (and how a dependence on the dimension of the state would be avoided). Deriving such inequalities may not be an easy task… Indeed, I believe the marginal problem for quantum von Neumann entropies (finding a complete set of inequalities that characterize the cone of ${2^n}$-dimensional vectors of marginal entropies that are mutually consistent) is even less well understood than its classical counterpart.

I only discussed four of the many more talks accessible from the workshop’s online programme — go have a look! The third (and last) workshop taking place as part of the Newton semester will be a workshop on “Quantum Algorithms”, organized by Steve Brierley, Richard Jozsa and Andreas Winter in late November.

This entry was posted in Conferences, Quantum, Talks and tagged , , , , . Bookmark the permalink.

### 5 Responses to Quantum marginals

1. Ashley says:

Hi Thomas,

Thanks for a very nice, detailed and interesting post. A tiny correction to the very end: the Quantum Algorithms workshop isn’t organised by me, but by Steve Brierley (whose surname has two E’s in it :), Richard Jozsa and Andreas Winter.

• Thomas says:

Oops — just wanted to make sure you’d read till the end. Thanks for spotting it; fixed.

2. John Sidles says:

Boswell reports Johnson as saying of free will what might broadly be said also of quantum dynamical simulation:

Johnson: All theory is against the freedom of the will; all experience for it.’

Boswell: I did not push the subject any further. I was glad to find him [Johnson] so mild in discussing a question of the most abstract nature, involving theological tenets which he generally would not suffer to be in any degree opposed.

Similarly All theory is against the efficient simulation of quantum dynamics; all experience for it.’

3. elupus says:

Thank you for bringing this to our attention. I was only familiar with bits and pieces, but you (by virtue of the Newton seminars) establish wonderful connections between what I had thought to be unrelated studies. This renews many interests for me, and I’m grateful to you.