
In this paper, the analytical form of the quasilinear diffusion coefficients
is modified from the KennelEngelmann diffusion coefficients to guarantee the
positive definiteness of its bounce average in a toroidal geometry. By
evaluating the parallel inhomogeneity of plasmas and magnetic fields in the
trajectory integral, we can ensure the positive definiteness and help
illuminate some nonresonant toroidal effects in the quasilinear diffusion.
When the correlation length of the plasmawave interaction is comparable to the
magnetic field variation length, the variation becomes important and the
parabolic variation at the outermidplane, the innermidplane, and trapping
tips can be evaluated by Airy functions. The new form allows the coefficients
to include both resonant and nonresonant contributions, and the correlations
between the consecutive resonances and in many poloidal periods. The
positivedefinite form is implemented in a wave code TORIC and we present an
example for ITER using this form.

We consider the problem of quantum state certification, where one is given
$n$ copies of an unknown $d$dimensional quantum mixed state $\rho$, and one
wants to test whether $\rho$ is equal to some known mixed state $\sigma$ or
else is $\epsilon$far from $\sigma$. The goal is to use notably fewer copies
than the $\Omega(d^2)$ needed for full tomography on $\rho$ (i.e., density
estimation). We give two robust state certification algorithms: one with
respect to fidelity using $n = O(d/\epsilon)$ copies, and one with respect to
trace distance using $n = O(d/\epsilon^2)$ copies. The latter algorithm also
applies when $\sigma$ is unknown as well. These copy complexities are optimal
up to constant factors.

Given samples from an unknown distribution $p$ and a description of a
distribution $q$, are $p$ and $q$ close or far? This question of "identity
testing" has received significant attention in the case of testing whether $p$
and $q$ are equal or far in total variation distance. However, in recent work,
the following questions have been been critical to solving problems at the
frontiers of distribution testing:
Alternative Distances: Can we test whether $p$ and $q$ are far in other
distances, say Hellinger?
Robustness: Can we test when $p$ and $q$ are close, rather than equal? And
if so, close in which distances?
Motivated by these questions, we characterize the complexity of distribution
testing under a variety of distances, including total variation, $\ell_2$,
Hellinger, KullbackLeibler, and chisquared. For each pair of distances $d_1$
and $d_2$, we study the complexity of testing if $p$ and $q$ are close in $d_1$
versus far in $d_2$, with a focus on identifying which problems allow strongly
sublinear testers (i.e., those with complexity $O(n^{1  \gamma})$ for some
$\gamma > 0$ where $n$ is the size of the support of the distributions $p$ and
$q$). We provide matching upper and lower bounds for each case. We also study
these questions in the case where we only have samples from $q$ (equivalence
testing), showing qualitative differences from identity testing in terms of
when robustness can be achieved. Our algorithms fall into the classical
paradigm of chisquared statistics, but require crucial changes to handle the
challenges introduced by each distance we consider. Finally, we survey other
recent results in an attempt to serve as a reference for the complexity of
various distribution testing problems.

Minimization methods that search along a curvilinear path composed of a
nonascent nega tive curvature direction in addition to the direction of
steepest descent, dating back to the late 1970s, have been an effective
approach to finding a stationary point of a function at which its Hessian is
positive semidefinite. For constrained nonlinear programs arising from recent
appli cations, the primary goal is to find a stationary point that satisfies
the secondorder necessary optimality conditions. Motivated by this, we
generalize the approach of using negative curvature directions from
unconstrained optimization to nonlinear ones. We focus on equality constrained
problems and prove that our proposed negative curvature method is guaranteed to
converge to a stationary point satisfying secondorder necessary conditions. A
possible way to extend our proposed negative curvature method to general
nonlinear programs is also briefly discussed.

Recovering matrices from compressive and grossly corrupted observations is a
fundamental problem in robust statistics, with rich applications in computer
vision and machine learning. In theory, under certain conditions, this problem
can be solved in polynomial time via a natural convex relaxation, known as
Compressive Principal Component Pursuit (CPCP). However, all existing provable
algorithms for CPCP suffer from superlinear periteration cost, which severely
limits their applicability to large scale problems. In this paper, we propose
provable, scalable and efficient methods to solve CPCP with (essentially)
linear periteration cost. Our method combines classical ideas from FrankWolfe
and proximal methods. In each iteration, we mainly exploit FrankWolfe to
update the lowrank component with rankone SVD and exploit the proximal step
for the sparse term. Convergence results and implementation details are also
discussed. We demonstrate the scalability of the proposed approach with
promising numerical experiments on visual data.

In this paper, a reduced model of quasilinear diffusion by a small Larmor
radius approximation is derived to couple the Maxwell's equations and the
FokkerPlanck equation selfconsistently for ion cyclotron range of frequency
waves in a tokamak. The reduced model ensures the important properties of the
full model by KennelEngelmann diffusion, such as diffusion directions, wave
polarizations, and Htheorem. The kinetic energy change (Wdot) is used to
derive the reduced model diffusion coefficients for the fundamental damping and
the second harmonic damping to the lowest order of the finite Larmor radius
expansion. The quasilinear diffusion coefficients are implemented in a coupled
code (TORICCQL3D) with the equivalent reduced model of dielectric tensor. We
also present the simulations of the ITER minority heating scenario, in which
the reduced model is verified within the allowable errors from the full model
results.

Can we recover a complex signal from its Fourier magnitudes? More generally,
given a set of $m$ measurements, $y_k = \mathbf a_k^* \mathbf x$ for $k = 1,
\dots, m$, is it possible to recover $\mathbf x \in \mathbb{C}^n$ (i.e.,
length$n$ complex vector)? This **generalized phase retrieval** (GPR) problem
is a fundamental task in various disciplines, and has been the subject of much
recent investigation. Natural nonconvex heuristics often work remarkably well
for GPR in practice, but lack clear theoretical explanations. In this paper, we
take a step towards bridging this gap. We prove that when the measurement
vectors $\mathbf a_k$'s are generic (i.i.d. complex Gaussian) and the number of
measurements is large enough ($m \ge C n \log^3 n$), with high probability, a
natural leastsquares formulation for GPR has the following benign geometric
structure: (1) there are no spurious local minimizers, and all global
minimizers are equal to the target signal $\mathbf x$, up to a global phase;
and (2) the objective function has a negative curvature around each saddle
point. This structure allows a number of iterative optimization methods to
efficiently find a global minimizer, without special initialization. To
corroborate the claim, we describe and analyze a secondorder trustregion
algorithm.

Following [OW16], we continue our analysis of: (1) "Quantum tomography",
i.e., learning a quantum state, i.e., the quantum generalization of learning a
discrete probability distribution; (2) The distribution of Young diagrams
output by the RSK algorithm on random words. Regarding (2), we introduce two
powerful new tools: (i) A precise upper bound on the expected length of the
longest union of $k$ disjoint increasing subsequences in a random length$n$
word with letter distribution $\alpha_1 \geq \alpha_2 \geq \cdots \geq
\alpha_d$; (ii) A new majorization property of the RSK algorithm that allows
one to analyze the Young diagram formed by the lower rows $\lambda_k,
\lambda_{k+1}, \dots$ of its output.
These tools allow us to prove several new theorems concerning the
distribution of random Young diagrams in the nonasymptotic regime, giving
concrete error bounds that are optimal, or nearly so, in all parameters. As one
example, we give a fundamentally new proof of the fact that the expected length
of the longest increasing sequence in a random length$n$ permutation is
bounded by $2\sqrt{n}$. This is the $k = 1$, $\alpha_i \equiv \frac1d$, $d \to
\infty$ special case of a much more general result we prove: the expected
length of the $k$th Young diagram row produced by an $\alpha$random word is
$\alpha_k n \pm 2\sqrt{\alpha_kd n}$.
From our new analyses of random Young diagrams we derive several new results
in quantum tomography, including: (i) Learning the eigenvalues of an unknown
state to $\epsilon$accuracy in Hellingersquared, chisquared, or KL distance,
using $n = O(d^2/\epsilon)$ copies; (ii) Learning the optimal rank$k$
approximation of an unknown state to $\epsilon$fidelity (Hellingersquared
distance) using $n = \widetilde{O}(kd/\epsilon)$ copies.

We consider the problem of recovering a complete (i.e., square and
invertible) matrix $\mathbf A_0$, from $\mathbf Y \in \mathbb{R}^{n \times p}$
with $\mathbf Y = \mathbf A_0 \mathbf X_0$, provided $\mathbf X_0$ is
sufficiently sparse. This recovery problem is central to theoretical
understanding of dictionary learning, which seeks a sparse representation for a
collection of input signals and finds numerous applications in modern signal
processing and machine learning. We give the first efficient algorithm that
provably recovers $\mathbf A_0$ when $\mathbf X_0$ has $O(n)$ nonzeros per
column, under suitable probability model for $\mathbf X_0$.
Our algorithmic pipeline centers around solving a certain nonconvex
optimization problem with a spherical constraint, and hence is naturally
phrased in the language of manifold optimization. In a companion paper
(arXiv:1511.03607), we have showed that with high probability our nonconvex
formulation has no "spurious" local minimizers and around any saddle point the
objective function has a negative directional curvature. In this paper, we take
advantage of the particular geometric structure, and describe a Riemannian
trust region algorithm that provably converges to a local minimizer with from
arbitrary initializations. Such minimizers give excellent approximations to
rows of $\mathbf X_0$. The rows are then recovered by linear programming
rounding and deflation.

We consider the problem of recovering a complete (i.e., square and
invertible) matrix $\mathbf A_0$, from $\mathbf Y \in \mathbb{R}^{n \times p}$
with $\mathbf Y = \mathbf A_0 \mathbf X_0$, provided $\mathbf X_0$ is
sufficiently sparse. This recovery problem is central to theoretical
understanding of dictionary learning, which seeks a sparse representation for a
collection of input signals and finds numerous applications in modern signal
processing and machine learning. We give the first efficient algorithm that
provably recovers $\mathbf A_0$ when $\mathbf X_0$ has $O(n)$ nonzeros per
column, under suitable probability model for $\mathbf X_0$. In contrast, prior
results based on efficient algorithms either only guarantee recovery when
$\mathbf X_0$ has $O(\sqrt{n})$ zeros per column, or require multiple rounds of
SDP relaxation to work when $\mathbf X_0$ has $O(n^{1\delta})$ nonzeros per
column (for any constant $\delta \in (0, 1)$). }
Our algorithmic pipeline centers around solving a certain nonconvex
optimization problem with a spherical constraint. In this paper, we provide a
geometric characterization of the objective landscape. In particular, we show
that the problem is highly structured: with high probability, (1) there are no
"spurious" local minimizers; and (2) around all saddle points the objective has
a negative directional curvature. This distinctive structure makes the problem
amenable to efficient optimization algorithms. In a companion paper
(arXiv:1511.04777), we design a secondorder trustregion algorithm over the
sphere that provably converges to a local minimizer from arbitrary
initializations, despite the presence of saddle points.

Is it possible to find the sparsest vector (direction) in a generic subspace
$\mathcal{S} \subseteq \mathbb{R}^p$ with $\mathrm{dim}(\mathcal{S})= n < p$?
This problem can be considered a homogeneous variant of the sparse recovery
problem, and finds connections to sparse dictionary learning, sparse PCA, and
many other problems in signal processing and machine learning. In this paper,
we focus on a **planted sparse model** for the subspace: the target sparse
vector is embedded in an otherwise random subspace. Simple convex heuristics
for this planted recovery problem provably break down when the fraction of
nonzero entries in the target sparse vector substantially exceeds
$O(1/\sqrt{n})$. In contrast, we exhibit a relatively simple nonconvex approach
based on alternating directions, which provably succeeds even when the fraction
of nonzero entries is $\Omega(1)$. To the best of our knowledge, this is the
first practical algorithm to achieve linear scaling under the planted sparse
model. Empirically, our proposed algorithm also succeeds in more challenging
data models, e.g., sparse dictionary learning.

In this note, we focus on smooth nonconvex optimization problems that obey:
(1) all local minimizers are also global; and (2) around any saddle point or
local maximizer, the objective has a negative directional curvature. Concrete
applications such as dictionary learning, generalized phase retrieval, and
orthogonal tensor decomposition are known to induce such structures. We
describe a secondorder trustregion algorithm that provably converges to a
global minimizer efficiently, without special initializations. Finally we
highlight alternatives, and open problems in this direction.

We consider the problem of recovering a complete (i.e., square and
invertible) matrix $\mathbf A_0$, from $\mathbf Y \in \mathbb R^{n \times p}$
with $\mathbf Y = \mathbf A_0 \mathbf X_0$, provided $\mathbf X_0$ is
sufficiently sparse. This recovery problem is central to the theoretical
understanding of dictionary learning, which seeks a sparse representation for a
collection of input signals, and finds numerous applications in modern signal
processing and machine learning. We give the first efficient algorithm that
provably recovers $\mathbf A_0$ when $\mathbf X_0$ has $O(n)$ nonzeros per
column, under suitable probability model for $\mathbf X_0$. In contrast, prior
results based on efficient algorithms provide recovery guarantees when $\mathbf
X_0$ has only $O(n^{1\delta})$ nonzeros per column for any constant $\delta
\in (0, 1)$.
Our algorithmic pipeline centers around solving a certain nonconvex
optimization problem with a spherical constraint, and hence is naturally
phrased in the language of manifold optimization. To show this apparently hard
problem is tractable, we first provide a geometric characterization of the
highdimensional objective landscape, which shows that with high probability
there are no "spurious" local minima. This particular geometric structure
allows us to design a Riemannian trust region algorithm over the sphere that
provably converges to one local minimizer with an arbitrary initialization,
despite the presence of saddle points. The geometric approach we develop here
may also shed light on other problems arising from nonconvex recovery of
structured signals.

In the quantum state tomography problem, one wishes to estimate an unknown
$d$dimensional mixed quantum state $\rho$, given few copies. We show that
$O(d/\epsilon)$ copies suffice to obtain an estimate $\hat{\rho}$ that
satisfies $\\hat{\rho}  \rho\_F^2 \leq \epsilon$ (with high probability). An
immediate consequence is that $O(\mathrm{rank}(\rho) \cdot d/\epsilon^2) \leq
O(d^2/\epsilon^2)$ copies suffice to obtain an $\epsilon$accurate estimate in
the standard trace distance. This improves on the best known prior result of
$O(d^3/\epsilon^2)$ copies for full tomography, and even on the best known
prior result of $O(d^2\log(d/\epsilon)/\epsilon^2)$ copies for spectrum
estimation. Our result is the first to show that nontrivial tomography can be
obtained using a number of copies that is just linear in the dimension.
Next, we generalize these results to show that one can perform efficient
principal component analysis on $\rho$. Our main result is that $O(k
d/\epsilon^2)$ copies suffice to output a rank$k$ approximation $\hat{\rho}$
whose trace distance error is at most $\epsilon$ more than that of the best
rank$k$ approximator to $\rho$. This subsumes our above trace distance
tomography result and generalizes it to the case when $\rho$ is not guaranteed
to be of low rank. A key part of the proof is the analogous generalization of
our spectrumlearning results: we show that the largest $k$ eigenvalues of
$\rho$ can be estimated to tracedistance error $\epsilon$ using
$O(k^2/\epsilon^2)$ copies. In turn, this result relies on a new coupling
theorem concerning the RobinsonSchenstedKnuth algorithm that should be of
independent combinatorial interest.

The kmeans problem consists of finding k centers in the ddimensional
Euclidean space that minimize the sum of the squared distances of all points in
an input set P to their closest respective center. Awasthi et. al. recently
showed that there exists a constant c > 1 such that it is NPhard to
approximate the kmeans objective within a factor of c. We establish that the
constant c is at least 1.0013.

We show that for any odd $k$ and any instance of the MaxkXOR constraint
satisfaction problem, there is an efficient algorithm that finds an assignment
satisfying at least a $\frac{1}{2} + \Omega(1/\sqrt{D})$ fraction of
constraints, where $D$ is a bound on the number of constraints that each
variable occurs in. This improves both qualitatively and quantitatively on the
recent work of Farhi, Goldstone, and Gutmann (2014), which gave a
\emph{quantum} algorithm to find an assignment satisfying a $\frac{1}{2} +
\Omega(D^{3/4})$ fraction of the equations.
For arbitrary constraint satisfaction problems, we give a similar result for
"trianglefree" instances; i.e., an efficient algorithm that finds an
assignment satisfying at least a $\mu + \Omega(1/\sqrt{D})$ fraction of
constraints, where $\mu$ is the fraction that would be satisfied by a uniformly
random assignment.

We propose a batchwise monotone algorithm for dictionary learning. Unlike the
stateoftheart dictionary learning algorithms which impose sparsity
constraints on a samplebysample basis, we instead treat the samples as a
batch, and impose the sparsity constraint on the whole. The benefit of
batchwise optimization is that the nonzeros can be better allocated across the
samples, leading to a better approximation of the whole. To accomplish this, we
propose procedures to switch nonzeros in both rows and columns in the support
of the coefficient matrix to reduce the reconstruction error. We prove in the
proposed support switching procedure the objective of the algorithm, i.e., the
reconstruction error, decreases monotonically and converges. Furthermore, we
introduce a block orthogonal matching pursuit algorithm that also operates on
sample batches to provide a warm start. Experiments on both natural image
patches and UCI data sets show that the proposed algorithm produces a better
approximation with the same sparsity levels compared to the stateoftheart
algorithms.

In this work, we study the problem of testing properties of the spectrum of a
mixed quantum state. Here one is given $n$ copies of a mixed state
$\rho\in\mathbb{C}^{d\times d}$ and the goal is to distinguish whether $\rho$'s
spectrum satisfies some property $\mathcal{P}$ or is at least $\epsilon$far in
$\ell_1$distance from satisfying $\mathcal{P}$. This problem was promoted in
the survey of Montanaro and de Wolf under the name of testing unitarily
invariant properties of mixed states. It is the natural quantum analogue of the
classical problem of testing symmetric properties of probability distributions.
Here, the hope is for algorithms with subquadratic copy complexity in the
dimension $d$. This is because the "empirical Young diagram (EYD) algorithm"
can estimate the spectrum of a mixed state up to $\epsilon$accuracy using only
$\widetilde{O}(d^2/\epsilon^2)$ copies. In this work, we show that given a
mixed state $\rho\in\mathbb{C}^{d\times d}$: (i) $\Theta(d/\epsilon^2)$ copies
are necessary and sufficient to test whether $\rho$ is the maximally mixed
state, i.e., has spectrum $(\frac1d, ..., \frac1d)$; (ii)
$\Theta(r^2/\epsilon)$ copies are necessary and sufficient to test with
onesided error whether $\rho$ has rank $r$, i.e., has at most $r$ nonzero
eigenvalues; (iii) $\widetilde{\Theta}(r^2/\Delta)$ copies are necessary and
sufficient to distinguish whether $\rho$ is maximally mixed on an
$r$dimensional or an $(r+\Delta)$dimensional subspace; and (iv) The EYD
algorithm requires $\Omega(d^2/\epsilon^2)$ copies to estimate the spectrum of
$\rho$ up to $\epsilon$accuracy, nearly matching the known upper bound. In
addition, we simplify part of the proof of the upper bound. Our techniques
involve the asymptotic representation theory of the symmetric group; in
particular Kerov's algebra of polynomial functions on Young diagrams.

We report on progress towards implementing mixed ion species quantum
information processing for a scalable ion trap architecture. Mixed species
chains may help solve several problems with scaling ion trap quantum
computation to large numbers of qubits. Initial temperature measurements of
linear Coulomb crystals containing barium and ytterbium ions indicate that the
mass difference does not significantly impede cooling at low ion numbers.
Average motional occupation numbers are estimated to be $\bar{n} \approx 130$
quanta per mode for chains with small numbers of ions, which is within a factor
of three of the Doppler limit for barium ions in our trap. We also discuss
generation of ionphoton entanglement with barium ions with a fidelity of $F
\ge 0.84$, which is an initial step towards remote ionion coupling in a more
scalable quantum information architecture. Further, we are working to implement
these techniques in surface traps in order to exercise greater control over ion
chain ordering and positioning.

The absolutemagnitude distributions of seven supernova types are presented.
The data used here were primarily taken from the Asiago Supernova Catalogue,
but were supplemented with additional data. We accounted for both foreground
and hostgalaxy extinction. A bootstrap method is used to correct the samples
for Malmquist bias. Separately, we generate volumelimited samples, restricted
to events within 100 Mpc. We find that the superluminous events (M_B < 21)
make up only about 0.1% of all supernovae in the biascorrected sample. The
subluminous events (M_B > 15) make up about 3%. The normal Ia distribution was
the brightest with a mean absolute blue magnitude of 19.25. The IIP
distribution was the dimmest at 16.75.

Motivated by vision tasks such as robust face and object recognition, we
consider the following general problem: given a collection of lowdimensional
linear subspaces in a highdimensional ambient (image) space, and a query point
(image), efficiently determine the nearest subspace to the query in $\ell^1$
distance. In contrast to the naive exhaustive search which entails largescale
linear programs, we show that the computational burden can be cut down
significantly by a simple twostage algorithm: (1) projecting the query and
database subspaces into lowerdimensional space by random Cauchy matrix, and
solving smallscale distance evaluations (linear programs) in the projection
space to locate candidate nearest; (2) with few candidates upon independent
repetition of (1), getting back to the highdimensional space and performing
exhaustive search. To preserve the identity of the nearest subspace with
nontrivial probability, the projection dimension typically is loworder
polynomial of the subspace dimension multiplied by logarithm of number of the
subspaces (Theorem 2.1). The reduced dimensionality and hence complexity
renders the proposed algorithm particularly relevant to vision application such
as robust face and object instance recognition that we investigate empirically.

Building on work of Cai, F\"urer, and Immerman \cite{CFI92}, we show two
hardness results for the Graph Isomorphism problem. First, we show that there
are pairs of nonisomorphic $n$vertex graphs $G$ and $H$ such that any
sumofsquares (SOS) proof of nonisomorphism requires degree $\Omega(n)$. In
other words, we show an $\Omega(n)$round integrality gap for the Lasserre SDP
relaxation. In fact, we show this for pairs $G$ and $H$ which are not even
$(110^{14})$isomorphic. (Here we say that two $n$vertex, $m$edge graphs
$G$ and $H$ are $\alpha$isomorphic if there is a bijection between their
vertices which preserves at least $\alpha m$ edges.) Our second result is that
under the {\sc R3XOR} Hypothesis \cite{Fei02} (and also any of a class of
hypotheses which generalize the {\sc R3XOR} Hypothesis), the \emph{robust}
Graph Isomorphism problem is hard. I.e.\ for every $\epsilon > 0$, there is no
efficient algorithm which can distinguish graph pairs which are
$(1\epsilon)$isomorphic from pairs which are not even
$(1\epsilon_0)$isomorphic for some universal constant $\epsilon_0$. Along the
way we prove a robust asymmetry result for random graphs and hypergraphs which
may be of independent interest.

We have developed a vacuum chamber and control system for rapid testing of
microfabricated surface ion traps. Our system is modular in design and is based
on an invacuum printed circuit board with integrated filters. We have used
this system to successfully trap and cool barium ions and have achieved ion
'dark' lifetimes of 31.6 s + 3.4 s with controlled shuttling of ions. We
provide a detailed description of the ion trap system including the invacuum
materials used, control electronics and neutral atom source. We discuss the
challenges presented in achieving a system which can work reliably over two
years of operations in which the trap under test was changed at least 10 times.

Given $f:\{1, 1\}^n \rightarrow \{1, 1\}$, define the \emph{spectral
distribution} of $f$ to be the distribution on subsets of $[n]$ in which the
set $S$ is sampled with probability $\widehat{f}(S)^2$. Then the Fourier
EntropyInfluence (FEI) conjecture of Friedgut and Kalai (1996) states that
there is some absolute constant $C$ such that $\operatorname{H}[\widehat{f}^2]
\leq C\cdot\operatorname{Inf}[f]$. Here, $\operatorname{H}[\widehat{f}^2]$
denotes the Shannon entropy of $f$'s spectral distribution, and
$\operatorname{Inf}[f]$ is the total influence of $f$. This conjecture is one
of the major open problems in the analysis of Boolean functions, and settling
it would have several interesting consequences.
Previous results on the FEI conjecture have been largely through direct
calculation. In this paper we study a natural interpretation of the conjecture,
which states that there exists a communication protocol which, given subset $S$
of $[n]$ distributed as $\widehat{f}^2$, can communicate the value of $S$ using
at most $C\cdot\operatorname{Inf}[f]$ bits in expectation.
Using this interpretation, we are able show the following results:
1. First, if $f$ is computable by a read$k$ decision tree, then
$\operatorname{H}[\widehat{f}^2] \leq 9k\cdot \operatorname{Inf}[f]$.
2. Next, if $f$ has $\operatorname{Inf}[f] \geq 1$ and is computable by a
decision tree with expected depth $d$, then $\operatorname{H}[\widehat{f}^2]
\leq 12d\cdot \operatorname{Inf}[f]$.
3. Finally, we give a new proof of the main theorem of O'Donnell and Tan
(ICALP 2013), i.e. that their FEI$^+$ conjecture composes.
In addition, we show that natural improvements to our decision tree results
would be sufficient to prove the FEI conjecture in its entirety. We believe
that our methods give more illuminating proofs than previous results about the
FEI conjecture.

In this work, we study the parity complexity measures
${\mathsf{C}^{\oplus}_{\min}}[f]$ and ${\mathsf{DT^{\oplus}}}[f]$.
${\mathsf{C}^{\oplus}_{\min}}[f]$ is the \emph{parity kill number} of $f$, the
fewest number of parities on the input variables one has to fix in order to
"kill" $f$, i.e. to make it constant. ${\mathsf{DT^{\oplus}}}[f]$ is the depth
of the shortest \emph{parity decision tree} which computes $f$. These
complexity measures have in recent years become increasingly important in the
fields of communication complexity \cite{ZS09, MO09, ZS10, TWXZ13} and
pseudorandomness \cite{BK12, Sha11, CT13}.
Our main result is a composition theorem for ${\mathsf{C}^{\oplus}_{\min}}$.
The $k$th power of $f$, denoted $f^{\circ k}$, is the function which results
from composing $f$ with itself $k$ times. We prove that if $f$ is not a parity
function, then ${\mathsf{C}^{\oplus}_{\min}}[f^{\circ k}] \geq
\Omega({\mathsf{C}_{\min}}[f]^{k}).$ In other words, the parity kill number of
$f$ is essentially supermultiplicative in the \emph{normal} kill number of $f$
(also known as the minimum certificate complexity).
As an application of our composition theorem, we show lower bounds on the
parity complexity measures of $\mathsf{Sort}^{\circ k}$ and $\mathsf{HI}^{\circ
k}$. Here $\mathsf{Sort}$ is the sort function due to Ambainis \cite{Amb06},
and $\mathsf{HI}$ is Kushilevitz's hemiicosahedron function \cite{NW95}. In
doing so, we disprove a conjecture of Montanaro and Osborne \cite{MO09} which
had applications to communication complexity and computational learning theory.
In addition, we give new lower bounds for conjectures of \cite{MO09,ZS10} and
\cite{TWXZ13}.