
Binary embedding is a nonlinear dimension reduction methodology where high
dimensional data are embedded into the Hamming cube while preserving the
structure of the original space. Specifically, for an arbitrary $N$ distinct
points in $\mathbb{S}^{p1}$, our goal is to encode each point using
$m$dimensional binary strings such that we can reconstruct their geodesic
distance up to $\delta$ uniform distortion. Existing binary embedding
algorithms either lack theoretical guarantees or suffer from running time
$O\big(mp\big)$. We make three contributions: (1) we establish a lower bound
that shows any binary embedding oblivious to the set of points requires $m =
\Omega(\frac{1}{\delta^2}\log{N})$ bits and a similar lower bound for
nonoblivious embeddings into Hamming distance; (2) [DELETED, see comment]; (3)
we also provide an analytic result about embedding a general set of points $K
\subseteq \mathbb{S}^{p1}$ with even infinite size. Our theoretical findings
are supported through experiments on both synthetic and real data sets.

We study the problem of testing identity against a given distribution with a
focus on the high confidence regime. More precisely, given samples from an
unknown distribution $p$ over $n$ elements, an explicitly given distribution
$q$, and parameters $0< \epsilon, \delta < 1$, we wish to distinguish, {\em
with probability at least $1\delta$}, whether the distributions are identical
versus $\varepsilon$far in total variation distance. Most prior work focused
on the case that $\delta = \Omega(1)$, for which the sample complexity of
identity testing is known to be $\Theta(\sqrt{n}/\epsilon^2)$. Given such an
algorithm, one can achieve arbitrarily small values of $\delta$ via blackbox
amplification, which multiplies the required number of samples by
$\Theta(\log(1/\delta))$.
We show that blackbox amplification is suboptimal for any $\delta = o(1)$,
and give a new identity tester that achieves the optimal sample complexity. Our
new upper and lower bounds show that the optimal sample complexity of identity
testing is \[
\Theta\left( \frac{1}{\epsilon^2}\left(\sqrt{n \log(1/\delta)} +
\log(1/\delta) \right)\right) \] for any $n, \varepsilon$, and $\delta$. For
the special case of uniformity testing, where the given distribution is the
uniform distribution $U_n$ over the domain, our new tester is surprisingly
simple: to test whether $p = U_n$ versus $d_{\mathrm TV}(p, U_n) \geq
\varepsilon$, we simply threshold $d_{\mathrm TV}(\widehat{p}, U_n)$, where
$\widehat{p}$ is the empirical probability distribution. The fact that this
simple "plugin" estimator is sampleoptimal is surprising, even in the
constant $\delta$ case. Indeed, it was believed that such a tester would not
attain sublinear sample complexity even for constant values of $\varepsilon$
and $\delta$.

We consider the problem of active linear regression with $\ell_2$bounded
noise. In this context, the learner receives a set of unlabeled data points,
chooses a small subset to receive the labels for, and must give an estimate of
the function that performs well on fresh samples. We give an algorithm that is
simultaneously optimal in the number of labeled and unlabeled data points, with
$O(d)$ labeled samples; previous work required $\Omega(d \log d)$ labeled
samples regardless of the number of unlabeled samples.
Our results also apply to learning linear functions from noisy queries, again
achieving optimal sample complexities. The techniques extend beyond linear
functions, giving improved sample complexities for learning the family of
$k$Fouriersparse signals with continuous frequencies.

Multicamera fullbody pose capture of humans and animals in outdoor
environments is a highly challenging problem. Our approach to it involves a
team of cooperating micro aerial vehicles (MAVs) with onboard cameras only.
The key enablingaspect of our approach is the onboard person detection and
tracking method. Recent stateoftheart methods based on deep neural networks
(DNN) are highly promising in this context. However, real time DNNs are
severely constrained in input data dimensions, in contrast to available camera
resolutions. Therefore, DNNs often fail at objects with small scale or far away
from the camera, which are typical characteristics of a scenario with aerial
robots. Thus, the core problem addressed in this paper is how to achieve
onboard, realtime, continuous and accurate visionbased detections using DNNs
for visual person tracking through MAVs. Our solution leverages cooperation
among multiple MAVs. First, each MAV fuses its own detections with those
obtained by other MAVs to perform cooperative visual tracking. This allows for
predicting future poses of the tracked person, which are used to selectively
process only the relevant regions of future images, even at high resolutions.
Consequently, using our DNNbased detector we are able to continuously track
even distant humans with high accuracy and speed. We demonstrate the efficiency
of our approach through real robot experiments involving two aerial robots
tracking a person, while maintaining an active perceptiondriven formation. Our
solution runs fully onboard our MAV's CPU and GPU, with no remote processing.
ROSbased source code is provided for the benefit of the community.

We consider the stochastic bandit problem in the sublinear space setting,
where one cannot record the winloss record for all $K$ arms. We give an
algorithm using $O(1)$ words of space with regret \[
\sum_{i=1}^{K}\frac{1}{\Delta_i}\log \frac{\Delta_i}{\Delta}\log T \] where
$\Delta_i$ is the gap between the best arm and arm $i$ and $\Delta$ is the gap
between the best and the secondbest arms. If the rewards are bounded away from
$0$ and $1$, this is within an $O(\log 1/\Delta)$ factor of the optimum regret
possible without space constraints.

We consider the problem of robust polynomial regression, where one receives
samples $(x_i, y_i)$ that are usually within $\sigma$ of a polynomial $y =
p(x)$, but have a $\rho$ chance of being arbitrary adversarial outliers.
Previously, it was known how to efficiently estimate $p$ only when $\rho <
\frac{1}{\log d}$. We give an algorithm that works for the entire feasible
range of $\rho < 1/2$, while simultaneously improving other parameters of the
problem. We complement our algorithm, which gives a factor 2 approximation,
with impossibility results that show, for example, that a $1.09$ approximation
is impossible even with infinitely many samples.

Sketching has emerged as a powerful technique for speeding up problems in
numerical linear algebra, such as regression. In the overconstrained regression
problem, one is given an $n \times d$ matrix $A$, with $n \gg d$, as well as an
$n \times 1$ vector $b$, and one wants to find a vector $\hat{x}$ so as to
minimize the residual error $\Axb\_2$. Using the sketch and solve paradigm,
one first computes $S \cdot A$ and $S \cdot b$ for a randomly chosen matrix
$S$, then outputs $x' = (SA)^{\dagger} Sb$ so as to minimize $\SAx'  Sb\_2$.
The sketchandsolve paradigm gives a bound on $\x'x^*\_2$ when $A$ is
wellconditioned. Our main result is that, when $S$ is the subsampled
randomized Fourier/Hadamard transform, the error $x'  x^*$ behaves as if it
lies in a "random" direction within this bound: for any fixed direction $a\in
\mathbb{R}^d$, we have with $1  d^{c}$ probability that
\[
\langle a, x'x^*\rangle \lesssim
\frac{\a\_2\x'x^*\_2}{d^{\frac{1}{2}\gamma}}, \quad (1)
\]
where $c, \gamma > 0$ are arbitrary constants.
This implies $\x'x^*\_{\infty}$ is a factor $d^{\frac{1}{2}\gamma}$
smaller than $\x'x^*\_2$. It also gives a better bound on the generalization
of $x'$ to new examples: if rows of $A$ correspond to examples and columns to
features, then our result gives a better bound for the error introduced by
sketchandsolve when classifying fresh examples. We show that not all
oblivious subspace embeddings $S$ satisfy these properties. In particular, we
give counterexamples showing that matrices based on CountSketch or leverage
score sampling do not satisfy these properties.
We also provide lower bounds, both on how small $\x'x^*\_2$ can be, and
for our new guarantee (1), showing that the subsampled randomized
Fourier/Hadamard transform is nearly optimal.

The goal of compressed sensing is to estimate a vector from an
underdetermined system of noisy linear measurements, by making use of prior
knowledge on the structure of vectors in the relevant domain. For almost all
results in this literature, the structure is represented by sparsity in a
wellchosen basis. We show how to achieve guarantees similar to standard
compressed sensing but without employing sparsity at all. Instead, we suppose
that vectors lie near the range of a generative model $G: \mathbb{R}^k \to
\mathbb{R}^n$. Our main theorem is that, if $G$ is $L$Lipschitz, then roughly
$O(k \log L)$ random Gaussian measurements suffice for an $\ell_2/\ell_2$
recovery guarantee. We demonstrate our results using generative models from
published variational autoencoder and generative adversarial networks. Our
method can use $5$$10$x fewer measurements than Lasso for the same accuracy.

We study the problem of estimating the number of triangles in a graph stream.
No streaming algorithm can get sublinear space on all graphs, so methods in
this area bound the space in terms of parameters of the input graph such as the
maximum number of triangles sharing a single edge. We give a sampling algorithm
that is additionally parameterized by the maximum number of triangles sharing a
single vertex. Our bound matches the best known turnstile results in all
graphs, and gets better performance on simple graphs like $G(n, p)$ or a set of
independent triangles.
We complement the upper bound with a lower bound showing that no sampling
algorithm can do better on those graphs by more than a log factor. In
particular, any insertion stream algorithm must use $\sqrt{T}$ space when all
the triangles share a common vertex, and any sampling algorithm must take
$T^\frac{1}{3}$ samples when all the triangles are independent. We add another
lower bound, also matching our algorithm's performance, which applies to all
graph classes. This lower bound covers "triangledependent" sampling
algorithms, a subclass that includes our algorithm and all previous sampling
algorithms for the problem.
Finally, we show how to generalize our algorithm to count arbitrary subgraphs
of constant size.

We study the fundamental problems of (i) uniformity testing of a discrete
distribution, and (ii) closeness testing between two discrete distributions
with bounded $\ell_2$norm. These problems have been extensively studied in
distribution testing and sampleoptimal estimators are known for
them~\cite{Paninski:08, CDVV14, VV14, DKN:15}.
In this work, we show that the original collisionbased testers proposed for
these problems ~\cite{GRdist:00, BFR+:00} are sampleoptimal, up to constant
factors. Previous analyses showed sample complexity upper bounds for these
testers that are optimal as a function of the domain size $n$, but suboptimal
by polynomial factors in the error parameter $\epsilon$. Our main contribution
is a new tight analysis establishing that these collisionbased testers are
informationtheoretically optimal, up to constant factors, both in the
dependence on $n$ and in the dependence on $\epsilon$.

The Neural GPU is a recent model that can learn algorithms such as
multidigit binary addition and binary multiplication in a way that generalizes
to inputs of arbitrary length. We show that there are two simple ways of
improving the performance of the Neural GPU: by carefully designing a
curriculum, and by increasing model size. The latter requires a memory
efficient implementation, as a naive implementation of the Neural GPU is memory
intensive. We find that these techniques increase the set of algorithmic
problems that can be solved by the Neural GPU: we have been able to learn to
perform all the arithmetic operations (and generalize to arbitrarily long
numbers) when the arguments are given in the decimal representation (which,
surprisingly, has not been possible before). We have also been able to train
the Neural GPU to evaluate long arithmetic expressions with multiple operands
that require respecting the precedence order of the operands, although these
have succeeded only in their binary representation, and not with perfect
accuracy.
In addition, we gain insight into the Neural GPU by investigating its failure
modes. We find that Neural GPUs that correctly generalize to arbitrarily long
numbers still fail to compute the correct answer on highlysymmetric, atypical
inputs: for example, a Neural GPU that achieves nearperfect generalization on
decimal multiplication of up to 100digit long numbers can fail on
$000000\dots002 \times 000000\dots002$ while succeeding at $2 \times 2$. These
failure modes are reminiscent of adversarial examples.

We propose a criterion for discrimination against a specified sensitive
attribute in supervised learning, where the goal is to predict some target
based on available features. Assuming data about the predictor, target, and
membership in the protected group are available, we show how to optimally
adjust any learned predictor so as to remove discrimination according to our
definition. Our framework also improves incentives by shifting the cost of poor
classification from disadvantaged groups to the decision maker, who can respond
by improving the classification accuracy.
In line with other studies, our notion is oblivious: it depends only on the
joint statistics of the predictor, the target and the protected attribute, but
not on interpretation of individualfeatures. We study the inherent limits of
defining and identifying biases based on such oblivious measures, outlining
what can and cannot be inferred from different oblivious tests.
We illustrate our notion using a case study of FICO credit scores.

We consider the problem of estimating a Fouriersparse signal from noisy
samples, where the sampling is done over some interval $[0, T]$ and the
frequencies can be "offgrid". Previous methods for this problem required the
gap between frequencies to be above 1/T, the threshold required to robustly
identify individual frequencies. We show the frequency gap is not necessary to
estimate the signal as a whole: for arbitrary $k$Fouriersparse signals under
$\ell_2$ bounded noise, we show how to estimate the signal with a constant
factor growth of the noise and sample complexity polynomial in $k$ and
logarithmic in the bandwidth and signaltonoise ratio.
As a special case, we get an algorithm to interpolate degree $d$ polynomials
from noisy measurements, using $O(d)$ samples and increasing the noise by a
constant factor in $\ell_2$.

In recent years, a number of works have studied methods for computing the
Fourier transform in sublinear time if the output is sparse. Most of these have
focused on the discrete setting, even though in many applications the input
signal is continuous and naive discretization significantly worsens the
sparsity level.
We present an algorithm for robustly computing sparse Fourier transforms in
the continuous setting. Let $x(t) = x^*(t) + g(t)$, where $x^*$ has a
$k$sparse Fourier transform and $g$ is an arbitrary noise term. Given sample
access to $x(t)$ for some duration $T$, we show how to find a
$k$Fouriersparse reconstruction $x'(t)$ with
$$\frac{1}{T}\int_0^T x'(t)  x(t) ^2 \mathrm{d} t \lesssim
\frac{1}{T}\int_0^T  g(t)^2 \mathrm{d}t.$$
The sample complexity is linear in $k$ and logarithmic in the signaltonoise
ratio and the frequency resolution. Previous results with similar sample
complexities could not tolerate an infinitesimal amount of i.i.d. Gaussian
noise, and even algorithms with higher sample complexities increased the noise
by a polynomial factor. We also give new results for how precisely the
individual frequencies of $x^*$ can be recovered.

We consider the problem of identifying the parameters of an unknown mixture
of two arbitrary $d$dimensional gaussians from a sequence of independent
random samples. Our main results are upper and lower bounds giving a
computationally efficient momentbased estimator with an optimal convergence
rate, thus resolving a problem introduced by Pearson (1894). Denoting by
$\sigma^2$ the variance of the unknown mixture, we prove that
$\Theta(\sigma^{12})$ samples are necessary and sufficient to estimate each
parameter up to constant additive error when $d=1.$ Our upper bound extends to
arbitrary dimension $d>1$ up to a (provably necessary) logarithmic loss in $d$
using a novelyet simpledimensionality reduction technique. We further
identify several interesting special cases where the sample complexity is
notably smaller than our optimal worstcase bound. For instance, if the means
of the two components are separated by $\Omega(\sigma)$ the sample complexity
reduces to $O(\sigma^2)$ and this is again optimal.
Our results also apply to learning each component of the mixture up to small
error in total variation distance, where our algorithm gives strong
improvements in sample complexity over previous work. We also extend our lower
bound to mixtures of $k$ Gaussians, showing that $\Omega(\sigma^{6k2})$
samples are necessary to estimate each parameter up to constant additive error.

We initiate the study of tradeoffs between sparsity and the number of
measurements in sparse recovery schemes for generic norms. Specifically, for a
norm $\\cdot\$, sparsity parameter $k$, approximation factor $K>0$, and
probability of failure $P>0$, we ask: what is the minimal value of $m$ so that
there is a distribution over $m \times n$ matrices $A$ with the property that
for any $x$, given $Ax$, we can recover a $k$sparse approximation to $x$ in
the given norm with probability at least $1P$? We give a partial answer to
this problem, by showing that for norms that admit efficient linear sketches,
the optimal number of measurements $m$ is closely related to the doubling
dimension of the metric induced by the norm $\\cdot\$ on the set of all
$k$sparse vectors. By applying our result to specific norms, we cast known
measurement bounds in our general framework (for the $\ell_p$ norms, $p \in
[1,2]$) as well as provide new, measurementefficient schemes (for the
EarthMover Distance norm). The latter result directly implies more succinct
linear sketches for the wellstudied planar $k$median clustering problem.
Finally, our lower bound for the doubling dimension of the EMD norm enables us
to address the open question of [FrahlingSohler, STOC'05] about the space
complexity of clustering problems in the dynamic streaming model.

We provide a new robust convergence analysis of the wellknown power method
for computing the dominant singular vectors of a matrix that we call the noisy
power method. Our result characterizes the convergence behavior of the
algorithm when a significant amount noise is introduced after each
matrixvector multiplication. The noisy power method can be seen as a
metaalgorithm that has recently found a number of important applications in a
broad range of machine learning problems including alternating minimization for
matrix completion, streaming principal component analysis (PCA), and
privacypreserving spectral analysis. Our general analysis subsumes several
existing adhoc convergence bounds and resolves a number of open problems in
multiple applications including streaming PCA and privacypreserving singular
vector computation.

Given a database, a common problem is to find the pairs or $k$tuples of
items that frequently cooccur. One specific problem is to create a small space
"sketch" of the data that records which $k$tuples appear in more than an
$\epsilon$ fraction of rows of the database.
We improve the lower bound of Liberty, Mitzenmacher, and Thaler [LMT14],
showing that $\Omega(\frac{1}{\epsilon}d \log (\epsilon d))$ bits are necessary
even in the case of $k=2$. This matches the sampling upper bound for all
$\epsilon \geq 1/d^{.99}$, and (in the case of $k=2$) another trivial upper
bound for $\epsilon = 1/d$.

We present a refined analysis of the classic CountSketch streaming heavy
hitters algorithm [CCF02]. CountSketch uses O(k log n) linear measurements of
a vector x in R^n to give an estimate x' of x. The standard analysis shows that
this estimate x' satisfies x'x_infty^2 < x_tail_2^2 / k, where x_tail
is the vector containing all but the largest k coordinates of x. Our main
result is that most of the coordinates of x' have substantially less error than
this upper bound; namely, for any c < O(log n), we show that each coordinate i
satisfies (x'_i  x_i)^2 < (c/log n) x_tail_2^2/k with probability
12^{Omega(c)}, as long as the hash functions are fully independent. This
subsumes the previous bound and is optimal for all c. Using these improved
point estimates, we prove a stronger concentration result for set estimates by
first analyzing the covariance matrix and then using a
medianofmedianofmedians argument to bootstrap the failure probability
bounds. These results also give improved results for l_2 recovery of exactly
ksparse estimates x^* when x is drawn from a distribution with suitable decay,
such as a power law or lognormal. We complement our results with simulations of
CountSketch on a power law distribution. The empirical evidence indicates that
our theoretical bounds give a precise characterization of the algorithm's
performance: the asymptotics are correct and the associated constants are
small. Our proof shows that any symmetric random variable with finite variance
and positive Fourier transform concentrates around 0 at least as well as a
Gaussian. This result, which may be of independent interest, gives good
concentration even when the noise does not converge to a Gaussian.

We present the first sampleoptimal sublinear time algorithms for the sparse
Discrete Fourier Transform over a twodimensional sqrt{n} x sqrt{n} grid. Our
algorithms are analyzed for /average case/ signals. For signals whose spectrum
is exactly sparse, our algorithms use O(k) samples and run in O(k log k) time,
where k is the expected sparsity of the signal. For signals whose spectrum is
approximately sparse, our algorithm uses O(k log n) samples and runs in O(k
log^2 n) time; the latter algorithm works for k=Theta(sqrt{n}). The number of
samples used by our algorithms matches the known lower bounds for the
respective signal models.
By a known reduction, our algorithms give similar results for the
onedimensional sparse Discrete Fourier Transform when n is a power of a small
composite number (e.g., n = 6^t).

In compressed sensing, the "restricted isometry property" (RIP) is a
sufficient condition for the efficient reconstruction of a nearly ksparse
vector x in C^d from m linear measurements Phi x. It is desirable for m to be
small, and for Phi to support fast matrixvector multiplication. In this work,
we give a randomized construction of RIP matrices Phi in C^{m x d}, preserving
the L_2 norms of all ksparse vectors with distortion 1+eps, where the
matrixvector multiply Phi x can be computed in nearly linear time. The number
of rows m is on the order of eps^{2}klog dlog^2(klog d). Previous analyses of
constructions of RIP matrices supporting fast matrixvector multiplies, such as
the sampled discrete Fourier matrix, required m to be larger by roughly a log k
factor.
Supporting fast matrixvector multiplication is useful for iterative recovery
algorithms which repeatedly multiply by Phi or Phi^*. Furthermore, our
construction, together with a connection between RIP matrices and the
JohnsonLindenstrauss lemma in [KrahmerWard, SIAM. J. Math. Anal. 2011],
implies fast JohnsonLindenstrauss embeddings with asymptotically fewer rows
than previously known.
Our approach is a simple twist on previous constructions. Rather than
choosing the rows for the embedding matrix to be rows sampled from some larger
structured matrix (such as the discrete Fourier transform or a random circulant
matrix), we instead choose each row of the embedding matrix to be a linear
combination of a small number of rows of the original matrix, with random sign
flips as coefficients. The main tool in our analysis is a recent bound for the
supremum of certain types of Rademacher chaos processes in
[KrahmerMendelsonRauhut, arXiv:1207.0235].

We give lower bounds for the problem of stable sparse recovery from
/adaptive/ linear measurements. In this problem, one would like to estimate a
vector $x \in \R^n$ from $m$ linear measurements $A_1x,..., A_mx$. One may
choose each vector $A_i$ based on $A_1x,..., A_{i1}x$, and must output $x*$
satisfying x*  x_p \leq (1 + \epsilon) \min_{k\text{sparse} x'} x  x'_p
with probability at least $1\delta>2/3$, for some $p \in \{1,2\}$. For $p=2$,
it was recently shown that this is possible with $m = O(\frac{1}{\epsilon}k
\log \log (n/k))$, while nonadaptively it requires $\Theta(\frac{1}{\epsilon}k
\log (n/k))$. It is also known that even adaptively, it takes $m =
\Omega(k/\epsilon)$ for $p = 2$. For $p = 1$, there is a nonadaptive upper
bound of $\tilde{O}(\frac{1}{\sqrt{\epsilon}} k\log n)$. We show:
* For $p=2$, $m = \Omega(\log \log n)$. This is tight for $k = O(1)$ and
constant $\epsilon$, and shows that the $\log \log n$ dependence is correct.
* If the measurement vectors are chosen in $R$ "rounds", then $m = \Omega(R
\log^{1/R} n)$. For constant $\epsilon$, this matches the previously known
upper bound up to an O(1) factor in $R$.
* For $p=1$, $m = \Omega(k/(\sqrt{\epsilon} \cdot \log k/\epsilon))$. This
shows that adaptivity cannot improve more than logarithmic factors, providing
the analog of the $m = \Omega(k/\epsilon)$ bound for $p = 2$.

We initiate the study of sparse recovery problems under the EarthMover
Distance (EMD). Specifically, we design a distribution over m x n matrices A
such that for any x, given Ax, we can recover a ksparse approximation to x
under the EMD distance. One construction yields m = O(k log(n/k)) and a 1 +
epsilon approximation factor, which matches the best achievable bound for other
error measures, such as the L_1 norm. Our algorithms are obtained by exploiting
novel connections to other problems and areas, such as streaming algorithms for
kmedian clustering and modelbased compressive sensing. We also provide novel
algorithms and results for the latter problems.

We propose a framework for compressive sensing of images with local
distinguishable objects, such as stars, and apply it to solve a problem in
celestial navigation. Specifically, let x be an Npixel realvalued image,
consisting of a small number of local distinguishable objects plus noise. Our
goal is to design an mbyN measurement matrix A with m << N, such that we can
recover an approximation to x from the measurements Ax.
We construct a matrix A and recovery algorithm with the following properties:
(i) if there are k objects, the number of measurements m is O((k log N)/(log
k)), undercutting the best known bound of O(k log(N/k)) (ii) the matrix A is
very sparse, which is important for hardware implementations of compressive
sensing algorithms, and (iii) the recovery algorithm is empirically fast and
runs in time polynomial in k and log(N).
We also present a comprehensive study of the application of our algorithm to
attitude determination, or finding one's orientation in space. Spacecraft
typically use cameras to acquire an image of the sky, and then identify stars
in the image to compute their orientation. Taking pictures is very expensive
for small spacecraft, since camera sensors use a lot of power. Our algorithm
optically compresses the image before it reaches the camera's array of pixels,
reducing the number of sensors that are required.

We consider the problem of computing the ksparse approximation to the
discrete Fourier transform of an ndimensional signal. We show:
* An O(k log n)time randomized algorithm for the case where the input signal
has at most k nonzero Fourier coefficients, and
* An O(k log n log(n/k))time randomized algorithm for general input signals.
Both algorithms achieve o(n log n) time, and thus improve over the Fast
Fourier Transform, for any k = o(n). They are the first known algorithms that
satisfy this property. Also, if one assumes that the Fast Fourier Transform is
optimal, the algorithm for the exactly ksparse case is optimal for any k =
n^{\Omega(1)}.
We complement our algorithmic results by showing that any algorithm for
computing the sparse Fourier transform of a general signal must use at least
\Omega(k log(n/k)/ log log n) signal samples, even if it is allowed to perform
adaptive sampling.