
We study the problem of testing identity against a given distribution with a
focus on the high confidence regime. More precisely, given samples from an
unknown distribution $p$ over $n$ elements, an explicitly given distribution
$q$, and parameters $0< \epsilon, \delta < 1$, we wish to distinguish, {\em
with probability at least $1\delta$}, whether the distributions are identical
versus $\varepsilon$far in total variation distance. Most prior work focused
on the case that $\delta = \Omega(1)$, for which the sample complexity of
identity testing is known to be $\Theta(\sqrt{n}/\epsilon^2)$. Given such an
algorithm, one can achieve arbitrarily small values of $\delta$ via blackbox
amplification, which multiplies the required number of samples by
$\Theta(\log(1/\delta))$.
We show that blackbox amplification is suboptimal for any $\delta = o(1)$,
and give a new identity tester that achieves the optimal sample complexity. Our
new upper and lower bounds show that the optimal sample complexity of identity
testing is \[
\Theta\left( \frac{1}{\epsilon^2}\left(\sqrt{n \log(1/\delta)} +
\log(1/\delta) \right)\right) \] for any $n, \varepsilon$, and $\delta$. For
the special case of uniformity testing, where the given distribution is the
uniform distribution $U_n$ over the domain, our new tester is surprisingly
simple: to test whether $p = U_n$ versus $d_{\mathrm TV}(p, U_n) \geq
\varepsilon$, we simply threshold $d_{\mathrm TV}(\widehat{p}, U_n)$, where
$\widehat{p}$ is the empirical probability distribution. The fact that this
simple "plugin" estimator is sampleoptimal is surprising, even in the
constant $\delta$ case. Indeed, it was believed that such a tester would not
attain sublinear sample complexity even for constant values of $\varepsilon$
and $\delta$.

While Generative Adversarial Networks (GANs) have demonstrated promising
performance on multiple vision tasks, their learning dynamics are not yet well
understood, both in theory and in practice. To address this issue, we study GAN
dynamics in a simple yet rich parametric model that exhibits several of the
common problematic convergence behaviors such as vanishing gradients, mode
collapse, and diverging or oscillatory behavior. In spite of the nonconvex
nature of our model, we are able to perform a rigorous theoretical analysis of
its convergence behavior. Our analysis reveals an interesting dichotomy: a GAN
with an optimal discriminator provably converges, while first order
approximations of the discriminator steps lead to unstable GAN dynamics and
mode collapse. Our result suggests that using first order discriminator steps
(the defacto standard in most existing GAN setups) might be one of the factors
that makes GAN training challenging in practice.

We investigate the problem of identity testing for multidimensional histogram
distributions. A distribution $p: D \rightarrow \mathbb{R}_+$, where $D
\subseteq \mathbb{R}^d$, is called a {$k$histogram} if there exists a
partition of the domain into $k$ axisaligned rectangles such that $p$ is
constant within each such rectangle. Histograms are one of the most fundamental
nonparametric families of distributions and have been extensively studied in
computer science and statistics. We give the first identity tester for this
problem with {\em sublearning} sample complexity in any fixed dimension and a
nearlymatching sample complexity lower bound.
More specifically, let $q$ be an unknown $d$dimensional $k$histogram and
$p$ be an explicitly given $k$histogram. We want to correctly distinguish,
with probability at least $2/3$, between the case that $p = q$ versus
$\pq\_1 \geq \epsilon$. We design a computationally efficient algorithm for
this hypothesis testing problem with sample complexity $O((\sqrt{k}/\epsilon^2)
\log^{O(d)}(k/\epsilon))$. Our algorithm is robust to model misspecification,
i.e., succeeds even if $q$ is only promised to be {\em close} to a
$k$histogram. Moreover, for $k = 2^{\Omega(d)}$, we show a nearlymatching
sample complexity lower bound of $\Omega((\sqrt{k}/\epsilon^2)
(\log(k/\epsilon)/d)^{\Omega(d)})$ when $d\geq 2$.
Prior to our work, the sample complexity of the $d=1$ case was
wellunderstood, but no algorithm with sublearning sample complexity was
known, even for $d=2$. Our new upper and lower bounds have interesting
conceptual implications regarding the relation between learning and testing in
this setting.

We present an algorithm that, with high probability, generates a random
spanning tree from an edgeweighted undirected graph in
$\tilde{O}(n^{4/3}m^{1/2}+n^{2})$ time (The $\tilde{O}(\cdot)$ notation hides
$\operatorname{polylog}(n)$ factors). The tree is sampled from a distribution
where the probability of each tree is proportional to the product of its edge
weights. This improves upon the previous best algorithm due to Colbourn et al.
that runs in matrix multiplication time, $O(n^\omega)$. For the special case of
unweighted graphs, this improves upon the best previously known running time of
$\tilde{O}(\min\{n^{\omega},m\sqrt{n},m^{4/3}\})$ for $m \gg n^{5/3}$ (Colbourn
et al. '96, KelnerMadry '09, Madry et al. '15).
The effective resistance metric is essential to our algorithm, as in the work
of Madry et al., but we eschew determinantbased and random walkbased
techniques used by previous algorithms. Instead, our algorithm is based on
Gaussian elimination, and the fact that effective resistance is preserved in
the graph resulting from eliminating a subset of vertices (called a Schur
complement). As part of our algorithm, we show how to compute
$\epsilon$approximate effective resistances for a set $S$ of vertex pairs via
approximate Schur complements in $\tilde{O}(m+(n + S)\epsilon^{2})$ time,
without using the JohnsonLindenstrauss lemma which requires $\tilde{O}(
\min\{(m + S)\epsilon^{2}, m+n\epsilon^{4} +S\epsilon^{2}\})$ time. We
combine this approximation procedure with an error correction procedure for
handing edges where our estimate isn't sufficiently accurate.

We show variants of spectral sparsification routines can preserve the total
spanning tree counts of graphs, which by Kirchhoff's matrixtree theorem, is
equivalent to determinant of a graph Laplacian minor, or equivalently, of any
SDDM matrix. Our analyses utilizes this combinatorial connection to bridge
between statistical leverage scores / effective resistances and the analysis of
random graphs by [Janson, Combinatorics, Probability and Computing `94]. This
leads to a routine that in quadratic time, sparsifies a graph down to about
$n^{1.5}$ edges in ways that preserve both the determinant and the distribution
of spanning trees (provided the sparsified graph is viewed as a random object).
Extending this algorithm to work with Schur complements and approximate
Choleksy factorizations leads to algorithms for counting and sampling spanning
trees which are nearly optimal for dense graphs.
We give an algorithm that computes a $(1 \pm \delta)$ approximation to the
determinant of any SDDM matrix with constant probability in about $n^2
\delta^{2}$ time. This is the first routine for graphs that outperforms
generalpurpose routines for computing determinants of arbitrary matrices. We
also give an algorithm that generates in about $n^2 \delta^{2}$ time a
spanning tree of a weighted undirected graph from a distribution with total
variation distance of $\delta$ from the $w$uniform distribution .

We study the fundamental problems of (i) uniformity testing of a discrete
distribution, and (ii) closeness testing between two discrete distributions
with bounded $\ell_2$norm. These problems have been extensively studied in
distribution testing and sampleoptimal estimators are known for
them~\cite{Paninski:08, CDVV14, VV14, DKN:15}.
In this work, we show that the original collisionbased testers proposed for
these problems ~\cite{GRdist:00, BFR+:00} are sampleoptimal, up to constant
factors. Previous analyses showed sample complexity upper bounds for these
testers that are optimal as a function of the domain size $n$, but suboptimal
by polynomial factors in the error parameter $\epsilon$. Our main contribution
is a new tight analysis establishing that these collisionbased testers are
informationtheoretically optimal, up to constant factors, both in the
dependence on $n$ and in the dependence on $\epsilon$.

In this paper we introduce a notion of spectral approximation for directed
graphs. While there are many potential ways one might define approximation for
directed graphs, most of them are too strong to allow sparse approximations in
general. In contrast, we prove that for our notion of approximation, such
sparsifiers do exist, and we show how to compute them in almost linear time.
Using this notion of approximation, we provide a general framework for
solving asymmetric linear systems that is broadly inspired by the work of
[PengSpielman, STOC`14]. Applying this framework in conjunction with our
sparsification algorithm, we obtain an almost linear time algorithm for solving
directed Laplacian systems associated with Eulerian Graphs. Using this solver
in the recent framework of [CohenKelnerPeeblesPengSidfordVladu, FOCS`16],
we obtain almost linear time algorithms for solving a directed Laplacian linear
system, computing the stationary distribution of a Markov chain, computing
expected commute times in a directed graph, and more.
For each of these problems, our algorithms improves the previous best running
times of $O((nm^{3/4} + n^{2/3} m) \log^{O(1)} (n \kappa \epsilon^{1}))$ to
$O((m + n2^{O(\sqrt{\log{n}\log\log{n}})}) \log^{O(1)} (n \kappa
\epsilon^{1}))$ where $n$ is the number of vertices in the graph, $m$ is the
number of edges, $\kappa$ is a natural condition number associated with the
problem, and $\epsilon$ is the desired accuracy. We hope these results open the
door for further studies into directed spectral graph theory, and will serve as
a stepping stone for designing a new generation of fast algorithms for directed
graphs.

In this paper, we provide faster algorithms for computing various fundamental
quantities associated with random walks on a directed graph, including the
stationary distribution, personalized PageRank vectors, hitting times, and
escape probabilities. In particular, on a directed graph with $n$ vertices and
$m$ edges, we show how to compute each quantity in time
$\tilde{O}(m^{3/4}n+mn^{2/3})$, where the $\tilde{O}$ notation suppresses
polylogarithmic factors in $n$, the desired accuracy, and the appropriate
condition number (i.e. the mixing time or restart probability).
Our result improves upon the previous fastest running times for these
problems; previous results either invoke a general purpose linear system solver
on a $n\times n$ matrix with $m$ nonzero entries, or depend polynomially on
the desired error or natural condition number associated with the problem (i.e.
the mixing time or restart probability). For sparse graphs, we obtain a running
time of $\tilde{O}(n^{7/4})$, breaking the $O(n^{2})$ barrier of the best
running time one could hope to achieve using fast matrix multiplication.
We achieve our result by providing a similar running time improvement for
solving directed Laplacian systems, a natural directed or asymmetric analog of
the well studied symmetric or undirected Laplacian systems. We show how to
solve such systems in time $\tilde{O}(m^{3/4}n+mn^{2/3})$, and efficiently
reduce a broad range of problems to solving $\tilde{O}(1)$ directed Laplacian
systems on Eulerian graphs. We hope these results and our analysis open the
door for further study into directed spectral graph theory.

We study the problem of estimating the value of sums of the form $S_p
\triangleq \sum \binom{x_i}{p}$ when one has the ability to sample $x_i \geq 0$
with probability proportional to its magnitude. When $p=2$, this problem is
equivalent to estimating the selectivity of a selfjoin query in database
systems when one can sample rows randomly. We also study the special case when
$\{x_i\}$ is the degree sequence of a graph, which corresponds to counting the
number of $p$stars in a graph when one has the ability to sample edges
randomly.
Our algorithm for a $(1 \pm \varepsilon)$multiplicative approximation of
$S_p$ has query and time complexities $\O(\frac{m \log \log n}{\epsilon^2
S_p^{1/p}})$. Here, $m=\sum x_i/2$ is the number of edges in the graph, or
equivalently, half the number of records in the database table. Similarly, $n$
is the number of vertices in the graph and the number of unique values in the
database table. We also provide tight lower bounds (up to polylogarithmic
factors) in almost all cases, even when $\{x_i\}$ is a degree sequence and one
is allowed to use the structure of the graph to try to get a better estimate.
We are not aware of any prior lower bounds on the problem of join selectivity
estimation.
For the graph problem, prior work which assumed the ability to sample only
\emph{vertices} uniformly gave algorithms with matching lower bounds [Gonen,
Ron, and Shavitt. \textit{SIAM J. Comput.}, 25 (2011), pp. 13651411]. With the
ability to sample edges randomly, we show that one can achieve faster
algorithms for approximating the number of star subgraphs, bypassing the lower
bounds in this prior work. For example, in the regime where $S_p\leq n$, and
$p=2$, our upper bound is $\tilde{O}(n/S_p^{1/2})$, in contrast to their
$\Omega(n/S_p^{1/3})$ lower bound when no random edge queries are available.

A Fibonacci heap is a deterministic data structure implementing a priority
queue with optimal amortized operation costs. An unfortunate aspect of
Fibonacci heaps is that they must maintain a "mark bit" which serves only to
ensure efficiency of heap operations, not correctness. Karger proposed a simple
randomized variant of Fibonacci heaps in which mark bits are replaced by coin
flips. This variant still has expected amortized cost $O(1)$ for insert,
decreasekey, and merge. Karger conjectured that this data structure has
expected amortized cost $O(\log s)$ for deletemin, where $s$ is the number of
heap operations.
We give a tight analysis of Karger's randomized Fibonacci heaps, resolving
Karger's conjecture. Specifically, we obtain matching upper and lower bounds of
$\Theta(\log^2 s / \log \log s)$ for the runtime of deletemin. We also prove a
tight lower bound of $\Omega(\sqrt{n})$ on deletemin in terms of the number of
heap elements $n$. The request sequence used to prove this bound also solves an
open problem of Fredman on whether cascading cuts are necessary. Finally, we
give a simple additional modification to these heaps which yields a tight
runtime $O(\log^2 n / \log \log n)$ for deletemin.