• ### Robust Estimators in High Dimensions without the Computational Intractability(1604.06443)

March 15, 2019 cs.IT, math.IT, math.ST, stat.TH, cs.DS, cs.LG, stat.ML
We study high-dimensional distribution learning in an agnostic setting where an adversary is allowed to arbitrarily corrupt an $\varepsilon$-fraction of the samples. Such questions have a rich history spanning statistics, machine learning and theoretical computer science. Even in the most basic settings, the only known approaches are either computationally inefficient or lose dimension-dependent factors in their error guarantees. This raises the following question:Is high-dimensional agnostic distribution learning even possible, algorithmically? In this work, we obtain the first computationally efficient algorithms with dimension-independent error guarantees for agnostically learning several fundamental classes of high-dimensional distributions: (1) a single Gaussian, (2) a product distribution on the hypercube, (3) mixtures of two product distributions (under a natural balancedness condition), and (4) mixtures of spherical Gaussians. Our algorithms achieve error that is independent of the dimension, and in many cases scales nearly-linearly with the fraction of adversarially corrupted samples. Moreover, we develop a general recipe for detecting and correcting corruptions in high-dimensions, that may be applicable to many other problems.
• ### Robust Learning of Fixed-Structure Bayesian Networks(1606.07384)

Oct. 29, 2018 cs.AI, math.ST, stat.TH, cs.DS, cs.LG
We investigate the problem of learning Bayesian networks in a robust model where an $\epsilon$-fraction of the samples are adversarially corrupted. In this work, we study the fully observable discrete case where the structure of the network is given. Even in this basic setting, previous learning algorithms either run in exponential time or lose dimension-dependent factors in their error guarantees. We provide the first computationally efficient robust learning algorithm for this problem with dimension-independent error guarantees. Our algorithm has near-optimal sample complexity, runs in polynomial time, and achieves error that scales nearly-linearly with the fraction of adversarially corrupted samples. Finally, we show on both synthetic and semi-synthetic data that our algorithm performs well in practice.
• ### The Orthogonal Vectors Conjecture for Branching Programs and Formulas(1709.05294)

Sept. 15, 2017 cs.DS, cs.CC
In the Orthogonal Vectors (OV) problem, we wish to determine if there is an orthogonal pair of vectors among $n$ Boolean vectors in $d$ dimensions. The OV Conjecture (OVC) posits that OV requires $n^{2-o(1)}$ time to solve, for all $d=\omega(\log n)$. Assuming the OVC, optimal time lower bounds have been proved for many prominent problems in $P$. We prove that OVC is true in several computational models of interest: * For all sufficiently large $n$ and $d$, OV for $n$ vectors in $\{0,1\}^d$ has branching program complexity $\tilde{\Theta}(n\cdot \min(n,2^d))$. In particular, the lower bounds match the upper bounds up to polylog factors. * OV has Boolean formula complexity $\tilde{\Theta}(n\cdot \min(n,2^d))$, over all complete bases of $O(1)$ fan-in. * OV requires $\tilde{\Theta}(n\cdot \min(n,2^d))$ wires, in formulas comprised of gates computing arbitrary symmetric functions of unbounded fan-in. Our lower bounds basically match the best known (quadratic) lower bounds for any explicit function in those models. Analogous lower bounds hold for many related problems shown to be hard under OVC, such as Batch Partial Match, Batch Subset Queries, and Batch Hamming Nearest Neighbors, all of which have very succinct reductions to OV. The proofs use a certain kind of input restriction that is different from typical random restrictions where variables are assigned independently. We give a sense in which independent random restrictions cannot be used to show hardness, in that OVC is false in the "average case" even for $AC^0$ formulas: * For every fixed $p \in (0,1)$ there is an $\epsilon_p > 0$ such that for every $n$ and $d$, OV instances where input bits are independently set to $1$ with probability $p$ (and $0$ otherwise) can be solved with $AC^0$ formulas of size $O(n^{2-\epsilon_p})$, on all but a $o_n(1)$ fraction of instances.
• ### Robust polynomial regression up to the information theoretic limit(1708.03257)

Aug. 10, 2017 cs.DS, cs.LG
We consider the problem of robust polynomial regression, where one receives samples $(x_i, y_i)$ that are usually within $\sigma$ of a polynomial $y = p(x)$, but have a $\rho$ chance of being arbitrary adversarial outliers. Previously, it was known how to efficiently estimate $p$ only when $\rho < \frac{1}{\log d}$. We give an algorithm that works for the entire feasible range of $\rho < 1/2$, while simultaneously improving other parameters of the problem. We complement our algorithm, which gives a factor 2 approximation, with impossibility results that show, for example, that a $1.09$ approximation is impossible even with infinitely many samples.
• ### A bound on partitioning clusters(1702.00912)

May 23, 2017 math.CO
Let $X$ be a finite collection of sets (or "clusters"). We consider the problem of counting the number of ways a cluster $A \in X$ can be partitioned into two disjoint clusters $A_1, A_2 \in X$, thus $A = A_1 \uplus A_2$ is the disjoint union of $A_1$ and $A_2$; this problem arises in the run time analysis of the ASTRAL algorithm in phylogenetic reconstruction. We obtain the bound $$| \{ (A_1,A_2,A) \in X \times X \times X: A = A_1 \uplus A_2 \} | \leq |X|^{3/p}$$ where $|X|$ denotes the cardinality of $X$, and $p := \log_3 \frac{27}{4} = 1.73814\dots$, so that $\frac{3}{p} = 1.72598\dots$. Furthermore, the exponent $p$ cannot be replaced by any larger quantity. This improves upon the trivial bound of $|X|^2$. The argument relies on establishing a one-dimensional convolution inequality that can be established by elementary calculus combined with some numerical verification. In a similar vein, we show that for any subset $A$ of a discrete cube $\{0,1\}^n$, the additive energy of $A$ (the number of quadruples $(a_1,a_2,a_3,a_4)$ in $A^4$ with $a_1+a_2=a_3+a_4$) is at most $|A|^{\log_2 6}$, and that this exponent is best possible.
• ### The independence number of the Birkhoff polytope graph, and applications to maximally recoverable codes(1702.05773)

March 31, 2017 math.CO, cs.DM
Maximally recoverable codes are codes designed for distributed storage which combine quick recovery from single node failure and optimal recovery from catastrophic failure. Gopalan et al [SODA 2017] studied the alphabet size needed for such codes in grid topologies and gave a combinatorial characterization for it. Consider a labeling of the edges of the complete bipartite graph $K_{n,n}$ with labels coming from $F_2^d$ , that satisfies the following condition: for any simple cycle, the sum of the labels over its edges is nonzero. The minimal d where this is possible controls the alphabet size needed for maximally recoverable codes in n x n grid topologies. Prior to the current work, it was known that d is between $(\log n)^2$ and $n\log n$. We improve both bounds and show that d is linear in n. The upper bound is a recursive construction which beats the random construction. The lower bound follows by first relating the problem to the independence number of the Birkhoff polytope graph, and then providing tight bounds for it using the representation theory of the symmetric group.
• ### An example concerning set addition in F_2^n(1703.01036)

March 3, 2017 math.CO, math.NT
We construct sets $A, B$ in a vector space over $\mathbb{F}_2$ with the property that $A$ is "statistically" almost closed under addition by $B$ in the sense that $a + b$ almost always lies in $A$ when $a \in A, b \in B$, but which is extremely far from being "combinatorially" almost closed under addition by $B$: if $A' \subset A$, $B' \subset B$ and $A' + B'$ is comparable in size to $A'$ then $|B'| \lessapprox |B|^{1/2}$.
• ### On the Joint Distribution Of $\mathrm{Sel}_\phi(E/\mathbb{Q})$ and $\mathrm{Sel}_{\hat\phi}(E^\prime/\mathbb{Q})$ in Quadratic Twist Families(1702.02687)

Feb. 9, 2017 math.NT
If $E$ is an elliptic curve with a point of order two, then work of Klagsbrun and Lemke Oliver shows that the distribution of $\dim_{\mathbb{F}_2}\mathrm{Sel}_\phi(E^d/\mathbb{Q}) - \dim_{\mathbb{F}_2} \mathrm{Sel}_{\hat\phi}(E^{\prime d}/\mathbb{Q})$ within the quadratic twist family tends to the discrete normal distribution $\mathcal{N}(0,\frac{1}{2} \log \log X)$ as $X \rightarrow \infty$. We consider the distribution of $\mathrm{dim}_{\mathbb{F}_2} \mathrm{Sel}_\phi(E^d/\mathbb{Q})$ within such a quadratic twist family when $\dim_{\mathbb{F}_2} \mathrm{Sel}_\phi(E^d/\mathbb{Q}) - \dim_{\mathbb{F}_2} \mathrm{Sel}_{\hat\phi}(E^{\prime d}/\mathbb{Q})$ has a fixed value $u$. Specifically, we show that for every $r$, the limiting probability that $\dim_{\mathbb{F}_2} \mathrm{Sel}_\phi(E^d/\mathbb{Q}) = r$ is given by an explicit constant $\alpha_{r,u}$. The constants $\alpha_{r,u}$ are closely related to the $u$-probabilities introduced in Cohen and Lenstra's work on the distribution of class groups, and thus provide a connection between the distribution of Selmer groups of elliptic curves and random abelian groups. Our analysis of this problem has two steps. The first step uses algebraic and combinatorial methods to directly relate the ranks of the Selmer groups in question to the dimensions of the kernels of random $\mathbb{F}_2$-matrices. This proves that the density of twists with a given $\phi$-Selmer rank $r$ is given by $\alpha_{r,u}$ for an unusual notion of density. The second step of the analysis utilizes techniques from analytic number theory to show that this result implies the correct asymptotics in terms of the natural notion of density.
• ### Testing Bayesian Networks(1612.03156)

Dec. 9, 2016 cs.IT, math.IT, math.ST, stat.TH, cs.DS, cs.LG
This work initiates a systematic investigation of testing {\em high-dimensional} structured distributions by focusing on testing {\em Bayesian networks} -- the prototypical family of directed graphical models. A Bayesian network is defined by a directed acyclic graph, where we associate a random variable with each node. The value at any particular node is conditionally independent of all the other non-descendant nodes once its parents are fixed. Specifically, we study the properties of identity testing and closeness testing of Bayesian networks. Our main contribution is the first non-trivial efficient testing algorithms for these problems and corresponding information-theoretic lower bounds. For a wide range of parameter settings, our testing algorithms have sample complexity {\em sublinear} in the dimension and are sample-optimal, up to constant factors.
• ### Pseudorandomness via the discrete Fourier transform(1506.04350)

Nov. 18, 2015 cs.CC
We present a new approach to constructing unconditional pseudorandom generators against classes of functions that involve computing a linear function of the inputs. We give an explicit construction of a pseudorandom generator that fools the discrete Fourier transforms of linear functions with seed-length that is nearly logarithmic (up to polyloglog factors) in the input size and the desired error parameter. Our result gives a single pseudorandom generator that fools several important classes of tests computable in logspace that have been considered in the literature, including halfspaces (over general domains), modular tests and combinatorial shapes. For all these classes, our generator is the first that achieves near logarithmic seed-length in both the input length and the error parameter. Getting such a seed-length is a natural challenge in its own right, which needs to be overcome in order to derandomize RL - a central question in complexity theory. Our construction combines ideas from a large body of prior work, ranging from a classical construction of [NN93] to the recent gradually increasing independence paradigm of [KMN11, CRSW13, GMRTV12], while also introducing some novel analytic machinery which might find other applications.
• ### Pseudorandomness for concentration bounds and signed majorities(1411.4584)

Nov. 17, 2014 cs.CC
The problem of constructing pseudorandom generators that fool halfspaces has been studied intensively in recent times. For fooling halfspaces over the hypercube with polynomially small error, the best construction known requires seed-length O(log^2 n) (MekaZ13). Getting the seed-length down to O(log(n)) is a natural challenge in its own right, which needs to be overcome in order to derandomize RL. In this work we make progress towards this goal by obtaining near-optimal generators for two important special cases: 1) We give a near optimal derandomization of the Chernoff bound for independent, uniformly random bits. Specifically, we show how to generate a x in {1,-1}^n using $\tilde{O}(\log (n/\epsilon))$ random bits such that for any unit vector u, <u,x> matches the sub-Gaussian tail behaviour predicted by the Chernoff bound up to error eps. 2) We construct a generator which fools halfspaces with {0,1,-1} coefficients with error eps with a seed-length of $\tilde{O}(\log(n/\epsilon))$. This includes the important special case of majorities. In both cases, the best previous results required seed-length of $O(\log n + \log^2(1/\epsilon))$. Technically, our work combines new Fourier-analytic tools with the iterative dimension reduction techniques and the gradually increasing independence paradigm of previous works (KaneMN11, CelisRSW13, GopalanMRTV12).
• ### A PRG for Lipschitz Functions of Polynomials with Applications to Sparsest Cut(1211.1109)

Nov. 6, 2012 math.FA, cs.DS, cs.CC
We give improved pseudorandom generators (PRGs) for Lipschitz functions of low-degree polynomials over the hypercube. These are functions of the form psi(P(x)), where P is a low-degree polynomial and psi is a function with small Lipschitz constant. PRGs for smooth functions of low-degree polynomials have received a lot of attention recently and play an important role in constructing PRGs for the natural class of polynomial threshold functions. In spite of the recent progress, no nontrivial PRGs were known for fooling Lipschitz functions of degree O(log n) polynomials even for constant error rate. In this work, we give the first such generator obtaining a seed-length of (log n)\tilde{O}(d^2/eps^2) for fooling degree d polynomials with error eps. Previous generators had an exponential dependence on the degree. We use our PRG to get better integrality gap instances for sparsest cut, a fundamental problem in graph theory with many applications in graph optimization. We give an instance of uniform sparsest cut for which a powerful semi-definite relaxation (SDP) first introduced by Goemans and Linial and studied in the seminal work of Arora, Rao and Vazirani has an integrality gap of exp(\Omega((log log n)^{1/2})). Understanding the performance of the Goemans-Linial SDP for uniform sparsest cut is an important open problem in approximation algorithms and metric embeddings and our work gives a near-exponential improvement over previous lower bounds which achieved a gap of \Omega(log log n).
• ### A Pseudopolynomial Algorithm for Alexandrov's Theorem(0812.5030)

Jan. 4, 2010 cs.CG
Alexandrov's Theorem states that every metric with the global topology and local geometry required of a convex polyhedron is in fact the intrinsic metric of a unique convex polyhedron. Recent work by Bobenko and Izmestiev describes a differential equation whose solution leads to the polyhedron corresponding to a given metric. We describe an algorithm based on this differential equation to compute the polyhedron to arbitrary precision given the metric, and prove a pseudopolynomial bound on its running time. Along the way, we develop pseudopolynomial algorithms for computing shortest paths and weighted Delaunay triangulations on a polyhedral surface, even when the surface edges are not shortest paths.
• ### New results on the least common multiple of consecutive integers(0808.1507)

Aug. 11, 2008 math.NT
When studying the least common multiple of some finite sequences of integers, the first author introduced the interesting arithmetic functions $g_k$ $(k \in \mathbb{N})$, defined by $g_k(n) := \frac{n (n + 1) ... (n + k)}{\lcm(n, n + 1, >..., n + k)}$ $(\forall n \in \mathbb{N} \setminus \{0\})$. He proved that $g_k$ $(k \in \mathbb{N})$ is periodic and $k!$ is a period of $g_k$. He raised the open problem consisting to determine the smallest positive period $P_k$ of $g_k$. Very recently, S. Hong and Y. Yang have improved the period $k!$ of $g_k$ to $\lcm(1, 2, ..., k)$. In addition, they have conjectured that $P_k$ is always a multiple of the positive integer $\frac{\lcm(1, 2, >..., k, k + 1)}{k + 1}$. An immediate consequence of this conjecture states that if $(k + 1)$ is prime then the exact period of $g_k$ is precisely equal to $\lcm(1, 2, ..., k)$. In this paper, we first prove the conjecture of S. Hong and Y. Yang and then we give the exact value of $P_k$ $(k \in \mathbb{N})$. We deduce, as a corollary, that $P_k$ is equal to the part of $\lcm(1, 2, ..., k)$ not divisible by some prime.