
We study highdimensional distribution learning in an agnostic setting where
an adversary is allowed to arbitrarily corrupt an $\varepsilon$fraction of the
samples. Such questions have a rich history spanning statistics, machine
learning and theoretical computer science. Even in the most basic settings, the
only known approaches are either computationally inefficient or lose
dimensiondependent factors in their error guarantees. This raises the
following question:Is highdimensional agnostic distribution learning even
possible, algorithmically?
In this work, we obtain the first computationally efficient algorithms with
dimensionindependent error guarantees for agnostically learning several
fundamental classes of highdimensional distributions: (1) a single Gaussian,
(2) a product distribution on the hypercube, (3) mixtures of two product
distributions (under a natural balancedness condition), and (4) mixtures of
spherical Gaussians. Our algorithms achieve error that is independent of the
dimension, and in many cases scales nearlylinearly with the fraction of
adversarially corrupted samples. Moreover, we develop a general recipe for
detecting and correcting corruptions in highdimensions, that may be applicable
to many other problems.

We investigate the problem of learning Bayesian networks in a robust model
where an $\epsilon$fraction of the samples are adversarially corrupted. In
this work, we study the fully observable discrete case where the structure of
the network is given. Even in this basic setting, previous learning algorithms
either run in exponential time or lose dimensiondependent factors in their
error guarantees. We provide the first computationally efficient robust
learning algorithm for this problem with dimensionindependent error
guarantees. Our algorithm has nearoptimal sample complexity, runs in
polynomial time, and achieves error that scales nearlylinearly with the
fraction of adversarially corrupted samples. Finally, we show on both synthetic
and semisynthetic data that our algorithm performs well in practice.

Robust estimation is much more challenging in high dimensions than it is in
one dimension: Most techniques either lead to intractable optimization problems
or estimators that can tolerate only a tiny fraction of errors. Recent work in
theoretical computer science has shown that, in appropriate distributional
models, it is possible to robustly estimate the mean and covariance with
polynomial time algorithms that can tolerate a constant fraction of
corruptions, independent of the dimension. However, the sample and time
complexity of these algorithms is prohibitively large for highdimensional
applications. In this work, we address both of these issues by establishing
sample complexity bounds that are optimal, up to logarithmic factors, as well
as giving various refinements that allow the algorithms to tolerate a much
larger fraction of corruptions. Finally, we show on both synthetic and real
data that our algorithms have stateoftheart performance and suddenly make
highdimensional robust estimation a realistic possibility.

In high dimensions, most machine learning methods are brittle to even a small
fraction of structured outliers. To address this, we introduce a new
metaalgorithm that can take in a base learner such as least squares or
stochastic gradient descent, and harden the learner to be resistant to
outliers. Our method, Sever, possesses strong theoretical guarantees yet is
also highly scalable  beyond running the base learner itself, it only
requires computing the top singular vector of a certain $n \times d$ matrix. We
apply Sever on a drug design dataset and a spam classification dataset, and
find that in both cases it has substantially greater robustness than several
baselines. On the spam dataset, with $1\%$ corruptions, we achieved $7.4\%$
test error, compared to $13.4\%20.5\%$ for the baselines, and $3\%$ error on
the uncorrupted dataset. Similarly, on the drug design dataset, with $10\%$
corruptions, we achieved $1.42$ meansquared error test error, compared to
$1.51$$2.33$ for the baselines, and $1.23$ error on the uncorrupted dataset.

We study the problem of learning multivariate logconcave densities with
respect to a global loss function. We obtain the first upper bound on the
sample complexity of the maximum likelihood estimator (MLE) for a logconcave
density on $\mathbb{R}^d$, for all $d \geq 4$. Prior to this work, no finite
sample upper bound was known for this estimator in more than $3$ dimensions.
In more detail, we prove that for any $d \geq 1$ and $\epsilon>0$, given
$\tilde{O}_d((1/\epsilon)^{(d+3)/2})$ samples drawn from an unknown logconcave
density $f_0$ on $\mathbb{R}^d$, the MLE outputs a hypothesis $h$ that with
high probability is $\epsilon$close to $f_0$, in squared Hellinger loss. A
sample complexity lower bound of $\Omega_d((1/\epsilon)^{(d+1)/2})$ was
previously known for any learning algorithm that achieves this guarantee. We
thus establish that the sample complexity of the logconcave MLE is
nearoptimal, up to an $\tilde{O}(1/\epsilon)$ factor.

We study the problem of generalized uniformity testing \cite{BC17} of a
discrete probability distribution: Given samples from a probability
distribution $p$ over an {\em unknown} discrete domain $\mathbf{\Omega}$, we
want to distinguish, with probability at least $2/3$, between the case that $p$
is uniform on some {\em subset} of $\mathbf{\Omega}$ versus $\epsilon$far, in
total variation distance, from any such uniform distribution.
We establish tight bounds on the sample complexity of generalized uniformity
testing. In more detail, we present a computationally efficient tester whose
sample complexity is optimal, up to constant factors, and a matching
informationtheoretic lower bound. Specifically, we show that the sample
complexity of generalized uniformity testing is
$\Theta\left(1/(\epsilon^{4/3}\p\_3) + 1/(\epsilon^{2} \p\_2) \right)$.

We study the general problem of testing whether an unknown distribution
belongs to a specified family of distributions. More specifically, given a
distribution family $\mathcal{P}$ and sample access to an unknown discrete
distribution $\mathbf{P}$, we want to distinguish (with high probability)
between the case that $\mathbf{P} \in \mathcal{P}$ and the case that
$\mathbf{P}$ is $\epsilon$far, in total variation distance, from every
distribution in $\mathcal{P}$. This is the prototypical hypothesis testing
problem that has received significant attention in statistics and, more
recently, in theoretical computer science.
The sample complexity of this general inference task depends on the
underlying family $\mathcal{P}$. The gold standard in distribution property
testing is to design sampleoptimal and computationally efficient algorithms
for this task. The main contribution of this work is a simple and general
testing technique that is applicable to all distribution families whose Fourier
spectrum satisfies a certain approximate sparsity property. To the best of our
knowledge, ours is the first use of the Fourier transform in the context of
distribution testing.
We apply our Fourierbased framework to obtain near sampleoptimal and
computationally efficient testers for the following fundamental distribution
families: Sums of Independent Integer Random Variables (SIIRVs), Poisson
Multinomial Distributions (PMDs), and Discrete LogConcave Distributions. For
the first two, ours are the first nontrivial testers in the literature, vastly
generalizing previous work on testing Poisson Binomial Distributions. For the
third, our tester improves on prior work in both sample and time complexity.

We study the efficient learnability of geometric concept classes 
specifically, lowdegree polynomial threshold functions (PTFs) and
intersections of halfspaces  when a fraction of the data is adversarially
corrupted. We give the first polynomialtime PAC learning algorithms for these
concept classes with dimensionindependent error guarantees in the presence of
nasty noise under the Gaussian distribution. In the nasty noise model, an
omniscient adversary can arbitrarily corrupt a small fraction of both the
unlabeled data points and their labels. This model generalizes wellstudied
noise models, including the malicious noise model and the agnostic (adversarial
label noise) model. Prior to our work, the only concept class for which
efficient malicious learning algorithms were known was the class of
origincentered halfspaces.
Specifically, our robust learning algorithm for lowdegree PTFs succeeds
under a number of tame distributions  including the Gaussian distribution
and, more generally, any logconcave distribution with (approximately) known
lowdegree moments. For LTFs under the Gaussian distribution, we give a
polynomialtime algorithm that achieves error $O(\epsilon)$, where $\epsilon$
is the noise rate. At the core of our PAC learning results is an efficient
algorithm to approximate the lowdegree Chowparameters of any bounded function
in the presence of nasty noise. To achieve this, we employ an iterative
spectral method for outlier detection and removal, inspired by recent work in
robust unsupervised learning. Our aforementioned algorithm succeeds for a range
of distributions satisfying mild concentration bounds and moment assumptions.
The correctness of our robust learning algorithm for intersections of
halfspaces makes essential use of a novel robust inverse independence lemma
that may be of broader interest.

We study the problem of estimating multivariate logconcave probability
density functions. We prove the first sample complexity upper bound for
learning logconcave densities on $\mathbb{R}^d$, for all $d \geq 1$. Prior to
our work, no upper bound on the sample complexity of this learning problem was
known for the case of $d>3$. In more detail, we give an estimator that, for any
$d \ge 1$ and $\epsilon>0$, draws $\tilde{O}_d \left( (1/\epsilon)^{(d+5)/2}
\right)$ samples from an unknown target logconcave density on $\mathbb{R}^d$,
and outputs a hypothesis that (with high probability) is $\epsilon$close to
the target, in total variation distance. Our upper bound on the sample
complexity comes close to the known lower bound of $\Omega_d \left(
(1/\epsilon)^{(d+1)/2} \right)$ for this problem.

We describe a general technique that yields the first {\em Statistical Query
lower bounds} for a range of fundamental highdimensional learning problems
involving Gaussian distributions. Our main results are for the problems of (1)
learning Gaussian mixture models (GMMs), and (2) robust (agnostic) learning of
a single unknown Gaussian distribution. For each of these problems, we show a
{\em superpolynomial gap} between the (informationtheoretic) sample
complexity and the computational complexity of {\em any} Statistical Query
algorithm for the problem. Our SQ lower bound for Problem (1) is qualitatively
matched by known learning algorithms for GMMs. Our lower bound for Problem (2)
implies that the accuracy of the robust learning algorithm
in~\cite{DiakonikolasKKLMS16} is essentially best possible among all
polynomialtime SQ algorithms.
Our SQ lower bounds are attained via a unified momentmatching technique that
is useful in other contexts and may be of broader interest. Our technique
yields nearlytight lower bounds for a number of related unsupervised
estimation problems. Specifically, for the problems of (3) robust covariance
estimation in spectral norm, and (4) robust sparse mean estimation, we
establish a quadratic {\em statisticalcomputational tradeoff} for SQ
algorithms, matching known upper bounds. Finally, our technique can be used to
obtain tight sample complexity lower bounds for highdimensional {\em testing}
problems. Specifically, for the classical problem of robustly {\em testing} an
unknown mean (known covariance) Gaussian, our technique implies an
informationtheoretic sample lower bound that scales {\em linearly} in the
dimension. Our sample lower bound matches the sample complexity of the
corresponding robust {\em learning} problem and separates the sample complexity
of robust testing from standard (nonrobust) testing.

We study the fundamental problem of learning the parameters of a
highdimensional Gaussian in the presence of noise  where an
$\varepsilon$fraction of our samples were chosen by an adversary. We give
robust estimators that achieve estimation error $O(\varepsilon)$ in the total
variation distance, which is optimal up to a universal constant that is
independent of the dimension.
In the case where just the mean is unknown, our robustness guarantee is
optimal up to a factor of $\sqrt{2}$ and the running time is polynomial in $d$
and $1/\epsilon$. When both the mean and covariance are unknown, the running
time is polynomial in $d$ and quasipolynomial in $1/\varepsilon$. Moreover all
of our algorithms require only a polynomial number of samples. Our work shows
that the same sorts of error guarantees that were established over fifty years
ago in the onedimensional setting can also be achieved by efficient algorithms
in highdimensional settings.

This work initiates a systematic investigation of testing {\em
highdimensional} structured distributions by focusing on testing {\em Bayesian
networks}  the prototypical family of directed graphical models. A Bayesian
network is defined by a directed acyclic graph, where we associate a random
variable with each node. The value at any particular node is conditionally
independent of all the other nondescendant nodes once its parents are fixed.
Specifically, we study the properties of identity testing and closeness testing
of Bayesian networks. Our main contribution is the first nontrivial efficient
testing algorithms for these problems and corresponding informationtheoretic
lower bounds. For a wide range of parameter settings, our testing algorithms
have sample complexity {\em sublinear} in the dimension and are sampleoptimal,
up to constant factors.

We investigate the complexity of computing approximate Nash equilibria in
anonymous games. Our main algorithmic result is the following: For any
$n$player anonymous game with a bounded number of strategies and any constant
$\delta>0$, an $O(1/n^{1\delta})$approximate Nash equilibrium can be computed
in polynomial time. Complementing this positive result, we show that if there
exists any constant $\delta>0$ such that an $O(1/n^{1+\delta})$approximate
equilibrium can be computed in polynomial time, then there is a fully
polynomialtime approximation scheme for this problem.
We also present a faster algorithm that, for any $n$player $k$strategy
anonymous game, runs in time $\tilde O((n+k) k n^k)$ and computes an $\tilde
O(n^{1/3} k^{11/3})$approximate equilibrium. This algorithm follows from the
existence of simple approximate equilibria of anonymous games, where each
player plays one strategy with probability $1\delta$, for some small $\delta$,
and plays uniformly at random with probability $\delta$.
Our approach exploits the connection between Nash equilibria in anonymous
games and Poisson multinomial distributions (PMDs). Specifically, we prove a
new probabilistic lemma establishing the following: Two PMDs, with large
variance in each direction, whose first few moments are approximately matching
are close in total variation distance. Our structural result strengthens
previous work by providing a smooth tradeoff between the variance bound and the
number of matching moments.

An $(n, k)$Poisson Multinomial Distribution (PMD) is a random variable of
the form $X = \sum_{i=1}^n X_i$, where the $X_i$'s are independent random
vectors supported on the set of standard basis vectors in $\mathbb{R}^k.$ In
this paper, we obtain a refined structural understanding of PMDs by analyzing
their Fourier transform. As our core structural result, we prove that the
Fourier transform of PMDs is {\em approximately sparse}, i.e., roughly
speaking, its $L_1$norm is small outside a small set. By building on this
result, we obtain the following applications:
{\bf Learning Theory.} We design the first computationally efficient learning
algorithm for PMDs with respect to the total variation distance. Our algorithm
learns an arbitrary $(n, k)$PMD within variation distance $\epsilon$ using a
nearoptimal sample size of $\widetilde{O}_k(1/\epsilon^2),$ and runs in time
$\widetilde{O}_k(1/\epsilon^2) \cdot \log n.$ Previously, no algorithm with a
$\mathrm{poly}(1/\epsilon)$ runtime was known, even for $k=3.$
{\bf Game Theory.} We give the first efficient polynomialtime approximation
scheme (EPTAS) for computing Nash equilibria in anonymous games. For normalized
anonymous games with $n$ players and $k$ strategies, our algorithm computes a
wellsupported $\epsilon$Nash equilibrium in time $n^{O(k^3)} \cdot
(k/\epsilon)^{O(k^3\log(k/\epsilon)/\log\log(k/\epsilon))^{k1}}.$ The best
previous algorithm for this problem had running time $n^{(f(k)/\epsilon)^k},$
where $f(k) = \Omega(k^{k^2})$, for any $k>2.$
{\bf Statistics.} We prove a multivariate central limit theorem (CLT) that
relates an arbitrary PMD to a discretized multivariate Gaussian with the same
mean and covariance, in total variation distance. Our new CLT strengthens the
CLT of Valiant and Valiant by completely removing the dependence on $n$ in the
error bound.

We study the {\em robust proper learning} of univariate logconcave
distributions (over continuous and discrete domains). Given a set of samples
drawn from an unknown target distribution, we want to compute a logconcave
hypothesis distribution that is as close as possible to the target, in total
variation distance. In this work, we give the first computationally efficient
algorithm for this learning problem. Our algorithm achieves the
informationtheoretically optimal sample size (up to a constant factor), runs
in polynomial time, and is robust to model misspecification with nearlyoptimal
error guarantees.
Specifically, we give an algorithm that, on input $n=O(1/\eps^{5/2})$ samples
from an unknown distribution $f$, runs in time $\widetilde{O}(n^{8/5})$, and
outputs a logconcave hypothesis $h$ that (with high probability) satisfies
$\dtv(h, f) = O(\opt)+\eps$, where $\opt$ is the minimum total variation
distance between $f$ and the class of logconcave distributions. Our approach
to the robust proper learning problem is quite flexible and may be applicable
to many other univariate distribution families.

We give polynomial time algorithms for quantitative (and qualitative)
reachability analysis for Branching Markov Decision Processes (BMDPs).
Specifically, given a BMDP, and given an initial population, where the
objective of the controller is to maximize (or minimize) the probability of
eventually reaching a population that contains an object of a desired (or
undesired) type, we give algorithms for approximating the supremum (infimum)
reachability probability, within desired precision epsilon > 0, in time
polynomial in the encoding size of the BMDP and in log(1/epsilon). We
furthermore give Ptime algorithms for computing epsilonoptimal strategies for
both maximization and minimization of reachability probabilities. We also give
Ptime algorithms for all associated qualitative analysis problems, namely:
deciding whether the optimal (supremum or infimum) reachability probabilities
are 0 or 1. Prior to this paper, approximation of optimal reachability
probabilities for BMDPs was not even known to be decidable.
Our algorithms exploit the following basic fact: we show that for any BMDP,
its maximum (minimum) nonreachability probabilities are given by the greatest
fixed point (GFP) solution g* in [0,1]^n of a corresponding monotone max (min)
Probabilistic Polynomial System of equations (max/minPPS), x=P(x), which are
the Bellman optimality equations for a BMDP with nonreachability objectives.
We show how to compute the GFP of max/min PPSs to desired precision in Ptime.
We also study more general Branching Simple Stochastic Games (BSSGs) with
(non)reachability objectives. We show that: (1) the value of these games is
captured by the GFP of a corresponding maxminPPS; (2) the quantitative problem
of approximating the value is in TFNP; and (3) the qualitative problems
associated with the value are all solvable in Ptime.

We study the structure and learnability of sums of independent integer random
variables (SIIRVs). For $k \in \mathbb{Z}_{+}$, a $k$SIIRV of order $n \in
\mathbb{Z}_{+}$ is the probability distribution of the sum of $n$ independent
random variables each supported on $\{0, 1, \dots, k1\}$. We denote by ${\cal
S}_{n,k}$ the set of all $k$SIIRVs of order $n$.
In this paper, we tightly characterize the sample and computational
complexity of learning $k$SIIRVs. More precisely, we design a computationally
efficient algorithm that uses $\widetilde{O}(k/\epsilon^2)$ samples, and learns
an arbitrary $k$SIIRV within error $\epsilon,$ in total variation distance.
Moreover, we show that the {\em optimal} sample complexity of this learning
problem is $\Theta((k/\epsilon^2)\sqrt{\log(1/\epsilon)}).$ Our algorithm
proceeds by learning the Fourier transform of the target $k$SIIRV in its
effective support. Its correctness relies on the {\em approximate sparsity} of
the Fourier transform of $k$SIIRVs  a structural property that we establish,
roughly stating that the Fourier transform of $k$SIIRVs has small magnitude
outside a small set.
Along the way we prove several new structural results about $k$SIIRVs. As
one of our main structural contributions, we give an efficient algorithm to
construct a sparse {\em proper} $\epsilon$cover for ${\cal S}_{n,k},$ in total
variation distance. We also obtain a novel geometric characterization of the
space of $k$SIIRVs. Our characterization allows us to prove a tight lower
bound on the size of $\epsilon$covers for ${\cal S}_{n,k}$, and is the key
ingredient in our tight sample complexity lower bound.
Our approach of exploiting the sparsity of the Fourier transform in
distribution learning is general, and has recently found additional
applications.

We give an algorithm for properly learning Poisson binomial distributions. A
Poisson binomial distribution (PBD) of order $n$ is the discrete probability
distribution of the sum of $n$ mutually independent Bernoulli random variables.
Given $\widetilde{O}(1/\epsilon^2)$ samples from an unknown PBD $\mathbf{p}$,
our algorithm runs in time $(1/\epsilon)^{O(\log \log (1/\epsilon))}$, and
outputs a hypothesis PBD that is $\epsilon$close to $\mathbf{p}$ in total
variation distance. The previously best known running time for properly
learning PBDs was $(1/\epsilon)^{O(\log(1/\epsilon))}$.
As one of our main contributions, we provide a novel structural
characterization of PBDs. We prove that, for all $\epsilon >0,$ there exists an
explicit collection $\cal{M}$ of $(1/\epsilon)^{O(\log \log (1/\epsilon))}$
vectors of multiplicities, such that for any PBD $\mathbf{p}$ there exists a
PBD $\mathbf{q}$ with $O(\log(1/\epsilon))$ distinct parameters whose
multiplicities are given by some element of ${\cal M}$, such that $\mathbf{q}$
is $\epsilon$close to $\mathbf{p}$. Our proof combines tools from Fourier
analysis and algebraic geometry.
Our approach to the proper learning problem is as follows: Starting with an
accurate nonproper hypothesis, we fit a PBD to this hypothesis. More
specifically, we essentially start with the hypothesis computed by the
computationally efficient nonproper learning algorithm in our recent
work~\cite{DKS15}. Our aforementioned structural characterization allows us to
reduce the corresponding fitting problem to a collection of
$(1/\epsilon)^{O(\log \log(1/\epsilon))}$ systems of lowdegree polynomial
inequalities. We show that each such system can be solved in time
$(1/\epsilon)^{O(\log \log(1/\epsilon))}$, which yields the overall running
time of our algorithm.

The following two decision problems capture the complexity of comparing
integers or rationals that are succinctly represented in
productofexponentials notation, or equivalently, via arithmetic circuits
using only multiplication and division gates, and integer inputs:
Input instance: four lists of positive integers: a_1, ...., a_n ; b_1,....,
b_n ; c_1,....,c_m ; d_1, ...., d_m ; where each of the integers is represented
in binary.
Problem 1 (equality testing): Decide whether a_1^{b_1} a_2^{b_2} ....
a_n^{b_n} = c_1^{d_1} c_2^{d_2} .... c_m^{d_m} .
Problem 2 (inequality testing): Decide whether a_1^{b_1} a_2^{b_2} ...
a_n^{b_n} >= c_1^{d_1} c_2^{d_2} .... c_m^{d_m} .
Problem 1 is easily decidable in polynomial time using a simple iterative
algorithm. Problem 2 is much harder. We observe that the complexity of Problem
2 is intimately connected to deep conjectures and results in number theory. In
particular, if a refined form of the ABC conjecture formulated by Baker in 1998
holds, or if the older LangWaldschmidt conjecture (formulated in 1978) on
linear forms in logarithms holds, then Problem 2 is decidable in Ptime (in the
standard Turing model of computation). Moreover, it follows from the best
available quantitative bounds on linear forms in logarithms, e.g., by Baker and
W\"{u}stholz (1993) or Matveev (2000), that if m and n are fixed universal
constants then Problem 2 is decidable in Ptime (without relying on any
conjectures). This latter fact was observed earlier by Shub (1993).
We describe one application: Ptime maximum probability parsing for arbitrary
stochastic contextfree grammars (where \epsilonrules are allowed).

A central computational problem for analyzing and model checking various
classes of infinitestate recursive probabilistic systems (including
quasibirthdeath processes, multitype branching processes, stochastic
contextfree grammars, probabilistic pushdown automata and recursive Markov
chains) is the computation of {\em termination probabilities}, and computing
these probabilities in turn boils down to computing the {\em least fixed point}
(LFP) solution of a corresponding {\em monotone polynomial system} (MPS) of
equations, denoted x=P(x).
It was shown by Etessami & Yannakakis that a decomposed variant of Newton's
method converges monotonically to the LFP solution for any MPS that has a
nonnegative solution. Subsequently, Esparza, Kiefer, & Luttenberger obtained
upper bounds on the convergence rate of Newton's method for certain classes of
MPSs. More recently, better upper bounds have been obtained for special classes
of MPSs. However, prior to this paper, for arbitrary (not necessarily
stronglyconnected) MPSs, no upper bounds at all were known on the convergence
rate of Newton's method as a function of the encoding size P of the input
MPS, x=P(x).
In this paper we provide worstcase upper bounds, as a function of both the
input encoding size P, and epsilon > 0, on the number of iterations required
for decomposed Newton's method (even with rounding) to converge within additive
error epsilon > 0 of q^*, for any MPS with LFP solution q^*. Our upper bounds
are essentially optimal in terms of several important parameters.
Using our upper bounds, and building on prior work, we obtain the first
Ptime algorithm (in the standard Turing model of computation) for quantitative
model checking, to within desired precision, of discretetime QBDs and
(equivalently) probabilistic 1counter automata, with respect to any (fixed)
omegaregular or LTL property.

We study the problem of computing the probability that a given stochastic
contextfree grammar (SCFG), G, generates a string in a given regular language
L(D) (given by a DFA, D). This basic problem has a number of applications in
statistical natural language processing, and it is also a key necessary step
towards quantitative \omegaregular model checking of stochastic contextfree
processes (equivalently, 1exit recursive Markov chains, or stateless
probabilistic pushdown processes).
We show that the probability that G generates a string in L(D) can be
computed to within arbitrary desired precision in polynomial time (in the
standard Turing model of computation), under a rather mild assumption about the
SCFG, G, and with no extra assumption about D. We show that this assumption is
satisfied for SCFG's whose rule probabilities are learned via the wellknown
insideoutside (EM) algorithm for maximumlikelihood estimation (a standard
method for constructing SCFGs in statistical NLP and biological sequence
analysis). Thus, for these SCFGs the algorithm always runs in Ptime.

We show that one can approximate the least fixed point solution for a
multivariate system of monotone probabilistic polynomial equations in time
polynomial in both the encoding size of the system of equations and in
log(1/\epsilon), where \epsilon > 0 is the desired additive error bound of the
solution. (The model of computation is the standard Turing machine model.)
We use this result to resolve several open problems regarding the
computational complexity of computing key quantities associated with some
classic and heavily studied stochastic processes, including multitype
branching processes and stochastic contextfree grammars.

We show that one can approximate the least fixed point solution for a
multivariate system of monotone probabilistic max(min) polynomial equations,
referred to as maxPPSs (and minPPSs, respectively), in time polynomial in both
the encoding size of the system of equations and in log(1/epsilon), where
epsilon > 0 is the desired additive error bound of the solution. (The model of
computation is the standard Turing machine model.) We establish this result
using a generalization of Newton's method which applies to maxPPSs and minPPSs,
even though the underlying functions are only piecewisedifferentiable. This
generalizes our recent work which provided a Ptime algorithm for purely
probabilistic PPSs.
These equations form the Bellman optimality equations for several important
classes of infinitestate Markov Decision Processes (MDPs). Thus, as a
corollary, we obtain the first polynomial time algorithms for computing to
within arbitrary desired precision the optimal value vector for several classes
of infinitestate MDPs which arise as extensions of classic, and heavily
studied, purely stochastic processes. These include both the problem of
maximizing and mininizing the termination (extinction) probability of
multitype branching MDPs, stochastic contextfree MDPs, and 1exit Recursive
MDPs.
Furthermore, we also show that we can compute in Ptime an epsilonoptimal
policy for both maximizing and minimizing branching, contextfree, and
1exitRecursive MDPs, for any given desired epsilon > 0. This is despite the
fact that actually computing optimal strategies is SqrtSumhard and
PosSLPhard in this setting.
We also derive, as an easy consequence of these results, an FNP upper bound
on the complexity of computing the value (within arbitrary desired precision)
of branching simple stochastic games (BSSGs).