
COnstraint-Based Reconstruction and Analysis (COBRA) provides a molecular
mechanistic framework for integrative analysis of experimental data and
quantitative prediction of physicochemically and biochemically feasible
phenotypic states. The COBRA Toolbox is a comprehensive software suite of
interoperable COBRA methods. It has found widespread applications in biology,
biomedicine, and biotechnology because its functions can be flexibly combined
to implement tailored COBRA protocols for any biochemical network. Version 3.0
includes new methods for quality-controlled reconstruction, modelling,
topological analysis, strain and experimental design, network visualisation as
well as network integration of chemoinformatic, metabolomic, transcriptomic,
proteomic, and thermochemical data. New multilingual code integration also
enables an expansion in COBRA application scope via high-precision,
high-performance, and nonlinear numerical optimisation solvers for multiscale,
multicellular and reaction kinetic modelling, respectively. This protocol can
be adapted for the generation and analysis of a constraint-based model in a
wide variety of molecular systems biology scenarios. This protocol is an update
to the COBRA Toolbox 1.0 and 2.0. The COBRA Toolbox 3.0 provides an
unparalleled depth of constraint-based reconstruction and analysis methods.

We propose a fast proximal Newton-type algorithm for minimizing regularized
finite sums that returns an $\epsilon$-suboptimal point in
$\tilde{\mathcal{O}}(d(n + \sqrt{\kappa d})\log(\frac{1}{\epsilon}))$ FLOPS,
where $n$ is the number of samples, $d$ is the feature dimension, and $\kappa$
is the condition number. As long as $n > d$, the proposed method is more
efficient than state-of-the-art accelerated stochastic first-order methods for
nonsmooth regularizers, which require $\tilde{\mathcal{O}}(d(n + \sqrt{\kappa
n})\log(\frac{1}{\epsilon}))$ FLOPS. The key idea is to form the subsampled
Newton subproblem in a way that preserves the finite sum structure of the
objective, thereby allowing us to leverage recent developments in stochastic
first-order methods to solve the subproblem. Experimental results verify that
the proposed algorithm outperforms previous algorithms for $\ell_1$-regularized
logistic regression on real datasets.
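
The role of the condition $n > d$ can be checked directly from the two bounds. They share the factor $d\log(\frac{1}{\epsilon})$ and the additive $n$ term, so only the square-root terms differ. A short worked comparison, assuming $\kappa > 0$ and ignoring the logarithmic factors hidden by $\tilde{\mathcal{O}}$:

$$
\frac{d(n + \sqrt{\kappa d})}{d(n + \sqrt{\kappa n})}
= \frac{n + \sqrt{\kappa d}}{n + \sqrt{\kappa n}} < 1
\iff \sqrt{\kappa d} < \sqrt{\kappa n}
\iff d < n .
$$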

We identify conditional parity as a general notion of nondiscrimination in
machine learning. In fact, several recently proposed notions of
nondiscrimination, including a few counterfactual notions, are instances of
conditional parity. We show that conditional parity is amenable to statistical
analysis by studying randomization as a general mechanism for achieving
conditional parity and a kernel-based test of conditional parity.
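
As an illustration only, the following sketch checks an empirical version of conditional parity for binary predictions: within each stratum of a conditioning variable, it compares the positive-prediction rate across protected groups. The function name `conditional_parity_gaps` and the use of a simple rate gap are assumptions made for this example; this is not the kernel-based test mentioned above.

```python
from collections import defaultdict


def conditional_parity_gaps(preds, groups, strata):
    """Return, per stratum, the largest gap in mean prediction across groups.

    preds  : iterable of 0/1 predictions
    groups : iterable of protected-attribute values (hypothetical labels)
    strata : iterable of values of the conditioning variable
    A gap of 0 in every stratum means the binary predictions satisfy
    conditional parity exactly on this sample.
    """
    cells = defaultdict(lambda: [0.0, 0])  # (stratum, group) -> [sum, count]
    for p, a, z in zip(preds, groups, strata):
        cell = cells[(z, a)]
        cell[0] += p
        cell[1] += 1

    rates = defaultdict(dict)  # stratum -> {group: positive rate}
    for (z, a), (total, count) in cells.items():
        rates[z][a] = total / count

    return {z: max(r.values()) - min(r.values()) for z, r in rates.items()}
```

For binary predictions, equal means within a stratum are the same as equal conditional distributions; for real-valued scores a distributional comparison would be needed instead.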

We develop a general approach to valid inference after model selection. At
the core of our framework is a result that characterizes the distribution of a
post-selection estimator conditioned on the selection event. We specialize the
approach to model selection by the lasso to form valid confidence intervals for
the selected coefficients and test whether all relevant variables have been
included in the model.
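
To make the idea of conditioning on the selection event concrete, here is a minimal sketch under assumptions that are not stated in the text above: the response is Gaussian with covariance $\sigma^2 I$, and the selection event can be written as a polyhedron $\{Ay \le b\}$. Under those assumptions a linear contrast $\eta^T y$, conditioned on the event, follows a truncated normal law, and its CDF value can serve as a pivot. All function names are hypothetical.

```python
import math

import numpy as np


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def truncation_interval(A, b, eta, y):
    """Interval for eta @ y implied by {A y <= b}, assuming Cov(y) = sigma^2 I.

    Rows with (A c)_j == 0 do not constrain eta @ y and are skipped here.
    """
    c = eta / (eta @ eta)
    z = y - c * (eta @ y)  # part of y that does not depend on eta @ y
    lo, hi = -math.inf, math.inf
    for ac, az, bj in zip(A @ c, A @ z, b):
        if ac < 0:
            lo = max(lo, (bj - az) / ac)
        elif ac > 0:
            hi = min(hi, (bj - az) / ac)
    return lo, hi


def truncated_normal_pivot(A, b, eta, y, mu, sigma):
    """CDF of the truncated normal at eta @ y; uniform on (0, 1) when
    eta @ E[y] equals mu, under the assumptions stated in the lead-in."""
    lo, hi = truncation_interval(A, b, eta, y)
    sd = sigma * math.sqrt(eta @ eta)
    lower = normal_cdf((lo - mu) / sd)
    upper = normal_cdf((hi - mu) / sd)
    return (normal_cdf((eta @ y - mu) / sd) - lower) / (upper - lower)
```

This plain CDF ratio loses precision far in the tails, so a careful implementation would need a more stable evaluation.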

Archetypal analysis and nonnegative matrix factorization (NMF) are staples
in a statistician's toolbox for dimension reduction and exploratory data
analysis. We describe a geometric approach to both NMF and archetypal analysis
by interpreting both problems as finding extreme points of the data cloud. We
also develop and analyze an efficient approach to finding extreme points in
high dimensions. For modern massive datasets that are too large to fit on a
single machine and must be stored in a distributed setting, our approach makes
only a small number of passes over the data. In fact, it is possible to obtain
the NMF or perform archetypal analysis with just two passes over the data.
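
One simple way to see why extreme points are cheap to find is that a linear functional over a finite point set always attains its maximum at an extreme point of the convex hull. The sketch below uses random directions for this; it is a generic heuristic written for illustration, not necessarily the algorithm analyzed above, and the name `extreme_point_candidates` is an assumption.

```python
import numpy as np


def extreme_point_candidates(points, n_directions=50, seed=0):
    """Return indices of points that maximize or minimize a random linear
    functional; each such point is an extreme point of the data cloud
    (barring exact ties).

    points : array of shape (n_points, dim)
    """
    rng = np.random.default_rng(seed)
    directions = rng.standard_normal((n_directions, points.shape[1]))
    scores = points @ directions.T  # shape (n_points, n_directions)
    hits = np.concatenate([scores.argmax(axis=0), scores.argmin(axis=0)])
    return np.unique(hits)
```

Because the projections are a single matrix product, the scores can be accumulated block by block, which is compatible with data stored across several machines.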

We devise a one-shot approach to distributed sparse regression in the
high-dimensional setting. The key idea is to average "debiased" or
"desparsified" lasso estimators. We show the approach converges at the same
rate as the lasso as long as the dataset is not split across too many machines.
We also extend the approach to generalized linear models.
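
In symbols, a hedged sketch of the averaging step, using notation that is assumed here rather than taken from the text above: machine $k$ of $m$ holds $(X_k, y_k)$ with $n_k$ samples, $\hat{\beta}_k$ is its local lasso estimate, and $\hat{\Theta}_k$ is some approximate inverse of the local sample covariance $\frac{1}{n_k} X_k^T X_k$:

$$
\hat{\beta}_k^{d} = \hat{\beta}_k + \frac{1}{n_k}\hat{\Theta}_k X_k^T (y_k - X_k \hat{\beta}_k),
\qquad
\bar{\beta} = \frac{1}{m}\sum_{k=1}^{m} \hat{\beta}_k^{d} .
$$

The correction term approximately removes the shrinkage bias of each local lasso fit, which is why averaging the debiased estimates can help where averaging the raw lasso estimates would retain the bias.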

Regularized M-estimators are used in diverse areas of science and engineering
to fit high-dimensional models with some low-dimensional structure. Usually the
low-dimensional structure is encoded by the presence of the (unknown)
parameters in some low-dimensional model subspace. In such settings, it is
desirable for estimates of the model parameters to be \emph{model selection
consistent}: the estimates also fall in the model subspace. We develop a
general framework for establishing consistency and model selection consistency
of regularized M-estimators and show how it applies to some special cases of
interest in statistical learning. Our analysis identifies two key properties of
regularized M-estimators, referred to as geometric decomposability and
irrepresentability, that ensure the estimators are consistent and model
selection consistent.

We consider a discriminative learning (regression) problem, whereby the
regression function is a convex combination of k linear classifiers. Existing
approaches are based on the EM algorithm, or similar techniques, without
provable guarantees. We develop a simple method based on spectral techniques
and a `mirroring' trick that discovers the subspace spanned by the
classifiers' parameter vectors. Under a probabilistic assumption on the feature
vector distribution, we prove that this approach has nearly optimal statistical
efficiency.

We generalize Newton-type methods for minimizing smooth functions to handle a
sum of two convex functions: a smooth function and a nonsmooth function with a
simple proximal mapping. We show that the resulting proximal Newton-type
methods inherit the desirable convergence behavior of Newton-type methods for
minimizing smooth functions, even when search directions are computed
inexactly. Many popular methods tailored to problems arising in bioinformatics,
signal processing, and statistical learning are special cases of proximal
Newton-type methods, and our analysis yields new convergence results for some
of these methods.
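
A minimal sketch of one proximal Newton-type method, written for illustration under several assumptions: the smooth part is the average logistic loss with labels in {-1, +1}, the nonsmooth part is an $\ell_1$ penalty (whose proximal mapping is soft-thresholding), the subproblem is solved inexactly by a fixed number of proximal gradient steps, and step lengths come from a backtracking line search. Function names and default parameters are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal mapping of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def objective(w, X, y, lam):
    """Average logistic loss plus l1 penalty; labels y are in {-1, +1}."""
    return np.mean(np.logaddexp(0.0, -y * (X @ w))) + lam * np.sum(np.abs(w))


def prox_newton_l1_logistic(X, y, lam, outer_iters=20, inner_iters=50):
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(outer_iters):
        z = y * (X @ w)
        p = np.exp(-np.logaddexp(0.0, z))  # sigmoid(-z), computed stably
        grad = -(X.T @ (y * p)) / n
        H = (X.T * (p * (1.0 - p))) @ X / n + 1e-8 * np.eye(d)
        L = np.linalg.eigvalsh(H)[-1]  # largest eigenvalue of the Hessian

        # Inexact solve of the quadratic-plus-l1 subproblem by proximal
        # gradient steps, started from the current iterate.
        u = w.copy()
        for _ in range(inner_iters):
            model_grad = grad + H @ (u - w)
            u = soft_threshold(u - model_grad / L, lam / L)
        direction = u - w

        # Backtracking line search with a sufficient-decrease condition.
        f_old = objective(w, X, y, lam)
        decrease = grad @ direction + lam * (np.sum(np.abs(u)) - np.sum(np.abs(w)))
        t = 1.0
        while (objective(w + t * direction, X, y, lam) > f_old + 0.25 * t * decrease
               and t > 1e-10):
            t *= 0.5
        w = w + t * direction
    return w
```

Forming the exact Hessian is only reasonable for small $d$; quasi-Newton or subsampled approximations would replace `H` in larger problems.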

Two-step estimators are often called upon to fit censored regression models in
many areas of science and engineering. Since censoring incurs a bias in the
naive least-squares fit, a two-step estimator first estimates the bias and then
fits a corrected linear model. We develop a framework for performing valid
/post-correction inference/ with two-step estimators. By exploiting recent
results on post-selection inference, we obtain valid confidence intervals and
significance tests for the fitted coefficients.