
This paper proposes new nonparametric diagnostic tools to assess the
asymptotic validity of different treatment effects estimators that rely on the
correct specification of the propensity score. We derive a particular
restriction relating the propensity score distribution of treated and control
groups, and develop specification tests based upon it. The resulting tests do
not suffer from the "curse of dimensionality" when the vector of covariates is
highdimensional, are fully datadriven, do not require tuning parameters such
as bandwidths, and are able to detect a broad class of local alternatives
converging to the null at the parametric rate $n^{1/2}$, with $n$ the sample
size. We show that the use of an orthogonal projection on the tangent space of
nuisance parameters facilitates the simulation of critical values by means of a
multiplier bootstrap procedure, and can lead to power gains. The finite sample
performance of the tests is examined by means of a Monte Carlo experiment and
an empirical application. Opensource software is available for implementing
the proposed tests.

Given a set of baseline assumptions, a breakdown frontier is the boundary
between the set of assumptions which lead to a specific conclusion and those
which do not. In a potential outcomes model with a binary treatment, we
consider two conclusions: First, that ATE is at least a specific value (e.g.,
nonnegative) and second that the proportion of units who benefit from treatment
is at least a specific value (e.g., at least 50\%). For these conclusions, we
derive the breakdown frontier for two kinds of assumptions: one which indexes
relaxations of the baseline random assignment of treatment assumption, and one
which indexes relaxations of the baseline rank invariance assumption. These
classes of assumptions nest both the point identifying assumptions of random
assignment and rank invariance and the opposite end of no constraints on
treatment selection or the dependence structure between potential outcomes.
This frontier provides a quantitative measure of robustness of conclusions to
relaxations of the baseline point identifying assumptions. We derive
$\sqrt{N}$consistent sample analog estimators for these frontiers. We then
provide two asymptotically valid bootstrap procedures for constructing lower
uniform confidence bands for the breakdown frontier. As a measure of
robustness, estimated breakdown frontiers and their corresponding confidence
bands can be presented alongside traditional point estimates and confidence
intervals obtained under point identifying assumptions. We illustrate this
approach in an empirical application to the effect of child soldiering on
wages. We find that sufficiently weak conclusions are robust to simultaneous
failures of rank invariance and random assignment, while some stronger
conclusions are fairly robust to failures of rank invariance but not
necessarily to relaxations of random assignment.

We propose a method of estimating the linearinmeans model of peer effects
in which the peer group, defined by a social network, is endogenous in the
outcome equation for peer effects. Endogeneity is due to unobservable
individual characteristics that influence both link formation in the network
and the outcome of interest. We propose two estimators of the peer effect
equation that control for the endogeneity of the social connections using a
control function approach. We leave the functional form of the control function
unspecified and treat it as unknown. To estimate the model, we use a sieve
semiparametric approach, and we establish asymptotics of the semiparametric
estimator.

We propose a bootstrapbased calibrated projection procedure to build
confidence intervals for single components and for smooth functions of a
partially identified parameter vector in moment (in)equality models. The method
controls asymptotic coverage uniformly over a large class of data generating
processes. The extreme points of the calibrated projection confidence interval
are obtained by extremizing the value of the function of interest subject to a
proper relaxation of studentized sample analogs of the moment (in)equality
conditions. The degree of relaxation, or critical level, is calibrated so that
the function of theta, not theta itself, is uniformly asymptotically covered
with prespecified probability. This calibration is based on repeatedly checking
feasibility of linear programming problems, rendering it computationally
attractive.
Nonetheless, the program defining an extreme point of the confidence interval
is generally nonlinear and potentially intricate. We provide an algorithm,
based on the response surface method for global optimization, that approximates
the solution rapidly and accurately, and we establish its rate of convergence.
The algorithm is of independent interest for optimization problems with simple
objectives and complicated constraints. An empirical application estimating an
entry game illustrates the usefulness of the method. Monte Carlo simulations
confirm the accuracy of the solution algorithm, the good statistical as well as
computational performance of calibrated projection (including in comparison to
other methods), and the algorithm's potential to greatly accelerate computation
of other confidence intervals.

To quantify uncertainty around point estimates of conditional objects such as
conditional means or variances, parameter uncertainty has to be taken into
account. Attempts to incorporate parameter uncertainty are typically based on
the unrealistic assumption of observing two independent processes, where one is
used for parameter estimation, and the other for conditioning upon. Such
unrealistic foundation raises the question whether these intervals are
theoretically justified in a realistic setting. This paper presents an
asymptotic justification for this type of intervals that does not require such
an unrealistic assumption, but relies on a samplesplit approach instead. By
showing that our samplesplit intervals coincide asymptotically with the
standard intervals, we provide a novel, and realistic, justification for
confidence intervals of conditional objects. The analysis is carried out for a
rich class of time series models.

We derive fixed effects estimators of parameters and average partial effects
in (possibly dynamic) nonlinear panel data models with individual and time
effects. They cover logit, probit, ordered probit, Poisson and Tobit models
that are important for many empirical applications in micro and macroeconomics.
Our estimators use analytical and jackknife bias corrections to deal with the
incidental parameter problem, and are asymptotically unbiased under asymptotic
sequences where $N/T$ converges to a constant. We develop inference methods and
show that they perform well in numerical examples.

In many areas, practitioners seek to use observational data to learn a
treatment assignment policy that satisfies applicationspecific constraints,
such as budget, fairness, simplicity, or other functional form constraints. For
example, policies may be restricted to take the form of decision trees based on
a limited set of easily observable individual characteristics. We propose a new
approach to this problem motivated by the theory of semiparametrically
efficient estimation. Our method can be used to optimize either binary
treatments or infinitesimal nudges to continuous treatments, and can leverage
observational data where causal effects are identified using a variety of
strategies, including selection on observables and instrumental variables.
Given a doubly robust estimator of the causal effect of assigning everyone to
treatment, we develop an algorithm for choosing whom to treat, and establish
strong guarantees for the asymptotic utilitarian regret of the resulting
policy.

Structural econometric methods are often criticized for being sensitive to
functional form assumptions. We study parametric estimators of the local
average treatment effect (LATE) derived from a widely used class of latent
threshold crossing models and show they yield LATE estimates algebraically
equivalent to the instrumental variables (IV) estimator. Our leading example is
Heckman's (1979) twostep ("Heckit") control function estimator which, with
twosided noncompliance, can be used to compute estimates of a variety of
causal parameters. Equivalence with IV is established for a semiparametric
family of control function estimators and shown to hold at interior solutions
for a class of maximum likelihood estimators. Our results suggest differences
between structural and IV estimates often stem from disagreements about the
target parameter rather than from functional form assumptions per se. In cases
where equivalence fails, reporting structural estimates of LATE alongside IV
provides a simple means of assessing the credibility of structural
extrapolation exercises.

Oversubscribed treatments are often allocated using randomized waiting lists.
Applicants are ranked randomly, and treatment offers are made following that
ranking until all seats are filled. To estimate causal effects, researchers
often compare applicants getting and not getting an offer. We show that those
two groups are not statistically comparable. Therefore, the estimator arising
from that comparison is inconsistent. We propose a new estimator, and show that
it is consistent. Finally, we revisit an application, and we show that using
our estimator can lead to sizably different results from those obtained using
the commonly used estimator.

This paper considers the problem of testing many moment inequalities where
the number of moment inequalities, denoted by $p$, is possibly much larger than
the sample size $n$. There is a variety of economic applications where solving
this problem allows to carry out inference on causal and structural parameters,
a notable example is the market structure model of Ciliberto and Tamer (2009)
where $p=2^{m+1}$ with $m$ being the number of firms that could possibly enter
the market. We consider the test statistic given by the maximum of $p$
Studentized (or $t$type) inequalityspecific statistics, and analyze various
ways to compute critical values for the test statistic. Specifically, we
consider critical values based upon (i) the union bound combined with a
moderate deviation inequality for selfnormalized sums, (ii) the multiplier and
empirical bootstraps, and (iii) twostep and threestep variants of (i) and
(ii) by incorporating the selection of uninformative inequalities that are far
from being binding and a novel selection of weakly informative inequalities
that are potentially binding but do not provide first order information. We
prove validity of these methods, showing that under mild conditions, they lead
to tests with the error in size decreasing polynomially in $n$ while allowing
for $p$ being much larger than $n$, indeed $p$ can be of order $\exp (n^{c})$
for some $c > 0$. Importantly, all these results hold without any restriction
on the correlation structure between $p$ Studentized statistics, and also hold
uniformly with respect to suitably large classes of underlying distributions.
Moreover, in the online supplement, we show validity of a test based on the
block multiplier bootstrap in the case of dependent data under some general
mixing conditions.

This paper presents a new estimator of the intercept of a linear regression
model in cases where the outcome varaible is observed subject to a selection
rule. The intercept is often in this context of inherent interest; for example,
in a program evaluation context, the difference between the intercepts in
outcome equations for participants and nonparticipants can be interpreted as
the difference in average outcomes of participants and their counterfactual
average outcomes if they had chosen not to participate. The new estimator can
under mild conditions exhibit a rate of convergence in probability equal to
$n^{p/(2p+1)}$, where $p\ge 2$ is an integer that indexes the strength of
certain smoothness assumptions. This rate of convergence is shown in this
context to be the optimal rate of convergence for estimation of the intercept
parameter in terms of a minimax criterion. The new estimator, unlike other
proposals in the literature, is under mild conditions consistent and
asymptotically normal with a rate of convergence that is the same regardless of
the degree to which selection depends on unobservables in the outcome equation.
Simulation evidence and an empirical example are included.

We propose dual regression as an alternative to the quantile regression
process for the global estimation of conditional distribution functions under
minimal assumptions. Dual regression provides all the interpretational power of
the quantile regression process while avoiding the need for repairing the
intersecting conditional quantile surfaces that quantile regression often
produces in practice. Our approach introduces a mathematical programming
characterization of conditional distribution functions which, in its simplest
form, is the dual program of a simultaneous estimator for linear locationscale
models. We apply our general characterization to the specification and
estimation of a flexible class of conditional distribution functions, and
present asymptotic theory for the corresponding empirical dual regression
process.

This paper develops and implements a nonparametric test of Random Utility
Models. The motivating application is to test the null hypothesis that a sample
of crosssectional demand distributions was generated by a population of
rational consumers. We test a necessary and sufficient condition for this that
does not rely on any restriction on unobserved heterogeneity or the number of
goods. We also propose and implement a control function approach to account for
endogenous expenditure. An econometric result of independent interest is a test
for linear inequality constraints when these are represented as the vertices of
a polyhedron rather than its faces. An empirical application to the U.K.
Household Expenditure Survey illustrates computational feasibility of the
method in demand problems with 5 goods.

The ongoing net neutrality debate has generated a lot of heated discussions
on whether or not monetary interactions should be regulated between content and
access providers. Among the several topics discussed, `differential pricing'
has recently received attention due to `zerorating' platforms proposed by some
service providers. In the differential pricing scheme, Internet Service
Providers (ISPs) can exempt data access charges for on content from certain CPs
(zerorated) while no exemption is on content from other CPs. This allows the
possibility for Content Providers (CPs) to make `sponsorship' agreements to
zerorate their content and attract more user traffic. In this paper, we study
the effect of differential pricing on various players in the Internet. We first
consider a model with a monopolistic ISP and multiple CPs where users select
CPs based on the quality of service (QoS) and data access charges. We show that
in a differential pricing regime 1) a CP offering low QoS can make have higher
surplus than a CP offering better QoS through sponsorships. 2) Overall QoS
(mean delay) for end users can degrade under differential pricing schemes. In
the oligopolistic market with multiple ISPs, users tend to select the ISP with
lowest ISP resulting in same type of conclusions as in the monopolistic market.
We then study how differential pricing effects the revenue of ISPs.

We study factor models augmented by observed covariates that have explanatory
powers on the unknown factors. In financial factor models, the unknown factors
can be reasonably well explained by a few observable proxies, such as the
FamaFrench factors. In diffusion index forecasts, identified factors are
strongly related to several directly measurable economic variables such as
consumptionwealth variable, financial ratios, and term spread. With those
covariates, both the factors and loadings are identifiable up to a rotation
matrix even only with a finite dimension. To incorporate the explanatory power
of these covariates, we propose a smoothed principal component analysis (PCA):
(i) regress the data onto the observed covariates, and (ii) take the principal
components of the fitted data to estimate the loadings and factors. This allows
us to accurately estimate the percentage of both explained and unexplained
components in factors and thus to assess the explanatory power of covariates.
We show that both the estimated factors and loadings can be estimated with
improved rates of convergence compared to the benchmark method. The degree of
improvement depends on the strength of the signals, representing the
explanatory power of the covariates on the factors. The proposed estimator is
robust to possibly heavytailed distributions. We apply the model to forecast
US bond risk premia, and find that the observed macroeconomic characteristics
contain strong explanatory powers of the factors. The gain of forecast is more
substantial when the characteristics are incorporated to estimate the common
factors than directly used for forecasts.

With the advent of Big Data, nowadays in many applications databases
containing large quantities of similar time series are available. Forecasting
time series in these domains with traditional univariate forecasting procedures
leaves great potentials for producing accurate forecasts untapped. Recurrent
neural networks (RNNs), and in particular Long ShortTerm Memory (LSTM)
networks, have proven recently that they are able to outperform
stateoftheart univariate time series forecasting methods in this context
when trained across all available time series. However, if the time series
database is heterogeneous, accuracy may degenerate, so that on the way towards
fully automatic forecasting methods in this space, a notion of similarity
between the time series needs to be built into the methods. To this end, we
present a prediction model that can be used with different types of RNN models
on subgroups of similar time series, which are identified by time series
clustering techniques. We assess our proposed methodology using LSTM networks,
a widely popular RNN variant. Our method achieves competitive results on
benchmarking datasets under competition evaluation procedures. In particular,
in terms of mean sMAPE accuracy, it consistently outperforms the baseline LSTM
model and outperforms all other methods on the CIF2016 forecasting competition
dataset.

Quantile and quantile effect functions are important tools for descriptive
and causal analyses due to their natural and intuitive interpretation. Existing
inference methods for these functions do not apply to discrete random
variables. This paper offers a simple, practical construction of simultaneous
confidence bands for quantile and quantile effect functions of possibly
discrete random variables. It is based on a natural transformation of
simultaneous confidence bands for distribution functions, which are readily
available for many problems. The construction is generic and does not depend on
the nature of the underlying problem. It works in conjunction with parametric,
semiparametric, and nonparametric modeling methods for observed and
counterfactual distributions, and does not depend on the sampling scheme. We
apply our method to characterize the distributional impact of insurance
coverage on health care utilization and obtain the distributional decomposition
of the racial test score gap. We find that universal insurance coverage
increases the number of doctor visits across the entire distribution, and that
the racial test score gap is small at early ages but grows with age due to
socio economic factors affecting child development especially at the top of the
distribution. These are new, interesting empirical findings that complement
previous analyses that focused on mean effects only. In both applications, the
outcomes of interest are discrete rendering existing inference methods invalid
for obtaining uniform confidence bands for observed and counterfactual quantile
functions and for their difference  the quantile effects functions.

In treatment allocation problems the individuals to be treated often arrive
sequentially. We study a problem in which the policy maker is not only
interested in the expected cumulative welfare but is also concerned about the
uncertainty/risk of the treatment outcomes. At the outset, the total number of
treatment assignments to be made may even be unknown. A sequential treatment
policy which attains the minimax optimal regret is proposed. We also
demonstrate that the expected number of suboptimal treatments only grows slowly
in the number of treatments. Finally, we study a setting where outcomes are
only observed with delay.

Quantile regression (QR) is a principal regression method for analyzing the
impact of covariates on outcomes. The impact is described by the conditional
quantile function and its functionals. In this paper we develop the
nonparametric QRseries framework, covering many regressors as a special case,
for performing inference on the entire conditional quantile function and its
linear functionals. In this framework, we approximate the entire conditional
quantile function by a linear combination of series terms with
quantilespecific coefficients and estimate the functionvalued coefficients
from the data. We develop large sample theory for the QRseries coefficient
process, namely we obtain uniform strong approximations to the QRseries
coefficient process by conditionally pivotal and Gaussian processes. Based on
these strong approximations, or couplings, we develop four resampling methods
(pivotal, gradient bootstrap, Gaussian, and weighted bootstrap) that can be
used for inference on the entire QRseries coefficient function.
We apply these results to obtain estimation and inference methods for linear
functionals of the conditional quantile function, such as the conditional
quantile function itself, its partial derivatives, average partial derivatives,
and conditional average partial derivatives. Specifically, we obtain uniform
rates of convergence and show how to use the four resampling methods mentioned
above for inference on the functionals. All of the above results are for
functionvalued parameters, holding uniformly in both the quantile index and
the covariate value, and covering the pointwise case as a byproduct. We
demonstrate the practical utility of these results with an example, where we
estimate the price elasticity function and test the Slutsky condition of the
individual demand for gasoline, as indexed by the individual unobserved
propensity for gasoline consumption.

This paper proposes a straightforward algorithm to carry out inference in
large timevarying parameter vector autoregressions (TVPVARs) with mixture
innovation components for each coefficient in the system. We significantly
decrease the computational burden by approximating the latent indicators that
drive the timevariation in the coefficients with a latent threshold process
that depends on the absolute size of the shocks. The merits of our approach are
illustrated with two applications. First, we forecast the US term structure of
interest rates and demonstrate forecast gains of the proposed mixture
innovation model relative to other benchmark models. Second, we apply our
approach to US macroeconomic data and find significant evidence for
timevarying effects of a monetary policy tightening.

This paper considers inference on fixed effects in a linear regression model
estimated from network data. An important special case of our setup is the
twoway regression model. This is a workhorse technique in the analysis of
matched data sets, such as employeremployee or studentteacher panel data. We
formalize how the structure of the network affects the accuracy with which the
fixed effects can be estimated. This allows us to derive sufficient conditions
on the network for consistent estimation and asymptoticallyvalid inference to
be possible. Estimation of moments is also considered. We allow for general
networks and our setup covers both the dense and sparse case. We provide
numerical results for the estimation of teacher valueadded models and
regressions with occupational dummies.

We develop a Bayesian vector autoregressive (VAR) model with multivariate
stochastic volatility that is capable of handling vast dimensional information
sets. Three features are introduced to permit reliable estimation of the model.
First, we assume that the reducedform errors in the VAR feature a factor
stochastic volatility structure, allowing for conditional equationbyequation
estimation. Second, we apply recently developed globallocal shrinkage priors
to the VAR coefficients to cure the curse of dimensionality. Third, we utilize
recent innovations to efficiently sample from highdimensional multivariate
Gaussian distributions. This makes simulationbased fully Bayesian inference
feasible when the dimensionality is large but the time series length is
moderate. We demonstrate the merits of our approach in an extensive simulation
study and apply the model to US macroeconomic data to evaluate its forecasting
capabilities.

Factor structures or interactive effects are convenient devices to
incorporate latent variables in panel data models. We consider fixed effect
estimation of nonlinear panel singleindex models with factor structures in the
unobservables, which include logit, probit, ordered probit and Poisson
specifications. We establish that fixed effect estimators of model parameters
and average partial effects have normal distributions when the two dimensions
of the panel grow large, but might suffer of incidental parameter bias. We show
how models with factor structures can also be applied to capture important
features of network data such as reciprocity, degree heterogeneity, homophily
in latent variables and clustering. We illustrate this applicability with an
empirical example to the estimation of a gravity equation of international
trade between countries using a Poisson model with multiple factors.

Identification of the parameters of stable linear dynamical systems is a
wellstudied problem in the literature, both in the low and highdimensional
settings. However, there are hardly any results for the unstable case,
especially regarding finite time bounds. For this setting, classical results on
leastsquares estimation of the dynamics parameters are not applicable and
therefore new concepts and technical approaches need to be developed to address
the issue. Unstable linear systems arise in key real applications in control
theory, econometrics, and finance. This study establishes finite time bounds
for the identification error of the leastsquares estimates for a fairly large
class of heavytailed noise distributions, and transition matrices of such
systems. The results relate the time length (samples) required for estimation
to a function of the problem dimension and key characteristics of the true
underlying transition matrix and the noise distribution. To establish them,
appropriate concentration inequalities for random matrices and for sequences of
martingale differences are leveraged.

We give a general construction of debiased/locally robust/orthogonal (LR)
moment functions for GMM, where the derivative with respect to first step
nonparametric estimation is zero and equivalently first step estimation has no
effect on the influence function. This construction consists of adding an
estimator of the influence function adjustment term for first step
nonparametric estimation to identifying or original moment conditions. We also
give numerical methods for estimating LR moment functions that do not require
an explicit formula for the adjustment term.
LR moment conditions have reduced bias and so are important when the first
step is machine learning. We derive LR moment conditions for dynamic discrete
choice based on first step machine learning estimators of conditional choice
probabilities. We provide simple and general asymptotic theory for LR
estimators based on sample splitting. This theory uses the additive
decomposition of LR moment conditions into an identifying condition and a first
step influence adjustment. Our conditions require only mean square consistency
and a few (generally either one or two) readily interpretable rate conditions.
LR moment functions have the advantage of being less sensitive to first step
estimation. Some LR moment functions are also doubly robust meaning they hold
if one first step is incorrect. We give novel classes of doubly robust moment
functions and characterize double robustness. For doubly robust estimators our
asymptotic theory only requires one rate condition.