• This paper proposes new nonparametric diagnostic tools to assess the asymptotic validity of different treatment effects estimators that rely on the correct specification of the propensity score. We derive a particular restriction relating the propensity score distribution of treated and control groups, and develop specification tests based upon it. The resulting tests do not suffer from the "curse of dimensionality" when the vector of covariates is high-dimensional, are fully data-driven, do not require tuning parameters such as bandwidths, and are able to detect a broad class of local alternatives converging to the null at the parametric rate $n^{-1/2}$, with $n$ the sample size. We show that the use of an orthogonal projection on the tangent space of nuisance parameters facilitates the simulation of critical values by means of a multiplier bootstrap procedure, and can lead to power gains. The finite sample performance of the tests is examined by means of a Monte Carlo experiment and an empirical application. Open-source software is available for implementing the proposed tests.
  • Given a set of baseline assumptions, a breakdown frontier is the boundary between the set of assumptions which lead to a specific conclusion and those which do not. In a potential outcomes model with a binary treatment, we consider two conclusions: First, that ATE is at least a specific value (e.g., nonnegative) and second that the proportion of units who benefit from treatment is at least a specific value (e.g., at least 50\%). For these conclusions, we derive the breakdown frontier for two kinds of assumptions: one which indexes relaxations of the baseline random assignment of treatment assumption, and one which indexes relaxations of the baseline rank invariance assumption. These classes of assumptions nest both the point identifying assumptions of random assignment and rank invariance and the opposite end of no constraints on treatment selection or the dependence structure between potential outcomes. This frontier provides a quantitative measure of robustness of conclusions to relaxations of the baseline point identifying assumptions. We derive $\sqrt{N}$-consistent sample analog estimators for these frontiers. We then provide two asymptotically valid bootstrap procedures for constructing lower uniform confidence bands for the breakdown frontier. As a measure of robustness, estimated breakdown frontiers and their corresponding confidence bands can be presented alongside traditional point estimates and confidence intervals obtained under point identifying assumptions. We illustrate this approach in an empirical application to the effect of child soldiering on wages. We find that sufficiently weak conclusions are robust to simultaneous failures of rank invariance and random assignment, while some stronger conclusions are fairly robust to failures of rank invariance but not necessarily to relaxations of random assignment.
  • We propose a method of estimating the linear-in-means model of peer effects in which the peer group, defined by a social network, is endogenous in the outcome equation for peer effects. Endogeneity is due to unobservable individual characteristics that influence both link formation in the network and the outcome of interest. We propose two estimators of the peer effect equation that control for the endogeneity of the social connections using a control function approach. We leave the functional form of the control function unspecified and treat it as unknown. To estimate the model, we use a sieve semiparametric approach, and we establish asymptotics of the semiparametric estimator.
  • We propose a bootstrap-based calibrated projection procedure to build confidence intervals for single components and for smooth functions of a partially identified parameter vector in moment (in)equality models. The method controls asymptotic coverage uniformly over a large class of data generating processes. The extreme points of the calibrated projection confidence interval are obtained by extremizing the value of the function of interest subject to a proper relaxation of studentized sample analogs of the moment (in)equality conditions. The degree of relaxation, or critical level, is calibrated so that the function of theta, not theta itself, is uniformly asymptotically covered with prespecified probability. This calibration is based on repeatedly checking feasibility of linear programming problems, rendering it computationally attractive. Nonetheless, the program defining an extreme point of the confidence interval is generally nonlinear and potentially intricate. We provide an algorithm, based on the response surface method for global optimization, that approximates the solution rapidly and accurately, and we establish its rate of convergence. The algorithm is of independent interest for optimization problems with simple objectives and complicated constraints. An empirical application estimating an entry game illustrates the usefulness of the method. Monte Carlo simulations confirm the accuracy of the solution algorithm, the good statistical as well as computational performance of calibrated projection (including in comparison to other methods), and the algorithm's potential to greatly accelerate computation of other confidence intervals.
  • To quantify uncertainty around point estimates of conditional objects such as conditional means or variances, parameter uncertainty has to be taken into account. Attempts to incorporate parameter uncertainty are typically based on the unrealistic assumption of observing two independent processes, where one is used for parameter estimation, and the other for conditioning upon. Such unrealistic foundation raises the question whether these intervals are theoretically justified in a realistic setting. This paper presents an asymptotic justification for this type of intervals that does not require such an unrealistic assumption, but relies on a sample-split approach instead. By showing that our sample-split intervals coincide asymptotically with the standard intervals, we provide a novel, and realistic, justification for confidence intervals of conditional objects. The analysis is carried out for a rich class of time series models.
  • We derive fixed effects estimators of parameters and average partial effects in (possibly dynamic) nonlinear panel data models with individual and time effects. They cover logit, probit, ordered probit, Poisson and Tobit models that are important for many empirical applications in micro and macroeconomics. Our estimators use analytical and jackknife bias corrections to deal with the incidental parameter problem, and are asymptotically unbiased under asymptotic sequences where $N/T$ converges to a constant. We develop inference methods and show that they perform well in numerical examples.
  • In many areas, practitioners seek to use observational data to learn a treatment assignment policy that satisfies application-specific constraints, such as budget, fairness, simplicity, or other functional form constraints. For example, policies may be restricted to take the form of decision trees based on a limited set of easily observable individual characteristics. We propose a new approach to this problem motivated by the theory of semiparametrically efficient estimation. Our method can be used to optimize either binary treatments or infinitesimal nudges to continuous treatments, and can leverage observational data where causal effects are identified using a variety of strategies, including selection on observables and instrumental variables. Given a doubly robust estimator of the causal effect of assigning everyone to treatment, we develop an algorithm for choosing whom to treat, and establish strong guarantees for the asymptotic utilitarian regret of the resulting policy.
  • Structural econometric methods are often criticized for being sensitive to functional form assumptions. We study parametric estimators of the local average treatment effect (LATE) derived from a widely used class of latent threshold crossing models and show they yield LATE estimates algebraically equivalent to the instrumental variables (IV) estimator. Our leading example is Heckman's (1979) two-step ("Heckit") control function estimator which, with two-sided non-compliance, can be used to compute estimates of a variety of causal parameters. Equivalence with IV is established for a semi-parametric family of control function estimators and shown to hold at interior solutions for a class of maximum likelihood estimators. Our results suggest differences between structural and IV estimates often stem from disagreements about the target parameter rather than from functional form assumptions per se. In cases where equivalence fails, reporting structural estimates of LATE alongside IV provides a simple means of assessing the credibility of structural extrapolation exercises.
  • Oversubscribed treatments are often allocated using randomized waiting lists. Applicants are ranked randomly, and treatment offers are made following that ranking until all seats are filled. To estimate causal effects, researchers often compare applicants getting and not getting an offer. We show that those two groups are not statistically comparable. Therefore, the estimator arising from that comparison is inconsistent. We propose a new estimator, and show that it is consistent. Finally, we revisit an application, and we show that using our estimator can lead to sizably different results from those obtained using the commonly used estimator.
  • This paper considers the problem of testing many moment inequalities where the number of moment inequalities, denoted by $p$, is possibly much larger than the sample size $n$. There is a variety of economic applications where solving this problem allows to carry out inference on causal and structural parameters, a notable example is the market structure model of Ciliberto and Tamer (2009) where $p=2^{m+1}$ with $m$ being the number of firms that could possibly enter the market. We consider the test statistic given by the maximum of $p$ Studentized (or $t$-type) inequality-specific statistics, and analyze various ways to compute critical values for the test statistic. Specifically, we consider critical values based upon (i) the union bound combined with a moderate deviation inequality for self-normalized sums, (ii) the multiplier and empirical bootstraps, and (iii) two-step and three-step variants of (i) and (ii) by incorporating the selection of uninformative inequalities that are far from being binding and a novel selection of weakly informative inequalities that are potentially binding but do not provide first order information. We prove validity of these methods, showing that under mild conditions, they lead to tests with the error in size decreasing polynomially in $n$ while allowing for $p$ being much larger than $n$, indeed $p$ can be of order $\exp (n^{c})$ for some $c > 0$. Importantly, all these results hold without any restriction on the correlation structure between $p$ Studentized statistics, and also hold uniformly with respect to suitably large classes of underlying distributions. Moreover, in the online supplement, we show validity of a test based on the block multiplier bootstrap in the case of dependent data under some general mixing conditions.
  • This paper presents a new estimator of the intercept of a linear regression model in cases where the outcome varaible is observed subject to a selection rule. The intercept is often in this context of inherent interest; for example, in a program evaluation context, the difference between the intercepts in outcome equations for participants and non-participants can be interpreted as the difference in average outcomes of participants and their counterfactual average outcomes if they had chosen not to participate. The new estimator can under mild conditions exhibit a rate of convergence in probability equal to $n^{-p/(2p+1)}$, where $p\ge 2$ is an integer that indexes the strength of certain smoothness assumptions. This rate of convergence is shown in this context to be the optimal rate of convergence for estimation of the intercept parameter in terms of a minimax criterion. The new estimator, unlike other proposals in the literature, is under mild conditions consistent and asymptotically normal with a rate of convergence that is the same regardless of the degree to which selection depends on unobservables in the outcome equation. Simulation evidence and an empirical example are included.
  • We propose dual regression as an alternative to the quantile regression process for the global estimation of conditional distribution functions under minimal assumptions. Dual regression provides all the interpretational power of the quantile regression process while avoiding the need for repairing the intersecting conditional quantile surfaces that quantile regression often produces in practice. Our approach introduces a mathematical programming characterization of conditional distribution functions which, in its simplest form, is the dual program of a simultaneous estimator for linear location-scale models. We apply our general characterization to the specification and estimation of a flexible class of conditional distribution functions, and present asymptotic theory for the corresponding empirical dual regression process.
  • This paper develops and implements a nonparametric test of Random Utility Models. The motivating application is to test the null hypothesis that a sample of cross-sectional demand distributions was generated by a population of rational consumers. We test a necessary and sufficient condition for this that does not rely on any restriction on unobserved heterogeneity or the number of goods. We also propose and implement a control function approach to account for endogenous expenditure. An econometric result of independent interest is a test for linear inequality constraints when these are represented as the vertices of a polyhedron rather than its faces. An empirical application to the U.K. Household Expenditure Survey illustrates computational feasibility of the method in demand problems with 5 goods.
  • The ongoing net neutrality debate has generated a lot of heated discussions on whether or not monetary interactions should be regulated between content and access providers. Among the several topics discussed, `differential pricing' has recently received attention due to `zero-rating' platforms proposed by some service providers. In the differential pricing scheme, Internet Service Providers (ISPs) can exempt data access charges for on content from certain CPs (zero-rated) while no exemption is on content from other CPs. This allows the possibility for Content Providers (CPs) to make `sponsorship' agreements to zero-rate their content and attract more user traffic. In this paper, we study the effect of differential pricing on various players in the Internet. We first consider a model with a monopolistic ISP and multiple CPs where users select CPs based on the quality of service (QoS) and data access charges. We show that in a differential pricing regime 1) a CP offering low QoS can make have higher surplus than a CP offering better QoS through sponsorships. 2) Overall QoS (mean delay) for end users can degrade under differential pricing schemes. In the oligopolistic market with multiple ISPs, users tend to select the ISP with lowest ISP resulting in same type of conclusions as in the monopolistic market. We then study how differential pricing effects the revenue of ISPs.
  • We study factor models augmented by observed covariates that have explanatory powers on the unknown factors. In financial factor models, the unknown factors can be reasonably well explained by a few observable proxies, such as the Fama-French factors. In diffusion index forecasts, identified factors are strongly related to several directly measurable economic variables such as consumption-wealth variable, financial ratios, and term spread. With those covariates, both the factors and loadings are identifiable up to a rotation matrix even only with a finite dimension. To incorporate the explanatory power of these covariates, we propose a smoothed principal component analysis (PCA): (i) regress the data onto the observed covariates, and (ii) take the principal components of the fitted data to estimate the loadings and factors. This allows us to accurately estimate the percentage of both explained and unexplained components in factors and thus to assess the explanatory power of covariates. We show that both the estimated factors and loadings can be estimated with improved rates of convergence compared to the benchmark method. The degree of improvement depends on the strength of the signals, representing the explanatory power of the covariates on the factors. The proposed estimator is robust to possibly heavy-tailed distributions. We apply the model to forecast US bond risk premia, and find that the observed macroeconomic characteristics contain strong explanatory powers of the factors. The gain of forecast is more substantial when the characteristics are incorporated to estimate the common factors than directly used for forecasts.
  • With the advent of Big Data, nowadays in many applications databases containing large quantities of similar time series are available. Forecasting time series in these domains with traditional univariate forecasting procedures leaves great potentials for producing accurate forecasts untapped. Recurrent neural networks (RNNs), and in particular Long Short-Term Memory (LSTM) networks, have proven recently that they are able to outperform state-of-the-art univariate time series forecasting methods in this context when trained across all available time series. However, if the time series database is heterogeneous, accuracy may degenerate, so that on the way towards fully automatic forecasting methods in this space, a notion of similarity between the time series needs to be built into the methods. To this end, we present a prediction model that can be used with different types of RNN models on subgroups of similar time series, which are identified by time series clustering techniques. We assess our proposed methodology using LSTM networks, a widely popular RNN variant. Our method achieves competitive results on benchmarking datasets under competition evaluation procedures. In particular, in terms of mean sMAPE accuracy, it consistently outperforms the baseline LSTM model and outperforms all other methods on the CIF2016 forecasting competition dataset.
  • Quantile and quantile effect functions are important tools for descriptive and causal analyses due to their natural and intuitive interpretation. Existing inference methods for these functions do not apply to discrete random variables. This paper offers a simple, practical construction of simultaneous confidence bands for quantile and quantile effect functions of possibly discrete random variables. It is based on a natural transformation of simultaneous confidence bands for distribution functions, which are readily available for many problems. The construction is generic and does not depend on the nature of the underlying problem. It works in conjunction with parametric, semiparametric, and nonparametric modeling methods for observed and counterfactual distributions, and does not depend on the sampling scheme. We apply our method to characterize the distributional impact of insurance coverage on health care utilization and obtain the distributional decomposition of the racial test score gap. We find that universal insurance coverage increases the number of doctor visits across the entire distribution, and that the racial test score gap is small at early ages but grows with age due to socio economic factors affecting child development especially at the top of the distribution. These are new, interesting empirical findings that complement previous analyses that focused on mean effects only. In both applications, the outcomes of interest are discrete rendering existing inference methods invalid for obtaining uniform confidence bands for observed and counterfactual quantile functions and for their difference -- the quantile effects functions.
  • In treatment allocation problems the individuals to be treated often arrive sequentially. We study a problem in which the policy maker is not only interested in the expected cumulative welfare but is also concerned about the uncertainty/risk of the treatment outcomes. At the outset, the total number of treatment assignments to be made may even be unknown. A sequential treatment policy which attains the minimax optimal regret is proposed. We also demonstrate that the expected number of suboptimal treatments only grows slowly in the number of treatments. Finally, we study a setting where outcomes are only observed with delay.
  • Quantile regression (QR) is a principal regression method for analyzing the impact of covariates on outcomes. The impact is described by the conditional quantile function and its functionals. In this paper we develop the nonparametric QR-series framework, covering many regressors as a special case, for performing inference on the entire conditional quantile function and its linear functionals. In this framework, we approximate the entire conditional quantile function by a linear combination of series terms with quantile-specific coefficients and estimate the function-valued coefficients from the data. We develop large sample theory for the QR-series coefficient process, namely we obtain uniform strong approximations to the QR-series coefficient process by conditionally pivotal and Gaussian processes. Based on these strong approximations, or couplings, we develop four resampling methods (pivotal, gradient bootstrap, Gaussian, and weighted bootstrap) that can be used for inference on the entire QR-series coefficient function. We apply these results to obtain estimation and inference methods for linear functionals of the conditional quantile function, such as the conditional quantile function itself, its partial derivatives, average partial derivatives, and conditional average partial derivatives. Specifically, we obtain uniform rates of convergence and show how to use the four resampling methods mentioned above for inference on the functionals. All of the above results are for function-valued parameters, holding uniformly in both the quantile index and the covariate value, and covering the pointwise case as a by-product. We demonstrate the practical utility of these results with an example, where we estimate the price elasticity function and test the Slutsky condition of the individual demand for gasoline, as indexed by the individual unobserved propensity for gasoline consumption.
  • This paper proposes a straightforward algorithm to carry out inference in large time-varying parameter vector autoregressions (TVP-VARs) with mixture innovation components for each coefficient in the system. We significantly decrease the computational burden by approximating the latent indicators that drive the time-variation in the coefficients with a latent threshold process that depends on the absolute size of the shocks. The merits of our approach are illustrated with two applications. First, we forecast the US term structure of interest rates and demonstrate forecast gains of the proposed mixture innovation model relative to other benchmark models. Second, we apply our approach to US macroeconomic data and find significant evidence for time-varying effects of a monetary policy tightening.
  • This paper considers inference on fixed effects in a linear regression model estimated from network data. An important special case of our setup is the two-way regression model. This is a workhorse technique in the analysis of matched data sets, such as employer-employee or student-teacher panel data. We formalize how the structure of the network affects the accuracy with which the fixed effects can be estimated. This allows us to derive sufficient conditions on the network for consistent estimation and asymptotically-valid inference to be possible. Estimation of moments is also considered. We allow for general networks and our setup covers both the dense and sparse case. We provide numerical results for the estimation of teacher value-added models and regressions with occupational dummies.
  • We develop a Bayesian vector autoregressive (VAR) model with multivariate stochastic volatility that is capable of handling vast dimensional information sets. Three features are introduced to permit reliable estimation of the model. First, we assume that the reduced-form errors in the VAR feature a factor stochastic volatility structure, allowing for conditional equation-by-equation estimation. Second, we apply recently developed global-local shrinkage priors to the VAR coefficients to cure the curse of dimensionality. Third, we utilize recent innovations to efficiently sample from high-dimensional multivariate Gaussian distributions. This makes simulation-based fully Bayesian inference feasible when the dimensionality is large but the time series length is moderate. We demonstrate the merits of our approach in an extensive simulation study and apply the model to US macroeconomic data to evaluate its forecasting capabilities.
  • Factor structures or interactive effects are convenient devices to incorporate latent variables in panel data models. We consider fixed effect estimation of nonlinear panel single-index models with factor structures in the unobservables, which include logit, probit, ordered probit and Poisson specifications. We establish that fixed effect estimators of model parameters and average partial effects have normal distributions when the two dimensions of the panel grow large, but might suffer of incidental parameter bias. We show how models with factor structures can also be applied to capture important features of network data such as reciprocity, degree heterogeneity, homophily in latent variables and clustering. We illustrate this applicability with an empirical example to the estimation of a gravity equation of international trade between countries using a Poisson model with multiple factors.
  • Identification of the parameters of stable linear dynamical systems is a well-studied problem in the literature, both in the low and high-dimensional settings. However, there are hardly any results for the unstable case, especially regarding finite time bounds. For this setting, classical results on least-squares estimation of the dynamics parameters are not applicable and therefore new concepts and technical approaches need to be developed to address the issue. Unstable linear systems arise in key real applications in control theory, econometrics, and finance. This study establishes finite time bounds for the identification error of the least-squares estimates for a fairly large class of heavy-tailed noise distributions, and transition matrices of such systems. The results relate the time length (samples) required for estimation to a function of the problem dimension and key characteristics of the true underlying transition matrix and the noise distribution. To establish them, appropriate concentration inequalities for random matrices and for sequences of martingale differences are leveraged.
  • We give a general construction of debiased/locally robust/orthogonal (LR) moment functions for GMM, where the derivative with respect to first step nonparametric estimation is zero and equivalently first step estimation has no effect on the influence function. This construction consists of adding an estimator of the influence function adjustment term for first step nonparametric estimation to identifying or original moment conditions. We also give numerical methods for estimating LR moment functions that do not require an explicit formula for the adjustment term. LR moment conditions have reduced bias and so are important when the first step is machine learning. We derive LR moment conditions for dynamic discrete choice based on first step machine learning estimators of conditional choice probabilities. We provide simple and general asymptotic theory for LR estimators based on sample splitting. This theory uses the additive decomposition of LR moment conditions into an identifying condition and a first step influence adjustment. Our conditions require only mean square consistency and a few (generally either one or two) readily interpretable rate conditions. LR moment functions have the advantage of being less sensitive to first step estimation. Some LR moment functions are also doubly robust meaning they hold if one first step is incorrect. We give novel classes of doubly robust moment functions and characterize double robustness. For doubly robust estimators our asymptotic theory only requires one rate condition.