• We propose new tests for assessing whether covariates in a treatment group and matched control group are balanced in observational studies. The tests exhibit high power under a wide range of multivariate alternatives, against some of which existing tests have little power. The asymptotic permutation null distributions of the proposed tests are studied, and the p-values calculated from the asymptotic results work well in finite samples, facilitating the application of the tests to large data sets. The tests are illustrated in a study of the effect of smoking on blood lead levels. The proposed tests are implemented in the R package BalanceCheck.
  • Effect modification occurs when the effect of the treatment on an outcome varies according to the level of other covariates and often has important implications in decision making. When there are tens or hundreds of covariates, it becomes necessary to use the observed data to select a simpler model for effect modification and then make valid statistical inference. We propose a two stage procedure to solve this problem. First, we use Robinson's transformation to decouple the nuisance parameters from the treatment effect of interest and use machine learning algorithms to estimate the nuisance parameters. Next, after plugging in the estimates of the nuisance parameters, we use the Lasso to choose a low-complexity model for effect modification. Compared to a full model consisting of all the covariates, the selected model is much more interpretable. Compared to the univariate subgroup analyses, the selected model greatly reduces the number of false discoveries. We show that the conditional selective inference for the selected model is asymptotically valid given the rate assumptions in classical semiparametric regression. Extensive simulation studies are conducted to verify the asymptotic results and an epidemiological application is used to demonstrate the method.
  • Instrumental variable analysis is a widely used method to estimate causal effects in the presence of unmeasured confounding. When the instruments, exposure and outcome are not measured in the same sample, Angrist and Krueger (1992) suggested using two-sample instrumental variable (TSIV) estimators that use sample moments from an instrument-exposure sample and an instrument-outcome sample. However, this method is biased if the two samples are from heterogeneous populations so that the distributions of the instruments are different. In linear structural equation models, we derive a new class of TSIV estimators that are robust to heterogeneous samples under the key assumption that the structural relations in the two samples are the same. The widely used two-sample two-stage least squares estimator belongs to this class. It is generally not asymptotically efficient, although we find that it performs similarly to the optimal TSIV estimator in most practical situations. We then attempt to relax the linearity assumption. We find that, unlike one-sample analyses, the TSIV estimator is not robust to a misspecified exposure model. Additionally, to nonparametrically identify the magnitude of the causal effect, the noise in the exposure must have the same distribution in the two samples. However, this assumption is in general untestable because the exposure is not observed in one sample. Nonetheless, we may still identify the sign of the causal effect in the absence of homogeneity of the noise.
  • Instrumental variables are commonly used to estimate effects of a treatment afflicted by unmeasured confounding, and in practice instruments are often continuous (e.g., measures of distance, or treatment preference). However, available methods for continuous instruments have important limitations: they either require restrictive parametric assumptions for identification, or else rely on modeling both the outcome and treatment process well (and require modeling effect modification by all adjustment covariates). In this work we develop the first semiparametric doubly robust estimators of the local instrumental variable effect curve, i.e., the effect among those who would take treatment for instrument values above some threshold and not below. In addition to being robust to misspecification of either the instrument or treatment/outcome processes, our approach also incorporates information about the instrument mechanism and allows for flexible data-adaptive estimation of effect modification. We discuss asymptotic properties under weak conditions, and use the methods to study infant mortality effects of neonatal intensive care units with high versus low technical capacity, using travel time as an instrument.
  • In matched observational studies where treatment assignment is not randomized, sensitivity analysis helps investigators determine how sensitive their estimated treatment effect is to some unmeasured confounder. The standard approach calibrates the sensitivity analysis according to the worst case bias in a pair. This approach will result in a conservative sensitivity analysis if the worst case bias does not hold in every pair. In this paper, we show that for binary data, the standard approach can be calibrated in terms of the average bias in a pair rather than worst case bias. When the worst case bias and average bias differ, the average bias interpretation results in a less conservative sensitivity analysis and more power. In many studies, the average case calibration may also carry a more natural interpretation than the worst case calibration and may also allow researchers to incorporate additional data to establish an empirical basis with which to calibrate a sensitivity analysis. We illustrate this with a study of the effects of cellphone use on the incidence of automobile accidents. Finally, we extend the average case calibration to the sensitivity analysis of confidence intervals for attributable effects.
  • Mendelian randomization (MR) is an instrumental variable method of estimating the causal effect of risk exposures in epidemiology, where genetic variants are used as instruments. With the increasing availability of large-scale genome-wide association studies, it is now possible to greatly improve the power of MR by using genetic variants that are only weakly relevant. We consider how to increase the efficiency of Mendelian randomization by a genome-wide design where more than a thousand genetic instruments are used. An empirical partially Bayes estimator is proposed, where weaker instruments are shrunken more heavily and thus bring less variation to the MR estimate. This is generally more efficient than the profile-likelihood-based estimator which gives no shrinkage to weak instruments. We apply our method to estimate the causal effect of blood lipids on cardiovascular diseases. We find high-density lipoprotein cholesterol (HDL-c) has a significantly protective effect on heart diseases, while previous MR studies reported null findings.
  • It is common in instrumental variable studies for instrument values to be missing, for example when the instrument is a genetic test in Mendelian randomization studies. In this paper we discuss two apparent paradoxes that arise in so-called single consent designs where there is one-sided noncompliance, i.e., where unencouraged units cannot access treatment. The first paradox is that, even under a missing completely at random assumption, a complete-case analysis is biased when knowledge of one-sided noncompliance is taken into account; this is not the case when such information is disregarded. This occurs because incorporating information about one-sided noncompliance induces a dependence between the missingness and treatment. The second paradox is that, although incorporating such information does not lead to efficiency gains without missing data, the story is different when instrument values are missing: there, incorporating such information changes the efficiency bound, allowing possible efficiency gains. This is because some of the missing values can be filled in, based on the fact that anyone who received treatment must have been encouraged by the instrument (since the unencouraged cannot access treatment).
  • Effect modification means the magnitude or stability of a treatment effect varies as a function of an observed covariate. Generally, larger and more stable treatment effects are insensitive to larger biases from unmeasured covariates, so a causal conclusion may be considerably firmer if this pattern is noted when it occurs. We propose a new strategy, called the submax-method, that combines exploratory and confirmatory efforts to determine whether there is stronger evidence of causality - that is, greater insensitivity to unmeasured confounding - in some subgroups of individuals. It uses the joint distribution of test statistics that split the data in various ways based on certain observed covariates. For $L$ binary covariates, the method splits the population $L$ times into two subpopulations, perhaps first men and women, perhaps then smokers and nonsmokers, computing a test statistic from each subpopulation, and appends the test statistic for the whole population, making $2L+1$ test statistics in total. Although $L$ binary covariates define $2^{L}$ interaction groups, only $2L+1$ tests are performed, and at least $L+1$ of these tests use at least half of the data. The submax-method achieves the highest design sensitivity and the highest Bahadur efficiency of its component tests. Moreover, the form of the test is sufficiently tractable that its large sample power may be studied analytically. The simulation suggests that the submax method exhibits superior performance, in comparison with an approach using CART, when there is effect modification of moderate size. Using data from the NHANES I Epidemiologic Follow-Up Survey, an observational study of the effects of physical activity on survival is used to illustrate the method. The method is implemented in the $\texttt{R}$ package $\texttt{submax}$ which contains the NHANES example.
  • Modern, high dimensional data has renewed interest in instrumental variables (IV) analysis, primarily focusing on estimation of effects of endogenous variables and paying little attention to specification tests. This paper studies in high dimensions the Durbin-Wu-Hausman (DWH) test, a popular specification test for endogeneity in IV regression. We show, surprisingly, that the DWH test maintains its size in high dimensions, but at the expense of power. We propose a new test that remedies this issue and has better power than the DWH test. Simulation studies reveal that our test achieves near-oracle performance in detecting endogeneity.
  • Two problems that arise in making causal inferences for non-mortality outcomes such as bronchopulmonary dysplasia (BPD) are unmeasured confounding and censoring by death, i.e., the outcome is only observed when subjects survive. In randomized experiments with noncompliance, instrumental variable methods can be used to control for the unmeasured confounding without censoring by death. But when there is censoring by death, the average causal treatment effect cannot be identified under usual assumptions, but can be studied for a specific subpopulation by using sensitivity analysis with additional assumptions. However, in observational studies, evaluation of the local average treatment effect (LATE) in censoring by death problems with unmeasured confounding is not well studied. We develop a novel sensitivity analysis method based on instrumental variable models for studying the LATE. Specifically, we present the identification results under an additional assumption, and propose a three-step procedure for the LATE estimation. Also, we propose an improved two-step procedure by simultaneously estimating the instrument propensity score (i.e., the probability of instrument given covariates) and the parameters induced by the assumption. We have shown with simulation studies that the two-step procedure can be more robust and efficient than the three-step procedure. Finally, we apply our sensitivity analysis methods to a study of the effect of delivery at high-level neonatal intensive care units on the risk of BPD.
  • Studies have shown that exposure to air pollution, even at low levels, significantly increases mortality. As regulatory actions are becoming prohibitively expensive, robust evidence to guide the development of targeted interventions to reduce air pollution exposure is needed. In this paper, we introduce a novel statistical method that splits the data into two subsamples: (a) Using the first subsample, we consider a data-driven search for $\textit{de novo}$ discovery of subgroups that could have exposure effects that differ from the population mean; and then (b) using the second subsample, we quantify evidence of effect modification among the subgroups with nonparametric randomization-based tests. We also develop a sensitivity analysis method to assess the robustness of the conclusions to unmeasured confounding bias. Via simulation studies and theoretical arguments, we demonstrate that since we discover the subgroups in the first subsample, hypothesis testing on the second subsample can focus on these subgroups only, thus substantially increasing the statistical power of the test. We apply our method to the data of 1,612,414 Medicare beneficiaries in the New England region of the United States for the period 2000 to 2006. We find that seniors aged between 81-85 with low income and seniors aged above 85 have statistically significant higher causal effects of exposure to PM$_{2.5}$ on 5-year mortality rate compared to the population mean.
  • Mendelian randomization (MR) is a method of exploiting genetic variation to unbiasedly estimate a causal effect in the presence of unmeasured confounding. MR is being widely used in epidemiology and other related areas of population science. In this paper, we study statistical inference in the increasingly popular two-sample summary-data MR design. We show that a linear model for the observed associations approximately holds in a wide variety of settings when all the genetic variants satisfy the exclusion restriction assumption, or in genetic terms, when there is no pleiotropy. In this scenario, we derive a maximum profile likelihood estimator with provable consistency and asymptotic normality. However, through analyzing real datasets, we find strong evidence of both systematic and idiosyncratic pleiotropy in MR, echoing some recent discoveries in statistical genetics. We model the systematic pleiotropy by a random effects model, where no genetic variant satisfies the exclusion restriction condition exactly. In this case we propose a consistent and asymptotically normal estimator by adjusting the profile score. We then tackle the idiosyncratic pleiotropy by robustifying the adjusted profile score. We demonstrate the robustness and efficiency of the proposed methods using several simulated and real datasets.
  • In the evaluation of treatment effects, it is of major policy interest to know if the treatment is beneficial for some and harmful for others, a phenomenon known as qualitative interaction. We formulate this question as a multiple testing problem with many conservative null $p$-values, in which the classical multiple testing methods may lose power substantially. We propose a simple technique---conditioning---to improve the power. A crucial assumption we need is uniform conservativeness, meaning for any conservative $p$-value $p$, the conditional distribution $(p/\tau)\,|\,p \le \tau$ is stochastically larger than the uniform distribution on $(0,1)$ for any $\tau$. We show this property holds for one-sided tests in a one-dimensional exponential family (e.g.\ testing for qualitative interaction) as well as testing $|\mu|\le\eta$ using a statistic $X \sim \mathrm{N}(\mu,1)$ (e.g.\ testing for practical importance with threshold $\eta$). We propose an adaptive method to select the threshold $\tau$. Our theoretical and simulation results suggest the proposed tests gain significant power when many $p$-values are uniformly conservative and lose little power when no $p$-value is uniformly conservative. We apply our method to two educational intervention datasets.
  • A major challenge in instrumental variables (IV) analysis is to find instruments that are valid, or have no direct effect on the outcome and are ignorable. Typically one is unsure whether all of the putative IVs are in fact valid. We propose a general inference procedure in the presence of invalid IVs, called Two-Stage Hard Thresholding (TSHT) with voting. TSHT uses two hard thresholding steps to select strong instruments and generate candidate sets of valid IVs. Voting takes the candidate sets and uses majority and plurality rules to determine the true set of valid IVs. In low dimensions, if the sufficient and necessary identification condition under invalid instruments is met, which is more general than the so-called 50% rule or the majority rule, our proposal (i) correctly selects valid IVs, (ii) consistently estimates the causal effect, (iii) produces valid confidence intervals for the causal effect, and (iv) has oracle-optimal width. In high dimensions, we establish nearly identical results without oracle-optimality. In simulations, our proposal outperforms traditional and recent methods in the invalid IV literature. We also apply our method to re-analyze the causal effect of education on earnings.
  • In observational studies, the causal effect of a treatment on the distribution of outcomes is of interest beyond the average treatment effect. Instrumental variable methods allow for causal inference by controlling for unmeasured confounding. The existing nonparametric method for estimating the effect of the treatment on the distribution of outcomes for compliers has several drawbacks, such as producing estimates that violate the non-decreasing and non-negative properties of cumulative distribution functions. In this paper, we propose a novel nonparametric composite likelihood approach, referred to as the binomial likelihood (BL) method, which overcomes the limitations of the previous techniques and utilizes the advantage of likelihood methods. We show the consistency of the maximum binomial likelihood (MBL) estimators and derive their asymptotic distributions. Next, we develop a computationally efficient algorithm for computing the MBL estimates by combining the expectation-maximization (EM) and the pool-adjacent-violators algorithms (PAVA). Moreover, the BL method can be used to construct a binomial likelihood-ratio test (BLRT) for the null hypothesis of no distributional treatment effect. Asymptotic expansion of the BLRT test is derived and the performance of the BL method is demonstrated in simulation studies. Finally, we apply our method to a study of the effect of Vietnam veteran status on the distribution of civilian annual earnings.
  • Causal effects are commonly defined as comparisons of the potential outcomes under treatment and control, but this definition is threatened by the possibility that the treatment or control condition is not well-defined, existing instead in more than one version. A simple, widely applicable analysis is proposed to address the possibility that the treatment or control condition exists in two versions with two different treatment effects. This analysis loses no power in the main comparison of treatment and control, provides additional information about version effects, and controls the family-wise error rate in several comparisons. The method is motivated and illustrated using an ongoing study of the possibility that repeated head trauma in high school football causes an increase in risk of early-onset dementia.
  • We discuss observational studies that test many causal hypotheses, either hypotheses about many outcomes or many treatments. To be credible, an observational study that tests many causal hypotheses must demonstrate that its conclusions are neither artifacts of multiple testing nor of small biases from nonrandom treatment assignment. In a sense that needs to be defined carefully, hidden within a sensitivity analysis for nonrandom assignment is an enormous correction for multiple testing: in the absence of bias, it is extremely improbable that multiple testing alone would create an association insensitive to moderate biases. We propose a new strategy called "cross-screening", different from but motivated by recent work of Bogomolov and Heller on replicability. Cross-screening splits the data in half at random, uses the first half to plan a study carried out on the second half, then uses the second half to plan a study carried out on the first half, and reports the more favorable conclusions of the two studies, correcting with the Bonferroni inequality for having done two studies. If the two studies happen to concur, then they achieve Bogomolov-Heller replicability; however, importantly, replicability is not required for strong control of the family-wise error rate, and either study alone suffices for firm conclusions. In randomized studies with a few hypotheses, cross-screening is not an attractive method when compared with conventional methods of multiplicity control, but it can become attractive when hundreds or thousands of hypotheses are subjected to sensitivity analyses in an observational study. We illustrate the technique by comparing 46 biomarkers in individuals who consume large quantities of fish versus little or no fish.
  • An experimental unit is an opportunity to randomly apply or withhold a treatment. There is interference between units if the application of the treatment to one unit may also affect other units. In cognitive neuroscience, a common form of experiment presents a sequence of stimuli or requests for cognitive activity at random to each experimental subject and measures biological aspects of brain activity that follow these requests. Each subject is then many experimental units, and interference between units within an experimental subject is likely, in part because the stimuli follow one another quickly and in part because human subjects learn or become experienced or primed or bored as the experiment proceeds. We use a recent fMRI experiment concerned with the inhibition of motor activity to illustrate and further develop recently proposed methodology for inference in the presence of interference. A simulation evaluates the power of competing procedures.
  • Optogenetics is a new tool to study neuronal circuits that have been genetically modified to allow stimulation by flashes of light. We study recordings from single neurons within neural circuits under optogenetic stimulation. The data from these experiments present a statistical challenge of modeling a high frequency point process (neuronal spikes) while the input is another high frequency point process (light flashes). We further develop a generalized linear model approach to model the relationships between two point processes, employing additive point-process response functions. The resulting model, Point-process Responses for Optogenetics (PRO), provides explicit nonlinear transformations to link the input point process with the output one. Such response functions may provide important and interpretable scientific insights into the properties of the biophysical process that governs neural spiking in response to optogenetic stimulation. We validate and compare the PRO model using a real dataset and simulations, and our model yields a superior area-under-the-curve value as high as 93% for predicting every future spike. For our experiment on the recurrent layer V circuit in the prefrontal cortex, the PRO model provides evidence that neurons integrate their inputs in a sophisticated manner. Another use of the model is that it enables understanding how neural circuits are altered under various disease conditions and/or experimental conditions by comparing the PRO parameters.
  • There is effect modification if the magnitude or stability of a treatment effect varies systematically with the level of an observed covariate. A larger or more stable treatment effect is typically less sensitive to bias from unmeasured covariates, so it is important to recognize effect modification when it is present. We illustrate a recent proposal for conducting a sensitivity analysis that empirically discovers effect modification by exploratory methods, but controls the family-wise error rate in discovered groups. The example concerns a study of mortality and use of the intensive care unit in 23,715 matched pairs of two Medicare patients, one of whom underwent surgery at a hospital identified for superior nursing, the other at a conventional hospital. The pairs were matched exactly for 130 four-digit ICD-9 surgical procedure codes and balanced 172 observed covariates. The pairs were then split into five groups of pairs by CART in its effort to locate effect modification. The evidence of a beneficial effect of magnet hospitals on mortality is least sensitive to unmeasured biases in a large group of patients undergoing rather serious surgical procedures, but in the absence of other life-threatening conditions, such as a comorbidity of congestive heart failure or an emergency admission leading to surgery.
  • Continuous treatments (e.g., doses) arise often in practice, but many available causal effect estimators are limited by either requiring parametric models for the effect curve, or by not allowing doubly robust covariate adjustment. We develop a novel kernel smoothing approach that requires only mild smoothness assumptions on the effect curve, and still allows for misspecification of either the treatment density or outcome regression. We derive asymptotic properties and give a procedure for data-driven bandwidth selection. The methods are illustrated via simulation and in a study of the effect of nurse staffing on hospital readmissions penalties.
  • Instrumental variables have been widely used to estimate the causal effect of a treatment on an outcome. Existing confidence intervals for causal effects based on instrumental variables assume that all of the putative instrumental variables are valid; a valid instrumental variable is a variable that affects the outcome only by affecting the treatment and is not related to unmeasured confounders. However, in practice, some of the putative instrumental variables are likely to be invalid. This paper presents a simple and general approach to construct a confidence interval that is robust to possibly invalid instruments. The robust confidence interval has theoretical guarantees on having the correct coverage and can also be used to assess the sensitivity of inference when instrumental variables assumptions are violated. The paper also shows that the robust confidence interval outperforms traditional confidence intervals popular in instrumental variables literature when invalid instruments are present. The new approach is applied to a developmental economics study of the causal effect of income on food expenditures.
  • Mediation analysis seeks to understand the mechanism by which a treatment affects an outcome. Count or zero-inflated count outcomes are common in many studies in which mediation analysis is of interest. For example, in dental studies, outcomes such as decayed, missing and filled teeth are typically zero inflated. Existing mediation analysis approaches for count data assume sequential ignorability of the mediator. This is often not plausible because the mediator is not randomized, so there may be unmeasured confounders associated with the mediator and the outcome. In this paper, we develop causal methods based on instrumental variable (IV) approaches for mediation analysis for count data, possibly with many zeros, that do not require the assumption of sequential ignorability. We first define the direct and indirect effect ratios for those data, and then propose estimating equations and use empirical likelihood to estimate the direct and indirect effects consistently. A sensitivity analysis is proposed for violations of the IV exclusion restriction assumption. Simulation studies demonstrate that our method works well for different types of outcomes under different settings. Our method is applied to a randomized dental caries prevention trial and a study of the effect of a massive flood in Bangladesh on children's diarrhea.
  • A potential causal relationship between head injuries sustained by NFL players and later-life neurological decline may have broad implications for participants in youth and high school football programs. However, brain trauma risk at the professional level may be different from that at the youth and high school levels, and the long-term effects of participation at these levels are as yet unclear. To investigate the effect of playing high school football on later-life depression and cognitive functioning, we propose a retrospective observational study using data from the Wisconsin Longitudinal Study (WLS) of graduates from Wisconsin high schools in 1957. We compare 1,153 high school males who played varsity football to 2,751 male students who did not. 1,951 of the control subjects did not play any sport and the remaining 800 controls played a non-contact sport. We focus on two primary outcomes measured at age 65: a composite cognitive outcome measuring verbal fluency and memory and the modified CES-D depression score. To control for potential confounders we adjust for pre-exposure covariates such as IQ with matching and model-based covariate adjustment. We will conduct an ordered testing procedure that uses all 2,751 controls while controlling for possible unmeasured differences between students who played sports and those who did not. We will quantitatively assess the sensitivity of the results to potential unmeasured confounding. The study will also consider several secondary outcomes of clinical interest such as aggression and heavy drinking. The rich set of pre-exposure variables, relatively unbiased sampling, and longitudinal nature of the WLS dataset make the proposed analysis unique among related studies that rely primarily on convenience samples of football players with reported neurological symptoms.
  • Malaria is a parasitic disease that is a major health problem in many tropical regions. The most characteristic symptom of malaria is fever. The fraction of fevers that are attributable to malaria, the malaria attributable fever fraction (MAFF), is an important public health measure for assessing the effect of malaria control programs and other purposes. Estimating the MAFF is not straightforward because there is no gold standard diagnosis of a malaria attributable fever; an individual can have malaria parasites in her blood and a fever, but the individual may have developed partial immunity that allows her to tolerate the parasites and the fever is being caused by another infection. We define the MAFF using the potential outcome framework for causal inference and show what assumptions underlie current estimation methods. Current estimation methods rely on an assumption that the parasite density is correctly measured. However, this assumption does not generally hold because (i) fever kills some parasites and (ii) the measurement of parasite density has measurement error. In the presence of these problems, we show current estimation methods do not perform well. We propose a novel maximum likelihood estimation method based on exponential family g-modeling. Under the assumption that the measurement error mechanism and the magnitude of the fever killing effect are known, we show that our proposed method provides approximately unbiased estimates of the MAFF in simulation studies. A sensitivity analysis can be used to assess the impact of different magnitudes of fever killing and different measurement error mechanisms. We apply our proposed method to estimate the MAFF in Kilombero, Tanzania.
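
The subgroup counting in the submax-method abstract above (for $L$ binary covariates, $2L+1$ tests, at least $L+1$ of which use at least half of the data) can be checked with a small sketch. This is only an illustration of the splitting scheme as the abstract describes it, not the implementation in the $\texttt{submax}$ package: the function name `submax_subgroups`, the simulated covariates, and the labels are all hypothetical, and no test statistic is computed because the abstract does not specify one.

```python
import numpy as np

def submax_subgroups(X):
    """Enumerate the 2L+1 comparison groups for an (n, L) binary covariate array.

    Hypothetical helper: returns (label, boolean mask) pairs only;
    the test statistic for each group is deliberately left out.
    """
    X = np.asarray(X).astype(bool)
    n, L = X.shape
    # The whole population is always one of the comparisons.
    groups = [("all", np.ones(n, dtype=bool))]
    for j in range(L):
        # Each binary covariate splits the population into two subpopulations.
        groups.append((f"x{j}=0", ~X[:, j]))
        groups.append((f"x{j}=1", X[:, j]))
    return groups

# Simulated data (an assumption for illustration): 200 units, L = 3 covariates.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 3))

groups = submax_subgroups(X)
n_tests = len(groups)  # 2L + 1 groups in total
# For each split, the larger side holds at least half of the units,
# so together with the whole population at least L + 1 groups are "large".
n_large = sum(mask.sum() >= X.shape[0] / 2 for _, mask in groups)
print(n_tests, n_large)
```

The sketch enumerates only marginal splits, which is why the number of groups grows linearly in $L$ rather than as the $2^{L}$ interaction cells.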