• Understanding treatment heterogeneity is essential to the development of precision medicine, which seeks to tailor medical treatments to subgroups of patients with similar characteristics. One challenge in achieving this goal is that we usually do not have a priori knowledge of the grouping information of patients with respect to treatment. To address this problem, we consider a heterogeneous regression model in which the coefficients for the treatment variables are subject-dependent and belong to different subgroups with unknown grouping information. We develop a concave fusion penalized method for automatically estimating the grouping structure and the subgroup-specific treatment effects, and derive an alternating direction method of multipliers algorithm for its implementation. We also study the theoretical properties of the proposed method and show that, under suitable conditions, there exists a local minimizer that with high probability equals the oracle least squares estimator obtained with a priori knowledge of the true grouping information. This provides theoretical support for making statistical inference about the subgroup-specific treatment effects based on the proposed method. We evaluate the performance of the proposed method by simulation studies and illustrate its application by analyzing the data from the AIDS Clinical Trials Group Study.
  • We propose an estimation methodology for a semiparametric quantile factor panel model. We provide tools for inference that are robust to the existence of moments and to the form of weak cross-sectional dependence in the idiosyncratic error term. We apply our method to daily stock return data.
  • In the low-dimensional case, the generalized additive coefficient model (GACM) proposed by Xue and Yang [Statist. Sinica 16 (2006) 1423-1446] has been demonstrated to be a powerful tool for studying nonlinear interaction effects of variables. In this paper, we propose estimation and inference procedures for the GACM when the dimension of the variables is high. Specifically, we propose a groupwise penalization based procedure to distinguish significant covariates for the "large $p$ small $n$" setting. The procedure is shown to be consistent for model structure identification. Further, we construct simultaneous confidence bands for the coefficient functions in the selected model based on a refined two-step spline estimator. We also discuss how to choose the tuning parameters. To estimate the standard deviation of the functional estimator, we adopt the smoothed bootstrap method. We conduct simulation experiments to evaluate the numerical performance of the proposed methods and analyze an obesity data set from a genome-wide association study as an illustration.
  • An important step in developing individualized treatment strategies is to correctly identify subgroups of a heterogeneous population, so that a specific treatment can be given to each subgroup. In this paper, we consider the situation with samples drawn from a population consisting of subgroups with different means, along with certain covariates. We propose a penalized approach for subgroup analysis based on a regression model, in which heterogeneity is driven by unobserved latent factors and thus can be represented by using subject-specific intercepts. We apply concave penalty functions to pairwise differences of the intercepts. This procedure automatically divides the observations into subgroups. We develop an alternating direction method of multipliers algorithm with concave penalties to implement the proposed approach and demonstrate its convergence. We also establish the theoretical properties of our proposed estimator and determine the order of the minimal difference of signals between groups that is required to recover the groups. These results provide a sound basis for making statistical inference in subgroup analysis. Our proposed method is further illustrated by simulation studies and analysis of the Cleveland heart disease dataset.
  • In genetic studies, not only can the number of predictors obtained from microarray measurements be extremely large, but there can also be multiple response variables. Motivated by such a situation, we consider semiparametric dimension reduction methods in sparse multivariate regression models. Previous studies on joint variable and rank selection have focused on parametric models, while here we consider the more challenging varying-coefficient models, which make it possible to investigate nonlinear interactions among variables. Spline approximation, rank constraints and concave group penalties are utilized for model estimation. Asymptotic oracle properties of the estimators are presented. We also propose reduced-rank independent screening to deal with the situation when the dimension is so high that penalized estimation cannot be efficiently applied. In simulations, we show the advantages of simultaneously performing variable and rank selection. A real data set is analyzed to illustrate the good prediction performance when incorporating interactions between genetic variables and an index variable.
  • We propose a two-step estimating procedure for generalized additive partially linear models with clustered data using estimating equations. Our proposed method applies to the case in which the number of observations per cluster is allowed to increase with the number of independent subjects. We establish oracle properties for the two-step estimator of each function component: it performs as well as the univariate function estimator obtained by assuming that the parametric vector and all other function components are known. Asymptotic distributions and consistency properties of the estimators are obtained. Finite-sample experiments with both simulated continuous and binary response variables confirm the asymptotic results. We illustrate the methods with an application to a U.S. unemployment data set.
  • We consider the problem of simultaneous variable selection and estimation in additive, partially linear models for longitudinal/clustered data. We propose an estimation procedure via polynomial splines to estimate the nonparametric components and apply proper penalty functions to achieve sparsity in the linear part. Under reasonable conditions, we obtain the asymptotic normality of the estimators for the linear components and the consistency of the estimators for the nonparametric components. We further demonstrate that, with proper choice of the regularization parameter, the penalized estimators of the non-zero coefficients achieve the asymptotic oracle property. The finite sample behavior of the penalized estimators is evaluated with simulation studies and illustrated by a longitudinal CD4 cell count data set.
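
Two of the abstracts above describe concave penalties applied to pairwise differences and fitted with an alternating direction method of multipliers (ADMM) algorithm. As a rough illustration only, the sketch below shows one common concave penalty, the minimax concave penalty (MCP), together with the closed-form thresholding step that an ADMM update for a single scalar pairwise difference would use. The abstracts do not say which concave penalty is used, so the choice of MCP, the function names, and the parameters `lam`, `gamma`, and `theta` are all assumptions made for this example rather than the authors' implementation.

```python
def soft_threshold(z, t):
    """Soft-thresholding: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def mcp_penalty(t, lam, gamma):
    """MCP value at t: lam*|t| - t^2/(2*gamma) up to |t| = gamma*lam, flat afterwards."""
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


def mcp_threshold(z, lam, gamma, theta=1.0):
    """Minimizer over d of (theta/2)*(z - d)^2 + mcp_penalty(d, lam, gamma).

    theta is a hypothetical ADMM penalty parameter; gamma*theta > 1 is
    required so that the scalar problem is convex.
    """
    if gamma * theta <= 1.0:
        raise ValueError("need gamma * theta > 1")
    if abs(z) <= gamma * lam:
        return soft_threshold(z, lam / theta) / (1.0 - 1.0 / (gamma * theta))
    # Large differences are left unshrunk, unlike with a lasso penalty.
    return z


# Small differences are set exactly to zero (the two subjects are fused);
# large differences pass through unchanged.
small = mcp_threshold(0.5, lam=1.0, gamma=3.0)
large = mcp_threshold(5.0, lam=1.0, gamma=3.0)
```

Because differences that are thresholded to exactly zero place the corresponding subjects in the same group, a step of this kind is what lets a fusion-penalized fit produce a grouping without the groups being specified in advance.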