• We study factor models augmented by observed covariates that have explanatory powers on the unknown factors. In financial factor models, the unknown factors can be reasonably well explained by a few observable proxies, such as the Fama-French factors. In diffusion index forecasts, identified factors are strongly related to several directly measurable economic variables such as consumption-wealth variable, financial ratios, and term spread. With those covariates, both the factors and loadings are identifiable up to a rotation matrix even only with a finite dimension. To incorporate the explanatory power of these covariates, we propose a smoothed principal component analysis (PCA): (i) regress the data onto the observed covariates, and (ii) take the principal components of the fitted data to estimate the loadings and factors. This allows us to accurately estimate the percentage of both explained and unexplained components in factors and thus to assess the explanatory power of covariates. We show that both the estimated factors and loadings can be estimated with improved rates of convergence compared to the benchmark method. The degree of improvement depends on the strength of the signals, representing the explanatory power of the covariates on the factors. The proposed estimator is robust to possibly heavy-tailed distributions. We apply the model to forecast US bond risk premia, and find that the observed macroeconomic characteristics contain strong explanatory powers of the factors. The gain of forecast is more substantial when the characteristics are incorporated to estimate the common factors than directly used for forecasts.
  • This paper studies model selection consistency for high dimensional sparse regression when data exhibits both cross-sectional and serial dependency. Most commonly-used model selection methods fail to consistently recover the true model when the covariates are highly correlated. Motivated by econometric studies, we consider the case where covariate dependence can be reduced through factor model, and propose a consistent strategy named Factor-Adjusted Regularized Model Selection (FarmSelect). By separating the latent factors from idiosyncratic components, we transform the problem from model selection with highly correlated covariates to that with weakly correlated variables. Model selection consistency as well as optimal rates of convergence are obtained under mild conditions. Numerical studies demonstrate the nice finite sample performance in terms of both model selection and out-of-sample prediction. Moreover, our method is flexible in a sense that it pays no price for weakly correlated and uncorrelated cases. Our method is applicable to a wide range of high dimensional sparse regression problems. An R-package FarmSelect is also provided for implementation.
  • In this paper, we study the model selection and structure specification for the generalised semi-varying coefficient models (GSVCMs), where the number of potential covariates is allowed to be larger than the sample size. We first propose a penalised likelihood method with the LASSO penalty function to obtain the preliminary estimates of the functional coefficients. Then, using the quadratic approximation for the local log-likelihood function and the adaptive group LASSO penalty (or the local linear approximation of the group SCAD penalty) with the help of the preliminary estimation of the functional coefficients, we introduce a novel penalised weighted least squares procedure to select the significant covariates and identify the constant coefficients among the coefficients of the selected covariates, which could thus specify the semiparametric modelling structure. The developed model selection and structure specification approach not only inherits many nice statistical properties from the local maximum likelihood estimation and nonconcave penalised likelihood method, but also computationally attractive thanks to the computational algorithm that is proposed to implement our method. Under some mild conditions, we establish the asymptotic properties for the proposed model selection and estimation procedure such as the sparsity and oracle property. We also conduct simulation studies to examine the finite sample performance of the proposed method, and finally apply the method to analyse a real data set, which leads to some interesting findings.