• Randomly censored survival data are frequently encountered in applied sciences, including biomedical and reliability applications and clinical trial analyses. Testing statistical hypotheses is crucial in such analyses for obtaining conclusive inference, but the existing likelihood-based tests under a fully parametric model are extremely non-robust against outliers in the data. Although there exist a few robust parameter estimators (e.g., M-estimators and minimum density power divergence estimators) for randomly censored data, hardly any robust testing procedure is available in the literature in this context. One of the major difficulties is the construction of a suitable consistent estimator of the asymptotic variance of M-estimators, which depends on the unknown censoring distribution. In this paper, we take the first step in this direction by proposing a consistent estimator of the asymptotic variance of M-estimators based on randomly censored data, without any assumption on the form of the censoring scheme. We then describe and study a class of robust Wald-type tests for parametric statistical hypotheses, both simple and composite, under this set-up, along with their general asymptotic and robustness properties. Robust tests for comparing two independent randomly censored samples and robust tests against one-sided alternatives are also discussed. Their advantages and usefulness are demonstrated for the tests based on the minimum density power divergence estimators, with specific attention to clinical trial analyses.
  • New massively redundant low-frequency arrays allow a novel investigation of closure relations in interferometry. We employ commissioning data from the Hydrogen Epoch of Reionization Array to investigate closure quantities in this densely packed grid array of 14 m antennas operating between 100 and 200 MHz. We investigate techniques that use closure phase spectra of redundant triads to estimate departures from redundancy in redundant baseline visibilities. We find a median absolute deviation from redundancy in closure phase of about 4.5° across the observed frequency range. With the prototype electronics, this translates into a non-redundancy of about 2.6° per visibility phase. The median absolute deviation from redundancy decreases for longer baselines. We show that closure phase spectra can be used to identify ill-behaved antennas in the array, independent of calibration. We also investigate the temporal behavior of closure spectra. The Allan variance increases beyond a one-minute stride time, owing to the passage of the sky through the primary beam of the transit telescope. However, the closure spectra repeat from day to day to well within the noise per measurement at corresponding local sidereal times (LST). In future papers in this series, we will develop the technique of using closure phase spectra in the search for the HI 21 cm signal from cosmic reionization.
  • We propose a sparse regression method based on the non-concave penalized density power divergence loss function, which is robust against infinitesimal contamination in very high dimensions. Existing methods of sparse and robust regression are based on $\ell_1$-penalization, and their theoretical properties are not well investigated. In contrast, we use a general class of folded concave penalties that ensure sparse recovery and consistent estimation of regression coefficients. We propose an alternating algorithm based on the Concave-Convex procedure to obtain our estimate, and demonstrate its robustness properties using influence function analysis. Under some conditions on the fixed design matrix and penalty function, we prove that this estimator possesses large-sample oracle properties in an ultrahigh-dimensional regime. The performance and effectiveness of the proposed method for parameter estimation and prediction, compared with state-of-the-art methods, are demonstrated through simulation studies.
  • We consider the problem of robust inference under the important generalized linear model (GLM) with stochastic covariates. We derive the properties of the minimum density power divergence estimator of the parameters in the GLM with random design and use this estimator to propose a robust Wald-type test for any general composite null hypothesis about the GLM. The asymptotic and robustness properties of the proposed test are also examined for the GLM with random design. Application of the proposed robust inference procedures to the popular Poisson regression model for count data is discussed in detail, both theoretically and numerically, with some interesting real data examples.
  • Data on rates, percentages or proportions arise frequently in many applied disciplines such as medical biology, health care and psychology. In this paper, we develop a robust inference procedure for the beta regression model, which relates such response variables taking values in $(0, 1)$ to explanatory variables. For the beta regression model, the issue of robustness has been largely ignored in the literature so far. The existing maximum likelihood-based inference seriously lacks robustness against outliers in the data and yields drastically different (erroneous) inference in the presence of data contamination. Here, we develop the robust minimum density power divergence estimator and a class of robust Wald-type tests for the beta regression model, along with several applications. We derive their asymptotic properties and describe their robustness theoretically through influence function analysis. Finite-sample performances of the proposed estimators and tests are examined through suitable simulation studies and real data applications in the context of health care and psychology. Although we primarily focus on beta regression models with a fixed dispersion parameter, we also indicate extensions to variable dispersion beta regression models, with an application.
  • Minimum divergence methods are popular tools in a variety of statistical applications. We consider tubular model adequacy tests and demonstrate that the new divergences generated in the process are very useful in robust statistical inference. In particular, we show that the family of $S$-divergences can be alternatively developed using the tubular model adequacy tests; a further application of the paradigm generates a larger superfamily of divergences. We describe the properties of this larger class and its potential applications in robust inference. Along the way, we also establish that first-order influence function analysis fails to capture the robustness of these procedures.
  • Direct detection of the Epoch of Reionization (EoR) via the redshifted 21-cm line will have unprecedented implications for the study of structure formation in the infant Universe. To fulfill this promise, current and future 21-cm experiments need to detect this weak EoR signal in the presence of foregrounds that are several orders of magnitude larger. This requires extreme noise control and improved wide-field, high dynamic-range imaging techniques. We propose a new imaging method based on a maximum likelihood framework that solves the interferometric equation directly on the sphere, or equivalently in the $uvw$-domain. The method uses the one-to-one relation between spherical waves and spherical harmonics (SpH). It consistently handles signals from the entire sky and does not require a $w$-term correction. The spherical-harmonics coefficients represent the sky-brightness distribution and the visibilities in the $uvw$-domain, and provide a direct estimate of the spatial power spectrum. Using these spectrally smooth SpH coefficients, bright foregrounds can be removed from the signal, including their side-lobe noise, which is one of the limiting factors in high dynamic-range wide-field imaging. Chromatic effects causing the so-called "wedge" are effectively eliminated (i.e. deconvolved) in the cylindrical ($k_{\perp}, k_{\parallel}$) power spectrum, compared with a power spectrum computed directly from images of the foreground visibilities, where the wedge is clearly present. We illustrate our method using simulated LOFAR observations and find an excellent reconstruction of the input EoR signal with minimal bias.
  • Although Bayesian inference is an immensely popular paradigm among a large segment of scientists, including statisticians, most applications use objective priors and need critical investigation (Efron, 2013, Science). While it has several optimal properties, a major drawback of Bayesian inference is its lack of robustness against data contamination and model misspecification, which becomes pernicious when objective priors are used. This paper presents the general formulation of a Bayes pseudo-posterior distribution yielding robust inference. Exponential convergence results related to the new pseudo-posterior and the corresponding Bayes estimators are established under the general parametric set-up, and illustrations are provided for independent stationary as well as non-homogeneous models. Several additional details and properties of the procedure are described, including estimation under fixed-design regression models.
  • This paper considers the problem of robust hypothesis testing with non-identically distributed data. We propose Wald-type tests for both simple and composite hypotheses for independent but non-homogeneous observations, based on the robust minimum density power divergence estimator of the common underlying parameter. Asymptotic and theoretical robustness properties of the proposed tests are discussed. Application to testing the general linear hypothesis in a generalized linear model with fixed design is considered in detail, with specific illustrations for its special cases under normal and Poisson distributions.
  • Experiments often yield non-identically distributed data for statistical analysis. Tests of hypotheses under such set-ups are generally performed using the likelihood ratio test, which is non-robust with respect to outliers and model misspecification. In this paper, we consider the set-up of independent but non-identically distributed observations and develop a general class of test statistics for testing parametric hypotheses based on the density power divergence. The proposed tests have bounded influence functions, are highly robust with respect to data contamination, have high power against contiguous alternatives, and are consistent at any fixed alternative. The methodology is illustrated with simple and generalized linear regression models with fixed covariates.
  • The "Tapered Gridded Estimator" (TGE) is a novel way to directly estimate the angular power spectrum from radio-interferometric visibility data. It reduces the computation by efficiently gridding the data, consistently removes the noise bias, and largely suppresses foreground contamination by tapering the primary beam response through an appropriate convolution in the visibility domain. Here we demonstrate the effectiveness of the TGE in recovering the diffuse emission power spectrum through numerical simulations. We present details of the simulation used to generate low-frequency visibility data for a sky model with extragalactic compact radio sources and diffuse Galactic synchrotron emission. We then use different imaging strategies to identify the most effective option for point source subtraction and to study the underlying diffuse emission. Finally, we apply the TGE to the residual data to measure the angular power spectrum, and assess the impact of incomplete point source subtraction on recovering the input power spectrum $C_{\ell}$ of the synchrotron emission. The estimator successfully recovers the $C_{\ell}$ of the input model from the residual visibility data. These results are relevant for measuring diffuse emission such as the Galactic synchrotron emission. They are also an important step towards characterizing and removing both diffuse and compact foreground emission in order to detect the redshifted $21\, {\rm cm}$ signal from the Epoch of Reionization.
  • Characterizing the diffuse Galactic synchrotron emission at arcminute angular scales is needed to reliably remove foregrounds in cosmological 21-cm measurements. The study of this emission is also interesting in its own right. Here, we quantify the fluctuations of the diffuse Galactic synchrotron emission using visibility data for two of the fields observed by the TIFR GMRT Sky Survey (TGSS). We use the 2D Tapered Gridded Estimator (TGE) to estimate the angular power spectrum $(C_{\ell})$ from the visibilities. We find that the sky signal, after subtracting the point sources, is likely dominated by the diffuse Galactic synchrotron radiation across the angular multipole range $240 \le \ell \lesssim 500$. We present a power-law fit, $C_{\ell}=A\times\big(\frac{1000}{\ell}\big)^{\beta}$, to the measured $C_{\ell}$ over this $\ell$ range. We find that $(A,\beta)$ have values $(356\pm109~{\rm mK^2},2.8\pm0.3)$ and $(54\pm26~{\rm mK^2},2.2\pm0.4)$ in the two fields. For the second field, however, there is an indication of a significant residual point source contribution, and for this field we interpret the measured $C_{\ell}$ as an upper limit for the diffuse Galactic synchrotron emission. While the slopes in both fields are consistent with earlier measurements, the second field appears to have a considerably smaller amplitude than similar measurements in other parts of the sky.
  • Parametric hypothesis testing with two independent samples arises frequently in applications in biology, medical sciences, epidemiology, reliability and many other fields. In this paper, we propose robust Wald-type tests for such two-sample problems using the minimum density power divergence estimators of the underlying parameters. In particular, we consider the simple two-sample hypothesis of full parametric homogeneity of the samples as well as general two-sample (composite) hypotheses involving nuisance parameters. The asymptotic and theoretical robustness properties of the proposed Wald-type tests are developed for both the simple and general composite hypotheses. Some particular cases of testing against one-sided alternatives are discussed, with specific attention to testing the effectiveness of a treatment in clinical trials. The performance of the proposed tests is also illustrated numerically through appropriate real data examples.
  • This paper considers the problem of inliers and empty cells, and the resulting relative inefficiency in estimation, for pure samples from a discrete population when the sample size is small. Many minimum divergence estimators in the $S$-divergence family, although possessing very strong outlier stability properties, often have very poor small-sample efficiency in the presence of inliers, and some are not even defined in the presence of a single empty cell; this limits the practical applicability of these estimators, in spite of their otherwise sound robustness properties and high asymptotic efficiency. Here, we study a penalized version of the $S$-divergences such that the resulting minimum divergence estimators are free from these issues without altering their robustness properties and asymptotic efficiencies. We give a general proof of the asymptotic properties of these minimum penalized $S$-divergence estimators. This is a significant addition to the literature, as the asymptotics of penalized divergences that are not finitely defined are currently unavailable. The small-sample advantages of the minimum penalized $S$-divergence estimators are examined through an extensive simulation study, and some empirical suggestions regarding the choice of the relevant tuning parameters are also provided.
  • The diffuse Galactic synchrotron emission (DGSE) is the most important diffuse foreground component for future cosmological 21-cm observations. The DGSE is also an important probe of the cosmic ray electron and magnetic field distributions in the turbulent interstellar medium (ISM) of our Galaxy. In this paper we briefly review the Tapered Gridded Estimator (TGE), which can be used to quantify the angular power spectrum of the sky signal directly from the visibilities measured in radio-interferometric observations. The salient features of the TGE are that (1) it works with gridded data, which makes it computationally very fast, (2) it avoids the positive noise bias that normally arises from the system noise inherent in the visibility data, and (3) it allows us to taper the sky response and thereby suppress the contribution from unsubtracted point sources in the outer parts and the sidelobes of the antenna beam pattern. We also summarize earlier work in which the TGE was used to measure the $C_{\ell}$ of the DGSE using 150 MHz GMRT data. Earlier measurements of the DGSE angular power spectrum are restricted to angular multipoles up to $\ell \sim 10^3$, because at larger $\ell$ the signal is dominated by residual point sources after source subtraction. The higher sensitivity of the upcoming SKA1 Low will allow point sources to be subtracted to a fainter level than is possible with existing telescopes. We predict that it will be possible to measure the angular power spectrum of the DGSE to larger values of $\ell$ with SKA1 Low. Our results show that it should be possible to achieve $\ell_{\rm max} \sim 10^4$ and $\sim 10^5$ with 2 minutes and 10 hours of observation, respectively.
  • In this paper, a robust version of the classical Wald test statistic for linear hypotheses in the logistic regression model is introduced and its properties are explored. We study the problem under the assumption of random covariates, although some ideas with non-random covariates are also considered. The family of tests considered is based on the minimum density power divergence estimator instead of the maximum likelihood estimator, and is referred to as the Wald-type test statistic in the paper. We obtain the asymptotic distribution and also study the robustness properties of the Wald-type test statistic. The robustness of the tests is investigated theoretically through influence function analysis as well as through suitable practical examples. It is theoretically established that the level as well as the power of the Wald-type tests are stable against contamination, while the classical Wald test breaks down in this scenario. Some classical examples are presented which numerically substantiate the theory. Finally, a simulation study is included to provide further confirmation of the theoretical results established in the paper.
  • We present the improved visibility-based Tapered Gridded Estimator (TGE) for the power spectrum of the diffuse sky signal. The visibilities are gridded to reduce the computation, and tapered through a convolution to suppress the contribution from the outer regions of the telescope's field of view. The TGE also internally estimates the noise bias and subtracts it to give an unbiased estimate of the power spectrum. An earlier version of the 2D TGE for the angular power spectrum $C_{\ell}$ is improved and then extended to obtain the 3D TGE for the power spectrum $P({\bf k})$ of the 21-cm brightness temperature fluctuations. Analytic formulas are also presented for predicting the variance of the binned power spectrum. The estimator and its variance predictions are validated using simulations of $150 \, {\rm MHz}$ GMRT observations. We find that the estimator accurately recovers the input model for the 1D spherical power spectrum $P(k)$ and the 2D cylindrical power spectrum $P(k_\perp,k_\parallel)$, and the predicted variance is also in reasonably good agreement with the simulations.
  • Mixed-effects models are very popular for analyzing data with a hierarchical structure, e.g. repeated observations within subjects in a longitudinal design, or patients nested within centers in a multicenter design. However, owing to recent medical advances, the number of fixed-effect covariates collected from each patient can be quite large (e.g. gene expression data for each patient), and not all of these variables are necessarily important for the outcome. It is therefore very important to select the relevant covariates correctly to obtain optimal inference for the overall study. On the other hand, the relevant random effects will often be low-dimensional and pre-specified. In this paper, we consider regularized selection of important fixed-effect variables in linear mixed-effects models, along with maximum penalized likelihood estimation of both fixed and random effect parameters based on general non-concave penalties. Asymptotic and variable selection consistency with oracle properties are proved for low-dimensional cases as well as for high-dimensional cases where the number of parameters is of non-polynomial order in, and much larger than, the sample size. We also provide a suitable computationally efficient algorithm for implementation. Additionally, all the theoretical results are proved for a general non-convex optimization problem that applies to several important situations well beyond the mixed-model set-up (such as finite mixtures of regressions), illustrating the wide applicability of our proposal.
  • Analysis of randomly censored lifetime data along with related stochastic covariates is of great importance in many applied sciences, such as medical research, population studies and planning. The parametric estimation technique commonly used under this set-up is based on the efficient but non-robust likelihood approach. In this paper, we propose a robust parametric estimator for censored data with stochastic covariates based on the minimum density power divergence approach. The resulting estimator also has competitive efficiency with respect to the maximum likelihood estimator under pure data. The strong robustness of the proposed estimator with respect to outliers is examined and illustrated through an appropriate simulation study in the context of censored regression with stochastic covariates. Further, the theoretical asymptotic properties of the proposed estimator are derived in terms of a general class of M-estimators based on the estimating equation.
  • Extreme value theory is very popular in applied sciences including finance, economics, hydrology and many other disciplines. In univariate extreme value theory, we model the data by a suitable distribution from the general max-domain of attraction (MAD) characterized by its tail index; there are three broad classes of tails: the Pareto type, the Weibull type and the Gumbel type. The simplest and most common estimator of the tail index is the Hill estimator, which works only for Pareto-type tails and has a high bias; it is also highly non-robust in the presence of outliers with respect to the assumed model. There have been some recent attempts to produce asymptotically unbiased or robust alternatives to the Hill estimator; however, each of the robust alternatives works for only one type of tail. This paper proposes a new general estimator of the tail index that is both robust and has smaller bias under all three tail types compared with the existing robust estimators. This essentially produces a robust generalization of the estimator proposed by Matthys and Beirlant (2003) under the same model approximation, through a suitable exponential regression framework using the density power divergence. The robustness properties of the estimator are derived in the paper along with an extensive simulation study. A method for bias correction is also proposed, with application to some real data examples.
  • It is important to correctly subtract point sources from radio-interferometric data in order to measure the power spectrum of diffuse radiation like the Galactic synchrotron emission or the Epoch of Reionization 21-cm signal. It is computationally very expensive and challenging to image a very large area and accurately subtract all the point sources from the image. The problem is particularly severe at the sidelobes and the outer parts of the main lobe, where the antenna response is highly frequency dependent and the calibration also differs from that at the phase center. Here we show that it is possible to overcome this problem by tapering the sky response. Using simulated 150 MHz observations, we demonstrate that the Tapered Gridded Estimator can suppress the contribution of point sources in the outer parts when measuring the angular power spectrum $C_{\ell}$ of the sky signal. We also show from the simulation that this method can self-consistently compute the noise bias and accurately subtract it to provide an unbiased estimate of $C_{\ell}$.
  • We consider a robust version of the classical Wald test statistic for testing simple and composite null hypotheses for general parametric models. These test statistics are based on the minimum density power divergence estimators instead of the maximum likelihood estimators. An extensive study of their robustness properties is given through the influence functions as well as the chi-square inflation factors. It is theoretically established that the level and power of these robust tests are stable against outliers, whereas the classical Wald test breaks down. Some numerical examples confirm the validity of the theoretical results.
  • The 21cm-galaxy cross-power spectrum is expected to be a promising probe of the Epoch of Reionization (EoR), as it could offer information about the progress of reionization and the typical scale of ionized regions at different redshifts. With upcoming observations of 21cm emission from the EoR with the Low Frequency Array (LOFAR), and of high-redshift Ly$\alpha$ emitters (LAEs) with Subaru's Hyper Suprime-Cam (HSC), we investigate the observability of such a cross-power spectrum with these two instruments, which both plan to observe the ELAIS-N1 field at z=6.6. In this paper we use N-body + radiative transfer simulations (for both continuum and Ly$\alpha$ photons) at redshifts 6.68, 7.06 and 7.3 to compute the 3D theoretical 21cm-galaxy cross-power spectrum, as well as to predict the 2D 21cm-galaxy cross-power spectrum expected to be observed by LOFAR and HSC. Once noise and projection effects are accounted for, our predictions of the 21cm-galaxy cross-power spectrum show a clear anti-correlation on scales larger than ~ 60 h$^{-1}$ Mpc (corresponding to k ~ 0.1 h Mpc$^{-1}$), with levels of significance p=0.04 at z=6.6 and p=0.048 at z=7.3. On smaller scales, instead, the signal is completely contaminated.
  • Observations of the EoR with the 21-cm hyperfine emission of neutral hydrogen (HI) promise to open an entirely new window onto the formation of the first stars, galaxies and accreting black holes. In order to characterize the weak 21-cm signal, we need to develop imaging techniques that can reconstruct the extended emission very precisely. Here, we present an inversion technique for LOFAR baselines at the North Celestial Pole (NCP), based on a Bayesian formalism with optimal spatial regularization, which is used to reconstruct the diffuse foreground map directly from the simulated visibility data. We find that the spatial regularization de-noises the images to a large extent, allowing one to recover the 21-cm power spectrum over a considerable $k_{\perp}$-$k_{\parallel}$ space in the range $0.03\,{\rm Mpc^{-1}}<k_{\perp}<0.19\,{\rm Mpc^{-1}}$ and $0.14\,{\rm Mpc^{-1}}<k_{\parallel}<0.35\,{\rm Mpc^{-1}}$ without subtracting the noise power spectrum. We find that, in combination with GMCA, a non-parametric foreground removal technique, we can mostly recover the spherically averaged power spectrum within $2\sigma$ statistical fluctuations for an input Gaussian random rms noise level of $60 \, {\rm mK}$ in the maps after 600 hours of integration over a $10 \, {\rm MHz}$ bandwidth.
  • We present a robust test for composite null hypotheses based on the general $S$-divergence family. This requires a non-trivial extension of the results of Ghosh et al.~(2015). We derive the asymptotic and theoretical robustness properties of the resulting test along with the properties of the minimum $S$-divergence estimators under parameter restrictions imposed by the null hypothesis. An illustration in the context of the normal model is also presented.
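Many of the abstracts above (on beta regression, GLMs, censored data and Wald-type tests) build on the minimum density power divergence estimator (MDPDE). As a minimal sketch of the idea, assuming an i.i.d. normal model $N(\mu, \sigma^2)$ rather than any of the regression or censored settings treated in the papers, the MDPDE with tuning parameter $\alpha \ge 0$ solves estimating equations in which each observation is weighted by a quantity proportional to $f_\theta(x)^\alpha$, so a simple fixed-point iteration suffices. The helper name `mdpde_normal` and its starting values (median and scaled MAD) are hypothetical, illustrative choices, not taken from the papers.

```python
import math
import statistics


def mdpde_normal(data, alpha, tol=1e-10, max_iter=500):
    """Sketch: minimum DPD estimate of (mu, sigma) under a N(mu, sigma^2) model.

    Hypothetical helper for illustration only. alpha = 0 gives the maximum
    likelihood estimates; larger alpha down-weights outlying points more.
    """
    n = len(data)
    # Robust starting values: median and normal-consistent MAD.
    mu = statistics.median(data)
    sigma = 1.4826 * statistics.median([abs(x - mu) for x in data])
    if sigma == 0.0:
        raise ValueError("MAD is zero; choose another starting scale")
    # Integral term of the sigma estimating equation: n * alpha / (1 + alpha)^(3/2).
    c = n * alpha / (1.0 + alpha) ** 1.5
    for _ in range(max_iter):
        # Weight exp(-alpha * z^2 / 2) is proportional to f_theta(x)^alpha.
        w = [math.exp(-alpha * ((x - mu) / sigma) ** 2 / 2.0) for x in data]
        sw = sum(w)
        if sw <= c:
            raise ValueError("iteration broke down; try a smaller alpha")
        new_mu = sum(wi * x for wi, x in zip(w, data)) / sw
        ss = sum(wi * (x - new_mu) ** 2 for wi, x in zip(w, data))
        new_sigma = math.sqrt(ss / (sw - c))
        done = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if done:
            break
    return mu, sigma


data = [0.1, -0.2, 0.0, 0.3, -0.1, 10.0]  # toy data with one gross outlier at 10.0

if __name__ == "__main__":
    print(mdpde_normal(data, alpha=0.0))  # sample mean and MLE standard deviation
    print(mdpde_normal(data, alpha=0.5))  # estimates stay near the bulk of the data
```

With $\alpha = 0$ the iteration returns the sample mean and the maximum likelihood standard deviation, both pulled strongly by the outlier; with $\alpha = 0.5$ the point at 10.0 receives essentially zero weight, so the estimates stay close to the bulk of the data (roughly $\mu \approx 0$, $\sigma \approx 0.2$). The tuning parameter $\alpha$ trades efficiency at the model for robustness.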
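The Wald-type tests described in several abstracts replace the maximum likelihood estimator in the classical Wald statistic by the MDPDE, together with the MDPDE's asymptotic variance. A minimal sketch for the simplest possible case, assuming a normal model with known $\sigma$ and the simple null $H_0: \mu = \mu_0$ (the papers treat general, composite and regression settings): here the MDPDE of $\mu$ is a weighted mean, its asymptotic variance is $\sigma^2(1+\alpha)^3/(1+2\alpha)^{3/2}$, and $W_n = n(\hat\mu_\alpha - \mu_0)^2$ divided by that variance is compared with a $\chi^2_1$ distribution. The function names are hypothetical.

```python
import math


def mdpde_normal_mean(data, alpha, sigma=1.0, tol=1e-12, max_iter=1000):
    """Hypothetical helper: MDPDE of mu for N(mu, sigma^2) with sigma known.

    Solves sum_i w_i (x_i - mu) = 0 with w_i = exp(-alpha (x_i - mu)^2 / (2 sigma^2))
    by iterating a weighted mean.
    """
    mu = sorted(data)[len(data) // 2]  # crude robust starting value
    for _ in range(max_iter):
        w = [math.exp(-alpha * (x - mu) ** 2 / (2.0 * sigma ** 2)) for x in data]
        new_mu = sum(wi * x for wi, x in zip(w, data)) / sum(w)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


def wald_type_test(data, mu0, alpha, sigma=1.0):
    """Hypothetical helper: Wald-type statistic and chi-square(1) p-value for H0: mu = mu0."""
    n = len(data)
    mu_hat = mdpde_normal_mean(data, alpha, sigma)
    # Asymptotic variance of sqrt(n) * (mu_hat - mu) under the normal model.
    avar = sigma ** 2 * (1.0 + alpha) ** 3 / (1.0 + 2.0 * alpha) ** 1.5
    w_stat = n * (mu_hat - mu0) ** 2 / avar
    # P(chi2_1 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(w_stat / 2.0))
    return w_stat, p_value


data = [0.1, -0.2, 0.0, 0.3, -0.1, 10.0]  # toy data with one gross outlier at 10.0

if __name__ == "__main__":
    print(wald_type_test(data, mu0=0.0, alpha=0.0))  # classical Wald (z) test
    print(wald_type_test(data, mu0=0.0, alpha=0.5))  # robust Wald-type test
```

With $\alpha = 0$ the statistic reduces to the classical Wald (here, z) test, and the single outlier drives it to reject $H_0: \mu = 0$ with a p-value below 0.001; with $\alpha = 0.5$ the outlier is essentially ignored and $H_0$ is not rejected, which is the kind of level stability the abstracts describe.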
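The HERA abstract relies on closure phases. The closure phase of a triad is the argument of the product $V_{12}V_{23}V_{31}$ of its three visibilities; antenna-based complex gains, which enter each visibility as $g_i g_j^*$, cancel in this product, which is why closure phases are independent of calibration. The sketch below, with made-up phases and gains chosen purely for illustration, checks this cancellation numerically. It also shows one simple way to connect the quoted numbers, under an assumption made here rather than taken from the paper: if the three visibility phases in a triad carry independent Gaussian errors of equal size, the closure-phase scatter is $\sqrt{3}$ times the per-visibility scatter, and $4.5°/\sqrt{3} \approx 2.6°$.

```python
import cmath
import math


def closure_phase(v12, v23, v31):
    """Closure phase (radians, in [-pi, pi]) of a triad of complex visibilities."""
    return cmath.phase(v12 * v23 * v31)


def corrupt(v_true, g_i, g_j):
    """Apply antenna-based gains: V_ij(measured) = g_i * conj(g_j) * V_ij(true)."""
    return g_i * g_j.conjugate() * v_true


# Hypothetical true visibilities (unit amplitude, arbitrary phases in radians).
v12, v23, v31 = (cmath.rect(1.0, p) for p in (0.3, -0.5, 0.9))
# Hypothetical antenna gains with different amplitudes and phases.
g1, g2, g3 = cmath.rect(1.2, 1.0), cmath.rect(0.8, -2.0), cmath.rect(1.1, 0.5)

true_cp = closure_phase(v12, v23, v31)
measured_cp = closure_phase(
    corrupt(v12, g1, g2), corrupt(v23, g2, g3), corrupt(v31, g3, g1)
)

# Per-visibility phase scatter implied by a closure-phase scatter, assuming
# three independent, equal-size Gaussian visibility phase errors per triad.
closure_mad_deg = 4.5
per_visibility_deg = closure_mad_deg / math.sqrt(3.0)

if __name__ == "__main__":
    print(true_cp, measured_cp, per_visibility_deg)
```

Because only antenna-based errors cancel, any residual difference between closure phases of nominally redundant triads reflects baseline-dependent effects, which is what makes the closure phase a probe of non-redundancy.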
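The HERA abstract also uses the Allan variance to characterize how closure spectra vary in time. A minimal sketch of the standard non-overlapping Allan variance, assuming a real-valued, evenly sampled series (for instance, a closure phase in one frequency channel, with phase wrapping ignored); `m` is the number of samples averaged per block, i.e. the stride, and the helper name is hypothetical.

```python
def allan_variance(series, m):
    """Hypothetical helper: non-overlapping Allan variance of an evenly sampled series.

    `m` is the number of consecutive samples averaged into each block.
    """
    n_blocks = len(series) // m
    if n_blocks < 2:
        raise ValueError("need at least two complete blocks of length m")
    # Average each consecutive, non-overlapping block of m samples.
    means = [sum(series[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
    # Half the mean squared difference between adjacent block averages.
    diffs = [(b - a) ** 2 for a, b in zip(means, means[1:])]
    return 0.5 * sum(diffs) / len(diffs)


alternating = [1.0, -1.0] * 8            # pure sample-to-sample fluctuation
drift = [float(i) for i in range(16)]    # steady linear drift

if __name__ == "__main__":
    for m in (1, 2, 4):
        print(m, allan_variance(alternating, m), allan_variance(drift, m))
```

The two toy series illustrate the qualitative behavior: rapid fluctuations average down as the stride grows, whereas a steady drift (loosely analogous to the sky moving through the beam of a transit telescope) makes the Allan variance grow with the stride, here as $m^2/2$.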
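For reference, the Hill estimator mentioned in the extreme-value abstract averages the log-excesses of the $k$ largest observations over the $(k+1)$-th largest: $\hat\gamma_k = \frac{1}{k}\sum_{i=1}^{k}\log X_{(n-i+1)} - \log X_{(n-k)}$. The sketch below is only this classical estimator, not the robust estimator proposed in that paper. It applies it to deterministic Pareto-type quantiles with $\gamma = 0.5$ and then to the same data with one gross outlier appended, illustrating the non-robustness the abstract refers to; the helper name and the choice $k = 100$ are arbitrary.

```python
import math


def hill_estimator(sample, k):
    """Hypothetical helper: classical Hill estimate of a positive tail index.

    Uses the k largest observations and the (k+1)-th largest as threshold.
    """
    x = sorted(sample, reverse=True)
    if not 0 < k < len(x):
        raise ValueError("need 0 < k < n")
    threshold = math.log(x[k])  # log of the (k+1)-th largest value, X_(n-k)
    return sum(math.log(v) for v in x[:k]) / k - threshold


# Deterministic Pareto-type sample with tail index 0.5: values (n / i)^0.5.
clean = [(1000 / i) ** 0.5 for i in range(1, 1001)]
contaminated = clean + [1e12]  # the same data plus one gross outlier

if __name__ == "__main__":
    print(hill_estimator(clean, 100))
    print(hill_estimator(contaminated, 100))
```

On the clean data the estimate is about 0.49; the single outlier pushes it to about 0.75.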
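The TGSS abstract fits $C_{\ell}=A\,(1000/\ell)^{\beta}$ to the measured angular power spectrum. Taking logarithms turns this model into a straight line, $\ln C_{\ell} = \ln A + \beta \ln(1000/\ell)$, so a minimal sketch is an ordinary least-squares fit in log space. This is an unweighted fit that ignores measurement errors and is not necessarily the fitting procedure used in the paper; the synthetic, noise-free input reuses the first field's quoted best-fit values only to check that the fit recovers them, and the helper name is hypothetical.

```python
import math


def fit_power_law(ells, cls, pivot=1000.0):
    """Hypothetical helper: unweighted least-squares fit of C_ell = A * (pivot / ell)**beta.

    Returns (A, beta), fitted as a straight line in (ln(pivot / ell), ln C_ell).
    """
    xs = [math.log(pivot / ell) for ell in ells]
    ys = [math.log(c) for c in cls]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx                      # slope = power-law index
    amp = math.exp(y_bar - beta * x_bar)  # intercept = ln A
    return amp, beta


# Synthetic, noise-free C_ell (in mK^2) over 240 <= ell <= 500, for checking only.
ells = [240, 300, 360, 420, 480]
cls = [356.0 * (1000.0 / ell) ** 2.8 for ell in ells]

if __name__ == "__main__":
    print(fit_power_law(ells, cls))
```

With real data, one would normally weight each point by its uncertainty (in log space, roughly the fractional error of $C_{\ell}$), which this sketch deliberately omits.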