
The widespread use of generalized linear models in case-control genetic
studies has helped identify many disease-associated risk factors, typically
defined as DNA variants, or single nucleotide polymorphisms (SNPs). Up to now,
most literature has focused on selecting a unique best subset of SNPs based on
some statistical perspectives. In the presence of pronounced noise, however,
multiple biological paths are often found to be equally supported by a given
dataset when dealing with complex genetic diseases. We address the ambiguity
related to SNP selection by constructing a list of models called the variable
selection confidence set (VSCS), which contains the collection of all
well-supported SNP combinations at a user-specified confidence level. The VSCS
extends the familiar notion of confidence intervals to the variable selection
setting and gives the practitioner tools that support variable selection
beyond trusting a single model. Based on the VSCS, we
consider natural graphical and numerical statistics measuring the inclusion
importance of a SNP based on its frequency in the most parsimonious VSCS
models. This work is motivated by available case-control genetic data on
age-related macular degeneration, a widespread complex disease and leading
cause of vision loss.

In this article, we introduce the concept of model confidence bounds (MCB)
for variable selection in the context of nested models. Similarly to the
endpoints in the familiar confidence interval for parameter estimation, the MCB
identifies two nested models (upper and lower confidence bound models)
containing the true model at a given level of confidence. Instead of trusting a
single selected model obtained from a given model selection method, the MCB
proposes a group of nested models as candidates and the MCB's width and
composition enable the practitioner to assess the overall model selection
uncertainty. A new graphical tool, the model uncertainty curve (MUC), is
introduced to visualize the variability of model selection and to compare
different model selection procedures. The MCB methodology is implemented by a
fast bootstrap algorithm that is shown to yield the correct asymptotic coverage
under rather general conditions. Our Monte Carlo simulations and real data
examples confirm the validity and illustrate the advantages of the proposed
method.
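
The idea of bracketing bootstrap-selected models between two nested bounds can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it assumes predictors are ranked by bootstrap inclusion frequency, that the lower and upper bounds are the top-k and top-(k + width) predictors in that ranking, and that the narrowest pair reaching the target coverage is reported. The function name `model_confidence_bounds` and the bootstrap selections in the example are hypothetical.

```python
from collections import Counter


def model_confidence_bounds(boot_models, variables, level=0.9):
    """Narrowest nested pair (lower, upper) whose bootstrap coverage >= level.

    boot_models: list of sets, one selected model per bootstrap resample.
    variables:   all candidate predictor names.
    """
    # Rank predictors by how often the bootstrap selections include them.
    counts = Counter(v for model in boot_models for v in model)
    ranked = sorted(variables, key=lambda v: (-counts[v], v))
    p, n_boot = len(ranked), len(boot_models)

    best = None
    for width in range(p + 1):          # width = |upper| - |lower|
        best = None
        for k in range(p - width + 1):  # lower = top-k, upper = top-(k + width)
            lower = frozenset(ranked[:k])
            upper = frozenset(ranked[:k + width])
            # Fraction of bootstrap models sandwiched between the two bounds.
            cover = sum(lower <= m <= upper for m in boot_models) / n_boot
            if best is None or cover > best[2]:
                best = (lower, upper, cover)
        if best[2] >= level:
            break
    return best


# Hypothetical bootstrap selections over four candidate predictors.
boot = [{"x1", "x2"}] * 6 + [{"x1"}] * 3 + [{"x1", "x2", "x3"}]
lower, upper, cover = model_confidence_bounds(boot, ["x1", "x2", "x3", "x4"])
print(sorted(lower), sorted(upper), cover)
```

Recording the best coverage found at each width, rather than stopping at the first width that reaches the target level, would trace a coverage-versus-width curve in the spirit of the MUC.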

Multicolor cell spatiotemporal image data have become important for
investigating organ development and regeneration, malignant growth or immune
responses by tracking different cell types both in vivo and in vitro.
Statistical modeling of image data from common longitudinal cell experiments
poses significant challenges due to the presence of complex spatiotemporal
interactions between different cell types and difficulties related to
measurement of single cell trajectories. Current analysis methods focus mainly
on univariate cases, often not considering the spatiotemporal effects
affecting cell growth between different cell populations. In this paper, we
propose a conditional spatial autoregressive model to describe multivariate
count cell data on the lattice, and develop inference tools. The proposed
methodology is computationally tractable and enables researchers to estimate a
complete statistical model of multicolor cell growth. Our methodology is
applied to real experimental data where we investigate how interactions between
cells affect their growth. We include two case studies; the first evaluates
interactions between cancer cells and fibroblasts, which are normally present
in the tumor microenvironment, whilst the second evaluates interactions between
cloned cancer cells when grown in different combinations.

Recently, IBM has made available a 16-qubit quantum computer,
denoted as IBM Q16. Previously, only a 5-qubit device, denoted as Q5, was
available. Both IBM devices can be used to run quantum programs by means of a
cloud-based platform. In this paper, we illustrate our experience with IBM Q16
in demonstrating entanglement-assisted invariance, also known as envariance,
and parity learning by querying a uniform quantum example oracle. In
particular, we illustrate the nontrivial strategy we have designed for
compiling $n$-qubit quantum circuits ($n$ being an input parameter) on any IBM
device, taking into account topological constraints.

The traditional activity of model selection aims at discovering a single
model superior to other candidate models. In the presence of pronounced noise,
however, multiple models are often found to explain the same data equally well.
To resolve this model selection ambiguity, we introduce the general approach of
model selection confidence sets (MSCSs) based on likelihood ratio testing. An
MSCS is defined as a list of models statistically indistinguishable from the
true model at a user-specified level of confidence, which extends the familiar
notion of confidence intervals to the model-selection framework. Our approach
guarantees asymptotically correct coverage probability of the true model when
both sample size and model dimension increase. We derive conditions under which
the MSCS contains all the relevant information about the true model structure.
In addition, we propose natural statistics based on the MSCS to measure
importance of variables in a principled way that accounts for the overall model
uncertainty. When the space of feasible models is large, the MSCS is implemented by
an adaptive stochastic search algorithm which samples MSCS models with high
probability. The MSCS methodology is illustrated through numerical experiments
on synthetic data and real data examples.
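
To make the idea of screening models by likelihood ratio tests concrete, here is a minimal Python sketch in a deliberately simple toy setting. It is an assumption made for illustration and not the authors' procedure: group j contributes n_j observations from N(mu_j, 1), and a candidate model S states that mu_j = 0 for every j outside S. A model is kept when its likelihood ratio statistic against the full model does not exceed the 5% chi-square critical value. The function names and the summary statistics in the example are hypothetical, and the inclusion frequency shown is just one simple importance measure that can be read off the resulting set.

```python
from itertools import combinations

# Upper 5% points of the chi-square distribution for 1, 2 and 3 degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815}


def lrt_confidence_set(means, sizes):
    """Models (sets of active indices) not rejected against the full model.

    means[j], sizes[j]: sample mean and sample size of group j (unit variance).
    """
    k = len(means)
    if k > max(CHI2_CRIT_05):
        raise ValueError("critical values are tabulated for up to 3 groups only")
    accepted = []
    for r in range(k + 1):
        for active in combinations(range(k), r):
            dropped = [j for j in range(k) if j not in active]
            # Likelihood ratio statistic 2 * (loglik_full - loglik_S);
            # in this toy model it equals sum of n_j * mean_j**2 over dropped j.
            stat = sum(sizes[j] * means[j] ** 2 for j in dropped)
            if not dropped or stat <= CHI2_CRIT_05[len(dropped)]:
                accepted.append(frozenset(active))
    return accepted


def inclusion_frequency(models, k):
    """Share of accepted models that contain each variable index."""
    return [sum(j in m for m in models) / len(models) for j in range(k)]


# Hypothetical summary statistics: only the first group has a clear signal.
accepted = lrt_confidence_set(means=[2.0, 0.1, 0.0], sizes=[25, 25, 25])
freq = inclusion_frequency(accepted, 3)
print([sorted(m) for m in accepted], freq)
```

In this example every accepted model contains the first variable, while the other two appear in only half of the accepted models, which is the kind of information a single selected model would hide.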

Growth in both size and complexity of modern data challenges the
applicability of traditional likelihood-based inference. Composite likelihood
(CL) methods address the difficulties related to model selection and
computational intractability of the full likelihood by combining a number of
low-dimensional likelihood objects into a single objective function used for
inference. This paper introduces a procedure to combine partial likelihood
objects from a large set of feasible candidates and simultaneously carry out
parameter estimation. The new method constructs estimating equations balancing
statistical efficiency and computing cost by minimizing an approximate distance
from the full likelihood score subject to an L1-norm penalty representing the
available computing resources. This results in truncated CL equations
containing only the most informative partial likelihood score terms. An
asymptotic theory within a framework where both sample size and data dimension
grow is developed, and finite-sample properties are illustrated through
numerical examples.

Testing the association between a phenotype and many genetic variants from
case-control data is essential in genome-wide association studies (GWAS). This is
a challenging task as many such variants are correlated or noninformative.
Similar challenges arise in testing the population difference between two groups of
high-dimensional data with an intractable full likelihood function. Testing may be
tackled by a maximum composite likelihood (MCL) approach that does not entail the full
likelihood, but current MCL tests lose power because they involve
noninformative or redundant sublikelihoods. In this paper, we develop a
forward search and test method that simultaneously performs powerful group difference
testing and composes informative sublikelihoods. Our method constructs a
sequence of Wald-type test statistics by progressively including only informative
sublikelihoods, so as to improve the test power under local
sparsity alternatives. Numerical studies show that it achieves considerable
improvement over the available tests as the modeling complexity grows. Our
method is further validated by testing the motivating GWAS data on breast
cancer with interesting results obtained.

Composite likelihood estimation has an important role in the analysis of
multivariate data for which the full likelihood function is intractable. An
important issue in composite likelihood inference is the choice of the weights
associated with lower-dimensional data subsets, since the presence of
incompatible submodels can deteriorate the accuracy of the resulting
estimator. In this paper, we introduce a new approach, called discriminative
composite likelihood estimation (DMcLE), for simultaneous parameter estimation
by tilting, or re-weighting, each sublikelihood component. The
data-adaptive weights maximize the composite likelihood function, subject to
moving a given distance from uniform weights; then, the resulting weights can
be used to rank lower-dimensional likelihoods in terms of their influence in
the composite likelihood function. Our analytical findings and numerical
examples support the stability of the resulting estimator compared to
estimators constructed using standard composition strategies based on uniform
weights. The properties of the new method are illustrated through simulated
data and real spatial data on multivariate precipitation extremes.
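
The role of the weights can be seen in a small Python sketch of a generic weighted composite likelihood. It rests on illustrative assumptions that are not taken from the paper: each sub-likelihood is Gaussian with a known variance, all components share a common mean, and the weights are set by hand rather than by the data-adaptive DMcLE rule. The function name `weighted_cl_mean` and the data are hypothetical.

```python
def weighted_cl_mean(columns, variances, weights):
    """Maximize sum_k w_k * sum_i log N(x_ik; theta, s_k^2) over theta.

    columns[k]   : observations used by the k-th sub-likelihood
    variances[k] : known variance s_k^2 of that component
    weights[k]   : non-negative weight w_k on that sub-likelihood
    """
    # Setting the derivative of the weighted sum to zero gives a closed form:
    # theta = [sum_k (w_k / s_k^2) * sum_i x_ik] / [sum_k (w_k / s_k^2) * n_k].
    num = sum(w / s2 * sum(x) for x, s2, w in zip(columns, variances, weights))
    den = sum(w / s2 * len(x) for x, s2, w in zip(columns, variances, weights))
    return num / den


# Hypothetical data: the third component is incompatible with the other two.
cols = [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0], [10.0, 11.0, 12.0]]
uniform = weighted_cl_mean(cols, [1.0, 1.0, 1.0], [1.0, 1.0, 1.0])
tilted = weighted_cl_mean(cols, [1.0, 1.0, 1.0], [1.0, 1.0, 0.0])
print(uniform, tilted)
```

With uniform weights the incompatible component pulls the estimate away from the value supported by the other two, whereas setting its weight to zero removes that influence.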

The traditional maximum likelihood estimator (MLE) is often of limited use in
complex high-dimensional data due to the intractability of the underlying
likelihood function. Maximum composite likelihood estimation (McLE) avoids full
likelihood specification by combining a number of partial likelihood objects
depending on small data subsets, thus enabling inference for complex data. A
fundamental difficulty in making the McLE approach practicable is the selection
from numerous candidate likelihood objects for constructing the composite
likelihood function. In this paper, we propose a flexible Gibbs sampling scheme
for optimal selection of sublikelihood components. The sampled composite
likelihood functions are shown to converge to the one maximally informative on
the unknown parameters in equilibrium, since sublikelihood objects are chosen
with probability depending on the variance of the corresponding McLE. A
penalized version of our method generates sparse likelihoods with a relatively
small number of components when the data complexity is high. Our algorithms
are illustrated through numerical examples on simulated data as well as real
genotype SNP data from a case-control study.

In this paper, the maximum L$q$-likelihood estimator (ML$q$E), a new
parameter estimator based on nonextensive entropy [Kibernetika 3 (1967) 30-35],
is introduced. The properties of the ML$q$E are studied via asymptotic analysis
and computer simulations. The behavior of the ML$q$E is characterized by the
degree of distortion $q$ applied to the assumed model. When $q$ is properly
chosen for small and moderate sample sizes, the ML$q$E can successfully trade
bias for precision, resulting in a substantial reduction of the mean squared
error. When the sample size is large and $q$ tends to 1, a necessary and
sufficient condition to ensure asymptotic normality and efficiency of
the ML$q$E is established.
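
As a rough illustration of how such an estimator can be computed, the Python sketch below replaces the logarithm in the log-likelihood with the function L_q(u) = (u^(1-q) - 1) / (1 - q), which reduces to log(u) when q = 1. The exponential model, the fixed-point iteration and the function names are assumptions chosen for illustration, not details taken from the paper. Under this model, setting the derivative of the L$q$-likelihood to zero gives a weighted mean with weights f(x_i; lam)^(1-q).

```python
import math


def lq(u, q):
    """Lq function: log(u) when q == 1, else (u**(1 - q) - 1) / (1 - q)."""
    if q == 1:
        return math.log(u)
    return (u ** (1 - q) - 1) / (1 - q)


def mlqe_exponential_mean(xs, q, n_iter=200):
    """MLqE of the mean of an exponential model by fixed-point iteration.

    Maximizing sum_i lq(f(x_i; lam), q) with f(x; lam) = exp(-x / lam) / lam
    leads to a weighted mean with weights f(x_i; lam) ** (1 - q).
    """
    lam = sum(xs) / len(xs)  # start from the ordinary MLE (the q = 1 case)
    for _ in range(n_iter):
        w = [(math.exp(-x / lam) / lam) ** (1 - q) for x in xs]
        lam = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
    return lam


# Hypothetical positive sample.
xs = [0.2, 0.5, 1.0, 1.5, 4.0]
mle = mlqe_exponential_mean(xs, q=1.0)
mlqe = mlqe_exponential_mean(xs, q=0.9)
print(mle, mlqe)
```

With q = 1 all weights equal one and the estimate is the sample mean. With q < 1 the observations that have low density under the current fit receive smaller weights, which is one way to see the distortion that $q$ applies to the assumed model.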