
In this paper, we propose a new regularization technique called "functional
SCAD". We then combine this technique with the smoothing spline method to
develop a smooth and locally sparse (i.e., zero on some subregions) estimator
for the coefficient function in functional linear regression. The functional
SCAD has a nice shrinkage property that enables our estimating procedure to
identify the null subregions of the coefficient function without over-shrinking
the nonzero values of the coefficient function. Additionally, the smoothness
of our estimated coefficient function is regularized by a roughness penalty
rather than by controlling the number of knots. Our method is more
theoretically sound and is computationally simpler than the other available
methods. An asymptotic analysis shows that our estimator is consistent and can
identify the null region with probability tending to one. Furthermore,
simulation studies show that our estimator has superior numerical performance.
Finally, the practical merit of our method is demonstrated on two real
applications.
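
The abstract does not define functional SCAD itself. As background only, the
sketch below shows the ordinary scalar SCAD penalty of Fan and Li (2001), whose
shape explains the shrinkage property described above: small values are
penalized like the lasso, while large values incur a constant penalty and are
therefore not over-shrunk. The function names and the default a = 3.7 are
assumptions made for this illustration, not the paper's functional version.

```python
def scad_penalty(t, lam, a=3.7):
    """Scalar SCAD penalty (Fan and Li, 2001) evaluated at t.

    Illustrative only: this is the ordinary SCAD, not the functional
    SCAD proposed in the paper. lam is the tuning parameter; a = 3.7
    is the commonly used default.
    """
    if lam <= 0 or a <= 2:
        raise ValueError("require lam > 0 and a > 2")
    x = abs(t)
    if x <= lam:
        # Lasso-like linear region near zero.
        return lam * x
    if x <= a * lam:
        # Quadratic transition region.
        return (2 * a * lam * x - x * x - lam * lam) / (2 * (a - 1))
    # Constant region: large values receive no additional penalty.
    return lam * lam * (a + 1) / 2


def scad_derivative(t, lam, a=3.7):
    """Derivative of the SCAD penalty with respect to |t| (right derivative at 0)."""
    x = abs(t)
    if x <= lam:
        return lam
    return max(a * lam - x, 0.0) / (a - 1)
```

Because the derivative is zero once |t| exceeds a * lam, large coefficients are
left essentially unpenalized, which is the behavior the abstract refers to.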

This paper is motivated by the problem of integrating multiple sources of
measurements. We consider two multiple-input-multiple-output (MIMO) channels, a
primary channel and a secondary channel, with dependent input signals. The
primary channel carries the signal of interest, and the secondary channel
carries a signal that shares a joint distribution with the primary signal. The
problem of particular interest is designing the secondary channel matrix, when
the primary channel matrix is fixed. We formulate the problem as an
optimization problem, in which the optimal secondary channel matrix maximizes
an information-based criterion. An analytical solution is provided in a special
case. Two fast-to-compute algorithms, one extrinsic and the other intrinsic,
are proposed to approximate the optimal solutions in general cases. In
particular, the intrinsic algorithm exploits the geometry of the unit sphere, a
manifold embedded in Euclidean space. The performance of the proposed
algorithms is examined through a simulation study. A discussion of the choice
of dimension for the secondary channel is given.
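
The abstract does not spell out the two algorithms. As a generic illustration
of the extrinsic/intrinsic distinction on the unit sphere, the sketch below
contrasts a Euclidean ascent step followed by renormalization (extrinsic) with
a step along a great circle in the direction of the tangent-space gradient
(intrinsic). The toy objective, the step size, and all function names are
assumptions for this example; the paper optimizes an information-based
criterion over a channel matrix, not this quadratic form over a single vector.

```python
import math


def normalize(v):
    """Scale v to unit Euclidean length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def tangent_project(x, g):
    """Project g onto the tangent space of the unit sphere at x."""
    dot = sum(xi * gi for xi, gi in zip(x, g))
    return [gi - dot * xi for xi, gi in zip(x, g)]


def extrinsic_step(x, g, step):
    """Take a Euclidean ascent step, then map back to the sphere."""
    return normalize([xi + step * gi for xi, gi in zip(x, g)])


def intrinsic_step(x, g, step):
    """Move along the geodesic (great circle) given by the tangent gradient."""
    v = tangent_project(x, g)
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    if norm_v == 0.0:
        return list(x)  # already at a stationary point
    theta = step * norm_v
    return [math.cos(theta) * xi + math.sin(theta) * vi / norm_v
            for xi, vi in zip(x, v)]


def toy_gradient(x, weights=(3.0, 2.0, 1.0)):
    """Gradient of the hypothetical objective f(x) = sum_i w_i * x_i**2."""
    return [2.0 * w * xi for w, xi in zip(weights, x)]


def maximize_on_sphere(step_fn, x0, n_iter=200, step=0.1):
    """Run n_iter ascent steps of step_fn starting from the normalized x0."""
    x = normalize(x0)
    for _ in range(n_iter):
        x = step_fn(x, toy_gradient(x), step)
    return x
```

For this toy objective both updates keep the iterate on the sphere and move it
toward the coordinate axis with the largest weight; they differ only in how the
constraint is enforced at each step.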

We consider the problem of selecting covariates in spatial linear models with
Gaussian process errors. Penalized maximum likelihood estimation (PMLE) that
enables simultaneous variable selection and parameter estimation is developed
and, for ease of computation, PMLE is approximated by one-step sparse
estimation (OSE). To further improve computational efficiency, particularly
with large sample sizes, we propose penalized maximum covariance-tapered
likelihood estimation (PMLE$_{\mathrm{T}}$) and its one-step sparse estimation
(OSE$_{\mathrm{T}}$). General forms of penalty functions with an emphasis on
smoothly clipped absolute deviation are used for penalized maximum likelihood.
Theoretical properties of PMLE and OSE, as well as their approximations
PMLE$_{\mathrm{T}}$ and OSE$_{\mathrm{T}}$ using covariance tapering, are
derived, including consistency, sparsity, asymptotic normality and the oracle
properties. For covariance tapering, a byproduct of our theoretical results is
consistency and asymptotic normality of maximum covariance-tapered likelihood
estimates. Finite-sample properties of the proposed methods are demonstrated in
a simulation study and, for illustration, the methods are applied to analyze
two real data sets.
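
To make the idea of covariance tapering concrete, the sketch below multiplies a
covariance function elementwise by a compactly supported taper, so that entries
for pairs of locations farther apart than the taper range become exactly zero
and the resulting matrix is sparse. The exponential covariance, the Wendland-1
taper, and all function and parameter names are assumptions chosen for
illustration; the abstract does not state which covariance model or taper the
paper uses.

```python
import math


def exponential_cov(h, sigma2=1.0, range_=1.0):
    """Exponential covariance at distance h (an assumed model)."""
    return sigma2 * math.exp(-h / range_)


def wendland1_taper(h, gamma):
    """Wendland-1 taper: (1 - h/gamma)_+^4 * (1 + 4h/gamma).

    Equals 1 at h = 0 and is exactly 0 for h >= gamma.
    """
    r = h / gamma
    if r >= 1.0:
        return 0.0
    return (1.0 - r) ** 4 * (1.0 + 4.0 * r)


def tapered_cov_matrix(coords, gamma, sigma2=1.0, range_=1.0):
    """Elementwise product of the covariance and taper matrices."""
    n = len(coords)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            h = math.dist(coords[i], coords[j])
            out[i][j] = (exponential_cov(h, sigma2, range_)
                         * wendland1_taper(h, gamma))
    return out
```

With a taper range gamma that is small relative to the study region, most
entries are zero, which is what makes likelihood evaluation cheaper for large
sample sizes when sparse matrix routines are used.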

Object Oriented Data Analysis is a new area in statistics that studies
populations of general data objects. In this article we consider populations of
tree-structured objects as our focus of interest. We develop improved analysis
tools for data lying in a binary tree space analogous to classical Principal
Component Analysis methods in Euclidean space. Our extensions of PCA are
analogs of one dimensional subspaces that best fit the data. Previous work was
based on the notion of treelines.
In this paper, a generalization of the previous treeline notion is proposed:
k-treelines. Previously proposed treelines are k-treelines with k=1. New
subcases of k-treelines studied in this work are the 2-treelines and
treecurves, which explain much more variation per principal component than
treelines. The optimal principal component treelines were computable in
linear time. Because 2-treelines and treecurves are more complex, they are
computationally more expensive, but yield improved data analysis results.
We provide a comparative study of all these methods on a motivating data set
consisting of brain vessel structures of 98 subjects.

This study introduces a new method of visualizing complex tree-structured
objects. The usefulness of this method is illustrated in the context of
detecting unexpected features in a data set of very large trees. The major
contribution is a novel two-dimensional graphical representation of each tree,
with a covariate coded by color. The motivating data set contains
three-dimensional representations of brain artery systems of 105 subjects. Due to
inaccuracies inherent in the medical imaging techniques, issues with the
reconstruction algorithms and inconsistencies introduced by manual
adjustment, various discrepancies are present in the data. The proposed
representation enables quick visual detection of the most common discrepancies.
For our driving example, this tool led to the modification of 10% of the artery
trees and deletion of 6.7%. The benefits of our cleaning method are
demonstrated through a statistical hypothesis test on the effects of aging on
vessel structure. The data cleaning resulted in improved significance levels.

The active field of Functional Data Analysis (the study of the
variation in a set of curves) has recently been extended to Object Oriented
Data Analysis, which considers populations of more general objects. A
particularly challenging extension of this set of ideas is to populations of
tree-structured objects. We develop an analog of Principal Component Analysis
for trees, based on the notion of treelines, and propose numerically fast
(linear time) algorithms to solve the resulting optimization problems. The
solutions we obtain are used in the analysis of a data set of 73 individuals,
where each data object is a tree of blood vessels in one person's brain.

Object oriented data analysis is the statistical analysis of populations of
complex objects. In the special case of functional data analysis, these data
objects are curves, where standard Euclidean approaches, such as principal
component analysis, have been very successful. Recent developments in medical
image analysis motivate the statistical analysis of populations of more complex
data objects which are elements of mildly non-Euclidean spaces, such as Lie
groups and symmetric spaces, or of strongly non-Euclidean spaces, such as
spaces of tree-structured data objects. These new contexts for object oriented
data analysis create several potentially large new interfaces between
mathematics and statistics. This point is illustrated through the careful
development of a novel mathematical framework for statistical analysis of
populations of tree-structured objects.