
Principal Component Analysis can be performed over small domains of an
embedded Riemannian manifold in order to relate the covariance analysis of the
underlying point set with the local extrinsic and intrinsic curvature. We show
that the volume of domains on a submanifold of general codimension, determined
by the intersection with higher-dimensional cylinders and balls in the ambient
space, has an asymptotic expansion in terms of the mean and scalar curvatures.
Moreover, we propose a generalization of the classical third fundamental form
to general submanifolds and prove that the eigenvalue decomposition of the
covariance matrices of the domains has asymptotic expansions in the scale that
contain the curvature information encoded by the traces of this tensor. In the
case of hypersurfaces, this covariance analysis recovers the principal
curvatures and principal directions, which can be used as descriptors at scale
to build up estimators of the second fundamental form, and thus the Riemann
tensor, of general submanifolds.
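
As a minimal numerical sketch of this covariance analysis in the simplest case, a surface in $\mathbb R^3$, one can sample the osculating paraboloid $z = (k_1x^2+k_2y^2)/2$ inside a ball $B_r(0)$ and diagonalize its area-weighted covariance matrix. The helper `patch_pca` and its grid quadrature are illustrative assumptions, not the construction used in the paper, and the asymptotic coefficients are not reproduced. The sketch only shows the qualitative behaviour: the eigenvector of the smallest eigenvalue approximates the normal, while the other two align with the principal directions (the coordinate axes, by construction).

```python
import numpy as np

def patch_pca(k1, k2, r, n=401):
    """Area, barycenter and covariance EVD of the patch z = (k1 x^2 + k2 y^2)/2
    cut out by the ball B_r(0). Hypothetical helper: the paraboloid stands in
    for a surface with principal curvatures k1, k2 at the origin."""
    t = np.linspace(-r, r, n)                      # symmetric sampling grid
    x, y = np.meshgrid(t, t, indexing="ij")
    z = 0.5 * (k1 * x**2 + k2 * y**2)
    inside = x**2 + y**2 + z**2 <= r**2            # intersection with the ball
    # surface area element sqrt(1 + z_x^2 + z_y^2) times the grid cell area
    dA = np.sqrt(1.0 + (k1 * x) ** 2 + (k2 * y) ** 2) * (t[1] - t[0]) ** 2
    P = np.column_stack([x[inside], y[inside], z[inside]])
    w = dA[inside]
    area = w.sum()                                 # 2D volume of the patch
    barycenter = w @ P / area
    Q = P - barycenter
    cov = (w[:, None] * Q).T @ Q                   # area-weighted covariance
    evals, evecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
    return area, barycenter, evals, evecs

area, c, evals, evecs = patch_pca(k1=2.0, k2=0.5, r=0.1)
print(evals)
print(evecs[:, 0])  # approximately (0, 0, +-1): the surface normal
```

Shrinking `r` and tracking how `evals` deviate from their flat-disk values is the kind of multiscale experiment the expansions describe.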

Integral invariants obtained from Principal Component Analysis on a small
kernel domain of a submanifold encode important geometric information
classically defined in differential-geometric terms. We generalize to
hypersurfaces in any dimension major results known for surfaces in space, which
in turn yield a method to estimate the extrinsic and intrinsic curvature of an
embedded Riemannian submanifold of general codimension. In particular, integral
invariants are defined by the volume, barycenter, and the eigenvalue
decomposition (EVD) of the covariance matrix of the domain. We obtain the
asymptotic expansion of such invariants for a spherical volume component
delimited by a hypersurface and for the hypersurface patch created by ball
intersections, showing that the eigenvalues and eigenvectors can be used as
multiscale estimators of the principal curvatures and principal directions.
This approach may be interpreted as performing statistical analysis on the
underlying point set of a submanifold in
order to obtain geometric descriptors at scale with potential applications to
Manifold Learning and Geometry Processing of point clouds.
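
For surfaces in $\mathbb R^3$, the volume invariant of a ball of radius $r$ centred on the boundary of a domain satisfies $V_r = \frac{2\pi}{3}r^3 - \frac{\pi H}{4}r^4 + O(r^5)$, where $H$ is the mean curvature (positive when the domain is locally convex). For a solid sphere of radius $R$ the lens volume $\pi r^2(8Rr-3r^2)/(12R)$ makes this exact with $H = 1/R$. The sketch below assumes the boundary is a graph $z = f(x,y)$ over the tangent plane, computes $V_r$ by quadrature, and inverts the two-term expansion. `ball_volume_below_graph` and `mean_curvature_estimate` are hypothetical names, and the code covers only the classical three-dimensional case, not the general-dimension results.

```python
import numpy as np

def ball_volume_below_graph(f, r, n_phi=4000, n_theta=256):
    """Volume of B_r(0) intersected with {z <= f(x, y)}, by midpoint quadrature.

    Polar coordinates (rho, theta) with rho = r sin(phi): the vertical
    half-chord of the ball is then h = r cos(phi), and the integrand stays
    smooth at the rim of the ball.
    """
    phi = (np.arange(n_phi) + 0.5) * (np.pi / 2) / n_phi
    theta = (np.arange(n_theta) + 0.5) * (2 * np.pi) / n_theta
    P, T = np.meshgrid(phi, theta, indexing="ij")
    rho, h = r * np.sin(P), r * np.cos(P)
    x, y = rho * np.cos(T), rho * np.sin(T)
    # length of the part of the vertical segment [-h, h] lying below the graph
    seg = np.clip(f(x, y) + h, 0.0, 2.0 * h)
    jac = rho * r * np.cos(P)                       # rho * d(rho)/d(phi)
    return np.sum(seg * jac) * (np.pi / 2 / n_phi) * (2 * np.pi / n_theta)

def mean_curvature_estimate(volume, r):
    """Invert V_r ~ (2 pi / 3) r^3 - (pi H / 4) r^4 for H."""
    return (2 * np.pi / 3 * r**3 - volume) / (np.pi * r**4 / 4)

R, r = 1.0, 0.2

def sphere_top(x, y):
    # upper boundary of the solid sphere of radius R, tangent to z = 0 at the origin
    return np.sqrt(R**2 - x**2 - y**2) - R

print(mean_curvature_estimate(ball_volume_below_graph(sphere_top, r), r))  # close to 1/R
```

For a paraboloid $f = -(k_1x^2+k_2y^2)/2$ the estimate approaches $(k_1+k_2)/2$ as $r \to 0$, with an $O(r)$ error from the dropped terms.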

Particle physics has an ambitious and broad experimental programme for the
coming decades. This programme requires large investments in detector hardware,
either to build new facilities and experiments, or to upgrade existing ones.
Similarly, it requires commensurate investment in the R&D of software to
acquire, manage, process, and analyse the sheer amounts of data to be recorded.
In planning for the HL-LHC in particular, it is critical that all of the
collaborating stakeholders agree on the software goals and priorities, and that
the efforts complement each other. In this spirit, this white paper describes
the R&D activities required to prepare for this software upgrade.

Let $\gamma: I \rightarrow \mathbb R^n$ be a parametric curve of class
$C^{n+1}$, regular of order $n$. The Frenet-Serret apparatus of $\gamma$ at
$\gamma(t)$ consists of a frame $e_1(t), \dots , e_n(t)$ and generalized
curvature values $\kappa_1(t), \dots, \kappa_{n-1}(t)$. Associated with each
point of $\gamma$ there are also local singular vectors $u_1(t), \dots, u_n(t)$
and local singular values $\sigma_1(t), \dots, \sigma_{n}(t)$. This local
information is obtained by considering a limit, as $\epsilon$ goes to zero, of
covariance matrices defined along $\gamma$ within an $\epsilon$-ball centered
at $\gamma(t)$. We prove that for each $t\in I$, the Frenet-Serret frame and
the local singular vectors agree at $\gamma(t)$, and that the values of the
curvature functions at $t$ can be expressed as a fixed multiple of a ratio of
local singular values at $t$. More precisely, we show that if $\gamma$ lies in
$\mathbb R^n$ for any $n\in\mathbb N$, then for each $i$ between $2$ and $n$,
$\kappa_{i-1}(t)=\sqrt{a_{i-1}}\frac{\sigma_{i}(t)}{\sigma_1(t)\sigma_{i-1}(t)}$
with $a_{i-1} = \left(\frac{i}{i+(-1)^i}\right)^2\frac{4i^2-1}{3}$. For this we
prove a general formula for the recursion
relation of a certain class of sequences of Hankel determinants using the
theory of monic orthogonal polynomials and moment sequences.
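
The constants in the curvature formula are easy to tabulate exactly. This short sketch evaluates $a_{i-1}$ with rational arithmetic and applies the formula to a given list of local singular values. It assumes the $\sigma_i$ are already normalized as in the limit construction described above, which the code does not reproduce; the function names are hypothetical.

```python
import math
from fractions import Fraction

def a_coeff(i):
    """a_{i-1} = (i / (i + (-1)^i))^2 * (4 i^2 - 1) / 3 as an exact rational, i >= 2."""
    if i < 2:
        raise ValueError("i must be at least 2")
    return Fraction(i, i + (-1) ** i) ** 2 * Fraction(4 * i * i - 1, 3)

def curvature_from_singular_values(sigma, i):
    """kappa_{i-1} = sqrt(a_{i-1}) * sigma_i / (sigma_1 * sigma_{i-1}).

    sigma = (sigma_1, ..., sigma_n) is stored 0-based, so sigma_j is sigma[j - 1].
    """
    s1, s_prev, s_i = sigma[0], sigma[i - 2], sigma[i - 1]
    return math.sqrt(a_coeff(i)) * s_i / (s1 * s_prev)

print([a_coeff(i) for i in range(2, 6)])
# [Fraction(20, 9), Fraction(105, 4), Fraction(336, 25), Fraction(825, 16)]
```

For $i = 2$ this reads $\kappa_1 = \sqrt{20/9}\,\sigma_2/\sigma_1^2$, the curvature; $i = 3$ gives the torsion-like $\kappa_2$.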

The convex hull of a set of points, $C$, serves to expose extremal properties
of $C$ and can help identify elements in $C$ of high interest. For many
problems, particularly in the presence of noise, the true vertex set (and
facets) may be difficult to determine. One solution is to expand the list of
high-interest candidates to points lying near the boundary of the convex hull.
We propose a quadratic program for the purpose of stratifying points in a data
cloud based on proximity to the boundary of the convex hull. For each data
point, a quadratic program is solved to determine an associated weight vector.
We show that the weight vector encodes geometric information concerning the
point's relationship to the boundary of the convex hull. The computation of the
weight vectors can be carried out in parallel, and for a fixed number of points
and fixed neighborhood size, the overall computational complexity of the
algorithm grows linearly with dimension. As a consequence, meaningful
computations can be completed on reasonably large, high-dimensional data sets.
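
The quadratic program itself is not stated here. One plausible form, assumed purely for illustration, reconstructs each point as a convex combination of its $k$ nearest neighbours: interior points are reproduced exactly, while points near the hull boundary leave a positive residual. The sketch solves this QP by projected gradient descent onto the probability simplex; `hull_weights`, `k`, and the iteration count are hypothetical choices. Each per-point solve is independent, so the solves parallelize trivially, and one iteration costs $O(dk)$, linear in the dimension $d$ for fixed $k$.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    j = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * j > css - 1)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def hull_weights(X, i, k=8, iters=2000):
    """Assumed QP: minimize ||x_i - sum_j w_j x_j||^2 over the k nearest
    neighbours of x_i, subject to w >= 0 and sum(w) = 1.

    Returns (neighbour indices, weight vector, reconstruction residual).
    """
    x = X[i]
    dist = np.linalg.norm(X - x, axis=1)
    dist[i] = np.inf                       # exclude the point itself
    nbrs = np.argsort(dist)[:k]
    D = (X[nbrs] - x).T                    # (dim, k); with sum(w) = 1, D @ w is the residual
    L = 2 * np.linalg.norm(D, 2) ** 2      # Lipschitz constant of the gradient
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        grad = 2 * D.T @ (D @ w)
        w = project_to_simplex(w - grad / L)
    return nbrs, w, np.linalg.norm(D @ w)
```

On a $5\times 5$ integer grid, the centre point gets residual $0$, while the corner $(0,0)$ gets $1/\sqrt2$, its distance to the segment joining its neighbours $(1,0)$ and $(0,1)$.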

The existence of characteristic structure, or shape, in complex data sets has
been recognized as increasingly important for mathematical data analysis. This
realization has motivated the development of new tools such as persistent
homology for exploring topological invariants, or features, in large data sets.
In this paper we apply persistent homology to the characterization of gas
plumes in time-dependent sequences of hyperspectral cubes, i.e., the analysis of
4-way arrays. We investigate hyperspectral movies of Long-Wavelength Infrared
data monitoring an experimental release of chemical simulant into the air. Our
approach models regions of interest within the hyperspectral data cubes as
points on the real Grassmann manifold $G(k, n)$ (whose points parameterize the
$k$-dimensional subspaces of $\mathbb{R}^n$), contrasting our approach with the
more standard framework in Euclidean space. An advantage of this approach is
that it allows a sequence of time slices in a hyperspectral movie to be
collapsed to a sequence of points in such a way that some of the key structure
within and between the slices is encoded by the points on the Grassmann
manifold. This motivates the search for topological features, associated with
the evolution of the frames of a hyperspectral movie, within the corresponding
points on the Grassmann manifold. The proposed mathematical model affords the
processing of large data sets while retaining valuable discriminatory
information. In this paper, we discuss how embedding our data in the Grassmann
manifold, together with topological data analysis, captures dynamical events
that occur as the chemical plume is released and evolves.
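
A minimal sketch of the geometric core of such a pipeline, under stated assumptions: each region of interest is an $n\times m$ matrix of pixel spectra, represented on $G(k,n)$ by its top-$k$ left singular vectors; pairwise distances use the geodesic metric (the 2-norm of the principal angles); and only 0-dimensional persistence is computed, via Kruskal's minimum spanning tree, since the death times of $H_0$ classes in a Vietoris-Rips filtration are exactly the MST edge weights. Higher-dimensional features would require a full persistent homology library. All function names are hypothetical.

```python
import numpy as np

def subspace_from_samples(M, k):
    """Point on G(k, n): top-k left singular vectors of an n x m sample matrix."""
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :k]

def geodesic_distance(Q1, Q2):
    """Grassmann geodesic distance: 2-norm of the principal angles."""
    cos = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    return np.linalg.norm(np.arccos(np.clip(cos, -1.0, 1.0)))

def h0_persistence(Dist):
    """Death times of H0 classes (all born at 0) in the Vietoris-Rips
    filtration: the edge weights of a minimum spanning tree (Kruskal)."""
    n = len(Dist)
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    edges = sorted((Dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                        # edge merges two components
            parent[ri] = rj
            deaths.append(d)
    return deaths  # n - 1 finite deaths; one class never dies
```

For a movie, one would build one subspace per frame (or window), fill the distance matrix with `geodesic_distance`, and read long bars in `h0_persistence` as clusters of similar frames separated by abrupt changes.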

Many datasets can be viewed as a noisy sampling of an underlying space, and
tools from topological data analysis can characterize this structure for the
purpose of knowledge discovery. One such tool is persistent homology, which
provides a multiscale description of the homological features within a dataset.
A useful representation of this homological information is a persistence
diagram (PD). Efforts have been made to map PDs into spaces with additional
structure valuable to machine learning tasks. We convert a PD to a
finite-dimensional vector representation which we call a persistence image
(PI), and prove the stability of this transformation with respect to small
perturbations in the inputs. The discriminatory power of PIs is compared
against existing methods, showing significant performance gains. We explore the
use of PIs with vector-based machine learning tools, such as linear sparse
support vector machines, which identify features containing discriminating
topological information. Finally, high-accuracy inference of parameter values
from the dynamic output of a discrete dynamical system (the linked twist map)
and a partial differential equation (the anisotropic Kuramoto-Sivashinsky
equation) provides a novel application of the discriminatory power of PIs.
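
A persistence image can be sketched in a few lines. Map each diagram point $(b,d)$ to birth-persistence coordinates $(b, d-b)$, place a weighted Gaussian at each point, and integrate the sum over a pixel grid. The linear weight $\min(p/p_{\max},1)$, the bandwidth `sigma`, and the grid are assumed choices for illustration. The weight vanishes at zero persistence, so points on the diagonal contribute nothing.

```python
import math
import numpy as np

def persistence_image(diagram, x_edges, y_edges, sigma=0.1, p_max=1.0):
    """Persistence image of a diagram [(birth, death), ...].

    Each point becomes (birth, persistence), is weighted by min(p / p_max, 1),
    and is smoothed by an isotropic Gaussian of standard deviation sigma.
    Pixel values are exact integrals of each Gaussian over the pixel, computed
    from differences of the normal CDF (the Gaussian factorizes in x and y).
    """
    def cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    image = np.zeros((len(y_edges) - 1, len(x_edges) - 1))
    for birth, death in diagram:
        p = death - birth
        weight = min(p / p_max, 1.0) if p > 0 else 0.0
        if weight == 0.0:
            continue
        px = np.diff([cdf((e - birth) / sigma) for e in x_edges])
        py = np.diff([cdf((e - p) / sigma) for e in y_edges])
        image += weight * np.outer(py, px)   # rows: persistence, columns: birth
    return image
```

Flattening `image` (e.g. `image.ravel()`) gives the finite-dimensional vector that vector-based learners such as sparse SVMs consume.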

Computing plays an essential role in all aspects of high energy physics. As
computational technology evolves rapidly in new directions, and data throughput
and volume continue to follow a steep trendline, it is important for the HEP
community to develop an effective response to a series of expected challenges.
In order to help shape the desired response, the HEP Forum for Computational
Excellence (HEPFCE) initiated a roadmap planning activity with two key
overlapping drivers: 1) software effectiveness, and 2) infrastructure and
expertise advancement. The HEPFCE formed three working groups, 1) Applications
Software, 2) Software Libraries and Tools, and 3) Systems (including systems
software), to provide an overview of the current status of HEP computing and to
present findings and opportunities for the desired HEP computational roadmap.
The final versions of the reports are combined in this document, and are
presented along with introductory material.

This is a report from the Libraries and Tools Working Group of the High
Energy Physics Forum for Computational Excellence. It presents the vision of
the working group for how the HEP software community may organize and be
supported in order to more efficiently share and develop common software
libraries and tools across the world's diverse set of HEP experiments. It gives
prioritized recommendations for achieving this goal and provides a survey of a
select number of areas in the current HEP software library and tools landscape.
The survey identifies aspects which support this goal and areas with
opportunities for improvements. The survey covers event processing software
frameworks, software development, data management, workflow and workload
management, geometry information management and conditions databases.

Clinicians need to predict patient outcomes with high accuracy as early as
possible after disease inception. In this manuscript, we show that
patient-to-patient variability sets a fundamental limit on outcome prediction
accuracy for a general class of mathematical models for the immune response to
infection. However, accuracy can be increased at the expense of delayed
prognosis. We investigate several systems of ordinary differential equations
(ODEs) that model the host immune response to a pathogen load. Advantages of
systems of ODEs for investigating the immune response to infection include the
ability to collect data on large numbers of `virtual patients', each with a
given set of model parameters, and to obtain many time points during the course
of the infection. We implement patient-to-patient variability $v$ in the ODE
models by randomly selecting the model parameters from Gaussian distributions
with variance $v$ that are centered on physiological values. We use logistic
regression with one-versus-all classification to predict the discrete
steady-state outcomes of the system. We find that the prediction algorithm
achieves near $100\%$ accuracy for $v=0$, and the accuracy decreases with
increasing $v$ for all ODE models studied. The fact that multiple steady-state
outcomes can be obtained for a given initial condition, i.e., the basins of
attraction overlap in the space of initial conditions, limits the prediction
accuracy for $v>0$. Increasing the elapsed time of the variables used to train
and test the classifier increases the prediction accuracy, while adding
explicit external noise to the ODE models decreases the prediction accuracy.
Our results quantify the competition between early prognosis and high
prediction accuracy that is frequently encountered by clinicians.
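
The virtual-patient setup can be illustrated with a deliberately tiny stand-in for the immune-response ODEs. The bistable model below, its parameter values, and the sampling ranges are hypothetical and are not among the models studied. Variability $v$ enters as the variance of a per-patient Gaussian parameter, the feature is the state at an early time $t_{\mathrm{obs}}$, and the label is the steady state reached. With only two outcomes, one-versus-all reduces to a single logistic regression, trained here by plain gradient descent.

```python
import numpy as np

def simulate(v, n=400, t_obs=2.0, t_end=60.0, dt=0.01, seed=0):
    """Hypothetical toy model dx/dt = x (1 - x)(x - a), with stable states
    x = 0 (clearance) and x = 1 (persistent infection).

    Patient-to-patient variability: each virtual patient draws its own
    threshold a from a Gaussian with mean 0.5 and variance v.
    """
    rng = np.random.default_rng(seed)
    a = np.clip(rng.normal(0.5, np.sqrt(v), n), 0.01, 0.99)
    x = rng.uniform(0.3, 0.7, n)              # initial pathogen load
    obs_step = int(round(t_obs / dt))
    x_obs = x.copy()
    for step in range(int(round(t_end / dt))):
        if step == obs_step:
            x_obs = x.copy()                  # feature: state at time t_obs
        x = x + dt * x * (1 - x) * (x - a)    # explicit Euler step
    return x_obs, (x > 0.5).astype(float)     # label: steady-state outcome


def logistic_accuracy(v, t_obs=2.0, iters=3000, lr=0.5):
    """Binary logistic regression on x(t_obs): train on the first half of
    the virtual patients, report accuracy on the second half."""
    z, y = simulate(v, t_obs=t_obs)
    half = len(z) // 2
    mu, sd = z[:half].mean(), z[:half].std()
    z = (z - mu) / sd                          # standardize with training statistics
    w = b = 0.0
    for _ in range(iters):
        logit = w * z[:half] + b
        p = 0.5 * (1.0 + np.tanh(0.5 * logit))  # sigmoid without overflow
        w -= lr * np.mean((p - y[:half]) * z[:half])
        b -= lr * np.mean(p - y[:half])
    pred = (w * z[half:] + b) > 0.0
    return float(np.mean(pred == (y[half:] == 1.0)))

for v in (0.0, 0.01, 0.05):
    print(v, logistic_accuracy(v))
```

Varying `v` or `t_obs` gives a quick way to explore the accuracy-versus-delay trade-off in this toy model only; its numbers say nothing about the models in the study.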

We propose an approach for capturing the signal variability in hyperspectral
imagery using the framework of the Grassmann manifold. Labeled points from each
class are sampled and used to form abstract points on the Grassmannian. The
resulting points on the Grassmannian have representations as orthonormal
matrices and as such do not reside in Euclidean space in the usual sense. A
variety of metrics allow us to compute distance matrices that can be used to
realize the Grassmannian as an embedding in Euclidean space. We
illustrate that we can achieve an approximately isometric embedding of the
Grassmann manifold using the chordal metric, while this is not the case with
geodesic distances. However, non-isometric embeddings generated by using a
pseudometric on the Grassmannian lead to the best classification results. We
observe that as the dimension of the Grassmannian grows, the accuracy of the
classification grows to 100% on two illustrative examples. We also observe a
decrease in classification rates if the dimension of the points on the
Grassmannian is too large for the dimension of the Euclidean space. We use
sparse support vector machines to perform additional model reduction. The
resulting classifier selects a subset of dimensions of the embedding without
loss in classification performance.
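
The contrast between the chordal and geodesic metrics has a short explanation. With projection matrices $P = QQ^T$, the chordal distance equals $\|P_1-P_2\|_F/\sqrt2$, so chordal distance matrices are Euclidean and classical (Torgerson) MDS recovers them exactly, while geodesic arc-length matrices generally are not Euclidean. The sketch below checks this on random subspaces; the random data and function names are illustrative assumptions, not the hyperspectral pipeline.

```python
import numpy as np

def chordal_distance(Q1, Q2):
    """Chordal distance on G(k, n): 2-norm of the sines of the principal angles."""
    cos = np.clip(np.linalg.svd(Q1.T @ Q2, compute_uv=False), 0.0, 1.0)
    return np.sqrt(np.sum(1.0 - cos**2))

def classical_mds(D, tol=1e-10):
    """Classical MDS: Euclidean coordinates from a distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D**2) @ J                 # Gram matrix of the centred points
    evals, evecs = np.linalg.eigh(B)
    keep = evals > tol * evals.max()          # drop null and negative directions
    return evecs[:, keep] * np.sqrt(evals[keep])

rng = np.random.default_rng(0)
pts = [np.linalg.qr(rng.normal(size=(6, 2)))[0] for _ in range(8)]  # points on G(2, 6)
D = np.array([[chordal_distance(p, q) for q in pts] for p in pts])
Y = classical_mds(D)
D_emb = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
print(np.abs(D_emb - D).max())  # round-off level: the embedding is isometric
```

Repeating the experiment with geodesic distances typically leaves negative eigenvalues in `B`, which is the obstruction to an isometric embedding.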

The ability to characterize the color content of natural imagery is an
important application of image processing. The pixel-by-pixel coloring of
images may be viewed naturally as points in color space, and the inherent
structure and distribution of these points affords a quantization, through
clustering, of the color information in the image. In this paper, we present a
novel topologically driven clustering algorithm that permits segmentation of
the color features in a digital image. The algorithm blends Locally Linear
Embedding (LLE) and vector quantization by mapping color information to a
lower-dimensional space, identifying distinct color regions, and grouping pixels
together based on both a proximity measure and color content. It is observed
that these techniques permit a significant reduction in color resolution while
maintaining the visually important features of images.
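
The vector-quantization stage can be sketched with Lloyd's algorithm (k-means) on pixel colors. This sketch omits the LLE step entirely and clusters raw RGB values, so it illustrates only the quantization half of the algorithm. The function name, the farthest-point initialisation, and the iteration count are assumptions.

```python
import numpy as np

def vector_quantize(pixels, k, iters=20):
    """Lloyd's algorithm (k-means) on an (N, 3) array of RGB colors.

    Uses a deterministic farthest-point initialisation and returns
    (codebook, labels), so codebook[labels] is the quantized color list.
    """
    pixels = np.asarray(pixels, dtype=float)
    centers = [pixels[0]]
    for _ in range(1, k):
        # next seed: the pixel farthest from all seeds chosen so far
        d2 = np.min([((pixels - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(pixels[np.argmax(d2)])
    centers = np.array(centers)

    def assign(c):
        return ((pixels[:, None, :] - c[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

    for _ in range(iters):
        labels = assign(centers)
        for j in range(k):
            members = pixels[labels == j]
            if len(members):                  # leave an empty cluster's centre in place
                centers[j] = members.mean(axis=0)
    return centers, assign(centers)
```

For an image array `img` of shape `(H, W, 3)`, `codebook, labels = vector_quantize(img.reshape(-1, 3), k)` followed by `codebook[labels].reshape(img.shape)` gives the color-quantized image.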

We report on searches for a standard model (SM) Higgs boson in $p\bar{p}$
collisions at a center-of-mass energy $\sqrt{s}=1.96$ TeV with the CDF and D0
detectors using an integrated luminosity of more than 3.0 fb$^{-1}$. For a SM
Higgs boson with mass greater than 135 GeV, the dominant decay mode is to two W
bosons, and the searches presented are based upon the subsequent electron and
muon decays of the two W bosons. Significant improvements in background
modeling and signal predictions have been implemented since previous
preliminary results. No significant excess is observed, and limits on standard
model Higgs production are calculated. The observed 95% confidence level upper
limits are found to be a factor of 1.63 (2.0) higher than the predicted SM
cross section at $m_H = 165$ GeV for the CDF (D0) experiment, while the
expected limits are a factor of 1.66 (1.9) higher than the predicted SM cross
section.