• ### Integral Invariants from Covariance Analysis of Embedded Riemannian Manifolds(1804.10425)

April 27, 2018 math.DG
Principal Component Analysis can be performed over small domains of an embedded Riemannian manifold in order to relate the covariance analysis of the underlying point set with the local extrinsic and intrinsic curvature. We show that the volumes of domains on a submanifold of general codimension, determined by the intersection with higher-dimensional cylinders and balls in the ambient space, have asymptotic expansions in terms of the mean and scalar curvatures. Moreover, we propose a generalization of the classical third fundamental form to general submanifolds and prove that the eigenvalue decompositions of the covariance matrices of the domains have asymptotic expansions with scale that contain the curvature information encoded by the traces of this tensor. In the case of hypersurfaces, this covariance analysis recovers the principal curvatures and principal directions, which can be used as descriptors at scale to build up estimators of the second fundamental form, and thus the Riemann tensor, of general submanifolds.
• ### Manifold Curvature Descriptors from Hypersurface Integral Invariants(1804.04808)

April 13, 2018 math.DG
Integral invariants obtained from Principal Component Analysis on a small kernel domain of a submanifold encode important geometric information classically defined in differential-geometric terms. We generalize to hypersurfaces in any dimension major results known for surfaces in space, which in turn yield a method to estimate the extrinsic and intrinsic curvature of an embedded Riemannian submanifold of general codimension. In particular, integral invariants are defined by the volume, barycenter, and the eigenvalue decomposition (EVD) of the covariance matrix of the domain. We obtain the asymptotic expansion of such invariants for a spherical volume component delimited by a hypersurface and for the hypersurface patch created by ball intersections, showing that the eigenvalues and eigenvectors can be used as multi-scale estimators of the principal curvatures and principal directions. This approach may be interpreted as performing statistical analysis on the underlying point-set of a submanifold in order to obtain geometric descriptors at scale with potential applications to Manifold Learning and Geometry Processing of point clouds.
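The covariance analysis of a hypersurface patch cut out by a ball can be illustrated numerically. The following is a minimal sketch, not the paper's method: it assumes NumPy, uses the unit sphere in $\mathbb R^3$ as a stand-in hypersurface, and the function name `local_pca_on_sphere` and its parameters are hypothetical. It only shows the qualitative eigenstructure (one small eigenvalue along the normal, larger ones along tangent directions); the asymptotic expansions that yield the principal curvatures are not reproduced here.

```python
import numpy as np

def local_pca_on_sphere(ball_radius=0.3, n_samples=200_000, seed=0):
    """PCA of the unit-sphere patch inside a ball centered at the north pole.

    Hypothetical helper for illustration only. Returns the eigenvalues
    (ascending) and eigenvectors (as columns) of the patch covariance matrix.
    """
    rng = np.random.default_rng(seed)
    # Uniform samples on the unit sphere: normalize Gaussian vectors.
    pts = rng.normal(size=(n_samples, 3))
    pts /= np.linalg.norm(pts, axis=1, keepdims=True)
    # Keep the points of the surface lying inside the ambient ball.
    base = np.array([0.0, 0.0, 1.0])
    patch = pts[np.linalg.norm(pts - base, axis=1) <= ball_radius]
    # Covariance matrix about the barycenter of the patch.
    centered = patch - patch.mean(axis=0)
    cov = centered.T @ centered / len(patch)
    return np.linalg.eigh(cov)

eigvals, eigvecs = local_pca_on_sphere()
normal_estimate = eigvecs[:, 0]  # eigenvector of the smallest eigenvalue
print("eigenvalues:", eigvals)
print("estimated normal (up to sign):", normal_estimate)
```

In this sketch the eigenvector of the smallest eigenvalue approximates the surface normal at the base point, and the remaining eigenvectors span the tangent plane. On a sphere every tangent direction is principal, so a surface with distinct principal curvatures would be needed to see the two larger eigenvalues separate.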
• Particle physics has an ambitious and broad experimental programme for the coming decades. This programme requires large investments in detector hardware, either to build new facilities and experiments, or to upgrade existing ones. Similarly, it requires commensurate investment in the R&D of software to acquire, manage, process, and analyse the sheer amounts of data to be recorded. In planning for the HL-LHC in particular, it is critical that all of the collaborating stakeholders agree on the software goals and priorities, and that the efforts complement each other. In this spirit, this white paper describes the R&D activities required to prepare for this software upgrade.
• ### Geometry of Curves in $\mathbb R^n$, Singular Value Decomposition, and Hankel Determinants(1511.05008)

Oct. 19, 2017 math.DG
Let $\gamma: I \rightarrow \mathbb R^n$ be a parametric curve of class $C^{n+1}$, regular of order $n$. The Frenet-Serret apparatus of $\gamma$ at $\gamma(t)$ consists of a frame $e_1(t), \dots , e_n(t)$ and generalized curvature values $\kappa_1(t), \dots, \kappa_{n-1}(t)$. Associated with each point of $\gamma$ there are also local singular vectors $u_1(t), \dots, u_n(t)$ and local singular values $\sigma_1(t), \dots, \sigma_{n}(t)$. This local information is obtained by considering a limit, as $\epsilon$ goes to zero, of covariance matrices defined along $\gamma$ within an $\epsilon$-ball centered at $\gamma(t)$. We prove that for each $t\in I$, the Frenet-Serret frame and the local singular vectors agree at $\gamma(t)$ and that the values of the curvature functions at $t$ can be expressed as a fixed multiple of a ratio of local singular values at $t$. More precisely, we show that if $\gamma(t)\subset \mathbb R^n$ for any $n\in\mathbb N$ then, for each $i$ between $2$ and $n$, $\kappa_{i-1}(t)=\sqrt{a_{i-1}}\frac{\sigma_{i}(t)}{\sigma_1(t) \sigma_{i-1}(t)}$ with $a_{i-1} = \left(\frac{i}{i+(-1)^i}\right)^2 {\frac{4i^2-1}{3}}$. For this we prove a general formula for the recursion relation of a certain class of sequences of Hankel determinants using the theory of monic orthogonal polynomials and moment sequences.
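As a numerical sanity check of the case $i=2$, where $a_1 = (2/3)^2 \cdot 15/3 = 20/9$ and the formula reads $\kappa_1 = \frac{\sqrt{20}}{3}\,\sigma_2/\sigma_1^2$, the sketch below applies it to a circle of radius $R$. The abstract does not spell out the normalization of the local covariance matrix, so the following are assumptions of this sketch: the matrix is the second-moment matrix about $\gamma(t)$ (not about the barycenter), averaged over the arc inside the $\epsilon$-ball, and the local singular values are the square roots of its eigenvalues. NumPy and the function name `curvature_from_local_svd` are likewise choices made for illustration.

```python
import numpy as np

def curvature_from_local_svd(radius=2.0, eps=0.05, samples=20001):
    """Estimate kappa_1 of a circle from local singular values (case i = 2).

    Hypothetical helper; the covariance normalization is an assumption.
    """
    # Arc-length range of the circle that stays inside the eps-ball
    # around gamma(0): chord length 2 R sin(s / (2 R)) <= eps.
    s_max = 2.0 * radius * np.arcsin(eps / (2.0 * radius))
    s = np.linspace(-s_max, s_max, samples)
    # Circle through the origin, tangent to the x-axis there.
    pts = np.column_stack((radius * np.sin(s / radius),
                           radius * (1.0 - np.cos(s / radius))))
    # Assumed local covariance: second moments about gamma(0) = origin,
    # averaged over the samples.
    moment = pts.T @ pts / samples
    eigvals = np.linalg.eigvalsh(moment)[::-1]  # descending order
    sigma = np.sqrt(eigvals)                    # assumed singular values
    i = 2
    a = (i / (i + (-1) ** i)) ** 2 * (4 * i ** 2 - 1) / 3  # a_1 = 20/9
    # kappa_{i-1} = sqrt(a_{i-1}) * sigma_i / (sigma_1 * sigma_{i-1})
    return np.sqrt(a) * sigma[1] / (sigma[0] * sigma[0])

print(curvature_from_local_svd(radius=2.0))
```

Under these assumptions the estimate should approach $1/R$ as $\epsilon \to 0$, which is the curvature of the circle. A different normalization of the covariance matrix would change the constant and so would not reproduce $a_1$.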
• ### Stratifying High Dimensional Data Based on Proximity to the Convex Hull Boundary(1611.01419)

Nov. 4, 2016 cs.CG, math.OC
The convex hull of a set of points, $C$, serves to expose extremal properties of $C$ and can help identify elements in $C$ of high interest. For many problems, particularly in the presence of noise, the true vertex set (and facets) may be difficult to determine. One solution is to expand the list of high interest candidates to points lying near the boundary of the convex hull. We propose a quadratic program for the purpose of stratifying points in a data cloud based on proximity to the boundary of the convex hull. For each data point, a quadratic program is solved to determine an associated weight vector. We show that the weight vector encodes geometric information concerning the point's relationship to the boundary of the convex hull. The computation of the weight vectors can be carried out in parallel, and for a fixed number of points and fixed neighborhood size, the overall computational complexity of the algorithm grows linearly with dimension. As a consequence, meaningful computations can be completed on reasonably large, high dimensional data sets.
• ### Persistent Homology on Grassmann Manifolds for Analysis of Hyperspectral Movies(1607.02196)

July 11, 2016 math.AT, cs.CV, cs.CG
The existence of characteristic structure, or shape, in complex data sets has been recognized as increasingly important for mathematical data analysis. This realization has motivated the development of new tools such as persistent homology for exploring topological invariants, or features, in large data sets. In this paper we apply persistent homology to the characterization of gas plumes in time dependent sequences of hyperspectral cubes, i.e. the analysis of 4-way arrays. We investigate hyperspectral movies of Long-Wavelength Infrared data monitoring an experimental release of chemical simulant into the air. Our approach models regions of interest within the hyperspectral data cubes as points on the real Grassmann manifold $G(k, n)$ (whose points parameterize the $k$-dimensional subspaces of $\mathbb{R}^n$), contrasting our approach with the more standard framework in Euclidean space. An advantage of this approach is that it allows a sequence of time slices in a hyperspectral movie to be collapsed to a sequence of points in such a way that some of the key structure within and between the slices is encoded by the points on the Grassmann manifold. This motivates the search for topological features, associated with the evolution of the frames of a hyperspectral movie, within the corresponding points on the Grassmann manifold. The proposed mathematical model affords the processing of large data sets while retaining valuable discriminatory information. In this paper, we discuss how embedding our data in the Grassmann manifold, together with topological data analysis, captures dynamical events that occur as the chemical plume is released and evolves.
• ### Persistence Images: A Stable Vector Representation of Persistent Homology(1507.06217)

July 11, 2016 math.AT, cs.CG, stat.ML
Many datasets can be viewed as a noisy sampling of an underlying space, and tools from topological data analysis can characterize this structure for the purpose of knowledge discovery. One such tool is persistent homology, which provides a multiscale description of the homological features within a dataset. A useful representation of this homological information is a persistence diagram (PD). Efforts have been made to map PDs into spaces with additional structure valuable to machine learning tasks. We convert a PD to a finite-dimensional vector representation which we call a persistence image (PI), and prove the stability of this transformation with respect to small perturbations in the inputs. The discriminatory power of PIs is compared against existing methods, showing significant performance gains. We explore the use of PIs with vector-based machine learning tools, such as linear sparse support vector machines, which identify features containing discriminating topological information. Finally, high accuracy inference of parameter values from the dynamic output of a discrete dynamical system (the linked twist map) and a partial differential equation (the anisotropic Kuramoto-Sivashinsky equation) provides a novel application of the discriminatory power of PIs.
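The conversion of a PD to a vector can be sketched roughly as follows. This is an illustrative approximation, not the authors' implementation: it assumes NumPy, maps each (birth, death) pair to (birth, persistence), weights a Gaussian centered there linearly by persistence, and samples the resulting surface at pixel centers instead of integrating it over each pixel. The function name `persistence_image`, the parameter defaults, and the specific weighting are assumptions.

```python
import numpy as np

def persistence_image(diagram, resolution=20, sigma=0.1,
                      extent=(0.0, 1.0, 0.0, 1.0)):
    """Approximate persistence image of a diagram of (birth, death) pairs.

    Hypothetical sketch: Gaussians are evaluated at pixel centers, and the
    weight grows linearly with persistence (zero at zero persistence).
    """
    pts = np.asarray(diagram, dtype=float).reshape(-1, 2)
    births = pts[:, 0]
    pers = pts[:, 1] - pts[:, 0]          # persistence = death - birth
    xmin, xmax, ymin, ymax = extent
    # Pixel centers of a resolution x resolution grid over the extent.
    xs = xmin + (np.arange(resolution) + 0.5) * (xmax - xmin) / resolution
    ys = ymin + (np.arange(resolution) + 0.5) * (ymax - ymin) / resolution
    gx, gy = np.meshgrid(xs, ys)          # rows index persistence, columns birth
    image = np.zeros_like(gx)
    if pers.size == 0 or pers.max() <= 0:
        return image.ravel()
    weights = pers / pers.max()
    for b, p, w in zip(births, pers, weights):
        gauss = np.exp(-((gx - b) ** 2 + (gy - p) ** 2) / (2 * sigma ** 2))
        image += w * gauss / (2 * np.pi * sigma ** 2)
    return image.ravel()                  # finite-dimensional feature vector

vec = persistence_image([(0.125, 0.9), (0.425, 0.525)])
print(vec.shape)
```

The flattened image can then be passed to any vector-based learner. The resolution, the Gaussian width `sigma`, and the weighting function are tunable choices in this sketch.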
• ### High Energy Physics Forum for Computational Excellence: Working Group Reports (I. Applications Software II. Software Libraries and Tools III. Systems)(1510.08545)

Oct. 29, 2015 hep-ex, cs.DC, physics.comp-ph, cs.CE
Computing plays an essential role in all aspects of high energy physics. As computational technology evolves rapidly in new directions, and data throughput and volume continue to follow a steep trend-line, it is important for the HEP community to develop an effective response to a series of expected challenges. In order to help shape the desired response, the HEP Forum for Computational Excellence (HEP-FCE) initiated a roadmap planning activity with two key overlapping drivers -- 1) software effectiveness, and 2) infrastructure and expertise advancement. The HEP-FCE formed three working groups, 1) Applications Software, 2) Software Libraries and Tools, and 3) Systems (including systems software), to provide an overview of the current status of HEP computing and to present findings and opportunities for the desired HEP computational roadmap. The final versions of the reports are combined in this document, and are presented along with introductory material.
• ### HEP-FCE Working Group on Libraries and Tools(1506.01309)

June 3, 2015 hep-ex, physics.data-an
This is a report from the Libraries and Tools Working Group of the High Energy Physics Forum for Computational Excellence. It presents the vision of the working group for how the HEP software community may organize and be supported in order to more efficiently share and develop common software libraries and tools across the world's diverse set of HEP experiments. It gives prioritized recommendations for achieving this goal and provides a survey of a select number of areas in the current HEP software library and tools landscape. The survey identifies aspects which support this goal and areas with opportunities for improvements. The survey covers event processing software frameworks, software development, data management, workflow and workload management, geometry information management and conditions databases.
• ### Outcome prediction in mathematical models of immune response to infection(1503.08324)

March 28, 2015 q-bio.QM
Clinicians need to predict patient outcomes with high accuracy as early as possible after disease inception. In this manuscript, we show that patient-to-patient variability sets a fundamental limit on outcome prediction accuracy for a general class of mathematical models for the immune response to infection. However, accuracy can be increased at the expense of delayed prognosis. We investigate several systems of ordinary differential equations (ODEs) that model the host immune response to a pathogen load. Advantages of systems of ODEs for investigating the immune response to infection include the ability to collect data on large numbers of "virtual patients", each with a given set of model parameters, and obtain many time points during the course of the infection. We implement patient-to-patient variability $v$ in the ODE models by randomly selecting the model parameters from Gaussian distributions with variance $v$ that are centered on physiological values. We use logistic regression with one-versus-all classification to predict the discrete steady-state outcomes of the system. We find that the prediction algorithm achieves near $100\%$ accuracy for $v=0$, and the accuracy decreases with increasing $v$ for all ODE models studied. The fact that multiple steady-state outcomes can be obtained for a given initial condition, i.e. the basins of attraction overlap in the space of initial conditions, limits the prediction accuracy for $v>0$. Increasing the elapsed time of the variables used to train and test the classifier increases the prediction accuracy, while adding explicit external noise to the ODE models decreases the prediction accuracy. Our results quantify the competition between early prognosis and high prediction accuracy that is frequently encountered by clinicians.
• ### Classification of Hyperspectral Imagery on Embedded Grassmannians(1502.00946)

Feb. 3, 2015 cs.CV
We propose an approach for capturing the signal variability in hyperspectral imagery using the framework of the Grassmann manifold. Labeled points from each class are sampled and used to form abstract points on the Grassmannian. The resulting points on the Grassmannian have representations as orthonormal matrices and as such do not reside in Euclidean space in the usual sense. There are a variety of metrics which allow us to determine distance matrices that can be used to realize the Grassmannian as an embedding in Euclidean space. We illustrate that we can achieve an approximately isometric embedding of the Grassmann manifold using the chordal metric while this is not the case with geodesic distances. However, non-isometric embeddings generated by using a pseudometric on the Grassmannian lead to the best classification results. We observe that as the dimension of the Grassmannian grows, the accuracy of the classification grows to 100% on two illustrative examples. We also observe a decrease in classification rates if the dimension of the points on the Grassmannian is too large for the dimension of the Euclidean space. We use sparse support vector machines to perform additional model reduction. The resulting classifier selects a subset of dimensions of the embedding without loss in classification performance.
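Distances between points on the Grassmannian are commonly computed from principal angles between subspaces. The sketch below, assuming NumPy, uses one common convention: the chordal distance is the 2-norm of the sines of the principal angles, and the geodesic distance is the 2-norm of the angles themselves. The function names are hypothetical, and the paper may use different normalizations or a different pseudometric.

```python
import numpy as np

def principal_angles(a, b):
    """Principal angles between the column spaces of a and b (n x k arrays)."""
    # Orthonormal bases for each subspace via reduced QR.
    qa, _ = np.linalg.qr(a)
    qb, _ = np.linalg.qr(b)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(qa.T @ qb, compute_uv=False)
    return np.arccos(np.clip(cosines, 0.0, 1.0))

def chordal_distance(a, b):
    """Chordal distance: 2-norm of the sines of the principal angles."""
    return np.linalg.norm(np.sin(principal_angles(a, b)))

def geodesic_distance(a, b):
    """Geodesic (arc-length) distance: 2-norm of the principal angles."""
    return np.linalg.norm(principal_angles(a, b))

# Two orthogonal lines in R^3, viewed as points on G(1, 3).
e1 = np.array([[1.0], [0.0], [0.0]])
e2 = np.array([[0.0], [1.0], [0.0]])
print(chordal_distance(e1, e2), geodesic_distance(e1, e2))
```

A pairwise distance matrix built with either function can then be passed to a multidimensional scaling routine to obtain a Euclidean embedding of the sampled subspaces.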
• ### Locally Linear Embedding Clustering Algorithm for Natural Imagery(1202.4387)

Feb. 20, 2012 math.GT, cs.CV, cs.CG
The ability to characterize the color content of natural imagery is an important application of image processing. The pixel by pixel coloring of images may be viewed naturally as points in color space, and the inherent structure and distribution of these points affords a quantization, through clustering, of the color information in the image. In this paper, we present a novel topologically driven clustering algorithm that permits segmentation of the color features in a digital image. The algorithm blends Locally Linear Embedding (LLE) and vector quantization by mapping color information to a lower dimensional space, identifying distinct color regions, and classifying pixels together based on both a proximity measure and color content. It is observed that these techniques permit a significant reduction in color resolution while maintaining the visually important features of images.
• ### Searches at the Tevatron for a High Mass Standard Model Higgs Boson(0810.3747)

Oct. 21, 2008 hep-ex
We report on searches for a standard model (SM) Higgs boson in $p\bar{p}$ collisions at a center of mass energy $\sqrt{s}=1.96$ TeV with the CDF and D0 detectors using an integrated luminosity of more than 3.0 fb$^{-1}$. For a SM Higgs with mass greater than 135 GeV, the dominant decay mode is two W bosons and the searches presented are based upon the subsequent electron and muon decays of the two W bosons. Significant improvements in background modeling and signal predictions have been implemented since previous preliminary results. No significant excess is observed, and limits on standard model Higgs production are calculated. The observed 95% confidence level upper limits are found to be a factor of 1.63 (2.0) higher than the predicted SM cross section at $m_H = 165$ GeV for the CDF (D0) experiment while the expected limits are a factor of 1.66 (1.9) higher than the predicted SM cross section.