• Sparse principal component analysis (SPCA) has emerged as a powerful technique for modern data analysis. We present a robust and scalable algorithm for computing SPCA. Specifically, we model SPCA as a matrix factorization problem with orthogonality constraints, and develop specialized optimization algorithms that partially minimize over a subset of the variables (variable projection). The framework incorporates a wide variety of sparsity-inducing regularizers for SPCA. We also extend the variable projection approach to robust SPCA, for any robust loss that can be expressed as the Moreau envelope of a simple function, with the canonical example of the Huber loss. Finally, randomized methods for linear algebra are used to extend the approach to the large-scale (big data) setting. The proposed algorithms are demonstrated using both synthetic and real-world data.
  • Topological data analysis (TDA) has emerged as one of the most promising techniques to reconstruct the unknown shapes of high-dimensional spaces from observed data samples. TDA, thus, yields key shape descriptors in the form of persistent topological features that can be used for any supervised or unsupervised learning task, including multi-way classification. Sparse sampling, on the other hand, provides a highly efficient technique to reconstruct signals in the spatial-temporal domain from just a few carefully-chosen samples. Here, we present a new method, referred to as the Sparse-TDA algorithm, that combines favorable aspects of the two techniques. This combination is realized by selecting an optimal set of sparse pixel samples from the persistent features generated by a vector-based TDA algorithm. These sparse samples are selected from a low-rank matrix representation of persistent features using QR pivoting. We show that the Sparse-TDA method demonstrates promising performance on three benchmark problems related to human posture recognition and image texture classification.
  • Optimal sensor placement is a central challenge in the design, prediction, estimation, and control of high-dimensional systems. High-dimensional states can often leverage a latent low-dimensional representation, and this inherent compressibility enables sparse sensing. This article explores optimized sensor placement for signal reconstruction based on a tailored library of features extracted from training data. Sparse point sensors are discovered using the singular value decomposition and QR pivoting, which are two ubiquitous matrix computations that underpin modern linear dimensionality reduction. Sparse sensing in a tailored basis is contrasted with compressed sensing, a universal signal recovery method in which an unknown signal is reconstructed via a sparse representation in a universal basis. Although compressed sensing can recover a wider class of signals, we demonstrate the benefits of exploiting known patterns in data with optimized sensing. In particular, drastic reductions in the required number of sensors and improved reconstruction are observed in examples ranging from facial images to fluid vorticity fields. Principled sensor placement may be critically enabling when sensors are costly and provides faster state estimation for low-latency, high-bandwidth control. MATLAB code is provided for all examples.
  • The CANDECOMP/PARAFAC (CP) tensor decomposition is a popular dimensionality-reduction method for multiway data. Dimensionality reduction is often sought since many high-dimensional tensors have low intrinsic rank relative to the dimension of the ambient measurement space. However, the emergence of "big data" poses significant computational challenges for computing this fundamental tensor decomposition. Leveraging modern randomized algorithms, we demonstrate that the coherent structure can be learned from a smaller representation of the tensor in a fraction of the time. Moreover, the high-dimensional signal can be faithfully approximated from the compressed measurements. Thus, this simple but powerful algorithm enables one to compute the approximate CP decomposition even for massive tensors. The approximation error can be controlled via oversampling and the computation of power iterations. In addition to theoretical results, several empirical results demonstrate the performance of the proposed algorithm.
  • This paper addresses the problem of identifying different flow environments from sparse data collected by wing strain sensors. Insects regularly perform this feat using a sparse ensemble of noisy strain sensors on their wings. First, we obtain strain data from numerical simulation of a Manduca sexta hawkmoth wing subjected to different flow environments. Our data-driven method learns low-dimensional strain features originating from different aerodynamic environments using proper orthogonal decomposition (POD) modes in the frequency domain, and leverages sparse approximation to classify a set of strain frequency signatures using a dictionary of POD modes. This bio-inspired machine learning architecture for dictionary learning and sparse classification permits fewer costly physical strain sensors while being simultaneously robust to sensor noise. A measurement selection algorithm identifies frequencies that best discriminate the different aerodynamic environments in low-rank POD feature space. In this manner, sparse and noisy wing strain data can be exploited to robustly identify different aerodynamic environments encountered in flight, providing insight into the stereotyped placement of neurons that act as strain sensors on a Manduca sexta hawkmoth wing.
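
Several of the abstracts above select sparse samples or sensors by combining the singular value decomposition with QR pivoting. The following is a minimal sketch of that idea in Python, not the authors' code (the sensor placement abstract mentions MATLAB code, which is not reproduced here). The function names `qr_sensor_placement` and `reconstruct`, the synthetic rank-5 data, and the choice of exactly `r` sensors for `r` modes are assumptions made for illustration.

```python
import numpy as np
from scipy.linalg import qr


def qr_sensor_placement(X, r):
    """Pick r point sensors from training data X (n_states x n_snapshots).

    Hypothetical helper: builds a tailored basis from the leading r left
    singular vectors, then uses column-pivoted QR to choose sensor rows.
    """
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    Psi = U[:, :r]                          # tailored basis, shape (n, r)
    # Pivoting the columns of Psi^T ranks the rows of Psi, i.e. the
    # candidate sensor locations, by how well conditioned they keep
    # the sampled basis.
    _, _, pivots = qr(Psi.T, pivoting=True)
    return pivots[:r], Psi


def reconstruct(y, sensors, Psi):
    """Estimate the full state from point measurements y = x[sensors]."""
    coeffs = np.linalg.solve(Psi[sensors, :], y)   # r x r linear system
    return Psi @ coeffs


# Synthetic example (assumed data): states that lie exactly in a
# 5-dimensional subspace of a 200-dimensional space.
rng = np.random.default_rng(0)
n, m, r = 200, 50, 5
modes = rng.standard_normal((n, r))
X = modes @ rng.standard_normal((r, m))

sensors, Psi = qr_sensor_placement(X, r)

# A new state drawn from the same subspace, observed at only r locations.
x_true = modes @ rng.standard_normal(r)
x_hat = reconstruct(x_true[sensors], sensors, Psi)
rel_error = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
```

Because the synthetic states lie exactly in the span of the tailored basis, five well-chosen point measurements recover all 200 entries up to round-off. With real data the basis is only approximate, so one would typically use more sensors than modes and replace the square solve with a least-squares fit.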