• This paper presents a randomized algorithm for computing the near-optimal low-rank dynamic mode decomposition (DMD). Randomized algorithms are emerging techniques to compute low-rank matrix approximations at a fraction of the cost of deterministic algorithms, easing the computational challenges arising in the area of `big data'. The idea is to derive a small matrix from the high-dimensional data, which is then used to efficiently compute the dynamic modes and eigenvalues. The algorithm is presented in a modular probabilistic framework, and the approximation quality can be controlled via oversampling and power iterations. The effectiveness of the resulting randomized DMD (rDMD) algorithm is demonstrated on several benchmark examples of increasing complexity, providing an accurate and efficient approach to extract spatiotemporal coherent structures from big data in a framework that scales with the intrinsic rank of the data, rather than the ambient measurement dimension.
  • Nonnegative matrix factorization (NMF) is a powerful tool for data mining. However, the emergence of `big data' has severely challenged our ability to compute this fundamental decomposition using deterministic algorithms. This paper presents a randomized hierarchical alternating least squares (HALS) algorithm to compute the NMF. By deriving a smaller matrix from the nonnegative input data, a more efficient nonnegative decomposition can be computed. Our algorithm scales to big data applications while attaining a near-optimal factorization. The proposed algorithm is evaluated using synthetic and real world data and shows substantial speedups compared to deterministic HALS.
  • Sparse principal component analysis (SPCA) has emerged as a powerful technique for modern data analysis. We discuss a robust and scalable algorithm for computing sparse principal component analysis. Specifically, we model SPCA as a matrix factorization problem with orthogonality constraints, and develop specialized optimization algorithms that partially minimize a subset of the variables (variable projection). The framework incorporates a wide variety of sparsity-inducing regularizers for SPCA. We also extend the variable projection approach to robust SPCA, for any robust loss that can be expressed as the Moreau envelope of a simple function, with the canonical example of the Huber loss. Finally, randomized methods for linear algebra are used to extend the approach to the large-scale (big data) setting. The proposed algorithms are demonstrated using both synthetic and real world data.
  • Matrix decompositions are fundamental tools in the area of applied mathematics, statistical computing, and machine learning. In particular, low-rank matrix decompositions are vital, and widely used for data analysis, dimensionality reduction, and data compression. Massive datasets, however, pose a computational challenge for traditional algorithms, placing significant constraints on both memory and processing power. Recently, the powerful concept of randomness has been introduced as a strategy to ease the computational load. The essential idea of probabilistic algorithms is to employ some amount of randomness in order to derive a smaller matrix from a high-dimensional data matrix. The smaller matrix is then used to compute the desired low-rank approximation. Such algorithms are shown to be computationally efficient for approximating matrices with low-rank structure. We present the R package rsvd, and provide a tutorial introduction to randomized matrix decompositions. Specifically, randomized routines for the singular value decomposition, (robust) principal component analysis, interpolative decomposition, and CUR decomposition are discussed. Several examples demonstrate the routines, and show the computational advantage over other methods implemented in R.
  • Diffusion maps are an emerging data-driven technique for non-linear dimensionality reduction, which are especially useful for the analysis of coherent structures and nonlinear embeddings of dynamical systems. However, the computational complexity of the diffusion maps algorithm scales with the number of observations. Thus, long time-series data presents a significant challenge for fast and efficient embedding. We propose integrating the Nystr\"om method with diffusion maps in order to ease the computational demand. We achieve a speedup of roughly two to four times when approximating the dominant diffusion map components.
  • The CANDECOMP/PARAFAC (CP) tensor decomposition is a popular dimensionality-reduction method for multiway data. Dimensionality reduction is often sought since many high-dimensional tensors have low intrinsic rank relative to the dimension of the ambient measurement space. However, the emergence of `big data' poses significant computational challenges for computing this fundamental tensor decomposition. Leveraging modern randomized algorithms, we demonstrate that the coherent structure can be learned from a smaller representation of the tensor in a fraction of the time. Moreover, the high-dimensional signal can be faithfully approximated from the compressed measurements. Thus, this simple but powerful algorithm enables one to compute the approximate CP decomposition even for massive tensors. The approximation error can thereby be controlled via oversampling and the computation of power iterations. In addition to theoretical results, several empirical results demonstrate the performance of the proposed algorithm.
  • We introduce the method of compressed dynamic mode decomposition (cDMD) for background modeling. The dynamic mode decomposition (DMD) is a regression technique that integrates two of the leading data analysis methods in use today: Fourier transforms and singular value decomposition. Borrowing ideas from compressed sensing and matrix sketching, cDMD eases the computational workload of high resolution video processing. The key principal of cDMD is to obtain the decomposition on a (small) compressed matrix representation of the video feed. Hence, the cDMD algorithm scales with the intrinsic rank of the matrix, rather then the size of the actual video (data) matrix. Selection of the optimal modes characterizing the background is formulated as a sparsity-constrained sparse coding problem. Our results show, that the quality of the resulting background model is competitive, quantified by the F-measure, Recall and Precision. A GPU (graphics processing unit) accelerated implementation is also presented which further boosts the computational efficiency of the algorithm.
  • This paper introduces a fast algorithm for randomized computation of a low-rank Dynamic Mode Decomposition (DMD) of a matrix. Here we consider this matrix to represent the development of a spatial grid through time e.g. data from a static video source. DMD was originally introduced in the fluid mechanics community, but is also suitable for motion detection in video streams and its use for background subtraction has received little previous investigation. In this study we present a comprehensive evaluation of background subtraction, using the randomized DMD and compare the results with leading robust principal component analysis algorithms. The results are convincing and show the random DMD is an efficient and powerful approach for background modeling, allowing processing of high resolution videos in real-time. Supplementary materials include implementations of the algorithms in Python.