• This summary of the doctoral thesis is created to emphasize the close connection of the proposed spectral analysis method with the Discrete Fourier Transform (DFT), the most extensively studied and frequently used approach in the history of signal processing. It is shown that in a typical application case, where uniform data readings are transformed to the same number of uniformly spaced frequencies, the results of the classical DFT and proposed approach coincide. The difference in performance appears when the length of the DFT is selected to be greater than the length of the data. The DFT solves the unknown data problem by padding readings with zeros up to the length of the DFT, while the proposed Extended DFT (EDFT) deals with this situation in a different way, it uses the Fourier integral transform as a target and optimizes the transform basis in the extended frequency range without putting such restrictions on the time domain. Consequently, the Inverse DFT (IDFT) applied to the result of EDFT returns not only known readings, but also the extrapolated data, where classical DFT is able to give back just zeros, and higher resolution are achieved at frequencies where the data has been successfully extended. It has been demonstrated that EDFT able to process data with missing readings or gaps inside or even nonuniformly distributed data. Thus, EDFT significantly extends the usability of the DFT-based methods, where previously these approaches have been considered as not applicable. The EDFT founds the solution in an iterative way and requires repeated calculations to get the adaptive basis, and this makes it numerical complexity much higher compared to DFT. This disadvantage was a serious problem in the 1990s, when the method has been proposed. Fortunately, since then the power of computers has increased so much that nowadays EDFT application could be considered as a real alternative.
  • Bayesian matrix factorization (BMF) is a powerful tool for producing low-rank representations of matrices and for predicting missing values and providing confidence intervals. Scaling up the posterior inference for massive-scale matrices is challenging and requires distributing both data and computation over many workers, making communication the main computational bottleneck. Embarrassingly parallel inference would remove the communication needed, by using completely independent computations on different data subsets, but it suffers from the inherent unidentifiability of BMF solutions. We introduce a hierarchical decomposition of the joint posterior distribution, which couples the subset inferences, allowing for embarrassingly parallel computations in a sequence of at most three stages. Using an efficient approximate implementation, we show improvements empirically on both real and simulated data. Our distributed approach is able to achieve a speed-up of almost an order of magnitude over the full posterior, with a negligible effect on predictive accuracy. Our method outperforms state-of-the-art embarrassingly parallel MCMC methods in accuracy, and achieves results competitive to other available distributed and parallel implementations of BMF.
  • Deep neural networks have become invaluable tools for supervised machine learning, e.g., classification of text or images. While often offering superior results over traditional techniques and successfully expressing complicated patterns in data, deep architectures are known to be challenging to design and train such that they generalize well to new data. Important issues with deep architectures are numerical instabilities in derivative-based learning algorithms commonly called exploding or vanishing gradients. In this paper we propose new forward propagation techniques inspired by systems of Ordinary Differential Equations (ODE) that overcome this challenge and lead to well-posed learning problems for arbitrarily deep networks. The backbone of our approach is our interpretation of deep learning as a parameter estimation problem of nonlinear dynamical systems. Given this formulation, we analyze stability and well-posedness of deep learning and use this new understanding to develop new network architectures. We relate the exploding and vanishing gradient phenomenon to the stability of the discrete ODE and present several strategies for stabilizing deep learning for very deep networks. While our new architectures restrict the solution space, several numerical experiments show their competitiveness with state-of-the-art networks.
  • In numerical time-integration with implicit-explicit (IMEX) methods, a within-step adaptable decomposition called residual balanced decomposition is introduced. With this decomposition, the requirement of a small enough residual in the iterative solver can be removed, consequently, this allows to exchange stability for efficiency. This decomposition transfers any residual occurring in the implicit equation of the implicit-step into the explicit part of the decomposition. By balancing the residual, the accuracy of the local truncation error of the time-stepping method becomes independent from the accuracy by which the implicit equation is solved. In order to balance the residual, the original IMEX decomposition is adjusted after the iterative solver has been stopped. For this to work, the traditional IMEX time-stepping algorithm needs to be changed. We call this new method the shortcut-IMEX (SIMEX). SIMEX can gain computational efficiency by exploring the trade-off between the computational effort placed in solving the implicit equation and the size of the numerically stable time-step. Typically, increasing the number of solver iterations increases the largest stable step-size. Both multi-step and Runge-Kutta (RK) methods are suitable for use with SIMEX. Here, we show the efficiency of a SIMEX-RK method in overcoming parabolic stiffness by applying it to a nonlinear reaction-advection-diffusion equation. In order to define a stability region for SIMEX, a region in the complex plane is depicted by applying SIMEX to a suitable PDE model containing diffusion and dispersion. A myriad of stability regions can be reached by changing the RK tableau and the solver.
  • This report describes the computation of gradients by algorithmic differentiation for statistically optimum beamforming operations. Especially the derivation of complex-valued functions is a key component of this approach. Therefore the real-valued algorithmic differentiation is extended via the complex-valued chain rule. In addition to the basic mathematic operations the derivative of the eigenvalue problem with complex-valued eigenvectors is one of the key results of this report. The potential of this approach is shown with experimental results on the CHiME-3 challenge database. There, the beamforming task is used as a front-end for an ASR system. With the developed derivatives a joint optimization of a speech enhancement and speech recognition system w.r.t. the recognition optimization criterion is possible.
  • This work formalizes "Exact Real Computation" (ERC): a paradigm combining (i) ALGEBRAIC imperative programming of/over abstract data types (ADTs) for CONTINUOUS structures with (ii) a selection and sound semantics of primitives computable in the sense of Recursive ANALYSIS, that is, by means of approximations -- yet presented to the user as exact. We specify a small imperative programming language for the ADT of real (i.e., including transcendental) numbers with rigorous semantics: arguments are provided, passed to and received from calls to functions (like $e^x$), and operated on EXACTLY -- with partial inequality predicate and multivalued binary SELECT and continuous conditional (aka PARALLEL-IF) operations -- yet REALIZING a function (again like $e^x$) requires only to APPROXIMATE its return value up to guaranteed absolute error $2^p$ for any given integer $p$: closure under composition is implicit. We prove this language Turing-COMPLETE: it can express precisely those partial real functions computable in the sense of Recursive Analysis; similarly for functionALs. Three basic numerical problems demonstrate both the convenience and novel control-flow considerations of this approach to Reliable Numerics: multivalued integer rounding, solving systems of linear equations, and simple root finding. For rigorously specifying and arguing about such (non-extensional) computations, we propose a first-order theory over two sorts: integers and reals; and prove it both decidable and 'model complete': thus reflecting the elegance inherent to real (as opposed to rational/floating point) numbers. Rules of Hoare Logic are extended to support formal correctness proofs in ERC.
  • Several test function suites are being used for numerical benchmarking of multiobjective optimization algorithms. While they have some desirable properties, like well-understood Pareto sets and Pareto fronts of various shapes, most of the currently used functions possess characteristics that are arguably under-represented in real-world problems. They mainly stem from the easier construction of such functions and result in improbable properties such as separability, optima located exactly at the boundary constraints, and the existence of variables that solely control the distance between a solution and the Pareto front. Here, we propose an alternative way to constructing multiobjective problems-by combining existing single-objective problems from the literature. We describe in particular the bbob-biobj test suite with 55 bi-objective functions in continuous domain, and its extended version with 92 bi-objective functions (bbob-biobj-ext). Both test suites have been implemented in the COCO platform for black-box optimization benchmarking. Finally, we recommend a general procedure for creating test suites for an arbitrary number of objectives. Besides providing the formal function definitions and presenting their (known) properties, this paper also aims at giving the rationale behind our approach in terms of groups of functions with similar properties, objective space normalization, and problem instances. The latter allows us to easily compare the performance of deterministic and stochastic solvers, which is an often overlooked issue in benchmarking.
  • We study and develop (stochastic) primal--dual block-coordinate descent methods for convex problems based on the method due to Chambolle and Pock. Our methods have known convergence rates for the iterates and the ergodic gap: $O(1/N^2)$ if each block is strongly convex, $O(1/N)$ if no convexity is present, and more generally a mixed rate $O(1/N^2)+O(1/N)$ for strongly convex blocks, if only some blocks are strongly convex. Additional novelties of our methods include blockwise-adapted step lengths and acceleration, as well as the ability to update both the primal and dual variables randomly in blocks under a very light compatibility condition. In other words, these variants of our methods are doubly-stochastic. We test the proposed methods on various image processing problems, where we employ pixelwise-adapted acceleration.
  • This paper summarizes the development of Veamy, an object-oriented C++ library for the virtual element method (VEM) on general polygonal meshes, whose modular design is focused on its extensibility. The linear elastostatic and Poisson problems in two dimensions have been chosen as the starting stage for the development of this library. The theory of the VEM, upon which Veamy is built, is presented using a notation and a terminology that resemble the language of the finite element method (FEM) in engineering analysis. Several examples are provided to demonstrate the usage of Veamy, and in particular, one of them features the interaction between Veamy and the polygonal mesh generator PolyMesher. A computational performance comparison between VEM and FEM is also conducted. Veamy is free and open source software.
  • This work studies the linear approximation of high-dimensional dynamical systems using low-rank dynamic mode decomposition (DMD). Searching this approximation in a data-driven approach can be formalised as attempting to solve a low-rank constrained optimisation problem. This problem is non-convex and state-of-the-art algorithms are all sub-optimal. This paper shows that there exists a closed-form solution, which can be computed in polynomial-time, and characterises the $\ell_2$-norm of the optimal approximation error. The theoretical results serve to design low-complexity algorithms building reduced models from the optimal solution, based on singular value decomposition or low-rank DMD. The algorithms are evaluated by numerical simulations using synthetic and physical data benchmarks.
  • We consider importance sampling to estimate the probability $\mu$ of a union of $J$ rare events $H_j$ defined by a random variable $\boldsymbol{x}$. The sampler we study has been used in spatial statistics, genomics and combinatorics going back at least to Karp and Luby (1983). It works by sampling one event at random, then sampling $\boldsymbol{x}$ conditionally on that event happening and it constructs an unbiased estimate of $\mu$ by multiplying an inverse moment of the number of occuring events by the union bound. We prove some variance bounds for this sampler. For a sample size of $n$, it has a variance no larger than $\mu(\bar\mu-\mu)/n$ where $\bar\mu$ is the union bound. It also has a coefficient of variation no larger than $\sqrt{(J+J^{-1}-2)/(4n)}$ regardless of the overlap pattern among the $J$ events. Our motivating problem comes from power system reliability, where the phase differences between connected nodes have a joint Gaussian distribution and the $J$ rare events arise from unacceptably large phase differences. In the grid reliability problems even some events defined by $5772$ constraints in $326$ dimensions, with probability below $10^{-22}$, are estimated with a coefficient of variation of about $0.0024$ with only $n=10{,}000$ sample values.
  • We propose a simple algorithm devoted to locate the "corner" of an L-curve, a function often used to chose the correct regularization parameter for the solution of ill-posed problems. The algorithm involves the Menger curvature of a circumcircle and the golden section search method. It efficiently locates the regularization parameter value corresponding to the maximum positive curvature region of the L-curve. As an example, the application of the algorithm to the data processing of an electrical resistance tomography experiment on thin conductive films is reported.
  • We present a full implementation of the parareal algorithm---an integration technique to solve differential equations in parallel---in the Julia programming language for a fully general, first-order, initial-value problem. We provide a brief overview of Julia---a concurrent programming language for scientific computing. Our implementation of the parareal algorithm accepts both coarse and fine integrators as functional arguments. We use Euler's method and another Runge-Kutta integration technique as the integrators in our experiments. We also present a simulation of the algorithm for purposes of pedagogy and as a tool for investigating the performance of the algorithm.
  • This work is devoted to the design of interior penalty discontinuous Galerkin (dG) schemes that preserve maximum principles at the discrete level for the steady transport and convection-diffusion problems and the respective transient problems with implicit time integration. Monotonic schemes that combine explicit time stepping with dG space discretization are very common, but the design of such schemes for implicit time stepping is rare, and it had only been attained so far for 1D problems. The proposed scheme is based on an artificial diffusion that linearly depends on a shock detector that identifies the troublesome areas. In order to define the new shock detector, we have introduced the concept of discrete local extrema. The diffusion operator is a graph-Laplacian, instead of the more common finite element discretization of the Laplacian operator, which is essential to keep monotonicity on general meshes and in multi-dimension. The resulting nonlinear stabilization is non-smooth and nonlinear solvers can fail to converge. As a result, we propose a smoothed (twice differentiable) version of the nonlinear stabilization, which allows us to use Newton with line search nonlinear solvers and dramatically improve nonlinear convergence. A theoretical numerical analysis of the proposed schemes show that they satisfy the desired monotonicity properties. Further, the resulting operator is Lipschitz continuous and there exists at least one solution of the discrete problem, even in the non-smooth version. We provide a set of numerical results to support our findings.
  • Tensors are a natural way to express correlations among many physical variables, but storing tensors in a computer naively requires memory which scales exponentially in the rank of the tensor. This is not optimal, as the required memory is actually set not by the rank but by the mutual information amongst the variables in question. Representations such as the tensor tree perform near-optimally when the tree decomposition is chosen to reflect the correlation structure in question, but making such a choice is non-trivial and good heuristics remain highly context-specific. In this work I present two new algorithms for choosing efficient tree decompositions, independent of the physical context of the tensor. The first is a brute-force algorithm which produces optimal decompositions up to truncation error but is generally impractical for high-rank tensors, as the number of possible choices grows exponentially in rank. The second is a greedy algorithm, and while it is not optimal it performs extremely well in numerical experiments while having runtime which makes it practical even for tensors of very high rank.
  • This work proposes a space-time least-squares Petrov-Galerkin (ST-LSPG) projection method for model reduction of nonlinear dynamical systems. In contrast to typical nonlinear model-reduction methods that first apply (Petrov-)Galerkin projection in the spatial dimension and subsequently apply time integration to numerically resolve the resulting low-dimensional dynamical system, the proposed method applies projection in space and time simultaneously. To accomplish this, the method first introduces a low-dimensional space-time trial subspace, which can be obtained by computing tensor decompositions of state-snapshot data. The method then computes discrete-optimal approximations in this space-time trial subspace by minimizing the residual arising after time discretization over all space and time in a weighted $\ell^2$-norm. This norm can be defined to enable complexity reduction (i.e., hyper-reduction) in time, which leads to space-time collocation and space-time GNAT variants of the ST-LSPG method. Advantages of the approach relative to typical spatial-projection-based nonlinear model reduction methods such as Galerkin projection and least-squares Petrov-Galerkin projection include: (1) a reduction of both the spatial and temporal dimensions of the dynamical system, (2) the removal of spurious temporal modes (e.g., unstable growth) from the state space, and (3) error bounds that exhibit slower growth in time. Numerical examples performed on model problems in fluid dynamics demonstrate the ability of the method to generate orders-of-magnitude computational savings relative to spatial-projection-based reduced-order models without sacrificing accuracy.
  • Numerical accuracy of floating point computation is a well studied topic which has not made its way to the end-user in scientific computing. Yet, it has become a critical issue with the recent requirements for code modernization to harness new highly parallel hardware and perform higher resolution computation. To democratize numerical accuracy analysis, it is important to propose tools and methodologies to study large use cases in a reliable and automatic way. In this paper, we propose verificarlo, an extension to the LLVM compiler to automatically use Monte Carlo Arithmetic in a transparent way for the end-user. It supports all the major languages including C, C++, and Fortran. Unlike source-to-source approaches, our implementation captures the influence of compiler optimizations on the numerical accuracy. We illustrate how Monte Carlo Arithmetic using the verificarlo tool outperforms the existing approaches on various use cases and is a step toward automatic numerical analysis.
  • Functions of one or more variables are usually approximated with a basis: a complete, linearly-independent system of functions that spans a suitable function space. The topic of this paper is the numerical approximation of functions using the more general notion of frames: that is, complete systems that are generally redundant but provide infinite representations with bounded coefficients. While frames are well-known in image and signal processing, coding theory and other areas of applied mathematics, their use in numerical analysis is far less widespread. Yet, as we show via a series of examples, frames are more flexible than bases, and can be constructed easily in a range of problems where finding orthonormal bases with desirable properties (rapid convergence, high resolution power, etc.) is difficult or impossible. A key concern when using frames is that computing a best approximation requires solving an ill-conditioned linear system. Nonetheless, we construct a frame approximation via regularization with bounded condition number (with respect to perturbations in the data), and which approximates any function up to an error of order $\sqrt{\epsilon}$, or even of order $\epsilon$ with suitable modifications. Here $\epsilon$ is a threshold value that can be chosen by the user. Crucially, rate of decay of the error down to this level is determined by the existence of approximate representations of $f$ in the frame possessing small-norm coefficients. We demonstrate the existence of such representations in all of our examples. Overall, our analysis suggests that frames are a natural generalization of bases in which to develop numerical approximation. In particular, even in the presence of severely ill-conditioned linear systems, the frame condition imposes sufficient mathematical structure in order to give rise to accurate, well-conditioned approximations.
  • There is resurging interest, in statistics and machine learning, in solvers for ordinary differential equations (ODEs) that return probability measures instead of point estimates. Recently, Conrad et al. introduced a sampling-based class of methods that are 'well-calibrated' in a specific sense. But the computational cost of these methods is significantly above that of classic methods. On the other hand, Schober et al. pointed out a precise connection between classic Runge-Kutta ODE solvers and Gaussian filters, which gives only a rough probabilistic calibration, but at negligible cost overhead. By formulating the solution of ODEs as approximate inference in linear Gaussian SDEs, we investigate a range of probabilistic ODE solvers, that bridge the trade-off between computational cost and probabilistic calibration, and identify the inaccurate gradient measurement as the crucial source of uncertainty. We propose the novel filtering-based method Bayesian Quadrature filtering (BQF) which uses Bayesian quadrature to actively learn the imprecision in the gradient measurement by collecting multiple gradient evaluations.
  • This paper presents a convergence analysis of kernel-based quadrature rules in misspecified settings, focusing on deterministic quadrature in Sobolev spaces. In particular, we deal with misspecified settings where a test integrand is less smooth than a Sobolev RKHS based on which a quadrature rule is constructed. We provide convergence guarantees based on two different assumptions on a quadrature rule: one on quadrature weights, and the other on design points. More precisely, we show that convergence rates can be derived (i) if the sum of absolute weights remains constant (or does not increase quickly), or (ii) if the minimum distance between design points does not decrease very quickly. As a consequence of the latter result, we derive a rate of convergence for Bayesian quadrature in misspecified settings. We reveal a condition on design points to make Bayesian quadrature robust to misspecification, and show that, under this condition, it may adaptively achieve the optimal rate of convergence in the Sobolev space of a lesser order (i.e., of the unknown smoothness of a test integrand), under a slightly stronger regularity condition on the integrand.
  • We solve tensor balancing, rescaling an Nth order nonnegative tensor by multiplying N tensors of order N - 1 so that every fiber sums to one. This generalizes a fundamental process of matrix balancing used to compare matrices in a wide range of applications from biology to economics. We present an efficient balancing algorithm with quadratic convergence using Newton's method and show in numerical experiments that the proposed algorithm is several orders of magnitude faster than existing ones. To theoretically prove the correctness of the algorithm, we model tensors as probability distributions in a statistical manifold and realize tensor balancing as projection onto a submanifold. The key to our algorithm is that the gradient of the manifold, used as a Jacobian matrix in Newton's method, can be analytically obtained using the Moebius inversion formula, the essential of combinatorial mathematics. Our model is not limited to tensor balancing, but has a wide applicability as it includes various statistical and machine learning models such as weighted DAGs and Boltzmann machines.
  • The paper proposes a combination of the subdomain deflation method and local algebraic multigrid as a scalable distributed memory preconditioner that is able to solve large linear systems of equations. The implementation of the algorithm is made available for the community as part of an open source AMGCL library. The solution targets both homogeneous (CPU-only) and heterogeneous (CPU/GPU) systems, employing hybrid MPI/OpenMP approach in the former and a combination of MPI, OpenMP, and CUDA in the latter cases. The use of OpenMP minimizes the number of MPI processes, thus reducing the communication overhead of the deflation method and improving both weak and strong scalability of the preconditioner. The examples of scalar, Poisson-like, systems as well as non-scalar problems, stemming out of the discretization of the Navier-Stokes equations, are considered in order to estimate performance of the implemented algorithm. A comparison with a traditional global AMG preconditioner based on a well-established Trilinos ML package is provided.
  • Identifying important components in a network is one of the major goals of network analysis. Popular and effective measures of importance of a node or a set of nodes are defined in terms of suitable entries of functions of matrices $f(A)$. These kinds of measures are particularly relevant as they are able to capture the global structure of connections involving a node. However, computing the entries of $f(A)$ requires a significant computational effort. In this work we address the problem of estimating the changes in the entries of $f(A)$ with respect to changes in the edge structure. Intuition suggests that, if the topology of connections in the new graph $\tilde G$ is not significantly distorted, relevant components in $G$ maintain their leading role in $\tilde G$. We propose several bounds giving mathematical reasoning to such intuition and showing, in particular, that the magnitude of the variation of the entry $f(A)_{k\ell}$ decays exponentially with the shortest-path distance in $G$ that separates either $k$ or $\ell$ from the set of nodes touched by the edges that are perturbed. Moreover, we propose a simple method that exploits the computation of $f(A)$ to simultaneously compute the all-pairs shortest-path distances of $G$, with essentially no additional cost. As the nodes whose edge connection tends to change more often or tends to be more often affected by noise have marginal role in the graph and are distant from the most central nodes, the proposed bounds are particularly relevant.
  • Inverse problems appear in many applications, such as image deblurring and inpainting. The common approach to address them is to design a specific algorithm for each problem. The Plug-and-Play (P&P) framework, which has been recently introduced, allows solving general inverse problems by leveraging the impressive capabilities of existing denoising algorithms. While this fresh strategy has found many applications, a burdensome parameter tuning is often required in order to obtain high-quality results. In this work, we propose an alternative method for solving inverse problems using off-the-shelf denoisers, which requires less parameter tuning. First, we transform a typical cost function, composed of fidelity and prior terms, into a closely related, novel optimization problem. Then, we propose an efficient minimization scheme with a plug-and-play property, i.e., the prior term is handled solely by a denoising operation. Finally, we present an automatic tuning mechanism to set the method's parameters. We provide a theoretical analysis of the method, and empirically demonstrate its competitiveness with task-specific techniques and the P&P approach for image inpainting and deblurring.
  • High order cumulant tensors carry information about statistics of non-normally distributed multivariate data. In this work we present a new efficient algorithm for calculation of cumulants of arbitrary order in a sliding window for data streams. We showed that this algorithms enables speedups of cumulants updates compared to current algorithms. This algorithm can be used for processing on-line high-frequency multivariate data and can find applications in, e.g., on-line signal filtering and classification of data streams. To present an application of this algorithm, we propose an estimator of non-Gaussianity of a data stream based on the norms of high-order cumulant tensors. We show how to detect the transition from Gaussian distributed data to non-Gaussian ones in a~data stream. In order to achieve high implementation efficiency of operations on super-symmetric tensors, such as cumulant tensors, we employ the block structure to store and calculate only one hyper-pyramid part of such tensors.