
This summary of the doctoral thesis is created to emphasize the close
connection of the proposed spectral analysis method with the Discrete Fourier
Transform (DFT), the most extensively studied and frequently used approach in
the history of signal processing. It is shown that in a typical application
case, where uniform data readings are transformed to the same number of
uniformly spaced frequencies, the results of the classical DFT and proposed
approach coincide. The difference in performance appears when the length of the
DFT is selected to be greater than the length of the data. The DFT solves the
unknown data problem by padding readings with zeros up to the length of the
DFT, while the proposed Extended DFT (EDFT) deals with this situation in a
different way, it uses the Fourier integral transform as a target and optimizes
the transform basis in the extended frequency range without putting such
restrictions on the time domain. Consequently, the Inverse DFT (IDFT) applied
to the result of EDFT returns not only known readings, but also the
extrapolated data, where classical DFT is able to give back just zeros, and
higher resolution are achieved at frequencies where the data has been
successfully extended. It has been demonstrated that EDFT able to process data
with missing readings or gaps inside or even nonuniformly distributed data.
Thus, EDFT significantly extends the usability of the DFTbased methods, where
previously these approaches have been considered as not applicable. The EDFT
founds the solution in an iterative way and requires repeated calculations to
get the adaptive basis, and this makes it numerical complexity much higher
compared to DFT. This disadvantage was a serious problem in the 1990s, when the
method has been proposed. Fortunately, since then the power of computers has
increased so much that nowadays EDFT application could be considered as a real
alternative.

Bayesian matrix factorization (BMF) is a powerful tool for producing lowrank
representations of matrices and for predicting missing values and providing
confidence intervals. Scaling up the posterior inference for massivescale
matrices is challenging and requires distributing both data and computation
over many workers, making communication the main computational bottleneck.
Embarrassingly parallel inference would remove the communication needed, by
using completely independent computations on different data subsets, but it
suffers from the inherent unidentifiability of BMF solutions. We introduce a
hierarchical decomposition of the joint posterior distribution, which couples
the subset inferences, allowing for embarrassingly parallel computations in a
sequence of at most three stages. Using an efficient approximate
implementation, we show improvements empirically on both real and simulated
data. Our distributed approach is able to achieve a speedup of almost an order
of magnitude over the full posterior, with a negligible effect on predictive
accuracy. Our method outperforms stateoftheart embarrassingly parallel MCMC
methods in accuracy, and achieves results competitive to other available
distributed and parallel implementations of BMF.

Deep neural networks have become invaluable tools for supervised machine
learning, e.g., classification of text or images. While often offering superior
results over traditional techniques and successfully expressing complicated
patterns in data, deep architectures are known to be challenging to design and
train such that they generalize well to new data. Important issues with deep
architectures are numerical instabilities in derivativebased learning
algorithms commonly called exploding or vanishing gradients. In this paper we
propose new forward propagation techniques inspired by systems of Ordinary
Differential Equations (ODE) that overcome this challenge and lead to
wellposed learning problems for arbitrarily deep networks.
The backbone of our approach is our interpretation of deep learning as a
parameter estimation problem of nonlinear dynamical systems. Given this
formulation, we analyze stability and wellposedness of deep learning and use
this new understanding to develop new network architectures. We relate the
exploding and vanishing gradient phenomenon to the stability of the discrete
ODE and present several strategies for stabilizing deep learning for very deep
networks. While our new architectures restrict the solution space, several
numerical experiments show their competitiveness with stateoftheart
networks.

In numerical timeintegration with implicitexplicit (IMEX) methods, a
withinstep adaptable decomposition called residual balanced decomposition is
introduced. With this decomposition, the requirement of a small enough residual
in the iterative solver can be removed, consequently, this allows to exchange
stability for efficiency. This decomposition transfers any residual occurring
in the implicit equation of the implicitstep into the explicit part of the
decomposition. By balancing the residual, the accuracy of the local truncation
error of the timestepping method becomes independent from the accuracy by
which the implicit equation is solved. In order to balance the residual, the
original IMEX decomposition is adjusted after the iterative solver has been
stopped. For this to work, the traditional IMEX timestepping algorithm needs
to be changed. We call this new method the shortcutIMEX (SIMEX). SIMEX can
gain computational efficiency by exploring the tradeoff between the
computational effort placed in solving the implicit equation and the size of
the numerically stable timestep. Typically, increasing the number of solver
iterations increases the largest stable stepsize. Both multistep and
RungeKutta (RK) methods are suitable for use with SIMEX. Here, we show the
efficiency of a SIMEXRK method in overcoming parabolic stiffness by applying
it to a nonlinear reactionadvectiondiffusion equation. In order to define a
stability region for SIMEX, a region in the complex plane is depicted by
applying SIMEX to a suitable PDE model containing diffusion and dispersion. A
myriad of stability regions can be reached by changing the RK tableau and the
solver.

This report describes the computation of gradients by algorithmic
differentiation for statistically optimum beamforming operations. Especially
the derivation of complexvalued functions is a key component of this approach.
Therefore the realvalued algorithmic differentiation is extended via the
complexvalued chain rule. In addition to the basic mathematic operations the
derivative of the eigenvalue problem with complexvalued eigenvectors is one of
the key results of this report. The potential of this approach is shown with
experimental results on the CHiME3 challenge database. There, the beamforming
task is used as a frontend for an ASR system. With the developed derivatives a
joint optimization of a speech enhancement and speech recognition system w.r.t.
the recognition optimization criterion is possible.

This work formalizes "Exact Real Computation" (ERC): a paradigm combining (i)
ALGEBRAIC imperative programming of/over abstract data types (ADTs) for
CONTINUOUS structures with (ii) a selection and sound semantics of primitives
computable in the sense of Recursive ANALYSIS, that is, by means of
approximations  yet presented to the user as exact.
We specify a small imperative programming language for the ADT of real (i.e.,
including transcendental) numbers with rigorous semantics: arguments are
provided, passed to and received from calls to functions (like $e^x$), and
operated on EXACTLY  with partial inequality predicate and multivalued binary
SELECT and continuous conditional (aka PARALLELIF) operations  yet REALIZING
a function (again like $e^x$) requires only to APPROXIMATE its return value up
to guaranteed absolute error $2^p$ for any given integer $p$: closure under
composition is implicit. We prove this language TuringCOMPLETE: it can express
precisely those partial real functions computable in the sense of Recursive
Analysis; similarly for functionALs.
Three basic numerical problems demonstrate both the convenience and novel
controlflow considerations of this approach to Reliable Numerics: multivalued
integer rounding, solving systems of linear equations, and simple root finding.
For rigorously specifying and arguing about such (nonextensional)
computations, we propose a firstorder theory over two sorts: integers and
reals; and prove it both decidable and 'model complete': thus reflecting the
elegance inherent to real (as opposed to rational/floating point) numbers.
Rules of Hoare Logic are extended to support formal correctness proofs in ERC.

Several test function suites are being used for numerical benchmarking of
multiobjective optimization algorithms. While they have some desirable
properties, like wellunderstood Pareto sets and Pareto fronts of various
shapes, most of the currently used functions possess characteristics that are
arguably underrepresented in realworld problems. They mainly stem from the
easier construction of such functions and result in improbable properties such
as separability, optima located exactly at the boundary constraints, and the
existence of variables that solely control the distance between a solution and
the Pareto front. Here, we propose an alternative way to constructing
multiobjective problemsby combining existing singleobjective problems from
the literature. We describe in particular the bbobbiobj test suite with 55
biobjective functions in continuous domain, and its extended version with 92
biobjective functions (bbobbiobjext). Both test suites have been implemented
in the COCO platform for blackbox optimization benchmarking. Finally, we
recommend a general procedure for creating test suites for an arbitrary number
of objectives. Besides providing the formal function definitions and presenting
their (known) properties, this paper also aims at giving the rationale behind
our approach in terms of groups of functions with similar properties, objective
space normalization, and problem instances. The latter allows us to easily
compare the performance of deterministic and stochastic solvers, which is an
often overlooked issue in benchmarking.

We study and develop (stochastic) primaldual blockcoordinate descent
methods for convex problems based on the method due to Chambolle and Pock. Our
methods have known convergence rates for the iterates and the ergodic gap:
$O(1/N^2)$ if each block is strongly convex, $O(1/N)$ if no convexity is
present, and more generally a mixed rate $O(1/N^2)+O(1/N)$ for strongly convex
blocks, if only some blocks are strongly convex. Additional novelties of our
methods include blockwiseadapted step lengths and acceleration, as well as the
ability to update both the primal and dual variables randomly in blocks under a
very light compatibility condition. In other words, these variants of our
methods are doublystochastic. We test the proposed methods on various image
processing problems, where we employ pixelwiseadapted acceleration.

This paper summarizes the development of Veamy, an objectoriented C++
library for the virtual element method (VEM) on general polygonal meshes, whose
modular design is focused on its extensibility. The linear elastostatic and
Poisson problems in two dimensions have been chosen as the starting stage for
the development of this library. The theory of the VEM, upon which Veamy is
built, is presented using a notation and a terminology that resemble the
language of the finite element method (FEM) in engineering analysis. Several
examples are provided to demonstrate the usage of Veamy, and in particular, one
of them features the interaction between Veamy and the polygonal mesh generator
PolyMesher. A computational performance comparison between VEM and FEM is also
conducted. Veamy is free and open source software.

This work studies the linear approximation of highdimensional dynamical
systems using lowrank dynamic mode decomposition (DMD). Searching this
approximation in a datadriven approach can be formalised as attempting to
solve a lowrank constrained optimisation problem. This problem is nonconvex
and stateoftheart algorithms are all suboptimal. This paper shows that
there exists a closedform solution, which can be computed in polynomialtime,
and characterises the $\ell_2$norm of the optimal approximation error. The
theoretical results serve to design lowcomplexity algorithms building reduced
models from the optimal solution, based on singular value decomposition or
lowrank DMD. The algorithms are evaluated by numerical simulations using
synthetic and physical data benchmarks.

We consider importance sampling to estimate the probability $\mu$ of a union
of $J$ rare events $H_j$ defined by a random variable $\boldsymbol{x}$. The
sampler we study has been used in spatial statistics, genomics and
combinatorics going back at least to Karp and Luby (1983). It works by sampling
one event at random, then sampling $\boldsymbol{x}$ conditionally on that event
happening and it constructs an unbiased estimate of $\mu$ by multiplying an
inverse moment of the number of occuring events by the union bound. We prove
some variance bounds for this sampler. For a sample size of $n$, it has a
variance no larger than $\mu(\bar\mu\mu)/n$ where $\bar\mu$ is the union
bound. It also has a coefficient of variation no larger than
$\sqrt{(J+J^{1}2)/(4n)}$ regardless of the overlap pattern among the $J$
events. Our motivating problem comes from power system reliability, where the
phase differences between connected nodes have a joint Gaussian distribution
and the $J$ rare events arise from unacceptably large phase differences. In the
grid reliability problems even some events defined by $5772$ constraints in
$326$ dimensions, with probability below $10^{22}$, are estimated with a
coefficient of variation of about $0.0024$ with only $n=10{,}000$ sample
values.

We propose a simple algorithm devoted to locate the "corner" of an Lcurve, a
function often used to chose the correct regularization parameter for the
solution of illposed problems. The algorithm involves the Menger curvature of
a circumcircle and the golden section search method. It efficiently locates the
regularization parameter value corresponding to the maximum positive curvature
region of the Lcurve. As an example, the application of the algorithm to the
data processing of an electrical resistance tomography experiment on thin
conductive films is reported.

We present a full implementation of the parareal algorithman integration
technique to solve differential equations in parallelin the Julia
programming language for a fully general, firstorder, initialvalue problem.
We provide a brief overview of Juliaa concurrent programming language for
scientific computing. Our implementation of the parareal algorithm accepts both
coarse and fine integrators as functional arguments. We use Euler's method and
another RungeKutta integration technique as the integrators in our
experiments. We also present a simulation of the algorithm for purposes of
pedagogy and as a tool for investigating the performance of the algorithm.

This work is devoted to the design of interior penalty discontinuous Galerkin
(dG) schemes that preserve maximum principles at the discrete level for the
steady transport and convectiondiffusion problems and the respective transient
problems with implicit time integration. Monotonic schemes that combine
explicit time stepping with dG space discretization are very common, but the
design of such schemes for implicit time stepping is rare, and it had only been
attained so far for 1D problems. The proposed scheme is based on an artificial
diffusion that linearly depends on a shock detector that identifies the
troublesome areas. In order to define the new shock detector, we have
introduced the concept of discrete local extrema. The diffusion operator is a
graphLaplacian, instead of the more common finite element discretization of
the Laplacian operator, which is essential to keep monotonicity on general
meshes and in multidimension. The resulting nonlinear stabilization is
nonsmooth and nonlinear solvers can fail to converge. As a result, we propose
a smoothed (twice differentiable) version of the nonlinear stabilization, which
allows us to use Newton with line search nonlinear solvers and dramatically
improve nonlinear convergence. A theoretical numerical analysis of the proposed
schemes show that they satisfy the desired monotonicity properties. Further,
the resulting operator is Lipschitz continuous and there exists at least one
solution of the discrete problem, even in the nonsmooth version. We provide a
set of numerical results to support our findings.

Tensors are a natural way to express correlations among many physical
variables, but storing tensors in a computer naively requires memory which
scales exponentially in the rank of the tensor. This is not optimal, as the
required memory is actually set not by the rank but by the mutual information
amongst the variables in question. Representations such as the tensor tree
perform nearoptimally when the tree decomposition is chosen to reflect the
correlation structure in question, but making such a choice is nontrivial and
good heuristics remain highly contextspecific. In this work I present two new
algorithms for choosing efficient tree decompositions, independent of the
physical context of the tensor. The first is a bruteforce algorithm which
produces optimal decompositions up to truncation error but is generally
impractical for highrank tensors, as the number of possible choices grows
exponentially in rank. The second is a greedy algorithm, and while it is not
optimal it performs extremely well in numerical experiments while having
runtime which makes it practical even for tensors of very high rank.

This work proposes a spacetime leastsquares PetrovGalerkin (STLSPG)
projection method for model reduction of nonlinear dynamical systems. In
contrast to typical nonlinear modelreduction methods that first apply
(Petrov)Galerkin projection in the spatial dimension and subsequently apply
time integration to numerically resolve the resulting lowdimensional dynamical
system, the proposed method applies projection in space and time
simultaneously. To accomplish this, the method first introduces a
lowdimensional spacetime trial subspace, which can be obtained by computing
tensor decompositions of statesnapshot data. The method then computes
discreteoptimal approximations in this spacetime trial subspace by minimizing
the residual arising after time discretization over all space and time in a
weighted $\ell^2$norm. This norm can be defined to enable complexity reduction
(i.e., hyperreduction) in time, which leads to spacetime collocation and
spacetime GNAT variants of the STLSPG method. Advantages of the approach
relative to typical spatialprojectionbased nonlinear model reduction methods
such as Galerkin projection and leastsquares PetrovGalerkin projection
include: (1) a reduction of both the spatial and temporal dimensions of the
dynamical system, (2) the removal of spurious temporal modes (e.g., unstable
growth) from the state space, and (3) error bounds that exhibit slower growth
in time. Numerical examples performed on model problems in fluid dynamics
demonstrate the ability of the method to generate ordersofmagnitude
computational savings relative to spatialprojectionbased reducedorder models
without sacrificing accuracy.

Numerical accuracy of floating point computation is a well studied topic
which has not made its way to the enduser in scientific computing. Yet, it has
become a critical issue with the recent requirements for code modernization to
harness new highly parallel hardware and perform higher resolution computation.
To democratize numerical accuracy analysis, it is important to propose tools
and methodologies to study large use cases in a reliable and automatic way. In
this paper, we propose verificarlo, an extension to the LLVM compiler to
automatically use Monte Carlo Arithmetic in a transparent way for the enduser.
It supports all the major languages including C, C++, and Fortran. Unlike
sourcetosource approaches, our implementation captures the influence of
compiler optimizations on the numerical accuracy. We illustrate how Monte Carlo
Arithmetic using the verificarlo tool outperforms the existing approaches on
various use cases and is a step toward automatic numerical analysis.

Functions of one or more variables are usually approximated with a basis: a
complete, linearlyindependent system of functions that spans a suitable
function space. The topic of this paper is the numerical approximation of
functions using the more general notion of frames: that is, complete systems
that are generally redundant but provide infinite representations with bounded
coefficients. While frames are wellknown in image and signal processing,
coding theory and other areas of applied mathematics, their use in numerical
analysis is far less widespread. Yet, as we show via a series of examples,
frames are more flexible than bases, and can be constructed easily in a range
of problems where finding orthonormal bases with desirable properties (rapid
convergence, high resolution power, etc.) is difficult or impossible.
A key concern when using frames is that computing a best approximation
requires solving an illconditioned linear system. Nonetheless, we construct a
frame approximation via regularization with bounded condition number (with
respect to perturbations in the data), and which approximates any function up
to an error of order $\sqrt{\epsilon}$, or even of order $\epsilon$ with
suitable modifications. Here $\epsilon$ is a threshold value that can be chosen
by the user. Crucially, rate of decay of the error down to this level is
determined by the existence of approximate representations of $f$ in the frame
possessing smallnorm coefficients. We demonstrate the existence of such
representations in all of our examples. Overall, our analysis suggests that
frames are a natural generalization of bases in which to develop numerical
approximation. In particular, even in the presence of severely illconditioned
linear systems, the frame condition imposes sufficient mathematical structure
in order to give rise to accurate, wellconditioned approximations.

There is resurging interest, in statistics and machine learning, in solvers
for ordinary differential equations (ODEs) that return probability measures
instead of point estimates. Recently, Conrad et al. introduced a samplingbased
class of methods that are 'wellcalibrated' in a specific sense. But the
computational cost of these methods is significantly above that of classic
methods. On the other hand, Schober et al. pointed out a precise connection
between classic RungeKutta ODE solvers and Gaussian filters, which gives only
a rough probabilistic calibration, but at negligible cost overhead. By
formulating the solution of ODEs as approximate inference in linear Gaussian
SDEs, we investigate a range of probabilistic ODE solvers, that bridge the
tradeoff between computational cost and probabilistic calibration, and
identify the inaccurate gradient measurement as the crucial source of
uncertainty. We propose the novel filteringbased method Bayesian Quadrature
filtering (BQF) which uses Bayesian quadrature to actively learn the
imprecision in the gradient measurement by collecting multiple gradient
evaluations.

This paper presents a convergence analysis of kernelbased quadrature rules
in misspecified settings, focusing on deterministic quadrature in Sobolev
spaces. In particular, we deal with misspecified settings where a test
integrand is less smooth than a Sobolev RKHS based on which a quadrature rule
is constructed. We provide convergence guarantees based on two different
assumptions on a quadrature rule: one on quadrature weights, and the other on
design points. More precisely, we show that convergence rates can be derived
(i) if the sum of absolute weights remains constant (or does not increase
quickly), or (ii) if the minimum distance between design points does not
decrease very quickly. As a consequence of the latter result, we derive a rate
of convergence for Bayesian quadrature in misspecified settings. We reveal a
condition on design points to make Bayesian quadrature robust to
misspecification, and show that, under this condition, it may adaptively
achieve the optimal rate of convergence in the Sobolev space of a lesser order
(i.e., of the unknown smoothness of a test integrand), under a slightly
stronger regularity condition on the integrand.

We solve tensor balancing, rescaling an Nth order nonnegative tensor by
multiplying N tensors of order N  1 so that every fiber sums to one. This
generalizes a fundamental process of matrix balancing used to compare matrices
in a wide range of applications from biology to economics. We present an
efficient balancing algorithm with quadratic convergence using Newton's method
and show in numerical experiments that the proposed algorithm is several orders
of magnitude faster than existing ones. To theoretically prove the correctness
of the algorithm, we model tensors as probability distributions in a
statistical manifold and realize tensor balancing as projection onto a
submanifold. The key to our algorithm is that the gradient of the manifold,
used as a Jacobian matrix in Newton's method, can be analytically obtained
using the Moebius inversion formula, the essential of combinatorial
mathematics. Our model is not limited to tensor balancing, but has a wide
applicability as it includes various statistical and machine learning models
such as weighted DAGs and Boltzmann machines.

The paper proposes a combination of the subdomain deflation method and local
algebraic multigrid as a scalable distributed memory preconditioner that is
able to solve large linear systems of equations. The implementation of the
algorithm is made available for the community as part of an open source AMGCL
library. The solution targets both homogeneous (CPUonly) and heterogeneous
(CPU/GPU) systems, employing hybrid MPI/OpenMP approach in the former and a
combination of MPI, OpenMP, and CUDA in the latter cases. The use of OpenMP
minimizes the number of MPI processes, thus reducing the communication overhead
of the deflation method and improving both weak and strong scalability of the
preconditioner. The examples of scalar, Poissonlike, systems as well as
nonscalar problems, stemming out of the discretization of the NavierStokes
equations, are considered in order to estimate performance of the implemented
algorithm. A comparison with a traditional global AMG preconditioner based on a
wellestablished Trilinos ML package is provided.

Identifying important components in a network is one of the major goals of
network analysis. Popular and effective measures of importance of a node or a
set of nodes are defined in terms of suitable entries of functions of matrices
$f(A)$. These kinds of measures are particularly relevant as they are able to
capture the global structure of connections involving a node. However,
computing the entries of $f(A)$ requires a significant computational effort. In
this work we address the problem of estimating the changes in the entries of
$f(A)$ with respect to changes in the edge structure. Intuition suggests that,
if the topology of connections in the new graph $\tilde G$ is not significantly
distorted, relevant components in $G$ maintain their leading role in $\tilde
G$. We propose several bounds giving mathematical reasoning to such intuition
and showing, in particular, that the magnitude of the variation of the entry
$f(A)_{k\ell}$ decays exponentially with the shortestpath distance in $G$ that
separates either $k$ or $\ell$ from the set of nodes touched by the edges that
are perturbed. Moreover, we propose a simple method that exploits the
computation of $f(A)$ to simultaneously compute the allpairs shortestpath
distances of $G$, with essentially no additional cost. As the nodes whose edge
connection tends to change more often or tends to be more often affected by
noise have marginal role in the graph and are distant from the most central
nodes, the proposed bounds are particularly relevant.

Inverse problems appear in many applications, such as image deblurring and
inpainting. The common approach to address them is to design a specific
algorithm for each problem. The PlugandPlay (P&P) framework, which has been
recently introduced, allows solving general inverse problems by leveraging the
impressive capabilities of existing denoising algorithms. While this fresh
strategy has found many applications, a burdensome parameter tuning is often
required in order to obtain highquality results. In this work, we propose an
alternative method for solving inverse problems using offtheshelf denoisers,
which requires less parameter tuning. First, we transform a typical cost
function, composed of fidelity and prior terms, into a closely related, novel
optimization problem. Then, we propose an efficient minimization scheme with a
plugandplay property, i.e., the prior term is handled solely by a denoising
operation. Finally, we present an automatic tuning mechanism to set the
method's parameters. We provide a theoretical analysis of the method, and
empirically demonstrate its competitiveness with taskspecific techniques and
the P&P approach for image inpainting and deblurring.

High order cumulant tensors carry information about statistics of
nonnormally distributed multivariate data. In this work we present a new
efficient algorithm for calculation of cumulants of arbitrary order in a
sliding window for data streams. We showed that this algorithms enables
speedups of cumulants updates compared to current algorithms. This algorithm
can be used for processing online highfrequency multivariate data and can
find applications in, e.g., online signal filtering and classification of data
streams.
To present an application of this algorithm, we propose an estimator of
nonGaussianity of a data stream based on the norms of highorder cumulant
tensors.
We show how to detect the transition from Gaussian distributed data to
nonGaussian ones in a~data stream. In order to achieve high implementation
efficiency of operations on supersymmetric tensors, such as cumulant tensors,
we employ the block structure to store and calculate only one hyperpyramid
part of such tensors.