
This paper generalizes the borderedalgebraic knot invariant introduced in an
earlier paper, giving an invariant now with more algebraic structure. It also
introduces signs to define these invariants with integral coefficients. We
describe effective computations of the resulting invariant.

Maximum mean discrepancy (MMD), also called energy distance or Ndistance in
statistics and HilbertSchmidt independence criterion (HSIC), specifically
distance covariance in statistics, are among the most popular and successful
approaches to quantify the difference and independence of random variables,
respectively. Thanks to their kernelbased foundations, MMD and HSIC are
applicable on a wide variety of domains. Despite their tremendous success,
quite little is known about when HSIC characterizes independence and when MMD
with tensor product kernel can discriminate probability distributions. In this
paper, we answer these questions by studying various notions of characteristic
property of the tensor product kernel.

Mean embeddings provide an extremely flexible and powerful tool in machine
learning and statistics to represent probability distributions and define a
semimetric (MMD, maximum mean discrepancy; also called Ndistance or energy
distance), with numerous successful applications. The representation is
constructed as the expectation of the feature map defined by a kernel. As a
mean, its classical empirical estimator, however, can be arbitrary severely
affected even by a single outlier in case of unbounded features. To the best of
our knowledge, unfortunately even the consistency of the existing few
techniques trying to alleviate this serious sensitivity bottleneck is unknown.
In this paper, we show how the recently emerged principle of medianofmeans
can be used to design minimaxoptimal estimators for kernel mean embedding and
MMD, with finitesample strong outlierrobustness guarantees.

We define and study a bigraded knot invariant whose Euler characteristic is
the Alexander polynomial, closely connected to knot Floer homology. The
invariant is the homology of a chain complex whose generators correspond to
Kauffman states for a knot diagram. The definition uses decompositions of knot
diagrams: to a collection of points on the line, we associate a differential
graded algebra; to a partial knot diagram, we associate modules over the
algebra. The knot invariant is obtained from these modules by an appropriate
tensor product.

Knot Floer homology is an invariant for knots discovered by the authors and,
independently, Jacob Rasmussen. The discovery of this invariant grew naturally
out of studying how a certain threemanifold invariant, Heegaard Floer
homology, changes as the threemanifold undergoes Dehn surgery along a knot.
Since its original definition, thanks to the contributions of many researchers,
knot Floer homology has emerged as a useful tool for studying knots in its own
right. We give here a few selected highlights of this theory, and then move on
to some new algebraic developments in the computation of knot Floer homology.

We modify the construction of knot Floer homology to produce a oneparameter
family of homologies for knots in the threesphere. These invariants can be
used to give homomorphisms from the smooth concordance group to the integers,
giving bounds on the fourball genus and the concordance genus of knots. We
give some applications of these homomorphisms.

We propose a novel adaptive test of goodnessoffit, with computational cost
linear in the number of samples. We learn the test features that best indicate
the differences between observed samples and a reference model, by minimizing
the false negative rate. These features are constructed via Stein's method,
meaning that it is not necessary to compute the normalising constant of the
model. We analyse the asymptotic Bahadur efficiency of the new test, and prove
that under a meanshift alternative, our test always has greater relative
efficiency than a previous lineartime kernel test, regardless of the choice of
parameters for that test. In experiments, the performance of our method exceeds
that of the earlier lineartime test, and matches or exceeds the power of a
quadratictime kernel test. In high dimensions and where model structure may be
exploited, our goodness of fit test performs far better than a quadratictime
twosample test based on the Maximum Mean Discrepancy, with samples drawn from
the model.

Two semimetrics on probability distributions are proposed, given as the sum
of differences of expectations of analytic functions evaluated at spatial or
frequency locations (i.e, features). The features are chosen so as to maximize
the distinguishability of the distributions, by optimizing a lower bound on
test power for a statistical test using these features. The result is a
parsimonious and interpretable indication of how and where two distributions
differ locally. An empirical estimate of the test power criterion converges
with increasing sample size, ensuring the quality of the returned features. In
realworld benchmarks on highdimensional text and image data, lineartime
tests using the proposed semimetrics achieve comparable performance to the
stateoftheart quadratictime maximum mean discrepancy test, while returning
humaninterpretable features that explain the test results.

We focus on the distribution regression problem: regressing to vectorvalued
outputs from probability measures. Many important machine learning and
statistical tasks fit into this framework, including multiinstance learning
and point estimation problems without analytical solution (such as
hyperparameter or entropy estimation). Despite the large number of available
heuristics in the literature, the inherent twostage sampled nature of the
problem makes the theoretical analysis quite challenging, since in practice
only samples from sampled distributions are observable, and the estimates have
to rely on similarities computed between sets of points. To the best of our
knowledge, the only existing technique with consistency guarantees for
distribution regression requires kernel density estimation as an intermediate
step (which often performs poorly in practice), and the domain of the
distributions to be compact Euclidean. In this paper, we study a simple,
analytically computable, ridge regressionbased alternative to distribution
regression, where we embed the distributions to a reproducing kernel Hilbert
space, and learn the regressor from the embeddings to the outputs. Our main
contribution is to prove that this scheme is consistent in the twostage
sampled setup under mild conditions (on separable topological domains enriched
with kernels): we present an exact computationalstatistical efficiency
tradeoff analysis showing that our estimator is able to match the onestage
sampled minimax optimal rate [Caponnetto and De Vito, 2007; Steinwart et al.,
2009]. This result answers a 17yearold open question, establishing the
consistency of the classical set kernel [Haussler, 1999; Gaertner et. al, 2002]
in regression. We also cover consistency for more recent kernels on
distributions, including those due to [Christmann and Steinwart, 2010].

A new computationally efficient dependence measure, and an adaptive
statistical test of independence, are proposed. The dependence measure is the
difference between analytic embeddings of the joint distribution and the
product of the marginals, evaluated at a finite set of locations (features).
These features are chosen so as to maximize a lower bound on the test power,
resulting in a test that is dataefficient, and that runs in linear time (with
respect to the sample size n). The optimized features can be interpreted as
evidence to reject the null hypothesis, indicating regions in the joint domain
where the joint distribution and the product of the marginals differ most.
Consistency of the independence test is established, for an appropriate choice
of features. In realworld benchmarks, independence tests using the optimized
features perform comparably to the stateoftheart quadratictime HSIC test,
and outperform competing O(n) and O(n log n) tests.

We study the relationship between BarNatan's perturbation in Khovanov
homology and Szabo's geometric spectral sequence, and construct a link
invariant that generalizes both into a common theory. We study a few properties
of the new invariant, and introduce a family of sinvariants from the new
theory in the same spirit as Rasmussen's sinvariant.

We introduce the Locally Linear Latent Variable Model (LLLVM), a
probabilistic model for nonlinear manifold discovery that describes a joint
distribution over observations, their manifold coordinates and locally linear
maps conditioned on a set of neighbourhood relationships. The model allows
straightforward variational optimisation of the posterior distribution on
coordinates and locally linear maps from the latent space to the observation
space given the data. Thus, the LLLVM encapsulates the localgeometry
preserving intuitions that underlie nonprobabilistic methods such as locally
linear embedding (LLE). Its probabilistic semantics make it easy to evaluate
the quality of hypothesised neighbourhood relationships, select the intrinsic
dimensionality of the manifold, construct outofsample extensions and to
combine the manifold model with additional probabilistic models that capture
the structure of coordinates within the manifold.

We propose Kernel Hamiltonian Monte Carlo (KMC), a gradientfree adaptive
MCMC algorithm based on Hamiltonian Monte Carlo (HMC). On target densities
where classical HMC is not an option due to intractable gradients, KMC
adaptively learns the target's gradient structure by fitting an exponential
family model in a Reproducing Kernel Hilbert Space. Computational costs are
reduced by two novel efficient approximations to this gradient. While being
asymptotically exact, KMC mimics HMC in terms of sampling efficiency, and
offers substantial mixing improvements over stateoftheart gradient free
samplers. We support our claims with experimental studies on both toy and
realworld applications, including Approximate Bayesian Computation and
exactapproximate MCMC.

Kernel methods represent one of the most powerful tools in machine learning
to tackle problems expressed in terms of function values and derivatives due to
their capability to represent and model complex relations. While these methods
show good versatility, they are computationally intensive and have poor
scalability to large data as they require operations on Gram matrices. In order
to mitigate this serious computational limitation, recently randomized
constructions have been proposed in the literature, which allow the application
of fast linear algorithms. Random Fourier features (RFF) are among the most
popular and widely applied constructions: they provide an easily computable,
lowdimensional feature representation for shiftinvariant kernels. Despite the
popularity of RFFs, very little is understood theoretically about their
approximation quality. In this paper, we provide a detailed finitesample
theoretical analysis about the approximation quality of RFFs by (i)
establishing optimal (in terms of the RFF dimension, and growing set size)
performance guarantees in uniform norm, and (ii) presenting guarantees in $L^r$
($1\le r<\infty$) norms. We also propose an RFF approximation to derivatives of
a kernel with a theoretical study on its approximation quality.

In an earlier work, we introduced a family of tmodified knot Floer
homologies, defined by modifying the construction of knot Floer homology
HFKminus. The resulting groups were then used to define concordance
homomorphisms indexed by t in [0,2]. In the present work we elaborate on the
special case t=1, and call the corresponding modified knot Floer homology the
unoriented knot Floer homology. Using elementary methods (based on grid
diagrams and normal forms for surface cobordisms), we show that the resulting
concordance homomorphism gives a lower bound for the smooth 4dimensional
crosscap number of a knot K  the minimal first Betti number of a smooth
(possibly nonorientable) surface in the 4disk that meets the boundary
3sphere along the given knot K.

We propose an efficient nonparametric strategy for learning a message
operator in expectation propagation (EP), which takes as input the set of
incoming messages to a factor node, and produces an outgoing message as output.
This learned operator replaces the multivariate integral required in classical
EP, which may not have an analytic expression. We use kernelbased regression,
which is trained on a set of probability distributions representing the
incoming messages, and the associated outgoing messages. The kernel approach
has two main advantages: first, it is fast, as it is implemented using a novel
twolayer random feature representation of the input message distributions;
second, it has principled uncertainty estimates, and can be cheaply updated
online, meaning it can request and incorporate new training data when it
encounters inputs on which it is uncertain. In experiments, our approach is
able to solve learning problems where a single message operator is required for
multiple, substantially different data sets (logistic regression for a variety
of classification problems), where it is essential to accurately assess
uncertainty and to efficiently and robustly update the message operator.

We focus on the distribution regression problem: regressing to a realvalued
response from a probability distribution. Although there exist a large number
of similarity measures between distributions, very little is known about their
generalization performance in specific learning tasks. Learning problems
formulated on distributions have an inherent twostage sampled difficulty: in
practice only samples from sampled distributions are observable, and one has to
build an estimate on similarities computed between sets of points. To the best
of our knowledge, the only existing method with consistency guarantees for
distribution regression requires kernel density estimation as an intermediate
step (which suffers from slow convergence issues in high dimensions), and the
domain of the distributions to be compact Euclidean. In this paper, we provide
theoretical guarantees for a remarkably simple algorithmic alternative to solve
the distribution regression problem: embed the distributions to a reproducing
kernel Hilbert space, and learn a ridge regressor from the embeddings to the
outputs. Our main contribution is to prove the consistency of this technique in
the twostage sampled setting under mild conditions (on separable, topological
domains endowed with kernels). For a given total number of observations, we
derive convergence rates as an explicit function of the problem difficulty. As
a special case, we answer a 15yearold open question: we establish the
consistency of the classical set kernel [Haussler, 1999; Gartner et. al, 2002]
in regression, and cover more recent kernels on distributions, including those
due to [Christmann and Steinwart, 2010].

We present ITE (information theoretical estimators) a free and open source,
multiplatform, Matlab/Octave toolbox that is capable of estimating many
different variants of entropy, mutual information, divergence, association
measures, cross quantities, and kernels on distributions. Thanks to its highly
modular design, ITE supports additionally (i) the combinations of the
estimation techniques, (ii) the easy construction and embedding of novel
information theoretical estimators, and (iii) their immediate application in
information theoretical optimization problems. ITE also includes a prototype
application in a central problem class of signal processing, independent
subspace analysis and its extensions.

The aim of this paper is to introduce and study a geometric spectral sequence
in Khovanov homology. The construction was motivated by a similar spectral
sequence from Khovanov homology to Heegaard Floer homology.

Estimation of facial expressions, as spatiotemporal processes, can take
advantage of kernel methods if one considers facial landmark positions and
their motion in 3D space. We applied support vector classification with kernels
derived from dynamic timewarping similarity measures. We achieved over 99%
accuracy  measured by area under ROC curve  using only the 'motion pattern'
of the PCA compressed representation of the marker point vector, the socalled
shape parameters. Beyond the classification of full motion patterns, several
expressions were recognized with over 90% accuracy in as few as 56 frames from
their onset, about 200 milliseconds.

We provide an intergral lift of the combinatorial definition of Heegaard
Floer homology for nice diagrams, and show that the proof of independence using
convenient diagrams adapts to this setting.

Information theoretical measures, such as entropy, mutual information, and
various divergences, exhibit robust characteristics in image registration
applications. However, the estimation of these quantities is computationally
intensive in high dimensions. On the other hand, consistent estimation from
pairwise distances of the sample points is possible, which suits random
projection (RP) based low dimensional embeddings. We adapt the RP technique to
this task by means of a simple ensemble method. To the best of our knowledge,
this is the first distributed, RP based information theoretical image
registration approach. The efficiency of the method is demonstrated through
numerical examples.

Assume that \Gamma_{v_0} is a tree with vertex set Vert(\Gamma_{v_0})={v_0,
v_1,..., v_n}, and with an integral framing (weight) attached to each vertex
except v_0. Assume furthermore that the intersection matrix of
G=\Gamma_{v_0}{v_0} is negative definite. We define a filtration on the chain
complex computing the lattice homology of G and show how to use this
information in computing lattice homology groups of a negative definite graph
we get by attaching some framing to v_0. As a simple application we produce
families of graphs which have arbitrarily many bad vertices for which the
lattice homology groups are shown to be isomorphic to the corresponding
Heegaard Floer homology groups.

We show that the knot lattice homology of a knot in an Lspace is equivalent
to the knot Floer homology of the same knot (viewed these invariants as
filtered chain complexes over the polynomial ring Z/2Z [U]). Suppose that G is
a negative definite plumbing tree which contains a vertex w such that Gw is a
union of rational graphs. Using the identification of knot homologies we show
that for such graphs the lattice homology HF(G)$ is isomorphic to the Heegaard
Floer homology HF^(Y_G) of the corresponding rational homology sphere Y_G.

Using the link surgery formula for Heegaard Floer homology we find a spectral
sequence from the lattice homology of a plumbing tree to the Heegaard Floer
homology of the corresponding 3manifold. This spectral sequence shows that for
graphs with at most two "bad" vertices, the lattice homology is isomorphic to
the Heegaard Floer homology of the underlying 3manifold.