
This paper provides an analysis of the tradeoff between asymptotic bias
(suboptimality with unlimited data) and overfitting (additional suboptimality
due to limited data) in the context of reinforcement learning with partial
observability. Our theoretical analysis formally characterizes that while
potentially increasing the asymptotic bias, a smaller state representation
decreases the risk of overfitting. This analysis relies on expressing the
quality of a state representation by bounding L1 error terms of the associated
belief states. Theoretical results are empirically illustrated when the state
representation is a truncated history of observations, both on synthetic POMDPs
and on a largescale POMDP in the context of smartgrids, with realworld data.
Finally, similarly to known results in the fully observable setting, we also
briefly discuss and empirically illustrate how using function approximators and
adapting the discount factor may enhance the tradeoff between asymptotic bias
and overfitting in the partially observable context.

Tensor regression networks achieve high rate of compression of model
parameters in multilayer perceptrons (MLP) while having slight impact on
performances. Tensor regression layer imposes lowrank constraints on the
tensor regression layer which replaces the flattening operation of traditional
MLP. We investigate tensor regression networks using various lowrank tensor
approximations, aiming to leverage the multimodal structure of high
dimensional data by enforcing efficient lowrank constraints. We provide a
theoretical analysis giving insights on the choice of the rank parameters. We
evaluated performance of proposed model with stateoftheart deep
convolutional models. For CIFAR10 dataset, we achieved the compression rate of
0.018 with the sacrifice of accuracy less than 1%.

Weighted finite automata (WFA) can expressively model functions defined over
strings. However, a WFA is inherently a linear model. In this paper, we propose
a neural network based nonlinear WFA model along with a learning algorithm. Our
learning algorithm performs a nonlinear decomposition of the socalled Hankel
matrix (using an encode decoder neural network) from which the transition
operators of the model are recovered. We assessed the performance of the
proposed model in a simulation study.

This paper proposes an efficient algorithm (HOLRR) to handle regression tasks
where the outputs have a tensor structure. We formulate the regression problem
as the minimization of a least square criterion under a multilinear rank
constraint, a difficult non convex problem. HOLRR computes efficiently an
approximate solution of this problem, with solid theoretical guarantees. A
kernel extension is also presented. Experiments on synthetic and real data show
that HOLRR outperforms multivariate and multilinear regression methods and is
considerably faster than existing tensor methods.

We describe a technique to minimize weighted tree automata (WTA), a powerful
formalisms that subsumes probabilistic contextfree grammars (PCFGs) and
latentvariable PCFGs. Our method relies on a singular value decomposition of
the underlying Hankel matrix defined by the WTA. Our main theoretical result is
an efficient algorithm for computing the SVD of an infinite Hankel matrix
implicitly represented as a WTA. We provide an analysis of the approximation
error induced by the minimization, and we evaluate our method on realworld
data originating in newswire treebank. We show that the model achieves lower
perplexity than previous methods for PCFG minimization, and also is much more
stable due to the absence of local optima.

We introduce the notion of Hypergraph Weighted Model (HWM) that generically
associates a tensor network to a hypergraph and then computes a value by tensor
contractions directed by its hyperedges. A series r defined on a hypergraph
family is said to be recognizable if there exists a HWM that computes it. This
model generalizes the notion of rational series on strings and trees. We prove
some properties of the model and study at which conditions finite support
series are recognizable.

This work considers the problem of estimating the parameters of negative
mixture models, i.e. mixture models that possibly involve negative weights. The
contributions of this paper are as follows. (i) We show that every rational
probability distributions on strings, a representation which occurs naturally
in spectral learning, can be computed by a negative mixture of at most two
probabilistic automata (or HMMs). (ii) We propose a method to estimate the
parameters of negative mixture models having a specific tensor structure in
their low order observable moments. Building upon a recent paper on tensor
decompositions for learning latent variable models, we extend this work to the
broader setting of tensors having a symmetric decomposition with positive and
negative weights. We introduce a generalization of the tensor power method for
complex valued tensors, and establish theoretical convergence guarantees. (iii)
We show how our approach applies to negative Gaussian mixture models, for which
we provide some experiments.