• This paper provides an analysis of the tradeoff between asymptotic bias (suboptimality with unlimited data) and overfitting (additional suboptimality due to limited data) in the context of reinforcement learning with partial observability. Our theoretical analysis formally characterizes that while potentially increasing the asymptotic bias, a smaller state representation decreases the risk of overfitting. This analysis relies on expressing the quality of a state representation by bounding L1 error terms of the associated belief states. Theoretical results are empirically illustrated when the state representation is a truncated history of observations, both on synthetic POMDPs and on a large-scale POMDP in the context of smartgrids, with real-world data. Finally, similarly to known results in the fully observable setting, we also briefly discuss and empirically illustrate how using function approximators and adapting the discount factor may enhance the tradeoff between asymptotic bias and overfitting in the partially observable context.
  • Tensor regression networks achieve high rate of compression of model parameters in multilayer perceptrons (MLP) while having slight impact on performances. Tensor regression layer imposes low-rank constraints on the tensor regression layer which replaces the flattening operation of traditional MLP. We investigate tensor regression networks using various low-rank tensor approximations, aiming to leverage the multi-modal structure of high dimensional data by enforcing efficient low-rank constraints. We provide a theoretical analysis giving insights on the choice of the rank parameters. We evaluated performance of proposed model with state-of-the-art deep convolutional models. For CIFAR-10 dataset, we achieved the compression rate of 0.018 with the sacrifice of accuracy less than 1%.
  • Weighted finite automata (WFA) can expressively model functions defined over strings. However, a WFA is inherently a linear model. In this paper, we propose a neural network based nonlinear WFA model along with a learning algorithm. Our learning algorithm performs a nonlinear decomposition of the so-called Hankel matrix (using an encode decoder neural network) from which the transition operators of the model are recovered. We assessed the performance of the proposed model in a simulation study.
  • This paper proposes an efficient algorithm (HOLRR) to handle regression tasks where the outputs have a tensor structure. We formulate the regression problem as the minimization of a least square criterion under a multilinear rank constraint, a difficult non convex problem. HOLRR computes efficiently an approximate solution of this problem, with solid theoretical guarantees. A kernel extension is also presented. Experiments on synthetic and real data show that HOLRR outperforms multivariate and multilinear regression methods and is considerably faster than existing tensor methods.
  • We describe a technique to minimize weighted tree automata (WTA), a powerful formalisms that subsumes probabilistic context-free grammars (PCFGs) and latent-variable PCFGs. Our method relies on a singular value decomposition of the underlying Hankel matrix defined by the WTA. Our main theoretical result is an efficient algorithm for computing the SVD of an infinite Hankel matrix implicitly represented as a WTA. We provide an analysis of the approximation error induced by the minimization, and we evaluate our method on real-world data originating in newswire treebank. We show that the model achieves lower perplexity than previous methods for PCFG minimization, and also is much more stable due to the absence of local optima.
  • We introduce the notion of Hypergraph Weighted Model (HWM) that generically associates a tensor network to a hypergraph and then computes a value by tensor contractions directed by its hyperedges. A series r defined on a hypergraph family is said to be recognizable if there exists a HWM that computes it. This model generalizes the notion of rational series on strings and trees. We prove some properties of the model and study at which conditions finite support series are recognizable.
  • This work considers the problem of estimating the parameters of negative mixture models, i.e. mixture models that possibly involve negative weights. The contributions of this paper are as follows. (i) We show that every rational probability distributions on strings, a representation which occurs naturally in spectral learning, can be computed by a negative mixture of at most two probabilistic automata (or HMMs). (ii) We propose a method to estimate the parameters of negative mixture models having a specific tensor structure in their low order observable moments. Building upon a recent paper on tensor decompositions for learning latent variable models, we extend this work to the broader setting of tensors having a symmetric decomposition with positive and negative weights. We introduce a generalization of the tensor power method for complex valued tensors, and establish theoretical convergence guarantees. (iii) We show how our approach applies to negative Gaussian mixture models, for which we provide some experiments.