• ### Multi-modal Conditional Attention Fusion for Dimensional Emotion Prediction(1709.02251)

Sept. 4, 2017 cs.CV, cs.LG, cs.MM
Continuous dimensional emotion prediction is a challenging task in which fusing multiple modalities, for example by early fusion or late fusion, usually achieves state-of-the-art performance. In this paper, we propose a novel multi-modal fusion strategy named conditional attention fusion, which can dynamically pay attention to different modalities at each time step. Long short-term memory recurrent neural networks (LSTM-RNN) are applied as the basic uni-modal model to capture long-term dependencies. The weights assigned to different modalities are decided automatically by the current input features and recent history information rather than being fixed across all situations. Our experimental results on the benchmark dataset AVEC2015 show the effectiveness of our method, which outperforms several common fusion strategies for valence prediction.
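The idea of input-dependent fusion weights can be illustrated with a minimal sketch. It assumes two uni-modal models that each emit one scalar prediction per time step, and a sigmoid gate computed from a context vector standing in for the current features and recent history. The function name, the gate parameterisation, and the `weights`/`bias` values are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def conditional_fusion(pred_a, pred_v, context, weights, bias=0.0):
    """Fuse two per-time-step predictions with an input-dependent gate.

    pred_a, pred_v: lists of scalar predictions from two uni-modal models.
    context: one feature vector per time step (assumed stand-in for the
             current inputs and recent hidden states).
    weights, bias: hypothetical gate parameters; learned in practice.
    Returns the fused predictions and the gate value at each step.
    """
    fused, gates = [], []
    for ya, yv, c in zip(pred_a, pred_v, context):
        # The gate depends on the context at this time step, so the
        # modality weighting changes over time instead of being fixed.
        gate = sigmoid(sum(w * x for w, x in zip(weights, c)) + bias)
        gates.append(gate)
        fused.append(gate * ya + (1.0 - gate) * yv)
    return fused, gates


fused, gates = conditional_fusion(
    pred_a=[1.0, 1.0], pred_v=[0.0, 0.0],
    context=[[0.0], [100.0]], weights=[1.0],
)
print(gates, fused)
```

With a zero context the gate is 0.5 and the result is a plain average, which corresponds to fixed-weight late fusion; a strongly positive context pushes nearly all weight onto the first modality.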
• ### Generating Video Descriptions with Topic Guidance(1708.09666)

Sept. 4, 2017 cs.CV, cs.CL
Generating video descriptions in natural language (a.k.a. video captioning) is a more challenging task than image captioning because videos are intrinsically more complicated than images in two respects. First, videos cover a broader range of topics, such as news, music, sports and so on. Second, multiple topics can coexist in the same video. In this paper, we propose a novel caption model, the topic-guided model (TGM), to generate topic-oriented descriptions for videos in the wild by exploiting topic information. In addition to predefined topics, i.e., category tags crawled from the web, we also mine topics in a data-driven way from the training captions with an unsupervised topic mining model. We show that data-driven topics reflect a better topic schema than the predefined topics. To predict topics for test videos, we treat the topic mining model as a teacher to train a student, the topic prediction model, which uses all modalities in the video, especially the speech modality. We propose a series of caption models to exploit topic guidance, including implicitly using the topics as input features to generate words related to the topic, and explicitly modifying the weights in the decoder with topics so that it functions as an ensemble of topic-aware language decoders. Our comprehensive experimental results on MSR-VTT, currently the largest video caption dataset, demonstrate the effectiveness of our topic-guided model, which significantly surpasses the winning performance in the 2016 MSR video to language challenge.
• ### Video Captioning with Guidance of Multimodal Latent Topics(1708.09667)

Sept. 2, 2017 cs.CV, cs.CL
• ### Distribution-Free Tests of Independence in High Dimensions(1410.4179)

July 21, 2017 math.ST, stat.TH
We consider the testing of mutual independence among all entries in a $d$-dimensional random vector based on $n$ independent observations. We study two families of distribution-free test statistics, which include Kendall's tau and Spearman's rho as important examples. We show that under the null hypothesis the test statistics of these two families converge weakly to Gumbel distributions, and propose tests that control the type I error in the high-dimensional setting where $d>n$. We further show that the two tests are rate-optimal in terms of power against sparse alternatives, and outperform competitors in simulations, especially when $d$ is large.
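A maximum-type statistic built from Kendall's tau can be sketched as follows. The abstract calibrates the test with a Gumbel limit; this sketch instead swaps in a Monte Carlo calibration, which is valid because the null distribution of a rank-based statistic does not depend on the (continuous) marginals. The function names are illustrative, and the data are assumed to contain no ties.

```python
import random
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau for two equal-length samples without ties."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return 2.0 * s / (n * (n - 1))


def max_abs_tau(columns):
    """Largest absolute pairwise Kendall's tau over all coordinate pairs."""
    return max(abs(kendall_tau(a, b)) for a, b in combinations(columns, 2))


def monte_carlo_pvalue(columns, n_sim=200, seed=0):
    """Monte Carlo p-value for the max-type statistic (assumed calibration).

    The null distribution is simulated from independent uniform columns,
    which is legitimate because the statistic depends only on ranks.
    """
    rng = random.Random(seed)
    n, d = len(columns[0]), len(columns)
    observed = max_abs_tau(columns)
    exceed = 0
    for _ in range(n_sim):
        null_cols = [[rng.random() for _ in range(n)] for _ in range(d)]
        if max_abs_tau(null_cols) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_sim + 1)
```

The quadratic-time tau computation keeps the sketch dependency-free; for large $n$ or $d$ a faster implementation and the asymptotic calibration described in the abstract would be the natural choices.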
• ### The Multivariate Hawkes Process in High Dimensions: Beyond Mutual Excitation(1707.04928)

June 19, 2019 stat.ME
The Hawkes process is a class of point processes whose future depends on their own history. Previous theoretical work on the Hawkes process is limited to a special case in which a past event can only increase the occurrence of future events, and the link function is linear. However, in neuronal networks and other real-world applications, inhibitory relationships may be present, and the link function may be non-linear. In this paper, we develop a new approach for investigating the properties of the Hawkes process without the restriction to mutual excitation or linear link functions. To this end, we employ a thinning process representation and a coupling construction to bound the dependence coefficient of the Hawkes process. Using recent developments on weakly dependent sequences, we establish a concentration inequality for second-order statistics of the Hawkes process. We apply this concentration inequality to cross-covariance analysis in the high-dimensional regime, and we verify the theoretical claims with simulation studies.
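Thinning also gives a simple way to simulate such a process. The sketch below is an illustration under stated assumptions, not the paper's construction: it assumes exponentially decaying transfer functions with a common decay rate `beta`, and a clipped link function min(cap, max(0, x)), so that each intensity is bounded by `cap` and negative entries of `alpha` act as inhibition. All parameter names are hypothetical.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, cap, horizon, seed=0):
    """Simulate a multivariate Hawkes process on [0, horizon] by thinning.

    mu[j]: baseline drive of component j.
    alpha[j][k]: effect of a past event in k on j (negative = inhibition).
    beta: common exponential decay rate (assumed kernel shape).
    cap: upper bound imposed by the clipped link function.
    Returns one sorted list of event times per component.
    """
    rng = random.Random(seed)
    d = len(mu)
    events = [[] for _ in range(d)]
    t = 0.0
    while True:
        # Candidate points come from a dominating Poisson process of
        # total rate d * cap, split uniformly across the d components.
        t += rng.expovariate(d * cap)
        if t > horizon:
            break
        j = rng.randrange(d)
        drive = mu[j]
        for k in range(d):
            for s in events[k]:
                drive += alpha[j][k] * math.exp(-beta * (t - s))
        intensity = min(cap, max(0.0, drive))  # non-linear, bounded link
        # Keep the candidate with probability intensity / cap.
        if rng.random() * cap < intensity:
            events[j].append(t)
    return events


events = simulate_hawkes(
    mu=[1.0, 0.5],
    alpha=[[-2.0, 0.5], [1.0, -1.0]],
    beta=1.0, cap=3.0, horizon=20.0, seed=1,
)
print([len(e) for e in events])
```

Recomputing the drive from the full history at every candidate costs quadratic time, which is acceptable for a short illustration; the bounded link is what guarantees a valid dominating rate when excitation and inhibition are mixed.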
• ### Network Reconstruction From High Dimensional Ordinary Differential Equations(1610.03177)

Oct. 11, 2016 stat.ME
We consider the task of learning a dynamical system from high-dimensional time-course data. For instance, we might wish to estimate a gene regulatory network from gene expression data measured at discrete time points. We model the dynamical system non-parametrically as a system of additive ordinary differential equations. Most existing methods for parameter estimation in ordinary differential equations estimate the derivatives from noisy observations. This is known to be challenging and inefficient. We propose a novel approach that does not involve derivative estimation. We show that the proposed method can consistently recover the true network structure even in high dimensions, and we demonstrate empirical improvement over competing approaches.
• ### Selection and Estimation for Mixed Graphical Models(1311.0085)

Aug. 1, 2014 stat.ME
We consider the problem of estimating the parameters in a pairwise graphical model in which the distribution of each node, conditioned on the others, may have a different parametric form. In particular, we assume that each node's conditional distribution is in the exponential family. We identify restrictions on the parameter space required for the existence of a well-defined joint density, and establish the consistency of the neighbourhood selection approach for graph reconstruction in high dimensions when the true underlying graph is sparse. Motivated by our theoretical results, we investigate the selection of edges between nodes whose conditional distributions take different parametric forms, and show that efficiency can be gained if edge estimates obtained from the regressions of particular nodes are used to reconstruct the graph. These results are illustrated with examples of Gaussian, Bernoulli, Poisson and exponential distributions. Our theoretical findings are corroborated by evidence from simulation studies.