• In this paper, we study a new type of spatial sparse recovery problem, that is to infer the fine-grained spatial distribution of certain density data in a region only based on the aggregate observations recorded for each of its subregions. One typical example of this spatial sparse recovery problem is to infer spatial distribution of cellphone activities based on aggregate mobile traffic volumes observed at sparsely scattered base stations. We propose a novel Constrained Spatial Smoothing (CSS) approach, which exploits the local continuity that exists in many types of spatial data to perform sparse recovery via finite-element methods, while enforcing the aggregated observation constraints through an innovative use of the ADMM algorithm. We also improve the approach to further utilize additional geographical attributes. Extensive evaluations based on a large dataset of phone call records and a demographical dataset from the city of Milan show that our approach significantly outperforms various state-of-the-art approaches, including Spatial Spline Regression (SSR).
  • Online real-estate information systems such as Zillow and Trulia have gained increasing popularity in recent years. One important feature offered by these systems is the online home price estimate through automated data-intensive computation based on housing information and comparative market value analysis. State-of-the-art approaches model house prices as a combination of a latent land desirability surface and a regression from house features. However, by using uniformly damping kernels, they are unable to handle irregularly shaped regions or capture land value discontinuities within the same region due to the existence of implicit sub-communities, which are common in real-world scenarios. In this paper, we explore the novel application of recent advances in spatial functional analysis to house price modeling and propose the Hierarchical Spatial Functional Model (HSFM), which decomposes house values into land desirability at both the global scale and hidden local scales as well as the feature regression component. We propose statistical learning algorithms based on finite-element spatial functional analysis and spatial constrained clustering to train our model. Extensive evaluations based on housing data in a major Canadian city show that our proposed approach can reduce the mean relative house price estimation error down to 6.60%.
  • We describe our experience of implementing a news content organization system at Tencent that discovers events from vast streams of breaking news and evolves news story structures in an online fashion. Our real-world system has distinct requirements in contrast to previous studies on topic detection and tracking (TDT) and event timeline or graph generation, in that we 1) need to accurately and quickly extract distinguishable events from massive streams of long text documents that cover diverse topics and contain highly redundant information, and 2) must develop the structures of event stories in an online manner, without repeatedly restructuring previously formed stories, in order to guarantee a consistent user viewing experience. In solving these challenges, we propose Story Forest, a set of online schemes that automatically clusters streaming documents into events, while connecting related events in growing trees to tell evolving stories. We conducted extensive evaluation based on 60 GB of real-world Chinese news data, although our ideas are not language-dependent and can easily be extended to other languages, through detailed pilot user experience studies. The results demonstrate the superior capability of Story Forest to accurately identify events and organize news text into a logical structure that is appealing to human readers, compared to multiple existing algorithm frameworks.
  • Semantic matching of natural language sentences or identifying the relationship between two sentences is a core research problem underlying many natural language tasks. Depending on whether training data is available, prior research has proposed both unsupervised distance-based schemes and supervised deep learning schemes for sentence matching. However, previous approaches either omit or fail to fully utilize the ordered, hierarchical, and flexible structures of language objects, as well as the interactions between them. In this paper, we propose Hierarchical Sentence Factorization---a technique to factorize a sentence into a hierarchical representation, with the components at each different scale reordered into a "predicate-argument" form. The proposed sentence factorization technique leads to the invention of: 1) a new unsupervised distance metric which calculates the semantic distance between a pair of text snippets by solving a penalized optimal transport problem while preserving the logical relationship of words in the reordered sentences, and 2) new multi-scale deep learning models for supervised semantic training, based on factorized sentence hierarchies. We apply our techniques to text-pair similarity estimation and text-pair relationship classification tasks, based on multiple datasets such as STSbenchmark, the Microsoft Research paraphrase identification (MSRP) dataset, the SICK dataset, etc. Extensive experiments show that the proposed hierarchical sentence factorization can be used to significantly improve the performance of existing unsupervised distance-based metrics as well as multiple supervised deep learning models based on the convolutional neural network (CNN) and long short-term memory (LSTM).
  • We study stochastic algorithms for solving non-convex optimization problems with a convex yet possibly non-smooth regularizer, which find wide applications in many practical machine learning applications. However, compared to asynchronous parallel stochastic gradient descent (AsynSGD), an algorithm targeting smooth optimization, the understanding of the behavior of stochastic algorithms for the non-smooth regularized optimization problems is limited, especially when the objective function is non-convex. To fill this gap, in this paper, we propose and analyze asynchronous parallel stochastic proximal gradient (AsynSPG) methods, including a full update version and a block-wise version, for non-convex problems. We establish an ergodic convergence rate of $O(1/\sqrt{K})$ for the proposed AsynSPG, $K$ being the number of updates made on the model, matching the convergence rate currently known for AsynSGD (for smooth problems). To our knowledge, this is the first work that provides convergence rates of asynchronous parallel SPG algorithms for non-convex problems. Furthermore, our results are also the first to prove convergence of any stochastic proximal methods without assuming an increasing batch size or the use of additional variance reduction techniques. We implement the proposed algorithms on Parameter Server and demonstrate its convergence behavior and near-linear speedup, as the number of workers increases, for sparse learning problems on a real-world dataset.
  • Many machine learning models, including those with non-smooth regularizers, can be formulated as consensus optimization problems, which can be solved by the alternating direction method of multipliers (ADMM). Many recent efforts have been made to develop asynchronous distributed ADMM to handle large amounts of training data. However, all existing asynchronous distributed ADMM methods are based on full model updates and require locking all global model parameters to handle concurrency, which essentially serializes the updates from different workers. In this paper, we present a novel block-wise, asynchronous and distributed ADMM algorithm, which allows different blocks of model parameters to be updated in parallel. The lock-free block-wise algorithm may greatly speedup sparse optimization problems, a common scenario in reality, in which most model updates only modify a subset of all decision variables. We theoretically prove the convergence of our proposed algorithm to stationary points for non-convex general form consensus problems with possibly non-smooth regularizers. We implement the proposed ADMM algorithm on the Parameter Server framework and demonstrate its convergence and near-linear speedup performance as the number of workers increases.
  • Identifying the relationship between two text objects is a core research problem underlying many natural language processing tasks. A wide range of deep learning schemes have been proposed for text matching, mainly focusing on sentence matching, question answering or query document matching. We point out that existing approaches do not perform well at matching long documents, which is critical, for example, to AI-based news article understanding and event or story formation. The reason is that these methods either omit or fail to fully utilize complicated semantic structures in long documents. In this paper, we propose a graph approach to text matching, especially targeting long document matching, such as identifying whether two news articles report the same event in the real world, possibly with different narratives. We propose the Concept Interaction Graph to yield a graph representation for a document, with vertices representing different concepts, each being one or a group of coherent keywords in the document, and with edges representing the interactions between different concepts, connected by sentences in the document. Based on the graph representation of document pairs, we further propose a Siamese Encoded Graph Convolutional Network that learns vertex representations through a Siamese neural network and aggregates the vertex features though Graph Convolutional Networks to generate the matching result. Extensive evaluation of the proposed approach based on two labeled news article datasets created at Tencent for its intelligent news products show that the proposed graph approach to long document matching significantly outperforms a wide range of state-of-the-art methods.
  • The cqrReg package for R is the first to introduce a family of robust, high-dimensional regression models for quantile and composite quantile regression, both with and without an adaptive lasso penalty for variable selection. In this paper, we reformulate these quantile regression problems and present the estimators we implement in cqrReg using alternating direction method of multipliers (ADMM), majorize-minimization (MM), and coordinate descent (CD) algorithms. Our new approaches address the lack of publicly-available methods for (composite) quantile regression, both with and without regularization. We demonstrate the need for a variety of algorithms in later simulation studies. For comparison, we also introduce the widely-used interior point (IP) formulation and test our methods against the advanced IP algorithms in the existing quantreg package. Our simulation studies show that each of our methods, particularly MM and CD, excel in different settings such as with large or high-dimensional data sets, respectively, and outperform the methods currently implemented in quantreg. ADMM offers particular promise for future developments in its amenability to parallelization.
  • Matrix factorization is a popular approach to solving matrix estimation problems based on partial observations. Existing matrix factorization is based on least squares and aims to yield a low-rank matrix to interpret the conditional sample means given the observations. However, in many real applications with skewed and extreme data, least squares cannot explain their central tendency or tail distributions, yielding undesired estimates. In this paper, we propose \emph{expectile matrix factorization} by introducing asymmetric least squares, a key concept in expectile regression analysis, into the matrix factorization framework. We propose an efficient algorithm to solve the new problem based on alternating minimization and quadratic programming. We prove that our algorithm converges to a global optimum and exactly recovers the true underlying low-rank matrices when noise is zero. For synthetic data with skewed noise and a real-world dataset containing web service response times, the proposed scheme achieves lower recovery errors than the existing matrix factorization method based on least squares in a wide range of settings.