• Generative models (GMs) such as the Generative Adversarial Network (GAN) and the Variational Auto-Encoder (VAE) have thrived in recent years and achieve high-quality results in generating new samples. In computer vision in particular, GMs have been used for image inpainting, denoising and completion, which can be treated as inference from observed pixels to corrupted pixels. However, images are hierarchically structured, which makes them quite different from many real-world inference scenarios with non-hierarchical features. These scenarios contain heterogeneous stochastic variables and irregular mutual dependencies, and they are traditionally modeled by Bayesian Networks (BNs). However, learning and inference in BN models are NP-hard, so the number of stochastic variables in a BN is highly constrained. In this paper, we adapt typical GMs to enable heterogeneous learning and inference in polynomial time. We also propose an extended autoregressive (EAR) model and an EAR with adversarial loss (EARA) model, and give theoretical results on their effectiveness. Experiments on several BN datasets show that our proposed EAR model achieves the best performance in most cases compared to other GMs. In addition to black-box analysis, we also conduct a series of experiments on Markov border inference of GMs for white-box analysis and give theoretical results.
  • We present a new method to approximate posterior probabilities of a Bayesian Network using a Deep Neural Network. Experimental results on several public Bayesian Network datasets show that a Deep Neural Network can learn the joint probability distribution of a Bayesian Network with high accuracy from a few pairs of observations and posterior probability distributions. Compared with the traditional approximate method, the likelihood weighting sampling algorithm, our method is much faster and achieves higher accuracy on medium-sized Bayesian Networks. Another advantage is that our method can be parallelized on GPUs much more easily, without extra effort. We also explore the connection between the accuracy of our model and the number of training examples. The results show that the accuracy of our model saturates as the number of training examples grows, so we do not need many training examples to get reasonably good results. A further contribution of our work is showing that a discriminative model such as a Deep Neural Network can approximate a generative model such as a Bayesian Network.
  • Stragglers are commonly believed to have a great impact on the performance of big data systems. However, the causes of stragglers are complicated. Previous work mostly focuses on straggler detection, scheduling-level optimization and coarse-grained cause analysis. These methods cannot provide valuable insights to help users optimize their programs. In this paper, we propose BigRoots, a general method incorporating both framework and system features for root-cause analysis of stragglers in big data systems. BigRoots considers features from the big data framework, such as shuffle read/write bytes and JVM garbage collection time, as well as system resource utilization, such as CPU, I/O and network, and is thus able to detect both internal and external root causes of stragglers. We verify BigRoots by injecting high resource utilization across different system components and perform case studies to analyze different workloads in HiBench. The experimental results demonstrate that BigRoots is effective in identifying the root causes of stragglers and provides useful guidance for performance optimization.
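
For readers unfamiliar with the likelihood weighting baseline mentioned in the second abstract above, the idea can be sketched in a few lines. This is only a minimal illustration, not the authors' implementation: the toy network (Rain, Sprinkler, WetGrass), its probabilities, and the names NETWORK, ORDER, prob_true and likelihood_weighting are all assumptions made for this example.

```python
import random

# Hypothetical toy network (not from the abstracts): Rain -> WetGrass <- Sprinkler.
# Each entry maps a variable to (parents, CPT), where the CPT maps a tuple of
# parent values to P(variable = True | parents).
NETWORK = {
    "Rain": ((), {(): 0.2}),
    "Sprinkler": ((), {(): 0.1}),
    "WetGrass": (("Rain", "Sprinkler"), {
        (True, True): 0.99,
        (True, False): 0.9,
        (False, True): 0.8,
        (False, False): 0.0,
    }),
}
ORDER = ["Rain", "Sprinkler", "WetGrass"]  # a topological order of the network


def prob_true(var, assignment):
    """Return P(var = True | parents) under the current partial assignment."""
    parents, cpt = NETWORK[var]
    return cpt[tuple(assignment[p] for p in parents)]


def likelihood_weighting(query, evidence, n_samples=20000, seed=0):
    """Estimate P(query = True | evidence) by likelihood weighting."""
    rng = random.Random(seed)
    weighted_true = 0.0
    total_weight = 0.0
    for _ in range(n_samples):
        assignment = {}
        weight = 1.0
        for var in ORDER:
            p = prob_true(var, assignment)
            if var in evidence:
                # Evidence is clamped, and the sample is weighted by its likelihood.
                assignment[var] = evidence[var]
                weight *= p if evidence[var] else 1.0 - p
            else:
                # Non-evidence variables are sampled from their conditional.
                assignment[var] = rng.random() < p
        total_weight += weight
        if assignment[query]:
            weighted_true += weight
    return weighted_true / total_weight if total_weight > 0 else float("nan")
```

Calling `likelihood_weighting("Rain", {"WetGrass": True})` returns a sampling estimate of the posterior; for this toy network the exact value obtained by enumeration is about 0.74, so the estimate should land close to that. Every sample walks the whole network sequentially, which gives some intuition for why a single forward pass of a trained network could be faster on larger models.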