• Several studies have examined third-party user tracking on the web. However, web trackers can only track users on sites where they are embedded by the publisher, and thus obtain a fragmented view of a user's online footprint. In this work, we investigate a different form of user tracking, where browser extensions are repurposed to capture the complete online activities of a user and communicate the collected sensitive information to a third-party domain. We conduct an empirical study of spying browser extensions on the Chrome Web Store. First, we present an in-depth analysis of the spying behavior of these extensions. We observe that these extensions steal a variety of sensitive user information, such as the complete browsing history (e.g., the sequence of web traversals), online social network (OSN) access tokens, IP address, and user geolocation. Second, we investigate the potential for automatically detecting spying extensions by applying machine learning schemes. We show that, using a Recurrent Neural Network (RNN), the sequences of browser API calls can be a robust feature, outperforming hand-crafted features (used in prior work on malicious extensions) to detect spying extensions. Our RNN-based detection scheme achieves high precision (90.02%) and recall (93.31%) in detecting spying extensions.
  • Major cloud computing operators provide powerful monitoring tools to understand the current (and prior) state of the distributed systems deployed in their infrastructure. While such tools provide a detailed monitoring mechanism at scale, they also pose a significant challenge for application developers and operators, who must transform the huge space of monitored metrics into useful insights. These insights are essential to build effective management tools for improving the efficiency, resiliency, and dependability of distributed systems. This paper reports on our experience with building and deploying Sieve, a platform to derive actionable insights from monitored metrics in distributed systems. Sieve builds on two core components: a metrics reduction framework and a metrics dependency extractor. More specifically, Sieve first reduces the dimensionality of metrics by automatically filtering out unimportant metrics by observing their signal over time. Afterwards, Sieve infers metrics dependencies between distributed components of the system using a predictive-causality model by testing for Granger causality. We implemented Sieve as a generic platform and deployed it for two microservices-based distributed systems: OpenStack and ShareLatex. Our experience shows that (1) Sieve can reduce the number of metrics by at least an order of magnitude (10-100$\times$), while preserving the statistical equivalence to the total number of monitored metrics; (2) Sieve can dramatically improve existing monitoring infrastructures by reducing the associated overheads over the entire system stack (CPU: 80%, storage: 90%, and network: 50%); and (3) Sieve can effectively support a wide range of workflows in distributed systems. We showcase two such workflows: orchestration of autoscaling and Root Cause Analysis (RCA).
  • Malicious crowdsourcing forums are gaining traction as sources of misinformation online, but are limited by the costs of hiring and managing human workers. In this paper, we identify a new class of attacks that leverage deep learning language models (Recurrent Neural Networks or RNNs) to automate the generation of fake online reviews for products and services. Not only are these attacks cheap and therefore more scalable, but they can control the rate of content output to eliminate the signature burstiness that makes crowdsourced campaigns easy to detect. Using Yelp reviews as an example platform, we show how a two-phase review generation and customization attack can produce reviews that state-of-the-art statistical detectors cannot distinguish from real ones. We conduct a survey-based user study to show that these reviews not only evade human detection, but also score high on "usefulness" metrics by users. Finally, we develop novel automated defenses against these attacks by leveraging the lossy transformation introduced by the RNN training and generation cycle. We consider countermeasures against our mechanisms, show that they produce unattractive cost-benefit tradeoffs for attackers, and show that they can be further curtailed by simple constraints imposed by online service providers.
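
The Sieve abstract above describes inferring dependencies between metrics by testing for Granger causality. As a rough illustration of that idea only, the sketch below computes the F statistic of a Granger test between two synthetic metric series. It is not Sieve's implementation: the function names (`ols_rss`, `granger_f`), the single-lag default, and the synthetic data are all assumptions made for this example, and a real analysis would convert the F statistic to a p-value against an F distribution.

```python
import random

def ols_rss(rows, targets):
    """Fit least squares via the normal equations; return the residual sum of squares."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    return sum(
        (t - sum(b * v for b, v in zip(beta, r))) ** 2
        for r, t in zip(rows, targets)
    )

def granger_f(x, y, lag=1):
    """F statistic for the hypothesis 'x Granger-causes y' using `lag` lags."""
    steps = range(lag, len(y))
    targets = y[lag:]
    # Restricted model: y predicted from its own past only.
    restricted = [[1.0] + [y[t - l] for l in range(1, lag + 1)] for t in steps]
    # Unrestricted model: y predicted from its own past plus the past of x.
    unrestricted = [
        row + [x[t - l] for l in range(1, lag + 1)]
        for row, t in zip(restricted, steps)
    ]
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    dof = len(targets) - (2 * lag + 1)
    return ((rss_r - rss_u) / lag) / (rss_u / dof)

# Synthetic metrics (assumed for illustration): y follows x with a one-step delay.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(300)]
y = [0.0] + [0.8 * x[t - 1] + random.gauss(0, 0.1) for t in range(1, 300)]

f_xy = granger_f(x, y)  # does the past of x help predict y?
f_yx = granger_f(y, x)  # does the past of y help predict x?
print(f"F(x -> y) = {f_xy:.1f}, F(y -> x) = {f_yx:.2f}")
```

In this synthetic setup, the statistic for the x-to-y direction should be far larger than for the reverse direction, which is the kind of asymmetry a dependency extractor could use to orient an edge between two components.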