• Recent developments in formal verification have identified approximate liftings (also known as approximate couplings) as a clean, compositional abstraction for proving differential privacy. This construction can be defined in two styles. Earlier definitions require the existence of one or more witness distributions, while a recent definition by Sato uses universal quantification over all sets of samples. These notions have each have their own strengths: the universal version is more general than the existential ones, while existential liftings are known to satisfy more precise composition principles. We propose a novel, existential version of approximate lifting, called $\star$-lifting, and show that it is equivalent to Sato's construction for discrete probability measures. Our work unifies all known notions of approximate lifting, yielding cleaner properties, more general constructions, and more precise composition theorems for both styles of lifting, enabling richer proofs of differential privacy. We also clarify the relation between existing definitions of approximate lifting, and consider more general approximate liftings based on $f$-divergences.
  • Proof by coupling is a classical technique for proving properties about pairs of randomized algorithms by carefully relating (or coupling) two probabilistic executions. In this paper, we show how to automatically construct such proofs for probabilistic programs. First, we present f-coupled postconditions, an abstraction describing two correlated program executions. Second, we show how properties of f-coupled postconditions can imply various probabilistic properties of the original programs. Third, we demonstrate how to reduce the proof-search problem to a purely logical synthesis problem of the form $\exists f\ldotp \forall X\ldotp \phi$, making probabilistic reasoning unnecessary. We develop a prototype implementation to automatically build coupling proofs for probabilistic properties, including uniformity and independence of program expressions.
  • We tackle the problem of deciding whether two probabilistic programs are equivalent in Probabilistic NetKAT, a formal language for specifying and reasoning about the behavior of packet-switched networks. We show that the problem is decidable for the history-free fragment of the language by developing an effective decision procedure based on stochastic matrices. The main challenge lies in reasoning about iteration, which we address by designing an encoding of the program semantics as a finite-state absorbing Markov chain, whose limiting distribution can be computed exactly. In an extended case study on a real-world data center network, we automatically verify various quantitative properties of interest, including resilience in the presence of failures, by analyzing the Markov chain semantics.
  • Research on deductive verification of probabilistic programs has considered expectation-based logics, where pre- and post-conditions are real-valued functions on states, and assertion-based logics, where pre- and post-conditions are boolean predicates on state distributions. Both approaches have developed over nearly four decades, but they have different standings today. Expectation-based systems have managed to formalize many sophisticated case studies, while assertion-based systems today have more limited expressivity and have targeted simpler examples. We present Ellora, a sound and relatively complete assertion-based program logic, and demonstrate its expressivity by verifying several classical examples of randomized algorithms using an implementation in the EasyCrypt proof assistant. Ellora features new proof rules for loops and adversarial code, and supports richer assertions than existing program logics. We also show that Ellora allows convenient reasoning about complex probabilistic concepts by developing a new program logic for probabilistic independence and distribution law, and then smoothly embedding it into Ellora. Our work demonstrates that the assertion-based approach is not fundamentally limited and suggests that some notions are potentially easier to reason about in assertion-based systems.
  • We develop a semantics framework for verifying recent relaxations of differential privacy: R\'enyi differential privacy and zero-concentrated differential privacy. Both notions require a bound on a particular statistical divergence between two probability distributions. In order to reason about such properties compositionally, we introduce approximate span-liftings, generalizing approximate relational liftings previously developed for standard differential privacy to a more general class of divergences, and to continuous distributions. To enable verification of possibly non-terminating programs, our framework supports generalized divergences between subprobability measures. As a concrete application, we use approximate span-liftings to develop a program logic that can prove relaxations of differential privacy and other probabilistic properties based on statistical
  • We define Almost Sure Productivity (ASP), a probabilistic generalization of the productivity condition for coinductively defined structures. Intuitively, a probabilistic coinductive stream or tree is ASP if it produces infinitely many outputs with probability 1. Formally, we define almost sure productivity using a final coalgebra semantics of programs inspired from Kerstan and K\"onig. Then, we introduce a core language for probabilistic streams and trees, and provide two approaches to verify ASP: a sufficient syntactic criterion, and a reduction to model-checking pCTL* formulas on probabilistic pushdown automata. The reduction shows that ASP is decidable for our core language.
  • Differential privacy has emerged as a promising probabilistic formulation of privacy, generating intense interest within academia and industry. We present a push-button, automated technique for verifying $\epsilon$-differential privacy of sophisticated randomized algorithms. We make several conceptual, algorithmic, and practical contributions: (i) Inspired by the recent advances on approximate couplings and randomness alignment, we present a new proof technique called coupling strategies, which casts differential privacy proofs as discovering a strategy in a game where we have finite privacy resources to expend. (ii) To discover a winning strategy, we present a constraint-based formulation of the problem as solving a set of Horn modulo couplings (HMC) constraints, a novel combination of first-order Horn clauses and probabilistic constraints. (iii) We present a technique for solving HMCconstraints by transforming probabilistic constraints into logical constraints with uninterpreted functions. (iv) Finally, we implement our technique and provide the first automated privacy proofs for a number of challenging algorithms from the differential privacy literature, including Report Noisy Max, the Exponential Mechanism, and the Sparse Vector Mechanism.
  • Program sensitivity, also known as Lipschitz continuity, describes how small changes in a program's input lead to bounded changes in the output. We propose an average notion of program sensitivity for probabilistic programs---expected sensitivity---that averages a distance function over a probabilistic coupling of two output distributions from two nearby inputs. By varying the distance, expected sensitivity captures useful notions of probabilistic function sensitivity, including algorithmic stability of machine learning algorithms and convergence of Markov chains. Furthermore, expected sensitivity satisfies clean compositional properties and is amenable to formal verification. We develop a relational program logic called EpRHL for proving expected sensitivity properties. Our logic features two key ideas. First, relational pre-conditions and post-conditions are expressed using distances, which can be seen as a real-valued generalization of typical boolean-valued (relational) assertions. Second, judgments are interpreted in terms of expectation coupling, a novel, quantitative generalization of probabilistic couplings which supports compositional reasoning. We demonstrate our logic on two classes of examples: formalizing uniform stability for the stochastic gradient algorithm from machine learning, and rapid mixing for a model of asexual population dynamics. We also extend our logic with a transitivity principle for expectation couplings to capture the path coupling proof technique by Bubley and Dyer, and we demonstrate our extension by proving rapid mixing of the Glauber dynamics from statistical physics.
  • Program sensitivity measures how robust a program is to small changes in its input, and is a fundamental notion in domains ranging from differential privacy to cyber-physical systems. A natural way to formalize program sensitivity is in terms of metrics on the input and output spaces, requiring that an $r$-sensitive function map inputs that are at distance $d$ to outputs that are at distance at most $r \cdot d$. Program sensitivity is thus an analogue of Lipschitz continuity for programs. Reed and Pierce introduced Fuzz, a functional language with a linear type system that can express program sensitivity. They show soundness operationally, in the form of a metric preservation property. Inspired by their work, we study program sensitivity and metric preservation from a denotational point of view. In particular, we introduce metric CPOs, a novel semantic structure for reasoning about computation on metric spaces, by endowing CPOs with a compatible notion of distance. This structure is useful for reasoning about metric properties of programs, and specifically about program sensitivity. We demonstrate metric CPOs by giving a model for the deterministic fragment of Fuzz.
  • Proof by coupling is a classical proof technique for establishing probabilistic properties of two probabilistic processes, like stochastic dominance and rapid mixing of Markov chains. More recently, couplings have been investigated as a useful abstraction for formal reasoning about relational properties of probabilistic programs, in particular for modeling reduction-based cryptographic proofs and for verifying differential privacy. In this paper, we demonstrate that probabilistic couplings can be used for verifying non-relational probabilistic properties. Specifically, we show that the program logic pRHL---whose proofs are formal versions of proofs by coupling---can be used for formalizing uniformity and probabilistic independence. We formally verify our main examples using the EasyCrypt proof assistant.
  • In this paper, we develop compositional methods for formally verifying differential privacy for algorithms whose analysis goes beyond the composition theorem. Our methods are based on the observation that differential privacy has deep connections with a generalization of probabilistic couplings, an established mathematical tool for reasoning about stochastic processes. Even when the composition theorem is not helpful, we can often prove privacy by a coupling argument. We demonstrate our methods on two algorithms: the Exponential mechanism and the Above Threshold algorithm, the critical component of the famous Sparse Vector algorithm. We verify these examples in a relational program logic apRHL+, which can construct approximate couplings. This logic extends the existing apRHL logic with more general rules for the Laplace mechanism and the one-sided Laplace mechanism, and new structural rules enabling pointwise reasoning about privacy; all the rules are inspired by the connection with coupling. While our paper is presented from a formal verification perspective, we believe that its main insight is of independent interest for the differential privacy community.
  • In mechanism design, the gold standard solution concepts are dominant strategy incentive compatibility and Bayesian incentive compatibility. These solution concepts relieve the (possibly unsophisticated) bidders from the need to engage in complicated strategizing. While incentive properties are simple to state, their proofs are specific to the mechanism and can be quite complex. This raises two concerns. From a practical perspective, checking a complex proof can be a tedious process, often requiring experts knowledgeable in mechanism design. Furthermore, from a modeling perspective, if unsophisticated agents are unconvinced of incentive properties, they may strategize in unpredictable ways. To address both concerns, we explore techniques from computer-aided verification to construct formal proofs of incentive properties. Because formal proofs can be automatically checked, agents do not need to manually check the properties, or even understand the proof. To demonstrate, we present the verification of a sophisticated mechanism: the generic reduction from Bayesian incentive compatible mechanism design to algorithm design given by Hartline, Kleinberg, and Malekian. This mechanism presents new challenges for formal verification, including essential use of randomness from both the execution of the mechanism and from the prior type distributions. As an immediate consequence, our work also formalizes Bayesian incentive compatibility for the entire family of mechanisms derived via this reduction. Finally, as an intermediate step in our formalization, we provide the first formal verification of incentive compatibility for the celebrated Vickrey-Clarke-Groves mechanism.
  • Couplings are a powerful mathematical tool for reasoning about pairs of probabilistic processes. Recent developments in formal verification identify a close connection between couplings and pRHL, a relational program logic motivated by applications to provable security, enabling formal construction of couplings from the probability theory literature. However, existing work using pRHL merely shows existence of a coupling and does not give a way to prove quantitative properties about the coupling, which are need to reason about mixing and convergence of probabilistic processes. Furthermore, pRHL is inherently incomplete, and is not able to capture some advanced forms of couplings such as shift couplings. We address both problems as follows. First, we define an extension of pRHL, called xpRHL, which explicitly constructs the coupling in a pRHL derivation in the form of a probabilistic product program that simulates two correlated runs of the original program. Existing verification tools for probabilistic programs can then be directly applied to the probabilistic product to prove quantitative properties of the coupling. Second, we equip pRHL with a new rule for while loops, where reasoning can freely mix synchronized and unsynchronized loop iterations. Our proof rule can capture examples of shift couplings, and the logic is relatively complete for deterministic programs. We show soundness of xpRHL and use it to analyze two classes of examples. First, we verify rapid mixing using different tools from coupling: standard coupling, shift coupling, and path coupling, a compositional principle for combining local couplings into a global coupling. Second, we verify (approximate) equivalence between a source and an optimized program for several instances of loop optimizations from the literature.
  • We consider a private variant of the classical allocation problem: given k goods and n agents with individual, private valuation functions over bundles of goods, how can we partition the goods amongst the agents to maximize social welfare? An important special case is when each agent desires at most one good, and specifies her (private) value for each good: in this case, the problem is exactly the maximum-weight matching problem in a bipartite graph. Private matching and allocation problems have not been considered in the differential privacy literature, and for good reason: they are plainly impossible to solve under differential privacy. Informally, the allocation must match agents to their preferred goods in order to maximize social welfare, but this preference is exactly what agents wish to hide. Therefore, we consider the problem under the relaxed constraint of joint differential privacy: for any agent i, no coalition of agents excluding i should be able to learn about the valuation function of agent i. In this setting, the full allocation is no longer published---instead, each agent is told what good to get. We first show that with a small number of identical copies of each good, it is possible to efficiently and accurately solve the maximum weight matching problem while guaranteeing joint differential privacy. We then consider the more general allocation problem, when bidder valuations satisfy the gross substitutes condition. Finally, we prove that the allocation problem cannot be solved to non-trivial accuracy under joint differential privacy without requiring multiple copies of each type of good.
  • Differential privacy is a promising formal approach to data privacy, which provides a quantitative bound on the privacy cost of an algorithm that operates on sensitive information. Several tools have been developed for the formal verification of differentially private algorithms, including program logics and type systems. However, these tools do not capture fundamental techniques that have emerged in recent years, and cannot be used for reasoning about cutting-edge differentially private algorithms. Existing techniques fail to handle three broad classes of algorithms: 1) algorithms where privacy depends accuracy guarantees, 2) algorithms that are analyzed with the advanced composition theorem, which shows slower growth in the privacy cost, 3) algorithms that interactively accept adaptive inputs. We address these limitations with a new formalism extending apRHL, a relational program logic that has been used for proving differential privacy of non-interactive algorithms, and incorporating aHL, a (non-relational) program logic for accuracy properties. We illustrate our approach through a single running example, which exemplifies the three classes of algorithms and explores new variants of the Sparse Vector technique, a well-studied algorithm from the privacy literature. We implement our logic in EasyCrypt, and formally verify privacy. We also introduce a novel coupling technique called \emph{optimal subset coupling} that may be of independent interest.
  • We present PrivInfer, an expressive framework for writing and verifying differentially private Bayesian machine learning algorithms. Programs in PrivInfer are written in a rich functional probabilistic programming language with constructs for performing Bayesian inference. Then, differential privacy of programs is established using a relational refinement type system, in which refinements on probability types are indexed by a metric on distributions. Our framework leverages recent developments in Bayesian inference, probabilistic programming languages, and in relational refinement types. We demonstrate the expressiveness of PrivInfer by verifying privacy for several examples of private Bayesian inference.
  • Walrasian equilibrium prices can be said to coordinate markets: They support a welfare optimal allocation in which each buyer is buying bundle of goods that is individually most preferred. However, this clean story has two caveats. First, the prices alone are not sufficient to coordinate the market, and buyers may need to select among their most preferred bundles in a coordinated way to find a feasible allocation. Second, we don't in practice expect to encounter exact equilibrium prices tailored to the market, but instead only approximate prices, somehow encoding "distributional" information about the market. How well do prices work to coordinate markets when tie-breaking is not coordinated, and they encode only distributional information? We answer this question. First, we provide a genericity condition such that for buyers with Matroid Based Valuations, overdemand with respect to equilibrium prices is at most 1, independent of the supply of goods, even when tie-breaking is done in an uncoordinated fashion. Second, we provide learning-theoretic results that show that such prices are robust to changing the buyers in the market, so long as all buyers are sampled from the same (unknown) distribution.
  • When analyzing probabilistic computations, a powerful approach is to first find a martingale---an expression on the program variables whose expectation remains invariant---and then apply the optional stopping theorem in order to infer properties at termination time. One of the main challenges, then, is to systematically find martingales. We propose a novel procedure to synthesize martingale expressions from an arbitrary initial expression. Contrary to state-of-the-art approaches, we do not rely on constraint solving. Instead, we use a symbolic construction based on Doob's decomposition. This procedure can produce very complex martingales, expressed in terms of conditional expectations. We show how to automatically generate and simplify these martingales, as well as how to apply the optional stopping theorem to infer properties at termination time. This last step typically involves some simplification steps, and is usually done manually in current approaches. We implement our techniques in a prototype tool and demonstrate our process on several classical examples. Some of them go beyond the capability of current semi-automatic approaches.
  • We propose a probabilistic Hoare logic aHL based on the union bound, a tool from basic probability theory. While the union bound is simple, it is an extremely common tool for analyzing randomized algorithms. In formal verification terms, the union bound allows flexible and compositional reasoning over possible ways an algorithm may go wrong. It also enables a clean separation between reasoning about probabilities and reasoning about events, which are expressed as standard first-order formulas in our logic. Notably, assertions in our logic are non-probabilistic, even though we can conclude probabilistic facts from the judgments. Our logic can also prove accuracy properties for interactive programs, where the program must produce intermediate outputs as soon as pieces of the input arrive, rather than accessing the entire input at once. This setting also enables adaptivity, where later inputs may depend on earlier intermediate outputs. We show how to prove accuracy for several examples from the differential privacy literature, both interactive and non-interactive.
  • We present a practical, differentially private algorithm for answering a large number of queries on high dimensional datasets. Like all algorithms for this task, ours necessarily has worst-case complexity exponential in the dimension of the data. However, our algorithm packages the computationally hard step into a concisely defined integer program, which can be solved non-privately using standard solvers. We prove accuracy and privacy theorems for our algorithm, and then demonstrate experimentally that our algorithm performs well in practice. For example, our algorithm can efficiently and accurately answer millions of queries on the Netflix dataset, which has over 17,000 attributes; this is an improvement on the state of the art by multiple orders of magnitude.
  • Probabilistic coupling is a powerful tool for analyzing pairs of probabilistic processes. Roughly, coupling two processes requires finding an appropriate witness process that models both processes in the same probability space. Couplings are powerful tools proving properties about the relation between two processes, include reasoning about convergence of distributions and stochastic dominance---a probabilistic version of a monotonicity property. While the mathematical definition of coupling looks rather complex and cumbersome to manipulate, we show that the relational program logic pRHL---the logic underlying the EasyCrypt cryptographic proof assistant---already internalizes a generalization of probabilistic coupling. With this insight, constructing couplings is no harder than constructing logical proofs. We demonstrate how to express and verify classic examples of couplings in pRHL, and we mechanically verify several couplings in EasyCrypt.
  • We investigate the problem of heterogeneous task assignment in crowdsourcing markets from the point of view of the requester, who has a collection of tasks. Workers arrive online one by one, and each declare a set of feasible tasks they can solve, and desired payment for each feasible task. The requester must decide on the fly which task (if any) to assign to the worker, while assigning workers only to feasible tasks. The goal is to maximize the number of assigned tasks with a fixed overall budget. We provide an online algorithm for this problem and prove an upper bound on the competitive ratio of this algorithm against an arbitrary (possibly worst-case) sequence of workers who want small payments relative to the requester's total budget. We further show an almost matching lower bound on the competitive ratio of any algorithm in this setting. Finally, we propose a different algorithm that achieves an improved competitive ratio in the random permutation model, where the order of arrival of the workers is chosen uniformly at random. Apart from these strong theoretical guarantees, we carry out experiments on simulated data which demonstrates the practical applicability of our algorithms.
  • Recent works have shown the power of linear indexed type systems for enforcing complex program properties. These systems combine linear types with a language of type-level indices, allowing more fine-grained analyses. Such systems have been fruitfully applied in diverse domains, including implicit complexity and differential privacy. A natural way to enhance the expressiveness of this approach is by allowing the indices to depend on runtime information, in the spirit of dependent types. This approach is used in DFuzz, a language for differential privacy. The DFuzz type system relies on an index language supporting real and natural number arithmetic over constants and variables. Moreover, DFuzz uses a subtyping mechanism to make types more flexible. By themselves, linearity, dependency, and subtyping each require delicate handling when performing type checking or type inference; their combination increases this challenge substantially, as the features can interact in non-trivial ways. In this paper, we study the type-checking problem for DFuzz. We show how we can reduce type checking for (a simple extension of) DFuzz to constraint solving over a first-order theory of naturals and real numbers which, although undecidable, can often be handled in practice by standard numeric solvers.
  • In this paper, we give efficient algorithms and lower bounds for solving the heavy hitters problem while preserving differential privacy in the fully distributed local model. In this model, there are n parties, each of which possesses a single element from a universe of size N. The heavy hitters problem is to find the identity of the most common element shared amongst the n parties. In the local model, there is no trusted database administrator, and so the algorithm must interact with each of the $n$ parties separately, using a differentially private protocol. We give tight information-theoretic upper and lower bounds on the accuracy to which this problem can be solved in the local model (giving a separation between the local model and the more common centralized model of privacy), as well as computationally efficient algorithms even in the case where the data universe N may be exponentially large.
  • In this paper we present an extremely general method for approximately solving a large family of convex programs where the solution can be divided between different agents, subject to joint differential privacy. This class includes multi-commodity flow problems, general allocation problems, and multi-dimensional knapsack problems, among other examples. The accuracy of our algorithm depends on the \emph{number} of constraints that bind between individuals, but crucially, is \emph{nearly independent} of the number of primal variables and hence the number of agents who make up the problem. As the number of agents in a problem grows, the error we introduce often becomes negligible. We also consider the setting where agents are strategic and have preferences over their part of the solution. For any convex program in this class that maximizes \emph{social welfare}, there is a generic reduction that makes the corresponding optimization \emph{approximately dominant strategy truthful} by charging agents prices for resources as a function of the approximately optimal dual variables, which are themselves computed under differential privacy. Our results substantially expand the class of problems that are known to be solvable under both privacy and incentive constraints.