  • Unambiguous automata are nondeterministic automata in which every word has at most one accepting run. In this paper we give a polynomial-time algorithm for model checking discrete-time Markov chains against $\omega$-regular specifications represented as unambiguous automata. We furthermore show that the complexity of this model checking problem lies in NC: the subclass of P comprising those problems solvable in poly-logarithmic parallel time. These complexity bounds match the known bounds for model checking Markov chains against specifications given as deterministic automata, notwithstanding the fact that unambiguous automata can be exponentially more succinct than deterministic automata. We report on an implementation of our procedure, including an experiment in which the implementation is used to model check LTL formulas on Markov chains.
  • Given two labelled Markov decision processes (MDPs), the trace-refinement problem asks whether for all strategies of the first MDP there exists a strategy of the second MDP such that the induced labelled Markov chains are trace-equivalent. We show that this problem is decidable in polynomial time if the second MDP is a Markov chain. The algorithm is based on new results on a particular notion of bisimulation between distributions over the states. However, we show that the general trace-refinement problem is undecidable, even if the first MDP is a Markov chain. Decidability of those problems was stated as open in 2008. We further study the decidability and complexity of the trace-refinement problem provided that the strategies are restricted to be memoryless.
  • We prove results on the decidability and complexity of computing the total variation distance (equivalently, the $L_1$-distance) of hidden Markov models (equivalently, labelled Markov chains). This distance measures the difference between the distributions on words that two hidden Markov models induce. The main results are: (1) it is undecidable whether the distance is greater than a given threshold; (2) approximation is #P-hard and in PSPACE.
  • We study 2-player turn-based perfect-information stochastic games with countably infinite state space. The players aim at maximizing/minimizing the probability of a given event (i.e., measurable set of infinite plays), such as reachability, Büchi, $\omega$-regular or more general objectives. These games are known to be weakly determined, i.e., they have value. However, strong determinacy of threshold objectives (given by an event and a threshold $c \in [0,1]$) was open in many cases: is it always the case that the maximizer or the minimizer has a winning strategy, i.e., one that enforces, against all strategies of the other player, that the objective is satisfied with probability $\ge c$ (resp. $< c$)? We show that almost-sure objectives (where $c=1$) are strongly determined. This vastly generalizes a previous result on finite games with almost-sure tail objectives. On the other hand we show that $\ge 1/2$ (co-)Büchi objectives are not strongly determined, not even if the game is finitely branching. Moreover, for almost-sure reachability and almost-sure Büchi objectives in finitely branching games, we strengthen strong determinacy by showing that one of the players must have a memoryless deterministic (MD) winning strategy.
  • We study countably infinite MDPs with parity objectives, and special cases with a bounded number of colors in the Mostowski hierarchy (including reachability, safety, Büchi and co-Büchi). In finite MDPs there always exist optimal memoryless deterministic (MD) strategies for parity objectives, but this does not generally hold for countably infinite MDPs. In particular, optimal strategies need not exist. For countably infinite MDPs, we provide a complete picture of the memory requirements of optimal (resp., $\epsilon$-optimal) strategies for all objectives in the Mostowski hierarchy. In particular, there is a strong dichotomy between two different types of objectives. For the first type, optimal strategies, if they exist, can be chosen MD, while for the second type optimal strategies require infinite memory. (I.e., for all objectives in the Mostowski hierarchy, if finite-memory randomized strategies suffice then MD strategies suffice as well.) Similarly, some objectives admit $\epsilon$-optimal MD-strategies, while for others $\epsilon$-optimal strategies require infinite memory. Such a dichotomy also holds for the subclass of countably infinite MDPs that are finitely branching, though more objectives admit MD-strategies here.
  • We consider the problem of minimising the number of states in a multiplicity tree automaton over the field of rational numbers. We give a minimisation algorithm that runs in polynomial time assuming unit-cost arithmetic. We also show that a polynomial bound in the standard Turing model would require a breakthrough in the complexity of polynomial identity testing by proving that the latter problem is logspace equivalent to the decision version of minimisation. The developed techniques also improve the state of the art in multiplicity word automata: we give an NC algorithm for minimising multiplicity word automata. Finally, we consider the minimal consistency problem: does there exist an automaton with $n$ states that is consistent with a given finite sample of weight-labelled words or trees? We show that this decision problem is complete for the existential theory of the rationals, both for words and for trees of a fixed alphabet rank.
  • Nonnegative matrix factorization (NMF) is the problem of decomposing a given nonnegative $n \times m$ matrix $M$ into a product of a nonnegative $n \times d$ matrix $W$ and a nonnegative $d \times m$ matrix $H$. A longstanding open question, posed by Cohen and Rothblum in 1993, is whether a rational matrix $M$ always has an NMF of minimal inner dimension $d$ whose factors $W$ and $H$ are also rational. We answer this question negatively, by exhibiting a matrix for which $W$ and $H$ require irrational entries.
  • Nonnegative matrix factorization (NMF) is the problem of decomposing a given nonnegative $n \times m$ matrix $M$ into a product of a nonnegative $n \times d$ matrix $W$ and a nonnegative $d \times m$ matrix $H$. Restricted NMF requires in addition that the column spaces of $M$ and $W$ coincide. Finding the minimal inner dimension $d$ is known to be NP-hard, both for NMF and restricted NMF. We show that restricted NMF is closely related to a question about the nature of minimal probabilistic automata, posed by Paz in his seminal 1971 textbook. We use this connection to answer Paz's question negatively, thus falsifying a positive answer claimed in 1974. Furthermore, we investigate whether a rational matrix $M$ always has a restricted NMF of minimal inner dimension whose factors $W$ and $H$ are also rational. We show that this holds for matrices $M$ of rank at most $3$ and we exhibit a rank-$4$ matrix for which $W$ and $H$ require irrational entries.
  • Hidden Markov Chains (HMCs) are commonly used mathematical models of probabilistic systems. They are employed in various fields such as speech recognition, signal processing, and biological sequence analysis. We consider the problem of distinguishing two given HMCs based on an observation sequence that one of the HMCs generates. More precisely, given two HMCs and an observation sequence, a distinguishing algorithm is expected to identify the HMC that generates the observation sequence. Two HMCs are called distinguishable if for every $\varepsilon > 0$ there is a distinguishing algorithm whose error probability is less than $\varepsilon$. We show that one can decide in polynomial time whether two HMCs are distinguishable. Further, we present and analyze two distinguishing algorithms for distinguishable HMCs. The first algorithm makes a decision after processing a fixed number of observations, and it exhibits two-sided error. The second algorithm processes an unbounded number of observations, but the algorithm has only one-sided error. The error probability, for both algorithms, decays exponentially with the number of processed observations. We also provide an algorithm for distinguishing multiple HMCs. Finally, we discuss an application in stochastic runtime verification.
  • Herman's self-stabilisation algorithm, introduced 25 years ago, is a well-studied synchronous randomised protocol for enabling a ring of $N$ processes collectively holding any odd number of tokens to reach a stable state in which a single token remains. Determining the worst-case expected time to stabilisation is the central outstanding open problem about this protocol. It is known that there is a constant $h$ such that any initial configuration has expected stabilisation time at most $h N^2$. Ten years ago, McIver and Morgan established a lower bound of $4/27 \approx 0.148$ for $h$, achieved with three equally-spaced tokens, and conjectured this to be the optimal value of $h$. A series of papers over the last decade gradually reduced the upper bound on $h$, with the present record (achieved in 2014) standing at approximately $0.156$. In this paper, we prove McIver and Morgan's conjecture and establish that $h = 4/27$ is indeed optimal.
  • A cost Markov chain is a Markov chain whose transitions are labelled with non-negative integer costs. A fundamental problem on this model, with applications in the verification of stochastic systems, is to compute information about the distribution of the total cost accumulated in a run. This includes the probability of large total costs, the median cost, and other quantiles. While expectations can be computed in polynomial time, previous work has demonstrated that the computation of cost quantiles is harder but can be done in PSPACE. In this paper we show that cost quantiles in cost Markov chains can be computed in the counting hierarchy, thus providing evidence that computing those quantiles is likely not PSPACE-hard. We obtain this result by exhibiting a tight link to a problem in formal language theory: counting the number of words that are both accepted by a given automaton and have a given Parikh image. Motivated by this link, we comprehensively investigate the complexity of the latter problem. Among other techniques, we rely on the so-called BEST theorem for efficiently computing the number of Eulerian circuits in a directed graph.
  • We show that the Kth largest subset problem and the Kth largest m-tuple problem are in PP and hard for PP under polynomial-time Turing reductions. Several problems from the literature were previously shown NP-hard via reductions from those two problems, and by our main result they become PP-hard as well. We also provide complementary PP-upper bounds for some of them.
  • We study the pattern frequency vector for runs in probabilistic Vector Addition Systems with States (pVASS). Intuitively, each configuration of a given pVASS is assigned one of finitely many patterns, and every run can thus be seen as an infinite sequence of these patterns. The pattern frequency vector assigns to each run the limit of pattern frequencies computed for longer and longer prefixes of the run. If the limit does not exist, then the vector is undefined. We show that for one-counter pVASS, the pattern frequency vector is defined and takes only finitely many values for almost all runs. Further, these values and their associated probabilities can be approximated up to an arbitrarily small relative error in polynomial time. For stable two-counter pVASS, we show the same result, but we do not provide any upper complexity bound. As a byproduct of our study, we discover counterexamples falsifying some classical results about stochastic Petri nets published in the 1980s.
  • Tree Buffers (1504.04757; May 14, 2015; cs.DS). In runtime verification, the central problem is to decide if a given program execution violates a given property. In online runtime verification, a monitor observes a program's execution as it happens. If the program being observed has hard real-time constraints, then the monitor inherits them. In the presence of hard real-time constraints it becomes a challenge to maintain enough information to produce error traces, should a property violation be observed. In this paper we introduce a data structure, called tree buffer, that solves this problem in the context of automata-based monitors: if the monitor itself respects hard real-time constraints, then enriching it with tree buffers makes it possible to provide error traces, which are essential for diagnosing defects. We show that tree buffers are also useful in other application domains. For example, they can be used to implement the functionality of capturing groups in regular expressions. We prove optimal asymptotic bounds for our data structure, and validate them using empirical data from two sources: regular expression searching through Wikipedia, and runtime verification of execution traces obtained from the DaCapo test suite.
  • Given Markov chains and Markov decision processes (MDPs) whose transitions are labelled with non-negative integer costs, we study the computational complexity of deciding whether the probability of paths whose accumulated cost satisfies a Boolean combination of inequalities exceeds a given threshold. For acyclic Markov chains, we show that this problem is PP-complete, whereas it is hard for the PosSLP problem and in PSPACE for general Markov chains. Moreover, for acyclic and general MDPs, we prove PSPACE- and EXP-completeness, respectively. Our results have direct implications on the complexity of computing reward quantiles in succinctly represented stochastic systems.
  • Labelled Markov chains (LMCs) are widely used in probabilistic verification, speech recognition, computational biology, and many other fields. Checking two LMCs for equivalence is a classical problem subject to extensive studies, while the total variation distance provides a natural measure for the "inequivalence" of two LMCs: it is the maximum difference between probabilities that the LMCs assign to the same event. In this paper we develop a theory of the total variation distance between two LMCs, with emphasis on the algorithmic aspects: (1) we provide a polynomial-time algorithm for determining whether two LMCs have distance 1, i.e., whether they can almost always be distinguished; (2) we provide an algorithm for approximating the distance with arbitrary precision; and (3) we show that the threshold problem, i.e., whether the distance exceeds a given threshold, is NP-hard and hard for the square-root-sum problem. We also make a connection between the total variation distance and Bernoulli convolutions.
  • We consider the state-minimisation problem for weighted and probabilistic automata. We provide a numerically stable polynomial-time minimisation algorithm for weighted automata, with guaranteed bounds on the numerical error when run with floating-point arithmetic. Our algorithm can also be used for "lossy" minimisation with bounded error. We show an application in image compression. In the second part of the paper we study the complexity of the minimisation problem for probabilistic automata. We prove that the problem is NP-hard and in PSPACE, improving on a recent EXPTIME result.
  • We study the qualitative and quantitative zero-reachability problems in probabilistic multi-counter systems (pMC). We identify the undecidable variants of the problems and then concentrate on the remaining two cases. In the first case, where we are interested in the probability of all runs that visit zero in some counter, we show that qualitative zero-reachability is decidable in time polynomial in the size of the given pMC and doubly exponential in the number of counters. Further, we show that the probability of all zero-reaching runs can be effectively approximated up to an arbitrarily small given error $\varepsilon > 0$ in time polynomial in $|\log \varepsilon|$, exponential in the size of the given pMC, and doubly exponential in the number of counters. In the second case, we are interested in the probability of all runs that visit zero in some counter different from the last counter. Here we show that qualitative zero-reachability is decidable and SquareRootSum-hard, and that the probability of all zero-reaching runs can be effectively approximated up to an arbitrarily small given error $\varepsilon > 0$ (these results apply to pMC satisfying a suitable technical condition that can be verified in polynomial time). The proof techniques developed in the second case allow us to construct counterexamples for some classical results about ergodicity in stochastic Petri nets.
  • Basic Parallel Processes (BPPs) are a well-known subclass of Petri Nets. They are the simplest common model of concurrent programs that allows unbounded spawning of processes. In the probabilistic version of BPPs, every process generates other processes according to a probability distribution. We study the decidability and complexity of fundamental qualitative problems over probabilistic BPPs -- in particular reachability with probability 1 of different classes of target sets (e.g. upward-closed sets). Our results concern both the Markov-chain model, where processes are scheduled randomly, and the MDP model, where processes are picked by a scheduler.
  • This paper is concerned with the computational complexity of equivalence and minimisation for automata with transition weights in the field $\mathbb{Q}$ of rational numbers. We use polynomial identity testing and the Isolation Lemma to obtain complexity bounds, focussing on the class NC of problems within P solvable in polylogarithmic parallel time. For finite $\mathbb{Q}$-weighted automata, we give a randomised NC procedure that either outputs that two automata are equivalent or returns a word on which they differ. We also give an NC procedure for deciding whether a given automaton is minimal, as well as a randomised NC procedure that minimises an automaton. We consider probabilistic automata with rewards, similar to Markov Decision Processes. For these automata we consider two notions of equivalence: expectation equivalence and distribution equivalence. The former requires that two automata have the same expected reward on each input word, while the latter requires that each input word induce the same distribution on rewards in each automaton. For both notions we give algorithms for deciding equivalence by reduction to equivalence of $\mathbb{Q}$-weighted automata. Finally we show that the equivalence problem for $\mathbb{Q}$-weighted visibly pushdown automata is logspace equivalent to the polynomial identity testing problem.
  • Given a basic process algebra (BPA) and two stack symbols, the BPA bisimilarity problem asks whether the two stack symbols are bisimilar. We show that this problem is EXPTIME-hard.
  • Given two pushdown systems, the bisimilarity problem asks whether they are bisimilar. While this problem is known to be decidable, our main result states that it is nonelementary, improving on EXPTIME-hardness, which was the previously best known lower bound for this problem. Our lower bound result holds for normed pushdown systems as well.
  • We study the bisimilarity problem for probabilistic pushdown automata (pPDA) and subclasses thereof. Our definition of pPDA allows both probabilistic and non-deterministic branching, generalising the classical notion of pushdown automata (without $\varepsilon$-transitions). Our first contribution is a general construction that reduces checking bisimilarity of probabilistic transition systems to checking bisimilarity of non-deterministic transition systems. This construction directly yields decidability of bisimilarity for pPDA, as well as an elementary upper bound for the bisimilarity problem on the subclass of probabilistic basic process algebras, i.e., single-state pPDA. We further show that, with careful analysis, the general reduction can be used to prove an EXPTIME upper bound for bisimilarity of probabilistic visibly pushdown automata. Here we also provide a matching lower bound, establishing EXPTIME-completeness. Finally we prove that deciding bisimilarity of probabilistic one-counter automata, another subclass of pPDA, is PSPACE-complete. Here we use a more specialised argument to obtain optimal complexity bounds.
  • Stochastic branching processes are a classical model for describing random trees, which have applications in numerous fields including biology, physics, and natural language processing. In particular, they have recently been proposed to describe parallel programs with stochastic process creation. In this paper, we consider the problem of model checking stochastic branching processes. Given a branching process and a deterministic parity tree automaton, we are interested in computing the probability that the generated random tree is accepted by the automaton. We show that this probability can be compared with any rational number in PSPACE, and with 0 and 1 in polynomial time. In a second part, we suggest a tree extension of the logic PCTL, and develop a PSPACE algorithm for model checking a branching process against a formula of this logic. We also show that the qualitative fragment of this logic can be model checked in polynomial time.
  • We study termination time and recurrence time in programs with unbounded recursion, which are either randomized or operate on some statistically quantified inputs. As the underlying formal model for such programs we use probabilistic pushdown automata (pPDA) which are equivalent to probabilistic recursive state machines. We obtain tail bounds for the distribution of termination time for pPDA. We also study the recurrence time for probabilistic recursive programs that are not supposed to terminate (such as system daemons, network servers, etc.). Typically, such programs react to certain requests generated by their environment, and hence operate in finite request-service cycles. We obtain bounds for the frequency of long request-service cycles.
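The protocol in the abstract on Herman's self-stabilisation algorithm can be simulated in a few lines. The sketch below is a hypothetical illustration, not code from that paper. It assumes the common bit-based formulation: process $i$ holds a token exactly when its bit equals its predecessor's bit, a token holder draws a fresh random bit, and every other process copies its predecessor's bit, all synchronously. The helper names (`tokens`, `herman_step`, `from_tokens`, `stabilisation_time`) are made up for this example. With three equally spaced tokens, the abstract's value $h = 4/27$ suggests a mean close to $4N^2/27$, i.e. about 12 for $N = 9$. A seeded Monte Carlo run only estimates that figure.

```python
import random


def tokens(bits):
    """Positions holding a token: bits[i] equals its predecessor's bit (ring)."""
    return [i for i in range(len(bits)) if bits[i] == bits[i - 1]]


def herman_step(bits, rng):
    """One synchronous step: token holders draw a random bit, others copy."""
    return [rng.randint(0, 1) if bits[i] == bits[i - 1] else bits[i - 1]
            for i in range(len(bits))]


def from_tokens(n, positions):
    """Build a bit configuration on a ring of odd size n with the given tokens."""
    assert n % 2 == 1 and len(positions) % 2 == 1
    bits = [0]
    for i in range(1, n):
        # Copying the previous bit creates a token at i; flipping it does not.
        bits.append(bits[-1] if i in positions else 1 - bits[-1])
    assert sorted(tokens(bits)) == sorted(positions)
    return bits


def stabilisation_time(bits, rng):
    """Number of synchronous steps until a single token remains."""
    steps = 0
    while len(tokens(bits)) > 1:
        bits = herman_step(bits, rng)
        steps += 1
    return steps


if __name__ == "__main__":
    rng = random.Random(0)
    n = 9
    start = from_tokens(n, {0, 3, 6})  # three equally spaced tokens
    trials = 3000
    mean = sum(stabilisation_time(start, rng) for _ in range(trials)) / trials
    print(f"N={n}: estimated mean {mean:.2f}, 4N^2/27 = {4 * n * n / 27:.2f}")
```

With an odd ring size the number of tokens is always odd, so the loop stops at exactly one token rather than zero.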
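For the cost quantiles discussed in the abstracts on cost Markov chains, dynamic programming over accumulated costs gives a naive baseline. The sketch below is a hypothetical illustration and not the counting-hierarchy procedure described there. It assumes an acyclic chain given as a dict that maps each state to `(probability, cost, successor)` triples, and it treats states without outgoing transitions as absorbing. It computes the exact distribution of the total cost with `fractions.Fraction`. Because the number of distinct totals can grow with the magnitude of the costs, rather than with their bit size, this baseline is only pseudo-polynomial.

```python
from collections import defaultdict
from fractions import Fraction
from functools import lru_cache


def cost_distribution(transitions, start):
    """Exact distribution of the total cost accumulated from `start`.

    `transitions` maps a state to a list of (probability, cost, successor);
    states without outgoing transitions are absorbing. The chain must be acyclic.
    """
    @lru_cache(maxsize=None)
    def dist(state):
        successors = transitions.get(state, [])
        if not successors:
            return {0: Fraction(1)}  # absorbing: no further cost
        out = defaultdict(Fraction)
        for prob, cost, nxt in successors:
            for total, p in dist(nxt).items():
                out[total + cost] += Fraction(prob) * p
        return dict(out)

    return dist(start)


def quantile(distribution, q):
    """Smallest cost c with P(total cost <= c) >= q, or None if unreachable."""
    acc = Fraction(0)
    for c in sorted(distribution):
        acc += distribution[c]
        if acc >= q:
            return c
    return None


if __name__ == "__main__":
    half = Fraction(1, 2)
    chain = {
        "s0": [(half, 1, "s1"), (half, 3, "s2")],
        "s1": [(half, 0, "t"), (half, 2, "t")],
        "s2": [(Fraction(1), 1, "t")],
    }
    d = cost_distribution(chain, "s0")
    print(d, quantile(d, half))
```

On this small example the total cost is 1, 3 or 4 with probabilities $1/4$, $1/4$ and $1/2$, so the median is 3.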
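The cost-quantile abstract invokes the BEST theorem. It counts the Eulerian circuits of a connected Eulerian digraph as $\mathrm{ec}(G) = t_w(G)\prod_{v \in V}(\deg(v)-1)!$, where $t_w(G)$ is the number of spanning arborescences oriented towards an arbitrary root $w$ and $\deg(v)$ is the out-degree, which equals the in-degree. The sketch below is an illustrative implementation, not code from the paper, and its function names are hypothetical. It makes the following assumptions:

* $t_w(G)$ is obtained from the matrix-tree theorem as a minor of the out-degree Laplacian, computed exactly with `fractions.Fraction`.
* Parallel edges are treated as distinct.
* Circuits are counted up to cyclic rotation.

```python
from fractions import Fraction
from math import factorial


def det(matrix):
    """Exact determinant via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def count_eulerian_circuits(num_vertices, edges):
    """Eulerian circuits (up to rotation) of a digraph given as (u, v) edges."""
    out_deg = [0] * num_vertices
    in_deg = [0] * num_vertices
    laplacian = [[0] * num_vertices for _ in range(num_vertices)]
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
        laplacian[u][u] += 1  # out-degree Laplacian L = D_out - A;
        laplacian[u][v] -= 1  # a self-loop cancels out, as it should
    if out_deg != in_deg:
        return 0  # some vertex is unbalanced: no Eulerian circuit
    active = [v for v in range(num_vertices) if out_deg[v] > 0]
    if not active:
        return 0
    root, others = active[0], active[1:]
    # Matrix-tree theorem: deleting the root's row and column counts the
    # spanning arborescences oriented towards the root (0 if disconnected).
    trees = det([[laplacian[r][c] for c in others] for r in others])
    result = trees
    for v in active:
        result *= factorial(out_deg[v] - 1)
    return int(result)
```

For example, the complete directed graph on three vertices has three arborescences towards any root and every vertex has out-degree 2, so it has $3 \cdot 1!^3 = 3$ Eulerian circuits.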
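The abstract on equivalence and minimisation of $\mathbb{Q}$-weighted automata describes a randomised NC procedure for equivalence. This sketch instead illustrates the problem with the classic sequential linear-algebra method, often attributed to Schützenberger and Tzeng, which runs in polynomial time. It assumes that an automaton is given as a triple `(alpha, mats, eta)`: an initial row vector, a dict of per-letter transition matrices, and a final column vector. A word $w = a_1\cdots a_k$ then has weight $\alpha M_{a_1}\cdots M_{a_k}\eta$. The sketch explores breadth-first the span of the vectors $\alpha M_w$ of the difference automaton. It returns a word on which the two automata differ, or `None` if they are equivalent. All function names are hypothetical.

```python
from collections import deque
from fractions import Fraction


def vec_mat(v, m):
    """Row vector v times square matrix m."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(v))]


def weight(aut, word):
    """Weight alpha * M_{a1} ... M_{ak} * eta of a word."""
    alpha, mats, eta = aut
    v = [Fraction(x) for x in alpha]
    for a in word:
        v = vec_mat(v, mats[a])
    return sum(x * Fraction(y) for x, y in zip(v, eta))


def block_diag(m1, m2):
    """Block-diagonal matrix with blocks m1 and m2."""
    n1, n2 = len(m1), len(m2)
    out = [[Fraction(0)] * (n1 + n2) for _ in range(n1 + n2)]
    for i in range(n1):
        for j in range(n1):
            out[i][j] = Fraction(m1[i][j])
    for i in range(n2):
        for j in range(n2):
            out[n1 + i][n1 + j] = Fraction(m2[i][j])
    return out


def find_difference(aut1, aut2, alphabet):
    """Return a word with different weights, or None if the automata are equivalent."""
    a1, m1, e1 = aut1
    a2, m2, e2 = aut2
    # Difference automaton: its weight on w is weight1(w) - weight2(w).
    alpha = [Fraction(x) for x in a1] + [-Fraction(x) for x in a2]
    eta = [Fraction(x) for x in e1] + [Fraction(x) for x in e2]
    mats = {a: block_diag(m1[a], m2[a]) for a in alphabet}
    basis = []  # (pivot index, row reduced against earlier rows)
    queue = deque([(alpha, "")])
    while queue:
        vec, word = queue.popleft()  # vec == alpha * M_word
        reduced = list(vec)
        for p, row in basis:
            if reduced[p] != 0:
                f = reduced[p] / row[p]
                reduced = [x - f * y for x, y in zip(reduced, row)]
        pivot = next((i for i, x in enumerate(reduced) if x != 0), None)
        if pivot is None:
            continue  # already in the span explored so far
        if sum(x * y for x, y in zip(vec, eta)) != 0:
            return word  # the two automata assign different weights to word
        basis.append((pivot, reduced))
        for a in alphabet:
            queue.append((vec_mat(vec, mats[a]), word + a))
    return None
```

Only linearly independent vectors are expanded, so the basis holds at most $n_1 + n_2$ vectors. Any returned word therefore has length below $n_1 + n_2$, where $n_1$ and $n_2$ are the numbers of states.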