• ### Efficient Dynamic Dictionary Matching with DAWGs and AC-automata(1710.03395)

Feb. 21, 2019 cs.DS
The dictionary matching is a task to find all occurrences of patterns in a set $D$ (called a dictionary) on a text $T$. The Aho-Corasick-automaton (AC-automaton) is a data structure which enables us to solve the dictionary matching problem in $O(d\log\sigma)$ preprocessing time and $O(n\log\sigma+occ)$ matching time, where $d$ is the total length of the patterns in $D$, $n$ is the length of the text, $\sigma$ is the alphabet size, and $occ$ is the total number of occurrences of all the patterns in the text. The dynamic dictionary matching is a variant where patterns may dynamically be inserted into and deleted from $D$. This problem is called semi-dynamic dictionary matching if only insertions are allowed. In this paper, we propose two efficient algorithms. For a pattern of length $m$, our first algorithm supports insertions in $O(m\log\sigma+\log d/\log\log d)$ time and pattern matching in $O(n\log\sigma+occ)$ time for the semi-dynamic setting and supports both insertions and deletions in $O(\sigma m+\log d/\log\log d)$ time and pattern matching in $O(n(\log d/\log\log d+\log\sigma)+occ(\log d/\log\log d))$ time for the dynamic setting by some modifications. This algorithm is based on the directed acyclic word graph. Our second algorithm, which is based on the AC-automaton, supports insertions in $O(m\log \sigma+u_f+u_o)$ time for the semi-dynamic setting and supports both insertions and deletions in $O(\sigma m+u_f+u_o)$ time for the dynamic setting, where $u_f$ and $u_o$ respectively denote the numbers of states in which the failure function and the output function need to be updated. This algorithm performs pattern matching in $O(n\log\sigma+occ)$ time for both settings. Our algorithm achieves optimal update time for AC-automaton based methods over constant-size alphabets, since any algorithm which explicitly maintains the AC-automaton requires $\Omega(m+u_f+u_o)$ update time.
• ### Fully-Online Suffix Tree and Directed Acyclic Word Graph Construction for Multiple Texts(1507.07622)

July 12, 2018 cs.DS
We consider construction of the suffix tree and the directed acyclic word graph (DAWG) indexing data structures for a collection $\mathcal{T}$ of texts, where a new symbol may be appended to any text in $\mathcal{T} = \{T_1, \ldots, T_K\}$, at any time. This fully-online scenario, which arises in dynamically indexing multi-sensor data, is a natural generalization of the long solved semi-online text indexing problem, where texts $T_1, \ldots, T_{k}$ are permanently fixed before the next text $T_{k+1}$ is processed for each $1 \leq k < K$. We present fully-online algorithms that construct the suffix tree and the DAWG for $\mathcal{T}$ in $O(N \log \sigma)$ time and $O(N)$ space, where $N$ is the total lengths of the strings in $\mathcal{T}$ and $\sigma$ is their alphabet size. The standard explicit representation of the suffix tree leaf edges and some DAWG edges must be relaxed in our fully-online scenario, since too many updates on these edges are required in the worst case. Instead, we provide access to the updated suffix tree leaf edge labels and the DAWG edges to be redirected via auxiliary data structures, in $O(\log \sigma)$ time per added character.
• ### Efficient computation of longest single-arm-gapped palindromes in a string(1609.03000)

Oct. 31, 2019 cs.DS
In this paper, we introduce new types of approximate palindromes called single-arm-gapped palindromes (shortly SAGPs). A SAGP contains a gap in either its left or right arm, which is in the form of either $wguc u^R w^R$ or $wuc u^Rgw^R$, where $w$ and $u$ are non-empty strings, $w^R$ and $u^R$ are respectively the reversed strings of $w$ and $u$, $g$ is a string called a gap, and $c$ is either a single character or the empty string. Here we call $wu$ and $u^R w^R$ the arm of the SAGP, and $|uv|$ the length of the arm. We classify SAGPs into two groups: those which have $ucu^R$ as a maximal palindrome (type-1), and the others (type-2). We propose several algorithms to compute type-1 SAGPs with longest arms occurring in a given string, based on suffix arrays. Then, we propose a linear-time algorithm to compute all type-1 SAGPs with longest arms, based on suffix trees. Also, we show how to compute type-2 SAGPs with longest arms in linear time. We also perform some preliminary experiments to show practical performances of the proposed methods.