
A seed in a word is a relaxed version of a period in which the occurrences of
the repeating subword may overlap. We show a lineartime algorithm computing a
linearsize representation of all the seeds of a word (the number of seeds
might be quadratic). In particular, one can easily derive the shortest seed and
the number of seeds from our representation. Thus, we solve an open problem
stated in the survey by Smyth (2000) and improve upon a previous O(n log n)
algorithm by Iliopoulos, Moore, and Park (1996). Our approach is based on
combinatorial relations between seeds and subword complexity (used here for the
first time in context of seeds). In the previous papers, the compact
representation of seeds consisted of two independent parts operating on the
suffix tree of the word and the suffix tree of the reverse of the word,
respectively. Our second contribution is a simpler representation of all seeds
which avoids dealing with the reversed word.
A preliminary version of this work, with a much more complex algorithm
constructing the earlier representation of seeds, was presented at the 23rd
Annual ACMSIAM Symposium of Discrete Algorithms (SODA 2012).

For a partial word $w$ the longest common compatible prefix of two positions
$i,j$, denoted $lccp(i,j)$, is the largest $k$ such that $w[i,i+k1]\uparrow
w[j,j+k1]$, where $\uparrow$ is the compatibility relation of partial words
(it is not an equivalence relation). The LCCP problem is to preprocess a
partial word in such a way that any query $lccp(i,j)$ about this word can be
answered in $O(1)$ time. It is a natural generalization of the longest common
prefix (LCP) problem for regular words, for which an $O(n)$ preprocessing time
and $O(1)$ query time solution exists.
Recently an efficient algorithm for this problem has been given by F.
BlanchetSadri and J. Lazarow (LATA 2013). The preprocessing time was
$O(nh+n)$, where $h$ is the number of "holes" in $w$. The algorithm was
designed for partial words over a constant alphabet and was quite involved.
We present a simple solution to this problem with slightly better runtime
that works for any linearlysortable alphabet. Our preprocessing is in time
$O(n\mu+n)$, where $\mu$ is the number of blocks of holes in $w$. Our algorithm
uses ideas from alignment algorithms and dynamic programming.

Recently Kubica et al. (Inf. Process. Let., 2013) and Kim et al. (submitted
to Theor. Comp. Sci.) introduced orderpreserving pattern matching. In this
problem we are looking for consecutive substrings of the text that have the
same "shape" as a given pattern. These results include a lineartime
orderpreserving pattern matching algorithm for polynomiallybounded alphabet
and an extension of this result to pattern matching with multiple patterns. We
make one step forward in the analysis and give an
$O(\frac{n\log{n}}{\log\log{n}})$ time randomized algorithm constructing suffix
trees in the orderpreserving setting. We show a number of applications of
orderpreserving suffix trees to identify patterns and repetitions in time
series.

We derive a simple efficient algorithm for Abelian periods knowing all
Abelian squares in a string. An efficient algorithm for the latter problem was
given by Cummings and Smyth in 1997. By the way we show an alternative
algorithm for Abelian squares. We also obtain a linear time algorithm finding
all `long' Abelian periods. The aim of the paper is a (new) reduction of the
problem of all Abelian periods to that of (already solved) all Abelian squares
which provides new insight into both connected problems.

The notion of the cover is a generalization of a period of a string, and
there are linear time algorithms for finding the shortest cover. The seed is a
more complicated generalization of periodicity, it is a cover of a superstring
of a given string, and the shortest seed problem is of much higher algorithmic
difficulty. The problem is not well understood, no linear time algorithm is
known. In the paper we give linear time algorithms for some of its versions 
computing shortest leftseed array, longest leftseed array and checking for
seeds of a given length. The algorithm for the last problem is used to compute
the seed array of a string (i.e., the shortest seeds for all the prefixes of
the string) in $O(n^2)$ time. We describe also a simpler alternative algorithm
computing efficiently the shortest seeds. As a byproduct we obtain an
$O(n\log{(n/m)})$ time algorithm checking if the shortest seed has length at
least $m$ and finding the corresponding seed. We also correct some important
details missing in the previously known shortestseed algorithm (Iliopoulos et
al., 1996).

A run is an inclusion maximal occurrence in a string (as a subinterval) of a
repetition $v$ with a period $p$ such that $2p \le v$. The exponent of a run
is defined as $v/p$ and is $\ge 2$. We show new bounds on the maximal sum of
exponents of runs in a string of length $n$. Our upper bound of $4.1n$ is
better than the best previously known proven bound of $5.6n$ by Crochemore &
Ilie (2008). The lower bound of $2.035n$, obtained using a family of binary
words, contradicts the conjecture of Kolpakov & Kucherov (1999) that the
maximal sum of exponents of runs in a string of length $n$ is smaller than $2n$

We investigate the problem of the maximum number of cubic subwords (of the
form $www$) in a given word. We also consider square subwords (of the form
$ww$). The problem of the maximum number of squares in a word is not well
understood. Several new results related to this problem are produced in the
paper. We consider two simple problems related to the maximum number of
subwords which are squares or which are highly repetitive; then we provide a
nontrivial estimation for the number of cubes. We show that the maximum number
of squares $xx$ such that $x$ is not a primitive word (nonprimitive squares) in
a word of length $n$ is exactly $\lfloor \frac{n}{2}\rfloor  1$, and the
maximum number of subwords of the form $x^k$, for $k\ge 3$, is exactly $n2$.
In particular, the maximum number of cubes in a word is not greater than $n2$
either. Using very technical properties of occurrences of cubes, we improve
this bound significantly. We show that the maximum number of cubes in a word of
length $n$ is between $(1/2)n$ and $(4/5)n$. (In particular, we improve the
lower bound from the conference version of the paper.)

A run is a maximal occurrence of a repetition $v$ with a period $p$ such that
$2p \le v$. The maximal number of runs in a string of length $n$ was studied
by several authors and it is known to be between $0.944 n$ and $1.029 n$. We
investigate highly periodic runs, in which the shortest period $p$ satisfies
$3p \le v$. We show the upper bound $0.5n$ on the maximal number of such runs
in a string of length $n$ and construct a sequence of words for which we obtain
the lower bound $0.406 n$.