
Particle physics has an ambitious and broad experimental programme for the
coming decades. This programme requires large investments in detector hardware,
either to build new facilities and experiments, or to upgrade existing ones.
Similarly, it requires commensurate investment in the R&D of software to
acquire, manage, process, and analyse the sheer amounts of data to be recorded.
In planning for the HL-LHC in particular, it is critical that all of the
collaborating stakeholders agree on the software goals and priorities, and that
the efforts complement each other. In this spirit, this white paper describes
the R&D activities required to prepare for this software upgrade.

Recently, Cilleruelo, Luca, & Baxter proved, for all bases b >= 5, that every
natural number is the sum of at most 3 natural numbers whose base-b
representation is a palindrome. However, the cases b = 2, 3, 4 were left
unresolved.
We prove, using a decision procedure based on automata, that every natural
number is the sum of at most 4 natural numbers whose base-2 representation is a
palindrome. Here the constant 4 is optimal. We obtain similar results for bases
3 and 4, thus completely resolving the problem.
We consider some other variations on this problem, and prove similar results.
We argue that heavily case-based proofs are a good signal that a decision
procedure may help to automate the proof.
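
As a small illustration of the base-2 statement, the following sketch computes,
by brute force, the least number of positive binary palindromes summing to each
n up to a bound. It is only an empirical check over a finite range, not the
automata-based decision procedure of the paper; the function names and the
bound are assumptions chosen for this example.

```python
def is_binary_palindrome(n):
    """Return True if the base-2 representation of n reads the same both ways."""
    bits = bin(n)[2:]
    return bits == bits[::-1]


def min_palindrome_summands(limit):
    """best[n] = least number of positive binary palindromes summing to n.

    A coin-change style dynamic programme over all n in 0..limit.
    """
    palindromes = [p for p in range(1, limit + 1) if is_binary_palindrome(p)]
    best = [0] + [limit + 1] * limit  # limit + 1 acts as "not yet known"
    for n in range(1, limit + 1):
        for p in palindromes:  # palindromes are in increasing order
            if p > n:
                break
            best[n] = min(best[n], best[n - p] + 1)
    return best


if __name__ == "__main__":
    best = min_palindrome_summands(2000)
    # The theorem predicts that this maximum never exceeds 4.
    print(max(best[1:]))
```

A finite search like this can only confirm the bound on the range tested; the
statement for all natural numbers is what the decision procedure establishes.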

Using a novel rewriting problem, we show that several natural decision
problems about finite automata are undecidable (i.e., recursively unsolvable).
In contrast, we also prove three related problems are decidable. We apply one
result to prove the undecidability of a related problem about k-automatic sets
of rational numbers.

In the classic problem of sequence prediction, a predictor receives a
sequence of values from an emitter and tries to guess the next value before it
appears. The predictor masters the emitter if there is a point after which all
of the predictor's guesses are correct. In this paper we consider the case in
which the predictor is an automaton and the emitted values are drawn from a
finite set; i.e., the emitted sequence is an infinite word. We examine the
predictive capabilities of finite automata, pushdown automata, stack automata
(a generalization of pushdown automata), and multihead finite automata. We
relate our predicting automata to purely periodic words, ultimately periodic
words, and multilinear words, describing novel prediction algorithms for
mastering these sequences.
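
To make the notion of mastering concrete, here is a minimal sketch of a
predictor for purely periodic words. It is an ordinary program with unbounded
memory, not one of the automaton models studied in the paper, and the names
`predict_next` and `run_predictor` are hypothetical. At each step it guesses
according to the smallest period consistent with everything seen so far.

```python
from itertools import cycle, islice


def predict_next(history):
    """Guess the next symbol from the smallest period consistent with history."""
    n = len(history)
    for p in range(1, n + 1):
        # p is consistent if every symbol equals the one p positions earlier.
        if all(history[i] == history[i - p] for i in range(p, n)):
            return history[n - p]
    return None  # empty history: no basis for a guess


def run_predictor(word, steps):
    """Feed the first `steps` symbols of `word`; record whether each guess was right."""
    history = []
    results = []
    for symbol in islice(word, steps):
        guess = predict_next(history)
        results.append(guess == symbol)
        history.append(symbol)
    return results


if __name__ == "__main__":
    # Emitter: the purely periodic word (aab)(aab)(aab)...
    print(run_predictor(cycle("aab"), 12))
```

On a purely periodic word, every candidate shorter than the true period is
eventually contradicted by the history, so from some point on all guesses are
correct, which is the mastering condition described above.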

Data from High Energy Physics (HEP) experiments are collected with
significant financial and human effort and are mostly unique. An
inter-experimental study group on HEP data preservation and long-term analysis
was convened as a panel of the International Committee for Future Accelerators
(ICFA). The group was formed by large collider-based experiments and
investigated the technical and organizational aspects of HEP data preservation.
An intermediate report was released in November 2009 addressing the general
issues of data preservation in HEP and an extended blueprint paper was
published in 2012. In July 2014 the DPHEP collaboration was formed as a result
of the signature of the Collaboration Agreement by seven large funding agencies
(others have since joined or are in the process of joining) and in June
2015 the first DPHEP Collaboration Workshop and Collaboration Board meeting
took place.
This status report of the DPHEP collaboration details the progress during the
period from 2013 to 2015 inclusive.

We characterize the infinite words determined by indexed languages. An
infinite language $L$ determines an infinite word $\alpha$ if every string in
$L$ is a prefix of $\alpha$. If $L$ is regular or context-free, it is known
that $\alpha$ must be ultimately periodic. We show that if $L$ is an indexed
language, then $\alpha$ is a morphic word, i.e., $\alpha$ can be generated by
iterating a morphism under a coding. Since the other direction, that every
morphic word is determined by some indexed language, also holds, this implies
that the infinite words determined by indexed languages are exactly the morphic
words. To obtain this result, we prove a new pumping lemma for the indexed
languages, which may be of independent interest.
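
As an illustration of the definition of a morphic word, the sketch below
generates a prefix by iterating a morphism from a start letter and then
applying a coding letter by letter. The function name and the Thue-Morse
example are assumptions chosen for illustration and are not taken from the
paper; the morphism is assumed to be prolongable on the start letter, so that
each iterate is a prefix of the next.

```python
def morphic_prefix(morphism, coding, start, length):
    """Return the first `length` letters of coding(morphism^omega(start)).

    morphism: dict mapping each letter to a non-empty string
    coding:   dict mapping each letter to a single output letter
    """
    if not morphism[start].startswith(start):
        raise ValueError("morphism must be prolongable on the start letter")
    word = start
    while len(word) < length:
        longer = "".join(morphism[c] for c in word)
        if len(longer) <= len(word):
            raise ValueError("iteration does not grow; no infinite word")
        word = longer
    return "".join(coding[c] for c in word[:length])


if __name__ == "__main__":
    # Thue-Morse word: iterate 0 -> 01, 1 -> 10 from 0, with the identity coding.
    thue_morse = {"0": "01", "1": "10"}
    identity = {"0": "0", "1": "1"}
    print(morphic_prefix(thue_morse, identity, "0", 8))
```

With a coding other than the identity, the same iteration yields words over a
different alphabet, which is what distinguishes morphic words in general from
pure fixed points of a morphism.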

Data from high-energy physics (HEP) experiments are collected with
significant financial and human effort and are mostly unique. An
inter-experimental study group on HEP data preservation and long-term analysis
was convened as a panel of the International Committee for Future Accelerators
(ICFA). The group was formed by large collider-based experiments and
investigated the technical and organisational aspects of HEP data preservation.
An intermediate report was released in November 2009 addressing the general
issues of data preservation in HEP. This paper includes and extends the
intermediate report. It provides an analysis of the research case for data
preservation and a detailed description of the various projects at experiment,
laboratory and international levels. In addition, the paper provides a concrete
proposal for an international organisation in charge of the data management and
policies in high-energy physics.

Having built up Linux clusters to more than 1000 nodes over the past five
years, we already have practical experience confronting some of the LHC-scale
computing challenges: scalability, automation, hardware diversity, security,
and rolling OS upgrades. This paper describes the tools and processes we have
implemented, working in close collaboration with the EDG project [1],
especially with the WP4 subtask, to improve the manageability of our clusters,
in particular in the areas of system installation, configuration, and
monitoring. In addition to the purely technical issues, providing shared
interactive and batch services which can adapt to meet the diverse and changing
requirements of our users is a significant challenge. We describe the
developments and tuning that we have introduced on our LSF-based systems to
maximise both responsiveness to users and overall system utilisation. Finally,
this paper describes the problems we face in enlarging our
heterogeneous Linux clusters, the progress we have made in dealing with the
current issues, and the steps we are taking to gridify the clusters.