• Particle physics has an ambitious and broad experimental programme for the coming decades. This programme requires large investments in detector hardware, either to build new facilities and experiments, or to upgrade existing ones. Similarly, it requires commensurate investment in the R&D of software to acquire, manage, process, and analyse the sheer amounts of data to be recorded. In planning for the HL-LHC in particular, it is critical that all of the collaborating stakeholders agree on the software goals and priorities, and that the efforts complement each other. In this spirit, this white paper describes the R&D activities required to prepare for this software upgrade.
• ### Sums of Palindromes: an Approach via Automata(1706.10206)

Aug. 30, 2017 math.CO, math.NT, cs.FL
Recently, Cilleruelo, Luca, & Baxter proved, for all bases b >= 5, that every natural number is the sum of at most 3 natural numbers whose base-b representation is a palindrome. However, the cases b = 2, 3, 4 were left unresolved. We prove, using a decision procedure based on automata, that every natural number is the sum of at most 4 natural numbers whose base-2 representation is a palindrome. Here the constant 4 is optimal. We obtain similar results for bases 3 and 4, thus completely resolving the problem. We consider some other variations on this problem, and prove similar results. We argue that heavily case-based proofs are a good signal that a decision procedure may help to automate the proof.
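The base-2 statement can be spot-checked by brute force on small inputs. The sketch below is not the automaton-based decision procedure of the paper; it is a plain dynamic program, and the function names and the bound `limit` are illustrative assumptions. It computes, for every n up to `limit`, the fewest positive base-b palindromes that sum to n.

```python
def is_palindrome(n, base=2):
    """Return True if the base-`base` digits of n read the same both ways."""
    digits = []
    while n > 0:
        digits.append(n % base)
        n //= base
    return digits == digits[::-1]


def min_palindrome_terms(limit, base=2):
    """best[n] = fewest positive base-`base` palindromes summing to n, for 0 <= n <= limit."""
    palindromes = [p for p in range(1, limit + 1) if is_palindrome(p, base)]
    best = [0] * (limit + 1)
    for n in range(1, limit + 1):
        # 1 is a palindrome in every base, so the minimum is always defined.
        best[n] = 1 + min(best[n - p] for p in palindromes if p <= n)
    return best


if __name__ == "__main__":
    best = min_palindrome_terms(1000, base=2)
    # A finite check only: consistent with the theorem, not a proof of it.
    print(max(best[1:]))
```

A finite search like this can only confirm the bound up to `limit`; covering all natural numbers is what the decision procedure is for.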
• ### Undecidability and Finite Automata(1702.01394)

Feb. 27, 2017 cs.FL
Using a novel rewriting problem, we show that several natural decision problems about finite automata are undecidable (i.e., recursively unsolvable). In contrast, we also prove three related problems are decidable. We apply one result to prove the undecidability of a related problem about k-automatic sets of rational numbers.
• ### Prediction of Infinite Words with Automata(1603.02597)

March 8, 2016 cs.LG, cs.FL
In the classic problem of sequence prediction, a predictor receives a sequence of values from an emitter and tries to guess the next value before it appears. The predictor masters the emitter if there is a point after which all of the predictor's guesses are correct. In this paper we consider the case in which the predictor is an automaton and the emitted values are drawn from a finite set; i.e., the emitted sequence is an infinite word. We examine the predictive capabilities of finite automata, pushdown automata, stack automata (a generalization of pushdown automata), and multihead finite automata. We relate our predicting automata to purely periodic words, ultimately periodic words, and multilinear words, describing novel prediction algorithms for mastering these sequences.
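To make the notion of mastering concrete, here is a minimal sketch of a predictor for ultimately periodic words. It is not one of the automaton constructions from the paper: it uses unbounded memory and simple enumeration of (preperiod, period) hypotheses, and all names are illustrative assumptions. It keeps the first hypothesis not yet contradicted by the symbols seen so far and guesses accordingly.

```python
from itertools import count


def hypotheses():
    """Enumerate (preperiod, period) pairs in order of increasing preperiod + period."""
    for total in count(1):
        for period in range(1, total + 1):
            yield total - period, period


def consistent(history, pre, per):
    """Check that history[i] == history[i - per] for every i >= pre + per seen so far."""
    return all(history[i] == history[i - per] for i in range(pre + per, len(history)))


def run_predictor(emitted):
    """Feed `emitted` symbol by symbol; return the positions where the guess was wrong."""
    history = []
    candidates = hypotheses()
    pre, per = next(candidates)
    mistakes = []
    for position, symbol in enumerate(emitted):
        # Guess from the current hypothesis, or make no guess if it says nothing yet.
        guess = history[position - per] if position >= pre + per else None
        if guess != symbol:
            mistakes.append(position)
        history.append(symbol)
        # Discard hypotheses contradicted by what has been seen.
        while not consistent(history, pre, per):
            pre, per = next(candidates)
    return mistakes


if __name__ == "__main__":
    word = "ab" + "cde" * 20  # preperiod 2, period 3
    print(run_predictor(word))
```

On an ultimately periodic word the true (preperiod, period) pair is never discarded, so the hypothesis stops changing after finitely many steps and every later guess is correct.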
• ### Status Report of the DPHEP Collaboration: A Global Effort for Sustainable Data Preservation in High Energy Physics(1512.02019)

Feb. 17, 2016 hep-ex, cs.DL
Data from High Energy Physics (HEP) experiments are collected with significant financial and human effort and are mostly unique. An inter-experimental study group on HEP data preservation and long-term analysis was convened as a panel of the International Committee for Future Accelerators (ICFA). The group was formed by large collider-based experiments and investigated the technical and organizational aspects of HEP data preservation. An intermediate report was released in November 2009 addressing the general issues of data preservation in HEP and an extended blueprint paper was published in 2012. In July 2014 the DPHEP collaboration was formed as a result of the signature of the Collaboration Agreement by seven large funding agencies (others have since joined or are in the process of acquisition) and in June 2015 the first DPHEP Collaboration Workshop and Collaboration Board meeting took place. This status report of the DPHEP collaboration details the progress during the period from 2013 to 2015 inclusive.
• ### On Infinite Words Determined by Indexed Languages(1406.3373)

June 17, 2014 cs.FL
We characterize the infinite words determined by indexed languages. An infinite language $L$ determines an infinite word $\alpha$ if every string in $L$ is a prefix of $\alpha$. If $L$ is regular or context-free, it is known that $\alpha$ must be ultimately periodic. We show that if $L$ is an indexed language, then $\alpha$ is a morphic word, i.e., $\alpha$ can be generated by iterating a morphism under a coding. Since the other direction, that every morphic word is determined by some indexed language, also holds, this implies that the infinite words determined by indexed languages are exactly the morphic words. To obtain this result, we prove a new pumping lemma for the indexed languages, which may be of independent interest.
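As an illustration of "iterating a morphism under a coding", the sketch below generates a prefix of a morphic word. The helper name `morphic_prefix` and the choice of the Thue-Morse word as the example are assumptions made here, not taken from the paper. The morphism must be prolongable on the start letter, meaning the image of that letter begins with the letter itself, so that each iterate extends the previous one.

```python
def morphic_prefix(morphism, coding, start, length):
    """Return the first `length` letters of coding(h^omega(start)).

    `morphism` maps each letter to a string and `coding` maps each letter to a
    single output letter. Assumes morphism[start] begins with `start`.
    """
    if not morphism[start].startswith(start):
        raise ValueError("morphism is not prolongable on the start letter")
    word = start
    while len(word) < length:
        longer = "".join(morphism[c] for c in word)
        if len(longer) == len(word):
            raise ValueError("iteration does not grow; the fixed point is finite")
        word = longer
    return "".join(coding[c] for c in word[:length])


if __name__ == "__main__":
    # Thue-Morse word: iterate a -> ab, b -> ba from a, then code a -> 0, b -> 1.
    h = {"a": "ab", "b": "ba"}
    tau = {"a": "0", "b": "1"}
    print(morphic_prefix(h, tau, "a", 8))  # prints 01101001
```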
• ### Status Report of the DPHEP Study Group: Towards a Global Effort for Sustainable Data Preservation in High Energy Physics(1205.4667)

May 21, 2012 hep-ex, cs.DL
Data from high-energy physics (HEP) experiments are collected with significant financial and human effort and are mostly unique. An inter-experimental study group on HEP data preservation and long-term analysis was convened as a panel of the International Committee for Future Accelerators (ICFA). The group was formed by large collider-based experiments and investigated the technical and organisational aspects of HEP data preservation. An intermediate report was released in November 2009 addressing the general issues of data preservation in HEP. This paper includes and extends the intermediate report. It provides an analysis of the research case for data preservation and a detailed description of the various projects at experiment, laboratory and international levels. In addition, the paper provides a concrete proposal for an international organisation in charge of the data management and policies in high-energy physics.
• ### Installing, Running and Maintaining Large Linux Clusters at CERN(cs/0306058)

June 12, 2003 cs.DC
Having built up Linux clusters to more than 1000 nodes over the past five years, we already have practical experience confronting some of the LHC-scale computing challenges: scalability, automation, hardware diversity, security, and rolling OS upgrades. This paper describes the tools and processes we have implemented, working in close collaboration with the EDG project [1], especially with the WP4 subtask, to improve the manageability of our clusters, in particular in the areas of system installation, configuration, and monitoring. In addition to the purely technical issues, providing shared interactive and batch services which can adapt to meet the diverse and changing requirements of our users is a significant challenge. We describe the developments and tuning that we have introduced on our LSF-based systems to maximise both responsiveness to users and overall system utilisation. Finally, this paper describes the problems we are facing in enlarging our heterogeneous Linux clusters, the progress we have made in dealing with the current issues, and the steps we are taking to gridify the clusters.