Particle physics has an ambitious and broad experimental programme for the
coming decades. This programme requires large investments in detector hardware,
either to build new facilities and experiments, or to upgrade existing ones.
Similarly, it requires commensurate investment in the R&D of software to
acquire, manage, process, and analyse the sheer amounts of data to be recorded.
In planning for the HL-LHC in particular, it is critical that all of the
collaborating stakeholders agree on the software goals and priorities, and that
the efforts complement each other. In this spirit, this white paper describes
the R&D activities required to prepare for this software upgrade.

Data from High Energy Physics (HEP) experiments are collected with
significant financial and human effort and are mostly unique. An
inter-experimental study group on HEP data preservation and long-term analysis
was convened as a panel of the International Committee for Future Accelerators
(ICFA). The group was formed by large collider-based experiments and
investigated the technical and organizational aspects of HEP data preservation.
An intermediate report was released in November 2009 addressing the general
issues of data preservation in HEP and an extended blueprint paper was
published in 2012. In July 2014 the DPHEP collaboration was formed following the
signing of the Collaboration Agreement by seven large funding agencies
(others have since joined or are in the process of joining), and in June
2015 the first DPHEP Collaboration Workshop and Collaboration Board meeting
took place.
This status report of the DPHEP collaboration details the progress during the
period from 2013 to 2015 inclusive.

Scientific results in high-energy physics and in many other fields often rely
on complex software stacks. In order to support reproducibility and scrutiny of
the results, it is good practice to use open source software and to cite
software packages and versions. Given the ever-growing complexity of scientific
software on the one hand and IT life cycles of only a few years on the other,
however, setting up and validating a minimal usable analysis environment can
easily become prohibitively expensive even when the source code is available.
We argue that there is a substantial gap between
merely having access to versioned source code and the ability to create a data
analysis runtime environment. In order to preserve all the different variants
of the data analysis runtime environment, we developed a snapshotting file
system optimized for software distribution. We report on our experience in
preserving the analysis environment for high-energy physics, such as the
software landscape used to discover the Higgs boson at the Large Hadron
Collider.

PROOF, the Parallel ROOT Facility, is a ROOT-based framework which enables
interactive parallelism for event-based tasks on a cluster of computing nodes.
Although PROOF can be used simply from within a ROOT session with no additional
requirements, deploying and configuring a PROOF cluster used to be less
straightforward. Recently, considerable effort has gone into making it possible
to provision generic PROOF analysis facilities with zero configuration, which
also improves both stability and scalability and makes deployment feasible
even for the end user. Since a
growing number of large-scale computing resources are nowadays made available
by Cloud providers in a virtualized form, we have developed the Virtual
PROOF-based Analysis Facility: a cluster appliance combining the solid CernVM
ecosystem and PoD (PROOF on Demand), ready to be deployed on the Cloud and
leveraging Cloud-specific features such as elasticity. We will show how
this approach is effective both for sysadmins, who will have little or no
configuration to do to run it on their Clouds, and for the end users, who are
ultimately in full control of their PROOF cluster and can even easily restart
it by themselves in the unfortunate event of a major failure. We will also show
how elasticity leads to a more efficient and uniform usage of Cloud resources.
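
To make the notion of elasticity concrete, the following is a minimal sketch of one possible scaling rule, in which the number of workers follows the number of queued tasks within fixed bounds. It is an illustration only and not the mechanism used by the Virtual Analysis Facility, PoD, or CernVM; the function names, the parameters (tasks_per_worker, min_workers, max_workers) and their default values are all assumptions introduced here.

```python
import math


def target_workers(queued_tasks: int, tasks_per_worker: int = 4,
                   min_workers: int = 0, max_workers: int = 16) -> int:
    """Return how many workers a hypothetical elastic cluster should run.

    All parameters are illustrative assumptions, not values taken from
    any real PROOF or Cloud deployment.
    """
    if tasks_per_worker <= 0:
        raise ValueError("tasks_per_worker must be positive")
    # Workers needed to absorb the queue, rounded up.
    needed = math.ceil(queued_tasks / tasks_per_worker)
    # Clamp to the bounds granted by the (hypothetical) Cloud quota.
    return max(min_workers, min(max_workers, needed))


def scaling_delta(current_workers: int, queued_tasks: int, **policy) -> int:
    """Positive result: workers to start; negative: idle workers to release."""
    return target_workers(queued_tasks, **policy) - current_workers
```

Under this rule an empty queue shrinks the cluster to min_workers, so idle virtual machines are handed back to the Cloud, while a long queue grows the cluster only up to max_workers.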