At the heart of experimental high energy physics (HEP) is the development of
facilities and instrumentation that provide sensitivity to new phenomena. Our
understanding of nature at its most fundamental level is advanced through the
analysis and interpretation of data from sophisticated detectors in HEP
experiments. The goal of data analysis systems is to realize the maximum
possible scientific potential of the data in the least time, within the
constraints of computing and human resources. To achieve this goal, future analysis
systems should empower physicists to access the data with a high level of
interactivity, reproducibility and throughput capability. As part of the HEP
Software Foundation Community White Paper process, a working group on Data
Analysis and Interpretation was formed to assess the challenges and
opportunities in HEP data analysis and develop a roadmap for activities in this
area over the next decade. In this report, the key findings and recommendations
of the Data Analysis and Interpretation Working Group are presented.

Particle physics has an ambitious and broad experimental programme for the
coming decades. This programme requires large investments in detector hardware,
either to build new facilities and experiments, or to upgrade existing ones.
Similarly, it requires commensurate investment in the R&D of software to
acquire, manage, process, and analyse the sheer amounts of data to be recorded.
In planning for the HL-LHC in particular, it is critical that all of the
collaborating stakeholders agree on the software goals and priorities, and that
the efforts complement each other. In this spirit, this white paper describes
the R&D activities required to prepare for this software upgrade.

The CMS Integration Grid Testbed (IGT) comprises USCMS Tier-1 and Tier-2
hardware at the following sites: the California Institute of Technology, Fermi
National Accelerator Laboratory, the University of California at San Diego, and
the University of Florida at Gainesville. The IGT runs jobs using the Globus
Toolkit with a DAGMan and Condor-G front end. The virtual organization (VO) is
managed using VO management scripts from the European Data Grid (EDG). Grid-wide
monitoring is accomplished using local tools such as Ganglia interfaced into
the Globus Metadata Directory Service (MDS) and the agent-based MonALISA.
Domain-specific software is packaged and installed using the Distribution
After Release (DAR) tool of CMS, while middleware under the auspices of the
Virtual Data Toolkit (VDT) is distributed using Pacman. During a continuous
two-month span in the fall of 2002, over 1 million official CMS GEANT-based Monte
Carlo events were generated and returned to CERN for analysis while the testbed was being
demonstrated at SC2002. In this paper, we describe the process that led to one
of the world's first continuously available, functioning grids.