When processing large amounts of data, the rate at which reading and writing
can take place is a critical factor. High energy physics data processing
relying on ROOT is no exception. The recent parallelisation of LHC experiments'
software frameworks and the analysis of the ever-increasing amount of collision
data collected by experiments have further emphasised this issue, underlining the
need to increase the implicit parallelism expressed within the ROOT I/O. In this
contribution we highlight the improvements of the ROOT I/O subsystem which
targeted a satisfactory scaling behaviour in a multithreaded context. The
effect of parallelism on the individual steps which are chained by ROOT to read
and write data, namely (de)compression, (de)serialisation and access to the
storage backend, is discussed. Performance measurements are presented through
real-life examples coming from CMS production workflows on traditional server
platforms and highly parallel architectures such as Intel Xeon Phi.
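
To make the idea of parallelising an individual I/O step concrete, the following is a minimal sketch, not ROOT's implementation. It assumes that the buffers to be decompressed are independent of each other, uses a toy run-length decoder in place of a real compression algorithm, and uses `std::async` in place of whatever task scheduling ROOT employs internally. The names `Compressed`, `decompress` and `decompress_all` are hypothetical and are not part of any ROOT API.

```cpp
#include <cstddef>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Toy run-length encoding: a sequence of (count, character) runs.
// This stands in for a real compression algorithm (assumption for illustration).
using Compressed = std::vector<std::pair<std::size_t, char>>;

// Expand one compressed buffer. It touches no shared state,
// so several calls can safely run at the same time.
std::string decompress(const Compressed& in) {
    std::string out;
    for (const auto& run : in) {
        out.append(run.first, run.second);
    }
    return out;
}

// Launch one asynchronous task per buffer, then collect the results
// in the original buffer order.
std::vector<std::string> decompress_all(const std::vector<Compressed>& buffers) {
    std::vector<std::future<std::string>> tasks;
    tasks.reserve(buffers.size());
    for (std::size_t i = 0; i < buffers.size(); ++i) {
        tasks.push_back(std::async(std::launch::async,
                                   [&buffers, i] { return decompress(buffers[i]); }));
    }

    std::vector<std::string> results;
    results.reserve(tasks.size());
    for (auto& task : tasks) {
        results.push_back(task.get());  // blocks until that task has finished
    }
    return results;
}
```

Because each task reads only its own input buffer and returns its own output, no locking is needed in this sketch; a production system would typically also bound the number of concurrent tasks rather than start one thread per buffer.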

Particle physics has an ambitious and broad experimental programme for the
coming decades. This programme requires large investments in detector hardware,
either to build new facilities and experiments, or to upgrade existing ones.
Similarly, it requires commensurate investment in the R&D of software to
acquire, manage, process, and analyse the sheer amounts of data to be recorded.
In planning for the HL-LHC in particular, it is critical that all of the
collaborating stakeholders agree on the software goals and priorities, and that
the efforts complement each other. In this spirit, this white paper describes
the R&D activities required to prepare for this software upgrade.

ROOT is an object-oriented C++ framework conceived in the high-energy physics
(HEP) community, designed for storing and analyzing petabytes of data in an
efficient way. Any instance of a C++ class can be stored into a ROOT file in a
machine-independent compressed binary format. In ROOT the TTree object
container is optimized for statistical data analysis over very large data sets
by using vertical data storage techniques. These containers can span a large
number of files on local disks, the web, or a number of different shared file
systems. In order to analyze this data, the user can choose from a wide set of
mathematical and statistical functions, including linear algebra classes,
numerical algorithms such as integration and minimization, and various methods
for performing regression analysis (fitting). In particular, ROOT offers
packages for complex data modeling and fitting, as well as multivariate
classification based on machine learning techniques. Central to these
analysis tools are the histogram classes, which provide binning of one- and
multi-dimensional data. Results can be saved in high-quality graphical formats
like PostScript and PDF or in bitmap formats like JPG or GIF. The result can
also be stored into ROOT macros that allow a full recreation and rework of the
graphics. Users typically create their analysis macros step by step, making use
of the interactive C++ interpreter CINT, while running over small data samples.
Once the development is finished, they can run these macros at full compiled
speed over large data sets, using on-the-fly compilation, or by creating a
stand-alone batch program. Finally, if processing farms are available, the user
can reduce the execution time of intrinsically parallel tasks (e.g. data
mining in HEP) by using PROOF, which will take care of optimally distributing
the work over the available resources in a transparent way.
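
The vertical storage and histogram binning mentioned above can be illustrated with a minimal sketch in standard C++. It is not ROOT code: `EventColumns`, `Histogram1D` and `histogram_pt` are hypothetical names, the variables `pt`, `eta` and `charge` are assumed example quantities, and the real TTree and histogram classes are considerably more elaborate. The sketch only shows why a column-wise layout suits analysis: histogramming one variable reads one contiguous array and never touches the others.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Column-wise ("vertical") layout: one contiguous array per variable,
// rather than one record per event. Hypothetical type, not a ROOT class.
struct EventColumns {
    std::vector<double> pt;
    std::vector<double> eta;
    std::vector<int> charge;

    // Append one event by pushing one value onto each column.
    void append(double pt_value, double eta_value, int charge_value) {
        pt.push_back(pt_value);
        eta.push_back(eta_value);
        charge.push_back(charge_value);
    }
};

// Fixed-width one-dimensional histogram over the half-open range [lo, hi).
struct Histogram1D {
    double lo;
    double hi;
    std::vector<std::size_t> counts;
    std::size_t underflow = 0;
    std::size_t overflow = 0;

    Histogram1D(std::size_t nbins, double lo_edge, double hi_edge)
        : lo(lo_edge), hi(hi_edge), counts(nbins, 0) {}

    void fill(double x) {
        if (std::isnan(x) || counts.empty()) return;  // nothing sensible to count
        if (x < lo) { ++underflow; return; }
        if (x >= hi) { ++overflow; return; }
        std::size_t bin =
            static_cast<std::size_t>((x - lo) / (hi - lo) * counts.size());
        if (bin >= counts.size()) bin = counts.size() - 1;  // guard against rounding
        ++counts[bin];
    }
};

// Histogram a single column; the eta and charge columns are never read.
Histogram1D histogram_pt(const EventColumns& events, std::size_t nbins,
                         double lo, double hi) {
    Histogram1D h(nbins, lo, hi);
    for (double value : events.pt) {
        h.fill(value);
    }
    return h;
}
```

With a row-wise layout the same loop would have to step over every field of every event; here the stride through memory (or, for a file, the amount of data read) is limited to the one variable being analysed.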

In this talk we will review the major additions and improvements made to the
ROOT system in the last 18 months and present our plans for future
developments. The additions and improvements range from modifications to the I/O
sub-system to allow users to save and restore objects of classes that have not
been instrumented by special ROOT macros, to the addition of a geometry package
designed for building, browsing, tracking and visualizing detector geometries.
Other improvements include enhancements to the quick analysis sub-system
(TTree::Draw()), the addition of classes that allow inter-file object
references (TRef, TRefArray), better support for templated and STL classes,
amelioration of the Automatic Script Compiler and the incorporation of new
fitting and mathematical tools. Efforts have also been made to increase the
modularity of the ROOT system with the introduction of more abstract interfaces
and the development of a plug-in manager. In the near future, we intend to
continue the development of PROOF and its interfacing with GRID environments.
We plan on providing an interface between Geant3, Geant4 and Fluka and the new
geometry package. The ROOT GUI classes will finally be available on Windows and
we plan to release a GUI inspector and builder. In the last year, ROOT has
drawn the endorsement of additional experiments and institutions. It is now
officially supported by CERN and used as a key I/O component by the LCG project.