Particle physics has an ambitious and broad experimental programme for the
coming decades. This programme requires large investments in detector hardware,
either to build new facilities and experiments, or to upgrade existing ones.
Similarly, it requires commensurate investment in the R&D of software to
acquire, manage, process, and analyse the sheer amounts of data to be recorded.
In planning for the HL-LHC in particular, it is critical that all of the
collaborating stakeholders agree on the software goals and priorities, and that
the efforts complement each other. In this spirit, this white paper describes
the R&D activities required to prepare for this software upgrade.
PROOF, the Parallel ROOT Facility, is a ROOT-based framework which enables
interactive parallelism for event-based tasks on a cluster of computing nodes.
Although PROOF can be used simply from within a ROOT session with no additional
requirements, deploying and configuring a PROOF cluster used to be less
straightforward. Recently, considerable effort has gone into provisioning
generic PROOF analysis facilities with zero configuration, which also improves
both stability and scalability and makes deployment feasible even for the end
user. Since a
growing number of large-scale computing resources are nowadays made available
by Cloud providers in a virtualized form, we have developed the Virtual
PROOF-based Analysis Facility: a cluster appliance combining the solid CernVM
ecosystem and PoD (PROOF on Demand), ready to be deployed on the Cloud and
leveraging Cloud-specific features such as elasticity. We will show how
this approach is effective both for sysadmins, who will have little or no
configuration to do to run it on their Clouds, and for the end users, who are
ultimately in full control of their PROOF cluster and can even easily restart
it by themselves in the unfortunate event of a major failure. We will also show
how elasticity leads to a more efficient and uniform usage of Cloud resources.
The aim of the recently EU-funded MammoGrid project is, in the light of
emerging Grid technology, to develop a European-wide database of mammograms
that will be used to develop a set of important healthcare applications and
investigate the potential of this Grid to support effective co-working between
healthcare professionals throughout the EU. The MammoGrid consortium intends to
use a Grid model to enable distributed computing that spans national borders.
This Grid infrastructure will be used for deploying novel algorithms as
software directly developed or enhanced within the project. Using MammoGrid,
clinicians will be able to harness massive amounts of medical image
data to perform epidemiological studies, advanced image processing,
radiographic education and, ultimately, tele-diagnosis over communities of
medical "virtual organisations". This is achieved through the use of
Grid-compliant services for managing (versions of) massively distributed
files of mammograms, for handling the distributed execution of mammogram
analysis software, for the development of Grid-aware algorithms and for the
sharing of resources between multiple collaborating medical centres. All this
is delivered via a novel software and hardware information infrastructure that,
in addition, guarantees the integrity and security of the medical data. The
MammoGrid implementation is based on AliEn, a Grid framework developed by the
ALICE Collaboration. AliEn provides a virtual file catalogue that allows
transparent access to distributed data-sets and provides a top-to-bottom
implementation of a lightweight Grid, suited to cases where a large number of
files must be handled. This paper details the architecture that
will be implemented by the MammoGrid project.
AliEn (ALICE Environment) is a lightweight GRID framework developed by the
ALICE Collaboration. When the experiment starts running, it will collect data
at a rate of approximately 2 PB per year, producing O(10^9) files per year. All
these files, including all simulated events generated during the preparation
phase of the experiment, must be accounted for and reliably tracked in the GRID
environment. The backbone of AliEn is a distributed file catalogue, which
associates a universal logical file name with the physical file names of each dataset
and provides transparent access to datasets independently of physical location.
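
The mapping from logical to physical file names can be illustrated with a minimal in-memory sketch. This is not AliEn's implementation or API: the class name `FileCatalogue`, its methods `register` and `resolve`, the site labels, and the example URLs are all hypothetical, chosen only to show how a client can ask for a dataset by logical name without knowing where the replicas live.

```python
class FileCatalogue:
    """Toy catalogue mapping a logical file name (LFN) to its replicas.

    Hypothetical illustration only; a real catalogue is distributed and
    persistent, whereas this one is a plain dictionary in memory.
    """

    def __init__(self):
        # LFN -> list of (site, physical file name) pairs, one per replica
        self._replicas = {}

    def register(self, lfn, site, pfn):
        """Record that the replica `pfn` at `site` holds the dataset `lfn`."""
        replicas = self._replicas.setdefault(lfn, [])
        if (site, pfn) not in replicas:
            replicas.append((site, pfn))

    def resolve(self, lfn, preferred_site=None):
        """Return one physical file name for `lfn`.

        A replica at `preferred_site` is returned if one exists; otherwise
        the first registered replica is used.
        """
        replicas = self._replicas.get(lfn)
        if not replicas:
            raise KeyError(f"no replica registered for {lfn}")
        for site, pfn in replicas:
            if site == preferred_site:
                return pfn
        return replicas[0][1]


# Usage: the caller names the dataset only by its logical file name.
catalogue = FileCatalogue()
catalogue.register("/alice/sim/run1/events.root", "siteA",
                   "root://se.site-a.example//data/0001.root")
catalogue.register("/alice/sim/run1/events.root", "siteB",
                   "root://se.site-b.example//data/0001.root")
print(catalogue.resolve("/alice/sim/run1/events.root", preferred_site="siteB"))
```

Keeping the site as a separate field, rather than parsing it out of the physical name, is a design choice of this sketch; it makes the replica selection explicit and independent of any URL scheme.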
File replication and transport are carried out under the control of the File
Transport Broker. In addition, the file catalogue maintains information about
every job running in the system. The jobs are distributed by the Job Resource
Broker, which is implemented using a simplified pull (as opposed to the traditional
push) architecture. This paper describes the Job and File Transport Resource
Brokers and shows that a similar architecture can be applied to solve both