-
Nowadays there is no field of research that is not flooded with data. Among the
sciences, Astrophysics has always been driven by the analysis of massive
amounts of data. The development of new and more sophisticated observation
facilities, both ground-based and spaceborne, has led to increasingly complex
data (Variety) and to an exponential growth of both data Volume (i.e., on the
order of petabytes) and Velocity in terms of production and transmission.
Therefore, new and advanced processing solutions will be needed to process this
huge amount of data. We investigate some of these solutions, based on machine
learning models as well as tools and architectures for Big Data analysis that
can be exploited in the astrophysical context.
-
Astronomy is undergoing a methodological revolution triggered by an
unprecedented wealth of complex and accurate data. DAMEWARE (DAta Mining &
Exploration Web Application REsource) is a general purpose, Web-based,
Virtual Observatory compliant, distributed data mining framework specialized in
the exploration of massive data sets with machine learning methods. We present
DAMEWARE, which allows the scientific community to perform data mining and
exploratory experiments on massive data sets by using a simple web browser.
DAMEWARE offers several tools, which can be seen as working environments in
which to choose data analysis functionalities such as clustering,
classification, regression and feature extraction, together with models and
algorithms.
-
Astronomy is undergoing a methodological revolution triggered by an
unprecedented wealth of complex and accurate data. The new panchromatic,
synoptic sky surveys require advanced tools for discovering patterns and trends
hidden behind data which are both complex and of high dimensionality. We
present DAMEWARE (DAta Mining & Exploration Web Application REsource): a
general purpose, web-based, distributed data mining environment developed for
the exploration of large datasets, and finely tuned for astronomical
applications. By means of graphical user interfaces, it allows the user to
perform classification, regression or clustering tasks with machine learning
methods. Salient features of DAMEWARE include its capability to work on large
datasets with minimal human intervention, and to deal with a wide variety of
real problems such as the classification of globular clusters in the galaxy
NGC1399, the evaluation of photometric redshifts and, finally, the
identification of candidate Active Galactic Nuclei in multiband photometric
surveys. In all these applications, DAMEWARE made it possible to achieve better
results than those attained with more traditional methods. With the aim of
providing potential users with all needed information, in this paper we briefly
describe the technological background of DAMEWARE, give a short introduction to
some relevant aspects of data mining, summarize some science cases and,
finally, provide a detailed description of a template use case.
-
We present a multi-purpose genetic algorithm, designed and implemented with
GPGPU / CUDA parallel computing technology. The model was derived from our CPU
serial implementation, named GAME (Genetic Algorithm Model Experiment). It was
successfully tested and validated on the detection of candidate Globular
Clusters in deep, wide-field, single-band HST images. The GPU version of GAME
will be made available to the community by integrating it into the web
application DAMEWARE (DAta Mining Web Application REsource,
http://dame.dsf.unina.it/beta_info.html), a public data mining service
specialized in massive astrophysical data. Since genetic algorithms are
inherently parallel, the GPGPU computing paradigm leads to a speedup of a
factor of 200 in the training phase with respect to the CPU-based version.
-
We present a multi-purpose genetic algorithm, designed and implemented with
GPGPU / CUDA parallel computing technology. The model was derived from a
multi-core CPU serial implementation, named GAME, already successfully tested
and validated on massive astrophysical data classification problems through a
web application resource (DAMEWARE) specialized in data mining based on
Machine Learning paradigms. Since genetic algorithms are inherently parallel,
the GPGPU computing paradigm has made it possible to exploit the internal
training features of the model, permitting a strong optimization in terms of
processing performance and scalability.
-
Nowadays, many scientific areas share the same need to deal with massive and
distributed datasets and to perform complex knowledge extraction tasks on
them. This simple consideration is behind the international efforts
to build virtual organizations such as, for instance, the Virtual Observatory
(VObs). DAME (DAta Mining & Exploration) is an innovative, general purpose,
Web-based, VObs compliant, distributed data mining infrastructure specialized
in the exploration of Massive Data Sets with machine learning methods.
Initially fine-tuned to deal with astronomical data only, DAME has evolved into
a general purpose platform which has also found applications in other domains
of human endeavor. We present the products and a short outline of a science
case, together with a detailed description of the main features available in
the beta release of the web application, which is now available.