• Nowadays there is no field research which is not flooded with data. Among the sciences, Astrophysics has always been driven by the analysis of massive amounts of data. The development of new and more sophisticated observation facilities, both ground-based and spaceborne, has led data more and more complex (Variety), an exponential growth of both data Volume (i.e., in the order of petabytes), and Velocity in terms of production and transmission. Therefore, new and advanced processing solutions will be needed to process this huge amount of data. We investigate some of these solutions, based on machine learning models as well as tools and architectures for Big Data analysis that can be exploited in the astrophysical context.
  • Astronomy is undergoing through a methodological revolution triggered by an unprecedented wealth of complex and accurate data. DAMEWARE (DAta Mining & Exploration Web Application and REsource) is a general purpose, Web-based, Virtual Observatory compliant, distributed data mining framework specialized in massive data sets exploration with machine learning methods. We present the DAMEWARE (DAta Mining & Exploration Web Application REsource) which allows the scientific community to perform data mining and exploratory experiments on massive data sets, by using a simple web browser. DAMEWARE offers several tools which can be seen as working environments where to choose data analysis functionalities such as clustering, classification, regression, feature extraction etc., together with models and algorithms.
  • Astronomy is undergoing through a methodological revolution triggered by an unprecedented wealth of complex and accurate data. The new panchromatic, synoptic sky surveys require advanced tools for discovering patterns and trends hidden behind data which are both complex and of high dimensionality. We present DAMEWARE (DAta Mining & Exploration Web Application REsource): a general purpose, web-based, distributed data mining environment developed for the exploration of large datasets, and finely tuned for astronomical applications. By means of graphical user interfaces, it allows the user to perform classification, regression or clustering tasks with machine learning methods. Salient features of DAMEWARE include its capability to work on large datasets with minimal human intervention, and to deal with a wide variety of real problems such as the classification of globular clusters in the galaxy NGC1399, the evaluation of photometric redshifts and, finally, the identification of candidate Active Galactic Nuclei in multiband photometric surveys. In all these applications, DAMEWARE allowed to achieve better results than those attained with more traditional methods. With the aim of providing potential users with all needed information, in this paper we briefly describe the technological background of DAMEWARE, give a short introduction to some relevant aspects of data mining, followed by a summary of some science cases and, finally, we provide a detailed description of a template use case.
  • We present a multi-purpose genetic algorithm, designed and implemented with GPGPU / CUDA parallel computing technology. The model was derived from our CPU serial implementation, named GAME (Genetic Algorithm Model Experiment). It was successfully tested and validated on the detection of candidate Globular Clusters in deep, wide-field, single band HST images. The GPU version of GAME will be made available to the community by integrating it into the web application DAMEWARE (DAta Mining Web Application REsource (http://dame.dsf.unina.it/beta_info.html), a public data mining service specialized on massive astrophysical data. Since genetic algorithms are inherently parallel, the GPGPU computing paradigm leads to a speedup of a factor of 200x in the training phase with respect to the CPU based version.
  • We present a multi-purpose genetic algorithm, designed and implemented with GPGPU / CUDA parallel computing technology. The model was derived from a multi-core CPU serial implementation, named GAME, already scientifically successfully tested and validated on astrophysical massive data classification problems, through a web application resource (DAMEWARE), specialized in data mining based on Machine Learning paradigms. Since genetic algorithms are inherently parallel, the GPGPU computing paradigm has provided an exploit of the internal training features of the model, permitting a strong optimization in terms of processing performances and scalability.
  • Nowadays, many scientific areas share the same need of being able to deal with massive and distributed datasets and to perform on them complex knowledge extraction tasks. This simple consideration is behind the international efforts to build virtual organizations such as, for instance, the Virtual Observatory (VObs). DAME (DAta Mining & Exploration) is an innovative, general purpose, Web-based, VObs compliant, distributed data mining infrastructure specialized in Massive Data Sets exploration with machine learning methods. Initially fine tuned to deal with astronomical data only, DAME has evolved in a general purpose platform which has found applications also in other domains of human endeavor. We present the products and a short outline of a science case, together with a detailed description of main features available in the beta release of the web application now released.