• The explosion of data in recent years has generated an increasing need for new analysis techniques in order to extract knowledge from massive datasets. Machine learning has proved particularly useful to perform this task. Fully automatized methods have recently gathered great popularity, even though those methods often lack physical interpretability. In contrast, feature based approaches can provide both well-performing models and understandable causalities with respect to the correlations found between features and physical processes. Efficient feature selection is an essential tool to boost the performance of machine learning models. In this work, we propose a forward selection method in order to compute, evaluate, and characterize better performing features for regression and classification problems. Given the importance of photometric redshift estimation, we adopt it as our case study. We synthetically created 4,520 features by combining magnitudes, errors, radii, and ellipticities of quasars, taken from the SDSS. We apply a forward selection process, a recursive method in which a huge number of feature sets is tested through a kNN algorithm, leading to a tree of feature sets. The branches of the tree are then used to perform experiments with the random forest, in order to validate the best set with an alternative model. We demonstrate that the sets of features determined with our approach improve the performances of the regression models significantly when compared to the performance of the classic features from the literature. The found features are unexpected and surprising, being very different from the classic features. Therefore, a method to interpret some of the found features in a physical context is presented. The methodology described here is very general and can be used to improve the performance of machine learning models for any regression or classification task.
  • The need to analyze the available large synoptic multi-band surveys drives the development of new data-analysis methods. Photometric redshift estimation is one field of application where such new methods improved the results, substantially. Up to now, the vast majority of applied redshift estimation methods have utilized photometric features. We aim to develop a method to derive probabilistic photometric redshift directly from multi-band imaging data, rendering pre-classification of objects and feature extraction obsolete. A modified version of a deep convolutional network was combined with a mixture density network. The estimates are expressed as Gaussian mixture models representing the probability density functions (PDFs) in the redshift space. In addition to the traditional scores, the continuous ranked probability score (CRPS) and the probability integral transform (PIT) were applied as performance criteria. We have adopted a feature based random forest and a plain mixture density network to compare performances on experiments with data from SDSS (DR9). We show that the proposed method is able to predict redshift PDFs independently from the type of source, for example galaxies, quasars or stars. Thereby the prediction performance is better than both presented reference methods and is comparable to results from the literature. The presented method is extremely general and allows us to solve of any kind of probabilistic regression problems based on imaging data, for example estimating metallicity or star formation rate of galaxies. This kind of methodology is tremendously important for the next generation of surveys.
  • The need for accurate photometric redshifts estimation is a topic that has fundamental importance in Astronomy, due to the necessity of efficiently obtaining redshift information without the need of spectroscopic analysis. We propose a method for determining accurate multimodal photo-z probability density functions (PDFs) using Mixture Density Networks (MDN) and Deep Convolutional Networks (DCN). A comparison with a Random Forest (RF) is performed.
  • Photometric redshifts play an important role as a measure of distance for various cosmological topics. Spectroscopic redshifts are only available for a very limited number of objects but can be used for creating statistical models. A broad variety of photometric catalogues provide uncertain low resolution spectral information for galaxies and quasars that can be used to infer a redshift. Many different techniques have been developed to produce those redshift estimates with increasing precision. Instead of providing a point estimate only, astronomers start to generate probabilistic density functions (PDFs) which should provide a characterisation of the uncertainties of the estimation. In this work we present two simple approaches on how to generate those PDFs. We use the example of generating the photometric redshift PDFs of quasars from SDSS(DR7) to validate our approaches and to compare them with point estimates. We do not aim for presenting a new best performing method, but we choose an intuitive approach that is based on well known machine learning algorithms. Furthermore we introduce proper tools for evaluating the performance of PDFs in the context of astronomy. The continuous ranked probability score (CRPS) and the probability integral transform (PIT) are well accepted in the weather forecasting community. Both tools reflect how well the PDFs reproduce the real values of the analysed objects. As we show, nearly all currently used measures in astronomy show severe weaknesses when used to evaluate PDFs.
  • The exploitation of present and future synoptic (multi-band and multi-epoch) surveys requires an extensive use of automatic methods for data processing and data interpretation. In this work, using data extracted from the Catalina Real Time Transient Survey (CRTS), we investigate the classification performance of some well tested methods: Random Forest, MLPQNA (Multi Layer Perceptron with Quasi Newton Algorithm) and K-Nearest Neighbors, paying special attention to the feature selection phase. In order to do so, several classification experiments were performed. Namely: identification of cataclysmic variables, separation between galactic and extra-galactic objects and identification of supernovae.