
The explosion of data in recent years has generated an increasing need for
new analysis techniques in order to extract knowledge from massive datasets.
Machine learning has proved particularly useful for this task. Fully
automated methods have recently gained great popularity, even though those
methods often lack physical interpretability. In contrast, feature-based
approaches can provide both well-performing models and understandable
causalities with respect to the correlations found between features and
physical processes. Efficient feature selection is an essential tool to boost
the performance of machine learning models. In this work, we propose a forward
selection method in order to compute, evaluate, and characterize
better-performing features for regression and classification problems. Given the
importance of photometric redshift estimation, we adopt it as our case study.
We synthetically created 4,520 features by combining magnitudes, errors, radii,
and ellipticities of quasars, taken from the SDSS. We apply a forward selection
process, a recursive method in which a large number of feature sets is tested
with a kNN algorithm, leading to a tree of feature sets. The branches of the
tree are then used to perform experiments with the random forest, in order to
validate the best set with an alternative model. We demonstrate that the sets
of features determined with our approach significantly improve the performance
of the regression models compared to the classic features from the
literature. The selected features are unexpected, being very different from
the classic features. Therefore, a method to
interpret some of the selected features in a physical context is presented. The
methodology described here is very general and can be used to improve the
performance of machine learning models for any regression or classification
task.
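
The forward selection idea described above can be illustrated with a minimal
sketch. It is not the implementation used in this work: it follows a single
greedy branch instead of building a tree of feature sets, it uses a small
hand-written kNN regressor, and it runs on synthetic toy data. All function
names (`knn_predict`, `forward_selection`) and parameters are hypothetical.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """Predict by averaging the targets of the k nearest training points."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def forward_selection(X_train, y_train, X_val, y_val, max_features=3, k=5):
    """Greedy forward selection: repeatedly add the feature that lowers
    the validation RMSE the most, and stop when nothing improves."""
    selected, history = [], []
    remaining = list(range(X_train.shape[1]))
    best_score = np.inf
    while remaining and len(selected) < max_features:
        # Score every candidate set "selected + one new feature".
        scores = []
        for j in remaining:
            cols = selected + [j]
            pred = knn_predict(X_train[:, cols], y_train, X_val[:, cols], k)
            scores.append((rmse(y_val, pred), j))
        score, j_best = min(scores)
        if score >= best_score:
            break  # no candidate improves on the current set
        best_score = score
        selected.append(j_best)
        remaining.remove(j_best)
        history.append((list(selected), score))
    return selected, history

# Toy data (an assumption for illustration): only columns 1 and 4 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = 2.0 * X[:, 1] - 1.5 * X[:, 4] + 0.05 * rng.normal(size=400)

selected, history = forward_selection(X[:300], y[:300], X[300:], y[300:])
for feature_set, score in history:
    print(feature_set, round(score, 3))
```

Keeping the best few candidate sets at each step, instead of only the single
best one, would turn this single branch into a tree of feature sets.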

The need to analyze the available large synoptic multiband surveys drives
the development of new data-analysis methods. Photometric redshift estimation
is one field of application where such new methods have substantially improved
the results. Up to now, the vast majority of applied redshift estimation
methods have utilized photometric features. We aim to develop a method to
derive probabilistic photometric redshifts directly from multiband imaging
data, rendering pre-classification of objects and feature extraction obsolete.
A modified version of a deep convolutional network was combined with a mixture
density network. The estimates are expressed as Gaussian mixture models
representing the probability density functions (PDFs) in the redshift space. In
addition to the traditional scores, the continuous ranked probability score
(CRPS) and the probability integral transform (PIT) were applied as performance
criteria. We have adopted a feature-based random forest and a plain mixture
density network to compare performances on experiments with data from SDSS
(DR9). We show that the proposed method is able to predict redshift PDFs
independently of the type of source, for example galaxies, quasars or stars.
The prediction performance is better than that of both reference
methods and is comparable to results from the literature. The presented method
is extremely general and allows us to solve any kind of probabilistic
regression problem based on imaging data, for example estimating the metallicity
or star formation rate of galaxies. This kind of methodology is tremendously
important for the next generation of surveys.
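
To make the Gaussian mixture representation of a redshift PDF concrete, the
following sketch shows one common way of turning the raw outputs of a mixture
density network into mixture parameters and evaluating the resulting density.
The output layout (logits, means, log standard deviations), the function
names, and the numbers are assumptions for illustration, not the architecture
used in this work.

```python
import numpy as np

def mixture_params(raw):
    """Split raw outputs of shape (n_objects, 3 * K) into mixture parameters.

    Assumed layout: K weight logits, K means, K log standard deviations.
    """
    logits, means, log_sigmas = np.split(raw, 3, axis=-1)
    # Softmax keeps the weights positive and summing to one.
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)
    # The exponential keeps the standard deviations positive.
    sigmas = np.exp(log_sigmas)
    return weights, means, sigmas

def mixture_pdf(z, weights, means, sigmas):
    """Density of the Gaussian mixture evaluated at redshift(s) z."""
    z = np.asarray(z, dtype=float)[..., None]
    comp = np.exp(-0.5 * ((z - means) / sigmas) ** 2) / (sigmas * np.sqrt(2 * np.pi))
    return (weights * comp).sum(axis=-1)

def negative_log_likelihood(z_true, weights, means, sigmas):
    """Mean negative log-likelihood, a typical training loss for such a model."""
    return float(-np.mean(np.log(mixture_pdf(z_true, weights, means, sigmas) + 1e-300)))

# One object with a bimodal PDF (K = 2); the values are made up.
raw = np.array([[0.0, 0.0, 0.5, 2.0, np.log(0.1), np.log(0.3)]])
weights, means, sigmas = mixture_params(raw)

# Check that the PDF integrates to one on a fine grid (simple Riemann sum).
grid = np.linspace(-2.0, 6.0, 4001)
area = float(mixture_pdf(grid, weights[0], means[0], sigmas[0]).sum() * (grid[1] - grid[0]))

nll = negative_log_likelihood(np.array([0.5]), weights, means, sigmas)
print(weights, area, nll)
```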

Accurate photometric redshift estimation is of fundamental importance in
astronomy, because redshift information must be obtained efficiently
without the need for spectroscopic analysis. We
propose a method for determining accurate multimodal photo-z probability
density functions (PDFs) using Mixture Density Networks (MDN) and Deep
Convolutional Networks (DCN). A comparison with a Random Forest (RF) is
performed.

Photometric redshifts play an important role as a measure of distance for
various cosmological topics. Spectroscopic redshifts are only available for a
very limited number of objects but can be used for creating statistical models.
A broad variety of photometric catalogues provide uncertain, low-resolution
spectral information for galaxies and quasars that can be used to infer a
redshift. Many different techniques have been developed to produce those
redshift estimates with increasing precision. Instead of providing only a point
estimate, astronomers have started to generate probability density functions
(PDFs), which should characterise the uncertainties of the
estimation. In this work we present two simple approaches to generating
those PDFs. We use the example of generating the photometric redshift PDFs of
quasars from SDSS (DR7) to validate our approaches and to compare them with
point estimates. We do not aim to present a new best-performing method, but
instead choose an intuitive approach that is based on well-known machine learning
algorithms. Furthermore, we introduce proper tools for evaluating the
performance of PDFs in the context of astronomy. The continuous ranked
probability score (CRPS) and the probability integral transform (PIT) are well
accepted in the weather forecasting community. Both tools reflect how well the
PDFs reproduce the real values of the analysed objects. As we show, nearly all
currently used measures in astronomy show severe weaknesses when used to
evaluate PDFs.
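
As an illustration of the two scores, the sketch below evaluates the PIT and
the CRPS for a predictive PDF given as a Gaussian mixture. The PIT is the
predictive CDF evaluated at the true value; the CRPS is the integral of the
squared difference between the predictive CDF and the step function located at
the true value, computed here numerically on a grid. The function names, the
integration limits, and the example values are assumptions for illustration
only.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def mixture_cdf(x, weights, means, sigmas):
    """CDF of a Gaussian mixture evaluated at x (scalar or array)."""
    x = np.asarray(x, dtype=float)[..., None]
    z = (x - means) / (sigmas * math.sqrt(2.0))
    return (weights * 0.5 * (1.0 + _erf(z))).sum(axis=-1)

def pit_value(y, weights, means, sigmas):
    """Probability integral transform: predictive CDF at the true value y."""
    return float(mixture_cdf(y, weights, means, sigmas))

def crps_numeric(y, weights, means, sigmas, lo, hi, n=20001):
    """CRPS as the integral of (F(x) - 1[x >= y])^2 over [lo, hi].

    The interval must cover essentially all of the predictive mass and y.
    """
    grid = np.linspace(lo, hi, n)
    F = mixture_cdf(grid, weights, means, sigmas)
    H = (grid >= y).astype(float)  # step function at the true value
    return float(np.sum((F - H) ** 2) * (grid[1] - grid[0]))

# Single standard Gaussian with the true value at its mean (made-up example).
w = np.array([1.0])
mu = np.array([0.0])
s = np.array([1.0])

pit = pit_value(0.0, w, mu, s)
crps = crps_numeric(0.0, w, mu, s, lo=-10.0, hi=10.0)
print(pit, crps)
```

For a well-calibrated set of predictions, the histogram of PIT values over
all objects is expected to be close to uniform, while the CRPS averaged over
all objects is lower for better predictions.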

The exploitation of present and future synoptic (multiband and multiepoch)
surveys requires an extensive use of automatic methods for data processing and
data interpretation. In this work, using data extracted from the Catalina Real
Time Transient Survey (CRTS), we investigate the classification performance of
some well-tested methods: Random Forest, MLPQNA (Multi Layer Perceptron with
Quasi Newton Algorithm) and K-Nearest Neighbors, paying special attention to
the feature selection phase. To this end, several classification
experiments were performed, namely the identification of cataclysmic variables,
the separation of galactic from extragalactic objects, and the identification of
supernovae.