• Information theory is a mathematical theory of learning with deep connections with topics as diverse as artificial intelligence, statistical physics, and biological evolution. Many primers on information theory paint a broad picture with relatively little mathematical sophistication, while many others develop specific application areas in detail. In contrast, these informal notes aim to outline some elements of the information-theoretic "way of thinking," by cutting a rapid and interesting path through some of the theory's foundational concepts and results. They are aimed at practicing systems scientists who are interested in exploring potential connections between information theory and their own fields. The main mathematical prerequisite for the notes is comfort with elementary probability, including sample spaces, conditioning, and expectations. We take the Kullback-Leibler divergence as our most basic concept, and then proceed to develop the entropy and mutual information. We discuss some of the main results, including the Chernoff bounds as a characterization of the divergence; Gibbs' Theorem; and the Data Processing Inequality. A recurring theme is that the definitions of information theory support natural theorems that sound ``obvious'' when translated into English. More pithily, ``information theory makes common sense precise.'' Since the focus of the notes is not primarily on technical details, proofs are provided only where the relevant techniques are illustrative of broader themes. Otherwise, proofs and intriguing tangents are referenced in liberally-sprinkled footnotes. The notes close with a highly nonexhaustive list of references to resources and other perspectives on the field.
  • Silicon drift detectors (SDDs) revolutionized spectroscopy in fields as diverse as geology and dentistry. For a subset of experiments at ultra-fast, x-ray free-electron lasers (FELs), SDDs can make substantial contributions. Often the unknown spectrum is interesting, carrying science data, or the background measurement is useful to identify unexpected signals. Many measurements involve only several discrete photon energies known a priori, allowing single event decomposition of pile-up and spectroscopic photon counting. We designed a pulse function and demonstrated that the signal amplitude and rise time are obtained for each pulse by fitting, thus removing the need for pulse shaping. By avoiding pulse shaping, rise times of tens of nanoseconds resulted in reduced pulse pile-up and allowed decomposition of remaining pulse pile-up at photon separation times down to hundreds of nanoseconds while yielding time-of-arrival information with precision of 10 nanoseconds. Waveform fitting yields simultaneously high energy resolution and high counting rates (2 orders of magnitude higher than current digital pulse processors). We showed that pile-up spectrum fitting is relatively simple and preferable to pile-up spectrum deconvolution. We developed a photon pile-up statistical model for constant intensity sources, extended it to variable intensity sources (typical for FELs) and used it to fit a complex pile-up spectrum. We subsequently developed a Bayesian pile-up decomposition method that allows decomposing pile-up of single events with up to 6 photons from 6 monochromatic lines with 99% accuracy. The usefulness of SDDs will continue into the x-ray FEL era of science. Their successors, the ePixS hybrid pixel detectors, already offer hundreds of pixels, each with similar performance to an SDD, in a compact, robust and affordable package
  • Cryo-electron microscopy provides 2-D projection images of the 3-D electron scattering intensity of many instances of the particle under study (e.g., a virus). Both symmetry (rotational point groups) and heterogeneity are important aspects of biological particles and both aspects can be combined by describing the electron scattering intensity of the particle as a stochastic process with a symmetric probability law and therefore symmetric moments. A maximum likelihood estimator implemented by an expectation-maximization algorithm is described which estimates the unknown statistics of the electron scattering intensity stochastic process from images of instances of the particle. The algorithm is demonstrated on the bacteriophage HK97 and the virus N$\omega$V. The results are contrasted with existing algorithms which assume that each instance of the particle has the symmetry rather than the less restrictive assumption that the probability law has the symmetry.
  • The "backward simulation" of a stochastic process is defined as the stochastic dynamics that trace a time-reversed path from the target region to the initial configuration. If the probabilities calculated by the original simulation are easily restored from those obtained by backward dynamics, we can use it as a computational tool. It is shown that the naive approach to backward simulation does not work as expected. As a remedy, the Time Reverse Monte Carlo method (TRMC) based on the ideas of Sequential Importance Sampling (SIS) and Sequential Monte Carlo (SMC) is proposed and successfully tested with a stochastic typhoon model and the Lorenz 96 model. TRMC with SMC, which contains resampling steps, is shown to be more efficient for simulations with a larger number of time steps. A limitation of TRMC and its relation to the Bayes formula are also discussed.
  • We introduce an infinite family of quantifiers of quantum correlations beyond entanglement which vanish on both classical-quantum and quantum-classical states and are in one-to-one correspondence with the metric-adjusted skew informations. The `quantum $f-$correlations' are defined as the maximum metric-adjusted $f-$correlations between pairs of local observables with the same fixed equispaced spectrum. We show that these quantifiers are entanglement monotones when restricted to pure states of qubit-qudit systems. We also evaluate the quantum $f-$correlations in closed form for two-qubit systems and discuss their behaviour under local commutativity preserving channels. We finally provide a physical interpretation for the quantifier corresponding to the average of the Wigner-Yanase-Dyson skew informations.
  • We present a Bayesian analysis of the epistemology of analogue experiments with particular reference to Hawking radiation. First, we prove that such experiments can be confirmatory in Bayesian terms based upon appeal to 'universality arguments'. Second, we provide a formal model for the scaling behaviour of the confirmation measure for multiple distinct realisations of the analogue system and isolate a generic saturation feature. Finally, we demonstrate that different potential analogue realisations could provide different levels of confirmation. Our results provide a basis both to formalise the epistemic value of analogue experiments that have been conducted and to advise scientists as to the respective epistemic value of future analogue experiments.
  • Manifold-learning techniques are routinely used in mining complex spatiotemporal data to extract useful, parsimonious data representations/parametrizations; these are, in turn, useful in nonlinear model identification tasks. We focus here on the case of time series data that can ultimately be modelled as a spatially distributed system (e.g. a partial differential equation, PDE), but where we do not know the space in which this PDE should be formulated. Hence, even the spatial coordinates for the distributed system themselves need to be identified - to emerge from - the data mining process. We will first validate this emergent space reconstruction for time series sampled without space labels in known PDEs; this brings up the issue of observability of physical space from temporal observation data, and the transition from spatially resolved to lumped (order-parameter-based) representations by tuning the scale of the data mining kernels. We will then present actual emergent space discovery illustrations. Our illustrative examples include chimera states (states of coexisting coherent and incoherent dynamics), and chaotic as well as quasiperiodic spatiotemporal dynamics, arising in partial differential equations and/or in heterogeneous networks. We also discuss how data-driven spatial coordinates can be extracted in ways invariant to the nature of the measuring instrument. Such gauge-invariant data mining can go beyond the fusion of heterogeneous observations of the same system, to the possible matching of apparently different systems.
  • We study whether a depth two neural network can learn another depth two network using gradient descent. Assuming a linear output node, we show that the question of whether gradient descent converges to the target function is equivalent to the following question in electrodynamics: Given $k$ fixed protons in $\mathbb{R}^d,$ and $k$ electrons, each moving due to the attractive force from the protons and repulsive force from the remaining electrons, whether at equilibrium all the electrons will be matched up with the protons, up to a permutation. Under the standard electrical force, this follows from the classic Earnshaw's theorem. In our setting, the force is determined by the activation function and the input distribution. Building on this equivalence, we prove the existence of an activation function such that gradient descent learns at least one of the hidden nodes in the target network. Iterating, we show that gradient descent can be used to learn the entire network one node at a time.
  • This chapter provides a self-contained introduction to the use of Bayesian inference to extract large-scale modular structures from network data, based on the stochastic blockmodel (SBM), as well as its degree-corrected and overlapping generalizations. We focus on nonparametric formulations that allow their inference in a manner that prevents overfitting, and enables model selection. We discuss aspects of the choice of priors, in particular how to avoid underfitting via increased Bayesian hierarchies, and we contrast the task of sampling network partitions from the posterior distribution with finding the single point estimate that maximizes it, while describing efficient algorithms to perform either one. We also show how inferring the SBM can be used to predict missing and spurious links, and shed light on the fundamental limitations of the detectability of modular structures in networks.
  • The problem of information fusion from multiple data-sets acquired by multimodal sensors has drawn significant research attention over the years. In this paper, we focus on a particular problem setting consisting of a physical phenomenon or a system of interest observed by multiple sensors. We assume that all sensors measure some aspects of the system of interest with additional sensor-specific and irrelevant components. Our goal is to recover the variables relevant to the observed system and to filter out the nuisance effects of the sensor-specific variables. We propose an approach based on manifold learning, which is particularly suitable for problems with multiple modalities, since it aims to capture the intrinsic structure of the data and relies on minimal prior model knowledge. Specifically, we propose a nonlinear filtering scheme, which extracts the hidden sources of variability captured by two or more sensors, that are independent of the sensor-specific components. In addition to presenting a theoretical analysis, we demonstrate our technique on real measured data for the purpose of sleep stage assessment based on multiple, multimodal sensor measurements. We show that without prior knowledge on the different modalities and on the measured system, our method gives rise to a data-driven representation that is well correlated with the underlying sleep process and is robust to noise and sensor-specific effects.
  • We introduce a statistical physics inspired supervised machine learning algorithm for classification and regression problems. The method is based on the invariances or stability of predicted results when known data is represented as expansions in terms of various stochastic functions. The algorithm predicts the classification/regression values of new data by combining (via voting) the outputs of these numerous linear expansions in randomly chosen functions. The few parameters (typically only one parameter is used in all studied examples) that this model has may be automatically optimized. The algorithm has been tested on 10 diverse training data sets of various types and feature space dimensions. It has been shown to consistently exhibit high accuracy and readily allow for optimization of parameters, while simultaneously avoiding pitfalls of existing algorithms such as those associated with class imbalance. We very briefly speculate on whether spatial coordinates in physical theories may be viewed as emergent "features" that enable a robust machine learning type description of data with generic low order smooth functions.
  • Considerable efforts were made in recent years in devising optimization algorithms for influence maximization in networks. Here we ask: "When do we need optimization?" We use results from statistical mechanics and direct simulations on ER networks, small-world networks, power-law networks and a dataset of real-world networks to characterize the parameter-space region where optimization is required. We show that in both synthetic and real-world networks this optimization region is due to a well known physical phase transition of the network, and that it vanishes as a power-law with the network size. We then show that also from a utility-maximization perspective (when considering the costs of the optimization process), for large networks standard optimization is profitable only in a vanishing parameter region near the phase transition. Finally, we introduce a novel constant-time optimization approach, and demonstrate it through a simple algorithm that manages to give similar results to standard optimization methods in terms of the influenced-set size, while improving the results in terms of the net utility.
  • This paper has incorporated the stochasticity into the Newell car following model. Three stochastic driving factors have been considered: (i) Driver's acceleration is bounded. (ii) Driver's deceleration includes stochastic component, which is depicted by a deceleration with the randomization probability that is assumed to increase with the speed. (iii) Vehicles in the jam state have a larger randomization probability. Two simulation scenarios are conducted to test the model. In the first scenario, traffic flow on a circular road is investigated. In the second scenario, empirical traffic flow patterns in the NGSIM data induced by a rubberneck bottleneck is studied, and the simulated traffic oscillations and synchronized traffic flow are consistent with the empirical patterns. Moreover, two experiments of model calibration and validation are conducted. The first is to calibrate and validate using experimental data, which illustrates that the concave growth pattern has been quantitatively simulated. The second is to calibrate and cross validate vehicles' trajectories using NGSIM data, which exhibits that the car following behaviors of single vehicles can be well described. Therefore, our study highlights the importance of speed dependent stochasticity in traffic flow modeling, which cannot be ignored as in most car-following studies.
  • Biologists have long sought a way to explain how statistical properties of genetic sequences emerged and are maintained through evolution. On the one hand, non-random structures at different scales indicate a complex genome organisation. On the other hand, single-strand symmetry has been scrutinised using neutral models in which correlations are not considered or irrelevant, contrary to empirical evidence. Different studies investigated these two statistical features separately, reaching minimal consensus despite sustained efforts. Here we unravel previously unknown symmetries in genetic sequences, which are organized hierarchically through scales in which non-random structures are known to be present. These observations are confirmed through the statistical analysis of the human genome and explained through a simple domain model. These results suggest that domain models which account for the cumulative action of mobile elements can explain simultaneously non-random structures and symmetries in genetic sequences.
  • Many real-world systems are profitably described as complex networks that grow over time. Preferential attachment and node fitness are two simple growth mechanisms that not only explain certain structural properties commonly observed in real-world systems, but are also tied to a number of applications in modeling and inference. While there are statistical packages for estimating various parametric forms of the preferential attachment function, there is no such package implementing non-parametric estimation procedures. The non-parametric approach to the estimation of the preferential attachment function allows for comparatively finer-grained investigations of the `rich-get-richer' phenomenon that could lead to novel insights in the search to explain certain nonstandard structural properties observed in real-world networks. This paper introduces the R package PAFit, which implements non-parametric procedures for estimating the preferential attachment function and node fitnesses in a growing network, as well as a number of functions for generating complex networks from these two mechanisms. The main computational part of the package is implemented in C++ with OpenMP to ensure scalability to large-scale networks. We first introduce the main functionalities of PAFit through simulated examples, and then use the package to analyze a collaboration network between scientists in the field of complex networks. The results indicate the joint presence of `rich-get-richer' and `fit-get-richer' phenomena in the collaboration network. The estimated attachment function is observed to be near-linear, which we interpret as meaning that the chance an author gets a new collaborator is proportional to their current number of collaborators. Furthermore, the estimated author fitnesses reveal a host of familiar faces from the complex networks community among the field's topmost fittest network scientists.
  • So far, the problem of unmixing large or multitemporal hyperspectral datasets has been specifically addressed in the remote sensing literature only by a few dedicated strategies. Among them, some attempts have been made within a distributed estimation framework, in particular relying on the alternating direction method of multipliers (ADMM). In this paper, we propose to study the interest of a partially asynchronous distributed unmixing procedure based on a recently proposed asynchronous algorithm. Under standard assumptions, the proposed algorithm inherits its convergence properties from recent contributions in non-convex optimization, while allowing the problem of interest to be efficiently addressed. Comparisons with a distributed synchronous counterpart of the proposed unmixing procedure allow its interest to be assessed on synthetic and real data. Besides, thanks to its genericity and flexibility, the procedure investigated in this work can be implemented to address various matrix factorization problems.
  • Trends change rapidly in today's world, prompting this key question: What is the mechanism behind the emergence of new trends? By representing real-world dynamic systems as complex networks, the emergence of new trends can be symbolized by vertices that "shine." That is, at a specific time interval in a network's life, certain vertices become increasingly connected to other vertices. This process creates new high-degree vertices, i.e., network stars. Thus, to study trends, we must look at how networks evolve over time and determine how the stars behave. In our research, we constructed the largest publicly available network evolution dataset to date, which contains 38,000 real-world networks and 2.5 million graphs. Then, we performed the first precise wide-scale analysis of the evolution of networks with various scales. Three primary observations resulted: (a) links are most prevalent among vertices that join a network at a similar time; (b) the rate that new vertices join a network is a central factor in molding a network's topology; and (c) the emergence of network stars (high-degree vertices) is correlated with fast-growing networks. We applied our learnings to develop a flexible network-generation model based on large-scale, real-world data. This model gives a better understanding of how stars rise and fall within networks, and is applicable to dynamic systems both in nature and society.
  • In this paper, we investigate the crossing statistics of speckle patterns formed in Fresnel diffraction region by a laser beam scattering through a nanofluid. We extend $zero-crossing$ statistics to assess dynamical properties of nanofluid. According to joint probability density function of laser beam fluctuation and its time derivative, theoretical framework for Gaussian and non-Gaussian regimes are revisited. We count number of crossings not only at {\it zero} level but also for all available thresholds to determine the average speed of moving particles. Using probabilistic framework in determining crossing statistics, {\it a priori} Gaussianity is not essentially considered, therefore even in presence of deviation from Gaussian fluctuation, this modified approach is capable to compute relevant quantities such as mean value of speed more precisely. Generalized total crossing which represents the weighted summation of crossings for all thresholds to quantify small deviation from Gaussian statistics is introduced. This criterion can also manipulate the contribution of noises and trends to infer reliable physical quantities. The characteristic time scale for having successive crossings at a given threshold is defined. In our experimental setup, we find that increasing sample temperature leads to more consistency between Gaussian and perturbative non-Gaussian predictions. The maximum number of crossing does not necessarily occur at mean level indicating that we should take into account other levels in addition to {\it zero} level to achieve more accurate assessments.
  • We introduce a pipeline including multifractal detrended cross-correlation analysis (MF-DXA) modified by either singular value decomposition or the adaptive method to examine the statistical properties of the pulsar timing residual ($PTR$) induced by a gravitational wave (GW) signal. We propose a new algorithm, the so-called irregular-MF-DXA, to deal with irregular data sampling. Inspired by the quadrupolar nature of the spatial cross-correlation function of a gravitational wave background, a new cross-correlation function, $\bar{\sigma}_{\times}$, derived from irregular-MF-DXA will be introduced. We show that, this measure reveals the quadrupolar signature in the $PTRs$ induced by stochastic GWB. We propose four strategies based on the $y$-intercept of fluctuation functions, the generalized Hurst exponent, and the width of the singularity spectrum to determine the dimensionless amplitude and power-law exponent of the characteristic strain spectrum as $\mathcal{H}_c(f)\sim\mathcal{A}_{yr}(f/f_{yr})^{\zeta}$ for stochastic GWB. Using the value of Hurst exponent, one can clarify the type of GWs. We apply our pipeline to explore 20 millisecond pulsars observed by Parkes Pulsar Timing Array. The computed scaling exponents confirm that all data are classified into a nonstationary class implying the universality feature. The value of the Hurst exponent is in the range $H\in [0.56,0.87]$. The $q$-dependency of the generalized Hurst exponent demonstrates that the observed $PTRs$ have multifractal behavior, and the source of this multifractality is mainly attributed to the correlation of data which is another universality of the observed datasets. Multifractal analysis of available $PTRs$ datasets reveals an upper bound on the dimensionless amplitude of the GWB, $\mathcal{A}_{yr}< 2.0\times 10^{-15}$.
  • We introduce dynamic nested sampling: a generalisation of the nested sampling algorithm in which the number of "live points" varies to allocate samples more efficiently. In empirical tests the new method significantly improves calculation accuracy compared to standard nested sampling with the same number of samples; this increase in accuracy is equivalent to speeding up the computation by factors of up to ~72 for parameter estimation and ~7 for evidence calculations. We also show that the accuracy of both parameter estimation and evidence calculations can be improved simultaneously. In addition, unlike in standard nested sampling, more accurate results can be obtained by continuing the calculation for longer. Popular standard nested sampling implementations can be easily adapted to perform dynamic nested sampling, and several dynamic nested sampling software packages are now publicly available.
  • I review recent works showing that information geometry is a useful framework to characterize quantum coherence and entanglement. Quantum systems exhibit peculiar properties which cannot be justified by classical physics, e.g. quantum coherence and quantum correlations. Once confined to thought experiments, they are nowadays created and manipulated by exerting an exquisite experimental control of atoms, molecules and photons. It is important to identify and quantify such quantum features, as they are deemed to be key resources to achieve supraclassical performances in computation and communication protocols. The information geometry viewpoint elucidates the advantage provided by quantum superpositions in phase estimation. Also, it enables to link measures of coherence and entanglement to observables, which can be evaluated in a laboratory by a limited number of measurements.
  • Effects of non-stationarity on the performance of hybrid ensemble filters are studied (by hybrid filters we mean those which blend ensemble covariances with some other regularizing covariances). To isolate effects of non-stationarity from effects due to nonlinearity (and the non-Gaussianity it causes), a new doubly stochastic advection-diffusion-decay model (DSADM) is proposed. The model is hierarchical: it is a linear stochastic partial differential equation whose coefficients are random fields defined through their own stochastic partial differential equations. DSADM generates conditionally Gaussian spatiotemporal random fields with a tunable degree of non-stationarity in space and time. DSADM allows the use of the exact Kalman filter as a baseline benchmark. In numerical experiments with DSADM as the "model of truth", the relative importance of the three kinds of covariance blending is studied: with static, time-smoothed, and space-smoothed covariances. It is shown that the stronger the non-stationarity, the less useful the static covariance matrix becomes and the more beneficial the time-smoothed covariances are. Time-smoothing of background-error covariances proved to be systematically more useful than their space-smoothing. Under non-stationarity, a filter that extends the (previously proposed by the authors) Hierarchical Bayes Ensemble Filter and accommodates the three covariance-blending techniques is shown to outperform all other configurations of the filters tested. The R code of the model and the filters is available from github.com/cyrulnic/NoStRa.
  • The data torrent unleashed by current and upcoming astronomical surveys demands scalable analysis methods. Many machine learning approaches scale well, but separating the instrument measurement from the physical effects of interest, dealing with variable errors, and deriving parameter uncertainties is often an after-thought. Classic forward-folding analyses with Markov Chain Monte Carlo or Nested Sampling enable parameter estimation and model comparison, even for complex and slow-to-evaluate physical models. However, these approaches require independent runs for each data set, implying an unfeasible number of model evaluations in the Big Data regime. Here I present a new algorithm, collaborative nested sampling, for deriving parameter probability distributions for each observation. Importantly, the number of physical model evaluations scales sub-linearly with the number of data sets, and no assumptions about homogeneous errors, Gaussianity, the form of the model or heterogeneity/completeness of the observations need to be made. Collaborative nested sampling has immediate application in speeding up analyses of large surveys, integral-field-unit observations, and Monte Carlo simulations.
  • We consider the problem of detecting a random walk on a graph, based on observations of the graph nodes. When visited by the walk, each node of the graph observes a signal of elevated mean, which we assume can be different across different nodes. Outside of the path of the walk, and also in its absence, nodes measure only noise. Assuming the Neyman-Pearson setting, our goal then is to characterize detection performance by computing the error exponent for the probability of a miss, under a constraint on the probability of false alarm. Since exact computation of the error exponent is known to be difficult, equivalent to computation of the Lyapunov exponent, we approximate its value by finding a tractable lower bound. The bound reveals an interesting detectability condition: the walk is detectable whenever the entropy of the walk is smaller than one half of the expected signal-to-noise ratio. We derive the bound by extending the notion of Markov types to Gauss-Markov types. These are sequences of state-observation pairs with a given number of node-to-node transition counts and the same average signal values across nodes, computed from the measurements made during the times the random walk was visiting each node's respective location. The lower bound has an intuitive interpretation: among all Gauss-Markov types that are asymptotically feasible in the absence of the walk, the bound finds the most typical one under the presence of the walk. Finally, we show by a sequence of judicious problem reformulations that computing the bound reduces to solving a convex optimization problem, which is a result in its own right.
  • Astronomical data often suffer from noise and incompleteness. We extend the common mixtures-of-Gaussians density estimation approach to account for situations with a known sample incompleteness by simultaneous imputation from the current model. The method, called GMMis, generalizes existing Expectation-Maximization techniques for truncated data to arbitrary truncation geometries and probabilistic rejection processes, as long as they can be specified and do not depend on the density itself. The method accounts for independent multivariate normal measurement errors for each of the observed samples and recovers an estimate of the error-free distribution from which both observed and unobserved samples are drawn. It can perform a separation of a mixtures-of-Gaussian signal from a specified background distribution whose amplitude may be unknown. We compare GMMis to the standard Gaussian mixture model for simple test cases with different types of incompleteness, and apply it to observational data from the NASA Chandra X-ray telescope. The python code is released as an open-source package at https://github.com/pmelchior/pyGMMis