  • It is becoming increasingly common to see large collections of network data objects, that is, data sets in which a network is viewed as a fundamental unit of observation. As a result, there is a pressing need to develop network-based analogues of many of even the most basic tools already standard for scalar and vector data. In this paper, our focus is on averages of unlabeled, undirected networks with edge weights. Specifically, we (i) characterize a certain notion of the space of all such networks, (ii) describe key topological and geometric properties of this space relevant to doing probability and statistics on it, and (iii) use these properties to establish the asymptotic behavior of a generalized notion of an empirical mean under sampling from a distribution supported on this space. Our results rely on a combination of tools from geometry, probability theory, and statistical shape analysis. In particular, the lack of vertex labeling necessitates working with a quotient space that mods out permutations of the labels. This results in a nontrivial geometry for the space of unlabeled networks, which in turn is found to have important implications for the types of probabilistic and statistical results that may be obtained and the techniques needed to obtain them.
  • Current understanding holds that financial contagion is driven mainly by the system-wide interconnectedness of institutions. A distinction has been made between systematic and idiosyncratic channels of contagion, with shocks transmitted through the latter expected to be substantially more likely to lead to systemic crisis than those transmitted through the former. Idiosyncratic connectivity is thought to be driven not simply by obviously shared characteristics among institutions, but more by latent characteristics that lead to the holding of related securities. We develop a graphical model for multivariate financial time series, with the aim of uncovering the latent positions of nodes in a network intended to capture idiosyncratic relationships. We propose a hierarchical model consisting of a vector autoregression (VAR), a covariance graphical model (CGM), and a latent position model (LPM). The VAR extracts the idiosyncratic components, the CGM uses these components to model the network, and the LPM uncovers the spatial positions of the nodes. We also develop a Markov chain Monte Carlo algorithm that iterates between sampling parameters of the CGM and the LPM, using samples from the latter to update prior information for covariance graph selection. We show empirically that modeling the idiosyncratic channel of contagion using our approach can relate latent institutional features to systemic vulnerabilities prior to a crisis.
  • The need to produce accurate estimates of vertex degree in a large network, based on observation of a subnetwork, arises in a number of practical settings. We study a formalized version of this problem, wherein the goal is, given a randomly sampled subnetwork from a large parent network, to estimate the actual degree of the sampled nodes. Depending on the sampling scheme, trivial method-of-moments estimators (MMEs) can be used. However, the MME is not expected, in general, to use all relevant network information. In this study, we propose a handful of novel estimators derived from a risk-theoretic perspective, which make more sophisticated use of the information in the sampled network. Theoretical assessment of the new estimators characterizes the conditions under which they can offer improvement over the MME, while numerical comparisons show that when such improvement obtains, it can be substantial. Illustration is provided on a human trafficking network.
  • Consider observing an undirected network that is "noisy" in the sense that there are Type I and Type II errors in the observation of edges. Such errors can arise, for example, in the context of inferring gene regulatory networks in genomics or functional connectivity networks in neuroscience. Given a single observed network, then, to what extent are summary statistics for that network representative of their analogues for the true underlying network? Can we infer such statistics more accurately by taking into account the noise in the observed network edges? In this paper, we answer both of these questions. In particular, we develop a spectral-based methodology using the adjacency matrix to "denoise" the observed network data and produce more accurate inference of the summary statistics of the true network. We characterize performance of our methodology through bounds on appropriate notions of risk in the $L^2$ sense, and conclude by illustrating the practical impact of this work on synthetic and real-world data.
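
The spectral denoising idea in the last abstract above can be illustrated with a minimal sketch. It is not the authors' estimator. It assumes that edges are flipped independently, that the Type I rate `alpha` and the Type II rate `beta` are known, and that the true network is close to low rank (here, two disjoint cliques). All function names, the toy graph, and the parameter values are hypothetical choices made for illustration. The sketch keeps the leading eigenpairs of the observed adjacency matrix and then inverts the expected effect of the noise.

```python
import numpy as np


def two_clique_graph(size):
    """Toy 'true' network (an assumption): two disjoint cliques of `size` nodes each."""
    adj = np.kron(np.eye(2), np.ones((size, size)))
    np.fill_diagonal(adj, 0.0)
    return adj


def add_edge_noise(adj, alpha, beta, rng):
    """Flip node pairs independently: non-edges appear with probability alpha
    (Type I error), true edges disappear with probability beta (Type II error)."""
    n = adj.shape[0]
    upper = np.triu_indices(n, k=1)
    is_edge = adj[upper].astype(bool)
    u = rng.random(is_edge.size)
    seen = np.where(is_edge, u >= beta, u < alpha)
    noisy = np.zeros((n, n))
    noisy[upper] = seen
    return noisy + noisy.T  # symmetrize so the network stays undirected


def spectral_denoise(observed, rank, alpha, beta):
    """Keep the `rank` eigenpairs of largest magnitude, then undo the noise bias.

    Under the assumed noise model, E[observed] = alpha + (1 - alpha - beta) * truth
    off the diagonal, so the smoothed matrix is shifted and rescaled accordingly.
    """
    vals, vecs = np.linalg.eigh(observed)
    keep = np.argsort(np.abs(vals))[-rank:]
    smooth = (vecs[:, keep] * vals[keep]) @ vecs[:, keep].T
    estimate = np.clip((smooth - alpha) / (1.0 - alpha - beta), 0.0, 1.0)
    np.fill_diagonal(estimate, 0.0)  # no self-loops
    return estimate


rng = np.random.default_rng(0)
truth = two_clique_graph(100)
observed = add_edge_noise(truth, alpha=0.05, beta=0.2, rng=rng)
estimate = spectral_denoise(observed, rank=2, alpha=0.05, beta=0.2)

# Frobenius distance to the true adjacency matrix, before and after denoising.
print("error of raw observation:", np.linalg.norm(observed - truth))
print("error after denoising:   ", np.linalg.norm(estimate - truth))
```

The choice `rank=2` matches the two-block toy graph. In practice the rank and the error rates would have to be estimated or taken from domain knowledge, and summary statistics such as edge density could then be computed from `estimate` rather than from `observed`.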