• One of the fundamental problems in the realm of peer-to-peer systems is that of determining their service capacities. In this paper, we focus on P2P scalability issues and propose models to compute the achievable throughput under distinct policies for selecting both peers and blocks. From these models, we obtain novel insights on the behavior of P2P swarming systems that motivate new mechanisms for publishers and peers to improve the overall performance. In particular, we obtain operational regions for swarm system. In addition, we show that system capacity significantly increases if publishers adopt the most deprived peer selection and peers reduce their service rates when they have all the file blocks but one.
  • Covert communication conceals the transmission of the message from an attentive adversary. Recent work on the limits of covert communication in additive white Gaussian noise (AWGN) channels has demonstrated that a covert transmitter (Alice) can reliably transmit a maximum of $\mathcal{O}\left(\sqrt{n}\right)$ bits to a covert receiver (Bob) without being detected by an adversary (Warden Willie) in $n$ channel uses. This paper focuses on the scenario where other friendly nodes distributed according to a two-dimensional Poisson point process with density $m$ are present in the environment. We propose a strategy where the friendly node closest to the adversary, without close coordination with Alice, produces artificial noise. We show that this method allows Alice to reliably and covertly send $\mathcal{O}(\min\{{n,m^{\gamma/2}\sqrt{n}}\})$ bits to Bob in $n$ channel uses, where $\gamma$ is the path-loss exponent. Moreover, we also consider a setting where there are $N_{\mathrm{w}}$ collaborating adversaries uniformly and randomly located in the environment and show that in $n$ channel uses, Alice can reliably and covertly send $\mathcal{O}\left(\min\left\{n,\frac{m^{\gamma/2} \sqrt{n}}{N_{\mathrm{w}}^{\gamma}}\right\}\right)$ bits to Bob when $\gamma >2$, and $\mathcal{O}\left(\min\left\{n,\frac{m \sqrt{n}}{N_{\mathrm{w}}^{2}\log^2 {N_{\mathrm{w}}}}\right\}\right)$ when $\gamma = 2$. Conversely, we demonstrate that no higher covert throughput is possible for $\gamma>2$.
  • Estimating distributions of node characteristics (labels) such as number of connections or citizenship of users in a social network via edge and node sampling is a vital part of the study of complex networks. Due to its low cost, sampling via a random walk (RW) has been proposed as an attractive solution to this task. Most RW methods assume either that the network is undirected or that walkers can traverse edges regardless of their direction. Some RW methods have been designed for directed networks where edges coming into a node are not directly observable. In this work, we propose Directed Unbiased Frontier Sampling (DUFS), a sampling method based on a large number of coordinated walkers, each starting from a node chosen uniformly at random. It is applicable to directed networks with invisible incoming edges because it constructs, in real-time, an undirected graph consistent with the walkers trajectories, and due to the use of random jumps which prevent walkers from being trapped. DUFS generalizes previous RW methods and is suited for undirected networks and to directed networks regardless of in-edges visibility. We also propose an improved estimator of node label distributions that combines information from the initial walker locations with subsequent RW observations. We evaluate DUFS, compare it to other RW methods, investigate the impact of its parameters on estimation accuracy and provide practical guidelines for choosing them. In estimating out-degree distributions, DUFS yields significantly better estimates of the head of the distribution than other methods, while matching or exceeding estimation accuracy of the tail. Last, we show that DUFS outperforms uniform node sampling when estimating distributions of node labels of the top 10% largest degree nodes, even when sampling a node uniformly has the same cost as RW steps.
  • The modeling and analysis of an LRU cache is extremely challenging as exact results for the main performance metrics (e.g. hit rate) are either lacking or cannot be used because of their high computational complexity for large caches. As a result, various approximations have been proposed. The state-of-the-art method is the so-called TTL approximation, first proposed and shown to be asymptotically exact for IRM requests by Fagin. It has been applied to various other workload models and numerically demonstrated to be accurate but without theoretical justification. In this paper we provide theoretical justification for the approximation in the case where distinct contents are described by independent stationary and ergodic processes. We show that this approximation is exact as the cache size and the number of contents go to infinity. This extends earlier results for the independent reference model. Moreover, we establish results not only for the aggregate cache hit probability but also for every individual content. Last, we obtain bounds on the rate of convergence.
  • The design of caching algorithms to maximize hit probability has been extensively studied. In this paper, we associate each content with a utility, which is a function of either corresponding content hit rate or hit probability. We formulate a cache optimization problem to maximize the sum of utilities over all contents under stationary and ergodic request process, which is non-convex in general. We find that the problem can be reformulated as a convex optimization problem when the inter-request distribution has a non-increasing hazard rate function. We provide explicit optimal solutions for some inter-request distributions, and compare the solutions of the hit-rate based (HRB) and hit-probability based (HPB) problems. We also propose decentralized algorithms that can be implemented using limited information and are guaranteed to provide optimal solutions. We find that decentralized algorithms that solve HRB are more robust than decentralized HPB algorithms. Informed by these results, we further propose lightweight Poisson approximate decentralized and online algorithms that are accurate and efficient in achieving optimal hit rates and hit probabilities.
  • In covert communication, Alice tries to communicate with Bob without being detected by a warden Willie. When the distance between Alice and Bob becomes large compared to the distance between Alice and Willie(s), the performance of covert communication will be degraded. In this case, multi-hop message transmission via intermediate relays can help to improve performance. Hence, in this work multi-hop covert communication over a moderate size network and in the presence of multiple collaborating Willies is considered. The relays can transmit covertly using either a single key for all relays, or different independent keys at the relays. For each case, we develop efficient algorithms to find optimal paths with maximum throughput and minimum end-to-end delay between Alice and Bob. As expected, employing multiple hops significantly improves the ability to communicate covertly versus the case of a single-hop transmission. Furthermore, at the expense of more shared key bits, analytical results and numerical simulations demonstrate that multi-hop covert communication with different independent keys at the relays has better performance than multi-hop covert communication with a single key.
  • We consider the problem of optimally compressing and caching data across a communication network. Given the data generated at edge nodes and a routing path, our goal is to determine the optimal data compression ratios and caching decisions across the network in order to minimize average latency, which can be shown to be equivalent to maximizing the compression and caching gain under an energy consumption constraint. We show that this problem is NP-hard in general and the hardness is caused by the caching decision subproblem, while the compression sub-problem is polynomial-time solvable. We then propose an approximation algorithm that achieves a $(1-1/e)$-approximation solution to the optimum in strongly polynomial time. We show that our proposed algorithm achieve the near-optimal performance in synthetic-based evaluations. In this paper, we consider a tree-structured network as an illustrative example, but our results easily extend to general network topology at the expense of more complicated notations.
  • In recent years, along with the overwhelming advances in the field of neural information processing, quantum information processing (QIP) has shown significant progress in solving problems that are intractable on classical computers. Quantum machine learning (QML) explores the ways in which these fields can learn from one another. We propose quantum walk neural networks (QWNN), a new graph neural network architecture based on quantum random walks, the quantum parallel to classical random walks. A QWNN learns a quantum walk on a graph to construct a diffusion operator which can be applied to a signal on a graph. We demonstrate the use of the network for prediction tasks for graph structured signals.
  • The Transmission Control Protocol (TCP) utilizes congestion avoidance and control mechanisms as a preventive measure against congestive collapse and as an adaptive measure in the presence of changing network conditions. The set of available congestion control algorithms is diverse, and while many have been studied from empirical and simulation perspectives, there is a notable lack of analytical work for some variants. To gain more insight into the dynamics of these algorithms, we: (1) propose a general modeling scheme consisting of a set of functional differential equations of retarded type (RFDEs) and of the congestion window as a function of time; (2) apply this scheme to TCP Reno and demonstrate its equivalence to a previous, well known model for TCP Reno; (3) show an application of the new framework to the widely-deployed congestion control algorithm TCP CUBIC, for which analytical models are few and limited; and (4) validate the model using simulations. Our modeling framework yields a fluid model for TCP CUBIC. From a theoretical analysis of this model, we discover that TCP CUBIC is locally uniformly asymptotically stable -- a property of the algorithm previously unknown.
  • Remote quantum entanglement can enable numerous applications including distributed quantum computation, secure communication, and precision sensing. In this paper, we consider how a quantum network-nodes equipped with limited quantum processing capabilities connected via lossy optical links-can distribute high-rate entanglement simultaneously between multiple pairs of users (multiple flows). We develop protocols for such quantum "repeater" nodes, which enable a pair of users to achieve large gains in entanglement rates over using a linear chain of quantum repeaters, by exploiting the diversity of multiple paths in the network. Additionally, we develop repeater protocols that enable multiple user pairs to generate entanglement simultaneously at rates that can far exceed what is possible with repeaters time sharing among assisting individual entanglement flows. Our results suggest that the early-stage development of quantum memories with short coherence times and implementations of probabilistic Bell-state measurements can have a much more profound impact on quantum networks than may be apparent from analyzing linear repeater chains. This framework should spur the development of a general quantum network theory, bringing together quantum memory physics, quantum information theory, and computer network theory.
  • Random walk-based sampling methods are gaining popularity and importance in characterizing large networks. While powerful, they suffer from the slow mixing problem when the graph is loosely connected, which results in poor estimation accuracy. Random walk with jumps (RWwJ) can address the slow mixing problem but it is inapplicable if the graph does not support uniform vertex sampling (UNI). In this work, we develop methods that can efficiently sample a graph without the necessity of UNI but still enjoy the similar benefits as RWwJ. We observe that many graphs under study, called target graphs, do not exist in isolation. In many situations, a target graph is related to an auxiliary graph and a bipartite graph, and they together form a better connected {\em two-layered network structure}. This new viewpoint brings extra benefits to graph sampling: if directly sampling a target graph is difficult, we can sample it indirectly with the assistance of the other two graphs. We propose a series of new graph sampling techniques by exploiting such a two-layered network structure to estimate target graph characteristics. Experiments conducted on both synthetic and real-world networks demonstrate the effectiveness and usefulness of these new techniques.
  • In everyday life, we often observe unusually frequent interactions among people before or during important events, e.g., people send/receive more greetings to/from their friends on holidays than regular days. We also observe that some videos or hashtags suddenly go viral through people's sharing on online social networks (OSNs). Do these seemingly different phenomena share a common structure? All these phenomena are associated with sudden surges of user interactions in networks, which we call "bursts" in this work. We uncover that the emergence of a burst is accompanied with the formation of triangles in some properly defined networks. This finding motivates us to propose a new and robust method to detect bursts on OSNs. We first introduce a new measure, "triadic cardinality distribution", corresponding to the fractions of nodes with different numbers of triangles, i.e., triadic cardinalities, in a network. We show that this distribution not only changes when a burst occurs, but it also has a robustness property that it is immunized against common spamming social-bot attacks. Hence, by tracking triadic cardinality distributions, we can more reliably detect bursts than simply counting interactions on OSNs. To avoid handling massive activity data generated by OSN users during the triadic tracking, we design an efficient "sample-estimate" framework to provide maximum likelihood estimate on the triadic cardinality distribution. We propose several sampling methods, and provide insights about their performance difference, through both theoretical analysis and empirical experiments on real world networks.
  • Recent work has established that when transmitter Alice wishes to communicate reliably to recipient Bob without detection by warden Willie, with additive white Gaussian noise (AWGN) channels between all parties, communication is limited to $\mathcal{O}(\sqrt{n})$ bits in $n$ channel uses. However, this assumes Willie has an accurate statistical characterization of the channel. When Willie has uncertainty about such and his receiver is limited to a threshold test on the received power, Alice can transmit covertly with a power that does not decrease with $n$, thus conveying $\mathcal{O}(n)$ bits covertly and reliably in $n$ uses of an AWGN channel. Here, we consider covert communication of $\mathcal{O}(n)$ bits in $n$ channel uses while generalizing the environment and removing any restrictions on Willie's receiver. We assume an uninformed "jammer" is present to help Alice, and we consider AWGN and block fading channels. In some scenarios, Willie's optimal detector is a threshold test on the received power. When the channel between the jammer and Willie has multiple fading blocks per codeword, a threshold test on the received power is not optimal. However, we establish that Alice can remain covert with a transmit power that does not decrease with $n$ even when Willie employs an optimal detector.
  • Cumulative advantage (CA) refers to the notion that accumulated resources foster the accumulation of further resources in competitions, a phenomenon that has been empirically observed in various contexts. The oldest and arguably simplest mathematical model that embodies this general principle is the P\'olya urn process, which finds applications in a myriad of problems. The original model captures the dynamics of competitions between two equally fit agents under linear CA effects, which can be readily generalized to incorporate different fitnesses and nonlinear CA effects. We study two statistics of competitions under the generalized model, namely duration (i.e., time of the last tie) and intensity (i.e., number of ties). We give rigorous mathematical characterizations of the tail distributions of both duration and intensity under the various regimes for fitness and nonlinearity, which reveal very interesting behaviors. For example, fitness superiority induces much shorter competitions in the sublinear regime while much longer competitions in the superlinear regime. Our findings can shed light on the application of P\'olya urn processes in more general contexts where fitness and nonlinearity may be present.
  • Active search (AS) on graphs focuses on collecting certain labeled nodes (targets) given global knowledge of the network topology and its edge weights under a query budget. However, in most networks, nodes, topology and edge weights are all initially unknown. We introduce selective harvesting, a variant of AS where the next node to be queried must be chosen among the neighbors of the current queried node set; the available training data for deciding which node to query is restricted to the subgraph induced by the queried set (and their node attributes) and their neighbors (without any node or edge attributes). Therefore, selective harvesting is a sequential decision problem, where we must decide which node to query at each step. A classifier trained in this scenario suffers from a tunnel vision effect: without recourse to independent sampling, the urge to query promising nodes forces classifiers to gather increasingly biased training data, which we show significantly hurts the performance of AS methods and standard classifiers. We find that it is possible to collect a much larger set of targets by using multiple classifiers, not by combining their predictions as an ensemble, but switching between classifiers used at each step, as a way to ease the tunnel vision effect. We discover that switching classifiers collects more targets by (a) diversifying the training data and (b) broadening the choices of nodes that can be queried next. This highlights an exploration, exploitation, and diversification trade-off in our problem that goes beyond the exploration and exploitation duality found in classic sequential decision problems. From these observations we propose D3TS, a method based on multi-armed bandits for non-stationary stochastic processes that enforces classifier diversity, matching or exceeding the performance of competing methods on seven real network datasets in our evaluation.
  • Dynamic networks are increasingly being usedd to model real world datasets. A challenging task in their analysis is to detect and characterize clusters. It is useful for analyzing real-world data such as detecting evolving communities in networks. We propose a temporal clustering framework based on a set of network generative models to address this problem. We use PARAFAC decomposition to learn network models from datasets.We then use $K$-means for clustering, the Silhouette criterion to determine the number of clusters, and a similarity score to order the clusters and retain the significant ones. In order to address the time-dependent aspect of these clusters, we propose a segmentation algorithm to detect their formations, dissolutions and lifetimes. Synthetic networks with ground truth and real-world datasets are used to test our method against state-of-the-art, and the results show that our method has better performance in clustering and lifetime detection than previous methods.
  • In this paper, we consider the problem of allocating cache resources among multiple content providers. The cache can be partitioned into slices and each partition can be dedicated to a particular content provider, or shared among a number of them. It is assumed that each partition employs the LRU policy for managing content. We propose utility-driven partitioning, where we associate with each content provider a utility that is a function of the hit rate observed by the content provider. We consider two scenarios: i)~content providers serve disjoint sets of files, ii)~there is some overlap in the content served by multiple content providers. In the first case, we prove that cache partitioning outperforms cache sharing as cache size and numbers of contents served by providers go to infinity. In the second case, It can be beneficial to have separate partitions for overlapped content. In the case of two providers, it is usually always beneficial to allocate a cache partition to serve all overlapped content and separate partitions to serve the non-overlapped contents of both providers. We establish conditions when this is true asymptotically but also present an example where it is not true asymptotically. We develop online algorithms that dynamically adjust partition sizes in order to maximize the overall utility and prove that they converge to optimal solutions, and through numerical evaluations, we show they are effective.
  • Large organizations often have users in multiple sites which are connected over the Internet. Since resources are limited, communication between these sites needs to be carefully orchestrated for the most benefit to the organization. We present a Mission-optimized Overlay Network (MON), a hybrid overlay network architecture for maximizing utility to the organization. We combine an offline and an online system to solve non-concave utility maximization problems. The offline tier, the Predictive Flow Optimizer (PFO), creates plans for routing traffic using a model of network conditions. The online tier, MONtra, is aware of the precise local network conditions and is able to react quickly to problems within the network. Either tier alone is insufficient. The PFO may take too long to react to network changes. MONtra only has local information and cannot optimize non-concave mission utilities. However, by combining the two systems, MON is robust and achieves near-optimal utility under a wide range of network conditions. While best-effort overlay networks are well studied, our work is the first to design overlays that are optimized for mission utility.
  • Any quantum algorithm can be implemented by an adaptive sequence of single node measurements on an entangled cluster of qubits in a square lattice topology. Photons are a promising candidate for encoding qubits but assembling a photonic entangled cluster with linear optical elements relies on probabilistic operations. Given a supply of $n$-photon-entangled microclusters, using a linear optical circuit and photon detectors, one can assemble a random entangled state of photons that can be subsequently "renormalized" into a logical cluster for universal quantum computing. In this paper, we prove that there is a fundamental tradeoff between $n$ and the minimum success probability $\lambda_c^{(n)}$ that each two-photon linear-optical fusion operation must have, in order to guarantee that the resulting state can be renormalized: $\lambda_c^{(n)} \ge 1/(n-1)$. We present a new way of formulating this problem where $\lambda_c^{(n)}$ is the bond percolation threshold of a logical graph and provide explicit constructions to produce a percolated cluster using $n=3$ photon microclusters (GHZ states) as the initial resource. We settle a heretofore open question by showing that a renormalizable cluster can be created with $3$-photon microclusters over a 2D graph without feedforward, which makes the scheme extremely attractive for an integrated-photonic realization. We also provide lattice constructions, which show that $0.5 \le \lambda_c^{(3)} \le 0.5898$, improving on a recent result of $\lambda_c^{(3)} \le 0.625$. Finally, we discuss how losses affect the bounds on the threshold, using loss models inspired by a recently-proposed method to produce photonic microclusters using quantum dot emitters.
  • Content delivery networks deliver much of the web and video content in the world by deploying a large distributed network of servers. We model and analyze a simple paradigm for client-side server selection that is commonly used in practice where each user independently measures the performance of a set of candidate servers and selects the one that performs the best. For web (resp., video) delivery, we propose and analyze a simple algorithm where each user randomly chooses two or more candidate servers and selects the server that provided the best hit rate (resp., bit rate). We prove that the algorithm converges quickly to an optimal state where all users receive the best hit rate (resp., bit rate), with high probability. We also show that if each user chose just one random server instead of two, some users receive a hit rate (resp., bit rate) that tends to zero. We simulate our algorithm and evaluate its performance with varying choices of parameters, system load, and content popularity.
  • Consider a channel where authorized transmitter Jack sends packets to authorized receiver Steve according to a Poisson process with rate $\lambda$ packets per second for a time period $T$. Suppose that covert transmitter Alice wishes to communicate information to covert receiver Bob on the same channel without being detected by a watchful adversary Willie. We consider two scenarios. In the first scenario, we assume that warden Willie cannot look at packet contents but rather can only observe packet timings, and Alice must send information by inserting her own packets into the channel. We show that the number of packets that Alice can covertly transmit to Bob is on the order of the square root of the number of packets that Jack transmits to Steve; conversely, if Alice transmits more than that, she will be detected by Willie with high probability. In the second scenario, we assume that Willie can look at packet contents but that Alice can communicate across an $M/M/1$ queue to Bob by altering the timings of the packets going from Jack to Steve. First, Alice builds a codebook, with each codeword consisting of a sequence of packet timings to be employed for conveying the information associated with that codeword. However, to successfully employ this codebook, Alice must always have a packet to send at the appropriate time. Hence, leveraging our result from the first scenario, we propose a construction where Alice covertly slows down the packet stream so as to buffer packets to use during a succeeding codeword transmission phase. Using this approach, Alice can covertly and reliably transmit $\mathcal{O}(\lambda T)$ covert bits to Bob in time period $T$ over an $M/M/1$ queue with service rate $\mu > \lambda$.
  • Covert communication, also known as low probability of detection (LPD) communication, prevents the adversary from knowing that a communication is taking place. Recent work has demonstrated that, in a three-party scenario with a transmitter (Alice), intended recipient (Bob), and adversary (Warden Willie), the maximum number of bits that can be transmitted reliably from Alice to Bob without detection by Willie, when additive white Gaussian noise (AWGN) channels exist between all parties, is on the order of the square root of the number of channel uses. In this paper, we begin consideration of network scenarios by studying the case where there are additional "friendly" nodes present in the environment that can produce artificial noise to aid in hiding the communication. We establish achievability results by considering constructions where the system node closest to the warden produces artificial noise and demonstrate a significant improvement in the throughput achieved covertly, without requiring close coordination between Alice and the noise-generating node. Conversely, under mild restrictions on the communication strategy, we demonstrate no higher covert throughput is possible. Extensions to the consideration of the achievable covert throughput when multiple wardens randomly located in the environment collaborate to attempt detection of the transmitter are also considered.
  • Security and privacy are major concerns in modern communication networks. In recent years, the information theory of covert communications, where the very presence of the communication is undetectable to a watchful and determined adversary, has been of great interest. This emerging body of work has focused on additive white Gaussian noise (AWGN), discrete memoryless channels (DMCs), and optical channels. In contrast, our recent work introduced the information-theoretic limits for covert communications over packet channels whose packet timings are governed by a Poisson point process. However, actual network packet arrival times do not generally conform to the Poisson process assumption, and thus here we consider the extension of our work to timing channels characterized by more general renewal processes of rate $\lambda$. We consider two scenarios. In the first scenario, the source of the packets on the channel cannot be authenticated by Willie, and therefore Alice can insert packets into the channel. We show that if the total number of transmitted packets by Jack is $N$, Alice can covertly insert $\mathcal{O}\left(\sqrt{N}\right)$ packets and, if she transmits more, she will be detected by Willie. In the second scenario, packets are authenticated by Willie but we assume that Alice and Bob share a secret key; hence, Alice alters the timings of the packets according to a pre-shared codebook with Bob to send information to him over a $G/M/1$ queue with service rate $\mu>\lambda$. We show that Alice can covertly and reliably transmit $\mathcal{O}(N)$ bits to Bob when the total number of packets sent from Jack to Steve is $N$.
  • The recent square root law (SRL) for covert communication demonstrates that Alice can reliably transmit $\mathcal{O}(\sqrt{n})$ bits to Bob in $n$ uses of an additive white Gaussian noise (AWGN) channel while keeping ineffective any detector employed by the adversary; conversely, exceeding this limit either results in detection by the adversary with high probability or non-zero decoding error probability at Bob. This SRL is under the assumption that the adversary knows when Alice transmits (if she transmits); however, in many operational scenarios he does not know this. Hence, here we study the impact of the adversary's ignorance of the time of the communication attempt. We employ a slotted AWGN channel model with $T(n)$ slots each containing $n$ symbol periods, where Alice may use a single slot out of $T(n)$. Provided that Alice's slot selection is secret, the adversary needs to monitor all $T(n)$ slots for possible transmission. We show that this allows Alice to reliably transmit $\mathcal{O}(\min\{\sqrt{n\log T(n)},n\})$ bits to Bob (but no more) while keeping the adversary's detector ineffective. To achieve this gain over SRL, Bob does not have to know the time of transmission provided $T(n)<2^{c_{\rm T}n}$, $c_{\rm T}=\mathcal{O}(1)$.
  • We present diffusion-convolutional neural networks (DCNNs), a new model for graph-structured data. Through the introduction of a diffusion-convolution operation, we show how diffusion-based representations can be learned from graph-structured data and used as an effective basis for node classification. DCNNs have several attractive qualities, including a latent representation for graphical data that is invariant under isomorphism, as well as polynomial-time prediction and learning that can be represented as tensor operations and efficiently implemented on the GPU. Through several experiments with real structured datasets, we demonstrate that DCNNs are able to outperform probabilistic relational models and kernel-on-graph methods at relational node classification tasks.