• Password Cracking: The Effect of Hash Function Bias on the Average Guesswork(1608.02132)

March 3, 2019 cs.IT, math.IT, cs.CR
Modern authentication systems store hashed values of passwords of users using cryptographic hash functions. Therefore, to crack a password an attacker needs to guess a hash function input that is mapped to the hashed value, as opposed to the password itself. We call a hash function that maps the same number of inputs to each bin, as \textbf{unbiased}. However, cryptographic hash functions in use have not been proven to be unbiased (i.e., they may have an unequal number of inputs mapped to different bins). A cryptographic hash function has the property that it is computationally difficult to find an input mapped to a bin. In this work we introduce a structured notion of biased hash functions for which we analyze the average guesswork under certain types of brute force attacks. This work shows that the level of security depends on the set of hashed values of valid users as well as the statistical profile of a hash function, resulting from bias. We examine the average guesswork conditioned on the set of hashed values, and model the statistical profile through the empirical distribution of the number of inputs that are mapped to a bin. In particular, we focus on a class of statistical profiles (capturing the bias) , which we call type-class statistical profiles, that has an empirical distribution related to the probability of the type classes defined in the method of types. For such profiles, we show that the average guesswork is related to basic measures in information theory such as entropy and divergence. We use this to show that the effect of bias on the conditional average guesswork is limited compared to other system parameters such as the number of valid users who store their hashed passwords in the system.
• Degrees of Freedom of Cache-Aided Wireless Interference Networks(1606.03175)

Aug. 2, 2018 cs.IT, math.IT
We study the role of caches in wireless interference networks. We focus on content caching and delivery across a Gaussian interference network, where both transmitters and receivers are equipped with caches. We provide a constant-factor approximation of the system's degrees of freedom (DoF), for arbitrary number of transmitters, number of receivers, content library size, receiver cache size, and transmitter cache size (as long as the transmitters combined can store the entire content library among them). We demonstrate approximate optimality with respect to information-theoretic bounds that do not impose any restrictions on the caching and delivery strategies. Our characterization reveals three key insights. First, the approximate DoF is achieved using a strategy that separates the physical and network layers. This separation architecture is thus approximately optimal. Second, we show that increasing transmitter cache memory beyond what is needed to exactly store the entire library between all transmitters does not provide more than a constant-factor benefit to the DoF. A consequence is that transmit zero-forcing is not needed for approximate optimality. Third, we derive an interesting trade-off between the receiver memory and the number of transmitters needed for approximately maximal performance. In particular, if each receiver can store a constant fraction of the content library, then only a constant number of transmitters are needed. Our solution to the caching problem requires formulating and solving a new communication problem, the symmetric multiple multicast X-channel, for which we provide an exact DoF characterization.
• Approximate Capacity of Fast Fading Interference Channels with No Instantaneous CSIT(1706.03659)

June 3, 2018 cs.IT, math.IT
• Caching with Partial Adaptive Matching(1705.03852)

May 7, 2018 cs.IT, math.IT
We study the caching problem when we are allowed to match each user to one of a subset of caches after its request is revealed. We focus on non-uniformly popular content, specifically when the file popularities obey a Zipf distribution. We study two extremal schemes, one focusing on coded server transmissions while ignoring matching capabilities, and the other focusing on adaptive matching while ignoring potential coding opportunities. We derive the rates achieved by these schemes and characterize the regimes in which one outperforms the other. We also compare them to information-theoretic outer bounds, and finally propose a hybrid scheme that generalizes ideas from the two schemes and performs at least as well as either of them in most memory regimes.
• Leveraging mm-Wave Communication for Security(1803.08188)

March 24, 2018 cs.CR
The fact that Millimeter Wave (mmWave) communication needs to be directional is usually perceived as a challenge; in this paper we argue that it enables efficient secret key sharing that are unconditionally secure from passive eavesdroppers, by building on packet erasures. We showcase the potential of our approach in two setups: beam sweeping to align receiver/transmitter pairs, and vehicle platooning. We show that in the first case, we can establish a few hundred secret bits at virtually no additional transmission cost; and in the second case, we can potentially establish keys in the order of tenths of Mbps. These first results are based on some simplifying assumptions, yet we believe they give incentives to further explore such techniques.
• Redundancy Techniques for Straggler Mitigation in Distributed Optimization and Learning(1803.05397)

March 14, 2018 cs.DC, math.OC, cs.LG, stat.ML
Performance of distributed optimization and learning systems is bottlenecked by "straggler" nodes and slow communication links, which significantly delay computation. We propose a distributed optimization framework where the dataset is "encoded" to have an over-complete representation with built-in redundancy, and the straggling nodes in the system are dynamically left out of the computation at every iteration, whose loss is compensated by the embedded redundancy. We show that oblivious application of several popular optimization algorithms on encoded data, including gradient descent, L-BFGS, proximal gradient under data parallelism, and coordinate descent under model parallelism, converge to either approximate or exact solutions of the original problem when stragglers are treated as erasures. These convergence results are deterministic, i.e., they establish sample path convergence for arbitrary sequences of delay patterns or distributions on the nodes, and are independent of the tail behavior of the delay distribution. We demonstrate that equiangular tight frames have desirable properties as encoding matrices, and propose efficient mechanisms for encoding large-scale data. We implement the proposed technique on Amazon EC2 clusters, and demonstrate its performance over several learning problems, including matrix factorization, LASSO, ridge regression and logistic regression, and compare the proposed method with uncoded, asynchronous, and data replication strategies.
• Models and information-theoretic bounds for nanopore sequencing(1705.11154)

Feb. 17, 2018 cs.IT, math.IT
Nanopore sequencing is an emerging new technology for sequencing DNA, which can read long fragments of DNA (~50,000 bases) in contrast to most current short-read sequencing technologies which can only read hundreds of bases. While nanopore sequencers can acquire long reads, the high error rates (20%-30%) pose a technical challenge. In a nanopore sequencer, a DNA is migrated through a nanopore and current variations are measured. The DNA sequence is inferred from this observed current pattern using an algorithm called a base-caller. In this paper, we propose a mathematical model for the "channel" from the input DNA sequence to the observed current, and calculate bounds on the information extraction capacity of the nanopore sequencer. This model incorporates impairments like (non-linear) inter-symbol interference, deletions, as well as random response. These information bounds have two-fold application: (1) The decoding rate with a uniform input distribution can be used to calculate the average size of the plausible list of DNA sequences given an observed current trace. This bound can be used to benchmark existing base-calling algorithms, as well as serving a performance objective to design better nanopores. (2) When the nanopore sequencer is used as a reader in a DNA storage system, the storage capacity is quantified by our bounds.
• On Capacity of Non-Coherent Diamond Networks(1802.02667)

Feb. 7, 2018 cs.IT, math.IT
There is a vast body of work on the capacity bounds for a "coherent" wireless network, where the network channel gains are known, at least at the destination. However, there has been much less attention to the case where the network parameters (channel gains) are unknown to everyone, i.e., the non-coherent wireless network capacity. In this paper, we study the generalized degrees of freedom (gDoF) of the block-fading non-coherent diamond (parallel relay) network with asymmetric distributions of link strengths, and a coherence time of T symbol duration. We first derive an outer bound for this channel and then derive the optimal signaling structure for this outer bound. Using the optimal signaling structure we solve the outer bound optimization problem for gDoF. Using insights from our outer bound signaling solution, we devise an achievability strategy based on a novel scheme that we call train-scale quantize-map-forward. This uses training in the links from source to relays, scaling and quantizing at relays combined with non-training based schemes. We show the optimality of this scheme with respect to the outer bound in terms of gDof. In non-coherent point-to-point MIMO, where the fading channel is unknown to transmitter and receiver, an important trade-off between communication and channel learning was revealed by Zheng and Tse, by demonstrating that not all antennas available might be used as it is sub-optimal to learn all their channel parameters. Our results in this paper for the diamond network demonstrates that in certain regimes the optimal scheme uses a sub-network, demonstrating a similar trade-off between channel learning and communications. However, in other regimes it is useful to use the entire network and not use training at all in the signaling, as traditional training based schemes are sub-optimal in these regimes.
• Straggler Mitigation in Distributed Optimization Through Data Encoding(1711.04969)

Jan. 22, 2018 cs.IT, math.IT, cs.DC, cs.LG, stat.ML
Slow running or straggler tasks can significantly reduce computation speed in distributed computation. Recently, coding-theory-inspired approaches have been applied to mitigate the effect of straggling, through embedding redundancy in certain linear computational steps of the optimization algorithm, thus completing the computation without waiting for the stragglers. In this paper, we propose an alternate approach where we embed the redundancy directly in the data itself, and allow the computation to proceed completely oblivious to encoding. We propose several encoding schemes, and demonstrate that popular batch algorithms, such as gradient descent and L-BFGS, applied in a coding-oblivious manner, deterministically achieve sample path linear convergence to an approximate solution of the original problem, using an arbitrarily varying subset of the nodes at each iteration. Moreover, this approximation can be controlled by the amount of redundancy and the number of nodes used in each iteration. We provide experimental results demonstrating the advantage of the approach over uncoded and data replication strategies.
• A Distortion Based Approach for Protecting Inferences(1703.00482)

May 7, 2017 cs.IT, math.IT
Eavesdropping attacks in inference systems aim to learn not the raw data, but the system inferences to predict and manipulate system actions. We argue that conventional information security measures can be ambiguous on the adversary's estimation abilities, and adopt instead a distortion based framework that enables to operate over a metric space. We show that requiring perfect distortion-based security is more frugal than requiring perfect information-theoretic secrecy even for block length one codes, offering in some cases unbounded gains. Within this framework, we design algorithms that enable to efficiently use shared randomness, and show that each bit of shared random key is exponentially useful in security.
• Enhancing Multiuser MIMO Through Opportunistic D2D Cooperation(1604.06151)

March 6, 2017 cs.IT, math.IT, cs.NI
We propose a cellular architecture that combines multiuser MIMO (MU-MIMO) downlink with opportunistic use of unlicensed ISM bands to establish device-to-device (D2D) cooperation. The architecture consists of a physical-layer cooperation scheme based on forming downlink virtual MIMO channels through D2D relaying, and a novel resource allocation strategy for such D2D-enabled networks. We prove the approximate optimality of the physical-layer scheme, and demonstrate that such cooperation boosts the effective SNR of the weakest user in the system, especially in the many-user regime, due to multiuser diversity. To harness this physical-layer scheme, we formulate the cooperative user scheduling and relay selection problem using the network utility maximization framework. For such a cooperative network, we propose a novel utility metric that jointly captures fairness in throughput and the cost of relaying in the system. We propose a joint user scheduling and relay selection algorithm, which we prove to be asymptotically optimal. We study the architecture through system-level simulations over a wide range of scenarios. The highlight of these simulations is an approximately $6$x improvement in data rate for cell-edge (bottom fifth-percentile) users (over the state-of-the-art SU-MIMO) while still improving the overall throughput, and taking into account various system constraints.
• Matched Multiuser Gaussian Source-Channel Communications via Uncoded Schemes(1502.05428)

March 1, 2017 cs.IT, math.IT
We investigate whether uncoded schemes are optimal for Gaussian sources on multiuser Gaussian channels. Particularly, we consider two problems: the first is to send correlated Gaussian sources on a Gaussian broadcast channel where each receiver is interested in reconstructing only one source component (or one specific linear function of the sources) under the mean squared error distortion measure; the second is to send correlated Gaussian sources on a Gaussian multiple-access channel, where each transmitter observes a noisy combination of the source, and the receiver wishes to reconstruct the individual source components (or individual linear functions) under the mean squared error distortion measure. It is shown that when the channel parameters match certain general conditions, the induced distortion tuples are on the boundary of the achievable distortion region, and thus optimal. Instead of following the conventional approach of attempting to characterize the achievable distortion region, we ask the question whether and how a match can be effectively determined. This decision problem formulation helps to circumvent the difficult optimization problem often embedded in region characterization problems, and it also leads us to focus on the critical conditions in the outer bounds that make the inequalities become equalities, which effectively decouple the overall problem into several simpler sub-problems. Optimality results previously unknown in the literature are obtained using this novel approach. As a byproduct of the investigation, novel outer bounds are derived for these two problems.
• Design and Analysis of Stability-Guaranteed PUFs(1701.05637)

Jan. 19, 2017 cs.CR
The lack of stability is one of the major limitations that constrains PUF from being put in widespread practical use. In this paper, we propose a weak PUF and a strong PUF that are both completely stable with 0% intra-distance. These PUFs are called Locally Enhanced Defectivity (LED)PUF. The source of randomness of a LEDPUF is extracted from locally enhance defectivity without affecting other parts of the chip. A LEDPUF is a pure functional PUF that does not require any kinds of correction schemes as conventional parametric PUFs do. A weak LEDPUF is constructed by forming arrays of Directed Self Assembly (DSA) random connections is presented, and the strong LEDPUF is implemented by using the weak LEDPUF as the key of a keyed-hash message authentication code (HMAC). Our simulation and statistical results show that the entropy of the weak LEDPUF bits is close to ideal, and the inter-distances of both weak and strong LEDPUFs are about 50%, which means that these LEDPUFs are not only stable but also unique. We develop a new unified framework for evaluating the level of security of PUFs, based on password security, by using information theoretic tools of guesswork. The guesswork model allows to quantitatively compare, with a single unified metric, PUFs with varying levels of stability, bias and available side information. In addition, it generalizes other measures to evaluate the security level such as min-entropy and mutual information. We evaluate guesswork-based security of some measured SRAM and Ring Oscillator PUFs as an example and compare them with LEDPUF to show that stability has a more severe impact on the PUF security than biased responses. Furthermore, we find the guesswork of three new problems: Guesswork under probability of attack failure, the guesswork of idealized version of a message authentication code, and the guesswork of strong PUFs that are used for authentication.
• Consistency in the face of change: an adaptive approach to physical layer cooperation(1612.02062)

Dec. 6, 2016 cs.NI
Most existing works on physical-layer (PHY) cooperation (beyond routing) focus on how to best use a given, static relay network--while wireless networks are anything but static. In this paper, we pose a different set of questions: given that we have multiple devices within range, which relay(s) do we use for PHY cooperation, to maintain a consistent target performance? How can we efficiently adapt, as network conditions change? And how important is it, in terms of performance, to adapt? Although adapting to the best path when routing is a well understood problem, how to do so over PHY cooperation networks is an open question. Our contributions are: (1) We demonstrate via theoretical evaluation, a diminishing returns trend as the number of deployed relays increases. (2) Using a simple algorithm based on network metrics, we efficiently select the sub-network to use at any given time to maintain a target reliability. (3) When streaming video from Netflix, we experimentally show (using measurements from a WARP radio testbed employing DIQIF relaying) that our adaptive PHY cooperation scheme provides a throughput gain of 2x over nonadaptive PHY schemes, and a gain of 6x over genie-aided IP-level adaptive routing.
• Attack Resilience and Recovery using Physical Challenge Response Authentication for Active Sensors Under Integrity Attacks(1605.02062)

May 12, 2016 cs.CR
Embedded sensing systems are pervasively used in life- and security-critical systems such as those found in airplanes, automobiles, and healthcare. Traditional security mechanisms for these sensors focus on data encryption and other post-processing techniques, but the sensors themselves often remain vulnerable to attacks in the physical/analog domain. If an adversary manipulates a physical/analog signal prior to digitization, no amount of digital security mechanisms after the fact can help. Fortunately, nature imposes fundamental constraints on how these analog signals can behave. This work presents PyCRA, a physical challenge-response authentication scheme designed to protect active sensing systems against physical attacks occurring in the analog domain. PyCRA provides security for active sensors by continually challenging the surrounding environment via random but deliberate physical probes. By analyzing the responses to these probes, and by using the fact that the adversary cannot change the underlying laws of physics, we provide an authentication mechanism that not only detects malicious attacks but provides resilience against them. We demonstrate the effectiveness of PyCRA through several case studies using two sensing systems: (1) magnetic sensors like those found wheel speed sensors in robotics and automotive, and (2) commercial RFID tags used in many security-critical applications. Finally, we outline methods and theoretical proofs for further enhancing the resilience of PyCRA to active attacks by means of a confusion phase---a period of low signal to noise ratio that makes it more difficult for an attacker to correctly identify and respond to PyCRA's physical challenges. In doing so, we evaluate both the robustness and the limitations of PyCRA, concluding by outlining practical considerations as well as further applications for the proposed authentication mechanism.
• A Layered Caching Architecture for the Interference Channel(1605.01668)

May 5, 2016 cs.IT, math.IT
Recent work has studied the benefits of caching in the interference channel, particularly by placing caches at the transmitters. In this paper, we study the two-user Gaussian interference channel in which caches are placed at both the transmitters and the receivers. We propose a separation strategy that divides the physical and network layers. While a natural separation approach might be to abstract the physical layer into several independent bit pipes at the network layer, we argue that this is inefficient. Instead, the separation approach we propose exposes interacting bit pipes at the network layer, so that the receivers observe related (yet not identical) quantities. We find the optimal strategy within this layered architecture, and we compute the degrees-of-freedom it achieves. Finally, we show that separation is optimal in regimes where the receiver caches are large.
• Wiretapped Oblivious Transfer(1604.05668)

April 20, 2016 cs.IT, math.IT, cs.CR
In this paper, we study the problem of obtaining $1$-of-$2$ string oblivious transfer (OT) between users Alice and Bob, in the presence of a passive eavesdropper Eve. The resource enabling OT in our setup is a noisy broadcast channel from Alice to Bob and Eve. Apart from the OT requirements between the users, Eve is not allowed to learn anything about the users' inputs. When Alice and Bob are honest-but-curious and the noisy broadcast channel is made up of two independent binary erasure channels (connecting Alice-Bob and Alice-Eve), we derive the $1$-of-$2$ string OT capacity for both $2$-privacy (when Eve can collude with either Alice or Bob) and $1$-privacy (when no such collusion is allowed). We generalize these capacity results to $1$-of-$N$ string OT and study other variants of this problem. When Alice and/or Bob are malicious, we present a different scheme based on interactive hashing. This scheme is shown to be optimal for certain parameter regimes. We present a new formulation of multiple, simultaneous OTs between Alice-Bob and Alice-Cathy. For this new setup, we present schemes and outer bounds that match in all but one regime of parameters. Finally, we consider the setup where the broadcast channel is made up of a cascade of two independent binary erasure channels (connecting Alice-Bob and Bob-Eve) and $1$-of-$2$ string OT is desired between Alice and Bob with $1$-privacy. For this setup, we derive an upper and lower bound on the $1$-of-$2$ string OT capacity which match in one of two possible parameter regimes.
• Capacity Results for Multicasting Nested Message Sets over Combination Networks(1411.2417)

Jan. 7, 2016 cs.IT, math.IT
• Coded Caching for Multi-level Popularity and Access(1404.6563)

Dec. 31, 2015 cs.IT, math.IT
To address the exponentially rising demand for wireless content, use of caching is emerging as a potential solution. It has been recently established that joint design of content delivery and storage (coded caching) can significantly improve performance over conventional caching. Coded caching is well suited to emerging heterogeneous wireless architectures which consist of a dense deployment of local-coverage wireless access points (APs) with high data rates, along with sparsely-distributed, large-coverage macro-cell base stations (BS). This enables design of coded caching-and-delivery schemes that equip APs with storage, and place content in them in a way that creates coded-multicast opportunities for combining with macro-cell broadcast to satisfy users even with different demands. Such coded-caching schemes have been shown to be order-optimal with respect to the BS transmission rate, for a system with single-level content, i.e., one where all content is uniformly popular. In this work, we consider a system with non-uniform popularity content which is divided into multiple levels, based on varying degrees of popularity. The main contribution of this work is the derivation of an order-optimal scheme which judiciously shares cache memory among files with different popularities. To show order-optimality we derive new information-theoretic lower bounds, which use a sliding-window entropy inequality, effectively creating a non-cutset bound. We also extend the ideas to when users can access multiple caches along with the broadcast. Finally we consider two extreme cases of user distribution across caches for the multi-level popularity model: a single user per cache (single-user setup) versus a large number of users per cache (multi-user setup), and demonstrate a dichotomy in the order-optimal strategies for these two extreme cases.
• Secure State Estimation against Sensor Attacks in the Presence of Noise(1510.02462)

Oct. 12, 2015 cs.IT, math.IT, cs.CR, math.OC, cs.SY
We consider the problem of estimating the state of a noisy linear dynamical system when an unknown subset of sensors is arbitrarily corrupted by an adversary. We propose a secure state estimation algorithm, and derive (optimal) bounds on the achievable state estimation error given an upper bound on the number of attacked sensors. The proposed state estimator involves Kalman filters operating over subsets of sensors to search for a sensor subset which is reliable for state estimation. To further improve the subset search time, we propose Satisfiability Modulo Theory based techniques to exploit the combinatorial nature of searching over sensor subsets. Finally, as a result of independent interest, we give a coding theoretic view of attack detection and state estimation against sensor attacks in a noiseless dynamical system.
• Gaussian Interference Channel with Intermittent Feedback(1408.4590)

Aug. 30, 2015 cs.IT, math.IT
We investigate how to exploit intermittent feedback for interference management by studying the two-user Gaussian interference channel (IC). We approximately characterize (within a universal constant) the capacity region for the Gaussian IC with intermittent feedback. We exactly characterize the the capacity region of the linear deterministic version of the problem, which gives us insight into the Gaussian problem. We find that the characterization only depends on the forward channel parameters and the marginal probability distribution of each feedback link. The result shows that passive and unreliable feedback can be harnessed to provide multiplicative capacity gain in Gaussian interference channels. We find that when the feedback links are active with sufficiently large probabilities, the perfect feedback sum-capacity is achieved to within a constant gap. In contrast to other schemes developed for interference channel with feedback, our achievable scheme makes use of quantize-map-and-forward to relay the information obtained through feedback, performs forward decoding, and does not use structured codes. We also develop new outer bounds enabling us to obtain the (approximate) characterization of the capacity region.
• Secure State Estimation: Optimal Guarantees against Sensor Attacks in the Presence of Noise(1504.05566)

April 23, 2015 cs.IT, math.IT, cs.CR, math.OC, cs.SY
Motivated by the need to secure cyber-physical systems against attacks, we consider the problem of estimating the state of a noisy linear dynamical system when a subset of sensors is arbitrarily corrupted by an adversary. We propose a secure state estimation algorithm and derive (optimal) bounds on the achievable state estimation error. In addition, as a result of independent interest, we give a coding theoretic interpretation for prior work on secure state estimation against sensor attacks in a noiseless dynamical system.