• We propose a new latent Boolean feature model for complex networks that captures different types of node interactions and network communities. The model is based on a new concept in graph theory, termed the Boolean intersection representation of a graph, which generalizes the notion of an intersection representation. We mostly focus on one form of Boolean intersection, termed cointersection, and describe how to use this representation to deduce node feature sets and their communities. We derive several general bounds on the minimum number of features used in cointersection representations and discuss graph families for which exact cointersection characterizations are possible. Our results also include algorithms for finding optimal and approximate cointersection representations of a graph.
  • We introduce a new family of erasure codes, called group decodable code (GDC), for distributed storage system. Given a set of design parameters {\alpha; \beta; k; t}, where k is the number of information symbols, each codeword of an (\alpha; \beta; k; t)-group decodable code is a t-tuple of strings, called buckets, such that each bucket is a string of \beta symbols that is a codeword of a [\beta; \alpha] MDS code (which is encoded from \alpha information symbols). Such codes have the following two properties: (P1) Locally Repairable: Each code symbol has locality (\alpha; \beta-\alpha + 1). (P2) Group decodable: From each bucket we can decode \alpha information symbols. We establish an upper bound of the minimum distance of (\alpha; \beta; k; t)-group decodable code for any given set of {\alpha; \beta; k; t}; We also prove that the bound is achievable when the coding field F has size |F| > n-1 \choose k-1.
  • We consider the complexities of repair algorithms for locally repairable codes and propose a class of codes that repair single node failures using addition operations only, or codes with addition based repair. We construct two families of codes with addition based repair. The first family attains distance one less than the Singleton-like upper bound, while the second family attains the Singleton-like upper bound.
  • We consider a simple multiple access network (SMAN), where $k$ sources of unit rates transmit their data to a common sink via $n$ relays. Each relay is connected to the sink and to certain sources. A coding scheme (for the relays) is weakly secure if a passive adversary who eavesdrops on less than $k$ relay-sink links cannot reconstruct the data from each source. We show that there exists a weakly secure maximum distance separable (MDS) coding scheme for the relays if and only if every subset of $\ell$ relays must be collectively connected to at least $\ell+1$ sources, for all $0 < \ell < k$. Moreover, we prove that this condition can be verified in polynomial time in $n$ and $k$. Finally, given a SMAN satisfying the aforementioned condition, we provide another polynomial time algorithm to trim the network until it has a sparsest set of source-relay links that still supports a weakly secure MDS coding scheme.
  • We consider the locality of encoding and decoding operations in distributed storage systems (DSS), and propose a new class of codes, called locally encodable and decodable codes (LEDC), that provides a higher degree of operational locality compared to currently known codes. For a given locality structure, we derive an upper bound on the global distance and demonstrate the existence of an optimal LEDC for sufficiently large field size. In addition, we also construct two families of optimal LEDC for fields with size linear in code length.
  • The MDS property (aka the $k$-out-of-$n$ property) requires that if a file is split into several symbols and subsequently encoded into $n$ coded symbols, each being stored in one storage node of a distributed storage system (DSS), then an user can recover the file by accessing any $k$ nodes. We study the so-called $p$-decodable $\mu$-secure erasure coding scheme $(1 \leq p \leq k - \mu, 0 \leq \mu < k, p | (k-\mu))$, which satisfies the MDS property and the following additional properties: (P1) strongly secure up to a threshold: an adversary which eavesdrops at most $\mu$ storage nodes gains no information (in Shannon's sense) about the stored file, (P2) partially decodable: a legitimate user can recover a subset of $p$ file symbols by accessing some $\mu + p$ storage nodes. The scheme is perfectly $p$-decodable $\mu$-secure if it satisfies the following additional property: (P3) weakly secure up to a threshold: an adversary which eavesdrops more than $\mu$ but less than $\mu+p$ storage nodes cannot reconstruct any part of the file. Most of the related work in the literature only focused on the case $p = k - \mu$. In other words, no partial decodability is provided: an user cannot retrieve any part of the file by accessing less than $k$ nodes. We provide an explicit construction of $p$-decodable $\mu$-secure coding schemes over small fields for all $\mu$ and $p$. That construction also produces perfectly $p$-decodable $\mu$-secure schemes over small fields when $p = 1$ (for every $\mu$), and when $\mu = 0, 1$ (for every $p$). We establish that perfect schemes exist over \emph{sufficiently large} fields for almost all $\mu$ and $p$.
  • A passive adversary can eavesdrop stored content or downloaded content of some storage nodes, in order to learn illegally about the file stored across a distributed storage system (DSS). Previous work in the literature focuses on code constructions that trade storage capacity for perfect security. In other words, by decreasing the amount of original data that it can store, the system can guarantee that the adversary, which eavesdrops up to a certain number of storage nodes, obtains no information (in Shannon's sense) about the original data. In this work we introduce the concept of block security for DSS and investigate minimum bandwidth regenerating (MBR) codes that are block secure against adversaries of varied eavesdropping strengths. Such MBR codes guarantee that no information about any group of original data units up to a certain size is revealed, without sacrificing the storage capacity of the system. The size of such secure groups varies according to the number of nodes that the adversary can eavesdrop. We show that code constructions based on Cauchy matrices provide block security. The opposite conclusion is drawn for codes based on Vandermonde matrices.
  • We study the existence over small fields of Maximum Distance Separable (MDS) codes with generator matrices having specified supports (i.e. having specified locations of zero entries). This problem unifies and simplifies the problems posed in recent works of Yan and Sprintson (NetCod'13) on weakly secure cooperative data exchange, of Halbawi et al. (arxiv'13) on distributed Reed-Solomon codes for simple multiple access networks, and of Dau et al. (ISIT'13) on MDS codes with balanced and sparse generator matrices. We conjecture that there exist such $[n,k]_q$ MDS codes as long as $q \geq n + k - 1$, if the specified supports of the generator matrices satisfy the so-called MDS condition, which can be verified in polynomial time. We propose a combinatorial approach to tackle the conjecture, and prove that the conjecture holds for a special case when the sets of zero coordinates of rows of the generator matrix share with each other (pairwise) at most one common element. Based on our numerical result, the conjecture is also verified for all $k \leq 7$. Our approach is based on a novel generalization of the well-known Hall's marriage theorem, which allows (overlapping) multiple representatives instead of a single representative for each subset.
  • Linear erasure codes with local repairability are desirable for distributed data storage systems. An [n, k, d] code having all-symbol (r, \delta})-locality, denoted as (r, {\delta})a, is considered optimal if it also meets the minimum Hamming distance bound. The existing results on the existence and the construction of optimal (r, {\delta})a codes are limited to only the special case of {\delta} = 2, and to only two small regions within this special case, namely, m = 0 or m >= (v+{\delta}-1) > ({\delta}-1), where m = n mod (r+{\delta}-1) and v = k mod r. This paper investigates the existence conditions and presents deterministic constructive algorithms for optimal (r, {\delta})a codes with general r and {\delta}. First, a structure theorem is derived for general optimal (r, {\delta})a codes which helps illuminate some of their structure properties. Next, the entire problem space with arbitrary n, k, r and {\delta} is divided into eight different cases (regions) with regard to the specific relations of these parameters. For two cases, it is rigorously proved that no optimal (r, {\delta})a could exist. For four other cases the optimal (r, {\delta})a codes are shown to exist, deterministic constructions are proposed and the lower bound on the required field size for these algorithms to work is provided. Our new constructive algorithms not only cover more cases, but for the same cases where previous algorithms exist, the new constructions require a considerably smaller field, which translates to potentially lower computational complexity. Our findings substantially enriches the knowledge on (r, {\delta})a codes, leaving only two cases in which the existence of optimal codes are yet to be determined.
  • The min-rank of a graph was introduced by Haemers (1978) to bound the Shannon capacity of a graph. This parameter of a graph has recently gained much more attention from the research community after the work of Bar-Yossef et al. (2006). In their paper, it was shown that the min-rank of a graph G characterizes the optimal scalar linear solution of an instance of the Index Coding with Side Information (ICSI) problem described by the graph G. It was shown by Peeters (1996) that computing the min-rank of a general graph is an NP-hard problem. There are very few known families of graphs whose min-ranks can be found in polynomial time. In this work, we introduce a new family of graphs with efficiently computed min-ranks. Specifically, we establish a polynomial time dynamic programming algorithm to compute the min-ranks of graphs having simple tree structures. Intuitively, such graphs are obtained by gluing together, in a tree-like structure, any set of graphs for which the min-ranks can be determined in polynomial time. A polynomial time algorithm to recognize such graphs is also proposed.
  • The min-rank of a digraph was shown by Bar-Yossef et al. (2006) to represent the length of an optimal scalar linear solution of the corresponding instance of the Index Coding with Side Information (ICSI) problem. In this work, the graphs and digraphs of near-extreme min-ranks are characterized. Those graphs and digraphs correspond to the ICSI instances having near-extreme transmission rates when using optimal scalar linear index codes. In particular, it is shown that the decision problem whether a digraph has min-rank two is NP-complete. By contrast, the same question for graphs can be answered in polynomial time. Additionally, a new upper bound on the min-rank of a digraph, the circuit-packing bound, is presented. This bound is often tighter than the previously known bounds. By employing this new bound, we present several families of digraphs whose min-ranks can be found in polynomial time.
  • Parity declustering allows faster reconstruction of a disk array when some disk fails. Moreover, it guarantees uniform reconstruction workload on all surviving disks. It has been shown that parity declustering for one-failure tolerant array codes can be obtained via Balanced Incomplete Block Designs. We extend this technique for array codes that can tolerate an arbitrary number of disk failures via $t$-designs.
  • We study the delay minimization in a direct multicast communication scheme where a base station wishes to transmit a set of original packets to a group of clients. Each of the clients already has in its cache a subset of the original packets, and requests for all the remaining packets. The base station communicates directly with the clients by broadcasting information to them. Assume that bandwidths vary between the station and different clients. We propose a method to minimize the total delay required for the base station to satisfy requests from all clients.
  • We show that given $n$ and $k$, for $q$ sufficiently large, there always exists an $[n, k]_q$ MDS code that has a generator matrix $G$ satisfying the following two conditions: (C1) Sparsest: each row of $G$ has Hamming weight $n - k + 1$; (C2) Balanced: Hamming weights of the columns of $G$ differ from each other by at most one.
  • A problem of index coding with side information was first considered by Y. Birk and T. Kol (IEEE INFOCOM, 1998). In the present work, a generalization of index coding scheme, where transmitted symbols are subject to errors, is studied. Error-correcting methods for such a scheme, and their parameters, are investigated. In particular, the following question is discussed: given the side information hypergraph of index coding scheme and the maximal number of erroneous symbols $\delta$, what is the shortest length of a linear index code, such that every receiver is able to recover the required information? This question turns out to be a generalization of the problem of finding a shortest-length error-correcting code with a prescribed error-correcting capability in the classical coding theory. The Singleton bound and two other bounds, referred to as the $\alpha$-bound and the $\kappa$-bound, for the optimal length of a linear error-correcting index code (ECIC) are established. For large alphabets, a construction based on concatenation of an optimal index code with an MDS classical code, is shown to attain the Singleton bound. For smaller alphabets, however, this construction may not be optimal. A random construction is also analyzed. It yields another implicit bound on the length of an optimal linear ECIC. Further, the problem of error-correcting decoding by a linear ECIC is studied. It is shown that in order to decode correctly the desired symbol, the decoder is required to find one of the vectors, belonging to an affine space containing the actual error vector. The syndrome decoding is shown to produce the correct output if the weight of the error pattern is less or equal to the error-correcting capability of the corresponding ECIC. Finally, the notion of static ECIC, which is suitable for use with a family of instances of an index coding problem, is introduced.
  • Security aspects of the Index Coding with Side Information (ICSI) problem are investigated. Building on the results of Bar-Yossef et al. (2006), the properties of linear index codes are further explored. The notion of weak security, considered by Bhattad and Narayanan (2005) in the context of network coding, is generalized to block security. It is shown that the linear index code based on a matrix $L$, whose column space code $C(L)$ has length $n$, minimum distance $d$ and dual distance $d^\perp$, is $(d-1-t)$-block secure (and hence also weakly secure) if the adversary knows in advance $t \leq d-2$ messages, and is completely insecure if the adversary knows in advance more than $n - d$ messages. Strong security is examined under the conditions that the adversary: (i) possesses $t$ messages in advance; (ii) eavesdrops at most $\mu$ transmissions; (iii) corrupts at most $\delta$ transmissions. We prove that for sufficiently large $q$, an optimal linear index code which is strongly secure against such an adversary has length $\kappa_q+\mu+2\delta$. Here $\kappa_q$ is a generalization of the min-rank over $F_q$ of the side information graph for the ICSI problem in its original formulation in the work of Bar- Yossef et al.
  • A problem of index coding with side information was first considered by Y. Birk and T. Kol (IEEE INFOCOM, 1998). In the present work, a generalization of index coding scheme, where transmitted symbols are subject to errors, is studied. Error-correcting methods for such a scheme, and their parameters, are investigated. In particular, the following question is discussed: given the side information hypergraph of index coding scheme and the maximal number of erroneous symbols $\delta$, what is the shortest length of a linear index code, such that every receiver is able to recover the required information? This question turns out to be a generalization of the problem of finding a shortest-length error-correcting code with a prescribed error-correcting capability in the classical coding theory. The Singleton bound and two other bounds, referred to as the $\alpha$-bound and the $\kappa$-bound, for the optimal length of a linear error-correcting index code (ECIC) are established. For large alphabets, a construction based on concatenation of an optimal index code with an MDS classical code, is shown to attain the Singleton bound. For smaller alphabets, however, this construction may not be optimal. A random construction is also analyzed. It yields another inexplicit bound on the length of an optimal linear ECIC. Finally, the decoding of linear ECIC's is discussed. The syndrome decoding is shown to output the exact message if the weight of the error vector is less or equal to the error-correcting capability of the corresponding ECIC.
  • Security aspects of the Index Coding with Side Information (ICSI) problem are investigated. Building on the results of Bar-Yossef et al. (2006), the properties of linear coding schemes for the ICSI problem are further explored. The notion of weak security, considered by Bhattad and Narayanan (2005) in the context of network coding, is generalized to block security. It is shown that the coding scheme for the ICSI problem based on a linear code C of length n, minimum distance d and dual distance d^\perp, is (d-1-t)-block secure (and hence also weakly secure) if the adversary knows in advance t \le d - 2 messages, and is completely insecure if the adversary knows in advance more than n - d^\perp messages.
  • An optimal constant-composition or constant-weight code of weight $w$ has linear size if and only if its distance $d$ is at least $2w-1$. When $d\geq 2w$, the determination of the exact size of such a constant-composition or constant-weight code is trivial, but the case of $d=2w-1$ has been solved previously only for binary and ternary constant-composition and constant-weight codes, and for some sporadic instances. This paper provides a construction for quasicyclic optimal constant-composition and constant-weight codes of weight $w$ and distance $2w-1$ based on a new generalization of difference triangle sets. As a result, the sizes of optimal constant-composition codes and optimal constant-weight codes of weight $w$ and distance $2w-1$ are determined for all such codes of sufficiently large lengths. This solves an open problem of Etzion. The sizes of optimal constant-composition codes of weight $w$ and distance $2w-1$ are also determined for all $w\leq 6$, except in two cases.
  • This correspondence introduces two new constructive techniques to complete the determination of the sizes of optimal q-ary codes of constant weight three and distance four.