
Vector linear network coding (LNC) is a generalization of the conventional
scalar LNC, such that the data unit transmitted on every edge is an
$L$dimensional vector of data symbols over a base field GF($q$). Vector LNC
enriches the choices of coding operations at intermediate nodes, and there is a
popular conjecture on the benefit of vector LNC over scalar LNC in terms of
alphabet size of data units: there exist (singlesource) multicast networks
that are vector linearly solvable of dimension $L$ over GF($q$) but not scalar
linearly solvable over any field of size $q' \leq q^L$. This paper introduces a
systematic way to construct such multicast networks, and subsequently establish
explicit instances to affirm the positive answer of this conjecture for
\emph{infinitely many} alphabet sizes $p^L$ with respect to an \emph{arbitrary}
prime $p$. On the other hand, this paper also presents explicit instances with
the special property that they do not have a vector linear solution of
dimension $L$ over GF(2) but have scalar linear solutions over GF($q'$) for
some $q' < 2^L$, where $q'$ can be odd or even. This discovery also unveils
that over a given base field, a multicast network that has a vector linear
solution of dimension $L$ does not necessarily have a vector linear solution of
dimension $L' > L$.

Distributed storage systems provide largescale reliable data storage
services by spreading redundancy across a large group of storage nodes. In such
a large system, node failures take place on a regular basis. When a storage
node breaks down, a replacement node is expected to regenerate the redundant
data as soon as possible in order to maintain the same level of redundancy.
Previous results have been mainly focused on the minimization of network
traffic in regeneration. However, in practical networks, where link capacities
vary in a wide range, minimizing network traffic does not always yield the
minimum regeneration time. In this paper, we investigate two approaches to the
problem of minimizing regeneration time in networks with heterogeneous link
capacities. The first approach is to download different amounts of repair data
from the helping nodes according to the link capacities. The second approach
generalizes the conventional starstructured regeneration topology to
treestructured topologies so that we can utilize the links between helping
nodes with bypassing lowcapacity links. Simulation results show that the
flexible treestructured regeneration scheme that combines the advantages of
both approaches can achieve a substantial reduction in the regeneration time.

In an acyclic multicast network, it is well known that a linear network
coding solution over GF($q$) exists when $q$ is sufficiently large. In
particular, for each prime power $q$ no smaller than the number of receivers, a
linear solution over GF($q$) can be efficiently constructed. In this work, we
reveal that a linear solution over a given finite field does \emph{not}
necessarily imply the existence of a linear solution over all larger finite
fields. Specifically, we prove by construction that: (i) For every source
dimension no smaller than 3, there is a multicast network linearly solvable
over GF(7) but not over GF(8), and another multicast network linearly solvable
over GF(16) but not over GF(17); (ii) There is a multicast network linearly
solvable over GF(5) but not over such GF($q$) that $q > 5$ is a Mersenne prime
plus 1, which can be extremely large; (iii) A multicast network linearly
solvable over GF($q^{m_1}$) and over GF($q^{m_2}$) is \emph{not} necessarily
linearly solvable over GF($q^{m_1+m_2}$); (iv) There exists a class of
multicast networks with a set $T$ of receivers such that the minimum field size
$q_{min}$ for a linear solution over GF($q_{min}$) is lower bounded by
$\Theta(\sqrt{T})$, but not every larger field than GF($q_{min}$) suffices to
yield a linear solution. The insight brought from this work is that not only
the field size, but also the order of subgroups in the multiplicative group of
a finite field affects the linear solvability of a multicast network.

As storage systems grow in size, device failures happen more frequently than
ever before. Given the commodity nature of hard drives employed, a storage
system needs to tolerate a certain number of disk failures while maintaining
data integrity, and to recover lost data with minimal interference to normal
disk I/O operations. RAID6, which can tolerate up to two disk failures with
the minimum redundancy, is becoming widespread. However, traditional RAID6
codes suffer from high disk I/O overhead during recovery. In this paper, we
propose a new family of RAID6 codes, the Minimum Disk I/O Repairable (MDR)
codes, which achieve the optimal disk I/O overhead for single failure
recoveries. Moreover, we show that MDR codes can be encoded with the minimum
number of bitwise XOR operations. Simulation results show that MDR codes help
to save about half of disk read operations than traditional RAID6 codes, and
thus can reduce the recovery time by up to 40%.

Network Coding encourages information coding across a communication network.
While the necessity, benefit and complexity of network coding are sensitive to
the underlying graph structure of a network, existing theory on network coding
often treats the network topology as a black box, focusing on algebraic or
information theoretic aspects of the problem. This work aims at an indepth
examination of the relation between algebraic coding and network topologies. We
mathematically establish a series of results along the direction of: if network
coding is necessary/beneficial, or if a particular finite field is required for
coding, then the network must have a corresponding hidden structure embedded in
its underlying topology, and such embedding is computationally efficient to
verify. Specifically, we first formulate a metaconjecture, the NCMinor
Conjecture, that articulates such a connection between graph theory and network
coding, in the language of graph minors. We next prove that the NCMinor
Conjecture is almost equivalent to the Hadwiger Conjecture, which connects
graph minors with graph coloring. Such equivalence implies the existence of
$K_4$, $K_5$, $K_6$, and $K_{O(q/\log{q})}$ minors, for networks requiring
$\mathbb{F}_3$, $\mathbb{F}_4$, $\mathbb{F}_5$ and $\mathbb{F}_q$,
respectively. We finally prove that network coding can make a difference from
routing only if the network contains a $K_4$ minor, and this minor containment
result is tight. Practical implications of the above results are discussed.