• ### Spectral Method and Regularized MLE Are Both Optimal for Top-$K$ Ranking(1707.09971)

This paper is concerned with the problem of top-$K$ ranking from pairwise comparisons. Given a collection of $n$ items and a few pairwise comparisons across them, one wishes to identify the set of $K$ items that receive the highest ranks. To tackle this problem, we adopt the logistic parametric model --- the Bradley-Terry-Luce model, where each item is assigned a latent preference score, and where the outcome of each pairwise comparison depends solely on the relative scores of the two items involved. Recent works have made significant progress towards characterizing the performance (e.g. the mean square error for estimating the scores) of several classical methods, including the spectral method and the maximum likelihood estimator (MLE). However, where they stand regarding top-$K$ ranking remains unsettled. We demonstrate that under a natural random sampling model, the spectral method alone, or the regularized MLE alone, is minimax optimal in terms of the sample complexity --- the number of paired comparisons needed to ensure exact top-$K$ identification, for the fixed dynamic range regime. This is accomplished via optimal control of the entrywise error of the score estimates. We complement our theoretical studies by numerical experiments, confirming that both methods yield low entrywise errors for estimating the underlying scores. Our theory is established via a novel leave-one-out trick, which proves effective for analyzing both iterative and non-iterative procedures. Along the way, we derive an elementary eigenvector perturbation bound for probability transition matrices, which parallels the Davis-Kahan $\sin\Theta$ theorem for symmetric matrices. This also allows us to close the gap between the $\ell_2$ error upper bound for the spectral method and the minimax lower limit.
• ### Gradient Descent with Random Initialization: Fast Global Convergence for Nonconvex Phase Retrieval(1803.07726)

March 21, 2018 cs.IT, math.IT, cs.NA, math.OC, cs.LG, stat.ML
This paper considers the problem of solving systems of quadratic equations, namely, recovering an object of interest $\mathbf{x}^{\natural}\in\mathbb{R}^{n}$ from $m$ quadratic equations/samples $y_{i}=(\mathbf{a}_{i}^{\top}\mathbf{x}^{\natural})^{2}$, $1\leq i\leq m$. This problem, also dubbed as phase retrieval, spans multiple domains including physical sciences and machine learning. We investigate the efficiency of gradient descent (or Wirtinger flow) designed for the nonconvex least squares problem. We prove that under Gaussian designs, gradient descent --- when randomly initialized --- yields an $\epsilon$-accurate solution in $O\big(\log n+\log(1/\epsilon)\big)$ iterations given nearly minimal samples, thus achieving near-optimal computational and sample complexities at once. This provides the first global convergence guarantee concerning vanilla gradient descent for phase retrieval, without the need of (i) carefully-designed initialization, (ii) sample splitting, or (iii) sophisticated saddle-point escaping schemes. All of these are achieved by exploiting the statistical models in analyzing optimization algorithms, via a leave-one-out approach that enables the decoupling of certain statistical dependency between the gradient descent iterates and the data.
• ### Teaching Categories to Human Learners with Visual Explanations(1802.06924)

Feb. 20, 2018 cs.CV, cs.LG, stat.ML
We study the problem of computer-assisted teaching with explanations. Conventional approaches for machine teaching typically only provide feedback at the instance level e.g., the category or label of the instance. However, it is intuitive that clear explanations from a knowledgeable teacher can significantly improve a student's ability to learn a new concept. To address these existing limitations, we propose a teaching framework that provides interpretable explanations as feedback and models how the learner incorporates this additional information. In the case of images, we show that we can automatically generate explanations that highlight the parts of the image that are responsible for the class label. Experiments on human learners illustrate that, on average, participants achieve better test set performance on challenging categorization tasks when taught with our interpretable approach compared to existing methods.
• ### Nonconvex Matrix Factorization from Rank-One Measurements(1802.06286)

Feb. 17, 2018 cs.IT, math.IT, cs.LG, stat.ML
We consider the problem of recovering low-rank matrices from random rank-one measurements, which spans numerous applications including covariance sketching, phase retrieval, quantum state tomography, and learning shallow polynomial neural networks, among others. Our approach is to directly estimate the low-rank factor by minimizing a nonconvex quadratic loss function via vanilla gradient descent, following a tailored spectral initialization. When the true rank is small, this algorithm is guaranteed to converge to the ground truth (up to global ambiguity) with near-optimal sample complexity and computational complexity. To the best of our knowledge, this is the first guarantee that achieves near-optimality in both metrics. In particular, the key enabler of near-optimal computational guarantees is an implicit regularization phenomenon: without explicit regularization, both spectral initialization and the gradient descent iterates automatically stay within a region incoherent with the measurement vectors. This feature allows one to employ much more aggressive step sizes compared with the ones suggested in prior literature, without the need of sample splitting.
• ### Understanding the Role of Adaptivity in Machine Teaching: The Case of Version Space Learners(1802.05190)

Feb. 14, 2018 cs.LG
In real-world applications of education and human teaching, an effective teacher chooses the next example intelligently based on the learner's current state. However, most of the existing works in algorithmic machine teaching focus on the batch setting, where adaptivity plays no role. In this paper, we study the case of teaching consistent, version space learners in an interactive setting---at any time step, the teacher provides an example, the learner performs an update, and the teacher observes the learner's new state. We highlight that adaptivity does not speed up the teaching process when considering existing models of version space learners, such as the "worst-case" model (the learner picks the next hypothesis randomly from the version space) and "preference-based" model (the learner picks hypothesis according to some global preference). Inspired by human teaching, we propose a new model where the learner picks hypothesis according to some local preference defined by the current hypothesis. We show that our model exhibits several desirable properties, e.g., adaptivity plays a key role, and the learner's transitions over hypotheses are smooth/interpretable. We develop efficient teaching algorithms for our model, and demonstrate our results via simulations as well as user studies.
• ### Noise-induced tipping in a periodically forced system: the noise-drift balanced regime(1801.05395)

Jan. 16, 2018 nlin.CD, math.DS
We consider a simple periodically-forced 1-D Langevin equation which possesses two stable periodic orbits in the absence of noise. We ask the question: is there a most likely transition path between the stable orbits that would allow us to identify a preferred phase of the periodic forcing for which tipping occurs? The regime where the forcing period is long compared to the adiabatic relaxation time has been well studied. Our work complements this by focusing on the regime where the forcing period is comparable to the relaxation time. We compute optimal paths using the least action method which involves the Onsager-Machlup functional and validate results with Monte Carlo simulations in a regime where noise and drift are balanced. Results for the preferred tipping phase are compared with the deterministic aspects of the problem. We identify parameter regimes where nullclines, associated with the deterministic problem in a 2-D extended phase space, form passageways through which the optimal paths transit. As the nullclines are independent of the relaxation time and the noise strength, this leads to a robust deterministic predictor of a preferred tipping phase for the noise and drift balanced regime.
• ### Efficient Online Learning for Optimizing Value of Information: Theory and Application to Interactive Troubleshooting(1703.05452)

July 17, 2017 cs.AI, cs.LG, stat.ML
We consider the optimal value of information (VoI) problem, where the goal is to sequentially select a set of tests with a minimal cost, so that one can efficiently make the best decision based on the observed outcomes. Existing algorithms are either heuristics with no guarantees, or scale poorly (with exponential run time in terms of the number of available tests). Moreover, these methods assume a known distribution over the test outcomes, which is often not the case in practice. We propose an efficient sampling-based online learning framework to address the above issues. First, assuming the distribution over hypotheses is known, we propose a dynamic hypothesis enumeration strategy, which allows efficient information gathering with strong theoretical guarantees. We show that with sufficient amount of samples, one can identify a near-optimal decision with high probability. Second, when the parameters of the hypotheses distribution are unknown, we propose an algorithm which learns the parameters progressively via posterior sampling in an online fashion. We further establish a rigorous bound on the expected regret. We demonstrate the effectiveness of our approach on a real-world interactive troubleshooting application and show that one can efficiently make high-quality decisions with low cost.
• ### The Likelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square(1706.01191)

Logistic regression is used thousands of times a day to fit data, predict future outcomes, and assess the statistical significance of explanatory variables. When used for the purpose of statistical inference, logistic models produce p-values for the regression coefficients by using an approximation to the distribution of the likelihood-ratio test. Indeed, Wilks' theorem asserts that whenever we have a fixed number $p$ of variables, twice the log-likelihood ratio (LLR) $2\Lambda$ is distributed as a $\chi^2_k$ variable in the limit of large sample sizes $n$; here, $k$ is the number of variables being tested. In this paper, we prove that when $p$ is not negligible compared to $n$, Wilks' theorem does not hold and that the chi-square approximation is grossly incorrect; in fact, this approximation produces p-values that are far too small (under the null hypothesis). Assume that $n$ and $p$ grow large in such a way that $p/n\rightarrow\kappa$ for some constant $\kappa < 1/2$. We prove that for a class of logistic models, the LLR converges to a rescaled chi-square, namely, $2\Lambda~\stackrel{\mathrm{d}}{\rightarrow}~\alpha(\kappa)\chi_k^2$, where the scaling factor $\alpha(\kappa)$ is greater than one as soon as the dimensionality ratio $\kappa$ is positive. Hence, the LLR is larger than classically assumed. For instance, when $\kappa=0.3$, $\alpha(\kappa)\approx1.5$. In general, we show how to compute the scaling factor by solving a nonlinear system of two equations with two unknowns. Our mathematical arguments are involved and use techniques from approximate message passing theory, non-asymptotic random matrix theory and convex geometry. We also complement our mathematical study by showing that the new limiting distribution is accurate for finite sample sizes. Finally, all the results from this paper extend to some other regression models such as the probit regression model.
• ### On the Minimax Capacity Loss under Sub-Nyquist Universal Sampling(1304.7751)

March 12, 2017 cs.IT, math.IT
This paper investigates the information rate loss in analog channels when the sampler is designed to operate independent of the instantaneous channel occupancy. Specifically, a multiband linear time-invariant Gaussian channel under universal sub-Nyquist sampling is considered. The entire channel bandwidth is divided into $n$ subbands of equal bandwidth. At each time only $k$ constant-gain subbands are active, where the instantaneous subband occupancy is not known at the receiver and the sampler. We study the information loss through a capacity loss metric, that is, the capacity gap caused by the lack of instantaneous subband occupancy information. We characterize the minimax capacity loss for the entire sub-Nyquist rate regime, provided that the number $n$ of subbands and the SNR are both large. The minimax limits depend almost solely on the band sparsity factor and the undersampling factor, modulo some residual terms that vanish as $n$ and SNR grow. Our results highlight the power of randomized sampling methods (i.e. the samplers that consist of random periodic modulation and low-pass filters), which are able to approach the minimax capacity loss with exponentially high probability.
• ### The Projected Power Method: An Efficient Algorithm for Joint Alignment from Pairwise Differences(1609.05820)

Nov. 22, 2016 cs.IT, math.IT, cs.CV, math.OC, cs.LG, stat.ML
Various applications involve assigning discrete label values to a collection of objects based on some pairwise noisy data. Due to the discrete---and hence nonconvex---structure of the problem, computing the optimal assignment (e.g.~maximum likelihood assignment) becomes intractable at first sight. This paper makes progress towards efficient computation by focusing on a concrete joint alignment problem---that is, the problem of recovering $n$ discrete variables $x_i \in \{1,\cdots, m\}$, $1\leq i\leq n$ given noisy observations of their modulo differences $\{x_i - x_j~\mathsf{mod}~m\}$. We propose a low-complexity and model-free procedure, which operates in a lifted space by representing distinct label values in orthogonal directions, and which attempts to optimize quadratic functions over hypercubes. Starting with a first guess computed via a spectral method, the algorithm successively refines the iterates via projected power iterations. We prove that for a broad class of statistical models, the proposed projected power method makes no error---and hence converges to the maximum likelihood estimate---in a suitable regime. Numerical experiments have been carried out on both synthetic and real data to demonstrate the practicality of our algorithm. We expect this algorithmic framework to be effective for a broad range of discrete assignment problems.
• ### Information Recovery from Pairwise Measurements(1404.7105)

Aug. 2, 2016 cs.IT, math.IT, math.ST, stat.TH
A variety of information processing tasks in practice involve recovering $n$ objects from single-shot graph-based measurements, particularly those taken over the edges of some measurement graph $\mathcal{G}$. This paper concerns the situation where each object takes value over a group of $M$ different values, and where one is interested to recover all these values based on observations of certain pairwise relations over $\mathcal{G}$. The imperfection of measurements presents two major challenges for information recovery: 1) $\textit{inaccuracy}$: a (dominant) portion $1-p$ of measurements are corrupted; 2) $\textit{incompleteness}$: a significant fraction of pairs are unobservable, i.e. $\mathcal{G}$ can be highly sparse. Under a natural random outlier model, we characterize the $\textit{minimax recovery rate}$, that is, the critical threshold of non-corruption rate $p$ below which exact information recovery is infeasible. This accommodates a very general class of pairwise relations. For various homogeneous random graph models (e.g. Erdos Renyi random graphs, random geometric graphs, small world graphs), the minimax recovery rate depends almost exclusively on the edge sparsity of the measurement graph $\mathcal{G}$ irrespective of other graphical metrics. This fundamental limit decays with the group size $M$ at a square root rate before entering a connectivity-limited regime. Under the Erdos Renyi random graph, a tractable combinatorial algorithm is proposed to approach the limit for large $M$ ($M=n^{\Omega(1)}$), while order-optimal recovery is enabled by semidefinite programs in the small $M$ regime. The extended (and most updated) version of this work can be found at (http://arxiv.org/abs/1504.01369).
• ### Near-optimal Bayesian Active Learning with Correlated and Noisy Tests(1605.07334)

July 11, 2016 cs.AI, cs.LG
We consider the Bayesian active learning and experimental design problem, where the goal is to learn the value of some unknown target variable through a sequence of informative, noisy tests. In contrast to prior work, we focus on the challenging, yet practically relevant setting where test outcomes can be conditionally dependent given the hidden target variable. Under such assumptions, common heuristics, such as greedily performing tests that maximize the reduction in uncertainty of the target, often perform poorly. In this paper, we propose ECED, a novel, computationally efficient active learning algorithm, and prove strong theoretical guarantees that hold with correlated, noisy tests. Rather than directly optimizing the prediction error, at each step, ECED picks the test that maximizes the gain in a surrogate objective, which takes into account the dependencies between tests. Our analysis relies on an information-theoretic auxiliary function to track the progress of ECED, and utilizes adaptive submodularity to attain the near-optimal bound. We demonstrate strong empirical performance of ECED on two problem instances, including a Bayesian experimental design task intended to distinguish among economic theories of how people make risky decisions, and an active preference learning task via pairwise comparisons.
• ### Community Recovery in Graphs with Locality(1602.03828)

Motivated by applications in domains such as social networks and computational biology, we study the problem of community recovery in graphs with locality. In this problem, pairwise noisy measurements of whether two nodes are in the same community or different communities come mainly or exclusively from nearby nodes rather than uniformly sampled between all nodes pairs, as in most existing models. We present an algorithm that runs nearly linearly in the number of measurements and which achieves the information theoretic limit for exact recovery.
• ### Information Recovery from Pairwise Measurements(1504.01369)

This paper is concerned with jointly recovering $n$ node-variables $\left\{ x_{i}\right\}_{1\leq i\leq n}$ from a collection of pairwise difference measurements. Imagine we acquire a few observations taking the form of $x_{i}-x_{j}$; the observation pattern is represented by a measurement graph $\mathcal{G}$ with an edge set $\mathcal{E}$ such that $x_{i}-x_{j}$ is observed if and only if $(i,j)\in\mathcal{E}$. To account for noisy measurements in a general manner, we model the data acquisition process by a set of channels with given input/output transition measures. Employing information-theoretic tools applied to channel decoding problems, we develop a \emph{unified} framework to characterize the fundamental recovery criterion, which accommodates general graph structures, alphabet sizes, and channel transition measures. In particular, our results isolate a family of \emph{minimum} \emph{channel divergence measures} to characterize the degree of measurement corruption, which together with the size of the minimum cut of $\mathcal{G}$ dictates the feasibility of exact information recovery. For various homogeneous graphs, the recovery condition depends almost only on the edge sparsity of the measurement graph irrespective of other graphical metrics; alternatively, the minimum sample complexity required for these graphs scales like $\text{minimum sample complexity }\asymp\frac{n\log n}{\mathsf{Hel}_{1/2}^{\min}}$ for certain information metric $\mathsf{Hel}_{1/2}^{\min}$ defined in the main text, as long as the alphabet size is not super-polynomial in $n$. We apply our general theory to three concrete applications, including the stochastic block model, the outlier model, and the haplotype assembly problem. Our theory leads to order-wise tight recovery conditions for all these scenarios.
• ### Solving Random Quadratic Systems of Equations Is Nearly as Easy as Solving Linear Systems(1505.05114)

We consider the fundamental problem of solving quadratic systems of equations in $n$ variables, where $y_i = |\langle \boldsymbol{a}_i, \boldsymbol{x} \rangle|^2$, $i = 1, \ldots, m$ and $\boldsymbol{x} \in \mathbb{R}^n$ is unknown. We propose a novel method, which starting with an initial guess computed by means of a spectral method, proceeds by minimizing a nonconvex functional as in the Wirtinger flow approach. There are several key distinguishing features, most notably, a distinct objective functional and novel update rules, which operate in an adaptive fashion and drop terms bearing too much influence on the search direction. These careful selection rules provide a tighter initial guess, better descent directions, and thus enhanced practical performance. On the theoretical side, we prove that for certain unstructured models of quadratic systems, our algorithms return the correct solution in linear time, i.e. in time proportional to reading the data $\{\boldsymbol{a}_i\}$ and $\{y_i\}$ as soon as the ratio $m/n$ between the number of equations and unknowns exceeds a fixed numerical constant. We extend the theory to deal with noisy systems in which we only have $y_i \approx |\langle \boldsymbol{a}_i, \boldsymbol{x} \rangle|^2$ and prove that our algorithms achieve a statistical accuracy, which is nearly un-improvable. We complement our theoretical study with numerical examples showing that solving random quadratic systems is both computationally and statistically not much harder than solving linear systems of the same size---hence the title of this paper. For instance, we demonstrate empirically that the computational cost of our algorithm is about four times that of solving a least-squares problem of the same size.
• ### Assessing the robustness of spatial pattern sequences in a dryland vegetation model(1507.03239)

Feb. 23, 2016 nlin.PS
A particular sequence of patterns, "$\text{gaps} \to \text{labyrinth} \to \text{spots}$," occurs with decreasing precipitation in previously reported numerical simulations of PDE dryland vegetation models. These observations have led to the suggestion that this sequence of patterns can serve as an early indicator of desertification in some ecosystems. Since parameter values can take on a range of plausible values in the vegetation models, it is important to investigate whether the pattern sequence prediction is robust to variation. For a particular model, we find that a quantity calculated via bifurcation-theoretic analysis appears to serve as a proxy for the pattern sequences that occur in numerical simulations across a range of parameter values. We find in further analysis that the quantity takes on values consistent with the standard sequence in an ecologically relevant limit of the model parameter values. This suggests that the standard sequence is a robust prediction of the model, and we conclude by proposing a methodology for assessing the robustness of the standard sequence in other models and formulations.
• ### Accelerated dimension-independent adaptive Metropolis(1506.05741)

June 18, 2015 math.PR, stat.CO, stat.ME
This work considers black-box Bayesian inference over high-dimensional parameter spaces. The well-known adaptive Metropolis (AM) algorithm of (Haario etal. 2001) is extended herein to scale asymptotically uniformly with respect to the underlying parameter dimension for Gaussian targets, by respecting the variance of the target. The resulting algorithm, referred to as the dimension-independent adaptive Metropolis (DIAM) algorithm, also shows improved performance with respect to adaptive Metropolis on non-Gaussian targets. This algorithm is further improved, and the possibility of probing high-dimensional targets is enabled, via GPU-accelerated numerical libraries and periodically synchronized concurrent chains (justified a posteriori). Asymptotically in dimension, this GPU implementation exhibits a factor of four improvement versus a competitive CPU-based Intel MKL parallel version alone. Strong scaling to concurrent chains is exhibited, through a combination of longer time per sample batch (weak scaling) and yet fewer necessary samples to convergence. The algorithm performance is illustrated on several Gaussian and non-Gaussian target examples, in which the dimension may be in excess of one thousand.
• ### Spectral MLE: Top-$K$ Rank Aggregation from Pairwise Comparisons(1504.07218)

This paper explores the preference-based top-$K$ rank aggregation problem. Suppose that a collection of items is repeatedly compared in pairs, and one wishes to recover a consistent ordering that emphasizes the top-$K$ ranked items, based on partially revealed preferences. We focus on the Bradley-Terry-Luce (BTL) model that postulates a set of latent preference scores underlying all items, where the odds of paired comparisons depend only on the relative scores of the items involved. We characterize the minimax limits on identifiability of top-$K$ ranked items, in the presence of random and non-adaptive sampling. Our results highlight a separation measure that quantifies the gap of preference scores between the $K^{\text{th}}$ and $(K+1)^{\text{th}}$ ranked items. The minimum sample complexity required for reliable top-$K$ ranking scales inversely with the separation measure irrespective of other preference distribution metrics. To approach this minimax limit, we propose a nearly linear-time ranking scheme, called \emph{Spectral MLE}, that returns the indices of the top-$K$ items in accordance to a careful score estimate. In a nutshell, Spectral MLE starts with an initial score estimate with minimal squared loss (obtained via a spectral method), and then successively refines each component with the assistance of coordinate-wise MLEs. Encouragingly, Spectral MLE allows perfect top-$K$ item identification under minimal sample complexity. The practical applicability of Spectral MLE is further corroborated by numerical experiments.
• ### Exact and Stable Covariance Estimation from Quadratic Sampling via Convex Programming(1310.0807)

Statistical inference and information processing of high-dimensional data often require efficient and accurate estimation of their second-order statistics. With rapidly changing data, limited processing power and storage at the acquisition devices, it is desirable to extract the covariance structure from a single pass over the data and a small number of stored measurements. In this paper, we explore a quadratic (or rank-one) measurement model which imposes minimal memory requirements and low computational complexity during the sampling process, and is shown to be optimal in preserving various low-dimensional covariance structures. Specifically, four popular structural assumptions of covariance matrices, namely low rank, Toeplitz low rank, sparsity, jointly rank-one and sparse structure, are investigated, while recovery is achieved via convex relaxation paradigms for the respective structure. The proposed quadratic sampling framework has a variety of potential applications including streaming data processing, high-frequency wireless communication, phase space tomography and phase retrieval in optics, and non-coherent subspace detection. Our method admits universally accurate covariance estimation in the absence of noise, as soon as the number of measurements exceeds the information theoretic limits. We also demonstrate the robustness of this approach against noise and imperfect structural assumptions. Our analysis is established upon a novel notion called the mixed-norm restricted isometry property (RIP-$\ell_{2}/\ell_{1}$), as well as the conventional RIP-$\ell_{2}/\ell_{2}$ for near-isotropic and bounded measurements. In addition, our results improve upon the best-known phase retrieval (including both dense and sparse signals) guarantees using PhaseLift with a significantly simpler approach.
• ### Backing off from Infinity: Performance Bounds via Concentration of Spectral Measure for Random MIMO Channels(1312.2574)

Oct. 25, 2014 cs.IT, math.IT, math.ST, stat.TH
The performance analysis of random vector channels, particularly multiple-input-multiple-output (MIMO) channels, has largely been established in the asymptotic regime of large channel dimensions, due to the analytical intractability of characterizing the exact distribution of the objective performance metrics. This paper exposes a new non-asymptotic framework that allows the characterization of many canonical MIMO system performance metrics to within a narrow interval under moderate-to-large channel dimensionality, provided that these metrics can be expressed as a separable function of the singular values of the matrix. The effectiveness of our framework is illustrated through two canonical examples. Specifically, we characterize the mutual information and power offset of random MIMO channels, as well as the minimum mean squared estimation error of MIMO channel inputs from the channel outputs. Our results lead to simple, informative, and reasonably accurate control of various performance metrics in the finite-dimensional regime, as corroborated by the numerical simulations. Our analysis framework is established via the concentration of spectral measure phenomenon for random matrices uncovered by Guionnet and Zeitouni, which arises in a variety of random matrix ensembles irrespective of the precise distributions of the matrix entries.
• ### Robust Spectral Compressed Sensing via Structured Matrix Completion(1304.8126)

July 20, 2014 math.NA, cs.IT, math.IT, stat.ML, cs.SY
The paper explores the problem of \emph{spectral compressed sensing}, which aims to recover a spectrally sparse signal from a small random subset of its $n$ time domain samples. The signal of interest is assumed to be a superposition of $r$ multi-dimensional complex sinusoids, while the underlying frequencies can assume any \emph{continuous} values in the normalized frequency domain. Conventional compressed sensing paradigms suffer from the basis mismatch issue when imposing a discrete dictionary on the Fourier representation. To address this issue, we develop a novel algorithm, called \emph{Enhanced Matrix Completion (EMaC)}, based on structured matrix completion that does not require prior knowledge of the model order. The algorithm starts by arranging the data into a low-rank enhanced form exhibiting multi-fold Hankel structure, and then attempts recovery via nuclear norm minimization. Under mild incoherence conditions, EMaC allows perfect recovery as soon as the number of samples exceeds the order of $r\log^{4}n$, and is stable against bounded noise. Even if a constant portion of samples are corrupted with arbitrary magnitude, EMaC still allows exact recovery, provided that the sample complexity exceeds the order of $r^{2}\log^{3}n$. Along the way, our results demonstrate the power of convex relaxation in completing a low-rank multi-fold Hankel or Toeplitz matrix from minimal observed entries. The performance of our algorithm and its applicability to super resolution are further validated by numerical experiments.
• ### Scalable Semidefinite Relaxation for Maximum A Posterior Estimation(1405.4807)

May 19, 2014 cs.IT, math.IT, cs.CV, math.OC, cs.LG, stat.ML
Maximum a posteriori (MAP) inference over discrete Markov random fields is a fundamental task spanning a wide spectrum of real-world applications, which is known to be NP-hard for general graphs. In this paper, we propose a novel semidefinite relaxation formulation (referred to as SDR) to estimate the MAP assignment. Algorithmically, we develop an accelerated variant of the alternating direction method of multipliers (referred to as SDPAD-LR) that can effectively exploit the special structure of the new relaxation. Encouragingly, the proposed procedure allows solving SDR for large-scale problems, e.g., problems on a grid graph comprising hundreds of thousands of variables with multiple states per node. Compared with prior SDP solvers, SDPAD-LR is capable of attaining comparable accuracy while exhibiting remarkably improved scalability, in contrast to the commonly held belief that semidefinite relaxation can only been applied on small-scale MRF problems. We have evaluated the performance of SDR on various benchmark datasets including OPENGM2 and PIC in terms of both the quality of the solutions and computation time. Experimental results demonstrate that for a broad class of problems, SDPAD-LR outperforms state-of-the-art algorithms in producing better MAP assignment in an efficient manner.
• ### Channel Capacity under Sub-Nyquist Nonuniform Sampling(1204.6049)

March 22, 2014 cs.IT, math.IT
This paper investigates the effect of sub-Nyquist sampling upon the capacity of an analog channel. The channel is assumed to be a linear time-invariant Gaussian channel, where perfect channel knowledge is available at both the transmitter and the receiver. We consider a general class of right-invertible time-preserving sampling methods which include irregular nonuniform sampling, and characterize in closed form the channel capacity achievable by this class of sampling methods, under a sampling rate and power constraint. Our results indicate that the optimal sampling structures extract out the set of frequencies that exhibits the highest signal-to-noise ratio among all spectral sets of measure equal to the sampling rate. This can be attained through filterbank sampling with uniform sampling at each branch with possibly different rates, or through a single branch of modulation and filtering followed by uniform sampling. These results reveal that for a large class of channels, employing irregular nonuniform sampling sets, while typically complicated to realize, does not provide capacity gain over uniform sampling sets with appropriate preprocessing. Our findings demonstrate that aliasing or scrambling of spectral components does not provide capacity gain, which is in contrast to the benefits obtained from random mixing in spectrum-blind compressive sampling schemes.
• ### A minimal model of predator-swarm interactions(1403.3250)

We propose a minimal model of predator-swarm interactions which captures many of the essential dynamics observed in nature. Different outcomes are observed depending on the predator strength. For a "weak" predator, the swarm is able to escape the predator completely. As the strength is increased, the predator is able to catch up with the swarm as a whole, but the individual prey are able to escape by "confusing" the predator: the prey forms a ring with the predator at the center. For higher predator strength, complex chasing dynamics are observed which can become chaotic. For even higher strength, the predator is able to successfully capture the prey. Our model is simple enough to be amenable to a full mathematical analysis which is used to predict the shape of the swarm as well as the resulting predator-prey dynamics as a function of model parameters. We show that as the predator strength is increased, there is a transition (due to a Hopf bifurcation) from confusion state to chasing dynamics, and we compute the threshold analytically. Our analysis indicates that the swarming behaviour is not helpful in avoiding the predator, suggesting that there are other reasons why the species may swarm. The complex shape of the swarm in our model during the chasing dynamics is similar to the shape of a flock of sheep avoiding a shepherd.
• ### An Algorithm for Exact Super-resolution and Phase Retrieval(1310.7552)

March 7, 2014 cs.IT, math.IT, cs.NA
We explore a fundamental problem of super-resolving a signal of interest from a few measurements of its low-pass magnitudes. We propose a 2-stage tractable algorithm that, in the absence of noise, admits perfect super-resolution of an $r$-sparse signal from $2r^2-2r+2$ low-pass magnitude measurements. The spike locations of the signal can assume any value over a continuous disk, without increasing the required sample size. The proposed algorithm first employs a conventional super-resolution algorithm (e.g. the matrix pencil approach) to recover unlabeled sets of signal correlation coefficients, and then applies a simple sorting algorithm to disentangle and retrieve the true parameters in a deterministic manner. Our approach can be adapted to multi-dimensional spike models and random Fourier sampling by replacing its first step with other harmonic retrieval algorithms.