• The \emph{state complexity} of a regular language $L_m$ is the number $m$ of states in a minimal deterministic finite automaton (DFA) accepting $L_m$. The state complexity of a regularity-preserving binary operation on regular languages is defined as the maximal state complexity of the result of the operation, where the two operands range over all languages of state complexities $\le m$ and $\le n$, respectively. We find a tight upper bound on the state complexity of the binary operation \emph{overlap assembly} on regular languages. This operation was introduced by Csuhaj-Varj\'u, Petre, and Vaszil to model the process of self-assembly of two linear DNA strands into a longer DNA strand, provided that their ends ``overlap''. We prove that the state complexity of the overlap assembly of languages $L_m$ and $L_n$, where $m\ge 2$ and $n\ge1$, is at most $2 (m-1) 3^{n-1} + 2^n$. Moreover, for $m \ge 2$ and $n \ge 3$ there exist languages $L_m$ and $L_n$ over an alphabet of size $n$ whose overlap assembly meets the upper bound, and this bound cannot be met with smaller alphabets. Finally, we prove that $m+n$ is a tight upper bound on the state complexity of the overlap assembly of unary languages, and that there are binary languages whose overlap assembly has exponential state complexity, at least $m(2^{n-1}-2)+2$.
  • Significant success has been realized recently in applying machine learning to real-world applications. There have also been corresponding concerns about the privacy of training data, which relates to data security and confidentiality issues. Differential privacy provides a principled and rigorous privacy guarantee on machine learning models. While it is common to design a model satisfying a required differential-privacy property by injecting noise, it is generally hard to balance the trade-off between privacy and utility. We show that stochastic gradient Markov chain Monte Carlo (SG-MCMC) -- a class of scalable Bayesian posterior sampling algorithms proposed recently -- satisfies strong differential privacy with carefully chosen step sizes. We develop theory on the performance of the proposed differentially-private SG-MCMC method. We conduct experiments to support our analysis and show that a standard SG-MCMC sampler without any modification (under a default setting) can reach state-of-the-art performance in terms of both privacy and utility on Bayesian learning.
  • The statistical analysis of covariance matrices occurs in many important applications, e.g. in diffusion tensor imaging and longitudinal data analysis. We consider the situation where it is of interest to estimate an average covariance matrix, to describe its anisotropy, to carry out principal geodesic analysis, and to interpolate between covariance matrices. There are many choices of metric available, each with its advantages. The best choice will depend on the particular application. The use of the Procrustes size-and-shape metric is particularly appropriate when the covariance matrices are close to being deficient in rank. We discuss the use of different metrics for diffusion tensor analysis, and we also introduce certain types of regularization for tensors.
  • The radiofrequency (RF) transmit field is severely inhomogeneous at ultrahigh field due to both RF penetration and RF coil design issues. This particularly impairs image quality for sequences that use inversion pulses such as magnetization prepared rapid acquisition gradient echo and limits the use of quantitative arterial spin labeling sequences such as flow-attenuated inversion recovery. Here we have used a search algorithm to produce inversion pulses tailored to take into account the heterogeneity of the RF transmit field at 7 T. This created a slice selective inversion pulse that worked well (good slice profile and uniform inversion) over the range of RF amplitudes typically obtained in the head at 7 T while still maintaining an experimentally achievable pulse length and pulse amplitude in the brain at 7 T. The pulses used were based on the frequency offset correction inversion technique, as well as time dilation of functions, but the RF amplitude, frequency sweep, and gradient functions were all generated using a genetic algorithm with an evaluation function that took into account both the desired inversion profile and the transmit field inhomogeneity.
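
As a small numerical illustration of the bounds quoted in the overlap-assembly abstract (the first item above), the sketch below evaluates the three stated expressions for given $m$ and $n$. The function names are hypothetical and chosen only for this example; the code evaluates the formulas as written in the abstract and does not construct any automata.

```python
def overlap_assembly_upper_bound(m: int, n: int) -> int:
    """General upper bound 2(m-1)3^(n-1) + 2^n, stated for m >= 2 and n >= 1."""
    if m < 2 or n < 1:
        raise ValueError("the bound is stated for m >= 2 and n >= 1")
    return 2 * (m - 1) * 3 ** (n - 1) + 2 ** n


def unary_upper_bound(m: int, n: int) -> int:
    """Tight upper bound m + n stated for unary languages."""
    return m + n


def binary_lower_bound(m: int, n: int) -> int:
    """Lower bound m(2^(n-1) - 2) + 2 stated for some binary languages."""
    return m * (2 ** (n - 1) - 2) + 2


if __name__ == "__main__":
    # Tabulate the three expressions for a few small (m, n) pairs.
    for m, n in [(2, 3), (3, 3), (4, 5)]:
        print(
            m,
            n,
            overlap_assembly_upper_bound(m, n),
            unary_upper_bound(m, n),
            binary_lower_bound(m, n),
        )
```

For instance, with $m = 2$ and $n = 3$ the general bound evaluates to $2 \cdot 1 \cdot 9 + 8 = 26$, the unary bound to $5$, and the binary lower bound to $6$.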