• Graphics Processing Units (GPUs) support dynamic voltage and frequency scaling (DVFS) to balance computational performance and energy consumption. However, a simple and accurate way to estimate the performance of a given GPU kernel under different frequency settings on real hardware is still lacking, although such estimates are important for choosing the best frequency configuration for energy saving. This paper presents a fine-grained model that estimates the execution time of GPU kernels under both core and memory frequency scaling. Over a 2.5x range of both core and memory frequencies and across 12 GPU kernels, our model achieves accurate results (within 3.5\%) on real hardware. Compared with cycle-level simulators, our model needs only a few simple micro-benchmarks to extract a set of hardware parameters, plus the performance counters of the kernels, to achieve this accuracy.
  • With huge amounts of training data, deep learning has made great breakthroughs in many artificial intelligence (AI) applications. However, such large-scale data sets present computational challenges, requiring training to be distributed on a cluster equipped with accelerators like GPUs. With the rapid growth of GPU computing power, data communication among GPUs has become a potential bottleneck for overall training performance. In this paper, we first propose a general directed acyclic graph (DAG) model to describe the distributed synchronous stochastic gradient descent (S-SGD) algorithm, which has been widely used in distributed deep learning frameworks. To understand the practical impact of data communication on training performance, we conduct extensive empirical studies on four state-of-the-art distributed deep learning frameworks (i.e., Caffe-MPI, CNTK, MXNet and TensorFlow) over multi-GPU and multi-node environments with different data communication techniques, including PCIe, NVLink, 10GbE, and InfiniBand. Through both analytical and experimental studies, we identify the potential bottlenecks and overheads that could be further optimized. Finally, we make the data set of our experimental traces publicly available, which could be used to support simulation-based studies.
  • Recommendation systems have been widely used in many areas. Collaborative filtering focuses on ratings and ignores the features of the items themselves. To effectively evaluate customers' preferences for books, and taking into account the characteristics of offline book retail, we use an LDA model to calculate customers' preferences for book topics and word2vec to calculate customers' preferences for book types. When forecasting book ratings, we take two factors into consideration: the similarity between customers and the correlation between customers and books. The experiment shows that our feature-based hybrid recommendation method performs better than a single recommendation method on offline book retail data.
  • We propose a Doppler tracking system for gravitational wave detection via Double Optical Clocks in Space (DOCS). In this configuration, two spacecraft (each containing an optical clock) are launched into space for Doppler shift observations. Compared to the similar attempt at gravitational wave detection in the Cassini mission, the radio signal of DOCS, which contains the relative frequency changes, completely avoids noise effects due, for instance, to the troposphere, ionosphere, ground-based antenna and transponder. Given the high stabilities of the two optical clocks (Allan deviation $\sim 4.1\times 10^{-17}$ @ 1000 s), an overall estimated sensitivity of $5 \times 10^{-19}$ could be achieved with an observation time of 2 years, which would allow the detection of gravitational waves in the frequency range from $\sim 10^{-4}$ Hz to $\sim 10^{-2}$ Hz.
  • We study the magnitudes and temperature dependences of the electron-electron and electron-phonon interaction times, which play the dominant role in the formation and relaxation of the photon-induced hotspot in two-dimensional amorphous WSi films. The time constants are obtained through magnetoconductance measurements in a perpendicular magnetic field in the superconducting fluctuation regime and through the time-resolved photoresponse to optical pulses. The excess magnetoconductivity is interpreted in terms of the weak-localization effect and superconducting fluctuations. Aslamazov-Larkin and Maki-Thompson superconducting fluctuations alone fail to reproduce the magnetic field dependence in the relatively high magnetic field range when the temperature is rather close to $T_c$, because the suppression of the electronic density of states due to the formation of short-lifetime Cooper pairs needs to be considered. The time scale $\tau_i$ of inelastic scattering is ascribed to a combination of electron-electron ($\tau_{e-e}$) and electron-phonon ($\tau_{e-ph}$) interaction times and a characteristic electron-fluctuation time ($\tau_{e-fl}$), which makes it possible to extract their magnitudes and temperature dependences from the measured $\tau_i$. The ratio of the phonon-electron ($\tau_{ph-e}$) and electron-phonon interaction times is obtained via measurements of the optical photoresponse of WSi microbridges. Relatively large $\tau_{e-ph}/\tau_{ph-e}$ and $\tau_{e-ph}/\tau_{e-e}$ ratios ensure that in WSi the photon energy is more efficiently confined in the electron subsystem than in other materials commonly used in the technology of superconducting nanowire single-photon detectors (SNSPDs). We discuss the impact of interaction times on the hotspot dynamics and compare relevant metrics of SNSPDs made from different materials.
  • We find the number of compositions over finite abelian groups under two types of restrictions: (i) each part belongs to a given subset and (ii) small runs of consecutive parts must have given properties. Waring's problem over finite fields can be converted to type~(i) compositions, whereas Carlitz and locally Mullen compositions can be formulated as type~(ii) compositions. We use the multisection formula to translate the problem from integers to group elements, the transfer matrix method to do exact counting, and finally the Perron-Frobenius theorem to derive asymptotics. We also exhibit bijections involving certain restricted classes of compositions.
  • We theoretically study spin transport through a single-molecule magnet (SMM) in the sequential and cotunneling regimes, where the SMM is weakly coupled to one ferromagnetic lead and one normal-metal lead. Using a master-equation approach, we find that the spin polarization injected from the ferromagnetic lead is amplified and a highly polarized spin current can be generated, due to the exchange coupling between the transport electron and the anisotropic spin of the SMM. Moreover, the spin-current polarization can be tuned by the gate or bias voltage, and thus an efficient SMM-based spin injection device is proposed for molecular spintronics.
  • A remarkable quantitative agreement is found between the non-Markovian quantum kinetic approach and the time-dependent Dirac equation (TDDE) approach over a wide range of the Keldysh parameter, in the investigation of electron-positron pair production in an electric field that is spatially homogeneous and has a pulse-shaped envelope. If a sub-critical bound potential is immersed in this background field, the TDDE results show that the creation probability is enhanced by two orders of magnitude through the bound-state resonance. We also establish a TDDE formalism for spatially homogeneous fields that greatly reduces the required computing resources.
  • In this paper, we construct some new classes of complete permutation monomials with exponent $d=\frac{q^n-1}{q-1}$ using a special case of the AGW criterion. This proves two recent conjectures in [Wuetal2] and extends some of these recent results to more general $n$.
  • Weyl fermions have not been found in nature as elementary particles, but they emerge as nodal points in the band structure of electronic and classical wave crystals. Novel phenomena such as Fermi arcs and the chiral anomaly have fueled interest in these topological points, which are frequently perceived as monopoles in momentum space. Here we report the experimental observation of generalized optical Weyl points inside the parameter space of a photonic crystal with a specially designed four-layer unit cell. The reflection at the surface of a truncated photonic crystal exhibits phase vortexes due to the synthetic Weyl points, which in turn guarantee the existence of interface states between photonic crystals and any reflecting substrates. The reflection phase vortexes have been confirmed for the first time in our experiments and serve as an experimental signature of the generalized Weyl points. The existence of these interface states is protected by the topological properties of the Weyl points, and the trajectories of these states in the parameter space resemble those of Weyl semimetal "Fermi arc" surface states in momentum space. Tracing the origin of interface states to the topological character of the parameter space paves the way for a rational design of strongly localized states with enhanced local field.
  • Permutation polynomials over finite fields have recently been studied extensively due to their wide applications in cryptography, coding theory, and communication theory, among others. Recently, several authors have studied permutation trinomials of the form $x^rh\left(x^{q-1}\right)$ over $\mathbb{F}_{q^2}$, where $q=2^k$, $h(x)=1+x^s+x^t$ and $r, s, t, k>0$ are integers. Their methods essentially use a multiplicative version of the AGW criterion, because they all transform the problem of proving that a polynomial permutes $\mathbb{F}_{q^2}$ into that of showing that the corresponding fractional polynomial permutes a smaller set $\mu_{q+1}$, where $\mu_{q+1}:=\{x\in\mathbb{F}_{q^2} : x^{q+1}=1\}$. Motivated by these results, we characterize the permutation polynomials of the form $x^rh\left(x^{q-1}\right)$ over $\mathbb{F}_{q^2}$ such that $h(x)\in\mathbb{F}_q[x]$ is arbitrary and $q$ is also an arbitrary prime power. Using the AGW criterion twice, once in multiplicative form and once in additive form, we reduce the problem of proving that a polynomial permutes $\mathbb{F}_{q^2}$ to that of showing permutations over a small subset $S$ of the proper subfield $\mathbb{F}_{q}$, which is significantly different from previously known methods. In particular, we demonstrate our method by constructing many new explicit classes of permutation polynomials of the form $x^rh\left(x^{q-1}\right)$ over $\mathbb{F}_{q^2}$. Moreover, we can explain most of the known permutation trinomials over finite fields of even characteristic, which are in [6, 13, 14, 16, 20, 29].
  • Quantum key distribution (QKD) uses individual light quanta in quantum superposition states to guarantee unconditional communication security between distant parties. In practice, the achievable distance for QKD has been limited to a few hundred kilometers, due to the channel loss of fibers or terrestrial free space, which exponentially reduces the photon rate. Satellite-based QKD promises to establish a global-scale quantum network by exploiting the negligible photon loss and decoherence in empty outer space. Here, we develop and launch a low-Earth-orbit satellite to implement decoy-state QKD with a key rate above 1 kHz from the satellite to the ground over a distance of up to 1200 km, which is up to 20 orders of magnitude more efficient than that expected using an optical fiber (with 0.2 dB/km loss) of the same length. The establishment of a reliable and efficient space-to-ground link for faithful quantum state transmission constitutes a key milestone for global-scale quantum networks.
  • We propose an optimized design for nanowire superconducting single-photon detectors, using the recently discovered position-dependent detection efficiency in these devices. This knowledge allows us to optimize the design of meandering-wire NbN detectors by altering the field distribution across the wire. To calculate the response of detectors with different geometries, we combine a monotonic local detection efficiency of a nanowire with the optical absorption distribution obtained from finite-difference time-domain simulations. The calculations predict a trade-off between average absorption and the edge effect, leading to a predicted optimal wire width close to 100 nm for 1550 nm wavelength, which drops to 50 nm for 600 nm wavelength. The absorption at the edges can be enhanced by depositing a silicon nanowire on top of the superconducting nanowire, which improves both the total absorption efficiency and the internal detection efficiency of meandering-wire structures.
  • Self-supported electrocatalysts that are generated and employed directly as electrodes for energy conversion have been intensively pursued in the fields of materials chemistry and energy. Herein, we report a synthetic strategy to prepare freestanding, hierarchically structured, nitrogen-doped nanoporous graphitic carbon membranes functionalized with Janus-type Co/CoP nanocrystals (termed HNDCM-Co/CoP), which were successfully applied as a highly efficient, binder-free electrode in the hydrogen evolution reaction (HER). Benefiting from multiple structural merits, such as a high degree of graphitization, three-dimensionally interconnected micro-/meso-/macropores, uniform nitrogen doping, well-dispersed Co/CoP nanocrystals, and the confinement effect of the thin carbon layer on the nanocrystals, HNDCM-Co/CoP exhibited superior electrocatalytic activity and long-term operation stability for HER under both acidic and alkaline conditions. As a proof of concept of practical usage, a macroscopic piece of HNDCM-Co/CoP of 5.6 cm × 4 cm × 60 µm in size was prepared in our laboratory. Driven by a solar cell, electroreduction of water under alkaline conditions (pH 14) was performed, and H2 was produced at a rate of 16 ml/min, demonstrating its potential in real-life energy conversion systems.
  • Let $p$ be an odd prime, $n$ a positive integer and $g$ a primitive root of $p^n$. Suppose $D_i^{(p^n)}=\{g^{2s+i}|s=0,1,2,\cdots,\frac{(p-1)p^{n-1}}{2}\}$, $i=0,1$, are the generalized cyclotomic classes with $Z_{p^n}^{\ast}=D_0\cup D_1$. In this paper, we prove that the Gauss periods based on $D_0$ and $D_1$ are both equal to 0 for $n\geq2$. As an application, we determine a lower bound on the 2-adic complexity of a class of Ding-Helleseth generalized cyclotomic sequences of period $p^n$. The result shows that the 2-adic complexity is at least $p^n-p^{n-1}-1$, which is larger than $\frac{N+1}{2}$, where $N=p^n$ is the period of the sequence.
  • Let $p,q$ be distinct primes satisfying $\mathrm{gcd}(p-1,q-1)=d$ and let $D_i$, $i=0,1,\cdots,d-1$, be Whiteman's generalized cyclotomic classes with $Z_{pq}^{\ast}=\cup_{i=0}^{d-1}D_i$. In this paper, we give the values of the Gauss periods based on the generalized cyclotomic sets $D_0^{\ast}=\sum_{i=0}^{\frac{d}{2}-1}D_{2i}$ and $D_1^{\ast}=\sum_{i=0}^{\frac{d}{2}-1}D_{2i+1}$. As an application, we determine a lower bound on the 2-adic complexity of the modified Jacobi sequence. Our result shows that the 2-adic complexity of the modified Jacobi sequence with period $N=pq$ is at least $pq-p-q-1$. This indicates that the 2-adic complexity of the modified Jacobi sequence is large enough to resist the attack of the rational approximation algorithm (RAA) for feedback with carry shift registers (FCSRs).
  • Methods based on Discriminant Correlation Filters (DCF) have become a dominant approach to online object tracking. The features used in these methods, however, are either hand-crafted features like HoGs or convolutional features trained independently on other tasks such as image classification. In this work, we present an end-to-end lightweight network architecture, namely DCFNet, to learn the convolutional features and perform the correlation tracking process simultaneously. Specifically, we treat DCF as a special correlation filter layer added to a Siamese network, and carefully derive the backpropagation through it by defining the network output as the probability heatmap of the object location. Since the derivation is still carried out in the Fourier frequency domain, the efficiency of DCF is preserved. This enables our tracker to run at more than 60 FPS at test time, while achieving a significant accuracy gain compared with KCF using HoGs. Extensive evaluations on the OTB-2013, OTB-2015, and VOT2015 benchmarks demonstrate that the proposed DCFNet tracker is competitive with several state-of-the-art trackers, while being more compact and much faster.
  • Many algorithms in modern stream ciphers, such as ZUC, the LTE encryption algorithm and the LTE integrity algorithm, use bit-component sequences of $p$-ary $m$-sequences as their input. Therefore, analyzing the statistical properties (for example, autocorrelation, linear complexity and 2-adic complexity) of bit-component sequences of $p$-ary $m$-sequences is becoming an important research topic. In this paper, we first derive some autocorrelation properties of LSB (Least Significant Bit) sequences of $p$-ary $m$-sequences, i.e., we convert the problem of computing autocorrelations of LSB sequences of period $p^n-1$ for any integer $n\geq2$ to the problem of determining autocorrelations of LSB sequences of period $p-1$. Then, based on this property and computer calculation, we list some autocorrelation distributions of LSB sequences of $p$-ary $m$-sequences with order $n$ for some small primes $p$, such as $p=3,5,7,11,17,31$. Additionally, using these autocorrelation distributions and a method inspired by Hu, we give lower bounds on the 2-adic complexities of these LSB sequences. Our results show that the main parts of all the lower bounds on the 2-adic complexity of these LSB sequences are larger than $\frac{N}{2}$, where $N$ is the period of these sequences. Therefore, these bounds are large enough to resist the analysis of RAA (Rational Approximation Algorithm) for FCSR (Feedback with Carry Shift Register). In particular, for a Mersenne prime $p=2^k-1$, since all bit-component sequences of a $p$-ary $m$-sequence are shift equivalent, our results hold for all of its bit-component sequences.
  • Pseudo-random sequences with good statistical properties, such as low autocorrelation, high linear complexity and large 2-adic complexity, have been applied in stream ciphers. In general, it is difficult to give both the linear complexity and the 2-adic complexity of a periodic binary sequence. Cai and Ding \cite{Cai Ying} gave a class of sequences with almost optimal autocorrelation by constructing almost difference sets. Wang \cite{Wang Qi} proved that one type of those sequences by Cai and Ding has large linear complexity. Sun et al. \cite{Sun Yuhua} showed that another type of sequences by Cai and Ding also has large linear complexity. Additionally, Sun et al. generalized the construction by Cai and Ding using $d$-form functions with the difference-balanced property. In this paper, we first give the detailed autocorrelation distribution of the sequences generalized from Cai and Ding \cite{Cai Ying} by Sun et al. \cite{Sun Yuhua}. Then, inspired by the method of Hu \cite{Hu Honggang}, we analyse their 2-adic complexity and give a lower bound on the 2-adic complexity of these sequences. Our result shows that the 2-adic complexity of these sequences is at least $N-\mathrm{log}_2\sqrt{N+1}$ and that it reaches $N-1$ in many cases, which is large enough to resist the rational approximation algorithm (RAA) for feedback with carry shift registers (FCSRs).
  • Deep learning has been shown to be a successful machine learning method for a variety of tasks, and its popularity has resulted in numerous open-source deep learning software tools. Training a deep network is usually a very time-consuming process. To address the computational challenge in deep learning, many tools exploit hardware features such as multi-core CPUs and many-core GPUs to shorten the training time. However, different tools exhibit different features and running performance when training different types of deep networks on different hardware platforms, which makes it difficult for end users to select an appropriate pair of software and hardware. In this paper, we aim to make a comparative study of the state-of-the-art GPU-accelerated deep learning software tools, including Caffe, CNTK, MXNet, TensorFlow, and Torch. We first benchmark the running performance of these tools with three popular types of neural networks on two CPU platforms and three GPU platforms. We then benchmark some distributed versions on multiple GPUs. Our contribution is two-fold. First, for end users of deep learning tools, our benchmarking results can serve as a guide to selecting appropriate hardware platforms and software tools. Second, for software developers of deep learning tools, our in-depth analysis points out possible future directions to further optimize the running performance.
  • Nanoporous graphitic carbon membranes with defined chemical composition and pore architecture are novel nanomaterials that are actively pursued. Compared to the easy-to-make porous carbon powders that dominate porous carbon research and applications in energy generation/conversion and environmental remediation, porous carbon membranes are synthetically more challenging, though rather appealing from an application perspective due to their structural integrity, interconnectivity and purity. Here we report a simple bottom-up approach to fabricate large-size, freestanding, porous carbon membranes that feature an unusual single-crystal-like graphitic order and hierarchical pore architecture plus favorable nitrogen doping. When loaded with cobalt nanoparticles, such carbon membranes serve as a high-performance, carbon-based, non-noble-metal electrocatalyst for overall water splitting.
  • In this note, we give a shorter proof of the result of Zheng, Yu, and Pei on the explicit formula of inverses of generalized cyclotomic permutation polynomials over finite fields. Moreover, we characterize all these cyclotomic permutation polynomials that are involutions. Our results provide a fast algorithm (only modular operations are involved) to generate many classes of generalized cyclotomic permutation polynomials, their inverses, and involutions.
  • A ThO$_{2}$ sample and a nickel activation foil were irradiated in the leakage neutron field of the CFBR-II reactor. The activities of the activation products were measured after irradiation to obtain the reaction rates. The normalized reaction rates were also calculated based on the ENDF/B-VII.1, CENDL-3.1, JENDL-4.0, and BROND-2.2 databases. The experimental reaction rate ratio is 4.37 with an uncertainty of 3.9\%, which agrees with each of the ratios calculated based on the ENDF/B-VII.1, JENDL-4.0, and BROND-2.2 databases, but is 11.2\% larger than that based on the CENDL-3.1 database.
  • Weyl fermions do not appear in nature as elementary particles, but they are now found to exist as nodal points in the band structure of electronic and classical wave crystals. Novel phenomena such as Fermi arcs and the chiral anomaly have fueled interest in these topological points, which are frequently perceived as monopoles in momentum space. Here, we demonstrate that generalized Weyl points can exist in a parameter space, and we report the first observation of such nodal points in one-dimensional photonic crystals in the optical range. The reflection phase inside the band gap of a truncated photonic crystal exhibits vortexes in the parameter space where the Weyl points are defined, and these vortexes share the same topological charges as the Weyl points. These vortexes guarantee the existence of interface states, the trajectory of which can be understood as "Fermi arcs" emerging from the Weyl nodes.
  • Energy efficiency has become one of the top design criteria for current computing systems. Dynamic voltage and frequency scaling (DVFS) has been widely adopted by laptop computers, servers, and mobile devices to conserve energy, while GPU DVFS is still at an early stage. This paper explores the impact of GPU DVFS on application performance and power consumption, and furthermore on energy conservation. We survey the state-of-the-art GPU DVFS characterizations, and then summarize recent research on GPU power and performance models. We also conduct real GPU DVFS experiments on NVIDIA Fermi and Maxwell GPUs. According to our experimental results, GPU DVFS has significant potential for energy saving. The effect of scaling core voltage/frequency and memory voltage/frequency depends not only on the GPU architecture, but also on the characteristics of the GPU applications.
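
The energy-saving argument in the last abstract rests on the fact that energy is average power multiplied by execution time, so a lower frequency saves energy only when the drop in power outweighs the slowdown. A minimal Python sketch of that trade-off follows. The `DvfsSample` class, the `best_setting` helper, and every frequency, power, and time value are hypothetical illustrations, not measurements from the Fermi or Maxwell experiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DvfsSample:
    """One hypothetical measurement of a kernel at a fixed frequency pair."""
    core_mhz: int
    mem_mhz: int
    avg_power_w: float   # average board power during the run, in watts
    exec_time_s: float   # kernel execution time, in seconds

    @property
    def energy_j(self) -> float:
        # Energy (joules) = average power (watts) * time (seconds).
        return self.avg_power_w * self.exec_time_s


def best_setting(samples):
    """Return the sample with the lowest energy consumption."""
    return min(samples, key=lambda s: s.energy_j)


# Made-up numbers for illustration only: lowering the core frequency
# reduces power but lengthens the run, so energy is not monotonic.
SAMPLES = [
    DvfsSample(core_mhz=1000, mem_mhz=3500, avg_power_w=150.0, exec_time_s=10.0),
    DvfsSample(core_mhz=800, mem_mhz=3500, avg_power_w=110.0, exec_time_s=12.0),
    DvfsSample(core_mhz=600, mem_mhz=3500, avg_power_w=90.0, exec_time_s=16.0),
]

best = best_setting(SAMPLES)
print(f"lowest energy at core={best.core_mhz} MHz, mem={best.mem_mhz} MHz: "
      f"{best.energy_j:.0f} J")
```

In this made-up data the middle setting wins: the slowest clock draws the least power but runs long enough to cost more energy overall, which is why the best configuration has to be found per application rather than assumed.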