• This paper presents our work on designing a parallel platform for large-scale reservoir simulations. Its components, such as the grid and the linear solvers, and its data structures are described in detail, and can serve as a guide to parallel reservoir simulations and other parallel applications. The main objective of the platform is to support the implementation of various parallel reservoir simulators on distributed-memory parallel systems, where MPI (Message Passing Interface) is employed for communication among computation nodes. The platform provides structured grids because of their simplicity, and data is cell-centered. It has a distributed matrix and vector module and a map module. The matrix and vector module is the base of our parallel linear systems. The map module connects the grid and linear system modules by defining various mappings between them. Commonly used Krylov subspace linear solvers are implemented, including the restarted GMRES method and the BiCGSTAB method. The platform also has an interface to BoomerAMG, a parallel algebraic multigrid solver from HYPRE. Parallel general-purpose preconditioners and special preconditioners for reservoir simulations are also developed. Various data structures are designed, such as grid, cell, data, linear solver and preconditioner, and some key default parameters are presented in this paper. The numerical experiments show that our platform has excellent scalability and can simulate giant reservoir models with hundreds of millions of grid cells using thousands of CPU cores.
  • We consider the classical coded caching problem as defined by Maddah-Ali and Niesen, where a server with a library of $N$ files of equal size is connected to $K$ users via a shared error-free link. Each user is equipped with a cache with capacity of $M$ files. The goal is to design a static content placement and delivery scheme such that the average load over the shared link is minimized. We first present a class of centralized coded caching schemes consisting of a general content placement strategy specified by a file partition parameter, enabling efficient and flexible content placement, and a specific content delivery strategy, enabling load reduction by exploiting common requests of different users. For the proposed class of schemes, we consider two cases for the optimization of the file partition parameter, depending on whether a large subpacketization level is allowed or not. In the case of an unrestricted subpacketization level, we formulate the coded caching optimization in order to minimize the average load under an arbitrary file popularity. A direct formulation of the problem involves $N2^K$ variables. By imposing some additional conditions, the problem is reduced to a linear program with $N(K+1)$ variables under an arbitrary file popularity and with $K+1$ variables under the uniform file popularity. We can recover Yu {\em et al.}'s optimal scheme for the uniform file popularity as an optimal solution of our problem. When a low subpacketization level is desired, we introduce a subpacketization level constraint involving the $\ell_0$ norm for each file. Again, by imposing the same additional conditions, we can simplify the problem to a difference of two convex functions (DC) problem with $N(K+1)$ variables that can be efficiently solved.
  • Pinning control of a complex network aims at forcing the states of all nodes to track an external signal by controlling a small number of nodes in the network. In this paper, an algebraic graph-theoretic condition is proposed to optimize pinning control. When the individual node dynamics and the coupling strength of the network are given, the effectiveness of pinning control can be measured by the smallest eigenvalue of the grounded Laplacian matrix, which is obtained by deleting the rows and columns corresponding to the pinned nodes from the Laplacian matrix of the network. The larger this smallest eigenvalue, the more effective the pinning control. Spectral properties of the smallest eigenvalue are analyzed using network topology information, including the spectrum of the network Laplacian matrix, the minimal degree of the uncontrolled nodes, and the number of edges between the controlled node set and the uncontrolled node set. The obtained properties are shown to be effective for optimizing the pinning control strategy and are demonstrated by illustrative examples. Finally, for both scale-free and small-world networks, in order to maximize the corresponding smallest eigenvalue, it is better to pin the nodes with large degrees when the percentage of pinned nodes is relatively small, while it is better to pin nodes with small degrees when the percentage is relatively large. This surprising phenomenon can be explained by one of the theorems established.
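As a rough illustration of the criterion in the abstract above, the sketch below builds a grounded Laplacian by deleting the rows and columns of the pinned nodes and estimates its smallest eigenvalue. The function names are hypothetical, and the shifted power iteration is only a dependency-free stand-in chosen for this sketch; it is not taken from the paper and assumes a small, connected, undirected graph.

```python
def laplacian(n, edges):
    """Dense Laplacian of an undirected, unweighted graph on nodes 0..n-1."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0
    return L


def grounded_laplacian(n, edges, pinned):
    """Laplacian with the rows and columns of the pinned nodes deleted."""
    pinned = set(pinned)
    keep = [i for i in range(n) if i not in pinned]
    L = laplacian(n, edges)
    return [[L[i][j] for j in keep] for i in keep]


def smallest_eigenvalue(A, iters=2000):
    """Estimate the smallest eigenvalue of a symmetric matrix A.

    Power iteration is applied to shift*I - A, whose dominant eigenvector
    is the eigenvector of A with the smallest eigenvalue. The shift is a
    Gershgorin bound plus one, so shift*I - A is positive definite.
    """
    m = len(A)
    shift = max(sum(abs(x) for x in row) for row in A) + 1.0
    v = [1.0] * m  # all-ones start overlaps the nonnegative target vector
    for _ in range(iters):
        w = [shift * v[i] - sum(A[i][j] * v[j] for j in range(m))
             for i in range(m)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    Av = [sum(A[i][j] * v[j] for j in range(m)) for i in range(m)]
    return sum(v[i] * Av[i] for i in range(m))  # Rayleigh quotient, |v| = 1


# Star graph: node 0 is the hub, nodes 1..3 are leaves.
star = [(0, 1), (0, 2), (0, 3)]
pin_hub = smallest_eigenvalue(grounded_laplacian(4, star, [0]))
pin_leaf = smallest_eigenvalue(grounded_laplacian(4, star, [1]))
```

In this toy star graph with a single pinned node, pinning the hub gives the larger smallest eigenvalue, which matches the small-percentage regime described in the abstract; the example says nothing about the large-percentage regime.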
  • Mobile virtual reality (VR) delivery is gaining increasing attention from both industry and academia due to its ability to provide an immersive experience. However, mobile VR delivery requires an ultra-high transmission rate and is deemed a first killer application for 5G wireless networks. In this paper, in order to alleviate the traffic burden on wireless networks, we develop an implementation framework for mobile VR delivery that utilizes the caching and computing capabilities of the mobile VR device. We then jointly optimize the caching and computation offloading policy to minimize the required average transmission rate under latency and local average energy consumption constraints. In a symmetric scenario, we obtain the optimal joint policy and a closed-form expression for the minimum average transmission rate. Accordingly, we analyze the tradeoff among communication, computing and caching, and show analytically that communication overhead can be traded for the computing and caching capabilities of the mobile VR device, as well as the conditions under which this is possible. Finally, we discuss the optimization problem in a heterogeneous scenario and propose an efficient suboptimal algorithm with low computational complexity, which is shown to achieve good performance in the numerical results.
  • It is well known that the $n$-sphere $S^n$ is the universal double cover of the $n$-dimensional real projective space $\mathbb{R}P^n$, and hence any Finsler metric on $\mathbb{R}P^n$ induces a Finsler metric on $S^n$. In this paper, we prove that for every Finsler sphere $(S^n, F)$ with $n\geq3$ whose metric is induced by an irreversible Finsler $(\mathbb{R}P^n,F)$ with reversibility $\lambda$ and flag curvature $K$ satisfying $(\frac{\lambda}{\lambda+1})^2<K\leq 1$, there exist at least $n-1$ prime closed geodesics on $(S^n, F)$. Furthermore, if there exist finitely many distinct closed geodesics on $(S^n, F)$, then at least $2[\frac{n}{2}]-1$ of them are non-hyperbolic.
  • In traffic systems, cooperative driving has attracted researchers' attention. Many works attempt to understand the effects of cooperative driving behavior and/or time delays on traffic flow dynamics for specific traffic flow models. This paper carries out linear stability and weakly nonlinear analyses for a general car-following model that takes cooperation and time delays into account. We derive the linear stability condition and study how combinations of cooperation and time delays affect the stability of traffic flow. The Burgers equation and the Korteweg-de Vries (KdV) equation for the car-following model with cooperation and time delays are derived, and their solitary wave solutions and constraint conditions are obtained. Using both analytic and numerical methods, we investigate the cooperative optimal velocity (OV) model to assess how combinations of cooperation and time delays affect the evolution of traffic waves. The results indicate that the effects of delays and cooperation are model-dependent, and that cooperative behavior could inhibit the stabilization of traffic flow. Moreover, delays in sensing relative motion readily trigger traffic waves, whereas delays in sensing the host vehicle help relieve the instability to a certain extent.
  • The chiral magnetic effect (CME) is usually believed not to receive higher-order corrections, owing to the non-renormalization of the AVV triangle diagram in the framework of quantum field theory. However, the CME-relevant triangle, which is obtained by expanding the current-current correlation and requires zero momentum on the axial vertex, is not equivalent to the general AVV triangle in the zero-momentum limit, owing to the infrared problem on the axial vertex. Therefore, it is still important to check whether there exist perturbative higher-order corrections to the current-current correlation. In this paper, we explicitly calculate the two-loop corrections to the CME within the NJL model with a Chern-Simons term, which ensures a consistent $\mu_5$. The result shows that the two-loop corrections to the CME conductivity are zero, which confirms the non-renormalization of the CME conductivity.
  • In recent years, high-performance scientific computing on graphics processing units (GPUs) has gained widespread acceptance. These devices are designed to offer massively parallel threads for running general-purpose code. Much research has focused on the finite element method on GPUs. However, most of this work is specific to certain problems and applications. Some works propose finite element assembly methods that are general across a wide range of finite element models, but the development of finite element code depends on the hardware architecture, and using the libraries provided by hardware vendors is usually complicated and error-prone. In this paper, we present the architecture and implementation of finite element assembly for partial differential equations (PDEs) based on symbolic computation and a runtime compilation technique on the GPU. A user-friendly programming interface with symbolic computation is provided. At the same time, high computational efficiency is achieved by using the runtime compilation technique. As far as we know, this is the first work to use this technique to accelerate finite element assembly for solving PDEs. Experiments show that a speedup of one to two orders of magnitude is achieved for the problems studied in the paper.
  • Starting from well-known absolute instruments for perfect imaging, we introduce a type of rotationally symmetric compact closed manifold, namely the geodesic lens. We demonstrate that light rays confined to geodesic lenses follow closed trajectories. For optical waves, numerical methods show that the spectrum of a geodesic lens is (at least approximately) degenerate and equidistant. Based on this property, we show a periodic evolution of optical waves and quantum waves on geodesic lenses. Moreover, we fabricate two geodesic lenses at the sub-micrometer scale, on which curved light rays are observed with high precision. Our results may offer a new platform to investigate light propagation on curved surfaces.
  • In this paper, we experimentally demonstrate reversible wavefront shaping by mimicking a gravitational field. A gradient-index micro-structured optical waveguide with a special refractive index profile was constructed, whose effective index follows a gravitational field profile. Inside the waveguide, an incident broad Gaussian beam is first transformed into an accelerating beam, and the generated accelerating beam is then gradually changed back into a Gaussian beam. To validate our experiment, we performed full-wave continuum simulations, which agree with the experimental results. Furthermore, a theoretical model based on Landau's method was established to describe the evolution of the laser beam, showing that the accelerating beam behaves like an Airy beam in the small range in which the linear potential approaches zero. To our knowledge, such a reversible wavefront shaping technique has not been reported before.
  • Weyl fermions have not been found in nature as elementary particles, but they emerge as nodal points in the band structure of electronic and classical wave crystals. Novel phenomena such as Fermi arcs and the chiral anomaly have fueled interest in these topological points, which are frequently perceived as monopoles in momentum space. Here we report the experimental observation of generalized optical Weyl points inside the parameter space of a photonic crystal with a specially designed four-layer unit cell. The reflection at the surface of a truncated photonic crystal exhibits phase vortices due to the synthetic Weyl points, which in turn guarantees the existence of interface states between photonic crystals and any reflecting substrate. The reflection phase vortices have been confirmed for the first time in our experiments, which serve as an experimental signature of the generalized Weyl points. The existence of these interface states is protected by the topological properties of the Weyl points, and the trajectories of these states in the parameter space resemble those of the Weyl semimetal "Fermi arc surface states" in momentum space. Tracing the origin of the interface states to the topological character of the parameter space paves the way for a rational design of strongly localized states with enhanced local field.
  • Due to the explosive growth in multimedia traffic, the scalability of video-on-demand (VoD) services becomes increasingly important. By exploiting the potential cache ability at the client side, the performance of VoD multicast delivery can be improved through video segment pre-caching. In this paper, we address the performance limits of client caching enabled VoD schemes in wireless multicast networks with asynchronous requests. Both reactive and proactive systems are investigated. Specifically, for the reactive system where videos are transmitted on demand, we propose a joint cache allocation and multicast delivery scheme to minimize the average bandwidth consumption under the zero-delay constraint. For the proactive system where videos are periodically broadcasted, a joint design of the cache-bandwidth allocation algorithm and the delivery mechanism is developed to minimize the average waiting time under the total bandwidth constraint. In addition to the full access pattern where clients view videos in their entirety, we further consider the access patterns with random endpoints, fixed-size intervals and downloading demand, respectively. The impacts of different access patterns on the resource-allocation algorithm and the delivery mechanism are elaborated. Simulation results validate the accuracy of the analytical results and also provide useful insights in designing VoD networks with client caching.
  • The goal of load balancing (grid partitioning) is to minimize overall computation and communication, and to make sure that all processors have a similar workload. Geometric methods divide a grid using the locations of its cells, while topological methods work with the connectivity of cells, which is generally described as a graph. This paper introduces a Hilbert space-filling curve method. A space-filling curve is a continuous curve that defines a map between a one-dimensional space and a multi-dimensional space. The Hilbert space-filling curve is a special space-filling curve discovered by Hilbert and has many useful characteristics, such as good locality, which means that two objects that are close to each other in the multi-dimensional space are also close to each other in the one-dimensional space. This property can model communication in grid-based parallel applications. The idea of the Hilbert space-filling curve method is to map a computational domain into a one-dimensional space, partition the one-dimensional space into a number of intervals, and assign all cells in the same interval to the same MPI process. Implementing the load balancing method requires a mapping kernel that converts high-dimensional coordinates to a scalar value, and an efficient one-dimensional partitioning module that divides the one-dimensional space so that all intervals have a similar workload. The Hilbert space-filling curve method is compared with ParMETIS, a widely used graph partitioning package. The results show that our Hilbert space-filling curve method has good partition quality. It has been applied to grids with billions of cells, and linear scalability has been obtained on IBM Blue Gene/Q.
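The two ingredients named in the abstract above, a mapping kernel and a one-dimensional partitioner, can be sketched as follows. This is a minimal two-dimensional illustration under stated assumptions: the grid is an n-by-n square with n a power of two, the function names are hypothetical, the mapping is the standard iterative xy-to-index Hilbert conversion, and the partitioner is a simple prefix-sum split. It is not the paper's implementation, which targets large grids on distributed-memory systems.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its
    position along a 2D Hilbert curve (standard iterative conversion)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def partition_by_curve(n, weights, num_parts):
    """Assign each cell to one of num_parts contiguous curve intervals.

    weights maps (x, y) -> positive workload. Cells are sorted along the
    Hilbert curve, and each cell goes to the part that contains the
    midpoint of its span in the cumulative workload.
    """
    cells = sorted(weights, key=lambda c: hilbert_index(n, c[0], c[1]))
    total = sum(weights.values())
    owner = {}
    running = 0.0
    for cell in cells:
        mid = running + 0.5 * weights[cell]
        owner[cell] = min(num_parts - 1, int(mid * num_parts / total))
        running += weights[cell]
    return owner


# Usage: a 4x4 grid with unit workload per cell, split among 4 processes.
n = 4
cells = [(x, y) for x in range(n) for y in range(n)]
owner = partition_by_curve(n, {c: 1.0 for c in cells}, 4)
```

Because each part is a contiguous interval of the curve and the curve has good locality, cells owned by one process tend to be spatial neighbours, which is the property the abstract relies on to limit communication.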
  • Let $M=S^n/ \Gamma$ and $h$ be a nontrivial element of finite order $p$ in $\pi_1(M)$, where the integer $n\geq2$, $\Gamma$ is a finite group which acts freely and isometrically on the $n$-sphere and therefore $M$ is diffeomorphic to a compact space form. In this paper, we establish first the resonance identity for non-contractible homologically visible minimal closed geodesics of the class $[h]$ on every Finsler compact space form $(M, F)$ when there exist only finitely many distinct non-contractible closed geodesics of the class $[h]$ on $(M, F)$. Then as an application of this resonance identity, we prove the existence of at least two distinct non-contractible closed geodesics of the class $[h]$ on $(M, F)$ with a bumpy Finsler metric, which improves a result of Taimanov in [Taimanov 2016] by removing some additional conditions. Also our results extend the resonance identity and multiplicity results on $\mathcal{R}P^n$ in [arXiv:1607.02746] to general compact space forms.
  • In this paper, we establish first the resonance identity for non-contractible homologically visible prime closed geodesics on Finsler $n$-dimensional real projective space $(\mathbb{R}P^n,F)$ when there exist only finitely many distinct non-contractible closed geodesics on $(\mathbb{R}P^n,F)$, where the integer $n\geq2$. Then as an application of this resonance identity, we prove the existence of at least two distinct non-contractible closed geodesics on $\mathbb{R}P^{n}$ with a bumpy and irreversible Finsler metric. Together with two previous results on bumpy and reversible Finsler metrics in \cite{DLX2015} and \cite{Tai2016}, it yields that every $\mathbb{R}P^{n}$ with a bumpy Finsler metric possesses at least two distinct non-contractible closed geodesics.
  • A centralized coded caching scheme has been proposed by Maddah-Ali and Niesen to reduce the worst-case load of a network consisting of a server with access to N files and connected through a shared link to K users, each equipped with a cache of size M. However, this centralized coded caching scheme is not able to take advantage of a non-uniform, possibly very skewed, file popularity distribution. In this work, we consider the same network setting but aim to reduce the average load under an arbitrary (known) file popularity distribution. First, we consider a class of centralized coded caching schemes utilizing general uncoded placement and a specific coded delivery strategy, which are specified by a general file partition parameter. Then, we formulate the coded caching design optimization problem over the considered class of schemes with 2^K2^N variables to minimize the average load by optimizing the file partition parameter under an arbitrary file popularity. Furthermore, we show that the optimization problem is convex, and the resulting optimal solution generally improves upon known schemes. Next, we analyze structural properties of the optimization problem to obtain design insights and reduce the complexity. Specifically, we obtain an equivalent linear optimization problem with (K+1)N variables under an arbitrary file popularity and an equivalent linear optimization problem with K+1 variables under the uniform file popularity. Under the uniform file popularity, we also obtain the closed form optimal solution, which corresponds to Maddah-Ali-Niesen's centralized coded caching scheme. Finally, we present an information-theoretic converse bound on the average load under an arbitrary file popularity.
  • Transformation optics (TO) has been used to propose various novel optical devices. With the help of metamaterials, several intriguing designs, such as invisibility cloaks, have been implemented. However, because the basic units should be much smaller than the working wavelengths to achieve the effective material parameters, while the devices should be much larger than the wavelengths of illumination to work within the light-ray approximation, it is a big challenge to implement an experimental system that works simultaneously for both geometric optics and wave optics. In this letter, by using a gradient-index micro-structured optical waveguide, we realize a conformal transformation optics (CTO) device and demonstrate its self-focusing property for geometric optics and the Talbot effect for wave optics. In addition, the Talbot effect in such a system has a potential application in transferring digital information without diffraction. Our findings demonstrate the photon-controlling ability of CTO in a feasible experimental system.
  • Joint pushing and caching is recognized as an efficient remedy to the problem of spectrum scarcity caused by tremendous mobile data traffic. In this paper, by exploiting storage resources at end users and the predictability of user demand processes, we design the optimal joint pushing and caching policy to maximize bandwidth utilization, which is of fundamental importance to mobile telecom carriers. In particular, we formulate the stochastic optimization problem as an infinite-horizon average-cost Markov Decision Process (MDP), for which there generally exist only numerical solutions that offer few insights. By structural analysis, we show how the optimal policy achieves a balance between the current transmission cost and the future average transmission cost. In addition, we show that the optimal average transmission cost decreases with the cache size, revealing a tradeoff between the cache size and the bandwidth utilization. Then, because obtaining a numerical optimal solution suffers from the curse of dimensionality and implementing it requires a centralized controller and global system information, we develop a decentralized policy with polynomial complexity w.r.t. the numbers of users and files as well as the cache size, using a linear approximation of the value function and optimization relaxation techniques. Next, we propose an online decentralized algorithm that implements the proposed low-complexity decentralized policy using Q-learning when a priori knowledge of user demand processes is not available. Finally, using numerical results, we demonstrate the advantage of the proposed solutions over some existing designs. The results in this paper offer useful guidelines for designing practical cache-enabled content-centric wireless networks.
  • Stimulated by the exciting progress in the observation of new bottomonium states, we study the bottomonium spectrum. To calculate the mass spectrum, we adopt a nonrelativistic screened potential model. The radial Schr\"{o}dinger equation is solved with the three-point central difference method, where the spin-dependent potentials are treated non-perturbatively. With this treatment, the corrections of the spin-dependent potentials to the wave functions are included. Furthermore, we calculate the electromagnetic transitions of the $nS$ ($n\leq 4$), $nP$ ($n\leq 3$), and $nD$ ($n\leq 2$) bottomonium states with a nonrelativistic electromagnetic transition operator widely applied to meson photoproduction reactions. Our predicted masses, hyperfine and fine splittings, electromagnetic transition widths and branching ratios of the bottomonium states are in good agreement with the available experimental data. In particular, the EM transitions $\Upsilon(3S)\to \chi_{b1,2}(1P)\gamma$, which were not well understood in previous studies, can be reasonably explained by considering the corrections of the spin-dependent interactions to the wave functions. We also discuss how the missing bottomonium states could be observed through radiative transitions, and suggest some important radiative decay chains involving these states. We hope our study can provide useful references for observing and measuring the properties of bottomonium mesons in forthcoming experiments.
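To make the numerical step in the abstract above concrete, the sketch below applies a three-point central difference to the radial equation $-\frac{1}{2}u'' + \left[\frac{\ell(\ell+1)}{2r^2} + V(r)\right]u = E\,u$ with $u(0)=u(r_{\max})=0$. Everything specific here is an assumption made for illustration: units with $\hbar$ and the reduced mass set to one, a harmonic test potential in place of the paper's screened potential, no spin-dependent terms, hypothetical function names, and a Sturm-sequence bisection as the eigenvalue solver for the resulting tridiagonal matrix.

```python
def count_below(diag, off, x):
    """Number of eigenvalues below x for a symmetric tridiagonal matrix
    with diagonal `diag` and constant off-diagonal `off` (Sturm count)."""
    count = 0
    q = 1.0
    for i, d in enumerate(diag):
        q = d - x - (off * off / q if i > 0 else 0.0)
        if q == 0.0:
            q = 1e-30  # avoid division by zero in the next step
        if q < 0.0:
            count += 1
    return count


def kth_eigenvalue(diag, off, k, iters=100):
    """Bisection for the k-th smallest eigenvalue (k = 0 is the lowest)."""
    lo = min(diag) - 2.0 * abs(off)  # Gershgorin bounds
    hi = max(diag) + 2.0 * abs(off)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if count_below(diag, off, mid) > k:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def radial_levels(potential, ell, r_max, n_points, n_levels):
    """Lowest energy levels from the three-point central difference
    u'' ~ (u[i-1] - 2*u[i] + u[i+1]) / h**2 on r_i = i*h, i = 1..n_points."""
    h = r_max / (n_points + 1)
    diag = []
    for i in range(1, n_points + 1):
        r = i * h
        diag.append(1.0 / h ** 2 + ell * (ell + 1) / (2.0 * r * r) + potential(r))
    off = -0.5 / h ** 2
    return [kth_eigenvalue(diag, off, k) for k in range(n_levels)]


def harmonic(r):
    """Test potential with known levels E = 2*n_r + ell + 3/2."""
    return 0.5 * r * r


s_levels = radial_levels(harmonic, 0, 10.0, 999, 2)
p_levels = radial_levels(harmonic, 1, 10.0, 999, 1)
```

The harmonic potential is used only because its exact levels are known, which gives a check on the discretization; replacing `harmonic` with another central potential reuses the same machinery.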
  • We propose a framework employing stochastic differential equations to facilitate the long-term stability analysis of power grids with intermittent wind power generation. This framework takes into account the discrete dynamics, which play a critical role in long-term stability analysis, incorporates a wind speed model with different probability distributions, and develops an approximation methodology (using a deterministic hybrid model) for the stochastic hybrid model to reduce the computational burden brought about by the uncertainty of wind power. The theoretical and numerical studies show that a deterministic hybrid model can provide an accurate trajectory approximation and stability assessment for the stochastic hybrid model under mild conditions. In addition, we discuss the critical cases in which the deterministic hybrid model fails and find that these cases are caused by a violation of the proposed sufficient conditions. This discussion complements the proposed framework and methodology and also reaffirms the importance of the stochastic hybrid model when the system operates close to its stability limit.
  • Quantum digital signatures (QDS) provide a means for signing electronic communications with information-theoretic security. However, all previous demonstrations of quantum digital signatures assume trusted measurement devices. This renders them vulnerable to detector side-channel attacks, just like quantum key distribution. Here, we exploit a measurement-device-independent (MDI) quantum network, spanning a 200-square-kilometer metropolitan area, to perform a field test of a three-party measurement-device-independent quantum digital signature (MDI-QDS) scheme that is secure against any detector side-channel attack. In so doing, we are able to successfully sign a binary message with a security level of about $10^{-7}$. Our work demonstrates the feasibility of MDI-QDS for practical applications.
  • This research investigates the implementation of a block-wise ILU(k) preconditioner on GPUs. The block-wise ILU(k) algorithm requires both the level k and the block size to be variables. A decoupled ILU(k) algorithm consists of a symbolic phase and a factorization phase. In the symbolic phase, an ILU(k) nonzero pattern is established from the point-wise structure extracted from a block-wise matrix. In the factorization phase, the block-wise matrix with a variable block size is factorized into a block lower triangular matrix and a block upper triangular matrix. A further diagonal factorization must be performed on the block upper triangular matrix to adapt it to a parallel triangular solver on the GPU. We also present numerical experiments that study the behavior of the preconditioner for different levels k and block sizes.
  • We study analytically the one-loop contribution to the chiral magnetic effect (CME) using lattice regularization with a Wilson fermion field. In the continuum limit, we find that the chiral magnetic current vanishes at nonzero temperature but emerges at zero temperature, consistent with the result found with Pauli-Villars regularization. For finite lattice size, however, the chiral magnetic current is nonvanishing at nonzero temperature, but the numerical value of the coefficient of the CME current is very small compared with that extracted from the full QCD simulation for the same lattice parameters. The possibility of higher-order corrections from QCD dynamics is also assessed.
  • In this paper, we investigate signal shaping in a two-user discrete-time memoryless Gaussian multiple-access channel (MAC) with computation. It is shown that, by optimizing the input probability distribution, the transmission rate per transmitter goes beyond the cut-set bound. In contrast with the single-user discrete memoryless channel, the Maxwell-Boltzmann distribution is no longer a good approximation to the optimal input probability distribution for this discrete-time Gaussian MAC with computation. Specifically, we derive and analyze the mutual information for this channel. Because of the computation at the destination, the mutual information is in general not concave in the input probability distribution, so a primal-dual interior-point method is used to solve this non-convex problem. Finally, some good input probability distributions for the 16-ary pulse amplitude modulation (PAM) constellation are obtained, achieving a $4.0349$ dB gain over the cut-set bound at the target transmission rate of $3.0067$ bits/(channel use).
  • For the parallel solution of partial differential equations, grid partitioning is a key component. It requires that each process own the same amount of computation, and the partitioning quality should be good enough to reduce communication among processes. When partial differential equations are solved using adaptive finite element methods, the grid and the basis functions are adjusted in each iteration, which introduces load balancing issues, so the grid should be redistributed dynamically. This paper studies dynamic load balancing algorithms and their implementation on the adaptive finite element platform PHG. The numerical experiments show that the algorithms studied in this paper have good partitioning quality and are efficient.