
This paper presents our work on designing a parallel platform for large-scale
reservoir simulations. Detailed components, such as grid and linear solver, and
data structures are introduced, which can serve as a guide to parallel
reservoir simulations and other parallel applications. The main objective of
the platform is to support the implementation of various parallel reservoir
simulators on distributed-memory parallel systems, where MPI (Message Passing
Interface) is employed for communication among computation nodes. The platform
provides a structured grid because of its simplicity, and cell-centered data is
attached to each cell. The
platform has a distributed matrix and vector module and a map module. The
matrix and vector module is the base of our parallel linear systems. The map
module connects the grid and linear system modules and defines various mappings
between them. Commonly used Krylov subspace linear solvers are
implemented, including the restarted GMRES method and the BiCGSTAB method. The
platform also has an interface to a parallel algebraic multigrid solver,
BoomerAMG from HYPRE. Parallel general-purpose preconditioners and special
preconditioners for
reservoir simulations are also developed. Various data structures are designed,
such as grid, cell, data, linear solver and preconditioner, and some key
default parameters are presented in this paper. The numerical experiments show
that our platform has excellent scalability and it can simulate giant reservoir
models with hundreds of millions of grid cells using thousands of CPU cores.

We consider the classical coded caching problem as defined by Maddah-Ali and
Niesen, where a server with a library of $N$ files of equal size is connected
to $K$ users via a shared error-free link. Each user is equipped with a cache
with a capacity of $M$ files. The goal is to design a static content placement
and delivery scheme such that the average load over the shared link is
minimized. We first present a class of centralized coded caching schemes
consisting of a general content placement strategy specified by a file
partition parameter, enabling efficient and flexible content placement, and a
specific content delivery strategy, enabling load reduction by exploiting
common requests of different users. For the proposed class of schemes, we
consider two cases for the optimization of the file partition parameter,
depending on whether a large subpacketization level is allowed or not. In the
case of an unrestricted subpacketization level, we formulate the coded caching
optimization in order to minimize the average load under an arbitrary file
popularity. A direct formulation of the problem involves $N2^K$ variables. By
imposing some additional conditions, the problem is reduced to a linear program
with $N(K+1)$ variables under an arbitrary file popularity and with $K+1$
variables under the uniform file popularity. We can recover Yu {\em et al.}'s
optimal scheme for the uniform file popularity as an optimal solution of our
problem. When a low subpacketization level is desired, we introduce a
subpacketization level constraint involving the $\ell_0$ norm for each file.
Again, by imposing the same additional conditions, we can simplify the problem
to a difference of two convex functions (DC) problem with $N(K+1)$ variables
that can be efficiently solved.

Pinning control of a complex network aims at forcing the states of all nodes
to track an external signal by controlling a small number of nodes in the
network. In this paper, an algebraic graph-theoretic condition is proposed to
optimize pinning control. When individual node dynamics and coupling strength
of the network are given, the effectiveness of pinning control can be measured
by the smallest eigenvalue of the grounded Laplacian matrix obtained by
deleting the rows and columns corresponding to the pinned nodes from the
Laplacian matrix of the network. The larger this smallest eigenvalue, the more
effective the pinning control. Spectral properties of the smallest eigenvalue
are analyzed using the network topology information, including the spectrum of
the network Laplacian matrix, the minimal degree of uncontrolled nodes, the
number of edges between the controlled node set and the uncontrolled node set,
etc. The obtained properties are shown to be effective for optimizing the pinning
control strategy and are demonstrated by illustrative examples. Finally, for both
scale-free and small-world networks, in order to maximize their corresponding
smallest eigenvalues, it is better to pin the nodes with large degrees when the
percentage of pinned nodes is relatively small, while it is better to pin nodes
with small degrees when the percentage is relatively large. This surprising
phenomenon can be explained by one of the theorems established.
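
The effectiveness measure described above can be illustrated with a minimal sketch. It builds the grounded Laplacian by deleting the rows and columns of the pinned nodes and estimates its smallest eigenvalue by inverse power iteration. The function names, the choice of inverse iteration, and the three-node path graph are illustrative assumptions, not the method of the paper.

```python
def grounded_laplacian(adj, pinned):
    """Graph Laplacian with the rows and columns of the pinned nodes deleted."""
    n = len(adj)
    keep = [i for i in range(n) if i not in pinned]
    lap = [[sum(adj[i]) if i == j else -adj[i][j] for j in range(n)]
           for i in range(n)]
    return [[lap[i][j] for j in keep] for i in keep]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def smallest_eigenvalue(a, iters=200):
    """Inverse power iteration; assumes a is symmetric positive definite."""
    n = len(a)
    x = [1.0] * n
    for _ in range(iters):
        y = solve(a, x)
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    ax = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(x[i] * ax[i] for i in range(n))  # Rayleigh quotient


# Hypothetical example: path graph 0 - 1 - 2 with unit weights.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
lam_end = smallest_eigenvalue(grounded_laplacian(path, {0}))  # pin an end node
lam_mid = smallest_eigenvalue(grounded_laplacian(path, {1}))  # pin the middle node
```

On this toy graph, pinning the middle node (degree 2) gives a smallest eigenvalue of 1, while pinning an end node (degree 1) gives $(3-\sqrt{5})/2 \approx 0.382$, in line with the observation that pinning high-degree nodes is preferable when few nodes are pinned.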

Mobile virtual reality (VR) delivery is gaining increasing attention from
both industry and academia due to its ability to provide an immersive
experience. However, mobile VR delivery requires an ultra-high
transmission rate and is deemed a first killer application for 5G wireless
networks. In this paper, in order to alleviate the traffic burden over wireless
networks, we develop an implementation framework for mobile VR delivery by
utilizing the caching and computing capabilities of the mobile VR device. We then
jointly optimize the caching and computation offloading policy for minimizing
the required average transmission rate under the latency and local average
energy consumption constraints. In a symmetric scenario, we obtain the optimal
joint policy and the closed-form expression of the minimum average transmission
rate. Accordingly, we analyze the tradeoff among communication, computing and
caching, and then show analytically that the communication overhead
can be traded for the computing and caching capabilities of the mobile VR device,
and under what conditions this is possible. Finally, we discuss the
optimization problem in a heterogeneous scenario, and propose an efficient
suboptimal algorithm with low computational complexity, which is shown to achieve
good performance in the numerical results.

It is well known that the $n$-sphere $S^n$ is the universal double covering of
the $n$-dimensional real projective space $\mathbb{R}P^n$, and hence any Finsler
metric on $\mathbb{R}P^n$ induces a Finsler metric on $S^n$. In this paper, we
prove that for every Finsler sphere $(S^n, F)$ with $n\geq 3$ whose metric is
induced by an irreversible Finsler space $(\mathbb{R}P^n,F)$ with reversibility
$\lambda$ and flag curvature $K$ satisfying
$(\frac{\lambda}{\lambda+1})^2<K\leq 1$, there exist
at least $n-1$ prime closed geodesics on $(S^n, F)$. Furthermore, if there
exist finitely many distinct closed geodesics on $(S^n, F)$, then
at least $2[\frac{n}{2}]-1$ of them are non-hyperbolic.

In traffic systems, cooperative driving has attracted researchers'
attention. Many works attempt to understand the effects of cooperative
driving behavior and/or time delays on traffic flow dynamics for specific
traffic flow models. This paper is a new attempt to carry out linear stability
and weakly nonlinear analyses for the general car-following model with
consideration of cooperation and time delays. We derive the linear stability
condition and study how the combinations of cooperation and time delays
affect the stability of traffic flow. The Burgers equation and the Korteweg-de
Vries (KdV) equation for the car-following model considering cooperation and
time delays are derived. Their solitary wave solutions and constraint
conditions are obtained. We investigate the properties of the cooperative
optimal velocity (OV) model, which estimates the effects of combinations of
cooperation and time delays on the evolution of traffic waves, using both
analytic and numerical methods. The
results indicate that the effects of delays and cooperation are model-dependent,
and cooperative behavior could inhibit the stabilization of traffic flow.
Moreover, delays in sensing relative motion easily trigger traffic waves,
whereas delays in sensing the host vehicle help relieve the instability
effect to a certain extent.

The Chiral Magnetic Effect (CME) is usually believed not to receive higher-order
corrections, owing to the non-renormalization of the AVV triangle diagram in the
framework of quantum field theory. However, the CME-relevant triangle, which is
obtained by expanding the current-current correlation and requires zero momentum
on the axial vertex, is not equivalent to the general AVV triangle when taking
the zero-momentum limit, owing to the infrared problem on the axial vertex.
Therefore, it is still significant to check whether there exist perturbative
higher-order corrections to the current-current correlation. In this paper, we
explicitly calculate the two-loop corrections to the CME within the NJL model
with a Chern-Simons term, which ensures a consistent $\mu_5$. The result shows
that the two-loop corrections to the CME conductivity are zero, which confirms
the non-renormalization of the CME conductivity.

In recent years, high-performance scientific computing on graphics processing
units (GPUs) has gained widespread acceptance. These devices are designed to
offer massively parallel threads for running general-purpose code. Many
studies focus on the finite element method on GPUs. However, most of
these works are specific to certain problems and applications. Some works propose
methods for finite element assembly that are general for a wide range of finite
element models. But the development of finite element code is dependent on the
hardware architecture, and using the libraries provided by the hardware vendors
is usually complicated and error-prone. In this paper, we present
the architecture and implementation of finite element assembly for partial
differential equations (PDEs) based on symbolic computation and a runtime
compilation technique on GPUs. A user-friendly programming interface with
symbolic computation is provided. At the same time, high computational
efficiency is achieved by using the runtime compilation technique. As far as we
know, this is the first work using this technique to accelerate finite element
assembly for solving PDEs. Experiments show that a speedup of one to two orders
of magnitude is achieved for the problems studied in the paper.

Starting from well-known absolute instruments for perfect imaging, we
introduce a type of rotationally symmetric compact closed manifolds, namely
geodesic lenses. We demonstrate that light rays confined on geodesic lenses
follow closed trajectories. For optical waves, numerical methods show that the
spectrum of a geodesic lens is (at least approximately) degenerate and
equidistant. Based on this property, we show a periodic evolution of optical
waves and quantum waves on geodesic lenses. Moreover, we fabricate two geodesic
lenses at the sub-micrometer scale, where curved light rays are observed with
high precision. Our results may offer a new platform to investigate light
propagation on curved surfaces.

In this paper, we experimentally demonstrate reversible wavefront shaping
through mimicking a gravitational field. A gradient-index microstructured
optical waveguide with a special refractive index profile was constructed, whose
effective index satisfies a gravitational field profile. Inside the waveguide,
an incident broad Gaussian beam is first transformed into an accelerating
beam, and the generated accelerating beam is gradually changed back to a
Gaussian beam afterwards. To validate our experiment, we performed full-wave
continuum simulations that agree with the experimental results. Furthermore, a
theoretical model was established to describe the evolution of the laser beam
based on Landau's method, showing that the accelerating beam behaves like the
Airy beam in the small range in which the linear potential approaches zero. To
our knowledge, such a reversible wavefront shaping technique has not been
reported before.

Weyl fermions have not been found in nature as elementary particles, but they
emerge as nodal points in the band structure of electronic and classical wave
crystals. Novel phenomena such as Fermi arcs and chiral anomaly have fueled the
interest in these topological points which are frequently perceived as
monopoles in momentum space. Here we report the experimental observation of
generalized optical Weyl points inside the parameter space of a photonic
crystal with a specially designed four-layer unit cell. The reflection at the
surface of a truncated photonic crystal exhibits phase vortexes due to the
synthetic Weyl points, which in turn guarantees the existence of interface
states between photonic crystals and any reflecting substrates. The reflection
phase vortexes have been confirmed for the first time in our experiments and
serve as an experimental signature of the generalized Weyl points. The
existence of these interface states is protected by the topological properties
of the Weyl points, and the trajectories of these states in the parameter space
resemble those of Weyl semimetal "Fermi arc surface states" in momentum
space. Tracing the origin of interface states to the topological character of
the parameter space paves the way for a rational design of strongly localized
states with enhanced local field.

Due to the explosive growth in multimedia traffic, the scalability of
video-on-demand (VoD) services becomes increasingly important. By exploiting
the potential caching capability at the client side, the performance of VoD
multicast delivery can be improved through video segment pre-caching. In this
paper, we address the performance limits of client-caching-enabled VoD schemes
in wireless multicast networks with asynchronous requests. Both reactive and
proactive systems are investigated. Specifically, for the reactive system where
videos are transmitted on demand, we propose a joint cache allocation and
multicast delivery scheme to minimize the average bandwidth consumption under
the zero-delay constraint. For the proactive system where videos are
periodically broadcast, a joint design of the cache-bandwidth allocation
algorithm and the delivery mechanism is developed to minimize the average
waiting time under the total bandwidth constraint. In addition to the full
access pattern where clients view videos in their entirety, we further consider
the access patterns with random endpoints, fixed-size intervals and downloading
demand, respectively. The impacts of different access patterns on the
resource-allocation algorithm and the delivery mechanism are elaborated.
Simulation results validate the accuracy of the analytical results and also
provide useful insights in designing VoD networks with client caching.

The goal of load balancing (grid partitioning) is to minimize overall
computations and communications, and to make sure that all processors have a
similar workload. Geometric methods divide a grid by using the locations of
cells, while topological methods work with the connectivity of cells, which is
generally described as a graph. This paper introduces a Hilbert space-filling
curve method. A space-filling curve is a continuous curve that defines a map
between a one-dimensional space and a multi-dimensional space. A Hilbert
space-filling curve is one special space-filling curve discovered by Hilbert and
has many useful characteristics, such as good locality, which means that two
objects that are close to each other in a multi-dimensional space are also close
to each other in a one-dimensional space. This property can model communications
in grid-based parallel applications. The idea of the Hilbert space-filling
curve method is to map a computational domain into a one-dimensional space,
partition the one-dimensional space into intervals, and assign all cells
in the same interval to one MPI process. To implement a load balancing method, a
mapping kernel is required to convert high-dimensional coordinates to a scalar
value, together with an efficient one-dimensional partitioning module that
divides a one-dimensional space and makes sure that all intervals have a similar
workload.
The Hilbert space-filling curve method is compared with ParMETIS, a widely used
graph partitioning package. The results show that our Hilbert space-filling
curve method has good partition quality. It has been applied to grids with
billions of cells, and linear scalability has been obtained on IBM Blue Gene/Q.
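
The two ingredients named above, a mapping kernel and a one-dimensional partitioner, can be sketched for a small two-dimensional grid. This is a minimal illustration under stated assumptions: the grid is square with a power-of-two side, the mapping is the standard bitwise Hilbert index, and the partitioner is a simple prefix-sum cut. The names `hilbert_index` and `partition_cells` are hypothetical and are not taken from the paper.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve of an n-by-n grid.

    n must be a power of two; the result lies in 0 .. n*n - 1.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate or flip the quadrant so the sub-curve lines up
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def partition_cells(n, nparts, weight=lambda cell: 1.0):
    """Assign each cell to a part so that parts have similar total weight.

    Cells are ordered along the curve, then the one-dimensional sequence is
    cut into nparts intervals using the running weight sum.
    """
    cells = sorted(((x, y) for x in range(n) for y in range(n)),
                   key=lambda c: hilbert_index(n, c[0], c[1]))
    total = sum(weight(c) for c in cells)
    owner, running = {}, 0.0
    for c in cells:
        owner[c] = min(nparts - 1, int(running * nparts / total))
        running += weight(c)
    return owner


# Hypothetical usage: a 4-by-4 grid with unit weights split across 4 processes.
owner = partition_cells(4, 4)
```

With unit weights, each of the four parts receives one 2-by-2 quadrant of the grid, which shows how locality along the curve keeps each part compact.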

Let $M=S^n/ \Gamma$ and $h$ be a nontrivial element of finite order $p$ in
$\pi_1(M)$, where the integer $n\geq2$, $\Gamma$ is a finite group which acts
freely and isometrically on the $n$-sphere and therefore $M$ is diffeomorphic
to a compact space form. In this paper, we first establish the resonance
identity for non-contractible homologically visible minimal closed geodesics of
the class $[h]$ on every Finsler compact space form $(M, F)$ when there exist
only finitely many distinct non-contractible closed geodesics of the class
$[h]$ on $(M, F)$. Then as an application of this resonance identity, we prove
the existence of at least two distinct non-contractible closed geodesics of the
class $[h]$ on $(M, F)$ with a bumpy Finsler metric, which improves a result of
Taimanov in [Taimanov 2016] by removing some additional conditions. Also our
results extend the resonance identity and multiplicity results on
$\mathbb{R}P^n$ in [arXiv:1607.02746] to general compact space forms.

In this paper, we first establish the resonance identity for non-contractible
homologically visible prime closed geodesics on Finsler $n$-dimensional real
projective space $(\mathbb{R}P^n,F)$ when there exist only finitely many
distinct non-contractible closed geodesics on $(\mathbb{R}P^n,F)$, where the
integer $n\geq2$. Then as an application of this resonance identity, we prove
the existence of at least two distinct non-contractible closed geodesics on
$\mathbb{R}P^{n}$ with a bumpy and irreversible Finsler metric. Together with
two previous results on bumpy and reversible Finsler metrics in \cite{DLX2015}
and \cite{Tai2016}, it yields that every $\mathbb{R}P^{n}$ with a bumpy Finsler
metric possesses at least two distinct non-contractible closed geodesics.

A centralized coded caching scheme has been proposed by Maddah-Ali and Niesen
to reduce the worst-case load of a network consisting of a server with access
to N files and connected through a shared link to K users, each equipped with a
cache of size M. However, this centralized coded caching scheme is not able to
take advantage of a non-uniform, possibly very skewed, file popularity
distribution. In this work, we consider the same network setting but aim to
reduce the average load under an arbitrary (known) file popularity
distribution. First, we consider a class of centralized coded caching schemes
utilizing general uncoded placement and a specific coded delivery strategy,
which are specified by a general file partition parameter. Then, we formulate
the coded caching design optimization problem over the considered class of
schemes with 2^K2^N variables to minimize the average load by optimizing the
file partition parameter under an arbitrary file popularity. Furthermore, we
show that the optimization problem is convex, and the resulting optimal
solution generally improves upon known schemes. Next, we analyze structural
properties of the optimization problem to obtain design insights and reduce the
complexity. Specifically, we obtain an equivalent linear optimization problem
with (K+1)N variables under an arbitrary file popularity and an equivalent
linear optimization problem with K+1 variables under the uniform file
popularity. Under the uniform file popularity, we also obtain the closed-form
optimal solution, which corresponds to Maddah-Ali-Niesen's centralized coded
caching scheme. Finally, we present an information-theoretic converse bound on
the average load under an arbitrary file popularity.
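
For orientation, the worst-case load of the Maddah-Ali-Niesen centralized scheme referred to above has a well-known closed form at cache sizes $M = tN/K$ with $t = 0, 1, \dots, K$, namely $R = K(1-M/N)\min\{1/(1+KM/N),\, N/K\}$. The sketch below only evaluates this worst-case expression. It does not compute the average load under arbitrary popularity that this work optimizes, and the function name is a hypothetical choice.

```python
from fractions import Fraction


def man_worst_case_load(n_files, k_users, m_cache):
    """Worst-case load of the Maddah-Ali-Niesen centralized scheme.

    Valid at the corner points M = t * N / K with integer t in 0 .. K;
    other cache sizes are reached by memory sharing, which is not modeled here.
    """
    t = Fraction(k_users) * Fraction(m_cache) / n_files
    if t.denominator != 1 or not 0 <= t <= k_users:
        raise ValueError("M must equal t * N / K for an integer t in 0..K")
    coded = Fraction(k_users - t, t + 1)   # K (1 - M/N) / (1 + K M / N)
    capped = n_files - Fraction(m_cache)   # K (1 - M/N) * N / K
    return min(coded, capped)


# Hypothetical usage: N = 2 files, K = 2 users, cache size M = 1 file.
load = man_worst_case_load(2, 2, 1)
```

For N = K = 2 and M = 1 this evaluates to a load of 1/2 file, compared with a load of 2 files when the users have no cache.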

Transformation optics (TO) has been used to propose various novel optical
devices. With the help of metamaterials, several intriguing designs, such as
invisibility cloaks, have been implemented. However, as the basic units should
be much smaller than the working wavelengths to achieve the effective material
parameters, and the sizes of devices should be much larger than the wavelengths
of illumination to work within the light-ray approximation, it is a big
challenge to implement an experimental system that works simultaneously for
both geometric optics and wave optics. In this letter, by using a
gradient-index microstructured optical waveguide, we realize a device of
conformal transformation optics (CTO) and demonstrate its self-focusing
property for geometric optics and the Talbot effect for wave optics. In addition,
the Talbot effect in such a system has a potential application in transferring
digital information without diffraction. Our findings demonstrate the photon
controlling ability of CTO in a feasible experimental system.

Joint pushing and caching is recognized as an efficient remedy to the problem
of spectrum scarcity incurred by tremendous mobile data traffic. In this paper,
by exploiting storage resources at end users and predictability of user demand
processes, we design the optimal joint pushing and caching policy to maximize
bandwidth utilization, which is of fundamental importance to mobile telecom
carriers. In particular, we formulate the stochastic optimization problem as an
infinite horizon average cost Markov Decision Process (MDP), for which there
generally exist only numerical solutions without many insights. By structural
analysis, we show how the optimal policy achieves a balance between the current
transmission cost and the future average transmission cost. In addition, we
show that the optimal average transmission cost decreases with the cache size,
revealing a tradeoff between the cache size and the bandwidth utilization.
Then, because obtaining a numerical optimal solution suffers from the
curse of dimensionality and implementing it requires a centralized controller
and global system information, we develop a decentralized policy with
polynomial complexity w.r.t. the numbers of users and files as well as the cache
size, by a linear approximation of the value function and optimization
relaxation techniques. Next, we propose an online decentralized algorithm to
implement the proposed low-complexity decentralized policy using the technique
of Q-learning, when prior knowledge of user demand processes is not available.
Finally, using numerical results, we demonstrate the advantage of the proposed
solutions over some existing designs. The results in this paper offer useful
guidelines for designing practical cache-enabled content-centric wireless
networks.

Stimulated by the exciting progress in the observation of new bottomonium
states, we study the bottomonium spectrum. To calculate the mass spectrum, we
adopt a nonrelativistic screened potential model. The radial Schr\"{o}dinger
equation is solved with the three-point central difference method, where the
spin-dependent potentials are dealt with nonperturbatively. With this
treatment, the corrections of the spin-dependent potentials to the wave
functions can be included successfully. Furthermore, we calculate the
electromagnetic transitions of the $nS$ ($n\leq 4$), $nP$ ($n\leq 3$), and $nD$
($n\leq 2$) bottomonium states with a nonrelativistic electromagnetic
transition operator widely applied to meson photoproduction reactions. Our
predicted masses, hyperfine and fine splittings, electromagnetic transition
widths and branching ratios of the bottomonium states are in good agreement
with the available experimental data. In particular, the EM transitions
$\Upsilon(3S)\to \chi_{b1,2}(1P)\gamma$, which were not well understood in
previous studies, can be reasonably explained by considering the corrections of
the spin-dependent interactions to the wave functions. We also discuss the
observation of the missing bottomonium states by using radiative transitions.
Some important radiative decay chains involving the missing bottomonium states
are suggested for observation. We hope our study can provide some useful
references to observe and measure the properties of bottomonium mesons in
forthcoming experiments.
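
The three-point central difference treatment of the radial equation can be sketched as follows. The sketch discretizes $-u''/(2m) + V(r)u = Eu$ with $u(0)=u(r_{\max})=0$ on a uniform grid, which gives a symmetric tridiagonal matrix, and finds its lowest eigenvalue by Sturm-sequence bisection. Everything specific here is an assumption for illustration: units with $\hbar = 1$, a harmonic potential standing in for the screened potential of the paper, no spin-dependent terms, and the bisection solver itself.

```python
def lowest_radial_level(potential, r_max=10.0, n=1000, mass=1.0):
    """Lowest eigenvalue of -u''/(2 mass) + V(r) u = E u, u(0) = u(r_max) = 0.

    The second derivative is replaced by the three-point central difference
    (u[i-1] - 2 u[i] + u[i+1]) / h**2 on n interior grid points.
    """
    h = r_max / (n + 1)
    diag = [1.0 / (mass * h * h) + potential((i + 1) * h) for i in range(n)]
    off = 0.5 / (mass * h * h)  # magnitude of the off-diagonal entries
    off2 = off * off

    def count_below(x):
        """Sturm count: number of eigenvalues of the tridiagonal matrix below x."""
        count, q = 0, 1.0
        for i, d in enumerate(diag):
            q = d - x - (off2 / q if i > 0 else 0.0)
            if q == 0.0:
                q = 1e-30  # avoid division by zero in the next step
            if q < 0.0:
                count += 1
        return count

    lo = min(diag) - 2.0 * off  # Gershgorin bounds on the spectrum
    hi = max(diag) + 2.0 * off
    for _ in range(100):        # bisection on the lowest eigenvalue
        mid = 0.5 * (lo + hi)
        if count_below(mid) >= 1:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Hypothetical check with V(r) = r^2 / 2, whose exact s-wave ground level is 1.5.
energy = lowest_radial_level(lambda r: 0.5 * r * r)
```

The harmonic potential is used only because its exact level is known, so the discretization error of order $h^2$ can be checked directly; a screened potential would be passed in through the same `potential` argument.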

We propose a framework employing stochastic differential equations to
facilitate the long-term stability analysis of power grids with intermittent
wind power generation. This framework takes into account the discrete dynamics
which play a critical role in the long-term stability analysis, incorporates
the model of wind speed with different probability distributions, and also
develops an approximation methodology (by a deterministic hybrid model) for the
stochastic hybrid model to reduce the computational burden brought about by the
uncertainty of wind power. The theoretical and numerical studies show that a
deterministic hybrid model can provide an accurate trajectory approximation and
stability assessments for the stochastic hybrid model under mild conditions. In
addition, we discuss the critical cases in which the deterministic hybrid model
fails and discover that these cases are caused by a violation of the proposed
sufficient conditions. Such discussion complements the proposed framework and
methodology and also reaffirms the importance of the stochastic hybrid model
when the system operates close to its stability limit.

Quantum digital signatures (QDS) provide a means for signing electronic
communications with information-theoretic security. However, all previous
demonstrations of quantum digital signatures assume trusted measurement
devices. This renders them vulnerable to detector side-channel attacks,
just like quantum key distribution. Here, we exploit a
measurement-device-independent (MDI) quantum network, over a
200-square-kilometer metropolitan area, to perform a field test of a
three-party measurement-device-independent quantum digital signature (MDI-QDS)
scheme that is secure against any detector side-channel attack. In so doing, we
are able to successfully sign a binary message with a security level of about
1E-7. Remarkably, our work demonstrates the feasibility of MDI-QDS for
practical applications.

This research investigates the implementation mechanism of the block-wise ILU(k)
preconditioner on GPU. The block-wise ILU(k) algorithm requires both the level
k and the block size to be designed as variables. A decoupled ILU(k) algorithm
consists of a symbolic phase and a factorization phase. In the symbolic phase,
an ILU(k) nonzero pattern is established from the point-wise structure extracted
from a block-wise matrix. In the factorization phase, the block-wise matrix
with a variable block size is factorized into a block lower triangular matrix
and a block upper triangular matrix. A further diagonal factorization
must be performed on the block upper triangular matrix to adapt it to a
parallel triangular solver on GPU. We also present numerical experiments to
study the preconditioner actions on different k levels and block sizes.
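
The symbolic phase can be illustrated with the standard level-of-fill rule: original nonzeros have level 0, and a fill-in created at position (i, j) through pivot c receives level lev(i, c) + lev(c, j) + 1, kept only if it does not exceed k. The sketch below is a serial, point-wise version under that assumption. The function name and the data layout (one set of column indices per row) are hypothetical, and no GPU or block handling is shown.

```python
def iluk_pattern(rows, k):
    """Symbolic ILU(k): return, for each row, a dict column -> fill level.

    rows[i] is the set of column indices of the nonzeros in row i.
    Only entries whose level is at most k are kept.
    """
    n = len(rows)
    lev = [{j: 0 for j in rows[i]} for i in range(n)]
    for i in range(n):
        lev[i].setdefault(i, 0)  # make sure the diagonal is present
        done = set()
        while True:
            # next unprocessed pivot column to the left of the diagonal
            cand = [c for c in lev[i] if c < i and c not in done]
            if not cand:
                break
            c = min(cand)
            done.add(c)
            for j, level_cj in lev[c].items():
                if j > c:
                    new = lev[i][c] + level_cj + 1
                    if new <= k and new < lev[i].get(j, k + 1):
                        lev[i][j] = new
    return lev


# Hypothetical example: a 4-by-4 "arrow" matrix with a full first row and column.
arrow = [{0, 1, 2, 3}, {0, 1}, {0, 2}, {0, 3}]
nnz_level0 = sum(len(r) for r in iluk_pattern(arrow, 0))
nnz_level1 = sum(len(r) for r in iluk_pattern(arrow, 1))
```

For this arrow pattern, ILU(0) keeps the original 10 nonzeros, while ILU(1) admits every level-1 fill-in and the pattern becomes fully dense with 16 entries, which shows how the level k trades memory for accuracy.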

We study analytically the one-loop contribution to the Chiral Magnetic
Effect (CME) using lattice regularization with a Wilson fermion field. In the
continuum limit, we find that the chiral magnetic current vanishes at nonzero
temperature but emerges at zero temperature, consistent with that found by
Pauli-Villars regularization. For finite lattice size, however, the chiral
magnetic current is nonvanishing at nonzero temperature. But the numerical
value of the coefficient of the CME current is very small compared with that
extracted from the full QCD simulation for the same lattice parameters. The
possibility of higher order corrections from QCD dynamics is also assessed.

In this paper, we investigate the signal shaping in a two-user discrete-time
memoryless Gaussian multiple-access channel (MAC) with computation. It is shown
that by optimizing the input probability distribution, the transmission rate per
transmitter is beyond the cut-set bound. In contrast with the single-user
discrete memoryless channel, the Maxwell-Boltzmann distribution is no longer a
good approximation to the optimal input probability distribution for this
discrete-time Gaussian MAC with computation. Specifically, we derive and
analyze the mutual information for this channel. Because of the computation at
the destination, the mutual information is not concave in general in the input
probability distribution, and a primal-dual interior-point method is used to
solve this non-convex problem. Finally, some good input probability
distributions for the 16-ary pulse amplitude modulation (PAM) constellation are
obtained and achieve a $4.0349$ dB gain over the cut-set bound for the target
transmission rate of $3.0067$ bits/(channel use).
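
As a point of reference for the baseline mentioned above, a Maxwell-Boltzmann distribution over a PAM constellation assigns each point $x$ a probability proportional to $\exp(-\lambda x^2)$. The sketch below builds it for 16-PAM and reports its entropy. The constellation points $\pm 1, \pm 3, \dots, \pm 15$ and the value of $\lambda$ are assumptions chosen for illustration, and the optimized distributions of the paper are not reproduced here.

```python
import math


def maxwell_boltzmann_pam(m=16, lam=0.01):
    """Maxwell-Boltzmann probabilities p(x) ~ exp(-lam * x^2) on m-ary PAM.

    Constellation points are the odd integers -(m-1), ..., -1, 1, ..., m-1.
    """
    points = [2 * i - (m - 1) for i in range(m)]
    weights = [math.exp(-lam * x * x) for x in points]
    total = sum(weights)
    return points, [w / total for w in weights]


def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical usage: lam = 0 recovers the uniform distribution (4 bits).
points, shaped = maxwell_boltzmann_pam(16, 0.01)
_, uniform = maxwell_boltzmann_pam(16, 0.0)
```

Increasing $\lambda$ concentrates probability on the low-energy inner points and lowers the entropy below the 4 bits of uniform 16-PAM, which is the trade-off that shaping exploits on the single-user channel.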

For the parallel computation of partial differential equations, one key is
the grid partitioning. It requires that each process owns the same amount of
computation, and also that the partitioning quality is good enough to reduce the
communication among processes. When solving partial differential
equations using adaptive finite element methods, the grid and the basis
functions adjust in each iteration, which introduces load balancing issues. The
grid should be redistributed dynamically. This paper studies dynamic load
balancing algorithms and the implementation on the adaptive finite element
platform PHG. The numerical experiments show that algorithms studied in this
paper have good partitioning quality, and they are efficient.