
It has been widely observed that many activation functions and pooling
methods used in neural network models are (positively) rescaling-invariant,
including ReLU, PReLU, max-pooling, and average pooling. This makes
fully-connected neural networks (FNNs) and convolutional neural networks (CNNs)
invariant to (positive) rescaling operations across layers. This may cause
non-negligible problems for their optimization: (1) different NN models can
be equivalent while their gradients differ greatly from each other; (2)
it can be proven that the loss functions may have many spurious critical points
in the redundant weight space. To tackle these problems, in this paper, we
first characterize the rescaling-invariant properties of NN models using
equivalence classes and prove that the dimension of the equivalence class space
is significantly smaller than the dimension of the original weight space. Then
we represent the loss function in the compact equivalence class space and
develop novel algorithms that conduct optimization of the NN models directly in
the equivalence class space. We call these algorithms Equivalence Class
Optimization (abbreviated as EC-Opt) algorithms. Moreover, we design efficient
tricks to compute the gradients in the equivalence class space, which add almost
no extra computational cost compared to standard backpropagation (BP).
We conduct an experimental study to demonstrate the effectiveness of the
proposed optimization algorithms. In particular, we show that EC-Opt
can significantly improve the accuracy of the learned model
(for both FNNs and CNNs) compared to conventional stochastic gradient
descent algorithms.
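
The rescaling invariance described above, and problem (1) in particular, can be checked numerically. The following is a minimal sketch, not the EC-Opt algorithm itself: it assumes a two-layer ReLU network without biases and a squared-error loss, and the names `forward`, `loss_and_grads`, and the scale `c` are hypothetical choices made for illustration. Multiplying the incoming weights of the hidden layer by `c > 0` and dividing the outgoing weights by `c` leaves the output unchanged, while the gradients are rescaled by `1/c` and `c` respectively.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(W1, W2, x):
    """Two-layer ReLU network without biases: W2 @ relu(W1 @ x)."""
    return W2 @ relu(W1 @ x)

def loss_and_grads(W1, W2, x, y):
    """Squared-error loss and its gradients via manual backpropagation."""
    z = W1 @ x
    h = relu(z)
    err = W2 @ h - y
    loss = 0.5 * float(err @ err)
    gW2 = np.outer(err, h)          # dL/dW2
    gz = (W2.T @ err) * (z > 0)     # backpropagate through ReLU
    gW1 = np.outer(gz, x)           # dL/dW1
    return loss, gW1, gW2

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
x = rng.normal(size=3)
y = rng.normal(size=2)

c = 10.0                 # any positive scale (illustrative value)
W1_scaled = c * W1       # scale the incoming weights of the hidden layer
W2_scaled = W2 / c       # apply the inverse scale to the outgoing weights

loss_a, gW1_a, gW2_a = loss_and_grads(W1, W2, x, y)
loss_b, gW1_b, gW2_b = loss_and_grads(W1_scaled, W2_scaled, x, y)

# Same function and same loss, but gradients differ by factors of 1/c and c.
same_output = np.allclose(forward(W1, W2, x), forward(W1_scaled, W2_scaled, x))
grad_ratio_W1 = np.allclose(gW1_b, gW1_a / c)
grad_ratio_W2 = np.allclose(gW2_b, gW2_a * c)
```

With `c = 10`, a plain gradient step would move the two equivalent parameterizations very differently, which is the kind of redundancy that optimizing over equivalence classes is meant to remove.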

With the fast development of deep learning, it has become common to train large
neural networks using massive training data. Asynchronous Stochastic Gradient
Descent (ASGD) is widely adopted for this task because of its efficiency, but it
is known to suffer from the problem of delayed gradients. That is,
when a local worker adds its gradient to the global model, the global model may
already have been updated by other workers, so this gradient is "delayed". We
propose a novel technique to compensate for this delay, so as to make the
optimization behavior of ASGD closer to that of sequential SGD. This is
achieved by leveraging a Taylor expansion of the gradient function and an efficient
approximation to the Hessian matrix of the loss function. We call the new
algorithm Delay Compensated ASGD (DC-ASGD). We evaluated the proposed algorithm
on the CIFAR-10 and ImageNet datasets, and the experimental results demonstrate
that DC-ASGD outperforms both synchronous SGD and asynchronous SGD, and nearly
approaches the performance of sequential SGD.
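
The idea of compensating a delayed gradient with a first-order Taylor expansion can be sketched as follows. This is an illustrative toy under stated assumptions, not the DC-ASGD implementation: the quadratic loss, the matrix `A`, the vector `b`, and the helper `compensated_grad` are all hypothetical, and the exact Hessian is used as a stand-in for the efficient approximation mentioned in the text. On a quadratic loss the gradient is linear in the weights, so the first-order correction recovers the true gradient exactly; for a real network it would only be an approximation.

```python
import numpy as np

# Hypothetical quadratic loss f(w) = 0.5 * w^T A w - b^T w, chosen so that the
# gradient g(w) = A w - b and the Hessian A are known in closed form.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

def grad(w):
    return A @ w - b

def compensated_grad(g_stale, hessian, w_now, w_stale):
    """First-order Taylor correction: g(w_now) ~ g(w_stale) + H (w_now - w_stale)."""
    return g_stale + hessian @ (w_now - w_stale)

w_stale = np.array([1.0, 1.0])   # snapshot of the model the worker read
g_stale = grad(w_stale)          # delayed gradient computed on that snapshot
w_now = np.array([0.6, 1.3])     # global model after other workers' updates

g_true = grad(w_now)             # what sequential SGD would have used
g_comp = compensated_grad(g_stale, A, w_now, w_stale)

err_delayed = np.linalg.norm(g_stale - g_true)   # error of the raw delayed gradient
err_comp = np.linalg.norm(g_comp - g_true)       # error after compensation
```

Computing a full Hessian is impractical for large networks, which is why a cheap approximation is needed in practice; the exact Hessian appears here only to keep the example self-checking.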

Following the discovery of the Higgs boson at the LHC, new large colliders are
being studied by the international high-energy community to explore Higgs
physics in detail and new physics beyond the Standard Model. In China, a
two-stage circular collider project, CEPC-SPPC, has been proposed, with the first
stage, CEPC (Circular Electron Positron Collider, a so-called Higgs factory),
focused on Higgs physics, and the second stage, SPPC (Super Proton-Proton Collider), focused
on new physics beyond the Standard Model. This paper discusses this second
stage.

Stabilization of the accelerating field in a Drift Tube Linac (DTL) is obtained
by inserting Post Couplers (PCs). On the basis of the equivalent circuit model
for the DTL with and without asymmetrical PCs, the stabilization is derived
quantitatively: letting $\delta \omega/\omega_0$ be the relative frequency error,
we find that the sensitivity of the field to perturbations is proportional
to $\sqrt{\delta \omega / \omega_0}$ without PCs and to $\delta
\omega/\omega_0$ with PCs. We then adapt the circuit model of symmetrical PCs
to the case of asymmetrical PCs. The circuit model shows how the slope of the
field distribution is changed by rotating the asymmetrical PCs and illustrates
that the asymmetrical PCs have the same stabilizing effect as the symmetrical
ones.
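
To give a feel for the two scalings quoted above, the ratio of the sensitivities can be written out. This is only a scaling comparison: the symbols $S_{\text{no PC}}$ and $S_{\text{PC}}$ are hypothetical labels for the two sensitivities, the proportionality constants are not given here and are assumed comparable, and the value $10^{-4}$ is an illustrative choice rather than a figure from the text.

```latex
\[
\frac{S_{\text{no PC}}}{S_{\text{PC}}}
  \;\propto\; \frac{\sqrt{\delta\omega/\omega_0}}{\delta\omega/\omega_0}
  \;=\; \left(\frac{\delta\omega}{\omega_0}\right)^{-1/2},
\qquad
\frac{\delta\omega}{\omega_0} = 10^{-4}
  \;\Longrightarrow\;
  \left(10^{-4}\right)^{-1/2} = 10^{2}.
\]
```

Under these assumptions, the smaller the relative frequency error, the larger the relative benefit of the linear scaling obtained with PCs.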