• ### Optimizing Neural Networks in the Equivalence Class Space(1802.03713)

April 18, 2018 cs.LG, stat.ML
It has been widely observed that many activation functions and pooling methods in neural network models are (positively) rescaling-invariant, including ReLU, PReLU, max-pooling, and average pooling. This makes fully-connected neural networks (FNNs) and convolutional neural networks (CNNs) invariant to (positive) rescaling operations across layers. This invariance may cause non-negligible problems for optimization: (1) different NN models can be equivalent, yet their gradients can differ greatly; (2) it can be proven that the loss functions may have many spurious critical points in the redundant weight space. To tackle these problems, we first characterize the rescaling-invariant properties of NN models using equivalence classes and prove that the dimension of the equivalence class space is significantly smaller than the dimension of the original weight space. We then represent the loss function in the compact equivalence class space and develop novel algorithms that optimize NN models directly in that space. We call these algorithms Equivalence Class Optimization (abbreviated as EC-Opt) algorithms. Moreover, we design efficient tricks to compute the gradients in the equivalence class space, which add almost no computational cost compared to standard back-propagation (BP). We conducted an experimental study to demonstrate the effectiveness of the proposed optimization algorithms. In particular, we show that EC-Opt significantly improves the accuracy of the learned model (for both FNN and CNN) compared to conventional stochastic gradient descent algorithms.
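The rescaling invariance described in this abstract can be illustrated with a toy two-layer ReLU network. The sketch below is not the paper's EC-Opt algorithm; the network, the weights, the loss (sum of outputs), and the helper names `forward` and `grad_W1` are all assumptions chosen for illustration. It scales the first layer by a positive constant `c` and the second by `1/c`, which leaves the output unchanged while changing the gradient with respect to the first layer.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(W1, W2, x):
    # Two-layer network without biases: W2 @ relu(W1 @ x)
    return W2 @ relu(W1 @ x)

def grad_W1(W1, W2, x):
    # Gradient of the scalar loss L = sum(forward(W1, W2, x)) w.r.t. W1
    h_pre = W1 @ x
    delta = (W2.T @ np.ones(W2.shape[0])) * (h_pre > 0)
    return np.outer(delta, x)

# Hypothetical weights and input, chosen only for illustration
W1 = np.array([[1.0, -2.0], [0.5, 1.0]])
W2 = np.array([[2.0, 3.0]])
x = np.array([1.0, 1.0])

c = 10.0  # any positive rescaling factor
W1_scaled, W2_scaled = c * W1, W2 / c

out = forward(W1, W2, x)
out_scaled = forward(W1_scaled, W2_scaled, x)
g = grad_W1(W1, W2, x)
g_scaled = grad_W1(W1_scaled, W2_scaled, x)

print("same output:", np.allclose(out, out_scaled))
print("gradient ratio g / g_scaled on the active unit:", g[1, 0] / g_scaled[1, 0])
```

Both weight settings compute the same function, yet in this example the gradient on the first layer shrinks by the factor `c`, which is the kind of mismatch point (1) of the abstract refers to.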
• ### Asynchronous Stochastic Gradient Descent with Delay Compensation(1609.08326)

Feb. 18, 2020 cs.DC, cs.LG
With the rapid development of deep learning, it has become common to train large neural networks on massive training data. Asynchronous Stochastic Gradient Descent (ASGD) is widely adopted for this task because of its efficiency; however, it is known to suffer from the problem of delayed gradients. That is, by the time a local worker adds its gradient to the global model, the global model may already have been updated by other workers, so the gradient is "delayed". We propose a novel technique to compensate for this delay, so as to make the optimization behavior of ASGD closer to that of sequential SGD. This is achieved by leveraging a Taylor expansion of the gradient function and an efficient approximation to the Hessian matrix of the loss function. We call the new algorithm Delay Compensated ASGD (DC-ASGD). We evaluated the proposed algorithm on the CIFAR-10 and ImageNet datasets, and the experimental results demonstrate that DC-ASGD outperforms both synchronous SGD and asynchronous SGD, and nearly approaches the performance of sequential SGD.
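The idea of correcting a stale gradient with a Taylor expansion can be sketched on a toy quadratic loss, where the first-order expansion of the gradient is exact. This is a minimal illustration rather than the DC-ASGD algorithm itself: the abstract says the paper uses an efficient approximation to the Hessian, whereas the sketch assumes the exact Hessian `A` is available, and the loss, the weights, and the helper name `compensate` are hypothetical.

```python
import numpy as np

# Toy quadratic loss f(w) = 0.5 * w^T A w - b^T w, so grad(w) = A w - b
# and the Hessian is the constant matrix A (an assumption for illustration).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

def grad(w):
    return A @ w - b

def compensate(stale_grad, hessian, w_stale, w_current):
    # First-order Taylor expansion of the gradient around w_stale:
    # g(w_current) ~= g(w_stale) + H(w_stale) @ (w_current - w_stale)
    return stale_grad + hessian @ (w_current - w_stale)

w_stale = np.array([0.5, -0.5])    # weights the worker read
w_current = np.array([0.2, 0.1])   # global weights after other workers' updates

g_stale = grad(w_stale)            # the delayed gradient
g_comp = compensate(g_stale, A, w_stale, w_current)
g_true = grad(w_current)           # what sequential SGD would have used

print("stale      :", g_stale)
print("compensated:", g_comp)
print("true       :", g_true)
```

For a quadratic loss the compensated gradient matches the true gradient at the current weights; for a general loss the expansion is only approximate, and forming the full Hessian is usually too expensive, which is why a cheap approximation is needed in practice.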
• ### Concept for a Future Super Proton-Proton Collider(1507.03224)

July 19, 2015 hep-ex, physics.acc-ph
Following the discovery of the Higgs boson at the LHC, new large colliders are being studied by the international high-energy community to explore Higgs physics in detail and new physics beyond the Standard Model. In China, a two-stage circular collider project, CEPC-SPPC, has been proposed, with the first stage, CEPC (Circular Electron Positron Collider, a so-called Higgs factory), focused on Higgs physics, and the second stage, SPPC (Super Proton-Proton Collider), focused on new physics beyond the Standard Model. This paper discusses the second stage.
• ### Analyzing the effects of post couplers in DTL tuning by the equivalent circuit model(1304.4761)

May 21, 2013 physics.acc-ph
Stabilization of the accelerating field in a Drift Tube Linac (DTL) is achieved by inserting Post Couplers (PCs). On the basis of the equivalent circuit model of the DTL with and without asymmetrical PCs, the stabilization is derived quantitatively: letting $\delta \omega/\omega_0$ be the relative frequency error, we find that the sensitivity of the field to perturbation is proportional to $\sqrt{\delta \omega / \omega_0}$ without PCs and to $\delta \omega/\omega_0$ with PCs. We then adapt the circuit model of symmetrical PCs to the case of asymmetrical PCs. The circuit model shows how the slope of the field distribution is changed by rotating the asymmetrical PCs and illustrates that asymmetrical PCs have the same stabilizing effect as symmetrical ones.