• ### Mpemba effect in spin glasses: a persistent memory effect(1804.07569)

April 20, 2018 cond-mat.dis-nn
The Mpemba effect refers to the observation that the hotter of two identical beakers of water, put in contact with the same thermal reservoir, can cool faster under certain conditions. The phenomenon is not specific to water, as it has been reported for nanotube resonators and clathrate hydrates. However, although records of the Mpemba effect date as far back as Aristotle, its very existence has been questioned. The study of simpler systems is clearly needed to clarify the underlying physics, granular fluids being one successful example of this approach. Here, we propose spin glasses as an alternative model system. Using the Janus II supercomputer, custom built for spin-glass simulations, we show that the Mpemba effect is indeed present in spin glasses and we clarify its origin: it is a non-equilibrium memory effect, encoded in the glassy coherence length.
• ### Energy-efficiency evaluation of Intel KNL for HPC workloads(1804.01911)

April 5, 2018 cs.DC
Energy consumption is increasingly becoming a limiting factor in the design of faster large-scale parallel systems, and the development of energy-efficient and energy-aware applications is today a relevant issue for HPC code-developer communities. In this work we focus on the energy performance of the Knights Landing (KNL) Xeon Phi, the latest many-core architecture processor introduced by Intel into the HPC market. We consider the 64-core Xeon Phi 7230 and analyze its energy performance using both the on-chip MCDRAM and the regular DDR4 system memory as main storage for the application data-domain. As a benchmark application we use a Lattice Boltzmann code heavily optimized for this architecture and implemented using different memory data layouts to store its lattice. We then assess the energy consumption for different memory data layouts, memory types (DDR4 or MCDRAM) and numbers of threads per core.
• ### Theory meets experiment for the aging rate of spin glasses(1803.02264)

Experiments on spin glasses can now make precise measurements of the exponent $z(T)$ governing the growth of glassy domains, while our computational capabilities allow us to make quantitative predictions for experimental scales. However, experimental and numerical values for $z(T)$ have differed. We use new simulations on the Janus II computer to resolve this discrepancy, finding a time-dependent $z(T, t_w)$, which leads to the experimental value through mild extrapolations. Furthermore, theoretical insight is gained by studying a crossover between the $T = T_c$ and $T = 0$ fixed points.
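As a hedged illustration of what this exponent means: the symbols $\xi$ (coherence length) and $t_w$ (waiting time) are assumptions borrowed from common usage in the spin-glass literature, since the abstract does not define them. Domain growth is conventionally described by a power law, and a time-dependent exponent can be read as a local logarithmic slope:

$$\xi(T, t_w) \propto t_w^{1/z(T)}, \qquad z(T, t_w) = \frac{\mathrm{d}\ln t_w}{\mathrm{d}\ln \xi(T, t_w)}.$$

This sketches the conventional definition only; the paper itself may use a different but equivalent parametrization.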
• ### Matching microscopic and macroscopic responses in glasses(1704.07777)

We first reproduce on the Janus and Janus II computers a milestone experiment that measures the spin-glass coherence length through the lowering of free-energy barriers induced by the Zeeman effect. Secondly we determine the scaling behavior that allows a quantitative analysis of a new experiment reported in the companion Letter [S. Guchhait and R. Orbach, Phys. Rev. Lett. 118, 157203 (2017)]. The value of the coherence length estimated through the analysis of microscopic correlation functions turns out to be quantitatively consistent with its measurement through macroscopic response functions. Further, non-linear susceptibilities, recently measured in glass-forming liquids, scale as powers of the same microscopic length.
• ### A statics-dynamics equivalence through the fluctuation-dissipation ratio provides a window into the spin-glass phase from nonequilibrium measurements(1610.01418)

The unifying feature of glass formers (such as polymers, supercooled liquids, colloids, granulars, spin glasses, superconductors, ...) is a sluggish dynamics at low temperatures. Indeed, their dynamics is so slow that thermal equilibrium is never reached in macroscopic samples: in analogy with living beings, glasses are said to age. Here, we show how to relate experimentally relevant quantities with the experimentally unreachable low-temperature equilibrium phase. We have performed a very accurate computation of the non-equilibrium fluctuation-dissipation ratio for the three-dimensional Edwards-Anderson Ising spin glass, by means of large-scale simulations on the special-purpose computers Janus and Janus II. This ratio (computed for finite times on very large, effectively infinite, systems) is compared with the equilibrium probability distribution of the spin overlap for finite sizes. The resulting quantitative statics-dynamics dictionary, based on observables that can be measured with current experimental methods, could allow the experimental exploration of important features of the spin-glass phase without uncontrollable extrapolations to infinite times or system sizes.
• ### Optimization of Lattice Boltzmann Simulations on Heterogeneous Computers(1703.04594)

March 14, 2017 cs.DC
High-performance computing systems are more and more often based on accelerators. Computing applications targeting those systems often follow a host-driven approach in which hosts offload almost all compute-intensive sections of the code onto accelerators; this approach only marginally exploits the computational resources available on the host CPUs, limiting performance and energy efficiency. The obvious step forward is to run compute-intensive kernels in a concurrent and balanced way on both hosts and accelerators. In this paper we consider exactly this problem for a class of applications based on Lattice Boltzmann Methods, widely used in computational fluid-dynamics. Our goal is to develop just one program, portable and able to run efficiently on several different combinations of hosts and accelerators. To reach this goal, we define common data layouts enabling the code to exploit efficiently the different parallel and vector options of the various accelerators, and matching the possibly different requirements of the compute-bound and memory-bound kernels of the application. We also define models and metrics that predict the best partitioning of workloads among host and accelerator, and the optimally achievable overall performance level. We test the performance of our codes and their scaling properties using as testbeds HPC clusters incorporating different accelerators: Intel Xeon-Phi many-core processors, NVIDIA GPUs and AMD GPUs.
• ### Massively parallel lattice-Boltzmann codes on large GPU clusters(1703.00185)

March 1, 2017 cs.DC
This paper describes a massively parallel code for a state-of-the-art thermal lattice-Boltzmann method. Our code has been carefully optimized for performance on one GPU and to have a good scaling behavior extending to a large number of GPUs. Versions of this code have been already used for large-scale studies of convective turbulence. GPUs are becoming increasingly popular in HPC applications, as they are able to deliver higher performance than traditional processors. Writing efficient programs for large clusters is not an easy task as codes must adapt to increasingly parallel architectures, and the overheads of node-to-node communications must be properly handled. We describe the structure of our code, discussing several key design choices that were guided by theoretical models of performance and experimental benchmarks. We present an extensive set of performance measurements and identify the corresponding main bottlenecks; finally we compare the results of our GPU code with those measured on other currently available high performance processors. Our results are a production-grade code able to deliver a sustained performance of several tens of Tflops as well as a design and optimization methodology that can be used for the development of other high performance applications for computational physics.
• ### Performance and Portability of Accelerated Lattice Boltzmann Applications with OpenACC(1703.00186)

March 1, 2017 cs.DC
An increasingly large number of HPC systems rely on heterogeneous architectures combining traditional multi-core CPUs with power-efficient accelerators. Designing efficient applications for these systems has been troublesome in the past, as accelerators could usually be programmed only with specific programming languages, threatening maintainability, portability and correctness. Several new programming environments try to tackle this problem. Among them, OpenACC offers a high-level approach based on compiler directive clauses to mark regions of existing C, C++ or Fortran codes to run on accelerators. This approach directly addresses code portability, leaving to compilers the support of each different accelerator, but one has to carefully assess the relative costs of portable approaches versus computing efficiency. In this paper we address precisely this issue, using as a test-bench a massively parallel Lattice Boltzmann algorithm. We first describe our multi-node implementation and optimization of the algorithm, using OpenACC and MPI. We then benchmark the code on a variety of processors, including traditional CPUs and GPUs, and make accurate performance comparisons with other GPU implementations of the same algorithm using CUDA and OpenCL. We also assess the performance impact associated with portable programming, and the actual portability and performance-portability of OpenACC-based applications across several state-of-the-art architectures.
• ### High-spin structure in $^{40}$K(1211.4069)

Nov. 17, 2012 nucl-ex
High-spin states of $^{40}$K have been populated in the fusion-evaporation reaction $^{12}$C($^{30}$Si,np)$^{40}$K and studied by means of $\gamma$-ray spectroscopy techniques using one AGATA triple cluster detector, at INFN - Laboratori Nazionali di Legnaro. Several new states with excitation energy up to 8 MeV and spin up to $10^-$ have been discovered. These new states are discussed in terms of J=3 and T=0 neutron-proton hole pairs. Shell-model calculations in a large model space have shown a good agreement with the experimental data for most of the energy levels. The evolution of the structure of this nucleus is here studied as a function of excitation energy and angular momentum.
• The Advanced GAmma Tracking Array (AGATA) is a European project to develop and operate the next generation gamma-ray spectrometer. AGATA is based on the technique of gamma-ray energy tracking in electrically segmented high-purity germanium crystals. This technique requires the accurate determination of the energy, time and position of every interaction as a gamma ray deposits its energy within the detector volume. Reconstruction of the full interaction path results in a detector with very high efficiency and excellent spectral response. The realization of gamma-ray tracking and AGATA is a result of many technical advances. These include the development of encapsulated highly-segmented germanium detectors assembled in a triple cluster detector cryostat, an electronics system with fast digital sampling and a data acquisition system to process the data at a high rate. The full characterization of the crystals was measured and compared with detector-response simulations. This enabled pulse-shape analysis algorithms, to extract energy, time and position, to be employed. In addition, tracking algorithms for event reconstruction were developed. The first phase of AGATA is now complete and operational in its first physics campaign. In the future AGATA will be moved between laboratories in Europe and operated in a series of campaigns to take advantage of the different beams and facilities available to maximize its science output. The paper reviews all the achievements made in the AGATA project including all the necessary infrastructure to operate and support the spectrometer.