
We present a lattice determination of the $\Lambda$ parameter in three-flavor
QCD and the strong coupling at the Z pole mass. Computing the nonperturbative
running of the coupling in the range from $0.2\,$GeV to $70\,$GeV, and using
experimental input values for the masses and decay constants of the pion and
the kaon, we obtain $\Lambda_{\overline{\rm MS}}^{(3)}=341(12)\,$MeV. The
nonperturbative running up to very high energies guarantees that systematic
effects associated with perturbation theory are well under control. Using the
four-loop prediction for $\Lambda_{\overline{\rm
MS}}^{(5)}/\Lambda_{\overline{\rm MS}}^{(3)}$ yields
$\alpha^{(5)}_{\overline{\rm MS}}(m_{\rm Z}) = 0.11852(84)$.

The computation of the form factors for the $B_s \to K \ell \nu$ decay is
presented. The b quark is treated by means of Heavy Quark Effective Theory,
currently in the static approximation. In these proceedings we discuss the
extraction of the bare matrix elements from lattice data through a combined fit
to two- and three-point correlation functions, as well as by considering
suitable ratios. The different methods agree concerning the extracted form
factors, and approximately 2% accuracy is reached. The nonperturbative
renormalization and matching to QCD are described in the accompanying proceedings
PoS(LATTICE2016)292.

We present results by the ALPHA collaboration for the $\Lambda$-parameter in
3-flavour QCD and the strong coupling constant at the electroweak scale,
$\alpha_s(m_Z)$, in terms of hadronic quantities computed on the CLS gauge
configurations. The first part of this proceedings contribution contains a
review of published material \cite{Brida:2016flw,DallaBrida:2016kgh} and yields
the $\Lambda$-parameter in units of a low-energy scale, $1/L_{\rm had}$. We
then discuss how to determine this scale in physical units from experimental
data for the pion and kaon decay constants. We obtain $\Lambda_{\overline{\rm
MS}}^{(3)} = 332(14)$ MeV which translates to $\alpha_s(M_Z)=0.1179(10)(2)$
using perturbation theory to match between 3-, 4- and 5-flavour QCD.

We present a computation of B-meson decay constants from lattice QCD
simulations within the framework of Heavy Quark Effective Theory for the
b-quark. The next-to-leading order corrections in the HQET expansion are
included nonperturbatively. Based on Nf=2 gauge field ensembles, covering
three lattice spacings a ≈ (0.08-0.05) fm and pion masses down to 190 MeV, a
variational method for extracting hadronic matrix elements is used to keep
systematic errors under control. In addition we perform a careful
autocorrelation analysis in the extrapolation to the continuum and to the
physical pion mass limits. Our final results read fB=186(13) MeV, fBs=224(14) MeV
and fBs/fB=1.203(65). A comparison with other results in the literature does
not reveal a dependence on the number of dynamical quarks, and effects from
truncating HQET appear to be negligible.

We report our final estimate of the b-quark mass from $N_f=2$ lattice QCD
simulations using Heavy Quark Effective Theory nonperturbatively matched to
QCD at $O(1/m_h)$. Treating systematic and statistical errors in a conservative
manner, we obtain $\overline{m}_{\rm b}^{\overline{\rm MS}}(2 {\rm
GeV})=4.88(15)$ GeV after an extrapolation to the physical point.

We report the final results of the ALPHA collaboration for some B-physics
observables: $f_B$, $f_{B_s}$ and $m_b$. We employ CLS configurations with 2
flavors of $O(a)$ improved Wilson fermions in the sea and pion masses ranging
down to 190 MeV. The b-quark is treated in HQET to order $1/m_b$. The
renormalization, the matching and the improvement were performed
nonperturbatively, and three lattice spacings reaching $a=0.048$ fm are used
in the continuum extrapolation.

The 2012 PDG reports a tension at the level of $3 \sigma$ between two
exclusive determinations of $V_{ub}$. They are obtained by combining the
experimental branching ratios of $B \to \tau \nu$ and $B \to \pi l \nu$
(respectively) with a theoretical computation of the hadronic matrix elements
$f_B$ and the $B \to \pi$ form factor $f_+(q^2)$. To understand the tension,
improved precision and a careful analysis of the systematics involved are
necessary. We report the results of the ALPHA collaboration for $f_B$ from the
lattice with 2 flavors of $O(a)$ improved Wilson fermions. We employ HQET,
including $1/m_b$ corrections, with pion masses ranging down to $\approx$ 190
MeV. Renormalization and matching were performed nonperturbatively, and three
lattice spacings reaching $a^{-1}\approx 4.1$ GeV are used in the continuum
extrapolation. We also present progress towards a computation of $f_+(q^2)$, to
directly compare two independent exclusive determinations of $V_{ub}$ with
each other and with inclusive determinations. Additionally, we report on
preliminary results for $f_{B_s}$, needed for the analysis of $B_s \to
\mu^+\mu^-$.
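
As a quick consistency check on the quoted cutoff, a lattice spacing in fm converts to an inverse lattice spacing in GeV through $\hbar c \approx 0.19733$ GeV fm. The following is only an illustrative sketch: the function name is hypothetical, and the input $a = 0.048$ fm is assumed here to be the finest lattice spacing of these ensembles.

```python
# Conversion constant hbar*c in GeV*fm (CODATA value, rounded).
HBARC_GEV_FM = 0.1973269804


def inverse_spacing_gev(a_fm: float) -> float:
    """Return the inverse lattice spacing a^{-1} in GeV for a spacing in fm."""
    if a_fm <= 0.0:
        raise ValueError("lattice spacing must be positive")
    return HBARC_GEV_FM / a_fm


if __name__ == "__main__":
    # Assumed input: finest lattice spacing a = 0.048 fm.
    print(f"{inverse_spacing_gev(0.048):.2f} GeV")
```

Under that assumption the result is close to 4.1 GeV, in line with the inverse lattice spacing quoted in the abstract.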

We present our analysis of B physics quantities using nonperturbatively
matched Heavy Quark Effective Theory (HQET) in Nf=2 lattice QCD on the CLS
ensembles. Using all-to-all propagators, HYP-smeared static quarks, and the
Generalized Eigenvalue Problem (GEVP) approach with a conservative plateau
selection procedure, we are able to systematically control all sources of
error. With significantly increased statistics compared to last year, our
preliminary results are $\overline{m}_b(\overline{m}_b) = 4.22(10)(4)_z$ GeV
for the $\overline{\rm MS}$ b-quark mass, and $f_B = 193(9)_{\rm stat}(4)_\chi$
MeV and $f_{B_s} = 219(12)_{\rm stat}$ MeV for the B-meson decay constants.
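
The GEVP step can be illustrated on synthetic data. The following is a minimal sketch, not the collaboration's analysis code: it builds an exact two-state, 2x2 correlator matrix $C(t)$ from made-up energies and overlaps, solves $C(t)\,v = \lambda(t,t_0)\,C(t_0)\,v$ in closed form, and reads off effective energies from $\ln[\lambda(t,t_0)/\lambda(t+1,t_0)]$. All function names and numerical inputs are assumptions chosen for illustration.

```python
import math


def correlator_matrix(t, energies, overlaps):
    """C_ij(t) = sum_n psi_i^n psi_j^n exp(-E_n t) for synthetic states."""
    size = len(overlaps[0])
    return [[sum(psi[i] * psi[j] * math.exp(-e * t)
                 for e, psi in zip(energies, overlaps))
             for j in range(size)] for i in range(size)]


def gevp_eigenvalues_2x2(c_t, c_t0):
    """Solve det(C(t) - lambda C(t0)) = 0 for symmetric 2x2 matrices."""
    (a11, a12), (a21, a22) = c_t
    (b11, b12), (b21, b22) = c_t0
    # Coefficients of the quadratic qa*lambda^2 + qb*lambda + qc = 0.
    qa = b11 * b22 - b12 * b21
    qb = -(a11 * b22 + a22 * b11 - a12 * b21 - a21 * b12)
    qc = a11 * a22 - a12 * a21
    disc = math.sqrt(max(qb * qb - 4.0 * qa * qc, 0.0))
    # Largest eigenvalue first: it belongs to the lowest energy level.
    return ((-qb + disc) / (2.0 * qa), (-qb - disc) / (2.0 * qa))


def effective_energies(t, t0, energies, overlaps):
    """E_n^eff(t) = ln[lambda_n(t, t0) / lambda_n(t + 1, t0)]."""
    c_t0 = correlator_matrix(t0, energies, overlaps)
    lam_t = gevp_eigenvalues_2x2(
        correlator_matrix(t, energies, overlaps), c_t0)
    lam_t1 = gevp_eigenvalues_2x2(
        correlator_matrix(t + 1, energies, overlaps), c_t0)
    return tuple(math.log(x / y) for x, y in zip(lam_t, lam_t1))


# Made-up energies (lattice units) and overlap vectors, one vector per state.
ENERGIES = (0.5, 0.9)
OVERLAPS = ((1.0, 0.3), (0.4, 1.0))

if __name__ == "__main__":
    print(effective_energies(4, 2, ENERGIES, OVERLAPS))
```

Because the synthetic correlator contains exactly as many states as operators, the effective energies reproduce the inputs up to rounding. With real data, higher states contaminate the eigenvalues at small times, which is why the choice of plateau region matters.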

QPACE is a novel massively parallel architecture optimized for lattice QCD
simulations. A single QPACE node is based on the IBM PowerXCell 8i processor.
The nodes are interconnected by a custom 3-dimensional torus network
implemented on an FPGA. The compute power of the processor is provided by 8
Synergistic Processing Units. Making efficient use of these accelerator cores
in scientific applications is challenging. In this paper we describe our
strategies for porting applications to the QPACE architecture and report on
performance numbers.

QPACE is a novel parallel computer which has been developed to be primarily
used for lattice QCD simulations. The compute power is provided by the IBM
PowerXCell 8i processor, an enhanced version of the Cell processor that is used
in the PlayStation 3. The QPACE nodes are interconnected by a custom,
application-optimized 3-dimensional torus network implemented on an FPGA. To
achieve the very high packaging density of 26 TFlops per rack, a new water
cooling concept has been developed and successfully realized. In this paper we
give an overview of the architecture and highlight some important technical
details of the system. Furthermore, we provide initial performance results and
report on the installation of 8 QPACE racks providing an aggregate peak
performance of 200 TFlops.

We apply an Osterwalder-Seiler version of twisted mass QCD to a study of the
$B_K$ parameter, in which three of the four quark fields making up the relevant
$\Delta S =2$ operator are maximally twisted with the same twist angle, while
the fourth one has a twist angle of opposite sign. It is known that this setup
ensures automatic improvement of the bare $K^0$-$\overline K^0$ operator matrix
element and multiplicative renormalization of the $\Delta S =2$ operator, at
the price of breaking the $K^0$-$\overline K^0$ mass degeneracy by
discretization effects. As a result, two dominant systematic errors of the
$B_K$ determination with Wilson fermions are kept under control. With the Clover
term included in the fermion action, we perform a feasibility study and find,
in the quenched approximation, a significant improvement of the scaling
behaviour of $B_K$, compared to earlier standard tmQCD determinations.
Moreover, we study in detail the $K^0$-$\overline K^0$ mass splitting that
characterizes this approach and confirm that, in the presence of the Clover
term, it is greatly reduced in a maximally twisted theory.

We give an overview of the QPACE project, which is pursuing the development
of a massively parallel, scalable supercomputer for LQCD. The machine is a
three-dimensional torus of identical processing nodes, based on the PowerXCell
8i processor. The nodes are connected by an FPGA-based, application-optimized
network processor attached to the PowerXCell 8i processor. We present a
performance analysis of lattice QCD codes on QPACE and corresponding hardware
benchmarks.

We evaluate IBM's Enhanced Cell Broadband Engine (BE) as a possible building
block of a new generation of lattice QCD machines. The Enhanced Cell BE will
provide full support of double-precision floating-point arithmetic, including
IEEE-compliant rounding. We have developed a performance model and applied it
to relevant lattice QCD kernels. The performance estimates are supported by
micro- and application-benchmarks that have been obtained on currently
available Cell BE-based computers, such as IBM QS20 blades and PlayStation 3.
The results are encouraging and show that this processor is an interesting
option for lattice QCD applications. For a massively parallel machine on the
basis of the Cell BE, an application-optimized network needs to be developed.

We discuss some large effects of dynamical fermions. One is a cutoff effect,
others concern the contribution of multi-pion states to correlation functions
and are expected to survive the continuum limit. We then turn to the
preparation for simulations at small lattice spacings, which we are planning
down to around a=0.04 fm in order to understand the size of O(a^2) effects of
the standard O(a)-improved theory. The dependence of the lattice spacing on the
bare coupling is determined through the Schrödinger functional renormalized
coupling.

We present the APE (Array Processor Experiment) project for the development
of dedicated parallel computers for numerical simulations in lattice gauge
theories. While APEmille is a production machine in today's physics simulations
at various sites in Europe, a new machine, apeNEXT, is currently being
developed to provide multi-Tflops computing performance. Like previous APE
machines, the new supercomputer is largely custom-designed and specifically
optimized for simulations of Lattice QCD.

We present the current status of the apeNEXT project. The aim of this project is
the development of the next generation of APE machines, which will provide
multi-teraflop computing power. Like previous machines, apeNEXT is based on a
custom-designed processor, which is specifically optimized for simulating QCD.
We discuss the machine design, report on benchmarks, and give an overview of
the status of the software development.

APENEXT is a new-generation APE processor, optimized for LGT simulations. The
project follows the basic ideas of previous APE machines and develops simple
and cheap parallel systems with multi-TFlops processing power. This paper
describes the main features of this new development.

This paper presents the status of the APEmille project, which is essentially
completed as far as machine development and construction are concerned. Several
large installations of APEmille are in use for physics production runs, leading
to many new results presented at this conference. This paper briefly summarizes
the APEmille architecture, reviews the status of the installations and presents
some performance figures for physics codes.

We present a comprehensive study of the masses of pseudoscalar and vector
mesons, as well as octet and decuplet baryons computed in O(a) improved
quenched lattice QCD. Results have been obtained using the nonperturbative
definition of the improvement coefficient c_sw, and also its estimate in
tadpole-improved perturbation theory. We investigate effects of improvement on
the incidence of exceptional configurations, mass splittings and the parameter
J. By combining the results obtained using nonperturbative and tadpole
improvement in a simultaneous continuum extrapolation we can compare our
spectral data to experiment. We confirm earlier findings by the CP-PACS
Collaboration that the quenched light hadron spectrum agrees with experiment at
the 10% level.

We report on the progress and status of the APEmille project: a SIMD parallel
computer with a peak performance in the TeraFlops range which is now in an
advanced development phase. We discuss the hardware and software architecture,
and present some performance estimates for Lattice Gauge Theory (LGT)
applications.

We present a study of the instanton size and spatial distributions in pure
SU(3) gauge theory using under-relaxed cooling. We also investigate the
low-lying eigenmodes of the (improved) Wilson-Dirac operator, in particular,
the appearance of zero-modes and their space-time localisation with respect to
instantons in the underlying gauge field.

A previously introduced multiboson technique for the simulation of QCD with
dynamical quarks is described and some results of first test runs on a
$6^3\times12$ lattice with Wilson quarks and gauge group SU(2) are reported.

We calculate direct CP-violating rate asymmetries in charged $B\to PP$ and
$B\to VP$ decays arising from the interference of amplitudes with different
strong and CKM phases. The perturbative strong phases develop at order
$\alpha_s$ from absorptive parts of one-loop matrix elements of the
next-to-leading logarithm corrected effective Hamiltonian. CPT constraints are
maintained. Based on this model, we find that partial rate asymmetries between
charge conjugate $B^{\pm}$ decays can be as high as 20\% for certain channels
with branching ratios in the $10^{-6}$ range. Because the $c\bar{c}$ threshold
lies so close to the physical momentum scale, the asymmetries depend
sensitively on the model assumptions used to evaluate the imaginary parts of
the matrix elements, in particular, on the internal momentum transfer. The
charge asymmetries of partial rates would provide unambiguous evidence for
direct CP violation.
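
The mechanism can be made explicit with two interfering amplitudes. The sketch below assumes a decay amplitude $A = a_1 e^{i(\delta_1+\phi_1)} + a_2 e^{i(\delta_2+\phi_2)}$, with strong phases $\delta_i$ and weak (CKM) phases $\phi_i$, and a CP-conjugate amplitude in which only the $\phi_i$ change sign. The magnitudes and phases are made-up inputs, not values from this calculation, and the overall sign convention of the asymmetry is likewise an assumption. The code compares the asymmetry computed directly from the amplitudes with the closed form $-2a_1a_2\sin\Delta\delta\,\sin\Delta\phi\,/\,(a_1^2+a_2^2+2a_1a_2\cos\Delta\delta\,\cos\Delta\phi)$.

```python
import cmath
import math


def amplitude(mags, strong, weak, conjugate=False):
    """Sum of a_k * exp(i(delta_k +/- phi_k)); weak phases flip under CP."""
    sign = -1.0 if conjugate else 1.0
    return sum(a * cmath.exp(1j * (d + sign * p))
               for a, d, p in zip(mags, strong, weak))


def rate_asymmetry(mags, strong, weak):
    """(|A|^2 - |Abar|^2) / (|A|^2 + |Abar|^2) from the amplitudes."""
    rate = abs(amplitude(mags, strong, weak)) ** 2
    rate_bar = abs(amplitude(mags, strong, weak, conjugate=True)) ** 2
    return (rate - rate_bar) / (rate + rate_bar)


def closed_form(mags, strong, weak):
    """Two-amplitude closed form in terms of the phase differences."""
    a1, a2 = mags
    dd = strong[0] - strong[1]  # strong-phase difference
    dp = weak[0] - weak[1]      # weak-phase difference
    num = -2.0 * a1 * a2 * math.sin(dd) * math.sin(dp)
    den = a1**2 + a2**2 + 2.0 * a1 * a2 * math.cos(dd) * math.cos(dp)
    return num / den


# Hypothetical inputs: magnitudes, strong phases, weak phases (radians).
MAGS = (1.0, 0.3)
STRONG = (0.0, 0.4)
WEAK = (0.0, 1.2)

if __name__ == "__main__":
    print(rate_asymmetry(MAGS, STRONG, WEAK), closed_form(MAGS, STRONG, WEAK))
```

The asymmetry vanishes unless both the strong-phase difference and the weak-phase difference are nonzero, which is why the absorptive parts of the matrix elements are essential.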

We calculate CP-violating rate asymmetries in the rare radiative decays
$B^\pm\rightarrow K^{\ast\pm} \gamma$ and $B^\pm\rightarrow \rho^\pm \gamma$.
They arise because of the interference between leading-order penguin amplitudes
and one-gluon corrections with absorptive phases, and provide unambiguous
evidence for direct CP violation. Complementing earlier studies, we also
investigate gluon exchange with the `spectator' quark. The bound state effects
in the exclusive matrix elements are taken into account by a covariant model,
which yields a branching ratio $BR(B\rightarrow K^\ast \gamma) = (4-5)\times
10^{-5}$ in good agreement with the observed value. The bound state effects
increase the CP asymmetry, which is of order $1 \%$ in the channel
$B\rightarrow K^\ast \gamma$ and $15 \%$ for $B \to \rho \gamma$.