High-performance SIMD implementation of the lattice-Boltzmann method on the Xeon Phi processor

A1 Original article in a scientific journal (peer-reviewed)

Internal authors/editors

Publication authors: Fredrik Robertsén, Keijo Mattila, Jan Westerholm
Publisher: John Wiley & Sons
Publication year: 2019
Journal: Concurrency and Computation: Practice and Experience
Volume: 31
Issue: 13


We present a high‐performance implementation of the lattice‐Boltzmann
method (LBM) on the Knights Landing generation of Xeon Phi. The Knights
Landing architecture includes 16 GB of high‐speed memory (MCDRAM) with a
reported bandwidth of over 400 GB/s, and a subset of the AVX‐512 single
instruction, multiple data (SIMD) instruction set. We explain five
critical implementation aspects for high performance on this
architecture: (1) the choice of an appropriate LBM algorithm, (2) a suitable
data layout, (3) vectorization of the computation, (4) data prefetching,
and (5) running our LBM simulations exclusively from the MCDRAM. The
effects of these implementation aspects on the computational performance
are demonstrated with the lattice‐Boltzmann scheme involving the D3Q19
discrete velocity set and the two‐relaxation‐time (TRT) collision operator. In our benchmark
simulations of fluid flow through porous media, using double‐precision
floating‐point arithmetic, the observed performance exceeds 960 million
fluid lattice site updates per second.
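
To put the reported figures in context, the following is a minimal sketch, not the authors' implementation. It assumes a structure-of-arrays layout, which is one common choice for SIMD-friendly LBM codes; the abstract does not say which layout was used. It also assumes that each site update reads and writes every one of the 19 distribution values exactly once, with no extra write-allocate traffic. The helper names `soa_index`, `bytes_per_update`, and `max_mlups` are hypothetical.

```c
#include <stddef.h>

enum { Q = 19 };  /* number of discrete velocities in D3Q19 */

/* Structure-of-arrays index (assumed layout): all lattice sites for one
 * direction q are stored contiguously, so consecutive sites map to
 * consecutive memory addresses and can fill one SIMD register. */
static size_t soa_index(size_t site, size_t q, size_t n_sites) {
    return q * n_sites + site;
}

/* Bytes moved per site update, assuming one read and one write of each
 * of the Q distribution values (an idealized traffic model). */
static double bytes_per_update(size_t bytes_per_value) {
    return 2.0 * Q * (double)bytes_per_value;
}

/* Bandwidth-bound upper limit, in million lattice site updates per
 * second, for a given memory bandwidth in GB/s (1 GB = 1e9 bytes). */
static double max_mlups(double bandwidth_gb_per_s, size_t bytes_per_value) {
    return bandwidth_gb_per_s * 1e9 / bytes_per_update(bytes_per_value) / 1e6;
}
```

Under these assumptions, a double-precision update moves 304 bytes, so `max_mlups(400.0, sizeof(double))` gives a bound of roughly 1316 million updates per second at 400 GB/s. The reported 960 million updates per second would then correspond to about 73% of that idealized bound. The real bound depends on the actual MCDRAM bandwidth and on the algorithm's true memory traffic.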


Xeon Phi

Last updated 2020-25-02 at 04:13