The Knights Landing (KNL) update of Intel's Xeon Phi product line is now available. For the applications I'm primarily interested in, namely the numerical solution of partial differential equation, the typical bottleneck is memory bandwidth. To assess memory bandwidth, the STREAM benchmark is the de-facto standard, so let us have a look at how KNL compares to the previous Xeon Phi generation (Knights Corner, KNC) as well as to the Xeon product line.
Continue reading
Tag Archives: xeon phi
Strided Memory Access on CPUs, GPUs, and MIC
Optimization guides for GPUs discuss in length the importance of contiguous ("coalesced", etc.) memory access for achieving high memory bandwidth (e.g. this parallel4all blog post). But how does strided memory access compare across different architectures? Is this something specific to NVIDIA GPUs? Let's shed some light on these questions by some benchmarks. Continue reading
STREAM Benchmark Results on Intel Xeon and Xeon Phi
While the number of floating point operations per second (FLOPs) is often considered to be the primary indicator for achievable performance, in many important application areas the limiting factor nowadays is memory bandwidth (cf. memory wall). The standard benchmark to measure memory bandwidth is the STREAM benchmark. Despite its simplicity of 'just simple vector operations', the benchmark is a very helpful indicator for actual application performance. Continue reading