As parallelism increases in response to stagnating clock frequencies, software libraries are being pushed towards multi-threading. However, there are several different threading approaches; the most popular in the C/C++ world are POSIX Threads (pthread), OpenMP, and C++11 threads. A good software library should not enforce one particular approach, but should be able to work with (almost) any of them. In this blog post I will discuss a possible software library design to achieve this.
Tag Archives: OpenMP
Strided Memory Access on CPUs, GPUs, and MIC
Optimization guides for GPUs discuss at length the importance of contiguous ("coalesced", etc.) memory access for achieving high memory bandwidth (e.g. this parallel4all blog post). But how does strided memory access compare across different architectures? Is this something specific to NVIDIA GPUs? Let's shed some light on these questions with some benchmarks.
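To make the question concrete, here is a minimal CPU-side sketch of what a strided-access measurement could look like. It is not the benchmark code from the post: the names `strided_copy` and `measure_bandwidth` are hypothetical, and the single untimed-warmup-free run is a simplification. A stride of 1 corresponds to contiguous access; larger strides touch only every `stride`-th element.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Copies n elements from x to y, touching only every `stride`-th entry.
// stride == 1 is the contiguous case. (Hypothetical helper for illustration.)
void strided_copy(const std::vector<double>& x, std::vector<double>& y,
                  std::size_t n, std::size_t stride) {
    // Both arrays must be large enough for the strided index range.
    assert(x.size() >= n * stride && y.size() >= n * stride);
    for (std::size_t i = 0; i < n; ++i)
        y[i * stride] = x[i * stride];
}

// Returns the effective bandwidth in GB/s for one strided copy of n doubles.
// "Effective" counts only the bytes the kernel asks for (one read and one
// write per element), not the cache lines the hardware actually moves.
// (Assumption: a single timed run; a real benchmark would repeat and average.)
double measure_bandwidth(std::size_t n, std::size_t stride) {
    std::vector<double> x(n * stride, 1.0);
    std::vector<double> y(n * stride, 0.0);

    auto start = std::chrono::steady_clock::now();
    strided_copy(x, y, n, stride);
    auto stop = std::chrono::steady_clock::now();

    double seconds = std::chrono::duration<double>(stop - start).count();
    double bytes = 2.0 * static_cast<double>(n) * sizeof(double);
    return bytes / seconds / 1e9;
}
```

Calling `measure_bandwidth(n, stride)` for a fixed, large `n` and strides of 1, 2, 4, and so on gives one data point per stride. Keeping `n` fixed while growing the arrays with the stride means the same number of elements is copied each time, so any drop in the reported number reflects the access pattern rather than the amount of useful work.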