Category Archives: Software Development

Everything related to programming tricks, helpful tools, potential pitfalls, or portability issues.

Computational Science And Engineering Software Sustainability And Productivity (CSESSP) Challenges

Is writing code essential for your everyday work, be it research, development, or engineering? Did you have to take a coding shortcut here or there to get the task done? Do you think that your code could use a bit more polishing before handing it out to other people? If you answered yes to all of these questions, you're not alone! Continue reading →

FWF-Project: 3D Solution of the Boltzmann Equation on Supercomputers

The Austrian Science Fund (FWF) approved my project proposal entitled "3D Solution of the Boltzmann Equation on Supercomputers". This project will fund my scientific work for three more years, with a prospective start in mid-2017. Here is a brief summary of what this project is about. Continue reading →

Latency Comparison of Lua, OpenCL, and native C/C++

Just-in-time compilation is an appealing technique for producing optimized code at run time rather than at compile time. In an earlier post I already looked into the just-in-time compilation overhead of various OpenCL SDKs. This blog post looks into the cost of launching OpenCL kernels on the CPU and compares it with the cost of calling a plain C/C++ function through a function pointer and with the cost of calling a precompiled Lua script. Continue reading →
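To make the baseline of this comparison concrete, here is a rough sketch of how the cost of calling a plain C function through a function pointer can be measured. It assumes a POSIX system providing clock_gettime(); the names avg_call_ns and add_one are hypothetical and not taken from the post. The volatile qualifiers are there to keep the compiler from inlining the call or removing the loop.

```c
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime() and CLOCK_MONOTONIC */
#include <assert.h>
#include <time.h>

/* Hypothetical trivial callee: the work is negligible, so timing is dominated by call overhead. */
int add_one(int x) { return x + 1; }

/* Hypothetical helper: calls fn n_calls times through a function pointer
   and returns the average wall-clock time per call in nanoseconds.
   The final accumulated value is stored in *result so the work is observable. */
double avg_call_ns(int (*fn)(int), long n_calls, int *result)
{
  assert(n_calls > 0);
  struct timespec t0, t1;
  volatile int acc = 0;            /* volatile: the loop cannot be optimized away */
  int (* volatile fp)(int) = fn;   /* volatile pointer: forces a real indirect call */

  clock_gettime(CLOCK_MONOTONIC, &t0);
  for (long i = 0; i < n_calls; ++i)
    acc = fp(acc);
  clock_gettime(CLOCK_MONOTONIC, &t1);

  *result = acc;
  double elapsed_ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
                    + (double)(t1.tv_nsec - t0.tv_nsec);
  return elapsed_ns / (double)n_calls;
}
```

Calling avg_call_ns(add_one, 1000000, &r) yields the per-call cost on the machine at hand; the absolute numbers depend heavily on compiler, flags, and CPU, so they are not quoted here.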

Multi-Threading in C/C++: Implications on Software Library Design

As parallelism increases in response to stagnating clock frequencies, software libraries are pushed towards multi-threading. However, there are several different threading approaches out there: the most popular in the C/C++ world are POSIX Threads (pthreads), OpenMP, and C++11 threads. Clearly, a good software library does not enforce the use of one particular approach, but is able to deal with (almost) any multi-threading approach. In this blog post I will discuss a possible software library design to achieve this. Continue reading →

Raspberry Pi: Interfacing Honeywell Humidity and Temperature Sensors

Recently I was toying with a Raspberry Pi 2 and other hardware to get a better idea of the current state of the Internet of Things. Among other sensors, I also looked into a Honeywell HIH8131 sensor (around 25 Euros, obtained from Reichelt). Unfortunately, none of the solutions I found on the web for reading the sensor worked for me, so I eventually dug into the low-level details of communicating over the I2C bus through the Linux kernel. And I enjoyed it! Continue reading →
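The post's own code is not reproduced here. As a small sketch of the final step only, the following decodes the four bytes the sensor returns, assuming the data layout and conversion formulas from Honeywell's HumidIcon (HIH6000/HIH8000 series) I2C communication note: two status bits followed by a 14-bit humidity value, then a 14-bit temperature value, both scaled by 2^14 - 2. The bus access itself (opening a /dev/i2c-* device node, issuing the measurement request, and reading the reply) is deliberately left out; hih_reading and hih_decode are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for one decoded sensor reading. */
typedef struct {
  int status;          /* top two bits of byte 0: 0 = valid data, 1 = stale data */
  double humidity;     /* relative humidity in percent */
  double temperature;  /* temperature in degrees Celsius */
} hih_reading;

/* Decodes the 4-byte reply of a HumidIcon sensor (assumed layout, see lead-in). */
hih_reading hih_decode(const uint8_t buf[4])
{
  hih_reading r;
  r.status = buf[0] >> 6;
  /* Humidity: low 6 bits of byte 0 and all of byte 1 form a 14-bit value. */
  unsigned raw_h = ((unsigned)(buf[0] & 0x3F) << 8) | buf[1];
  /* Temperature: all of byte 2 and the upper 6 bits of byte 3 form a 14-bit value. */
  unsigned raw_t = ((unsigned)buf[2] << 6) | (buf[3] >> 2);
  r.humidity    = raw_h / 16382.0 * 100.0;         /* 16382 = 2^14 - 2 */
  r.temperature = raw_t / 16382.0 * 165.0 - 40.0;  /* maps to -40 ... +125 degrees C */
  assert(r.status <= 3);
  return r;
}
```

Checking the status bits before using a reading matters in practice: reading the sensor again before a new measurement has completed returns stale data.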

Sparse Matrix Transposition: Data Structure Performance Comparison

While processor manufacturers repeatedly emphasize the importance of their latest innovations such as vector extensions (AVX, AVX2, etc.) of the processing elements, proper placement of data in memory is at least equally important. At the same time, generic implementations of many different data structures make it easy to (re)use the most appealing one quickly. However, the intuitively most appropriate data structure may not be the fastest. Continue reading →
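For reference, here is a sketch of one common way to transpose a matrix stored in compressed sparse row (CSR) format: count the entries per column, turn the counts into row offsets of the transpose with a prefix sum, and then scatter the entries. This is a standard textbook approach chosen for illustration, not necessarily one of the data structures benchmarked in the post; csr_matrix and csr_transpose are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical compressed sparse row (CSR) matrix type. */
typedef struct {
  int rows, cols;
  int *row_ptr;   /* length rows + 1; row i occupies [row_ptr[i], row_ptr[i+1]) */
  int *col_idx;   /* length nnz */
  double *val;    /* length nnz */
} csr_matrix;

/* Writes the transpose of A into At, allocating At's arrays (caller frees them).
   Returns 0 on success, -1 if an allocation fails. */
int csr_transpose(const csr_matrix *A, csr_matrix *At)
{
  assert(A->rows >= 0 && A->cols >= 0);
  int nnz = A->row_ptr[A->rows];
  At->rows = A->cols;
  At->cols = A->rows;
  At->row_ptr = calloc((size_t)A->cols + 1, sizeof(int));
  At->col_idx = malloc(((size_t)nnz + 1) * sizeof(int));
  At->val     = malloc(((size_t)nnz + 1) * sizeof(double));
  int *next   = malloc(((size_t)A->cols + 1) * sizeof(int));
  if (!At->row_ptr || !At->col_idx || !At->val || !next) {
    free(At->row_ptr); free(At->col_idx); free(At->val); free(next);
    return -1;
  }

  /* Pass 1: count nonzeros per column of A (= per row of At). */
  for (int k = 0; k < nnz; ++k)
    At->row_ptr[A->col_idx[k] + 1]++;
  /* Prefix sum turns the counts into row offsets of At. */
  for (int j = 0; j < A->cols; ++j)
    At->row_ptr[j + 1] += At->row_ptr[j];

  /* Pass 2: scatter; next[j] is the next free slot in row j of At. */
  memcpy(next, At->row_ptr, (size_t)A->cols * sizeof(int));
  for (int i = 0; i < A->rows; ++i)
    for (int k = A->row_ptr[i]; k < A->row_ptr[i + 1]; ++k) {
      int dst = next[A->col_idx[k]]++;
      At->col_idx[dst] = i;
      At->val[dst] = A->val[k];
    }
  free(next);
  return 0;
}
```

Note that the scatter in the second pass writes to memory locations determined by the column indices, so its access pattern is irregular; this is exactly where the choice of data structure and its memory layout can dominate performance.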

Strided Memory Access on CPUs, GPUs, and MIC

Optimization guides for GPUs discuss at length the importance of contiguous ("coalesced", etc.) memory access for achieving high memory bandwidth (e.g. this Parallel Forall blog post). But how does strided memory access compare across different architectures? Is this something specific to NVIDIA GPUs? Let's shed some light on these questions with some benchmarks. Continue reading →
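On the CPU side, strided access boils down to a loop like the following minimal sketch (strided_sum is a hypothetical name, not the post's benchmark kernel). Assuming 8-byte doubles and 64-byte cache lines, as on typical x86 CPUs, a stride of 1 uses all eight values of every cache line loaded, whereas a stride of 8 or more uses only one value per line, so most of the transferred bandwidth is wasted.

```c
#include <assert.h>
#include <stddef.h>

/* Sums n elements of x, reading every stride-th element.
   x must hold at least (n - 1) * stride + 1 elements. */
double strided_sum(const double *x, size_t n, size_t stride)
{
  assert(stride > 0);
  double sum = 0.0;
  for (size_t i = 0; i < n; ++i)
    sum += x[i * stride];   /* stride 1: contiguous; larger strides skip data in each cache line */
  return sum;
}
```

Timing this loop for a fixed n and increasing stride on a large array exposes the bandwidth drop; the same idea carries over to GPU kernels, where each thread reads x[tid * stride].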

OpenCL Just-In-Time (JIT) Compilation Benchmarks

The beauty of the vendor-independent OpenCL standard is that a single kernel language is sufficient to program many different architectures, ranging from dual-core CPUs and Intel's Many Integrated Core (MIC) architecture to GPUs and even FPGAs. The kernels are just-in-time compiled while the program runs, which has several advantages and disadvantages. An incomplete list is as follows:

Advantage: Binary can be fully optimized for the underlying hardware
Advantage: High portability
Disadvantage: Just-in-Time compilation induces overhead
Disadvantage: No automatic performance portability

Today's blog post is about just-in-time (JIT) compilation overhead. Ideally, JIT compilation would be infinitely fast. In practice, it is sufficient to keep the JIT compilation time small compared to the overall execution time. But what is 'small'?

Continue reading →

GPU Research Center at TU Wien

Today it was announced that TU Wien hosts an NVIDIA GPU Research Center, for which Josef Weinbub, Florian Rudolf, and I are PIs. The agenda includes improvements to ViennaCL as well as PETSc, both open source libraries I'm actively involved in. In addition to continued, incremental improvements, we will also look into two interesting research questions related to the numerical solution of partial differential equations. Continue reading →

Mentored Project Ideas for GSoC 2014

Our organization Computational Science and Engineering at TU Wien was selected for the Google Summer of Code 2014. Within our organization, a couple of great open source software projects hosted at TU Wien are reaching out to students all over the world for work on free scientific software over the summer. The application deadline for students is March 21, 2014. The funding provided by Google for the students is again highly appreciated 🙂

This year I'm again mentoring project ideas for ViennaCL, which I'll describe briefly in the following: Continue reading →

Karl Rupp

Computational Scientist