Is writing code essential for your everyday work, be it research, development, or engineering? Did you have to take a coding shortcut here or there to get the task done? Do you think that your code would need a bit more polishing before handing it out to other people? If you answered yes to all of these questions, you're not alone! Continue reading
Category Archives: Software Development
FWF-Project: 3D Solution of the Boltzmann Equation on Supercomputers
The Austrian Science Fund (FWF) approved my project proposal entitled "3D Solution of the Boltzmann Equation on Supercomputers". This project will fund my scientific work for three more years, with a prospective start in mid-2017. Here is a brief summary of what this project is about. Continue reading
Latency Comparison of Lua, OpenCL, and native C/C++
Just-in-time compilation is an appealing technique for producing optimized code at run time rather than at compile time. In an earlier post I looked into the just-in-time compilation overhead of various OpenCL SDKs. This blog post looks into the cost of launching OpenCL kernels on the CPU and compares it with the cost of calling a plain C/C++ function through a function pointer, and with the cost of calling a precompiled Lua script. Continue reading
Multi-Threading in C/C++: Implications on Software Library Design
With the increase in parallelism in response to a stagnation of clock frequencies, software libraries are pushed towards multi-threading. However, there are several different threading approaches out there: The most popular in the C/C++ world are POSIX Threads (pthread), OpenMP, and C++11 threads. Clearly, a good software library does not enforce the use of one particular approach, but is able to deal with (almost) any multi-threading approach. In this blog post I will discuss a possible software library design to achieve this. Continue reading
Raspberry Pi: Interfacing Honeywell Humidity and Temperature Sensors
Recently I was toying with a Raspberry Pi 2 and other hardware to get a better idea about the current status of the Internet of Things. Among several sensors, I was also looking into a Honeywell HIH8131 sensor (around 25 Euros, obtained from Reichelt). Unfortunately, none of the solutions I found on the web for reading the sensor worked for me, so I finally went down into the low-level details of communicating via the I2C bus through the Linux kernel. And I enjoyed it!
Continue reading
Sparse Matrix Transposition: Datastructure Performance Comparison
While processor manufacturers repeatedly emphasize the importance of their latest innovations such as vector extensions (AVX, AVX2, etc.) of the processing elements, proper placement of data in memory is at least equally important. At the same time, generic implementations of many different data structures allow one to (re)use the most appealing one quickly. However, the intuitively most appropriate data structure may not be the fastest. Continue reading
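To make the operation concrete, here is a minimal sketch of transposing a matrix stored in compressed sparse row (CSR) format. It is not the implementation benchmarked in the post; the function name `transpose_csr` and its argument layout are assumptions chosen for illustration, and plain Python lists stand in for the contiguous arrays a C/C++ library would use.

```python
def transpose_csr(n_rows, n_cols, row_ptr, col_idx, values):
    """Transpose an n_rows x n_cols CSR matrix (hypothetical helper).

    Returns (row_ptr, col_idx, values) of the transposed matrix,
    again in CSR format.
    """
    nnz = len(values)

    # Count the entries in each column of A, i.e. in each row of A^T.
    t_row_ptr = [0] * (n_cols + 1)
    for c in col_idx:
        t_row_ptr[c + 1] += 1

    # A prefix sum turns the counts into row start offsets of A^T.
    for c in range(n_cols):
        t_row_ptr[c + 1] += t_row_ptr[c]

    # Scatter each entry of A into the next free slot of its target row.
    t_col_idx = [0] * nnz
    t_values = [0.0] * nnz
    next_slot = t_row_ptr[:-1]  # slicing yields an independent copy
    for r in range(n_rows):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            c = col_idx[k]
            dst = next_slot[c]
            t_col_idx[dst] = r
            t_values[dst] = values[k]
            next_slot[c] += 1

    return t_row_ptr, t_col_idx, t_values
```

The reads from the input are sequential, but the writes in the scatter loop jump around in memory. That is one reason why data placement matters for this operation.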
Strided Memory Access on CPUs, GPUs, and MIC
Optimization guides for GPUs discuss at length the importance of contiguous ("coalesced", etc.) memory access for achieving high memory bandwidth (e.g. this parallel4all blog post). But how does strided memory access compare across different architectures? Is this something specific to NVIDIA GPUs? Let's shed some light on these questions with some benchmarks. Continue reading
OpenCL Just-In-Time (JIT) Compilation Benchmarks
The beauty of the vendor-independent standard OpenCL is that a single kernel language is sufficient to program many different architectures, ranging from dual-core CPUs through Intel's Many Integrated Core (MIC) architecture to GPUs and even FPGAs. The kernels are just-in-time compiled during the program run, which has several advantages and disadvantages. An incomplete list is as follows:
- Advantage: Binary can be fully optimized for the underlying hardware
- Advantage: High portability
- Disadvantage: Just-in-Time compilation induces overhead
- Disadvantage: No automatic performance portability
Today's blog post is about just-in-time (jit) compilation overhead. Ideally, jit-compilation is infinitely fast. In reality, it is sufficient to keep the jit-compilation time small compared to the overall execution time. But what is 'small'?
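One way to put a number on "small" is the fraction of total run time spent on compilation. The sketch below assumes a simple cost model that is not taken from the post: a one-time compilation cost plus a fixed cost per kernel launch. The function names and the example timings are hypothetical.

```python
import math


def jit_overhead_fraction(jit_seconds, kernel_seconds, launches):
    """Fraction of total run time spent in JIT compilation.

    Assumed model: total = jit_seconds + launches * kernel_seconds.
    """
    total = jit_seconds + launches * kernel_seconds
    return jit_seconds / total


def launches_for_overhead(jit_seconds, kernel_seconds, max_fraction):
    """Smallest launch count that keeps the JIT share <= max_fraction.

    Solving jit / (jit + n * kernel) <= f for n gives
    n >= jit * (1 - f) / (f * kernel).
    """
    needed = jit_seconds * (1.0 - max_fraction) / (max_fraction * kernel_seconds)
    return math.ceil(needed)


# Hypothetical timings: 1 s of compilation, 0.25 s per kernel launch.
share = jit_overhead_fraction(1.0, 0.25, launches=4)
min_launches = launches_for_overhead(1.0, 0.25, max_fraction=0.5)
```

Under this model, the compilation cost is amortized only if the kernels run often enough or long enough. Short-running programs feel the overhead most.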
GPU Research Center at TU Wien
Today it was announced that TU Wien hosts an NVIDIA GPU Research Center, for which Josef Weinbub, Florian Rudolf, and I are PIs. The agenda includes improvements to ViennaCL as well as PETSc, both open source libraries I'm actively involved in. In addition to continued, incremental improvements, we will also look into two interesting research questions related to the numerical solution of partial differential equations. Continue reading
Mentored Project Ideas for GSoC 2014
Our organization, Computational Science and Engineering at TU Wien, was selected for the Google Summer of Code 2014. Within our organization, a couple of great open source software projects hosted at TU Wien are reaching out to students all over the world for work on free scientific software over the summer. The application deadline for students is March 21, 2014. The funding provided by Google for the students is again highly appreciated 🙂
This year I'm again mentoring project ideas for ViennaCL, which I'll describe briefly in the following: Continue reading