A high-quality publication not only has good content but also takes care of the tiny details. An earlier blog post looked at how to embed fonts in a PDF; today we look at PDF metadata, which specifies properties such as the author, the title, a subject, and keywords. Setting the PDF metadata correctly makes it easier for search engines to find and correctly present your work, so the few minutes it takes are time well spent.
GPU Memory Bandwidth vs. Thread Blocks (CUDA) / Workgroups (OpenCL)
The massive parallelism of GPUs provides ample performance for certain algorithms in scientific computing. At the same time, however, Amdahl's Law imposes limits on the performance gains possible from parallelization. Thus, let us look in this blog post at how *few* threads one can launch on GPUs while still getting good performance (here: memory bandwidth).
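To make the limit imposed by Amdahl's Law concrete, here is a minimal Python sketch of the textbook formula: if a fraction f of the runtime can be parallelized over p workers, the speedup is 1 / ((1 - f) + f / p). The function names and the sample values are illustrative assumptions and are not taken from the benchmarks in the post.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's Law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is unaffected; only the parallel part shrinks with workers.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on the speedup as the number of workers grows without bound."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if parallel_fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - parallel_fraction)


if __name__ == "__main__":
    # Hypothetical workload: 90 percent of the runtime parallelizes.
    for p in (1, 32, 1024, 32768):
        print(p, amdahl_speedup(0.9, p))
    print("limit", amdahl_limit(0.9))
```

The remaining serial fraction caps the speedup no matter how many threads are launched, which is why launching only as many threads as needed to saturate memory bandwidth can be the better trade-off.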
OpenCL Just-In-Time (JIT) Compilation Benchmarks
The beauty of the vendor-independent standard OpenCL is that a single kernel language is sufficient to program many different architectures, ranging from dual-core CPUs and Intel's Many Integrated Core (MIC) architecture to GPUs and even FPGAs. The kernels are just-in-time compiled during the program run, which has several advantages and disadvantages. An incomplete list is as follows:
- Advantage: Binary can be fully optimized for the underlying hardware
- Advantage: High portability
- Disadvantage: Just-in-Time compilation induces overhead
- Disadvantage: No automatic performance portability
Today's blog post is about just-in-time (JIT) compilation overhead. Ideally, JIT compilation would be infinitely fast. In reality, it is sufficient to keep the JIT compilation time small compared to the overall execution time. But what is 'small'?
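One way to put a number on 'small' is to look at the share of the total runtime spent in compilation. The following minimal Python sketch assumes a simplified model in which the kernel is compiled once and then launched repeatedly with a constant runtime per launch; the function names and the timings in the usage example are hypothetical and are not measurements from the post.

```python
import math


def jit_overhead_fraction(jit_seconds: float, kernel_seconds: float, launches: int) -> float:
    """Fraction of total runtime spent on a one-time JIT compilation."""
    if jit_seconds < 0 or kernel_seconds <= 0 or launches < 1:
        raise ValueError("need jit_seconds >= 0, kernel_seconds > 0, launches >= 1")
    total = jit_seconds + launches * kernel_seconds
    return jit_seconds / total


def launches_to_amortize(jit_seconds: float, kernel_seconds: float, max_fraction: float) -> int:
    """Smallest launch count for which the JIT share drops to max_fraction or below."""
    if not 0.0 < max_fraction < 1.0:
        raise ValueError("max_fraction must lie strictly between 0 and 1")
    if jit_seconds < 0 or kernel_seconds <= 0:
        raise ValueError("need jit_seconds >= 0 and kernel_seconds > 0")
    # Solve jit / (jit + n * kernel) <= max_fraction for n.
    needed = jit_seconds * (1.0 - max_fraction) / (max_fraction * kernel_seconds)
    return max(1, math.ceil(needed))


if __name__ == "__main__":
    # Hypothetical timings: 1 s of compilation, 0.25 s per kernel launch.
    n = launches_to_amortize(1.0, 0.25, 0.5)
    print(n, jit_overhead_fraction(1.0, 0.25, n))
```

Under this model, a long-running simulation amortizes the compilation quickly, whereas a short script that launches a kernel only a handful of times may spend most of its time compiling.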
Embed all fonts in PDFs generated with LaTeX or PDFLaTeX
For high-quality publications it is mandatory to embed all fonts in the respective PDF. If a PDF does not embed all fonts, the target system may replace the respective font with the 'best' available system font, so the document is almost certain to look different on different machines. Not quite what you want from a portable document standard, is it?
In the following I will explain how you can make sure that all fonts are embedded in your LaTeX documents (journal papers, conference contributions, flyers, etc.). Some tricks will even apply to PDFs in general and not be specific to LaTeX. There is also a small tarball available for download with all the details so that you can reproduce (and use for copy&paste) the subsequent discussion.
My Science Blog with Updates each Week in 2016
A vital part of science is to establish new knowledge, which only becomes accessible to the community through communication. Journal papers, conference contributions, etc. are traditional means to communicate new knowledge, but other communication channels have emerged through the Internet. One of these new means of communication is the blog, just like the one you are currently reading. My New Year's resolution for 2016 is to publish at least one blog post per calendar week, mostly covering the topics discussed in the following.
CfP: Intl. Workshop on OpenCL 2016 (IWOCL 2016)
The International Workshop on OpenCL (IWOCL) is an annual meeting bringing together the experts on OpenCL, an open standard for programming heterogeneous parallel computing systems. In 2016 IWOCL will run its fourth installment from April 19 to 21 in Vienna, Austria, and as local chair of IWOCL 2016 I'm proud to share the IWOCL 2016 Call for Papers, Technical Submissions, Tutorials and Posters.
GPU Research Center at TU Wien
Today it was announced that TU Wien hosts an NVIDIA GPU Research Center, for which Josef Weinbub, Florian Rudolf, and I are PIs. The agenda includes improvements to ViennaCL as well as PETSc, both open source libraries I'm actively involved in. In addition to continued, incremental improvements, we will also look into two interesting research questions related to the numerical solution of partial differential equations.
40 Years of Microprocessor Trend Data
One of the most popular plots when it comes to technological advancements in microprocessors in general and Moore's Law in particular is a plot entitled 35 Years of Microprocessor Trend Data, based on data by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. Later, trend lines with some (speculative) extrapolation were added by C. Moore. One can find the plot with and without trend lines at various locations on the web (and further down). However, the plot suffers from the passage of time: data is only plotted up to the year 2010, so the last five years are missing.
STREAM Benchmark Results on Intel Xeon and Xeon Phi
While the number of floating point operations per second (FLOPS) is often considered to be the primary indicator for achievable performance, in many important application areas the limiting factor nowadays is memory bandwidth (cf. memory wall). The standard benchmark to measure memory bandwidth is the STREAM benchmark. Although it consists of 'just simple vector operations', the benchmark is a very helpful indicator for actual application performance.
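As a rough illustration of how a STREAM-style bandwidth figure is derived, the following Python sketch converts a measured kernel time into GB/s. It follows the usual STREAM counting convention for double-precision arrays (Copy and Scale touch two arrays per element, Add and Triad touch three). The function name and the sample numbers are hypothetical; the timings themselves would have to come from a compiled benchmark run, since a pure Python loop would measure interpreter overhead rather than memory bandwidth.

```python
BYTES_PER_DOUBLE = 8

# Arrays read or written per element for each STREAM kernel:
#   copy:  a[i] = b[i]            scale: a[i] = q * b[i]
#   add:   a[i] = b[i] + c[i]     triad: a[i] = b[i] + q * c[i]
ARRAYS_TOUCHED = {"copy": 2, "scale": 2, "add": 3, "triad": 3}


def stream_bandwidth_gb_per_s(kernel: str, n: int, seconds: float) -> float:
    """Memory bandwidth in GB/s (1 GB = 1e9 bytes) for one kernel sweep over n elements."""
    if kernel not in ARRAYS_TOUCHED:
        raise ValueError(f"unknown STREAM kernel: {kernel!r}")
    if n < 1 or seconds <= 0:
        raise ValueError("need n >= 1 and seconds > 0")
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_DOUBLE * n
    return bytes_moved / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical measurement: a triad over 10 million doubles taking 0.5 s.
    print(stream_bandwidth_gb_per_s("triad", 10_000_000, 0.5))
```

Note that this accounting ignores effects such as write-allocate traffic, so the bandwidth actually moved by the hardware can be higher than the reported figure.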
IuE Summer of Code 2015
In response to the success of our organization Computational Science and Engineering at TU Wien at recent editions of the Google Summer of Code, I'm happy to announce the IuE Summer of Code 2015!
The IuE Summer of Code will be similar in spirit to the Google Summer of Code, but with a focus on students from TU Wien, particularly from electrical engineering (since the Institute for Microelectronics (IuE) is part of the faculty of electrical engineering and IT). The range of projects includes semiconductor device simulation, high performance computing, and meshing, to name a few. Full details will be announced at the Info-Meeting on March 9, 2015, at 15:00 in lecture hall EI 3A.