Author Archives: Karl Rupp

42 Years of Microprocessor Trend Data

In the following, I provide a frequently requested update to my earlier 40 Years of Microprocessor Trend Data post. Continue reading →

CfP: High Performance Computing Symposium 2018

Do you have new and exciting research results in the area of high performance computing? Then consider submitting your work to the 26th High Performance Computing Symposium (HPC 2018)! Full paper submissions (12 pages max.) are due on January 8, 2018. Continue reading →

Join the PETSc User Meeting 2017!

Join us for the PETSc User Meeting 2017 in Boulder, Colorado. This meeting will bring together current and prospective users as well as developers of PETSc. As in previous editions, the focus is on interaction and the exchange of ideas on how to get your computations running. Continue reading →

PhD Student Position in Scientific Computing on Many-Core Architectures

My colleague Josef Weinbub and I are looking for a motivated PhD student to join our efforts as a research assistant at the Institute for Microelectronics, TU Wien. If you

have recently completed or expect to soon complete your Master's degree in mathematics, computer science, or a related discipline
have previous exposure to OpenCL or CUDA
enjoy working on open source software

then apply via email to manuela.reinharter@tuwien.ac.at no later than Wednesday, November 9, 2016. Use the code "307.8.2" to reference this position.
Continue reading →

Sparse Matrix-Matrix Multiplication on Intel Xeon and Xeon Phi (KNC, KNL)

The latest incarnation of the Intel Xeon Phi product line, codenamed Knights Landing (KNL), is becoming more broadly available. As such, there is a lot of interest in how it performs, particularly when compared to other contenders in the high performance computing landscape. I posted STREAM benchmark results for KNL earlier on this blog, which outlined the potential benefit of the high bandwidth memory (MCDRAM) of KNL. Let us have a look at a more complicated operation, which is limited neither by raw compute power nor by raw memory bandwidth: sparse matrix-matrix multiplication (also known as sparse matrix-matrix products). Continue reading →

Computational Science And Engineering Software Sustainability And Productivity (CSESSP) Challenges

Is writing code essential for your everyday work, be it research, development, or engineering? Did you have to take a coding shortcut here or there to get the task done? Do you think that your code would need a bit more polishing before handing it out to other people? If you answer all of these questions with yes, you're not alone! Continue reading →

CfP: High Performance Computing Symposium 2017

Do you have new and exciting research results in the area of high performance computing? Then consider submitting your work to the 25th High Performance Computing Symposium (HPC 2017)! The optional abstract submissions are due on October 15, 2016. Full paper submissions (8 pages max.) are due on December 15, 2016. Continue reading →

FLOPs per Cycle for CPUs, GPUs and Xeon Phis

My popular blog post on CPU, GPU and MIC Hardware Characteristics over Time has just received a major update, taking Intel's Knights Landing and NVIDIA's Pascal architecture into account. Moreover, I added a comparison of FLOPs per clock cycle, which I want to discuss in slightly greater depth in this blog post. Continue reading →

Organizing a Conference: My Experiences from the PETSc User Meeting 2016

The PETSc User Meeting 2016 took place in Vienna, Austria, from June 28 to 30, 2016. Overall, the feedback received from the delegates was very positive, which I was glad to hear as the main organizer of the event. In the following, I want to share what I consider to be key factors for success and lessons learnt. Continue reading →

Knights Landing vs. Knights Corner, Haswell, Ivy Bridge, and Sandy Bridge: STREAM benchmark results

The Knights Landing (KNL) update of Intel's Xeon Phi product line is now available. For the applications I'm primarily interested in, namely the numerical solution of partial differential equations, the typical bottleneck is memory bandwidth. To assess memory bandwidth, the STREAM benchmark is the de facto standard, so let us have a look at how KNL compares to the previous Xeon Phi generation (Knights Corner, KNC) as well as to the Xeon product line.
Continue reading →