Welcome to my series of blog posts on engineering software for accelerated systems! Special-purpose hardware designed to execute certain computations (prominently, GPUs designed to execute graphics computations) is expected to provide better performance than general-purpose hardware (prominently, CPUs). Better performance typically means faster, or accelerated, execution, but often means lower energy consumption as well. Of course, such expectations only hold if the software is also up to scratch.
By way of introduction, I have been working on software for accelerated systems for over ten years: first with CPU vector extensions such as ARM® NEON™ technology, then with vector co-processors such as ClearSpeed CSX and Cell SPE, and more recently with GPUs supporting general-purpose parallel computation, such as ARM® Mali™ GPUs. Along the way, I have gone from just using vendor-specific APIs to both implementing and using vendor-independent standards such as OpenCL™. I have also worked in both academia and industry, which is bound to affect what I am going to write about.
I am writing for engineering-minded readers, so you should expect facts and informed opinions: no hype, no politics.
I am here to tell you that there is a better way of engineering software for accelerated systems.