ARM has just announced the submission to Khronos of its OpenCL 1.1 Full Profile conformance test results for the Mali™-T604 GPU. This makes ARM the first GPU IP vendor to submit results for Full Profile. Yes, Full Profile, not Embedded Profile. We did not see the need to compromise, and there are many good reasons for that, which I'll try to explain in this blog.
Full Profile is the ubiquitous implementation today: all desktops, laptops and servers that support OpenCL implement Full Profile. This means that all OpenCL developers produce code targeting Full Profile. This is what they have been accustomed to ever since OpenCL was first introduced over 4 years ago.
Existing OpenCL software has been written for Full Profile platforms and therefore assumes the presence of a large number of features that are optional in Embedded Profile. The list of optional features is detailed in the Khronos specification. The most important ones include:
Native support for 64-bit integer maths (including vector data types and operations). Implementing 64-bit arithmetic directly in hardware on the Mali-T600 Series of GPUs makes it radically faster and more efficient than software emulation. This helps in areas such as pointer arithmetic for large address spaces (making you future-proof for the post-4 GByte world), and can benefit many applications including multimedia encoders, decoders and encryption software.
Hardware accelerated support for 3D images. Great for volumetric modelling.
Compliance with IEEE 754-2008 precision. That means your code will get the same accuracy on a Mali-T600 Series GPU as on any other Full Profile conformant platform.
Built-in atomic functions accelerated in hardware. OpenCL is about parallel computation, a world where atomic operations (and barriers/fences, which are also implemented in hardware on Mali-T600) are commonplace. Resolving this in hardware is far more efficient than any sort of emulation or expensive external memory synchronization.
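To make the 64-bit item in the list above a little more concrete, here is a minimal sketch in plain C of what software emulation of a 64-bit addition looks like when only 32-bit integer operations are available. The type `emu_u64` and the `emu_*` functions are hypothetical names invented for this illustration; they are not part of OpenCL or of any Mali driver, and how a real compiler lowers 64-bit arithmetic on a 32-bit device is an assumption here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration type: a 64-bit value held as two 32-bit
   halves, as a device without native 64-bit integers would store it. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} emu_u64;

/* Split a native 64-bit value into its two halves. */
emu_u64 emu_from_u64(uint64_t v) {
    emu_u64 r;
    r.lo = (uint32_t)v;
    r.hi = (uint32_t)(v >> 32);
    return r;
}

/* Reassemble the two halves into a native 64-bit value. */
uint64_t emu_to_u64(emu_u64 v) {
    return ((uint64_t)v.hi << 32) | v.lo;
}

/* Emulated 64-bit add: two 32-bit adds, a compare to detect the
   carry out of the low half, and a third add to apply it. */
emu_u64 emu_add(emu_u64 a, emu_u64 b) {
    emu_u64 r;
    r.lo = a.lo + b.lo;
    uint32_t carry = (r.lo < a.lo) ? 1u : 0u; /* unsigned wrap-around signals a carry */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* Native 64-bit add: a single operation when the hardware supports it. */
uint64_t native_add(uint64_t a, uint64_t b) {
    return a + b;
}

/* Self-check: the emulated result must match the native one. */
void emu_self_check(uint64_t a, uint64_t b) {
    emu_u64 sum = emu_add(emu_from_u64(a), emu_from_u64(b));
    assert(emu_to_u64(sum) == native_add(a, b));
}
```

For example, adding an offset of `0x20` to an address of `0xFFFFFFF0` crosses the 4 GByte boundary: the emulated path needs the carry logic to arrive at `0x100000010`, whereas the native path is one instruction. Multiplication and division are considerably more expensive to emulate than this simple add.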
The ARM OpenCL compiler provides support for both online and offline compilation, and is suitable for both resource-limited mobile devices and larger plugged-in compute systems. Offline compilation can also be used as a development tool to help programmers optimize their code.
An OpenCL Full Profile driver is more attractive to developers. It makes life easier by providing a functional baseline: new developers do not need to worry about which features are supported and which are not, what performance they may get, or what precision may be supported. Full Profile lowers the barrier to entry and reduces cost for developers.
And if you have Embedded Profile code, that is not a problem: it will of course run seamlessly on the whole Mali-T600 family of GPUs.
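The compatibility rule described above can be sketched as a small helper in C. The OpenCL specification defines the profile strings `FULL_PROFILE` and `EMBEDDED_PROFILE`, which an application can query with `clGetDeviceInfo` and `CL_DEVICE_PROFILE`. The function `profile_can_run` and the macro names are hypothetical, made up for this illustration, and the sketch assumes the simplified rule from this blog: a Full Profile device runs code written for either profile, while an Embedded Profile device is only guaranteed to run Embedded Profile code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Profile strings as defined by the OpenCL specification. */
#define PROFILE_FULL     "FULL_PROFILE"
#define PROFILE_EMBEDDED "EMBEDDED_PROFILE"

/* Hypothetical helper: returns true if code written against
   `code_profile` can rely on a device that reports `device_profile`.
   Unrecognised profile strings are treated as incompatible. */
bool profile_can_run(const char *device_profile, const char *code_profile) {
    assert(device_profile != NULL && code_profile != NULL);

    bool dev_full  = strcmp(device_profile, PROFILE_FULL) == 0;
    bool dev_emb   = strcmp(device_profile, PROFILE_EMBEDDED) == 0;
    bool code_full = strcmp(code_profile, PROFILE_FULL) == 0;
    bool code_emb  = strcmp(code_profile, PROFILE_EMBEDDED) == 0;

    /* Reject anything that is not one of the two known profiles. */
    if (!(dev_full || dev_emb) || !(code_full || code_emb)) {
        return false;
    }

    /* A Full Profile device covers both cases; an Embedded Profile
       device only covers Embedded Profile code. */
    return dev_full || code_emb;
}
```

In a real application the `device_profile` string would come from the OpenCL runtime rather than a literal. On an Embedded Profile device, code that needs individual optional features would also have to check for each one, which is exactly the burden a Full Profile baseline removes.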
Well, we are very excited about it, and today we are proud to announce this landmark achievement: the Mali-T600 series of processors brings mainstream GPU Computing to the embedded and mobile markets... with no compromises.
1 Product is based on a published Khronos Specification, and is expected to pass the Khronos Conformance Testing Process. Current conformance status can be found at www.khronos.org/conformance.