At a recent press event in China, Huawei announced its latest flagship smartphone, the Honor 7. This device is special: it marks a major milestone in the adoption of innovative heterogeneous computing technologies.
Initial sales figures indicate the device is doing rather well: a record 200,000 units sold in two minutes in the first flash sale (that is 1,667 handsets a second!), followed by 9 million units pre-ordered in the first week. There are also plans to make it available in Europe, and I look forward to getting my hands on one.
Powering the Honor 7 is the in-house Kirin 935 SoC, featuring an octa-core 64-bit system with big.LITTLE technology (two quad-core ARM Cortex-A53 processor clusters) and a Mali-T628 MP4 GPU.
Among the many leading features, the device includes a 20-megapixel camera with an f/2.0 aperture and phase-detection auto-focus, which allows the camera to focus in just 0.1 seconds, and an 8-megapixel front camera with fixed focus and an f/2.4 aperture. What is really special about this phone is that every time you take a photo, the Mali GPU processes it to improve its appearance.
Huawei has used the standard OpenCL API to offload key image processing steps onto the GPU. Inside the camera stack, the processing is balanced between the CPU and the GPU, and running each step on the processor best suited to it increases efficiency.
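To make the idea concrete, here is a minimal, hypothetical sketch of what such an offload can look like from the host side using the standard OpenCL API. The kernel name (denoise_stage), the single float-buffer image layout and the lack of error handling are all simplifications for illustration; this is not Huawei's actual camera code.

```c
/* Hypothetical sketch: offload one image-processing step to the GPU with the
 * OpenCL host API, leaving the CPU free for the stages that suit it better.
 * Error handling is omitted; kernel name and data layout are illustrative. */
#include <CL/cl.h>
#include <stddef.h>

void run_gpu_stage(const char *kernel_source, float *pixels, size_t count)
{
    cl_platform_id platform;
    cl_device_id   device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    cl_context       ctx   = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, NULL);

    /* Build the kernel from source and create the image buffer. */
    cl_program program = clCreateProgramWithSource(ctx, 1, &kernel_source, NULL, NULL);
    clBuildProgram(program, 1, &device, NULL, NULL, NULL);
    cl_kernel kernel = clCreateKernel(program, "denoise_stage", NULL);

    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                count * sizeof(float), pixels, NULL);
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);

    /* Enqueue the GPU work; the call returns immediately, so the CPU can
     * carry on with other pipeline stages in the meantime. */
    size_t global = count;
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL, 0, NULL, NULL);

    /* ... CPU-side processing of other stages can happen here ... */

    /* Blocking read: collect the result once the GPU has finished. */
    clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, count * sizeof(float), pixels,
                        0, NULL, NULL);

    clReleaseMemObject(buf);
    clReleaseKernel(kernel);
    clReleaseProgram(program);
    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
}
```

The key point is that the GPU enqueue is asynchronous, so the CPU and GPU can each work on the steps they are best suited to, meeting in the middle when the results are needed.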
This is a break from the common approach of using dedicated hardware IP, and it has enabled a market-leading OEM such as Huawei to keep advancing the algorithm with new techniques and optimizations all the way to the device launch. And of course further improvements can still be rolled out with over-the-air updates while devices are in the field, whereas hardware updates are not possible.
ARM and Huawei engineers collaborated very closely on this project. Key algorithms such as de-noising have been ported to the Mali architecture using OpenCL and optimized at both the micro and macro level to operate more efficiently. Now every photo the user takes with this device is processed through the Mali GPU, in real time.
De-noising may sound like a small task; however, it can be a very complex nut to crack, particularly in challenging lighting conditions, which is where most noise occurs and where many photos are taken.
Any implementation of a de-noise pipeline normally includes a mix of common steps, for example Haar feature detection, Gaussian blurring, Sobel operators, bilateral filtering, and down- and up-scaling applied to various channels, all interleaved in more or less complex pipelines depending on the implementation. These types of filter chains can easily exceed 20 stages, and they need to operate on high-resolution images and, critically, in real time and within a sensible power budget. A sketch of one such stage is shown below.
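As an illustration of what one of these stages can look like on the GPU, here is a hypothetical OpenCL C kernel for a 3x3 Gaussian blur over a single channel. The row-major float layout and clamped borders are assumptions made purely for the example; a production de-noise stage would be considerably more sophisticated.

```c
/* Hypothetical sketch of a single filter-chain stage: a 3x3 Gaussian blur
 * over one image channel, written in OpenCL C. One work-item per pixel;
 * borders are clamped; layout is row-major, one float per pixel. */
__kernel void gaussian3x3(__global const float *src,
                          __global float       *dst,
                          const int             width,
                          const int             height)
{
    const int x = get_global_id(0);
    const int y = get_global_id(1);
    if (x >= width || y >= height)
        return;

    /* 3x3 Gaussian weights (1 2 1 / 2 4 2 / 1 2 1), normalised by 16 below. */
    const float w[3][3] = { { 1.0f, 2.0f, 1.0f },
                            { 2.0f, 4.0f, 2.0f },
                            { 1.0f, 2.0f, 1.0f } };

    float sum = 0.0f;
    for (int dy = -1; dy <= 1; dy++) {
        for (int dx = -1; dx <= 1; dx++) {
            const int sx = clamp(x + dx, 0, width  - 1);
            const int sy = clamp(y + dy, 0, height - 1);
            sum += w[dy + 1][dx + 1] * src[sy * width + sx];
        }
    }
    dst[y * width + x] = sum / 16.0f;
}
```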
On paper each individual block would seem suitable for GPU acceleration. A lot of work went into optimizing this pipeline, as well as fine-tuning the interoperation between the CPU and the GPU and integrating the new functionality into the existing device camera framework; one example of that interoperation tuning is sketched below.
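One well-known example of this kind of tuning on a shared-memory SoC with a Mali GPU is avoiding unnecessary copies between the CPU and the GPU: buffers allocated with CL_MEM_ALLOC_HOST_PTR can be mapped on the CPU side instead of being read and written explicitly. The helper below is a hypothetical sketch of that pattern, with error handling omitted; it is not taken from the Honor 7 camera stack.

```c
/* Hypothetical sketch: create a buffer the driver can place in memory visible
 * to both CPU and GPU, fill it via mapping, and unmap it before GPU use.
 * On a shared-memory system this avoids an extra copy of the image data. */
#include <CL/cl.h>
#include <string.h>

cl_mem create_shared_image(cl_context ctx, cl_command_queue queue,
                           const float *pixels, size_t count)
{
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR,
                                count * sizeof(float), NULL, NULL);

    /* Map for CPU writing, copy the pixels in, then unmap before enqueuing
     * any GPU kernels that consume the buffer. */
    float *mapped = (float *)clEnqueueMapBuffer(queue, buf, CL_TRUE, CL_MAP_WRITE,
                                                0, count * sizeof(float),
                                                0, NULL, NULL, NULL);
    memcpy(mapped, pixels, count * sizeof(float));
    clEnqueueUnmapMemObject(queue, buf, mapped, 0, NULL, NULL);
    return buf;
}
```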
Although using an ISP, or any dedicated hardware for that matter, can often provide advantages in power, performance and area for the specific use case it is designed for, it also often has an important shortcoming: limited flexibility. Fixed logic cannot be changed once it is committed to silicon. The algorithm in question was modified, improved and developed aggressively right up to the wire. The choice of using OpenCL on the Mali GPU has essentially enabled the OEM, Huawei in this instance, to give the end user a better camera experience using existing hardware. In addition to superior image quality, in their public launch Huawei claimed a performance improvement of 2x through the use of the GPU, so faster pictures too!
The Honor 7 marks a major adoption milestone for ARM’s vision of heterogeneous computing and GPU Compute. But it is just the latest milestone of a fantastic journey in the adoption of this technology. Here is a refresher on how we got here.
GPU Compute in mobile and embedded systems was introduced in shipping devices in late 2012, when the Google Nexus 10 tablet was launched with RenderScript support on the Mali-T604 GPU.
Since then a large number of partners have endorsed this technology to improve a variety of end-user applications.
Starting in early 2013, many partners (I have personally been involved in more than 25 independent projects) gradually enabled their technologies to use OpenCL on Mali, and we have since seen adoption across a growing range of applications.
This snowball was certainly rolling! And much continued to happen early this year:
At CES 2015, Omnivision announced the availability of an advanced imaging library aimed at complementing their camera module ISP using the Mali GPU. The library includes advanced imaging features such as 3D noise suppression, chroma noise reduction, de-fringe and de-haze, and it is targeted at smartphones and tablets.
At Mobile World Congress in Barcelona earlier this year, ArcSoft demonstrated their latest camera middleware products running on a MediaTek MT6752-based chip. This included pre-ISP image stabilization, real-time dynamic video HDR and other camera middleware, with the objective of bringing to the mass market features that are typically available only at the high end, made possible by the use of GPU Compute.
ARM recently took part in the Embedded Vision Alliance Summit in Santa Clara, where we hosted a workshop on computer vision on ARM-based systems. Fotonation, Morpho, ArcSoft, Wikitude and many others discussed how they have been using NEON and Mali OpenCL to improve their latest products. Our own Gian Marco Iodice and timhar01 also detailed how Mali can help with problems such as stereo processing and how to use heterogeneous computing efficiently. You can access the proceedings of this event here.
OEMs have an important challenge on their hands: how to deliver the best user experiences while respecting the lowest possible energy budgets. Huawei’s approach of using GPU Compute in their camera illustrates the merits of this technology, and it is the latest exciting step in its adoption.
Beyond smartphone cameras, GPU Compute enables a new field of innovation and new customer experiences driven by real-time visual computing. Target applications include computational photography, computer vision, deep learning, and the enablement of new and emerging multimedia codecs and algorithms. So far we have only scratched the surface of the potential that GPU Compute can deliver. ARM is always collaborating with new partners in exciting use-case areas, and I personally look forward to continuing this journey and seeing GPU Compute proliferate, delivering more and more benefits to users.