ARM's new video processor, the Mali-V500, was publicly launched at Computex recently, and we introduced a bunch of new technology in that device. Here I'd like to discuss just one of these new features: ARM® Frame Buffer Compression (AFBC).
The background

A mid-range smartphone without a built-in camera, instant YouTube access, and video recording and playback capabilities is now firmly a contradiction in terms. But as resolutions increase on handsets from 480p to 1080p, and on tablets to 4K and beyond, engineers are battling to handle ever-growing volumes of system data in order to satisfy consumers' continuing demands for higher quality graphics and displays on all sorts of devices. Meanwhile SoCs, especially for mobile devices, are limited by very real thermal, power and area constraints, which creates complexities for OEMs and developers trying to meet these customer demands.
In digital video, content is compressed to save on storage and transmission bandwidth. Since the early days of MPEG-1, video codecs have relied on temporal data redundancy to predict the current frame from already decoded frames. A motion vector field specifies how blocks of the current frame have moved relative to a previous frame, and the process of fetching and filtering blocks of pixels from a previously decoded frame is called motion compensation. A huge amount of memory is required to store the reference frames: storing four 4K resolution frames, for example, requires around 48 MBytes. It is clearly not feasible to hold this amount of data in on-chip memory, so the reference frames have to be stored in power-hungry off-chip external memory, e.g. LPDDR2.
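As a quick sanity check on that figure, here is a small sketch that works through the arithmetic, assuming an 8-bit YUV 4:2:0 frame at 3840x2160 (the pixel format is my assumption; the 48 MByte figure itself is simply quoted above).

```c
#include <stdio.h>

int main(void)
{
    /* Assumptions: 4K UHD frame, 8-bit YUV 4:2:0, i.e. 1.5 bytes per pixel. */
    const double width  = 3840.0;
    const double height = 2160.0;
    const double bytes_per_pixel = 1.5;

    double frame_mb = width * height * bytes_per_pixel / (1024.0 * 1024.0);

    printf("One 4K frame:   %.1f MBytes\n", frame_mb);        /* ~11.9 */
    printf("Four 4K frames: %.1f MBytes\n", 4.0 * frame_mb);  /* ~47.5 */
    return 0;
}
```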
In video codecs, reading and writing these reference frames is the major source of memory bandwidth and so it is clear that an effective compression scheme has the potential to reduce bandwidth and therefore system power significantly.
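To get a feel for the scale involved, the following back-of-the-envelope sketch estimates reference frame traffic for a 4K decode at 30 frames per second. The frame rate and the read overlap factor are illustrative assumptions of mine, not Mali-V500 measurements.

```c
#include <stdio.h>

int main(void)
{
    /* Illustrative assumptions only; these are not Mali-V500 figures.        */
    const double width = 3840.0, height = 2160.0, fps = 30.0;
    const double bytes_per_pixel = 1.5;  /* 8-bit YUV 4:2:0                   */
    const double read_overlap    = 2.0;  /* assumed extra fetch because the
                                            interpolation filter needs a
                                            border of pixels around each block */

    double write_gbs = width * height * bytes_per_pixel * fps / 1e9;
    double read_gbs  = write_gbs * read_overlap;

    printf("Reference frame writes: %.2f GByte/s\n", write_gbs);
    printf("Motion comp. reads:     %.2f GByte/s\n", read_gbs);
    return 0;
}
```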
The solution

For some years, ARM has been leading the industry in the field of data compression. ASTC, a lossy texture compression method for graphics, is an example of a compression method we have produced that has been adopted as an industry standard. To address the problem described above for video, ARM has produced ARM Frame Buffer Compression (AFBC), a format that provides fast, lossless compression and decompression in real time, minimizing the amount of data transferred between different IP blocks within the SoC. This reduces system-wide bandwidth and offers a corresponding power saving of up to 50%. AFBC also has a benefit that many other compression formats lack: it provides random access right down to the 4x4 pixel block level.
Video codecs are generally defined to be bit-exact. To be standard-compliant, a video decoder must produce exactly the same sequence of decoded frames as the golden reference decoder, down to each individual bit. To be clear, the reason for this seemingly excessive accuracy is not so much to please picky videophiles as it is to control error propagation. The motion compensation process uses previously decoded frames to predict the current frame. If we allowed errors, for example by storing the reference frames using lossy compression, those errors would actually stay in the system, and accumulate in an uncontrolled way. For this reason it is absolutely essential to use a lossless compression algorithm like AFBC for the reference frames.
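To make the error propagation argument concrete, here is a toy simulation of a single pixel value tracked over 30 frames. One decoder keeps its references bit-exact; the other stores them through a stand-in lossy quantiser. Both apply the residuals the encoder produced against exact references, which is exactly why the second one drifts. The signal and the quantisation step are arbitrary assumptions, chosen purely for illustration.

```c
#include <stdio.h>
#include <math.h>

/* Stand-in for storing a reference frame with lossy compression:
 * quantising the value loses a little precision every time.       */
static double lossy_store(double v, double step)
{
    return round(v / step) * step;
}

int main(void)
{
    const double step = 4.0;  /* assumed quantisation step of the lossy store */
    double exact   = 100.0;   /* what a bit-exact decoder holds               */
    double drifted = 100.0;   /* what a decoder with lossy references holds   */

    for (int frame = 1; frame <= 30; ++frame) {
        /* The source brightens slightly each frame. Both decoders apply the
         * same residual, but the second predicts from a lossy reference, so
         * its output diverges further from the bit-exact result over time.   */
        const double residual = 1.3;
        exact  += residual;
        drifted = lossy_store(drifted, step) + residual;
    }

    printf("After 30 frames: exact = %.1f, with lossy refs = %.1f (drift %.1f)\n",
           exact, drifted, exact - drifted);
    return 0;
}
```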
The compression algorithm must also provide random access. Since the motion vector fields are arbitrary, the motion compensation process must be able to access pixels at arbitrary locations. Standard lossless compression formats, like PNG, do not provide random access and are therefore not useful for reference frame compression.
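To illustrate why block-level random access matters, here is a sketch of a generic tiled layout with a small header per 4x4 block. This is not the AFBC bitstream format; the block size, the header fields and their widths are all assumptions chosen purely for illustration. The point is that motion compensation can ask for a block anywhere in the reference frame, and a per-block index lets the decoder fetch just the bytes it needs, which a stream format such as PNG cannot offer.

```c
#include <stdint.h>

#define BLOCK 4              /* compress in 4x4 pixel blocks                  */

/* Hypothetical per-block header for a generic tiled compressed frame.        */
struct block_header {
    uint32_t payload_offset; /* where this block's compressed bytes start     */
    uint16_t payload_size;   /* how many compressed bytes it occupies         */
};

struct compressed_frame {
    unsigned width, height;          /* frame size in pixels                  */
    const struct block_header *hdr;  /* one header per 4x4 block, row-major   */
    const uint8_t *payload;          /* variable-length compressed block data */
};

/* Motion compensation asks for the block containing pixel (x, y), which can
 * lie anywhere in the reference frame. With per-block headers we can jump
 * straight to the right bytes instead of decoding the whole frame.           */
const uint8_t *locate_block(const struct compressed_frame *f,
                            unsigned x, unsigned y, uint16_t *size)
{
    unsigned blocks_per_row = f->width / BLOCK;
    unsigned index = (y / BLOCK) * blocks_per_row + (x / BLOCK);

    *size = f->hdr[index].payload_size;
    return f->payload + f->hdr[index].payload_offset;
}
```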
AFBC can be implemented across almost the entire range of multimedia IP within an SoC. An AFBC-capable display controller or an AFBC-capable GPU (such as the next generation of Mali GPUs) can directly read the compressed frames produced by an AFBC-capable video decoder (such as Mali™-V500). This use of a common format will clearly produce further bandwidth savings, since the frame is stored in compressed format throughout the processing chain.
The results

The diagram below shows the reduction in video decoder memory bandwidth provided by AFBC when decoding a 4K H.264 video stream. The blue curve shows the bandwidth when AFBC is not used (for reference). The green curve shows the bandwidth of Mali-V500 when AFBC is used for internal reference frame compression only, and the red curve shows the bandwidth when AFBC compression is used for the output frame as well (with an AFBC-enabled display processor).
As you can see, the bandwidth reductions are considerable. The power savings associated with this depend entirely on the design of the SoC and the memory system used, but the power cost of memory bandwidth is commonly of the order of 150 mW per GByte/s in mobile systems, so the savings are very worthwhile.
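As a hypothetical worked example using that rule of thumb (the bandwidth saving below is an invented figure, since the real numbers depend on the content and the SoC):

```c
#include <stdio.h>

int main(void)
{
    const double saved_bandwidth_gbs = 2.0;   /* assumed GByte/s saved by AFBC */
    const double mw_per_gbs = 150.0;          /* rule of thumb quoted above    */

    /* 2 GByte/s saved at 150 mW per GByte/s is roughly 300 mW. */
    printf("Approximate power saving: %.0f mW\n",
           saved_bandwidth_gbs * mw_per_gbs);
    return 0;
}
```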
Reducing system memory bandwidth is, of course, just one element of ARM's overall power reduction strategy; there are many other things we do, both large and small, that lead to us having a low-power solution. However, the introduction of AFBC is a major contribution to reducing overall SoC power.
Availability

AFBC is now available for all Mali Video Engines, future Mali GPUs and as licensable IP for the display processor. Visit the product site on www.arm.com for more details.