This year’s High Performance Graphics event takes place in Los Angeles, 28-30 July 2017. It brings together the very best of the graphics industry’s researchers, engineers, and innovators to examine the latest advances in graphics algorithms, architectures, and implementations. As you know, Arm’s Mali High Performance GPUs feature cutting-edge graphics technology designed for all the latest high-end use cases, from high-fidelity mobile gaming to virtual and mixed reality and this year’s hottest topic: machine learning on mobile devices. These technologies wouldn’t be supportable on mobile devices without our team of experts and their deep understanding of the requirements and limitations of the mobile form factor.
Those of you lucky enough to attend this year’s event will have a unique opportunity to chat to Arm graphics gurus Hardik Sharma, Tom Olson, and Alex Chalfin, who will be hosting a poster session about the awesome work they’ve been doing to explore the compression of deep neural networks with ASTC. ASTC, or Adaptive Scalable Texture Compression, is a technology developed by Arm in collaboration with partners to provide a standardized compression format for use across a wide range of operating systems and APIs.
Neural network weights account for a significant proportion of a network’s overall memory footprint. Compressing the weights enables edge devices to run larger networks that would otherwise be prohibitively large. For the AlexNet network, the uncompressed weights occupy 240MB, while the ASTC-compressed network has a 27MB footprint with no loss of accuracy. Because compression also reduces memory traffic, it delivers the power and performance benefits crucial for mobile.
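From the figures quoted above, the compression ratio and the effective storage per weight work out as follows (a simple back-of-the-envelope calculation; the 240MB and 27MB figures are taken from the text, and float32 weights are assumed at 32 bits each):

```python
uncompressed_mb = 240   # AlexNet float32 weight footprint (from the text)
compressed_mb = 27      # footprint after ASTC compression (from the text)

ratio = uncompressed_mb / compressed_mb        # roughly 8.9:1
bits_per_weight = 32 / ratio                   # roughly 3.6 bits per weight

print(f"compression ratio: {ratio:.1f}:1, ~{bits_per_weight:.1f} bits/weight")
```

In other words, each 32-bit weight is stored in roughly 3.6 bits on average.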
Many network implementations benefit from the simplicity of dense representations and the parallel processing afforded by modern GPU and CPU clusters. Applying compression while maintaining these characteristics is a challenge shared with textures, which GPUs use to add fidelity to computer graphics. To be viable, texture compression requires random access and efficient hardware decoders.
In practice, there is a significant difference between neural network weights and texture data. Network weights tend to be spatially uncorrelated, particularly in fully connected classification layers. In essence, textures tend to look like natural images, while network weights, especially in the later layers of the network, look more like random noise.
This research targets the fully connected layers, which present the biggest challenge because their data is uncorrelated; typical convolutional layers, by contrast, have fewer weights and their data is more image-like.
Most texture compression schemes assume spatially correlated inputs, and compress the data by removing redundancy from the encoded image. In this respect, ASTC is no different. What makes ASTC especially well-suited to weight compression is its support for single-channel image data, and its ability to adaptively allocate bits between local endpoints (min/max within a small region) and per-weight offsets (coefficients used to interpolate between local endpoints).
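The endpoint-plus-offset idea can be illustrated with a minimal sketch. The snippet below is not the ASTC codec (which also partitions blocks and entropy-codes its fields); it is a hypothetical toy showing only the core scheme the paragraph describes: store a min/max endpoint pair per small block, plus a low-bit-count offset per weight used to interpolate between them.

```python
import numpy as np

def encode_block(block, levels=8):
    """Toy endpoint/offset encode: local min/max plus quantized offsets."""
    e0, e1 = block.min(), block.max()          # local endpoints for this block
    if e1 == e0:
        offsets = np.zeros(block.shape, dtype=int)
    else:
        t = (block - e0) / (e1 - e0)           # normalize into [0, 1]
        offsets = np.rint(t * (levels - 1)).astype(int)  # per-weight offsets
    return e0, e1, offsets

def decode_block(e0, e1, offsets, levels=8):
    """Interpolate between the endpoints using the per-weight offsets."""
    t = offsets / (levels - 1)
    return e0 + t * (e1 - e0)

# Example: a block of four weights, 3-bit offsets (8 interpolation levels)
block = np.array([0.10, 0.32, 0.55, 0.90])
e0, e1, offs = encode_block(block)
approx = decode_block(e0, e1, offs)
```

Values at the endpoints round-trip exactly; interior values land on the nearest of the eight interpolation points, which is where the compression artifacts come from.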
Encoding with ASTC introduces artifacts into the compressed weights. State-of-the-art research has demonstrated that networks are resilient to quantization artifacts when given a retraining cycle. The assertion driving this research is that networks should therefore also be resilient to the artifacts introduced by the ASTC compression process.
Inserting ASTC into training presented some practical challenges. Starting from a pre-trained float32 network, the simple approach is to run a standard ASTC encode/decode cycle on the weights before computing the gradient for back-propagation. However, including an encode/decode cycle in every retraining iteration proved prohibitive: ASTC encoding is expensive, largely because of partition-function selection, and added multiple hours to the retraining cycle. An attempt to fix the ASTC partition selection at the start of retraining, allowing retraining to change only the endpoint values, yielded poor results. The approach that worked is to run a full ASTC encode after a fixed number of retraining iterations, which lets retraining find the optimal partition function as well as the optimal endpoints for the network.
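The retraining schedule described above can be sketched as a training loop that periodically snaps the weights through a compress/decompress round trip. This is an illustrative toy, not the actual experiment: a simple block-wise min/max quantizer stands in for the real ASTC codec, and plain gradient descent on a linear model stands in for full network retraining. The names (`astc_roundtrip`, `encode_every`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def astc_roundtrip(w, levels=8, block=16):
    """Stand-in for an ASTC encode/decode cycle: quantize each block of
    weights to `levels` interpolation points between its local min/max."""
    out = w.copy().ravel()
    for i in range(0, out.size, block):
        b = out[i:i + block]
        lo, hi = b.min(), b.max()
        if hi > lo:
            t = np.rint((b - lo) / (hi - lo) * (levels - 1)) / (levels - 1)
            out[i:i + block] = lo + t * (hi - lo)
    return out.reshape(w.shape)

# Toy "network": linear regression with synthetic data
X = rng.normal(size=(256, 8))
w_true = rng.normal(size=8)
y = X @ w_true
w = rng.normal(size=8)          # start from an imperfect weight vector

encode_every = 50               # full encode only every N iterations:
for step in range(500):         # per-iteration encoding is too expensive
    grad = X.T @ (X @ w - y) / len(X)
    w -= 0.1 * grad
    if (step + 1) % encode_every == 0:
        w = astc_roundtrip(w)   # snap weights to the compressed representation
```

Because the loop ends with a round trip, the final weights are exactly representable in the compressed format, while the intervening gradient steps recover the accuracy each round trip costs.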
The results are compelling. The compression ratio is reasonably high, and the retraining cost, while somewhat expensive due to ASTC encode time, is not prohibitive. This allows much bigger, more complex neural networks to be used for on-device inference, reducing the number of times data has to travel to and from the cloud, and with it both latency and security concerns. Given the wide adoption of ASTC in already-shipping silicon, this research can enable creative use cases on mobile devices that may previously have been prohibitive. We look forward to seeing you at this year’s High Performance Graphics event, as well as Hot Chips 2017, where you can pick our experts’ brains!