Alpha-Blending: Quantizing networks without using the STE

Matthew Mattina
August 16, 2019
2 minute read time.

Increasingly, intelligent applications are using neural networks at their core to deliver new functionality to users. These applications include language understanding and translation, image recognition, and object tracking and localization. Furthermore, to reduce latency and improve privacy, there is increasing pressure to move these applications out of centralized datacenters and onto embedded devices. While neural networks have demonstrated state-of-the-art accuracy for the types of applications listed above, they require significant memory for storing the multitude of parameters (i.e., neural network weights and activations) needed to deliver this high accuracy. An obstacle to executing these applications on embedded devices is the relatively scarce memory available on such hardware.

At Arm’s ML Research Lab, we’ve been exploring different techniques for reducing the memory requirements of advanced neural networks. One of these techniques is quantization, wherein the neural network weights and activations are stored in a lower bit-width format, thereby reducing the overall storage requirements. A common approach is to quantize the IEEE FP32 (32-bit) representation to an 8-bit integer representation, reducing storage requirements by a factor of 4. Quantization can be performed during neural network training by using the 8-bit integer representation during the forward-pass execution of the network, while performing the gradient update using the FP32 representation. This approach requires differentiating quantization functions whose derivatives are almost everywhere equal to zero. To avoid this “vanishing gradient” problem, the straight-through estimator (STE) is commonly used: the STE replaces the quantization function with the identity function during backpropagation.
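For readers who want to see the mechanics, here is a minimal PyTorch sketch of quantization-aware training with the STE. The symmetric 8-bit scheme, the fixed per-tensor scale, and the toy loss are illustrative assumptions, not details taken from this post or the paper.

    import torch

    class FakeQuantSTE(torch.autograd.Function):
        # Forward: round weights onto an 8-bit integer grid, then dequantize.
        # Backward: pass the gradient straight through (the STE).
        @staticmethod
        def forward(ctx, w, scale):
            return torch.round(w / scale).clamp(-128, 127) * scale

        @staticmethod
        def backward(ctx, grad_output):
            return grad_output, None   # identity gradient for w, none for the scale

    w = torch.randn(64, 64, requires_grad=True)
    scale = w.detach().abs().max() / 127      # simple per-tensor scale (assumption)
    w_q = FakeQuantSTE.apply(w, scale)        # forward pass uses 8-bit values
    loss = (w_q ** 2).sum()                   # toy stand-in for the network loss
    loss.backward()                           # gradient reaches w via the STE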

We have developed an alternative training recipe for quantizing networks without using the STE. Our method, Alpha-Blending, avoids the STE approximation by replacing the quantized weight in the loss function with an affine combination of the quantized weight w_q and the corresponding full-precision weight w, using non-trainable scalar coefficients α and 1−α. During training, α is gradually increased from 0 to 1, so gradient updates reach the weights only through the full-precision term (1−α)w of the affine combination, and the model is converted from full precision to low precision progressively. Our results with MobileNet v1 on ImageNet are shown in the table below. Alpha-Blending performs best at very low bit widths: with weights quantized to 4 bits and activations quantized to 8 bits, it achieves 68.7% top-1 accuracy, only 2.2% below full FP32 precision. This encouraging result suggests that significant reductions in memory footprint are possible while still retaining high accuracy, enabling future neural network-based applications on embedded devices and significantly broadening the scope of tasks these devices can complete.

[Table: Alpha-Blending quantization results for MobileNet v1 on ImageNet]
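To make the recipe concrete, here is a minimal PyTorch sketch of the alpha-blended weight described above. The quantizer, the per-tensor scale, the linear schedule for α, and the toy loss are illustrative assumptions; the paper gives the exact training recipe.

    import torch

    def quantize(w, scale, n_bits=4):
        # Symmetric uniform quantizer (illustrative, not the paper's exact scheme).
        qmax = 2 ** (n_bits - 1) - 1
        return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale

    def alpha_blend(w, alpha, scale):
        # The loss sees alpha * w_q + (1 - alpha) * w. Detaching w_q means the
        # gradient reaches w only through the full-precision term (1 - alpha) * w,
        # so no STE is needed.
        w_q = quantize(w, scale).detach()
        return alpha * w_q + (1.0 - alpha) * w

    w = torch.randn(64, 64, requires_grad=True)
    total_steps = 10_000
    for step in range(total_steps):
        alpha = min(1.0, step / (total_steps - 1))   # ramp alpha from 0 to 1 (assumed schedule)
        scale = w.detach().abs().max() / 7           # 4-bit symmetric scale (assumption)
        w_ab = alpha_blend(w, alpha, scale)
        loss = (w_ab ** 2).sum()                     # toy stand-in for the network loss
        loss.backward()
        with torch.no_grad():
            w -= 1e-3 * w.grad                       # plain FP32 SGD update
            w.grad = None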

You can read the full paper below, and lead author Zhi-gang Liu will be presenting this work at the 2019 International Joint Conference on Artificial Intelligence (IJCAI) in Macao, China.

Read the Paper 
