Arm Community
Research Collaboration and Enablement
Alpha-Blending: Quantizing networks without using the STE

Matthew Mattina
August 16, 2019
2 minute read time.

Increasingly, intelligent applications use neural networks at their core to deliver new functionality to users, including language understanding and translation, image recognition, and object tracking and localization. Furthermore, to reduce latency and improve privacy, there is increasing pressure to move these applications out of centralized datacenters and onto embedded devices. While neural networks have demonstrated state-of-the-art accuracy for such applications, they require significant memory to store the multitude of parameters (i.e., neural network weights and activations) needed to deliver this high accuracy. The relatively scarce memory available on embedded devices is therefore an obstacle to running these applications there.

At Arm’s ML Research Lab, we have been exploring techniques for reducing the memory requirements of advanced neural networks. One of these techniques is quantization, wherein the network weights and activations are stored in a lower bit-width format, thereby reducing overall storage requirements. A common approach is to quantize the IEEE FP32 (32-bit) representation to an 8-bit integer representation, cutting storage by a factor of four. Quantization can be performed during neural network training by using the 8-bit integer representation in the *forward pass* of the network, while performing the gradient update in FP32. This approach requires differentiating quantization functions whose derivatives are zero almost everywhere. To avoid this vanishing-gradient problem, the straight-through estimator (STE) is commonly used: it replaces the quantization function with the identity function during backpropagation.
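The forward/backward split above can be sketched with a symmetric uniform quantizer. The snippet below is a hypothetical NumPy illustration of "fake quantization" with an STE-style backward pass, not Arm's training code:

```python
import numpy as np

def quantize_uniform(w, num_bits=8):
    # Symmetric uniform quantizer: map the tensor's range onto a signed
    # integer grid, round and clip, then de-quantize back to float so the
    # forward pass sees the quantized values ("fake quantization").
    qmax = 2 ** (num_bits - 1) - 1                       # 127 for 8-bit
    scale = max(float(np.max(np.abs(w))) / qmax, 1e-8)   # guard against all-zero w
    w_int = np.clip(np.round(w / scale), -qmax, qmax)
    return w_int * scale

# Forward pass: execute the network with the quantized weights.
w = np.array([0.31, -1.20, 0.07, 0.88], dtype=np.float32)
w_q = quantize_uniform(w, num_bits=8)

# Backward pass: the STE treats the piecewise-constant quantizer as the
# identity, so the gradient w.r.t. w_q is passed to w unchanged.
grad_wq = np.array([0.10, -0.20, 0.05, 0.00], dtype=np.float32)
grad_w = grad_wq
```

Without the identity substitution, the true derivative of the rounding step would zero out `grad_w`, and the full-precision weights would never receive an update.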

We have developed an alternative training recipe for quantizing networks without the STE. Our method, Alpha-Blending, avoids the STE approximation by replacing the quantized weight in the loss function with an affine combination of the quantized weight w_q and the corresponding full-precision weight w, using non-trainable scalar coefficients α and 1−α. During training, α is gradually increased from 0 to 1, and the gradient updates reach the weights through the full-precision term (1−α)w of the affine combination, so the model is converted from full precision to low precision progressively. Our results with MobileNet v1 on ImageNet are shown in the table below. Alpha-Blending performs best at very low bit widths: with weights quantized to 4 bits and activations quantized to 8 bits, Alpha-Blending achieves 68.7% top-1 accuracy, only 2.2% below full FP32 precision. This encouraging result suggests that significant reductions in memory footprint are possible while still retaining high accuracy, enabling future neural network-based applications on embedded devices and significantly broadening the scope of the tasks those devices can complete.
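As a toy illustration of the recipe (not the paper's implementation), the snippet below fits a small least-squares model with 4-bit weights while ramping α from 0 to 1. The linear α schedule, the symmetric uniform quantizer, and the problem itself are all simplifying assumptions:

```python
import numpy as np

def quantize_uniform(w, num_bits=4):
    # Symmetric uniform quantizer (assumed helper, as in the earlier sketch).
    qmax = 2 ** (num_bits - 1) - 1
    scale = max(float(np.max(np.abs(w))) / qmax, 1e-8)
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

# Toy least-squares problem: fit y = x @ w_true with 4-bit weights.
rng = np.random.default_rng(0)
x = rng.standard_normal((256, 4))
w_true = np.array([0.5, -0.25, 0.75, -1.0])
y = x @ w_true

w = rng.standard_normal(4) * 0.1      # full-precision weights being trained
lr, steps = 0.05, 300
for t in range(steps):
    alpha = t / (steps - 1)           # ramp alpha linearly from 0 to 1
    w_q = quantize_uniform(w)         # treated as a constant in the backward pass
    w_ab = alpha * w_q + (1 - alpha) * w   # affine combination in the loss
    err = x @ w_ab - y
    grad_wab = x.T @ err / len(x)     # dL/dw_ab for L = 0.5 * mean(err**2)
    # The gradient reaches w only through the (1 - alpha) * w term:
    w -= lr * (1 - alpha) * grad_wab

# At alpha = 1 the loss depends only on w_q, so the deployed model is
# fully quantized with no STE used at any point.
w_final = quantize_uniform(w)
```

Because the quantizer is never differentiated, no identity substitution is needed; the price is that the update signal fades as (1−α) shrinks, which is why α must ramp slowly.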

[Table: MobileNet v1 quantization results on ImageNet]

You can read the full paper below, and lead author Zhi-gang Liu will be presenting this work at the 2019 International Joint Conference on Artificial Intelligence (IJCAI) in Macao, China.

Read the Paper 
