Reducing the Cost of Neural Network Inference with Residue Number Systems

Matthew Mattina
August 21, 2020
5 minute read time.

The size and computational complexity of neural network models continue to grow exponentially. The reason for this growth is easy to understand: generally, larger neural networks deliver higher accuracy on many image and language tasks that users care about. For example, the recent GPT-3 transformer-based neural network from OpenAI has over 175 billion parameters and generates human-level text. However, the increase in computational requirements when executing (inferencing) these massive networks presents a major challenge to their adoption. This challenge is one of the primary avenues of research being pursued by Arm’s Machine Learning Research Lab. Our lab is focused on finding novel ways to efficiently execute advanced machine learning models on Arm-based embedded and mobile platforms. To this end, we have published research ranging from AutoML for deeply embedded devices and novel factorization schemes to hardware designs for executing compressed models.

Combining low-precision and complexity-reducing techniques

Our recent paper, which will be presented at ECCV in August, attacks the computational problem from a different angle. It is well established that the use of low-precision numbers, such as INT8 parameters and computation, significantly reduces the power, memory, and execution-time requirements of advanced neural networks. It is also well known that transform techniques, in particular the Winograd transform, can be used to significantly reduce the number of arithmetic operations required to execute these networks.

However, combining these two techniques, low-precision representation and the complexity-reducing Winograd transform, has until now resulted in an unacceptably high loss in prediction accuracy. The loss in accuracy arises from numerical problems that occur when performing the transform operations required by the Winograd algorithm. As can be seen in Figure 1, several transform coefficients are either very large or very small, and thus cannot be accurately represented with INT8 precision.


Figure 1. The 10 x 10 convolution y (in brown, far right) of 12 x 12 input d (in blue, far left) and 3 x 3 kernel g (in green, center) 

y = A^T ((B^T d B) ⊙ (G g G^T)) A

where B, G, and A are the Winograd input, filter, and output transform matrices, respectively.

[Figure: the Winograd F(10×10, 3×3) transform matrices B, G, and A]
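
To make the structure of this formula concrete, here is a minimal numpy sketch that evaluates the same expression for the small, well-known F(2×2, 3×3) Winograd tile and checks it against direct convolution. This is an illustrative example, not the F(10×10, 3×3) tile used in the paper, and the variable names are mine.

```python
import numpy as np

# Standard Winograd F(2x2, 3x3) transform matrices (Lavin & Gray).
# Illustrative only; the paper uses the much larger F(10x10, 3x3) tile.
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=np.float64)

G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])

A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=np.float64)

d = np.arange(16, dtype=np.float64).reshape(4, 4)  # 4x4 input tile
g = np.arange(9,  dtype=np.float64).reshape(3, 3)  # 3x3 kernel

# y = A^T ((B^T d B) ⊙ (G g G^T)) A
# The elementwise product uses 16 multiplications instead of the
# 36 needed to compute the 2x2 output directly with a 3x3 kernel.
y_winograd = A_T @ ((B_T @ d @ B_T.T) * (G @ g @ G.T)) @ A_T.T

# Direct 2x2 "valid" correlation for comparison.
y_direct = np.array([[np.sum(d[i:i+3, j:j+3] * g) for j in range(2)]
                     for i in range(2)])

assert np.allclose(y_winograd, y_direct)
```

The fractional entries of G (the 0.5 terms) are harmless in floating point, but they are exactly the kind of coefficient that becomes troublesome once everything must fit into INT8, and for larger tiles such as F(10×10, 3×3) the coefficients span a far wider range.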

Maintaining prediction accuracy using a Residue Number System (RNS)

We have developed a technique that allows the complexity-reducing Winograd transform to be applied to convolutional neural networks with INT8 parameters. The foundation of our technique is the use of a residue number system (RNS). An RNS is used to represent integers by their values modulo pairwise co-prime integers, as shown in Figure 2. The RNS representation enables us to perform the transformations and operations required to execute the network in the Winograd domain, without suffering the numerical problems (underflow and overflow) that typically result in a loss of prediction accuracy. This means that the resulting lower-complexity network incurs no degradation of prediction accuracy compared to the original INT8 network.

RNS(m0, m1, ..., mn-1)
          An integer x can be represented by the remainder set
                    {x mod m0, x mod m1, ..., x mod mn-1}
          where the moduli {mi} are pairwise co-prime.
Arithmetic operations in RNS: addition (+), subtraction (−), and multiplication (*)
          For x = {x0, x1, ..., xn-1} and y = {y0, y1, ..., yn-1} ∈ RNS(m0, m1, ..., mn-1):
                    x ± y = {x0 ± y0, x1 ± y1, ..., xn-1 ± yn-1}
                    x * y = {x0 * y0, x1 * y1, ..., xn-1 * yn-1}
Division x/y in RNS(m0, m1, ..., mn-1) is well defined if y is co-prime to the moduli {mi}
                    x / y = x * y^-1 mod {mi}
                    where y^-1 * y = 1 mod {mi}
                    (y^-1 is the multiplicative inverse of y)

Figure 2: RNS representation of integers by their values modulo pairwise co-prime integers
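
As a concrete illustration of the arithmetic in Figure 2, the sketch below represents two integers by their residues modulo the pairwise co-prime set (247, 251, 253) used later in this post, adds and multiplies them channel by channel, and recovers the results with the Chinese Remainder Theorem. The helper functions are my own illustrative code, not from the paper.

```python
from math import prod

MODULI = (247, 251, 253)          # pairwise co-prime (13*19, prime, 11*23)
M = prod(MODULI)                  # dynamic range of this RNS: 15,685,241

def to_rns(x):
    """Represent integer x by its remainders modulo each m_i."""
    return tuple(x % m for m in MODULI)

def from_rns(residues):
    """Reconstruct x in [0, M) from its residues (Chinese Remainder Theorem)."""
    x = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        # pow(Mi, -1, m) is the multiplicative inverse of Mi mod m (Python 3.8+)
        x += r * Mi * pow(Mi, -1, m)
    return x % M

a, b = 1234, 5678
ra, rb = to_rns(a), to_rns(b)

# Addition and multiplication act independently on each residue channel.
r_sum  = tuple((x + y) % m for x, y, m in zip(ra, rb, MODULI))
r_prod = tuple((x * y) % m for x, y, m in zip(ra, rb, MODULI))

assert from_rns(r_sum)  == a + b      # exact, because a + b < M
assert from_rns(r_prod) == a * b      # exact, because a * b < M
```

Because each channel only ever stores values smaller than its modulus, all of the arithmetic stays within a narrow integer range, no matter how large the reconstructed result becomes, as long as it remains below M.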

Figure 3 shows the same computation of the M×M output y as in Figure 1, except that the calculation is performed using RNS(247, 251, 253). The weight, activation, and output transform matrices for RNS(253) are shown. As shown, the transform coefficients (the G, B, and A matrices) can all be represented precisely in an INT8 representation, and y, the result of the convolution, can be reconstructed using either the Chinese Remainder Theorem or Mixed Radix Conversion.


Figure 3. The Winograd convolution F (10x10,3x3) over RNS (247,251,253)
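
The transform matrices for F(10×10, 3×3) over RNS(247, 251, 253) are given in the paper and its supplementary material, so they are not reproduced here. Instead, the sketch below applies the same idea to the tiny, well-known 1D Winograd tile F(2, 3): within each residue channel the fractional transform coefficients (the 1/2 entries of G) become exact multiplicative inverses mod m_i, the whole convolution is carried out in modular arithmetic, and the two outputs are recovered with the Chinese Remainder Theorem. This is an illustrative analogue of the approach, with my own function names, not the paper's implementation.

```python
import numpy as np
from math import prod

MODULI = (247, 251, 253)
M = prod(MODULI)

# Standard 1D Winograd F(2,3) transforms (Lavin & Gray); the 0.5 entries of G
# are replaced, per channel, by the exact inverse of 2 modulo m.
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=np.int64)
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=np.int64)

def winograd_f2_3_mod(d, g, m):
    """Compute the two outputs of a 3-tap convolution over a length-4 input,
    entirely in arithmetic mod m. Every coefficient is a small exact residue."""
    half = pow(2, -1, m)                       # modular inverse of 2
    G = np.array([[1,     0,     0],
                  [half,  half,  half],
                  [half, -half,  half],
                  [0,     0,     1]], dtype=np.int64)
    U = (G @ g) % m                            # transformed kernel
    V = (B_T @ d) % m                          # transformed input
    return (A_T @ (U * V)) % m                 # elementwise product, output transform

def from_rns(residues):
    """Chinese Remainder Theorem reconstruction into [0, M)."""
    x = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        x += int(r) * Mi * pow(Mi, -1, m)
    return x % M

d = np.array([10, 20, 30, 40], dtype=np.int64)   # input values
g = np.array([1, 2, 3], dtype=np.int64)          # 3-tap kernel

# Run the Winograd convolution independently in each residue channel ...
per_channel = [winograd_f2_3_mod(d % m, g % m, m) for m in MODULI]
# ... then reconstruct each of the two outputs from its three residues.
y = [from_rns([ch[i] for ch in per_channel]) for i in range(2)]

assert y == [10*1 + 20*2 + 30*3, 20*1 + 30*2 + 40*3]   # matches direct convolution
```

Because every intermediate value is a residue smaller than its modulus, each channel can in principle be evaluated with narrow integer arithmetic, which is what allows the Winograd reduction in multiplications to be combined with INT8-style execution without the accuracy loss described above.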

In Table 1, we present the speedup achieved on different layers of the VGG16 convolutional neural network using our RNS-based Winograd convolution on the ImageNet dataset, compared to baseline INT8 and INT16 approaches. As shown, our residue number system-based Winograd approach achieves around a 2x speedup over the standard im2col+GEMM implementation on an Arm Cortex-A73 platform. We anticipate that speedups of this magnitude will enable the next generation of advanced convolutional neural networks for image, video, and speech applications to execute efficiently on embedded and mobile platforms.


Table 1: Inference performance of the 8-bit activation, 8-bit weight quantized CNN layers of VGG16 with the Winograd algorithm F(14×14, 3×3) over RNS(251,241,239) and RNS(4001,4331) on Arm Cortex-A73, achieving 71.4% top-1 prediction accuracy on the ImageNet dataset. The corresponding transforms are in the supplementary materials. The speed-ups of RNS(251,241,239) and RNS(4001,4331) are the runtime improvements relative to the standard INT8 and INT16 im2col+GEMM convolution baselines, respectively.

Find out more

Zhi-Gang Liu from Arm’s ML Research Lab presented the details of this research at ECCV. Take a look at the full paper to learn more.

  • Discover more about ML Research at Arm
  • Read the full paper

If you enjoyed this post...

Take a look at some of the other blogs published recently by our Machine Learning researchers:

  • Adapting Models to the Real World: On-Device Training for Edge Model Adaptation by Mark O'Connor 
  • Even Faster Convolutions: Winograd Convolutions meet Integer Quantization and Architecture Search by Javier Fernandez-Marques
  • SCALE-Sim: A cycle-accurate NPU simulator for your research experiments by Paul Whatmough