Arm Community
Efficient Hardware for Mobile Computer Vision via Transfer Learning

Paul Whatmough
April 1, 2019
4 minute read time.

Mobile computing is on the rise, and currently moving into some really exciting new applications and form factors: augmented reality (AR) glasses, unmanned aerial vehicles (UAVs), automated driver assistance systems (ADAS) in automobiles, and more.

See Arm Community articles exploring different applications of mobile computing, such as:

  • Enabling Augmented Reality Mobile Apps through Low Power Machine Learning
  • Inside Microsoft's Hololens 2
  • Not just droning on! The rise of Kinibi-M
  • Advances in ADAS – Getting Closer to the Self-Driving Car

One interesting trait these applications share is a ‘real-time’ performance requirement: the computing hardware must guarantee a response within a specified time bound. For example, in the case of AR glasses, the vision system needs to meet a minimum frame rate in order to provide a convincing experience as the user moves their head around. In the case of ADAS applications in the automotive industry, the real-time latency must be extremely low to ensure that any changes in the environment, such as another car overtaking, are quickly conveyed to the system. To make matters worse, on top of the real-time performance constraint, the majority of these platforms are also heavily energy constrained. For example, the power budget for the real-time vision system in AR glasses could be as low as 1 W.

One of the biggest challenges we face in meeting these real-time throughput and energy constraints comes from computer vision (CV) algorithms. In recent years, CV workloads have become heavily reliant on machine learning algorithms such as neural networks (NNs), which are now prevalent in emerging mobile computing applications. In fact, NNs have become such an important workload that Arm have introduced the ML Processor, a dedicated hardware processor that accelerates NN workloads while reducing their power consumption. For more details on the Arm ML Processor, I’d highly recommend Ian Bratt’s excellent talk at the Hot Chips conference.

Specialization and Transfer Learning

 Figure 1: Visualization of the low-level features typically learnt by CNNs trained on natural images.  Reproduced from Yosinski et al., 2014

At the Arm ML Research Lab, we are focused on enabling NN workloads on constrained hardware platforms, including real-time and low-energy systems. One option to improve the efficiency of the hardware is to design fixed-function hardware that performs inference on a single network for a single application. This approach has severe limitations, however: although it drastically increases efficiency, we lose flexibility, and the hardware is unlikely to be useful on new application datasets in the future. This tension between efficiency and flexibility is a common theme in computer architecture.

In grappling with the challenges of hardware specialization, we recently took inspiration from the machine learning community in the form of transfer learning, an interesting property of NNs. Transfer learning shows that it is possible to reuse the early layers of a network trained on task A for a different network trained on task B. This comes with some limitations, such as that task A and task B must be from a similar problem domain, for example both being image classification problems. Even with this caveat, transfer learning is a powerful concept. A simple interpretation is that the front layers of vision NNs are all very similar. For example, Figure 1 shows a visualization of the filters learnt by the early layers of a convolutional neural network (CNN); these features are common to almost all CNNs trained on natural images. If we circle back to the specialization discussion earlier, I hope it becomes clear that there is an opportunity to specialize the hardware that processes these early layers, without losing the flexibility to tackle new datasets.
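To make the idea concrete, here is a minimal software sketch of transfer learning with a frozen front-end (my own toy illustration, not code from our paper): a fixed bank of hand-coded edge/blob filters, like those in Figure 1, plays the role of the shared early layers, and only a small task-specific head is trained per task.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed, shared front-end: a 3x3 filter bank of the kind early CNN layers
# typically learn on natural images (edges and blobs, cf. Figure 1).
FIXED_FILTERS = np.stack([
    [[-1,  0, 1], [-2,  0, 2], [-1, 0, 1]],   # vertical edge
    [[-1, -2, -1], [0,  0, 0], [1,  2, 1]],   # horizontal edge
    [[0,  1,  0], [1, -4, 1], [0,  1, 0]],    # blob (Laplacian)
]).astype(float)

def frontend(img):
    """Frozen feature extractor: valid 3x3 convolution + ReLU, flattened."""
    h, w = img.shape
    feats = []
    for filt in FIXED_FILTERS:
        out = np.zeros((h - 2, w - 2))
        for i in range(h - 2):
            for j in range(w - 2):
                out[i, j] = np.sum(img[i:i + 3, j:j + 3] * filt)
        feats.append(np.maximum(out, 0.0))
    return np.concatenate([f.ravel() for f in feats])

def train_head(images, labels, lr=0.1, steps=200):
    """Task-specific back-end: a logistic-regression head trained on the
    frozen features. Only these weights change from task to task."""
    X = np.stack([frontend(im) for im in images])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid
        g = p - labels                            # gradient of log loss
        w -= lr * (X.T @ g) / len(labels)
        b -= lr * g.mean()
    return w, b

# A new "task B" reuses the same frontend; only the head is retrained.
imgs = rng.normal(size=(20, 8, 8))                # toy 8x8 images
labs = (imgs.mean(axis=(1, 2)) > 0).astype(float) # toy binary task
w, b = train_head(imgs, labs)
```

The split mirrors the hardware opportunity: `frontend` never changes, while `train_head` is all that a new dataset requires.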

The FixyNN Architecture

Figure 2: A simplified FixyNN concept

One of our recent technical focusses has been jointly designing the NN model architecture and the hardware architecture. Traditionally, one team designs the NN architecture, and another team designs the hardware. We found that considering both together at the system level delivered some interesting results. FixyNN is an example of what is possible with this co-design approach.

Let’s dive in and take a look at Figure 2, which shows the simplified FixyNN concept. The CNN is split into two pieces: a fixed front-end feature extractor that is shared by all tasks, and a programmable back-end that is trained specifically for each task. In this arrangement, the hardware used to implement the common front-end layers can be heavily optimized: the weights are fixed in hardware and no longer need to be loaded from main memory. The result is that the shared front-end becomes very fast, whilst remaining low energy!
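As a software analogy (my own sketch, not the FixyNN RTL; the weight values and shapes are made up for illustration), fixing the front-end weights in hardware is like making them compile-time constants: there is nothing to fetch from memory and the datapath can be fully unrolled, while the back-end still loads per-task weights the way a conventional programmable accelerator would.

```python
import numpy as np

# Front-end weights fixed at "tape-out": in the FixyNN concept these are
# hard-wired into the datapath, so they generate no main-memory traffic.
FRONTEND_W = np.array([[1., -1.,  0.,  0.],
                       [0.,  1., -1.,  0.],
                       [0.,  0.,  1., -1.]])

def fixed_frontend(x):
    """Shared feature extractor: constant weights + ReLU."""
    return np.maximum(FRONTEND_W @ x, 0.0)

def programmable_backend(feats, task_weights):
    """Per-task head: weights are loaded per task, as in a
    conventional programmable accelerator."""
    return feats @ task_weights

x = np.array([3., 1., 2., 0.])
feats = fixed_frontend(x)          # one shared front-end pass

# Hypothetical per-task head (illustrative values only).
head_a = np.array([[1., 0.],
                   [0., 1.],
                   [1., 1.]])
out_a = programmable_backend(feats, head_a)
```

Retargeting the system to a new task swaps only `task_weights`; `fixed_frontend` is the part that specialization makes fast and low-power.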

Please do check out our paper for more details. My co-authors are Chuteng Zhou, Patrick Hansen, Shreyas Kolala Venkataramanaiah, Jae-sun Seo and Matthew Mattina. We report that the FixyNN architecture can achieve nearly 2× greater energy efficiency than a conventional programmable CNN accelerator of the same silicon area. On top of this, we demonstrate that flexibility is not sacrificed: we were able to train a suite of six datasets via transfer learning with an accuracy loss of < 1%. If you’re interested in exploring this in more detail, we’ve also open sourced DeepFreeze, our tool for automatically generating hardware for fixed neural networks.

I’ll be presenting more details of FixyNN at the SysML conference this week. SysML is a new conference providing a venue for systems research in the area of machine learning, and I’m really excited to see what’s going on in the field!
