Adapting Models to the Real World: On-Device Training for Edge Model Adaptation

Mark O'Connor
July 15, 2020
4 minute read time.

From driver-assistance features keeping us safe on the road to asking our phones to set a reminder, neural networks are powering an increasing share of our everyday interactions with computers.

One drawback of all these neural network models, however, is that they are typically ‘one size fits all’. A neural network learns to minimize error over all of its training data. But for many use cases, the only error that matters to me as a user is the error on my data. If you are part of the majority, you might never notice a problem, but neural networks frequently perform worse for minorities, resulting in a particularly insidious form of technological inequality.

We have recently begun investigating ways to solve this problem. Fundamentally, the capacity of a neural network model is a finite resource. We try to train models to equally represent data from a wide range of potential users:

[Diagram: a model trained to represent data from a wide range of potential users equally.]

In practice, however, many deployments will mostly be used by one user (or with one microphone, or in one location):

[Diagram: in practice, many deployments are mostly used by a single user.]

No dataset can be perfectly fair or balanced, and minorities will always be under-represented. However, when I talk to my phone, I want it to recognize my accent. If it does so at the expense of being less accurate on a strong Australian accent, that’s fine with me. The same is true of many kinds of image, audio, and video tasks.

What if there were a way to adapt a model to devote more of its capacity to minimizing the error on the examples it actually sees in real-world use? Could it learn to be more accurate?

[Diagram: adapting a model to devote more of its capacity to minimizing error on the examples it actually sees in real-world use.]

This might result in a different version of the model for each user:

[Diagram: a different adapted version of the model for each user.]

Alternatively, instead of increasing the accuracy, could we achieve higher model compression if we knew more about the real-world distribution of inputs?

[Diagram: trading accuracy gains for higher model compression, given knowledge of the real-world input distribution.]

There are many ways to attempt to solve this problem. We recently completed some research into one approach based on edge distillation.

Learning without Labels

The biggest challenge to learning on the edge is that nobody wants to sit down and provide a written transcript of every command they give their phone, or sort through their entire photo album tagging every family member by hand. In most situations, there are no “correct” labels available to learn from.

We looked at a technique that side-steps this issue by using an on-device teacher. The principle here is to deploy not one, but two neural network models onto the device:

  1. The runtime model, which is highly optimized for low-latency inference.
  2. A teacher model, which is larger and more accurate but much too slow to run in real-time.
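
To make this concrete, here is a minimal PyTorch sketch of such a pair of models for a keyword-spotting task. The architectures, layer sizes, and input dimensions below are illustrative assumptions for this article, not the exact networks from our experiments:

```python
import torch.nn as nn

NUM_CLASSES = 12   # e.g. ten keywords plus "silence" and "unknown"
FEATURE_DIM = 40   # e.g. 40 MFCC coefficients per audio frame
NUM_FRAMES = 98    # roughly one second of audio

# Runtime (student) model: small and cheap enough for real-time inference.
runtime_model = nn.Sequential(
    nn.Flatten(),  # (batch, frames, features) -> (batch, frames * features)
    nn.Linear(NUM_FRAMES * FEATURE_DIM, 64),
    nn.ReLU(),
    nn.Linear(64, NUM_CLASSES),
)

# Teacher model: a larger recurrent network, more accurate but too slow
# to run on every input in real time.
class Teacher(nn.Module):
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(FEATURE_DIM, 256, num_layers=2, batch_first=True)
        self.head = nn.Linear(256, NUM_CLASSES)

    def forward(self, x):               # x: (batch, frames, features)
        out, _ = self.gru(x)
        return self.head(out[:, -1])    # classify from the final time step

teacher_model = Teacher()
```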

Because the teacher model is less capacity-constrained than the runtime model, it can better capture data from all users and not just the majority.

During normal use, the device uses the runtime model to give real-time feedback to the user and saves samples of its inputs locally.

During downtime (for example, while charging or overnight), the device uses the teacher model to generate more accurate predictions for these sampled inputs. It then trains the runtime model to match the teacher’s predictions. All the data remains on-device, ensuring privacy and eliminating the need for an active internet connection.
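
That downtime pass is essentially a standard knowledge-distillation step: the runtime model is trained to match the teacher’s softened predictions, with no ground-truth labels involved. A minimal sketch, assuming a PyTorch setup like the one above; the softmax temperature and the buffer of saved inputs are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def distill_step(runtime_model, teacher_model, saved_inputs, optimizer,
                 temperature=2.0):
    """One training step matching the runtime model to the teacher.

    saved_inputs: a batch of unlabeled inputs sampled during normal use.
    The teacher's soft predictions serve as the training target.
    """
    with torch.no_grad():                    # the teacher stays fixed
        teacher_logits = teacher_model(saved_inputs)
    student_logits = runtime_model(saved_inputs)

    # KL divergence between softened teacher and student distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```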

How Well Does This Work?

We evaluated this approach using keyword recognition on the Google Speech Commands dataset. Three baseline neural networks were tested to investigate how robust different architectures are to this approach. In each case, the teacher was a much larger and more accurate recurrent neural network, far too slow to run in real time for this low-power, always-on application.

[Diagram: relative error reduction per speaker using on-device adaptation.]

Across all speakers, on-device adaptation reduced error rates by an average of 15%. For some speakers the reduction was far higher, suggesting that some users benefit from this approach much more than others.

Adaptive Learning Enabled by Total Compute

Crucially, very little data was needed for each speaker: as few as 20 samples in total (two from each class), and these benefits were attained after roughly 100 training steps.
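
Using the distill_step sketch above, the whole adaptation pass is correspondingly small. The sample and step counts mirror the figures from our experiments; the optimizer choice and learning rate are illustrative assumptions:

```python
import torch

# saved_inputs: ~20 locally buffered examples (hypothetical tensor of
# shape (20, NUM_FRAMES, FEATURE_DIM) collected during normal use).
optimizer = torch.optim.Adam(runtime_model.parameters(), lr=1e-4)
for step in range(100):   # ~100 training steps sufficed in our tests
    distill_step(runtime_model, teacher_model, saved_inputs, optimizer)
```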

This means that for an always-on application such as speech recognition, the runtime model can run on an Arm Cortex-M CPU or Ethos NPU, while the training happens on an attached Arm Cortex-A CPU, making use of its floating-point units to perform power-efficient training and optimization.

The seamless cooperation between these Arm IP blocks is directly enabled by Arm’s Total Compute approach, which maximizes utilization of accelerator IP while providing the on-device training and optimization capabilities required to continually improve the deployed solution.

There are many other approaches to on-device model adaptation, and we are following up on several promising leads, so expect to hear more from us on this topic soon.

Whichever approach turns out to be the best, on-device learning is here to stay – and Arm Total Compute provides the flexibility and performance to implement that in whatever form it may take.

Learn more about Total Compute and Arm Research ML Lab. Please do reach out to me if you have any questions.

Contact Mark O'Connor 
