Partnerships are important to us at Arm. We are an ecosystem company, which means that we strive to work together with partner companies for mutual success. This philosophy extends to Arm Research, where partnerships allow us to extend our reach further into the future of our industry. University engagements form an essential element of these partnerships and help us build up understanding of new technologies as they emerge on the “radar” in academia. The Arm ML Research Lab in Boston has several of these collaborations with local universities, including Boston University and Harvard University.
Figure 1: Annually published papers by topic. Source: Artificial Intelligence Index, 2018 annual report
I am fascinated by the sheer interest in Artificial Intelligence (AI) and Machine Learning (ML). The oft-cited trends outlined in Figure 1 show the volume of research resources currently focused on this area: publications on AI have increased seven-fold since 1996, while computer science as a whole has grown only five-fold over the same timeframe. The topic is simply exploding in popularity, and the field is moving quickly, with new advances published on a seemingly daily basis. In ML research, we track and actively contribute to many of the latest trends. I have previously blogged about a few of the active projects in Arm ML Research, including Augmented Reality, Hardware Transfer Learning, Alpha-Blending for learned quantization, and TinyML.
However, we do not have infinite resources, and we cannot follow everything. This is why academic collaborations are so important: collaborating with universities allows us to extend our reach and sound out topics and ideas that are much further out than current products and technologies. Universities also commonly harbor specialist capabilities and expertise that are not available inside Arm. The relationship is similarly beneficial on the academic side, where close collaboration with industry helps guide research and teaching, creates employment opportunities for graduates, and helps attract funding. In this blog, I’ll talk about our strategic partnership with Harvard University on the topic of hardware for machine learning.
Harvard University needs little introduction, being arguably one of the most famous universities in the world. I have had the pleasure of working and collaborating directly with several Harvard faculty, students, and post-docs over the last few years. Recently, we wrote a short text together on computer architecture for machine learning, and this year we formally extended the relationship by sponsoring a three-year collaboration on machine learning hardware between Harvard and Arm Research. The collaboration involves two faculty members at Harvard: Prof. David Brooks and Prof. Gu-Yeon Wei, both established experts in computer architecture and circuits. In addition, Prof. Alexander Rush (previously at Harvard, now at Cornell Tech) is a leader in natural language processing and machine learning. From the Arm side, we provide research support, feedback on industry requirements, access to IP, and funding.
The initial focus of the collaboration is a sci-fi inspired universal translator device. The goal is to demonstrate ultra-low power technologies that enable speech recognition and translation in a battery-powered device, without relying on cloud-based computing. This is quite different from currently deployed devices, which typically upload audio data to the cloud to do the “heavy lifting” of neural speech recognition and translation. We want to avoid transmitting audio to the cloud, to alleviate the security and privacy concerns of exposing this personal data. It is a challenging goal, however, because it requires neural network inference in a heavily energy-constrained environment. Meeting it demands advances in ML theory, computer architecture, and circuits, and we want to demonstrate a real working system with manufactured chips. We are confident we have assembled the perfect collaboration with the Harvard team to meet this challenging project head on.
I am pleased to say we have already made strides towards this goal in several areas. This summer, I presented a paper at the prestigious Symposium on VLSI in Kyoto, Japan, demonstrating a test chip for ML workloads from the Harvard collaboration. The chip described is also the first academic test chip with an Arm A-class CPU, in this case an Arm Cortex-A53: exactly the type of processor most commonly found in a wide range of phones and IoT consumer devices, and therefore a great platform on which to base our research studies.
In terms of technical details, the test chip was fabricated in a 16nm process technology that is widely used for commercial products. Our work explores a range of so-called hardware “accelerators”: computing components that are uniquely specialized for performing ML tasks, as compared with a general-purpose CPU. Figure 2 shows a block diagram of the components on the test chip, which include low-power always-on subsystems, an industry-standard Arm Cortex-A53 CPU cluster, cache-coherent datapath accelerators, and an embedded FPGA core. You can read more about this chip in this paper. I also talked about this chip at the Arm Research Summit 2019; my slides can be viewed here.
Figure 2: Block diagram of the 16nm Harvard test chip, demonstrating the various accelerator technologies operating in a System on Chip alongside industry-standard Arm CPUs and interconnects.
In fact, over the last few years, Harvard has successfully taped out a whole string of exciting test chips, all demonstrating new technology innovations in aggressive technology nodes with industry-standard Arm IP. Figure 3 shows die photos of a few of these chips.
Figure 3: A ‘rogues’ gallery’ of Harvard tape-outs from the last few years.
Building on this existing body of work, I am excited to see the fruits of the collaboration continue to develop over the next few years as we work towards the goal of a practical universal translator device. I am confident that this kind of technology will land in nearly all consumer electronics devices, ranging from thermostats to microwave ovens and cars. After all, speech is the most natural form of communication for humans. It makes sense to me that we must teach our electronic gadgets to understand us, without using the cloud.
We are also grateful for the kind funding support for this project, including from DARPA and an NSF grant written jointly by the collaborators.
Arm has a long-standing commitment to academia and research, with the goal of making technology accessible to all. Our dedicated Research Enablement and Collaboration (RCE) team provides access to IP and tools as well as establishing collaborations with academic institutions to enable innovative research around the world.
We have recently announced Arm Flexible Access for Research, through which the academic research community can access more Arm IP than ever before; giving academic researchers even more freedom to experiment and explore the possibilities available with real-world, commercially proven IP, such as the Cortex-A53. The program will be available in early 2020 – read this blog post for more information.
1 P. N. Whatmough et al., "A 16nm 25mm2 SoC with a 54.5x Flexibility-Efficiency Range from Dual-Core Arm Cortex-A53 to Cache-Coherent Accelerators," 2019 Symposium on VLSI Circuits, Kyoto, Japan, 2019, pp. C34-C35. doi: 10.23919/VLSIC.2019.8778002.
2 S. K. Lee, P. N. Whatmough, D. Brooks and G. Wei, "A 16-nm Always-On DNN Processor With Adaptive Clocking and Multi-Cycle Banked SRAMs," IEEE Journal of Solid-State Circuits, vol. 54, no. 7, pp. 1982-1992, July 2019. doi: 10.1109/JSSC.2019.2913098.
3 P. N. Whatmough, S. K. Lee, D. Brooks and G. Wei, "DNN Engine: A 28-nm Timing-Error Tolerant Sparse Deep Neural Network Processor for IoT Applications," IEEE Journal of Solid-State Circuits, vol. 53, no. 9, pp. 2722-2731, Sept. 2018. doi: 10.1109/JSSC.2018.2841824.