The Apache TVM community continues to grow. Last year the conference hosted roughly 400 engineers and researchers. This year the conference is going virtual.
What is TVM? I thought I would start with this because this question has come to me a lot over the past year from across the industry – from both investors and developers. However, instead of getting into what it is, let us talk about the who.
TVM is a project built by an engineering-first community. TVM focuses on bridging the gap between academia and industry in a way that benefits both. It also focuses on bridging the gap between the world of many different frameworks and the world of many different hardware targets. The community consists of many engineers and researchers from different institutions. Although these organizations vary widely, they all build tools focused on ML, so they share many of the same problems. The community provides common ground for creating solutions to those shared problems.
Arm is a significant contributor to TVM. Arm is present across the breadth of the ML space – we have the Cortex-A line of processors in the full operating system space and the Cortex-M line, focused on embedded systems. The two are extremely different when it comes to ML performance and development flows. That's not even mentioning other types of processors, including GPUs and NPUs. TVM works across all these systems with their varied requirements.
To learn more about TVM and how Arm is contributing to the TVM project, be sure to attend the Apache TVM and Deep Learning Compilation Conference from December 2nd-4th, 2020. Ramana Radhakrishnan (Senior Principal Software Engineer at Arm) and Jem Davies (VP, Fellow, and GM of ML at Arm) are presenting at the conference. Arm AI ecosystem partner OctoML is also a significant contributor and will be presenting. This is an excellent opportunity to learn not only more about TVM in general, but also how companies like Arm, OctoML, AWS, Microsoft, Alibaba, and more are using TVM in real-world AI solutions.
To summarize as best I can as someone who loathes getting close to compilers, TVM is a compiler for ML workloads. This makes it an alternative to an interpreted framework, like TensorFlow or PyTorch. Even on embedded devices, TensorFlow Lite for Microcontrollers is interpreted, which means it needs a runtime to translate trained models into the operations the device executes. TVM is built as a compiled framework first. The highlight is AutoTVM, which is the auto-magical way to compile and tune code for new models. Right now, TVM is not always hitting the performance of human-written kernels (you can still keep your jobs, kernel engineers), and those kernels are still used in its stack. For more details, see this blog from OctoML.
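To make the compiled-versus-interpreted distinction concrete, here is a minimal sketch (not from the conference material) of compiling a trained model ahead of time with TVM's Python API, roughly as it looked in the 0.7-era releases. The model file, input name, and input shape are made-up placeholders.

```python
import onnx
import tvm
from tvm import relay
from tvm.contrib import graph_runtime  # renamed graph_executor in later TVM releases

# Load a trained model (hypothetical file and input shape) and convert it to Relay IR.
onnx_model = onnx.load("model.onnx")
mod, params = relay.frontend.from_onnx(onnx_model, shape={"input": (1, 3, 224, 224)})

# Compile ahead of time for a concrete target. "llvm" means the local CPU;
# a cross target such as "llvm -mtriple=aarch64-linux-gnu" would aim at an Arm board.
target = "llvm"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

# The result is a deployable module: load it and run inference,
# no framework interpreter required.
dev = tvm.cpu(0)
module = graph_runtime.GraphModule(lib["default"](dev))
# module.set_input("input", data); module.run(); out = module.get_output(0)
```

The point of the sketch is the shape of the workflow: the model is lowered and compiled once for a specific device, and only the small compiled artifact ships to that device.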
However, it is performant without human-written kernels. As someone who has never written a kernel and never wants to, this is great news. What if I want to build a custom model with a custom layer on a piece of hardware that has no kernel support but has TVM support? Cool. I can. It is no longer an "it is not supported" answer. Functional but not optimal is better than "in progress" in many situations, including a marathon. And as we all know in the world of ML, as-good-as-human is not far away.
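As a rough illustration of how that works in practice, the sketch below follows the standard AutoTVM tuning flow, continuing from the compile sketch above (mod, params, and target come from there; the log file name and trial count are made up): extract the tunable operator tasks from the model, let a tuner search for good schedules by measuring candidates on the device, then rebuild using the best results instead of hand-written kernels.

```python
import tvm
from tvm import relay, autotvm

# Extract the tunable operator tasks (e.g. the custom layer's convolutions) from the model.
tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)

# Measure candidate schedules on the local machine; an RPC runner would be used
# for a remote Arm board.
measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=10),
)

# Let a machine-learning-based tuner search the schedule space for each task.
for task in tasks:
    tuner = autotvm.tuner.XGBTuner(task)
    tuner.tune(
        n_trial=100,  # made-up budget; real tuning runs use far more trials
        measure_option=measure_option,
        callbacks=[autotvm.callback.log_to_file("tuning.log")],
    )

# Rebuild the model with the best schedules found during tuning.
with autotvm.apply_history_best("tuning.log"):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)
```

This is what "functional but not optimal" looks like in code: no kernel library is required up front, and the tuner buys back performance by searching rather than by someone hand-writing assembly.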
We shall see. Probably nothing. They have been around a long time and I do not see them going anywhere. Also, in terms of fast iteration, interpreted is great. If I want to train and test a model's accuracy, I am going to do it interpreted before I compile to a specific device.
Whether it is a monumental shift in the way ML inference is done or just the new thing that a significant portion of the industry is working on, it is worth checking out. How? Go to the conference. It started yesterday, it is free, and it is virtual, as all things in life should be. I will see you there (not really, but as much as I see anyone nowadays).
[CTAToken URL="https://www.eventbrite.com/e/tvm-conf-2020-tickets-127421618491" target="_blank" text="Register for TVM 2020" class="green"]