Code Size – a comprehensive comparison of microMIPS32 and Thumb code size using many megabytes of customer code

Introduction


In this blog I take a close look at the code size of ARM’s Thumb instruction set against the microMIPS32 instruction set, as used in the microAptiv processor family.  More specifically, I examine recent claims that microMIPS32 has 17%-30% better code size than ARM-based MCUs (as mentioned here: MIPS MCUs Outrun ARM). ARM’s findings show that claim to be inaccurate: when measured on a large database of source code collected at ARM over the last 20 years, microMIPS32 object code is on average 23.5% larger than the equivalent Thumb code.

Background


From the early days of ARM7TDMI through the development of ARM9 and ARM11 processors, right up to the current Cortex-A, Cortex-R and Cortex-M families of ARM processors, ARM has designed its ARM and Thumb instruction sets to deliver the high performance of a 32-bit architecture with code size smaller than that of 8- and 16-bit architectures.  ARM has always strived to target that sweet spot of 32-bit performance at 16-bit code density by using an optimized mix of 32-bit and 16-bit instructions in the “Thumb-2” instruction set. The Thumb-2 instruction set was first introduced in the ARM1156 in 2003, and subsequently used in the Cortex family of processors.  High performance from compact code is built right into ARM’s DNA.

How does one measure code size?


When measuring the size of code running in a real system, there are a number of factors which influence the amount of memory used. These include the quality of the compiler and linker; the exact separation of read-only code, read-only data and read-write data; and the way in which these are mapped into real physical memory.  The measurements also need to be made over a large enough collection of real software to give confidence that the methodology allows a customer to judge the likely size of their code in their target system.
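As a minimal sketch of this accounting (the section names and sizes below are hypothetical examples, not measurements from this article), the flash footprint of an image can be totalled from the linker's section sizes:

```python
# Sketch: total flash footprint from linker section sizes.
# Section names and byte counts are illustrative only.
def flash_footprint(sections):
    # Flash must hold the executable code (.text), read-only data
    # (.rodata) and the initialization image of read-write data
    # (.data). Zero-initialized .bss occupies RAM only, so it is
    # excluded from the flash total.
    return sections[".text"] + sections[".rodata"] + sections[".data"]

sizes = {".text": 40960, ".rodata": 8192, ".data": 1024, ".bss": 2048}
print(flash_footprint(sizes))  # 50176
```

In practice these numbers come from the toolchain's size-reporting utility run over the linked image; the point is that "code size" covers more than just the instruction bytes.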


The claim that the MIPS microAptiv CPU can achieve 17% (and even 30% in some cases) better code density than Cortex-M3 and Cortex-M4 leaves several variables unknown. For example, the size of the software base used for the claim is not stated, so ARM determined it best to test the claim on a large body of source code.


Over the last 20 years or so, ARM has accumulated many megabytes of test source code, with a substantial amount of that code coming from ARM customers who have worked with us to optimize the delicate trade-off between code size and performance.  In fact, in the early days of portable consumer devices, a reduction in code size of just 1 or 2% could be the difference between whether or not your product included that new killer feature. As a result, our CPU product development and compiler teams have lived and breathed this environment for many years.  During the development of new CPU products and new versions of the compiler, this substantial database of test code is regularly used to check that ARM has struck the right trade-off between performance and code size.  Below I present ARM’s findings from that source code database.

Test methodology


In order to give a fair comparison between microMIPS32 and Thumb (as used in Cortex-M3 and M4), ARM downloaded the latest development tools for microAptiv processors from the Mentor Embedded website to ensure use of the latest compiler targeting the microMIPS instruction set – this is Sourcery Codebench 2013.11-36, released on 12th December 2013.  For the Cortex-M3 and Cortex-M4, the ARM C compiler version 5.02.28 was used.


The ARM C compiler has a C run-time library which is better tuned to include only the minimum necessary set of run-time functions, so we excluded the run-time library from our measurements; had we included it, the ARM results would have been even better than those shown.  We wanted a fair comparison of object code size, so we also chose not to use the ARM linker’s data compression feature, which would have further improved the ARM results.


The source code used for these tests comes from a variety of application areas and totals some 125 Mbytes of source, compiling to tens of Mbytes of object code. ARM sees this as a significant sample size.  Much of this code comes from ARM’s customers in the telecoms, automotive, storage and industrial control markets; to maintain confidentiality, these are labelled “Customer 1”, “Customer 2” and so on.  In addition, ARM has built binaries of application code such as the ARM C compiler itself, ARM gcc, gnuchess, gzip etc.

[Graph.jpg: chart of relative code size results across the test codebases]

Summary of results


ARM’s results show that on this large sample of source code, the microMIPS32 object code was on average (taking the geometric mean) 23.5% larger than the same code compiled for Cortex-M3.
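For reference, a geometric mean of per-benchmark size ratios can be computed as below. The ratios shown are purely illustrative, not ARM's actual per-customer data:

```python
import math

def geometric_mean(ratios):
    # The geometric mean is the nth root of the product of n values;
    # computing it via logs avoids overflow for long lists. It is the
    # standard way to average size (or speed) ratios, since it treats
    # a 2x increase and a 2x decrease symmetrically.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical microMIPS32 / Thumb object-size ratios per benchmark.
ratios = [1.018, 1.30, 1.57, 1.20]
print(round(geometric_mean(ratios), 3))
```

A geometric mean of 1.235 over the real dataset corresponds to the 23.5% average size difference quoted above.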


The results ranged from 1.8% larger (for “Customer 1”) to 57% larger (for “Customer 28”).  I should also note that Cortex-M3 code was significantly smaller across the whole range of sample sizes, from large binaries (such as armcc at 405kB, 30% smaller than microMIPS32) down to small ones (such as gzip at 26kB, 32% smaller than microMIPS32).

Ian Johnson PhD, Product Manager, ARM

Anonymous
  • That's a huge difference in object code size - and even more compared to what they say! It would be interesting to find out how they arrived at their figures and where the main comparative savings came from - and, with that size difference in general, why Customer 1's code 'only' had a 1.8% difference.