A tour of the Cortex-M3 Core

Diya Soubra
November 4, 2013

In the previous post we looked at five features of Cortex-M processors. In this one, we will look at Cortex-M3 specifically.

The central Cortex-M3 core is based on the Harvard architecture, which is characterized by separate buses for instructions and data. Because it can fetch an instruction and access data in memory at the same time, the Cortex-M3 processor overlaps the two operations, which speeds up application execution.

The core pipeline has 3 stages:

  • Instruction Fetch
  • Instruction Decode
  • Instruction Execute

When a branch instruction is encountered, the decode stage also performs a speculative instruction fetch that can lead to faster execution: the processor fetches the branch destination instruction during the decode stage itself. Later, during the execute stage, the branch is resolved and it is known which instruction is to be executed next. If the branch is not taken, the next sequential instruction is already available. If the branch is taken, the branch target instruction becomes available at the same time as the decision is made, restricting idle time to just one cycle.

The Cortex-M3 core contains a decoder for traditional Thumb and new Thumb-2 instructions, an advanced ALU with support for hardware multiply and divide, control logic, and interfaces to the other components of the processor. The Cortex-M3 processor is a 32-bit processor, with a 32-bit wide data path, register bank and memory interface. There are 13 general-purpose registers (R0 to R12), two stack pointers (main and process, banked as R13), a link register (R14), a program counter (R15) and a number of special registers, including a program status register.

registers.PNG

The Cortex-M3 processor supports two operating modes, Thread and Handler, and two levels of access for code, privileged and unprivileged. Together these enable the implementation of complex and open systems without sacrificing the security of the application. Unprivileged execution limits or excludes access to some resources, such as certain instructions and specific memory locations. Thread mode is the typical operating mode and supports both privileged and unprivileged code. Handler mode is entered when an exception occurs, and all code running in this mode is privileged.
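The mode and privilege rules above can be summarised as a small decision function. The following is only a host-side sketch, not code from the original post: the type `cpu_mode_t` and the function `is_privileged` are hypothetical names, and the sketch assumes the Thread mode privilege level is selected by bit 0 of the CONTROL special register, which the post itself does not discuss.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model type for this sketch; not part of any Arm header. */
typedef enum { MODE_THREAD, MODE_HANDLER } cpu_mode_t;

/* Assumption: bit 0 of CONTROL selects the Thread mode privilege level
 * (0 = privileged, 1 = unprivileged). */
#define CONTROL_NPRIV (1u << 0)

/* Returns true when code running in `mode` with the given CONTROL value
 * executes with privileged access. */
static bool is_privileged(cpu_mode_t mode, uint32_t control)
{
    if (mode == MODE_HANDLER) {
        return true; /* Handler mode code is always privileged */
    }
    /* Thread mode may be either privileged or unprivileged. */
    return (control & CONTROL_NPRIV) == 0u;
}
```

For example, `is_privileged(MODE_HANDLER, 1u)` is true regardless of the CONTROL value, while `is_privileged(MODE_THREAD, 1u)` is false.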

handler-thread.PNG

The Cortex-M3 processor is a memory-mapped system with a simple, fixed, linear memory map of 4 gigabytes of addressable space. The map has predefined, dedicated address ranges for code (code space), SRAM (memory space), external memories and devices, and internal and external peripherals.
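As a rough illustration of the fixed memory map, the sketch below classifies a 32-bit address into its top-level region. The region boundaries are the commonly documented Cortex-M3 defaults and are stated here as an assumption, since the post only shows them in a figure; the names `mem_region_t` and `region_of` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical region labels for this sketch. */
typedef enum {
    REGION_CODE,        /* 0x00000000 - 0x1FFFFFFF */
    REGION_SRAM,        /* 0x20000000 - 0x3FFFFFFF */
    REGION_PERIPHERAL,  /* 0x40000000 - 0x5FFFFFFF */
    REGION_EXT_RAM,     /* 0x60000000 - 0x9FFFFFFF */
    REGION_EXT_DEVICE,  /* 0xA0000000 - 0xDFFFFFFF */
    REGION_SYSTEM       /* 0xE0000000 - 0xFFFFFFFF */
} mem_region_t;

/* Maps an address to its region by comparing against each upper bound. */
static mem_region_t region_of(uint32_t addr)
{
    if (addr < 0x20000000u) return REGION_CODE;
    if (addr < 0x40000000u) return REGION_SRAM;
    if (addr < 0x60000000u) return REGION_PERIPHERAL;
    if (addr < 0xA0000000u) return REGION_EXT_RAM;
    if (addr < 0xE0000000u) return REGION_EXT_DEVICE;
    return REGION_SYSTEM; /* private peripheral bus and vendor-specific space */
}
```

Because the map is fixed, a check like this needs no configuration data: the region follows from the address alone.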

memory map.PNG

The Cortex-M3 processor enables direct access to single bits of data in simple systems by implementing a technique called bit-banding. The memory map includes two 1MB bit-band regions, one in the SRAM space and one in the peripheral space, each of which maps on to its own 32MB alias region. Every bit in a bit-band region has a corresponding word in the alias region, and a load or store to that alias word is translated directly into an operation on the bit it aliases. Writing a value with the least-significant bit set to an alias address writes a 1 to the bit-band bit, and writing a value with the least-significant bit cleared writes a 0 to the bit. Reading the alias address returns the value of the corresponding bit-band bit. Additionally, this operation is atomic and cannot be interrupted by other bus activities.
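The mapping from a bit in a bit-band region to its alias word is simple arithmetic: each byte of the region expands to 32 bytes of alias space (eight bits, one 4-byte word per bit). The sketch below computes the alias address on the host. The base addresses are the usual Cortex-M3 values and are an assumption here, since the post gives them only in the figure; `bitband_alias` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed Cortex-M3 bit-band bases (not stated in the text above). */
#define SRAM_BB_BASE     0x20000000u
#define SRAM_BB_ALIAS    0x22000000u
#define PERIPH_BB_BASE   0x40000000u
#define PERIPH_BB_ALIAS  0x42000000u
#define BB_REGION_SIZE   0x00100000u  /* 1MB per bit-band region */

/* Returns the alias-region word address for bit `bit` (0..7) of the byte
 * at `byte_addr`, which must lie inside one of the two bit-band regions. */
static uint32_t bitband_alias(uint32_t byte_addr, unsigned bit)
{
    uint32_t region_base = (byte_addr < PERIPH_BB_BASE) ? SRAM_BB_BASE
                                                        : PERIPH_BB_BASE;
    uint32_t alias_base  = (byte_addr < PERIPH_BB_BASE) ? SRAM_BB_ALIAS
                                                        : PERIPH_BB_ALIAS;

    assert(bit < 8u);
    assert(byte_addr >= region_base &&
           byte_addr - region_base < BB_REGION_SIZE);

    uint32_t byte_offset = byte_addr - region_base;
    /* 32 alias bytes per region byte, 4 alias bytes per bit. */
    return alias_base + byte_offset * 32u + bit * 4u;
}
```

On a real device, storing 1 or 0 through a `volatile uint32_t *` pointing at the returned address would set or clear that single bit. As a check on the arithmetic, bit 2 of the byte at 0x20000300 maps to alias address 0x22006008.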

bitbanding.PNG

The Cortex-M3 processor supports unaligned data access, so software can issue an unaligned data transfer as a single core access. When unaligned transfers are used, they are converted into multiple aligned transfers, and this conversion remains transparent to application programmers.

In addition, the Cortex-M3 processor supports 32-bit multiply operations in a single cycle. It also supports signed and unsigned divide operations with the SDIV and UDIV instructions, which take between 2 and 12 cycles depending on the size of the operands. The division completes faster when the dividend and the divisor are closer in size. These improvements in mathematical capability make the Cortex-M3 processor well suited to numerically intensive applications such as sensor reading and scaling.
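Sensor scaling is a typical case where the hardware multiply and divide help. The following is a minimal sketch, assuming a 12-bit converter and a 3.3 V reference; neither value comes from the post, and `adc_to_millivolts` is a hypothetical name. A compiler targeting Cortex-M3 would be expected to implement the arithmetic with multiply and UDIV instructions rather than a software division routine, though the exact code generated depends on the toolchain and options.

```c
#include <stdint.h>

#define ADC_FULL_SCALE   4095u  /* assumption: 12-bit converter */
#define VREF_MILLIVOLTS  3300u  /* assumption: 3.3 V reference  */

/* Converts a raw ADC count to millivolts, rounded to the nearest unit.
 * Out-of-range inputs are clamped to full scale. */
static uint32_t adc_to_millivolts(uint32_t raw)
{
    if (raw > ADC_FULL_SCALE) {
        raw = ADC_FULL_SCALE;
    }
    /* The largest product, 4095 * 3300, fits comfortably in 32 bits. */
    return (raw * VREF_MILLIVOLTS + ADC_FULL_SCALE / 2u) / ADC_FULL_SCALE;
}
```

Keeping the whole computation in 32-bit integers avoids floating point, which the Cortex-M3 does not implement in hardware.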
