In the previous post we looked at five features of Cortex-M processors. In this one, we will look at Cortex-M3 specifically.
The central Cortex-M3 core is based on the Harvard architecture, which is characterized by separate buses for instructions and data. Because it can read an instruction and data from memory at the same time, the Cortex-M3 processor performs these operations in parallel, speeding up application execution.
The core pipeline has 3 stages: Instruction Fetch, Instruction Decode and Instruction Execute.
When a branch instruction is encountered, the decode stage also includes a speculative instruction fetch that could lead to faster execution. The processor fetches the branch destination instruction during the decode stage itself. Later, during the execute stage, the branch is resolved and it is known which instruction is to be executed next. If the branch is not taken, the next sequential instruction is already available. If the branch is taken, the branch target instruction is made available at the same time as the decision is made, restricting idle time to just one cycle.
The Cortex-M3 core contains a decoder for traditional Thumb and new Thumb-2 instructions, an advanced ALU with support for hardware multiply and divide, control logic, and interfaces to the other components of the processor. The Cortex-M3 processor is a 32-bit processor, with a 32-bit wide data path, register bank and memory interface. There are 13 general-purpose registers, two stack pointers, a link register, a program counter and a number of special registers including a program status register.
The Cortex-M3 processor supports two operating modes, Thread and Handler, and two levels of access for the code, privileged and unprivileged. Together these enable the implementation of complex and open systems without sacrificing the security of the application. Unprivileged code execution limits or excludes access to some resources, such as certain instructions and specific memory locations. Thread mode is the typical operating mode and supports both privileged and unprivileged code. Handler mode is entered when an exception occurs, and all code is privileged in this mode.
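The mode and privilege rules above can be written down as a small decision function. This is only a sketch that runs on a host machine. It assumes, from the ARMv7-M architecture rather than from this post, that the exception number sits in the low 9 bits of IPSR (0 meaning Thread mode) and that bit 0 of the CONTROL register selects unprivileged Thread mode. The function and macro names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed register layout (ARMv7-M, not stated in the post):
 * IPSR[8:0] holds the active exception number, 0 in Thread mode.
 * CONTROL[0] set means Thread mode runs unprivileged. */
#define IPSR_EXCEPTION_MASK  0x1FFu
#define CONTROL_UNPRIV_BIT   0x1u

/* Handler mode is active whenever an exception is being serviced. */
bool in_handler_mode(uint32_t ipsr)
{
    return (ipsr & IPSR_EXCEPTION_MASK) != 0u;
}

/* Handler mode is always privileged; Thread mode is privileged
 * only while the CONTROL unprivileged bit is clear. */
bool is_privileged(uint32_t ipsr, uint32_t control)
{
    if (in_handler_mode(ipsr)) {
        return true;
    }
    return (control & CONTROL_UNPRIV_BIT) == 0u;
}
```

On real hardware the two register values would come from the core itself (for example through the CMSIS helpers `__get_IPSR()` and `__get_CONTROL()`); here they are plain arguments so the logic can be checked anywhere.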
The Cortex-M3 processor is a memory-mapped system with a simple, fixed, linear memory map of 4 gigabytes of addressable memory space. The map has predefined, dedicated address ranges for code (code space), SRAM (memory space), external memories/devices and internal/external peripherals.
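Because the map is fixed, the region an address belongs to can be worked out from the address alone. The sketch below does that on a host machine. The post does not give the boundaries, so the ones used here are an assumption taken from ARM's default Cortex-M3 memory map, and the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Regions of the fixed 4 GB map (boundaries assumed from ARM's
 * default Cortex-M3 memory map, not from this post). */
typedef enum {
    REGION_CODE,            /* 0x00000000 - 0x1FFFFFFF */
    REGION_SRAM,            /* 0x20000000 - 0x3FFFFFFF */
    REGION_PERIPHERAL,      /* 0x40000000 - 0x5FFFFFFF */
    REGION_EXTERNAL_RAM,    /* 0x60000000 - 0x9FFFFFFF */
    REGION_EXTERNAL_DEVICE, /* 0xA0000000 - 0xDFFFFFFF */
    REGION_SYSTEM           /* 0xE0000000 - 0xFFFFFFFF */
} mem_region_t;

/* Classify an address by comparing it against each region's
 * upper bound, lowest region first. */
mem_region_t classify_address(uint32_t addr)
{
    if (addr < 0x20000000u) return REGION_CODE;
    if (addr < 0x40000000u) return REGION_SRAM;
    if (addr < 0x60000000u) return REGION_PERIPHERAL;
    if (addr < 0xA0000000u) return REGION_EXTERNAL_RAM;
    if (addr < 0xE0000000u) return REGION_EXTERNAL_DEVICE;
    return REGION_SYSTEM;
}
```

A vendor decides what is actually populated inside each region, so an address that classifies as SRAM here is not guaranteed to have memory behind it on a given chip.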
The Cortex-M3 processor enables direct access to single bits of data in simple systems by implementing a technique called bit-banding. The memory map includes two 1MB bit-band regions, one in the SRAM space and one in the peripheral space, each of which maps onto a 32MB alias region. Every word in an alias region corresponds to one bit in the matching bit-band region, so a load/store operation on an address in the alias region is translated directly into an operation on the bit aliased by that address. Writing a value with the least-significant bit set to an alias address writes a 1 to the bit-band bit, and writing a value with the least-significant bit cleared writes a 0 to the bit. Reading the aliased address directly returns the value of the appropriate bit-band bit. Additionally, this operation is atomic and cannot be interrupted by other bus activities.
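The mapping from a bit to its alias word is plain arithmetic: each byte of the bit-band region takes up 32 bytes of alias space, and each bit takes up one 4-byte word. Here is a minimal sketch of that calculation that runs on a host machine. The base addresses are the ones ARM documents for the Cortex-M3 (they are not given in this post), and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Base addresses assumed from ARM's Cortex-M3 documentation. */
#define SRAM_BITBAND_BASE    0x20000000u
#define SRAM_ALIAS_BASE      0x22000000u
#define PERIPH_BITBAND_BASE  0x40000000u
#define PERIPH_ALIAS_BASE    0x42000000u
#define BITBAND_REGION_SIZE  0x00100000u  /* 1MB per bit-band region */

/* Return the alias-region word address for bit `bit` (0..7) of the
 * byte at `byte_addr`, which must lie in one of the two bit-band
 * regions. */
uint32_t bitband_alias_addr(uint32_t byte_addr, unsigned bit)
{
    uint32_t region_base;
    uint32_t alias_base;

    assert(bit < 8u);

    if (byte_addr >= SRAM_BITBAND_BASE &&
        byte_addr < SRAM_BITBAND_BASE + BITBAND_REGION_SIZE) {
        region_base = SRAM_BITBAND_BASE;
        alias_base = SRAM_ALIAS_BASE;
    } else {
        assert(byte_addr >= PERIPH_BITBAND_BASE &&
               byte_addr < PERIPH_BITBAND_BASE + BITBAND_REGION_SIZE);
        region_base = PERIPH_BITBAND_BASE;
        alias_base = PERIPH_ALIAS_BASE;
    }

    /* 32 alias bytes per bit-band byte, 4 alias bytes per bit. */
    return alias_base + (byte_addr - region_base) * 32u + bit * 4u;
}
```

For example, bit 7 of the last SRAM bit-band byte at 0x200FFFFF lands on the alias word 0x23FFFFFC. On the target itself, storing 1 through a `volatile uint32_t *` pointing at the returned address would set that single bit without a read-modify-write sequence in software.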
The Cortex-M3 processor implements unaligned data access, which enables unaligned data transfers in a single core access. When unaligned transfers are used, they are converted into multiple aligned transfers and remain transparent to application programmers. In addition, the Cortex-M3 processor supports 32-bit multiply operations in a single cycle. It also supports signed and unsigned divide operations with the SDIV and UDIV instructions, which take between 2 and 12 cycles depending on the size of the operands. The division completes faster if the dividend and the divisor are closer in size. These improvements in mathematical capability make the Cortex-M3 processor ideal for many numerically intensive applications such as sensor reading and scaling.
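To make the sensor scaling case concrete, here is a small sketch of converting a raw ADC count to millivolts with one integer multiply and one integer divide. On a Cortex-M3 a compiler can typically map these onto the hardware multiply and UDIV instructions, though that depends on the toolchain and its options. The function name and the 12-bit, 3300 mV figures in the usage note are assumptions chosen for illustration, not values from any particular device.

```c
#include <stdint.h>

/* Scale a raw ADC count to millivolts using integer arithmetic.
 * `full_scale` is the largest count the converter can return
 * (for example 4095 for a 12-bit ADC). The caller must keep
 * raw * vref_mv within 32 bits to avoid overflow. */
uint32_t adc_to_millivolts(uint32_t raw, uint32_t vref_mv,
                           uint32_t full_scale)
{
    if (full_scale == 0u) {
        return 0u;  /* division by zero is undefined in C, so guard it */
    }
    return (raw * vref_mv) / full_scale;
}
```

With a 3300 mV reference and a 12-bit converter, a mid-scale reading of 2048 gives `adc_to_millivolts(2048, 3300, 4095)`, which evaluates to 1650. Multiplying before dividing keeps the precision that an early division would throw away.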