Condition Codes 2: Conditional Execution

In my previous post (Condition Codes 1), I explained that some instructions can set some global condition codes, and that these codes can be used to conditionally execute code. I gave some examples of usage. One such example was an assembly implementation of C's if/else construct:

  cmp     r0, #20
  bhi     do_something_else
do_something:
  @ This code runs if (r0 <= 20).
  b       continue    @ Prevent do_something_else from executing.
do_something_else:
  @ This code runs if (r0 > 20).
continue:
  @ Other code.

The example is valid, and will work on any ARM core. However, is this an efficient solution if you only need to execute one or two instructions in each case? Consider the following C code:

if (a >= 10) {
  a = 10;
} else {
  a = a + 1;
}

It should be clear that the code increments a unless it has hit or exceeded a limit of 10, in which case it is set to 10. Mapping this onto our if/else example, and assuming that a is an unsigned integer held in r0, this might be implemented in assembly as follows:

  cmp     r0, #10
  blo     r0_is_small
r0_is_big:
  mov     r0, #10
  b       continue
r0_is_small:
  add     r0, r0, #1
continue:
  @ Other code.

The above code executes one of two instructions, either the mov or the add. However, it uses two branch instructions to achieve this. Without branch prediction, these branches can take several cycles to execute. Even with branch prediction, the pattern may not be easily predicted. Finally, even with perfect branch prediction, each branch instruction takes four bytes of instruction memory, so code size may become a problem: the sequence above needs five instructions (20 bytes) to select between two single-instruction operations.

An Improved Example

One of the features of the ARM instruction set is that almost every instruction encoding includes a 4-bit field that represents a condition code. If the condition attached to an instruction passes, the instruction executes. Otherwise, it has no effect, as if you had used a nop instruction. Using this knowledge, we can implement the previous example more efficiently as follows:

  cmp     r0, #10
  movhs   r0, #10
  addlo   r0, r0, #1
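
Note that cmp sets the flags once, and neither movhs nor addlo updates them, so both conditional instructions are judged against that single comparison rather than against whatever r0 currently holds. The following C model is only a rough sketch of that behaviour, not something a compiler is guaranteed to produce. The names clamp_or_increment and hs are invented for illustration, and r0 is assumed to be unsigned because hs and lo are unsigned conditions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C model of:
 *   cmp     r0, #10
 *   movhs   r0, #10
 *   addlo   r0, r0, #1
 */
uint32_t clamp_or_increment(uint32_t r0)
{
    /* cmp r0, #10: for an unsigned compare, "hs" passes when r0 >= 10. */
    const bool hs = (r0 >= 10u);

    /* movhs r0, #10: behaves like a nop when the condition fails. */
    if (hs) {
        r0 = 10u;
    }

    /* addlo r0, r0, #1: "lo" is the inverse of "hs", tested on the same flags. */
    if (!hs) {
        r0 = r0 + 1u;
    }

    return r0;
}

/* Usage example: exactly one of the two conditional instructions takes effect. */
void clamp_or_increment_examples(void)
{
    assert(clamp_or_increment(0u) == 1u);
    assert(clamp_or_increment(9u) == 10u);
    assert(clamp_or_increment(10u) == 10u);
    assert(clamp_or_increment(200u) == 10u);
}
```

The model still has two if statements, but they stand for conditional instructions rather than branches: control flow in the assembly version is a straight line.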

Unconditionally-Executed Instructions

In the ARM instruction set, the condition code is encoded using a 4-bit field in the instruction. The encoding includes 3 bits to identify an operation, and a fourth bit to invert the condition. The eq condition, for example, is the exact opposite of the ne condition. It may interest authors of JIT compilers to know that the least significant bit of the condition code can be inverted to obtain the opposite condition code. For example, eq (equal) is encoded as '0000' and ne (not equal) as '0001'. This works for every condition code with the exception of the al (always) condition, encoded as '1110'. Its opposite, '1111', would mean "never", and it would be wasteful to dedicate one sixteenth of the instruction set to instructions that can never execute. Instead, this portion of the instruction set is used for the few instructions which cannot be executed conditionally.
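
As a rough illustration of that layout, the sketch below evaluates a 4-bit condition against the N, Z, C and V flags by using the upper three bits to pick a test and the lowest bit to invert it. It also shows the bit-flip trick that JIT authors might use. The names arm_cond, cond_passes and invert_cond are made up for this example; only the encodings themselves come from the ARM condition code table.

```c
#include <assert.h>
#include <stdbool.h>

/* 4-bit ARM condition encodings (names are illustrative). */
enum arm_cond {
    COND_EQ = 0x0, COND_NE = 0x1,  /* Z set / Z clear                  */
    COND_HS = 0x2, COND_LO = 0x3,  /* C set / C clear (unsigned)       */
    COND_MI = 0x4, COND_PL = 0x5,  /* N set / N clear                  */
    COND_VS = 0x6, COND_VC = 0x7,  /* V set / V clear                  */
    COND_HI = 0x8, COND_LS = 0x9,  /* unsigned higher / lower or same  */
    COND_GE = 0xA, COND_LT = 0xB,  /* signed >= / signed <             */
    COND_GT = 0xC, COND_LE = 0xD,  /* signed > / signed <=             */
    COND_AL = 0xE                  /* always                           */
};

/* Returns true if an instruction with condition 'cond' would execute. */
bool cond_passes(unsigned cond, bool n, bool z, bool c, bool v)
{
    bool result;

    switch (cond >> 1) {             /* upper three bits select the test */
    case 0:  result = z;                break;
    case 1:  result = c;                break;
    case 2:  result = n;                break;
    case 3:  result = v;                break;
    case 4:  result = c && !z;          break;
    case 5:  result = (n == v);         break;
    case 6:  result = (n == v) && !z;   break;
    default: return true;            /* al, and the unconditional space  */
    }

    return (cond & 1u) ? !result : result;  /* lowest bit inverts */
}

/* Flips the lowest bit to get the opposite condition; fails for al. */
bool invert_cond(unsigned cond, unsigned *inverted)
{
    if (cond >= COND_AL) {
        return false;                /* al has no usable opposite */
    }
    *inverted = cond ^ 1u;
    return true;
}

/* Usage example: flags as left by "cmp r0, #10" with r0 == 5 (N=1, C=0). */
void cond_examples(void)
{
    unsigned inv = 0;

    assert(cond_passes(COND_LO, true, false, false, false));
    assert(!cond_passes(COND_HS, true, false, false, false));
    assert(invert_cond(COND_HS, &inv) && inv == COND_LO);
    assert(!invert_cond(COND_AL, &inv));
}
```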

Here are a few examples of instructions which cannot be conditionally executed in the ARM instruction set:

  • blx <label> cannot be conditionally executed, but blx <register> (and all other branch instructions) can.
  • Most NEON instructions. For example, SIMD (NEON) variants of vadd cannot be conditionally executed, though the scalar (VFP) variants can.
  • Hint instructions, such as pld (preload data).
  • Barriers, such as dmb (data memory barrier), dsb (data synchronization barrier), isb (instruction synchronization barrier).
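
Because these instructions live in the part of the encoding space where the condition field would read '1111', a decoder can spot them by looking at the top four bits of a 32-bit instruction word. The sketch below assumes ARM (not Thumb) state. The helper names are hypothetical, and the sample encodings in the usage function are ones I believe to be correct, but you should check them against your assembler's output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract the 4-bit condition field (bits 31:28) of an ARM-state instruction. */
unsigned arm_cond_field(uint32_t insn)
{
    return (unsigned)(insn >> 28);
}

/* True if the instruction sits in the '1111' (unconditional) encoding space. */
bool arm_is_unconditional_space(uint32_t insn)
{
    return arm_cond_field(insn) == 0xFu;
}

/* Usage example with a few hand-checked encodings. */
void unconditional_space_examples(void)
{
    assert(!arm_is_unconditional_space(0xE350000Au)); /* cmp   r0, #10    (al) */
    assert(!arm_is_unconditional_space(0x32800001u)); /* addlo r0, r0, #1 (lo) */
    assert(arm_is_unconditional_space(0xF57FF05Fu));  /* dmb   sy              */
}
```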

As always, the ARMv7-AR Architecture Reference Manual contains the most complete and accurate information; the Instruction Set Quick Reference Card is a more compact summary.

Conditional Execution and High-Performance Processors

When few processors had branch prediction and code size was tightly constrained, conditional execution was an excellent way to save code space whilst also improving performance in many programs. This is still true for today's real-time processors and micro-controllers. However, ARM's application-class processors include branch predictors which often make the branch-based if/else construction more attractive than conditional instructions. A correctly predicted branch may be very cheap, or even free in some cases. In addition, conditional execution can, in some cases, hinder out-of-order execution, because a conditional instruction depends on the flags (and on the previous value of its destination register) as well as on its normal operands.

It can be difficult to know whether to use conditional execution or traditional conditional branches for a particular application. However, as a general rule of thumb, it's probably best to use conditional instructions for sequences of three instructions or fewer, and branches for longer sequences. The best-performing solution varies between processors, as they have different pipeline and branch predictor designs, and it also depends on the specific instruction sequence you are using. Also note that the fastest solution is not necessarily the smallest.

Thumb

In the original 16-bit Thumb instruction set, only branches could be conditional. In Thumb-2, the it instruction was added to provide functionality and behaviour similar to conditional instructions in ARM. Thumb-2's it instruction can also conditionally execute some instructions which cannot be conditionally executed in ARM state. I won't say more about the it instruction now; it will be covered in detail in my next post in this series.

Anonymous