In my previous post (Condition Codes 1), I explained that some instructions can set the global condition codes, and that these codes can be used to conditionally execute code. I gave some examples of usage. One such example was an assembly implementation of C's if/else construct:
        cmp     r0, #20
        bhi     do_something_else
    do_something:
        @ This code runs if (r0 <= 20).
        b       continue            @ Prevent do_something_else from executing.
    do_something_else:
        @ This code runs if (r0 > 20).
    continue:
        @ Other code.
The example is valid, and will work on any ARM core. However, is this an efficient solution if you only need to execute one or two instructions in each case? Consider the following C code:
    if (a >= 10) {
        a = 10;
    } else {
        a = a + 1;
    }
This code increments a unless it has reached or exceeded a limit of 10, in which case a is set to 10. Mapping this onto our if/else example, and assuming that a is an unsigned value held in r0, this might be implemented in assembly as follows:
        cmp     r0, #10
        blo     r0_is_small
    r0_is_big:
        mov     r0, #10
        b       continue
    r0_is_small:
        add     r0, r0, #1
    continue:
        @ Other code.
The above code executes one of two instructions, either the mov or the add. However, it uses two branch instructions to achieve this. Without branch prediction, these branches can take several cycles to execute. Even with branch prediction, the pattern may not be easily predicted. Finally, even with perfect branch prediction, each branch instruction takes four bytes of instruction memory, so code size may become a problem.
One of the features of the ARM instruction set is that almost every instruction encoding includes a 4-bit field that represents a condition code. If the condition attached to an instruction passes, the instruction executes. Otherwise, it has no effect, as if you had used a nop instruction. Using this knowledge, we can implement the previous example more efficiently as follows:
        cmp     r0, #10
        movhs   r0, #10
        addlo   r0, r0, #1
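To see why the predicated sequence matches the behaviour described above (clamp once a has hit or exceeded 10, otherwise increment), here is a minimal C sketch that models it. It assumes a is unsigned, since hs and lo are unsigned comparisons, and it models only the carry flag, which is all that hs and lo test. The function names are my own and purely illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The C-level behaviour: clamp at 10, otherwise increment. */
static uint32_t clamp_or_increment(uint32_t a)
{
    if (a >= 10) {
        a = 10;
    } else {
        a = a + 1;
    }
    return a;
}

/* Hypothetical model of: cmp r0, #10 ; movhs r0, #10 ; addlo r0, r0, #1 */
static uint32_t clamp_or_increment_predicated(uint32_t r0)
{
    /* cmp r0, #10 sets the carry flag when r0 >= 10 (unsigned, no borrow). */
    bool carry = (r0 >= 10);

    /* movhs: has an effect only if carry is set. */
    if (carry) {
        r0 = 10;
    }

    /* addlo: has an effect only if carry is clear. Neither mov nor add
     * updates the flags here, so this still tests the result of the cmp,
     * not the new value of r0. */
    if (!carry) {
        r0 = r0 + 1;
    }
    return r0;
}

/* Compare the two versions over a small range around the limit. */
static void check_models_agree(void)
{
    for (uint32_t a = 0; a < 32; a++) {
        assert(clamp_or_increment(a) == clamp_or_increment_predicated(a));
    }
}
```

Note that both conditional instructions are always fetched; exactly one of them changes r0, and no branch is needed.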
The condition code occupies a 4-bit field in each ARM instruction. Three of those bits identify a test, and the least significant bit inverts it, so each condition sits next to its exact opposite: eq (equal) is encoded as '0000' and ne (not equal) as '0001'. It may interest authors of JIT compilers to know that flipping the least significant bit of a condition code therefore yields the opposite condition code. This works for every condition code with the exception of the al (always) condition, encoded as '1110'. Its inverse, '1111', would mean "never", and it would be wasteful to dedicate one sixteenth of the instruction set to instructions that can never execute. Instead, this portion of the instruction set is used for the few instructions which cannot be executed conditionally.
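For JIT authors, the inversion trick might look something like the following C sketch. The enum and function names are hypothetical; the values are the standard ARM condition encodings, and the field position (bits 31 to 28) applies to ARM-state instructions only. The sketch refuses to invert al, for the reason given above.

```c
#include <assert.h>
#include <stdint.h>

/* ARM condition codes, as held in bits [31:28] of an ARM-state instruction. */
enum arm_cond {
    COND_EQ = 0x0, COND_NE = 0x1,
    COND_HS = 0x2, COND_LO = 0x3,   /* Also written cs and cc. */
    COND_MI = 0x4, COND_PL = 0x5,
    COND_VS = 0x6, COND_VC = 0x7,
    COND_HI = 0x8, COND_LS = 0x9,
    COND_GE = 0xA, COND_LT = 0xB,
    COND_GT = 0xC, COND_LE = 0xD,
    COND_AL = 0xE
};

/* Flip the least significant bit to get the opposite condition.
 * This is not valid for al, whose "opposite" is the unconditional space. */
static enum arm_cond invert_cond(enum arm_cond cond)
{
    assert(cond < COND_AL);
    return (enum arm_cond)(cond ^ 1);
}

/* Rewrite the condition field of an already-encoded conditional instruction. */
static uint32_t invert_insn_cond(uint32_t insn)
{
    enum arm_cond inverted = invert_cond((enum arm_cond)(insn >> 28));
    return (insn & 0x0FFFFFFFu) | ((uint32_t)inverted << 28);
}
```

For example, applying invert_insn_cond to 0x23A0000A (movhs r0, #10) should give 0x33A0000A (movlo r0, #10).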
Here are a few examples of instructions which always execute unconditionally in the ARM instruction set:
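blx <label> (blx <register>, by contrast, can be conditional)
NEON (Advanced SIMD) instructions such as vadd (the VFP forms of vadd can be conditional)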
pld
dmb
dsb
isb
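A tool that inspects ARM-state machine code can spot these instructions by looking at the condition field. The following C sketch is only an illustration: the function names are hypothetical, and the sample encodings in the self-check are the ones I believe assemblers produce for these instructions, so verify them against the Architecture Reference Manual before relying on them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the instruction word lives in the '1111' portion of the ARM
 * encoding space, which holds instructions that cannot be conditional. */
static bool in_unconditional_space(uint32_t insn)
{
    return (insn >> 28) == 0xFu;
}

/* Self-check using a few sample encodings (assumed, see the text above). */
static void check_sample_encodings(void)
{
    assert(in_unconditional_space(0xF57FF05Fu));    /* dmb sy */
    assert(in_unconditional_space(0xF57FF06Fu));    /* isb sy */
    assert(in_unconditional_space(0xF5D0F000u));    /* pld [r0] */
    assert(!in_unconditional_space(0xE350000Au));   /* cmp r0, #10 */
    assert(!in_unconditional_space(0x23A0000Au));   /* movhs r0, #10 */
    assert(!in_unconditional_space(0x32800001u));   /* addlo r0, r0, #1 */
}
```

An instruction encoded with the al condition ('1110') is not in this space: it always executes, but the same encoding could carry any other condition.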
As always, the ARMv7-AR Architecture Reference Manual contains the most complete and accurate information. The Instruction Set Quick Reference Card is a useful, more compact summary.
In the days when few processors had branch prediction and code size was tightly constrained, conditional execution was an excellent way to save code space whilst also improving performance in many programs. This is still true for today's real-time processors and micro-controllers. However, ARM's application-class processors include branch predictors which often make the branch-based if/else construction more attractive than conditional instructions. A predicted branch may be very cheap, or even free in some cases. In addition, conditional execution can, in some cases, hinder out-of-order execution, because each conditional instruction adds dependencies to the instruction stream: it depends on the flags and, if its condition fails, on the previous value of its destination register.
It can be difficult to know whether to use conditional execution or traditional conditional branches for a particular application. However, as a general rule of thumb, it's probably best to use conditional instructions for sequences of three instructions or fewer, and branches for longer sequences. The best-performing solution varies between processors, as they have different pipeline and branch predictor designs, and it also depends on the specific instruction sequence you are using. Also note that the fastest solution is not necessarily the smallest.
In the original 16-bit Thumb instruction set, only branches could be conditional. In Thumb-2, the it instruction was added to provide functionality and behaviour similar to conditional instructions in ARM. Thumb-2's it instruction can also conditionally execute some instructions which are normally unconditionally executed in ARM state. I won't say more about the it instruction now; it will be covered in detail in my next post in this series.