If Conversion Within .NET - Part 3

October 2, 2023

25 minute read time.

This is the final part of the blog series on If Conversion in .NET. In part 1 we walked through how .NET optimizes Boolean operations, then in part 2 went further into the details within the Lowering phase. This part walks through If Conversion phase, building on the knowledge we learnt previously. Finally, we will look at some additional variations. The examples continue to use the same example code from the first part.

If Conversion Within .NET - Part 1 If Conversion Within .NET - Part 2

If Conversion

In the previous posts we looked at reducing all the branches down to a chain of compares ending in a single compare and branch. With If conversion, we aim to remove this final compare and branch.

This is done using a CSEL (conditional select) Arm64 instruction. Consider:

CSEL x0, x1, x2, lt

This says if the condition flags for lt are set, then set x0 to x1, otherwise set x0 to x2. It is simply a MOV with a source conditionally selected. This can easily be mapped into an If(...) {x=a;} else {x=b;} statement. Alternatively, an If(...) {x=a;} statement can be modelled by reusing the destination register:

CSEL x0, x1, x0, lt

If Conversion phase

Going back to our original program, we now enable the optimization, if conversion:

export DOTNET_JitDump=TestAndOrWithElse
export DOTNET_TieredCompilation=0
unset DOTNET_JitDoIfConversion

After the optimize bools phase, the IR blocks look like:

------------ BB01 [000..00D) -> BB05 (cond), preds={} succs={BB04,BB05}
***** BB01
STMT00001 ( 0x009[E-] ... 0x00B )
N016 ( 32, 22) [000007] -----------                         *  JTRUE     void   $VN.Void
N015 ( 30, 20) [000029] -----------                         \--*  NE        int   
N013 ( 25, 17) [000027] -------N---                            +--*  AND       int   
N009 ( 18, 12) [000026] -----------                            |  +--*  EQ        int   
N007 ( 13,  9) [000024] -------N---                            |  |  +--*  AND       int   
N003 (  6,  4) [000002] N------N-U-                            |  |  |  +--*  GT        int   
N001 (  1,  1) [000000] -----------                            |  |  |  |  +--*  LCL_VAR   int    V00 arg0         u:1 (last use) $80
N002 (  1,  2) [000001] -----------                            |  |  |  |  \--*  CNS_INT   int    3 $43
N006 (  6,  4) [000018] -------N---                            |  |  |  \--*  EQ        int    $101
N004 (  1,  1) [000016] -----------                            |  |  |     +--*  LCL_VAR   int    V01 arg1         u:1 $81
N005 (  1,  2) [000017] -----------                            |  |  |     \--*  CNS_INT   int    10 $44
N008 (  1,  2) [000025] -----------                            |  |  \--*  CNS_INT   int    0
N012 (  6,  4) [000006] N------N-U-                            |  \--*  GT        int    $102
N010 (  1,  1) [000004] -----------                            |     +--*  LCL_VAR   int    V01 arg1         u:1 $81
N011 (  1,  2) [000005] -----------                            |     \--*  CNS_INT   int    4 $45
N014 (  1,  2) [000028] -----------                            \--*  CNS_INT   int    0
------------ BB04 [00D..013) -> BB06 (always), preds={BB01} succs={BB06}
***** BB04
STMT00005 ( 0x00D[E-] ... 0x00F )
N002 (  1,  3) [000015] DA---------                         *  STORE_LCL_VAR int    V00 arg0         d:3 $VN.Void
N001 (  1,  2) [000014] -----------                         \--*  CNS_INT   int    9 $47
------------ BB05 [013..017), preds={BB01} succs={BB06}
***** BB05
STMT00002 ( 0x013[E-] ... 0x015 )
N002 (  1,  3) [000009] DA---------                         *  STORE_LCL_VAR int    V00 arg0         d:4 $VN.Void
N001 (  1,  2) [000008] -----------                         \--*  CNS_INT   int    12 $46
------------ BB06 [017..01F) (return), preds={BB04,BB05} succs={}
***** BB06
STMT00007 ( ??? ... ??? )
N004 (  0,  0) [000021] DA---------                         *  STORE_LCL_VAR int    V00 arg0         d:2 $VN.Void
N003 (  0,  0) [000020] -----------                         \--*  PHI       int    $140
N001 (  0,  0) [000023] ----------- pred BB05                  +--*  PHI_ARG   int    V00 arg0         u:4 $46
N002 (  0,  0) [000022] ----------- pred BB04                  \--*  PHI_ARG   int    V00 arg0         u:3 $47
***** BB06
STMT00003 ( 0x017[E-] ... 0x01E )
N003 ( 16,  6) [000012] --CXG------                         *  CALL      void   CSharpTutorials.Program:Consume[uint](uint,uint) $VN.Void
N001 (  1,  1) [000010] ----------- arg0 in x0              +--*  LCL_VAR   int    V00 arg0         u:2 (last use) $140
N002 (  1,  1) [000011] ----------- arg1 in x1              \--*  LCL_VAR   int    V01 arg1         u:1 (last use) $81
***** BB06
STMT00004 ( 0x01E[E-] ... ??? )
N001 (  0,  0) [000013] -----------                         *  RETURN    void   $VN.Void

For the purposes of this optimization, we care about BB01 (the if statement) - mostly just the JTRUE and NE, plus BB04 (the then statement) and BB05 (the else statement).

Next, the If Conversion phase starts. Like optimize bools, it iterates through each block in turn.

Starting with BB01, it checks the last statement is a JTRUE block with a compare child. Then an If/then or If/then/else flow is checked for:

Example if/then flow:

BB01 (if statement): conditionally jumps to BB03 on failure, otherwise falls through to BB02.
BB02 (then statement): always falls through to BB03.
BB03 (rest of function)

Example if/then/else flow:

BB01 (if statement): conditionally jumps to BB03 on failure, otherwise falls through to BB02.
BB02 (then statement): always jumps to BB04.
BB03 (else statement): always falls through to BB04.
BB04 (rest of function)

The exact BB number here is irrelevant - an If could start with BB17. In addition, the then statement could be comprised of multiple sequential blocks, the same for the else block. This would happen if the code inside the then or else section comprised of more than one expression.

For our test case, an if/then/else flow is matched.

Next the contents of the Then and Else blocks are checked. Any NOP statements (those just containing a no operation (NOP) node) are ignored. Otherwise, the block must contain a single statement ending in a STORE_LCL_VAR (store local variable). In addition, there must be no side effects, it must be of integer type and cannot include PHI nodes.

In addition, if this is an if/then/else flow, then RETURN (return) nodes are allowed. In all if/then/else flows, the ASG in both sides must update the same variable or both must return.

This means the following patterns are allowed:

if (...) { x=...; }
if (...) { x=...; } else { x=...; }
if (...) { return ...; } else { return ...; }
x = (...) ? ... : ...
return (...) ? ... : ...

And the following are rejected:

if (...) { x=...; y=....; }
if (...) { x=...; } else { y=...; }
if (...) { return ...; }
if (...) { x=...; } else { return ...; }
if (...) { return ...; } else { x=...; }

In all the allowed cases, this maps down to a single CSEL instruction.

There may be cases where is optimal to use CSEL that are currently rejected. This is left as an exercise for the reader to implement without overly complicating RyuJIT.

Once the conditions have been checked, transformation can begin. To fix control flow the Then and Else conditions are moved outside of the loop into the first block. The three statements now in the first block (If, Then and Else) are combined into a single statement using a SELECT node. This node takes in three arguments and is similar to a CSEL instruction. The first argument is a condition that sets and checks flags. The second argument is the result to use if the condition passes. The third argument is the result to use if the condition fails. The parent of the CSEL is set to the root of the Then statement, and the root of the Else (which is identical to the root of the Then) is deleted.

In the example, two STORE_LCL_VARs are moved into the first block and the block now always falls through to BB06. Then the code creates a SELECT. The NE child of the JTRUE, the if condition, is added as the first child. The childless JTRUE is discarded and replaced with a NOP. The children of the two STORE_LCL_VARs become the other two children of the SELECT and the SELECT node becomes the child of the first STORE_LCL_VAR. The second STORE_LCL_VAR can be discarded. That is a lot of shuffling. Afterwards this looks like:

------------ BB01 [000..00D) -> BB05 (always), preds={} succs={BB05}
***** BB01
STMT00001 ( 0x009[E-] ... 0x00B )
N001 (  0,  0) [000007] -----------                         *  NOP       void  
***** BB01
STMT00005 ( 0x00D[E-] ... 0x00F )
N019 ( 33, 25) [000015] DA---------                         *  STORE_LCL_VAR int    V00 arg0         d:3 $VN.Void
N018 ( 33, 25) [000030] -----------                         \--*  SELECT    int   
N015 ( 30, 20) [000029] -----------                            +--*  NE        int   
N013 ( 25, 17) [000027] -------N---                            |  +--*  AND       int   
N009 ( 18, 12) [000026] -----------                            |  |  +--*  EQ        int   
N007 ( 13,  9) [000024] -------N---                            |  |  |  +--*  AND       int   
N003 (  6,  4) [000002] N------N-U-                            |  |  |  |  +--*  GT        int   
N001 (  1,  1) [000000] -----------                            |  |  |  |  |  +--*  LCL_VAR   int    V00 arg0         u:1 (last use) $80
N002 (  1,  2) [000001] -----------                            |  |  |  |  |  \--*  CNS_INT   int    3 $43
N006 (  6,  4) [000018] -------N---                            |  |  |  |  \--*  EQ        int    $101
N004 (  1,  1) [000016] -----------                            |  |  |  |     +--*  LCL_VAR   int    V01 arg1         u:1 $81
N005 (  1,  2) [000017] -----------                            |  |  |  |     \--*  CNS_INT   int    10 $44
N008 (  1,  2) [000025] -----------                            |  |  |  \--*  CNS_INT   int    0
N012 (  6,  4) [000006] N------N-U-                            |  |  \--*  GT        int    $102
N010 (  1,  1) [000004] -----------                            |  |     +--*  LCL_VAR   int    V01 arg1         u:1 $81
N011 (  1,  2) [000005] -----------                            |  |     \--*  CNS_INT   int    4 $45
N014 (  1,  2) [000028] -----------                            |  \--*  CNS_INT   int    0
N016 (  1,  2) [000008] -----------                            +--*  CNS_INT   int    12 $46
N017 (  1,  2) [000014] -----------                            \--*  CNS_INT   int    9 $47
***** BB01
STMT00002 ( 0x013[E-] ... 0x015 )
N001 (  0,  0) [000009] -----------                         *  NOP       void  
------------ BB04 [00D..013) -> BB06 (always), preds={} succs={BB06}
------------ BB05 [013..017), preds={BB01} succs={BB06}

BB04 and BB05 are now empty and no code flows into them. This will be spotted by the sanity checks that automatically run after every phase. The checks will first delete these empty blocks. Then they will spot that there is no requirement for BB06 to be a separate block (as only BB01 flows into it and does so unconditionally). The two blocks will be merged. This leaves a single block, split into six statements, for the entire function:

***** BB01
STMT00007 ( ??? ... ??? )
N004 (  0,  0) [000021] DA---------                         *  STORE_LCL_VAR int    V00 arg0         d:2 $VN.Void
N003 (  0,  0) [000020] -----------                         \--*  PHI       int    $140
N001 (  0,  0) [000023] ----------- pred BB05                  +--*  PHI_ARG   int    V00 arg0         u:4 $46
N002 (  0,  0) [000022] ----------- pred BB04                  \--*  PHI_ARG   int    V00 arg0         u:3 $47
***** BB01
STMT00001 ( 0x009[E-] ... 0x00B )
N001 (  0,  0) [000007] -----------                         *  NOP       void  
***** BB01
STMT00005 ( 0x00D[E-] ... 0x00F )
N019 ( 33, 25) [000015] DA---------                         *  STORE_LCL_VAR int    V00 arg0         d:3 $VN.Void
N018 ( 33, 25) [000030] -----------                         \--*  SELECT    int   
N015 ( 30, 20) [000029] -----------                            +--*  NE        int   
N013 ( 25, 17) [000027] -------N---                            |  +--*  AND       int   
N009 ( 18, 12) [000026] -----------                            |  |  +--*  EQ        int   
N007 ( 13,  9) [000024] -------N---                            |  |  |  +--*  AND       int   
N003 (  6,  4) [000002] N------N-U-                            |  |  |  |  +--*  GT        int   
N001 (  1,  1) [000000] -----------                            |  |  |  |  |  +--*  LCL_VAR   int    V00 arg0         u:1 (last use) $80
N002 (  1,  2) [000001] -----------                            |  |  |  |  |  \--*  CNS_INT   int    3 $43
N006 (  6,  4) [000018] -------N---                            |  |  |  |  \--*  EQ        int    $101
N004 (  1,  1) [000016] -----------                            |  |  |  |     +--*  LCL_VAR   int    V01 arg1         u:1 $81
N005 (  1,  2) [000017] -----------                            |  |  |  |     \--*  CNS_INT   int    10 $44
N008 (  1,  2) [000025] -----------                            |  |  |  \--*  CNS_INT   int    0
N012 (  6,  4) [000006] N------N-U-                            |  |  \--*  GT        int    $102
N010 (  1,  1) [000004] -----------                            |  |     +--*  LCL_VAR   int    V01 arg1         u:1 $81
N011 (  1,  2) [000005] -----------                            |  |     \--*  CNS_INT   int    4 $45
N014 (  1,  2) [000028] -----------                            |  \--*  CNS_INT   int    0
N016 (  1,  2) [000008] -----------                            +--*  CNS_INT   int    12 $46
N017 (  1,  2) [000014] -----------                            \--*  CNS_INT   int    9 $47
***** BB01
STMT00002 ( 0x013[E-] ... 0x015 )
N001 (  0,  0) [000009] -----------                         *  NOP       void  
***** BB01
STMT00003 ( 0x017[E-] ... 0x01E )
N003 ( 16,  6) [000012] --CXG------                         *  CALL      void   CSharpTutorials.Program:Consume[uint](uint,uint) $VN.Void
N001 (  1,  1) [000010] ----------- arg0 in x0              +--*  LCL_VAR   int    V00 arg0         u:2 (last use) $140
N002 (  1,  1) [000011] ----------- arg1 in x1              \--*  LCL_VAR   int    V01 arg1         u:1 (last use) $81
***** BB01
STMT00004 ( 0x01E[E-] ... ??? )
N001 (  0,  0) [000013] -----------                         *  RETURN    void   $VN.Void

Note how the PHIs from BB06 have been moved to the start of the single block. There is no branching control flow remaining, so these will be optimized away when the IR is rationalized to LIR.

The rationalized LIR looks as would be expected:

------------ BB01 [000..01F) (return), preds={} succs={}
               [000031] -----------                            IL_OFFSET void   INLRT @ 0x009[E-]
N001 (  0,  0) [000007] -----------                            NOP       void  
               [000032] -----------                            IL_OFFSET void   INLRT @ 0x00D[E-]
N001 (  1,  1) [000000] -----------                    t0 =    LCL_VAR   int    V00 arg0         u:1 (last use) $80
N002 (  1,  2) [000001] -----------                    t1 =    CNS_INT   int    3 $43
                                                            /--*  t0     int    
                                                            +--*  t1     int    
N003 (  6,  4) [000002] N------N-U-                    t2 = *  GT        int   
N004 (  1,  1) [000016] -----------                   t16 =    LCL_VAR   int    V01 arg1         u:1 $81
N005 (  1,  2) [000017] -----------                   t17 =    CNS_INT   int    10 $44
                                                            /--*  t16    int    
                                                            +--*  t17    int    
N006 (  6,  4) [000018] -------N---                   t18 = *  EQ        int    $101
                                                            /--*  t2     int    
                                                            +--*  t18    int    
N007 ( 13,  9) [000024] -------N---                   t24 = *  AND       int   
N008 (  1,  2) [000025] -----------                   t25 =    CNS_INT   int    0
                                                            /--*  t24    int    
                                                            +--*  t25    int    
N009 ( 18, 12) [000026] -----------                   t26 = *  EQ        int   
N010 (  1,  1) [000004] -----------                    t4 =    LCL_VAR   int    V01 arg1         u:1 $81
N011 (  1,  2) [000005] -----------                    t5 =    CNS_INT   int    4 $45
                                                            /--*  t4     int    
                                                            +--*  t5     int    
N012 (  6,  4) [000006] N------N-U-                    t6 = *  GT        int    $102
                                                            /--*  t26    int    
                                                            +--*  t6     int    
N013 ( 25, 17) [000027] -------N---                   t27 = *  AND       int   
N014 (  1,  2) [000028] -----------                   t28 =    CNS_INT   int    0
                                                            /--*  t27    int    
                                                            +--*  t28    int    
N015 ( 30, 20) [000029] -----------                   t29 = *  NE        int   
N016 (  1,  2) [000008] -----------                    t8 =    CNS_INT   int    12 $46
N017 (  1,  2) [000014] -----------                   t14 =    CNS_INT   int    9 $47
                                                            /--*  t29    int    
                                                            +--*  t8     int    
                                                            +--*  t14    int    
N018 ( 33, 25) [000030] -----------                   t30 = *  SELECT    int   
                                                            /--*  t30    int    
N019 ( 33, 25) [000015] DA---------                         *  STORE_LCL_VAR int    V00 arg0         d:3 $VN.Void
               [000033] -----------                            IL_OFFSET void   INLRT @ 0x013[E-]
N001 (  0,  0) [000009] -----------                            NOP       void  
               [000034] -----------                            IL_OFFSET void   INLRT @ 0x017[E-]
N001 (  1,  1) [000010] -----------                   t10 =    LCL_VAR   int    V00 arg0         u:2 (last use) $140
N002 (  1,  1) [000011] -----------                   t11 =    LCL_VAR   int    V01 arg1         u:1 (last use) $81
                                                            /--*  t10    int    arg0 in x0
                                                            +--*  t11    int    arg1 in x1
N003 ( 16,  6) [000012] --CXG------                         *  CALL      void   CSharpTutorials.Program:Consume[uint](uint,uint) $VN.Void
               [000035] -----------                            IL_OFFSET void   INLRT @ 0x01E[E-]
N001 (  0,  0) [000013] -----------                            RETURN    void   $VN.Void

During lowering the compare nodes are all lowered to a CMP/CCMP/SETCC chain of nodes as described previously. Note that the first input to the SELECT (the if condition) is now the SETCC which produces the result of the entire if statement.

N001 (  1,  1) [000000] -----------                    t0 =    LCL_VAR   int    V00 arg0         u:1 (last use) $80
N002 (  1,  2) [000001] -c---------                    t1 =    CNS_INT   int    3 $43
N004 (  1,  1) [000016] -----------                   t16 =    LCL_VAR   int    V01 arg1         u:1 $81
N005 (  1,  2) [000017] -c---------                   t17 =    CNS_INT   int    10 $44
N010 (  1,  1) [000004] -----------                    t4 =    LCL_VAR   int    V01 arg1         u:1 $81
N011 (  1,  2) [000005] -c---------                    t5 =    CNS_INT   int    4 $45
                                                            /--*  t0     int    
                                                            +--*  t1     int    
N003 (  6,  4) [000002] -------N-U-                         *  CMP       void  
                                                            /--*  t16    int    
                                                            +--*  t17    int    
N006 (  6,  4) [000018] -------N---                         *  CCMP      void   cond=UGT flags=0
                                                            /--*  t4     int    
                                                            +--*  t5     int    
N012 (  6,  4) [000006] -------N-U-                         *  CCMP      void   cond=UNE flags=z
N013 ( 25, 17) [000027] -------N---                   t27 =    SETCC     int    cond=UGT
N016 (  1,  2) [000008] -----------                    t8 =    CNS_INT   int    12 $46
N017 (  1,  2) [000014] -----------                   t14 =    CNS_INT   int    9 $47
                                                            /--*  t27    int    
                                                            +--*  t8     int    
                                                            +--*  t14    int    
N018 ( 33, 25) [000030] -----------                   t30 = *  SELECT    int

When lowering gets to SELECT, it is lowered in a similar way to the CMP nodes. RyuJIT wants to convert the SELECT to a SELECTCC node. The SELECTCC node is like a SELECT node. However instead of taking in as input the result of a condition after being compared against flags, the input is just the result of the compare. The SELECTCC then holds the flags to check - just like the SETCC, JCC or CCMP nodes.

Skipping SELECT Lowering

Let us see what happens if the lowering on the SELECT is skipped.

export DOTNET_JitDump=TestAndOrWithElse
export DOTNET_TieredCompilation=0
export DOTNET_JitDoLowerSelect=0

Generating: N023 (  6,  4) [000006] -------N-U-                         *  CCMP      void   cond=UNE flags=z REG NA
IN0003:             ccmp    w1, #4, z, ne
Generating: N025 ( 25, 17) [000027] -------N---                   t27 =    SETCC     int    cond=UGT REG x0
IN0004:             cset    x0, hi
Generating: N027 (  1,  2) [000008] -----------                    t8 =    CNS_INT   int    12 REG x2 $46
IN0005:             mov     w2, #12
Generating: N029 (  1,  2) [000014] -----------                   t14 =    CNS_INT   int    9 REG x3 $47
IN0006:             mov     w3, #9
                                                                        /--*  t27    int    
                                                                        +--*  t8     int    
                                                                        +--*  t14    int    
Generating: N031 ( 33, 25) [000030] -----------                   t30 = *  SELECT    int    REG x0
IN0007:             cmp     w0, #0
IN0008:             csel    w0, w2, w3, ne

The full assembly:

G_M6318_IG01:
IN000e: 000000      stp     fp, lr, [sp, #-0x10]!
IN000f: 000004      mov     fp, sp
G_M6318_IG02:
IN0001: 000008      cmp     w0, #3
IN0002: 00000C      ccmp    w1, #10, 0, hi
IN0003: 000010      ccmp    w1, #4, z, ne
IN0004: 000014      cset    x0, hi
IN0005: 000018      mov     w2, #12
IN0006: 00001C      mov     w3, #9
IN0007: 000020      cmp     w0, #0
IN0008: 000024      csel    w0, w2, w3, ne
IN0009: 000028      movz    x2, #0x6EE0      // code for CSharpTutorials.Program:Consume[uint](uint,uint)
IN000a: 00002C      movk    x2, #0x65A9 LSL #16
IN000b: 000030      movk    x2, #0xFFFF LSL #32
IN000c: 000034      ldr     x2, [x2]
IN000d: 000038      blr     x2
G_M6318_IG03:
IN0010: 00003C      ldp     fp, lr, [sp], #0x10
IN0011: 000040      ret     lr

Looking at the generation we can see the result of the if goes into x0 during SETCC. The CSEL instruction checks the flags, so the result in x0 must go back into flags via an additional CMP of x0 and 0.

Lowering SELECT to SELECTCC

Looking back at the previous run, during lowering SELECT is lowered as follows. The then/else inputs to the SELECT are checked to make sure they can be moved prior to the compares. Also, the condition input must either be a SETCC or a compare node. If so, the lowering happens: the inputs are moved; the SELECT becomes a SELECTCC; the SETCC is removed, and its GT condition is put into the SELECTCC. This gives:

N001 (  1,  1) [000000] -----------                    t0 =    LCL_VAR   int    V00 arg0         u:1 (last use) $80
N002 (  1,  2) [000001] -c---------                    t1 =    CNS_INT   int    3 $43
N004 (  1,  1) [000016] -----------                   t16 =    LCL_VAR   int    V01 arg1         u:1 $81
N005 (  1,  2) [000017] -c---------                   t17 =    CNS_INT   int    10 $44
N010 (  1,  1) [000004] -----------                    t4 =    LCL_VAR   int    V01 arg1         u:1 $81
N011 (  1,  2) [000005] -c---------                    t5 =    CNS_INT   int    4 $45
N016 (  1,  2) [000008] -----------                    t8 =    CNS_INT   int    12 $46
N017 (  1,  2) [000014] -----------                   t14 =    CNS_INT   int    9 $47
                                                            /--*  t0     int    
                                                            +--*  t1     int    
N003 (  6,  4) [000002] -------N-U-                         *  CMP       void  
                                                            /--*  t16    int    
                                                            +--*  t17    int    
N006 (  6,  4) [000018] -------N---                         *  CCMP      void   cond=UGT flags=0
                                                            /--*  t4     int    
                                                            +--*  t5     int    
N012 (  6,  4) [000006] -------N-U-                         *  CCMP      void   cond=UNE flags=z
                                                            /--*  t8     int    
                                                            +--*  t14    int    
N018 ( 33, 25) [000030] -----------                   t30 = *  SELECTCC  int    cond=UGT

This is generated as:

Generating: N027 (  6,  4) [000006] -------N-U-                         *  CCMP      void   cond=UNE flags=z REG NA
IN0005:             ccmp    w1, #4, z, ne
                                                                        /--*  t8     int    
                                                                        +--*  t14    int    
Generating: N029 ( 33, 25) [000030] -----------                   t30 = *  SELECTCC  int    cond=UGT REG x0
IN0006:             csel    w0, w2, w3, hi
                                                                        /--*  t30    int    
Generating: N031 ( 33, 25) [000015] DA---------                         *  STORE_LCL_VAR int    V00 arg0         d:3 x0 REG x0 $VN.Void
                            V00 in reg x0 is becoming live  [000015]
                            Live regs: 0002 {x1} => 0003 {x0 x1}
New debug range: new var or location
                Live vars after [000015]: {V01} => {V00 V01}

The full function is generated as follows:

G_M6318_IG01:
IN000c: 000000      stp     fp, lr, [sp, #-0x10]!
IN000d: 000004      mov     fp, sp
G_M6318_IG02:
IN0001: 000008      mov     w2, #12
IN0002: 00000C      mov     w3, #9
IN0003: 000010      cmp     w0, #3
IN0004: 000014      ccmp    w1, #10, 0, hi
IN0005: 000018      ccmp    w1, #4, z, ne
IN0006: 00001C      csel    w0, w2, w3, hi
IN0007: 000020      movz    x2, #0x6EE0      // code for CSharpTutorials.Program:Consume[uint](uint,uint)
IN0008: 000024      movk    x2, #0x35A9 LSL #16
IN0009: 000028      movk    x2, #0xFFFF LSL #32
IN000a: 00002C      ldr     x2, [x2]
IN000b: 000030      blr     x2
G_M6318_IG03:
IN000e: 000034      ldp     fp, lr, [sp], #0x10
IN000f: 000038      ret     lr

This code is now ideal. CMP, a sequence of compares, then CSEL.

Variations on CSEL

Arm64 provides some variations on the CSEL instruction.

CINC	Conditionally increment register	`CINC x0, x1, lt` - if `LT` flags are set, then `x0 = x1+1`
CSINC	Conditional select increment	`CINC x0, x1, x2 lt` - if `LT` flags are set, then `x0 = x1+1`, else `x0 = x2`	When the two input registers are the same, this is aliased to CINC.
CINV	Conditionally invert register	`CINV x0, x1, lt` - if `LT` flags are set, then `x0 = ~x1`
CSINV	Conditional select invert	`CSINV x0, x1, x2 lt` - if `LT` flags are set, then `x0 = ~x1`, else `x0 = x2`	When the two input registers are the same, this is aliased to CINV.
CNEG	Conditionally invert register	`CNEG x0, x1, lt` - if `LT` flags are set, then `x0 = NOT(x1)`
CSNEG	Conditional select negate	`CSNEG x0, x1, x2 lt` - if `LT` flags are set, then `x0 = NOT(x1)`, else `x0 = x2`.	When the two input registers are the same, this is aliased to CNEG

The CS instructions look strange at first glance. It is not immediately obvious why they are required. However, they fit into the standard 2 input, 1 output format that most Arm64 instructions use. The other three instructions are simply aliases of the CS instructions that are used when the input registers match. Note there is nothing different in the encoding of the instruction when the alias is used, it is purely a stylistic choice or readability.

RyuJIT will spot patterns that can use CINC, CINV and CNEG. This is performed during the lowering of SELECT.

Using CINC

Consider the following, also from the same example program:

[MethodImpl(MethodImplOptions.NoInlining)]
static void TestAndOrWithElse(uint op1, uint op2) {
    if (op1 > 3 && op2 == 10 || op2 <= 4) {
        op1 = 9;
    } else {
        op1 = 12;
    }
    Consume(op1, op2);
}

On entry to lowering, the LIR will be:

------------ BB01 [000..014) (return), preds={} succs={}
               [000017] -----------                            IL_OFFSET void   INLRT @ 0x000[E-]
N001 (  0,  0) [000003] -----------                            NOP       void  
               [000018] -----------                            IL_OFFSET void   INLRT @ 0x004[E-]
N001 (  1,  1) [000000] -----------                    t0 =    LCL_VAR   int    V00 arg0         u:1 $80
N002 (  1,  2) [000001] -----------                    t1 =    CNS_INT   int    1 $41
                                                            /--*  t0     int    
                                                            +--*  t1     int    
N003 (  3,  4) [000002] N------N-U-                    t2 = *  LE        int    $100
N004 (  1,  2) [000004] -----------                    t4 =    CNS_INT   int    8 $43
N005 (  1,  2) [000010] -----------                   t10 =    CNS_INT   int    7 $44
                                                            /--*  t2     int    
                                                            +--*  t4     int    
                                                            +--*  t10    int    
N006 (  6,  9) [000016] -----------                   t16 = *  SELECT    int   
                                                            /--*  t16    int    
N007 (  6,  9) [000011] DA---------                         *  STORE_LCL_VAR int    V01 arg1         d:4 $VN.Void
               [000019] -----------                            IL_OFFSET void   INLRT @ 0x009[E-]
N001 (  0,  0) [000005] -----------                            NOP       void  
               [000020] -----------                            IL_OFFSET void   INLRT @ 0x00C[E-]
N001 (  1,  1) [000006] -----------                    t6 =    LCL_VAR   int    V00 arg0         u:1 (last use) $80
N002 (  1,  1) [000007] -----------                    t7 =    LCL_VAR   int    V01 arg1         u:2 (last use) $140
                                                            /--*  t6     int    arg0 in x0
                                                            +--*  t7     int    arg1 in x1
N003 ( 16,  6) [000008] --CXG------                         *  CALL      void   CSharpTutorials.Program:Consume[uint](uint,uint) $VN.Void
               [000021] -----------                            IL_OFFSET void   INLRT @ 0x013[E-]
N001 (  0,  0) [000009] -----------                            RETURN    void   $VN.Void

This, as we would expect, is lowered to a CMP and SELECTCC:

N001 (  1,  1) [000000] -----------                    t0 =    LCL_VAR   int    V00 arg0         u:1 $80
N002 (  1,  2) [000001] -c---------                    t1 =    CNS_INT   int    1 $41
N004 (  1,  2) [000004] -----------                    t4 =    CNS_INT   int    8 $43
N005 (  1,  2) [000010] -----------                   t10 =    CNS_INT   int    7 $44
                                                            /--*  t0     int    
                                                            +--*  t1     int    
N003 (  3,  4) [000002] -------N-U-                         *  CMP       void  
                                                            /--*  t4     int    
                                                            +--*  t10    int    
N006 (  6,  9) [000016] -----------                   t16 = *  SELECTCC  int    cond=ULE

After lowering to SELECTCC, the same function will check the inputs. If they are both constants and one of them is one bigger than the other, then it can be optimized further. If the first input is the higher one, then then the inputs are swapped, and the condition is inverted. Then the lower input is removed and the SELECT_CC is changed to the succinctly named SELECT_INCCC.

N001 (  1,  1) [000000] -----------                    t0 =    LCL_VAR   int    V00 arg0         u:1 $80
N002 (  1,  2) [000001] -c---------                    t1 =    CNS_INT   int    1 $41
N005 (  1,  2) [000010] -----------                   t10 =    CNS_INT   int    7 $44
                                                            /--*  t0     int    
                                                            +--*  t1     int    
N003 (  3,  4) [000002] -------N-U-                         *  CMP       void  
                                                            /--*  t10    int    
N006 (  6,  9) [000016] -----------                   t16 = *  SELECT_INCCC int    cond=ULE

This then generates exactly as expected:

G_M40289_IG01:
IN000a: 000000      stp     fp, lr, [sp, #-0x10]!
IN000b: 000004      mov     fp, sp
G_M40289_IG02:
IN0001: 000008      mov     w2, #7
IN0002: 00000C      cmp     w0, #1
IN0003: 000010      cinc    w2, w2, ls
IN0004: 000014      sxtw    w1, w2
IN0005: 000018      movz    x2, #0x6EE0      // code for CSharpTutorials.Program:Consume[uint](uint,uint)
IN0006: 00001C      movk    x2, #0x55A8 LSL #16
IN0007: 000020      movk    x2, #0xFFFF LSL #32
IN0008: 000024      ldr     x2, [x2]
IN0009: 000028      blr     x2
G_M40289_IG03:
IN000c: 00002C      ldp     fp, lr, [sp], #0x10
IN000d: 000030      ret     lr

Using CINV

The CINV transformation acts in a similar way but allows for slightly stranger patterns. First consider:

        [MethodImpl(MethodImplOptions.NoInlining)]
        static void TestInv(uint op1, uint op2) {
            if (op1 != 7) {
              op2 = ~op2;
            }
            Consume(op1, op2);
        }

The LIR will be:

------------ BB01 [000..010) (return), preds={} succs={}
               [000017] -----------                            IL_OFFSET void   INLRT @ 0x000[E-]
N001 (  0,  0) [000003] -----------                            NOP       void  
               [000018] -----------                            IL_OFFSET void   INLRT @ 0x004[E-]
N001 (  1,  1) [000000] -----------                    t0 =    LCL_VAR   int    V00 arg0         u:1 $80
N002 (  1,  2) [000001] -----------                    t1 =    CNS_INT   int    7 $43
                                                            /--*  t0     int    
                                                            +--*  t1     int    
N003 (  3,  4) [000002] J------N---                    t2 = *  EQ        int    $100
N004 (  1,  1) [000015] -----------                   t15 =    LCL_VAR   int    V01 arg1         
N005 (  1,  1) [000008] -----------                    t8 =    LCL_VAR   int    V01 arg1         u:1 (last use) $81
                                                            /--*  t8     int    
N006 (  2,  2) [000009] -----------                    t9 = *  NOT       int    $82
                                                            /--*  t2     int    
                                                            +--*  t15    int    
                                                            +--*  t9     int    
N007 (  7,  8) [000016] -----------                   t16 = *  SELECT    int   
                                                            /--*  t16    int    
N008 (  7,  8) [000010] DA---------                         *  STORE_LCL_VAR int    V01 arg1         d:3 $VN.Void
               [000019] -----------                            IL_OFFSET void   INLRT @ 0x008[E-]
N001 (  1,  1) [000004] -----------                    t4 =    LCL_VAR   int    V00 arg0         u:1 (last use) $80
N002 (  1,  1) [000005] -----------                    t5 =    LCL_VAR   int    V01 arg1         u:2 (last use) $140
                                                            /--*  t4     int    arg0 in x0
                                                            +--*  t5     int    arg1 in x1
N003 ( 16,  6) [000006] --CXG------                         *  CALL      void   CSharpTutorials.Program:Consume[uint](uint,uint) $VN.Void
               [000020] -----------                            IL_OFFSET void   INLRT @ 0x00F[E-]
N001 (  0,  0) [000007] -----------                            RETURN    void   $VN.Void

This will follow the same pattern as the CINC optimization. After lowering the SELECT to SELECTCC, the same function will check the then and else statements. If one of them is a NOT statement, then this can be optimized. A SELECT_INVCC node is created in the expected way. Note that unlike SELECT_INCCC, the SELECT_INVCC has the same then and else inputs as the SELECT node - this distinction will become important later.

N001 (  1,  1) [000000] -----------                    t0 =    LCL_VAR   int    V00 arg0         u:1 $80
N002 (  1,  2) [000001] -c---------                    t1 =    CNS_INT   int    7 $43
N004 (  1,  1) [000015] -----------                   t15 =    LCL_VAR   int    V01 arg1         
N005 (  1,  1) [000008] -----------                    t8 =    LCL_VAR   int    V01 arg1         u:1 (last use) $81
                                                            /--*  t0     int    
                                                            +--*  t1     int    
N003 (  3,  4) [000002] -------N---                         *  CMP       void  
                                                            /--*  t15    int    
                                                            +--*  t8     int    
N007 (  7,  8) [000016] -----------                   t16 = *  SELECT_INVCC int    cond=UEQ
                                                            /--*  t16    int    
N008 (  7,  8) [000010] DA---------                         *  STORE_LCL_VAR int    V01 arg1         d:3 $VN.Void

This is generated into:

G_M20564_IG01:
IN0008: 000000      stp     fp, lr, [sp, #-0x10]!
IN0009: 000004      mov     fp, sp
G_M20564_IG02:
IN0001: 000008      cmp     w0, #7
IN0002: 00000C      csinv   w1, w1, w1, eq
IN0003: 000010      movz    x2, #0x6EE0      // code for CSharpTutorials.Program:Consume[uint](uint,uint)
IN0004: 000014      movk    x2, #0x55A6 LSL #16
IN0005: 000018      movk    x2, #0xFFFF LSL #32
IN0006: 00001C      ldr     x2, [x2]
IN0007: 000020      blr     x2
G_M20564_IG03:
IN000a: 000024      ldp     fp, lr, [sp], #0x10
IN000b: 000028      ret     lr

Note that the disassembler prints CSINV instead of CINC. This is the same functionality, and it is optional for the disassembler to show it as CINC, although it would help debugging slightly.

Now consider:

[MethodImpl(MethodImplOptions.NoInlining)]
static void TestInv2(uint op1, uint op2) {
    if (op1 != 7) {
        op2 = ~op2;
    } else {
        op2 = 3;
    }
    Consume(op1, op2);
}

This will still get converted at lowering into SELECT_INVCC. The phase does not care that the then and else cases calculate different inputs, only one of them has a ~ operation. This fits the extended CSINV instruction exactly, and the function will generate into:

G_M15974_IG01:
IN0009: 000000      stp     fp, lr, [sp, #-0x10]!
IN000a: 000004      mov     fp, sp
G_M15974_IG02:
IN0001: 000008      mov     w2, #3
IN0002: 00000C      cmp     w0, #7
IN0003: 000010      csinv   w1, w2, w1, eq
IN0004: 000014      movz    x2, #0x6EE0      // code for CSharpTutorials.Program:Consume[uint](uint,uint)
IN0005: 000018      movk    x2, #0x31A6 LSL #16
IN0006: 00001C      movk    x2, #0xFFFF LSL #32
IN0007: 000020      ldr     x2, [x2]
IN0008: 000024      blr     x2
G_M15974_IG03:
IN000b: 000028      ldp     fp, lr, [sp], #0x10
IN000c: 00002C      ret     lr

Imagine a scenario where CSINV does not exists, but CINV does. To use CINV, RyuJIT would have to fully parse the then and else cases to confirm they are the same inputs.

Using CNEG

The CNEG optimization works almost identically to CINV, including the extending CSNEG variants, except it applies to subtraction instead of negation. These are left as an exercise for the reader to follow.

Closing Comments

That wraps up everything I wanted to walk through across the series. Over three parts we went through both the optimize bools and if conversion phases. We saw how they interact with each other to create a complete solution. We then looked at how they interact with other optimizations. Finally, we looked at a couple of further optimizations that build on top. While none of these optimizations offer large performance gains, they each build off each other and eventually add up. In addition, these will aid future optimizations, for example the simplification of the blocks simplifies loop optimizations.

0 comments
0 members are here

Architectures and Processors blog

Introducing GICv5: Scalable and secure interrupt management for Arm

Christoffer Dall

Introducing Arm GICv5: a scalable, hypervisor-free interrupt controller for modern multi-core systems with improved virtualization and real-time support.
- April 28, 2025
Getting started with AARCHMRS Features.json using Python

Joh

A high-level introduction to the Arm Architecture Machine Readable Specification (AARCHMRS) Features.json with some examples to interpret and start to work with the available data using Python.
- April 8, 2025
Advancing server manageability on Arm Neoverse Compute Subsystem (CSS) with OpenBMC

Samer El-Haj-Mahmoud

Arm and 9elements Cyber Security have brought a prototype of OpenBMC to the Arm Neoverse Compute Subsystem (CSS) to advancing server manageability.
- January 28, 2025

AI blog

Announcements

Architectures and Processors blog

Automotive blog

Embedded and Microcontrollers blog

Internet of Things (IoT) blog

Laptops and Desktops blog

Mobile, Graphics, and Gaming blog

Operating Systems blog

Servers and Cloud Computing blog

SoC Design and Simulation blog

Tools, Software and IDEs blog