This is the final part of the blog series on If Conversion in .NET. In part 1 we walked through how .NET optimizes Boolean operations, then in part 2 went further into the details within the Lowering phase. This part walks through If Conversion phase, building on the knowledge we learnt previously. Finally, we will look at some additional variations. The examples continue to use the same example code from the first part.
If Conversion Within .NET - Part 1 If Conversion Within .NET - Part 2
In the previous posts we looked at reducing all the branches down to a chain of compares ending in a single compare and branch. With If conversion, we aim to remove this final compare and branch.
This is done using a CSEL (conditional select) Arm64 instruction. Consider:
CSEL
CSEL x0, x1, x2, lt
This says if the condition flags for lt are set, then set x0 to x1, otherwise set x0 to x2. It is simply a MOV with a source conditionally selected. This can easily be mapped into an If(...) {x=a;} else {x=b;} statement. Alternatively, an If(...) {x=a;} statement can be modelled by reusing the destination register:
x0
x1
x2
MOV
If(...) {x=a;} else {x=b;}
If(...) {x=a;}
CSEL x0, x1, x0, lt
Going back to our original program, we now enable the optimization, if conversion:
export DOTNET_JitDump=TestAndOrWithElse export DOTNET_TieredCompilation=0 unset DOTNET_JitDoIfConversion
After the optimize bools phase, the IR blocks look like:
------------ BB01 [000..00D) -> BB05 (cond), preds={} succs={BB04,BB05} ***** BB01 STMT00001 ( 0x009[E-] ... 0x00B ) N016 ( 32, 22) [000007] ----------- * JTRUE void $VN.Void N015 ( 30, 20) [000029] ----------- \--* NE int N013 ( 25, 17) [000027] -------N--- +--* AND int N009 ( 18, 12) [000026] ----------- | +--* EQ int N007 ( 13, 9) [000024] -------N--- | | +--* AND int N003 ( 6, 4) [000002] N------N-U- | | | +--* GT int N001 ( 1, 1) [000000] ----------- | | | | +--* LCL_VAR int V00 arg0 u:1 (last use) $80 N002 ( 1, 2) [000001] ----------- | | | | \--* CNS_INT int 3 $43 N006 ( 6, 4) [000018] -------N--- | | | \--* EQ int $101 N004 ( 1, 1) [000016] ----------- | | | +--* LCL_VAR int V01 arg1 u:1 $81 N005 ( 1, 2) [000017] ----------- | | | \--* CNS_INT int 10 $44 N008 ( 1, 2) [000025] ----------- | | \--* CNS_INT int 0 N012 ( 6, 4) [000006] N------N-U- | \--* GT int $102 N010 ( 1, 1) [000004] ----------- | +--* LCL_VAR int V01 arg1 u:1 $81 N011 ( 1, 2) [000005] ----------- | \--* CNS_INT int 4 $45 N014 ( 1, 2) [000028] ----------- \--* CNS_INT int 0 ------------ BB04 [00D..013) -> BB06 (always), preds={BB01} succs={BB06} ***** BB04 STMT00005 ( 0x00D[E-] ... 0x00F ) N002 ( 1, 3) [000015] DA--------- * STORE_LCL_VAR int V00 arg0 d:3 $VN.Void N001 ( 1, 2) [000014] ----------- \--* CNS_INT int 9 $47 ------------ BB05 [013..017), preds={BB01} succs={BB06} ***** BB05 STMT00002 ( 0x013[E-] ... 0x015 ) N002 ( 1, 3) [000009] DA--------- * STORE_LCL_VAR int V00 arg0 d:4 $VN.Void N001 ( 1, 2) [000008] ----------- \--* CNS_INT int 12 $46 ------------ BB06 [017..01F) (return), preds={BB04,BB05} succs={} ***** BB06 STMT00007 ( ??? ... ??? ) N004 ( 0, 0) [000021] DA--------- * STORE_LCL_VAR int V00 arg0 d:2 $VN.Void N003 ( 0, 0) [000020] ----------- \--* PHI int $140 N001 ( 0, 0) [000023] ----------- pred BB05 +--* PHI_ARG int V00 arg0 u:4 $46 N002 ( 0, 0) [000022] ----------- pred BB04 \--* PHI_ARG int V00 arg0 u:3 $47 ***** BB06 STMT00003 ( 0x017[E-] ... 0x01E ) N003 ( 16, 6) [000012] --CXG------ * CALL void CSharpTutorials.Program:Consume[uint](uint,uint) $VN.Void N001 ( 1, 1) [000010] ----------- arg0 in x0 +--* LCL_VAR int V00 arg0 u:2 (last use) $140 N002 ( 1, 1) [000011] ----------- arg1 in x1 \--* LCL_VAR int V01 arg1 u:1 (last use) $81 ***** BB06 STMT00004 ( 0x01E[E-] ... ??? ) N001 ( 0, 0) [000013] ----------- * RETURN void $VN.Void
For the purposes of this optimization, we care about BB01 (the if statement) - mostly just the JTRUE and NE, plus BB04 (the then statement) and BB05 (the else statement).
BB01
if
JTRUE
NE
BB04
then
BB05
else
Next, the If Conversion phase starts. Like optimize bools, it iterates through each block in turn.
Starting with BB01, it checks the last statement is a JTRUE block with a compare child. Then an If/then or If/then/else flow is checked for:
If/then
If/then/else
Example if/then flow:
if/then
BB03
BB02.
BB02
BB03.
Example if/then/else flow:
if/then/else
BB04.
The exact BB number here is irrelevant - an If could start with BB17. In addition, the then statement could be comprised of multiple sequential blocks, the same for the else block. This would happen if the code inside the then or else section comprised of more than one expression.
If
BB17
For our test case, an if/then/else flow is matched.
Next the contents of the Then and Else blocks are checked. Any NOP statements (those just containing a no operation (NOP) node) are ignored. Otherwise, the block must contain a single statement ending in a STORE_LCL_VAR (store local variable). In addition, there must be no side effects, it must be of integer type and cannot include PHI nodes.
Then
Else
NOP
STORE_LCL_VAR
PHI
In addition, if this is an if/then/else flow, then RETURN (return) nodes are allowed. In all if/then/else flows, the ASG in both sides must update the same variable or both must return.
RETURN
ASG
This means the following patterns are allowed:
if (...) { x=...; } if (...) { x=...; } else { x=...; } if (...) { return ...; } else { return ...; } x = (...) ? ... : ... return (...) ? ... : ...
if (...) { x=...; }
if (...) { x=...; } else { x=...; }
if (...) { return ...; } else { return ...; }
x = (...) ? ... : ...
return (...) ? ... : ...
And the following are rejected:
if (...) { x=...; y=....; } if (...) { x=...; } else { y=...; } if (...) { return ...; } if (...) { x=...; } else { return ...; } if (...) { return ...; } else { x=...; }
if (...) { x=...; y=....; }
if (...) { x=...; } else { y=...; }
if (...) { return ...; }
if (...) { x=...; } else { return ...; }
if (...) { return ...; } else { x=...; }
In all the allowed cases, this maps down to a single CSEL instruction.
There may be cases where is optimal to use CSEL that are currently rejected. This is left as an exercise for the reader to implement without overly complicating RyuJIT.
Once the conditions have been checked, transformation can begin. To fix control flow the Then and Else conditions are moved outside of the loop into the first block. The three statements now in the first block (If, Then and Else) are combined into a single statement using a SELECT node. This node takes in three arguments and is similar to a CSEL instruction. The first argument is a condition that sets and checks flags. The second argument is the result to use if the condition passes. The third argument is the result to use if the condition fails. The parent of the CSEL is set to the root of the Then statement, and the root of the Else (which is identical to the root of the Then) is deleted.
SELECT
In the example, two STORE_LCL_VARs are moved into the first block and the block now always falls through to BB06. Then the code creates a SELECT. The NE child of the JTRUE, the if condition, is added as the first child. The childless JTRUE is discarded and replaced with a NOP. The children of the two STORE_LCL_VARs become the other two children of the SELECT and the SELECT node becomes the child of the first STORE_LCL_VAR. The second STORE_LCL_VAR can be discarded. That is a lot of shuffling. Afterwards this looks like:
BB06
------------ BB01 [000..00D) -> BB05 (always), preds={} succs={BB05} ***** BB01 STMT00001 ( 0x009[E-] ... 0x00B ) N001 ( 0, 0) [000007] ----------- * NOP void ***** BB01 STMT00005 ( 0x00D[E-] ... 0x00F ) N019 ( 33, 25) [000015] DA--------- * STORE_LCL_VAR int V00 arg0 d:3 $VN.Void N018 ( 33, 25) [000030] ----------- \--* SELECT int N015 ( 30, 20) [000029] ----------- +--* NE int N013 ( 25, 17) [000027] -------N--- | +--* AND int N009 ( 18, 12) [000026] ----------- | | +--* EQ int N007 ( 13, 9) [000024] -------N--- | | | +--* AND int N003 ( 6, 4) [000002] N------N-U- | | | | +--* GT int N001 ( 1, 1) [000000] ----------- | | | | | +--* LCL_VAR int V00 arg0 u:1 (last use) $80 N002 ( 1, 2) [000001] ----------- | | | | | \--* CNS_INT int 3 $43 N006 ( 6, 4) [000018] -------N--- | | | | \--* EQ int $101 N004 ( 1, 1) [000016] ----------- | | | | +--* LCL_VAR int V01 arg1 u:1 $81 N005 ( 1, 2) [000017] ----------- | | | | \--* CNS_INT int 10 $44 N008 ( 1, 2) [000025] ----------- | | | \--* CNS_INT int 0 N012 ( 6, 4) [000006] N------N-U- | | \--* GT int $102 N010 ( 1, 1) [000004] ----------- | | +--* LCL_VAR int V01 arg1 u:1 $81 N011 ( 1, 2) [000005] ----------- | | \--* CNS_INT int 4 $45 N014 ( 1, 2) [000028] ----------- | \--* CNS_INT int 0 N016 ( 1, 2) [000008] ----------- +--* CNS_INT int 12 $46 N017 ( 1, 2) [000014] ----------- \--* CNS_INT int 9 $47 ***** BB01 STMT00002 ( 0x013[E-] ... 0x015 ) N001 ( 0, 0) [000009] ----------- * NOP void ------------ BB04 [00D..013) -> BB06 (always), preds={} succs={BB06} ------------ BB05 [013..017), preds={BB01} succs={BB06}
BB04 and BB05 are now empty and no code flows into them. This will be spotted by the sanity checks that automatically run after every phase. The checks will first delete these empty blocks. Then they will spot that there is no requirement for BB06 to be a separate block (as only BB01 flows into it and does so unconditionally). The two blocks will be merged. This leaves a single block, split into six statements, for the entire function:
***** BB01 STMT00007 ( ??? ... ??? ) N004 ( 0, 0) [000021] DA--------- * STORE_LCL_VAR int V00 arg0 d:2 $VN.Void N003 ( 0, 0) [000020] ----------- \--* PHI int $140 N001 ( 0, 0) [000023] ----------- pred BB05 +--* PHI_ARG int V00 arg0 u:4 $46 N002 ( 0, 0) [000022] ----------- pred BB04 \--* PHI_ARG int V00 arg0 u:3 $47 ***** BB01 STMT00001 ( 0x009[E-] ... 0x00B ) N001 ( 0, 0) [000007] ----------- * NOP void ***** BB01 STMT00005 ( 0x00D[E-] ... 0x00F ) N019 ( 33, 25) [000015] DA--------- * STORE_LCL_VAR int V00 arg0 d:3 $VN.Void N018 ( 33, 25) [000030] ----------- \--* SELECT int N015 ( 30, 20) [000029] ----------- +--* NE int N013 ( 25, 17) [000027] -------N--- | +--* AND int N009 ( 18, 12) [000026] ----------- | | +--* EQ int N007 ( 13, 9) [000024] -------N--- | | | +--* AND int N003 ( 6, 4) [000002] N------N-U- | | | | +--* GT int N001 ( 1, 1) [000000] ----------- | | | | | +--* LCL_VAR int V00 arg0 u:1 (last use) $80 N002 ( 1, 2) [000001] ----------- | | | | | \--* CNS_INT int 3 $43 N006 ( 6, 4) [000018] -------N--- | | | | \--* EQ int $101 N004 ( 1, 1) [000016] ----------- | | | | +--* LCL_VAR int V01 arg1 u:1 $81 N005 ( 1, 2) [000017] ----------- | | | | \--* CNS_INT int 10 $44 N008 ( 1, 2) [000025] ----------- | | | \--* CNS_INT int 0 N012 ( 6, 4) [000006] N------N-U- | | \--* GT int $102 N010 ( 1, 1) [000004] ----------- | | +--* LCL_VAR int V01 arg1 u:1 $81 N011 ( 1, 2) [000005] ----------- | | \--* CNS_INT int 4 $45 N014 ( 1, 2) [000028] ----------- | \--* CNS_INT int 0 N016 ( 1, 2) [000008] ----------- +--* CNS_INT int 12 $46 N017 ( 1, 2) [000014] ----------- \--* CNS_INT int 9 $47 ***** BB01 STMT00002 ( 0x013[E-] ... 0x015 ) N001 ( 0, 0) [000009] ----------- * NOP void ***** BB01 STMT00003 ( 0x017[E-] ... 0x01E ) N003 ( 16, 6) [000012] --CXG------ * CALL void CSharpTutorials.Program:Consume[uint](uint,uint) $VN.Void N001 ( 1, 1) [000010] ----------- arg0 in x0 +--* LCL_VAR int V00 arg0 u:2 (last use) $140 N002 ( 1, 1) [000011] ----------- arg1 in x1 \--* LCL_VAR int V01 arg1 u:1 (last use) $81 ***** BB01 STMT00004 ( 0x01E[E-] ... ??? ) N001 ( 0, 0) [000013] ----------- * RETURN void $VN.Void
Note how the PHIs from BB06 have been moved to the start of the single block. There is no branching control flow remaining, so these will be optimized away when the IR is rationalized to LIR.
The rationalized LIR looks as would be expected:
------------ BB01 [000..01F) (return), preds={} succs={} [000031] ----------- IL_OFFSET void INLRT @ 0x009[E-] N001 ( 0, 0) [000007] ----------- NOP void [000032] ----------- IL_OFFSET void INLRT @ 0x00D[E-] N001 ( 1, 1) [000000] ----------- t0 = LCL_VAR int V00 arg0 u:1 (last use) $80 N002 ( 1, 2) [000001] ----------- t1 = CNS_INT int 3 $43 /--* t0 int +--* t1 int N003 ( 6, 4) [000002] N------N-U- t2 = * GT int N004 ( 1, 1) [000016] ----------- t16 = LCL_VAR int V01 arg1 u:1 $81 N005 ( 1, 2) [000017] ----------- t17 = CNS_INT int 10 $44 /--* t16 int +--* t17 int N006 ( 6, 4) [000018] -------N--- t18 = * EQ int $101 /--* t2 int +--* t18 int N007 ( 13, 9) [000024] -------N--- t24 = * AND int N008 ( 1, 2) [000025] ----------- t25 = CNS_INT int 0 /--* t24 int +--* t25 int N009 ( 18, 12) [000026] ----------- t26 = * EQ int N010 ( 1, 1) [000004] ----------- t4 = LCL_VAR int V01 arg1 u:1 $81 N011 ( 1, 2) [000005] ----------- t5 = CNS_INT int 4 $45 /--* t4 int +--* t5 int N012 ( 6, 4) [000006] N------N-U- t6 = * GT int $102 /--* t26 int +--* t6 int N013 ( 25, 17) [000027] -------N--- t27 = * AND int N014 ( 1, 2) [000028] ----------- t28 = CNS_INT int 0 /--* t27 int +--* t28 int N015 ( 30, 20) [000029] ----------- t29 = * NE int N016 ( 1, 2) [000008] ----------- t8 = CNS_INT int 12 $46 N017 ( 1, 2) [000014] ----------- t14 = CNS_INT int 9 $47 /--* t29 int +--* t8 int +--* t14 int N018 ( 33, 25) [000030] ----------- t30 = * SELECT int /--* t30 int N019 ( 33, 25) [000015] DA--------- * STORE_LCL_VAR int V00 arg0 d:3 $VN.Void [000033] ----------- IL_OFFSET void INLRT @ 0x013[E-] N001 ( 0, 0) [000009] ----------- NOP void [000034] ----------- IL_OFFSET void INLRT @ 0x017[E-] N001 ( 1, 1) [000010] ----------- t10 = LCL_VAR int V00 arg0 u:2 (last use) $140 N002 ( 1, 1) [000011] ----------- t11 = LCL_VAR int V01 arg1 u:1 (last use) $81 /--* t10 int arg0 in x0 +--* t11 int arg1 in x1 N003 ( 16, 6) [000012] --CXG------ * CALL void CSharpTutorials.Program:Consume[uint](uint,uint) $VN.Void [000035] ----------- IL_OFFSET void INLRT @ 0x01E[E-] N001 ( 0, 0) [000013] ----------- RETURN void $VN.Void
During lowering the compare nodes are all lowered to a CMP/CCMP/SETCC chain of nodes as described previously. Note that the first input to the SELECT (the if condition) is now the SETCC which produces the result of the entire if statement.
CMP/CCMP/SETCC
SETCC
N001 ( 1, 1) [000000] ----------- t0 = LCL_VAR int V00 arg0 u:1 (last use) $80 N002 ( 1, 2) [000001] -c--------- t1 = CNS_INT int 3 $43 N004 ( 1, 1) [000016] ----------- t16 = LCL_VAR int V01 arg1 u:1 $81 N005 ( 1, 2) [000017] -c--------- t17 = CNS_INT int 10 $44 N010 ( 1, 1) [000004] ----------- t4 = LCL_VAR int V01 arg1 u:1 $81 N011 ( 1, 2) [000005] -c--------- t5 = CNS_INT int 4 $45 /--* t0 int +--* t1 int N003 ( 6, 4) [000002] -------N-U- * CMP void /--* t16 int +--* t17 int N006 ( 6, 4) [000018] -------N--- * CCMP void cond=UGT flags=0 /--* t4 int +--* t5 int N012 ( 6, 4) [000006] -------N-U- * CCMP void cond=UNE flags=z N013 ( 25, 17) [000027] -------N--- t27 = SETCC int cond=UGT N016 ( 1, 2) [000008] ----------- t8 = CNS_INT int 12 $46 N017 ( 1, 2) [000014] ----------- t14 = CNS_INT int 9 $47 /--* t27 int +--* t8 int +--* t14 int N018 ( 33, 25) [000030] ----------- t30 = * SELECT int
When lowering gets to SELECT, it is lowered in a similar way to the CMP nodes. RyuJIT wants to convert the SELECT to a SELECTCC node. The SELECTCC node is like a SELECT node. However instead of taking in as input the result of a condition after being compared against flags, the input is just the result of the compare. The SELECTCC then holds the flags to check - just like the SETCC, JCC or CCMP nodes.
SELECTCC
SETCC, JCC
CCMP
Let us see what happens if the lowering on the SELECT is skipped.
export DOTNET_JitDump=TestAndOrWithElse export DOTNET_TieredCompilation=0 export DOTNET_JitDoLowerSelect=0
Generating: N023 ( 6, 4) [000006] -------N-U- * CCMP void cond=UNE flags=z REG NA IN0003: ccmp w1, #4, z, ne Generating: N025 ( 25, 17) [000027] -------N--- t27 = SETCC int cond=UGT REG x0 IN0004: cset x0, hi Generating: N027 ( 1, 2) [000008] ----------- t8 = CNS_INT int 12 REG x2 $46 IN0005: mov w2, #12 Generating: N029 ( 1, 2) [000014] ----------- t14 = CNS_INT int 9 REG x3 $47 IN0006: mov w3, #9 /--* t27 int +--* t8 int +--* t14 int Generating: N031 ( 33, 25) [000030] ----------- t30 = * SELECT int REG x0 IN0007: cmp w0, #0 IN0008: csel w0, w2, w3, ne
The full assembly:
G_M6318_IG01: IN000e: 000000 stp fp, lr, [sp, #-0x10]! IN000f: 000004 mov fp, sp G_M6318_IG02: IN0001: 000008 cmp w0, #3 IN0002: 00000C ccmp w1, #10, 0, hi IN0003: 000010 ccmp w1, #4, z, ne IN0004: 000014 cset x0, hi IN0005: 000018 mov w2, #12 IN0006: 00001C mov w3, #9 IN0007: 000020 cmp w0, #0 IN0008: 000024 csel w0, w2, w3, ne IN0009: 000028 movz x2, #0x6EE0 // code for CSharpTutorials.Program:Consume[uint](uint,uint) IN000a: 00002C movk x2, #0x65A9 LSL #16 IN000b: 000030 movk x2, #0xFFFF LSL #32 IN000c: 000034 ldr x2, [x2] IN000d: 000038 blr x2 G_M6318_IG03: IN0010: 00003C ldp fp, lr, [sp], #0x10 IN0011: 000040 ret lr
Looking at the generation we can see the result of the if goes into x0 during SETCC. The CSEL instruction checks the flags, so the result in x0 must go back into flags via an additional CMP of x0 and 0.
CMP
Looking back at the previous run, during lowering SELECT is lowered as follows. The then/else inputs to the SELECT are checked to make sure they can be moved prior to the compares. Also, the condition input must either be a SETCC or a compare node. If so, the lowering happens: the inputs are moved; the SELECT becomes a SELECTCC; the SETCC is removed, and its GT condition is put into the SELECTCC. This gives:
then/else
GT
N001 ( 1, 1) [000000] ----------- t0 = LCL_VAR int V00 arg0 u:1 (last use) $80 N002 ( 1, 2) [000001] -c--------- t1 = CNS_INT int 3 $43 N004 ( 1, 1) [000016] ----------- t16 = LCL_VAR int V01 arg1 u:1 $81 N005 ( 1, 2) [000017] -c--------- t17 = CNS_INT int 10 $44 N010 ( 1, 1) [000004] ----------- t4 = LCL_VAR int V01 arg1 u:1 $81 N011 ( 1, 2) [000005] -c--------- t5 = CNS_INT int 4 $45 N016 ( 1, 2) [000008] ----------- t8 = CNS_INT int 12 $46 N017 ( 1, 2) [000014] ----------- t14 = CNS_INT int 9 $47 /--* t0 int +--* t1 int N003 ( 6, 4) [000002] -------N-U- * CMP void /--* t16 int +--* t17 int N006 ( 6, 4) [000018] -------N--- * CCMP void cond=UGT flags=0 /--* t4 int +--* t5 int N012 ( 6, 4) [000006] -------N-U- * CCMP void cond=UNE flags=z /--* t8 int +--* t14 int N018 ( 33, 25) [000030] ----------- t30 = * SELECTCC int cond=UGT
This is generated as:
Generating: N027 ( 6, 4) [000006] -------N-U- * CCMP void cond=UNE flags=z REG NA IN0005: ccmp w1, #4, z, ne /--* t8 int +--* t14 int Generating: N029 ( 33, 25) [000030] ----------- t30 = * SELECTCC int cond=UGT REG x0 IN0006: csel w0, w2, w3, hi /--* t30 int Generating: N031 ( 33, 25) [000015] DA--------- * STORE_LCL_VAR int V00 arg0 d:3 x0 REG x0 $VN.Void V00 in reg x0 is becoming live [000015] Live regs: 0002 {x1} => 0003 {x0 x1} New debug range: new var or location Live vars after [000015]: {V01} => {V00 V01}
The full function is generated as follows:
G_M6318_IG01: IN000c: 000000 stp fp, lr, [sp, #-0x10]! IN000d: 000004 mov fp, sp G_M6318_IG02: IN0001: 000008 mov w2, #12 IN0002: 00000C mov w3, #9 IN0003: 000010 cmp w0, #3 IN0004: 000014 ccmp w1, #10, 0, hi IN0005: 000018 ccmp w1, #4, z, ne IN0006: 00001C csel w0, w2, w3, hi IN0007: 000020 movz x2, #0x6EE0 // code for CSharpTutorials.Program:Consume[uint](uint,uint) IN0008: 000024 movk x2, #0x35A9 LSL #16 IN0009: 000028 movk x2, #0xFFFF LSL #32 IN000a: 00002C ldr x2, [x2] IN000b: 000030 blr x2 G_M6318_IG03: IN000e: 000034 ldp fp, lr, [sp], #0x10 IN000f: 000038 ret lr
This code is now ideal. CMP, a sequence of compares, then CSEL.
Arm64 provides some variations on the CSEL instruction.
CINC x0, x1, lt
LT
x0 = x1+1
CINC x0, x1, x2 lt
x0 = x2
CINV x0, x1, lt
x0 = ~x1
CSINV x0, x1, x2 lt
CNEG x0, x1, lt
x0 = NOT(x1)
CSNEG x0, x1, x2 lt
The CS instructions look strange at first glance. It is not immediately obvious why they are required. However, they fit into the standard 2 input, 1 output format that most Arm64 instructions use. The other three instructions are simply aliases of the CS instructions that are used when the input registers match. Note there is nothing different in the encoding of the instruction when the alias is used, it is purely a stylistic choice or readability.
RyuJIT will spot patterns that can use CINC, CINV and CNEG. This is performed during the lowering of SELECT.
CINC
CINV
CNEG
Consider the following, also from the same example program:
[MethodImpl(MethodImplOptions.NoInlining)] static void TestAndOrWithElse(uint op1, uint op2) { if (op1 > 3 && op2 == 10 || op2 <= 4) { op1 = 9; } else { op1 = 12; } Consume(op1, op2); }
On entry to lowering, the LIR will be:
------------ BB01 [000..014) (return), preds={} succs={} [000017] ----------- IL_OFFSET void INLRT @ 0x000[E-] N001 ( 0, 0) [000003] ----------- NOP void [000018] ----------- IL_OFFSET void INLRT @ 0x004[E-] N001 ( 1, 1) [000000] ----------- t0 = LCL_VAR int V00 arg0 u:1 $80 N002 ( 1, 2) [000001] ----------- t1 = CNS_INT int 1 $41 /--* t0 int +--* t1 int N003 ( 3, 4) [000002] N------N-U- t2 = * LE int $100 N004 ( 1, 2) [000004] ----------- t4 = CNS_INT int 8 $43 N005 ( 1, 2) [000010] ----------- t10 = CNS_INT int 7 $44 /--* t2 int +--* t4 int +--* t10 int N006 ( 6, 9) [000016] ----------- t16 = * SELECT int /--* t16 int N007 ( 6, 9) [000011] DA--------- * STORE_LCL_VAR int V01 arg1 d:4 $VN.Void [000019] ----------- IL_OFFSET void INLRT @ 0x009[E-] N001 ( 0, 0) [000005] ----------- NOP void [000020] ----------- IL_OFFSET void INLRT @ 0x00C[E-] N001 ( 1, 1) [000006] ----------- t6 = LCL_VAR int V00 arg0 u:1 (last use) $80 N002 ( 1, 1) [000007] ----------- t7 = LCL_VAR int V01 arg1 u:2 (last use) $140 /--* t6 int arg0 in x0 +--* t7 int arg1 in x1 N003 ( 16, 6) [000008] --CXG------ * CALL void CSharpTutorials.Program:Consume[uint](uint,uint) $VN.Void [000021] ----------- IL_OFFSET void INLRT @ 0x013[E-] N001 ( 0, 0) [000009] ----------- RETURN void $VN.Void
This, as we would expect, is lowered to a CMP and SELECTCC:
N001 ( 1, 1) [000000] ----------- t0 = LCL_VAR int V00 arg0 u:1 $80 N002 ( 1, 2) [000001] -c--------- t1 = CNS_INT int 1 $41 N004 ( 1, 2) [000004] ----------- t4 = CNS_INT int 8 $43 N005 ( 1, 2) [000010] ----------- t10 = CNS_INT int 7 $44 /--* t0 int +--* t1 int N003 ( 3, 4) [000002] -------N-U- * CMP void /--* t4 int +--* t10 int N006 ( 6, 9) [000016] ----------- t16 = * SELECTCC int cond=ULE
After lowering to SELECTCC, the same function will check the inputs. If they are both constants and one of them is one bigger than the other, then it can be optimized further. If the first input is the higher one, then then the inputs are swapped, and the condition is inverted. Then the lower input is removed and the SELECT_CC is changed to the succinctly named SELECT_INCCC.
SELECT_CC
SELECT_INCCC
N001 ( 1, 1) [000000] ----------- t0 = LCL_VAR int V00 arg0 u:1 $80 N002 ( 1, 2) [000001] -c--------- t1 = CNS_INT int 1 $41 N005 ( 1, 2) [000010] ----------- t10 = CNS_INT int 7 $44 /--* t0 int +--* t1 int N003 ( 3, 4) [000002] -------N-U- * CMP void /--* t10 int N006 ( 6, 9) [000016] ----------- t16 = * SELECT_INCCC int cond=ULE
This then generates exactly as expected:
G_M40289_IG01: IN000a: 000000 stp fp, lr, [sp, #-0x10]! IN000b: 000004 mov fp, sp G_M40289_IG02: IN0001: 000008 mov w2, #7 IN0002: 00000C cmp w0, #1 IN0003: 000010 cinc w2, w2, ls IN0004: 000014 sxtw w1, w2 IN0005: 000018 movz x2, #0x6EE0 // code for CSharpTutorials.Program:Consume[uint](uint,uint) IN0006: 00001C movk x2, #0x55A8 LSL #16 IN0007: 000020 movk x2, #0xFFFF LSL #32 IN0008: 000024 ldr x2, [x2] IN0009: 000028 blr x2 G_M40289_IG03: IN000c: 00002C ldp fp, lr, [sp], #0x10 IN000d: 000030 ret lr
The CINV transformation acts in a similar way but allows for slightly stranger patterns. First consider:
[MethodImpl(MethodImplOptions.NoInlining)] static void TestInv(uint op1, uint op2) { if (op1 != 7) { op2 = ~op2; } Consume(op1, op2); }
The LIR will be:
------------ BB01 [000..010) (return), preds={} succs={} [000017] ----------- IL_OFFSET void INLRT @ 0x000[E-] N001 ( 0, 0) [000003] ----------- NOP void [000018] ----------- IL_OFFSET void INLRT @ 0x004[E-] N001 ( 1, 1) [000000] ----------- t0 = LCL_VAR int V00 arg0 u:1 $80 N002 ( 1, 2) [000001] ----------- t1 = CNS_INT int 7 $43 /--* t0 int +--* t1 int N003 ( 3, 4) [000002] J------N--- t2 = * EQ int $100 N004 ( 1, 1) [000015] ----------- t15 = LCL_VAR int V01 arg1 N005 ( 1, 1) [000008] ----------- t8 = LCL_VAR int V01 arg1 u:1 (last use) $81 /--* t8 int N006 ( 2, 2) [000009] ----------- t9 = * NOT int $82 /--* t2 int +--* t15 int +--* t9 int N007 ( 7, 8) [000016] ----------- t16 = * SELECT int /--* t16 int N008 ( 7, 8) [000010] DA--------- * STORE_LCL_VAR int V01 arg1 d:3 $VN.Void [000019] ----------- IL_OFFSET void INLRT @ 0x008[E-] N001 ( 1, 1) [000004] ----------- t4 = LCL_VAR int V00 arg0 u:1 (last use) $80 N002 ( 1, 1) [000005] ----------- t5 = LCL_VAR int V01 arg1 u:2 (last use) $140 /--* t4 int arg0 in x0 +--* t5 int arg1 in x1 N003 ( 16, 6) [000006] --CXG------ * CALL void CSharpTutorials.Program:Consume[uint](uint,uint) $VN.Void [000020] ----------- IL_OFFSET void INLRT @ 0x00F[E-] N001 ( 0, 0) [000007] ----------- RETURN void $VN.Void
This will follow the same pattern as the CINC optimization. After lowering the SELECT to SELECTCC, the same function will check the then and else statements. If one of them is a NOT statement, then this can be optimized. A SELECT_INVCC node is created in the expected way. Note that unlike SELECT_INCCC, the SELECT_INVCC has the same then and else inputs as the SELECT node - this distinction will become important later.
NOT
SELECT_INVCC
SELECT_INCCC,
N001 ( 1, 1) [000000] ----------- t0 = LCL_VAR int V00 arg0 u:1 $80 N002 ( 1, 2) [000001] -c--------- t1 = CNS_INT int 7 $43 N004 ( 1, 1) [000015] ----------- t15 = LCL_VAR int V01 arg1 N005 ( 1, 1) [000008] ----------- t8 = LCL_VAR int V01 arg1 u:1 (last use) $81 /--* t0 int +--* t1 int N003 ( 3, 4) [000002] -------N--- * CMP void /--* t15 int +--* t8 int N007 ( 7, 8) [000016] ----------- t16 = * SELECT_INVCC int cond=UEQ /--* t16 int N008 ( 7, 8) [000010] DA--------- * STORE_LCL_VAR int V01 arg1 d:3 $VN.Void
This is generated into:
G_M20564_IG01: IN0008: 000000 stp fp, lr, [sp, #-0x10]! IN0009: 000004 mov fp, sp G_M20564_IG02: IN0001: 000008 cmp w0, #7 IN0002: 00000C csinv w1, w1, w1, eq IN0003: 000010 movz x2, #0x6EE0 // code for CSharpTutorials.Program:Consume[uint](uint,uint) IN0004: 000014 movk x2, #0x55A6 LSL #16 IN0005: 000018 movk x2, #0xFFFF LSL #32 IN0006: 00001C ldr x2, [x2] IN0007: 000020 blr x2 G_M20564_IG03: IN000a: 000024 ldp fp, lr, [sp], #0x10 IN000b: 000028 ret lr
Note that the disassembler prints CSINV instead of CINC. This is the same functionality, and it is optional for the disassembler to show it as CINC, although it would help debugging slightly.
CSINV
Now consider:
[MethodImpl(MethodImplOptions.NoInlining)] static void TestInv2(uint op1, uint op2) { if (op1 != 7) { op2 = ~op2; } else { op2 = 3; } Consume(op1, op2); }
This will still get converted at lowering into SELECT_INVCC. The phase does not care that the then and else cases calculate different inputs, only one of them has a ~ operation. This fits the extended CSINV instruction exactly, and the function will generate into:
G_M15974_IG01: IN0009: 000000 stp fp, lr, [sp, #-0x10]! IN000a: 000004 mov fp, sp G_M15974_IG02: IN0001: 000008 mov w2, #3 IN0002: 00000C cmp w0, #7 IN0003: 000010 csinv w1, w2, w1, eq IN0004: 000014 movz x2, #0x6EE0 // code for CSharpTutorials.Program:Consume[uint](uint,uint) IN0005: 000018 movk x2, #0x31A6 LSL #16 IN0006: 00001C movk x2, #0xFFFF LSL #32 IN0007: 000020 ldr x2, [x2] IN0008: 000024 blr x2 G_M15974_IG03: IN000b: 000028 ldp fp, lr, [sp], #0x10 IN000c: 00002C ret lr
Imagine a scenario where CSINV does not exists, but CINV does. To use CINV, RyuJIT would have to fully parse the then and else cases to confirm they are the same inputs.
The CNEG optimization works almost identically to CINV, including the extending CSNEG variants, except it applies to subtraction instead of negation. These are left as an exercise for the reader to follow.
CSNEG
That wraps up everything I wanted to walk through across the series. Over three parts we went through both the optimize bools and if conversion phases. We saw how they interact with each other to create a complete solution. We then looked at how they interact with other optimizations. Finally, we looked at a couple of further optimizations that build on top. While none of these optimizations offer large performance gains, they each build off each other and eventually add up. In addition, these will aid future optimizations, for example the simplification of the blocks simplifies loop optimizations.