This blog post is the second and final part of the series on Dev Testing in .NET. In part 1, we walked through how .NET uses environment variables, generated some detailed IR dumps, and started looking at SuperPMI. This part walks through some further uses of SuperPMI and concludes with fuzzing using Fuzzlyn.
The third mode of SPMI is asmdiffs, which works like replay but with more steps.
$ superpmi.py asmdiffs
In addition to downloading all the collections, it also downloads a copy of the base libclrjit.so from the RyuJIT rolling build (which stores a built copy of RyuJIT for every platform for every change to the JIT).
Each method in a collection is compiled twice: once with the local RyuJIT, and once with the downloaded base RyuJIT. If there is any difference in the resulting assembly output for a given method, then both assembly dumps are written to disk. Any compilation failures are discarded. Finally, the tool writes a markdown-formatted summary of the number and size of all the differences, along with some example differences. The assembly dumps are text files, so it is very easy to run diff comparisons on them.
This tool gives the ability to see the impact of a patch across the entire test suite in simple-to-read text files, which is invaluable when developing code changes. A developer or code reviewer can quickly see the effect of a change across the entirety of the .NET test suite, benchmarks, and libraries. In all, this is over 1.5 million methods and counting. It will catch unexpected regressions in corner cases and show how often a new optimization is triggered.
It is easier to show this in action. For brevity, we will stick with a single collection. If DOTNET_JitFailLoweringBitCast is still set from part 1, unset it first.
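Clearing the variable from part 1 is a single shell command:

$ unset DOTNET_JitFailLoweringBitCast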
❯ python ./src/coreclr/scripts/superpmi.py asmdiffs -mch_files ./artifacts/spmi/mch/4bceb905-d550-4a5d-b1eb-276fff68d183.linux.arm64/benchmarks.run.linux.arm64.checked.mch
[08:49:38] ================ Logging to /home/alahay01/dotnet/runtime_bools/artifacts/spmi/superpmi.6.log
[08:49:38] Using JIT/EE Version from jiteeversionguid.h: 4bceb905-d550-4a5d-b1eb-276fff68d183
[08:49:38] Baseline hash: 3bda6e0013ddb5b48a7b2a89fd84bf4fbbed0e37
[08:49:38] Using baseline /home/alahay01/dotnet/runtime_bools/artifacts/spmi/basejit/31234a863efe1a4dc1c6f4f1520f8515d5a90640.linux.arm64.Checked/libclrjit.so
[08:49:38] Using coredistools found at /home/alahay01/dotnet/runtime_bools/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/libcoredistools.so
[08:49:38] SuperPMI ASM diffs
[08:49:38] Base JIT Path: /home/alahay01/dotnet/runtime_bools/artifacts/spmi/basejit/31234a863efe1a4dc1c6f4f1520f8515d5a90640.linux.arm64.Checked/libclrjit.so
[08:49:38] Diff JIT Path: /home/alahay01/dotnet/runtime_bools/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/libclrjit.so
[08:49:38] Using MCH files:
[08:49:38]   /home/alahay01/dotnet/runtime_bools/artifacts/spmi/mch/4bceb905-d550-4a5d-b1eb-276fff68d183.linux.arm64/benchmarks.run.linux.arm64.checked.mch
[08:49:38] Running asm diffs of /home/alahay01/dotnet/runtime_bools/artifacts/spmi/mch/4bceb905-d550-4a5d-b1eb-276fff68d183.linux.arm64/benchmarks.run.linux.arm64.checked.mch
[08:49:41] Clean SuperPMI diff (32525 contexts processed)
[08:49:41] Asm diffs summary:
[08:49:41]   Summary Markdown file: /home/alahay01/dotnet/runtime_bools/artifacts/spmi/diff_summary.md
[08:49:41]   No asm diffs
As expected, this did not produce any differences against the base RyuJIT. This is a good test to show that the compiler output is consistent. A full run across all the collections may sometimes show a few zero-length differences. These are usually due to address or other constant value changes, which the tool aims to ignore.
To show an example where the code generates different output, we can use one of the special options from the example repo. Because the option only exists in the example repo, it does not affect the output of the base compilation.
$ export DOTNET_JitDoLowerJTrue=0
$ python ./src/coreclr/scripts/superpmi.py asmdiffs -mch_files ./artifacts/spmi/mch/4bceb905-d550-4a5d-b1eb-276fff68d183.linux.arm64/benchmarks.run.linux.arm64.checked.mch
[08:50:22] ================ Logging to /home/alahay01/dotnet/runtime_bools/artifacts/spmi/superpmi.7.log
[08:50:22] Using JIT/EE Version from jiteeversionguid.h: 4bceb905-d550-4a5d-b1eb-276fff68d183
[08:50:22] Baseline hash: 3bda6e0013ddb5b48a7b2a89fd84bf4fbbed0e37
[08:50:22] Using baseline /home/alahay01/dotnet/runtime_bools/artifacts/spmi/basejit/31234a863efe1a4dc1c6f4f1520f8515d5a90640.linux.arm64.Checked/libclrjit.so
[08:50:22] Using coredistools found at /home/alahay01/dotnet/runtime_bools/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/libcoredistools.so
[08:50:22] SuperPMI ASM diffs
[08:50:22] Base JIT Path: /home/alahay01/dotnet/runtime_bools/artifacts/spmi/basejit/31234a863efe1a4dc1c6f4f1520f8515d5a90640.linux.arm64.Checked/libclrjit.so
[08:50:22] Diff JIT Path: /home/alahay01/dotnet/runtime_bools/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/libclrjit.so
[08:50:22] Using MCH files:
[08:50:22]   /home/alahay01/dotnet/runtime_bools/artifacts/spmi/mch/4bceb905-d550-4a5d-b1eb-276fff68d183.linux.arm64/benchmarks.run.linux.arm64.checked.mch
[08:50:22] Running asm diffs of /home/alahay01/dotnet/runtime_bools/artifacts/spmi/mch/4bceb905-d550-4a5d-b1eb-276fff68d183.linux.arm64/benchmarks.run.linux.arm64.checked.mch
[08:50:25] Asm diffs found
[08:50:25] Creating dasm files: /home/alahay01/dotnet/runtime_bools/artifacts/spmi/asm.benchmarks.run.linux.arm64.checked.1/base /home/alahay01/dotnet/runtime_bools/artifacts/spmi/asm.benchmarks.run.linux.arm64.checked.1/diff
[08:50:26] Differences found. To replay SuperPMI use:
[08:50:26]
[08:50:26] export DOTNET_JitAlignLoops=0
[08:50:26] export DOTNET_JitEnableNoWayAssert=1
[08:50:26] export DOTNET_JitNoForceFallback=1
[08:50:26] export DOTNET_JitDisasm=*
[08:50:26] export DOTNET_JitUnwindDump=*
[08:50:26] export DOTNET_JitEHDump=*
[08:50:26] export DOTNET_JitDiffableDasm=1
[08:50:26] export DOTNET_JitDisasmDiffable=1
[08:50:26] export DOTNET_JitDisasmWithGC=1
[08:50:26] /home/alahay01/dotnet/runtime_bools/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/superpmi -c ### /home/alahay01/dotnet/runtime_bools/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/libclrjit.so /home/alahay01/dotnet/runtime_bools/artifacts/spmi/mch/4bceb905-d550-4a5d-b1eb-276fff68d183.linux.arm64/benchmarks.run.linux.arm64.checked.mch
[08:50:26]
[08:50:26] Total bytes of base: 14920552
[08:50:26] Total bytes of diff: 16097380
[08:50:26] Total bytes of delta: 1176828 (7.89% of base)
[08:50:26] Generated asm is located under /home/alahay01/dotnet/runtime_bools/artifacts/spmi/asm.benchmarks.run.linux.arm64.checked.1/base /home/alahay01/dotnet/runtime_bools/artifacts/spmi/asm.benchmarks.run.linux.arm64.checked.1/diff
[08:50:26] Textual differences found in generated asm.
[08:50:26] jit-analyze not found on PATH. Generate a diff analysis report by building jit-analyze from https://github.com/dotnet/jitutils and running:
[08:50:26]     jit-analyze -r --base /home/alahay01/dotnet/runtime_bools/artifacts/spmi/asm.benchmarks.run.linux.arm64.checked.1/base --diff /home/alahay01/dotnet/runtime_bools/artifacts/spmi/asm.benchmarks.run.linux.arm64.checked.1/diff
[08:50:26] 23,578 contexts with diffs (0 improvements, 23,578 regressions, 0 same size)
[08:50:26]   +1,176,828 bytes
[08:50:26]
[08:50:26]
[08:50:26] Asm diffs summary:
[08:50:26]   Summary Markdown file: /home/alahay01/dotnet/runtime_bools/artifacts/spmi/diff_summary.1.md
[08:50:26]   Asm diffs in 1 MCH files:
[08:50:26]     /home/alahay01/dotnet/runtime_bools/artifacts/spmi/mch/4bceb905-d550-4a5d-b1eb-276fff68d183.linux.arm64/benchmarks.run.linux.arm64.checked.mch
The differing assembly is found in the output directory:
❯ ls artifacts/spmi/asm.benchmarks.run.linux.arm64.checked/diff | head
10177.dasm
10507.dasm
10614.dasm
10678.dasm
11073.dasm
11473.dasm
11940.dasm
12148.dasm
12211.dasm
13441.dasm
One of the methods can quickly be compared:
diff artifacts/spmi/asm.benchmarks.run.linux.arm64.checked/base/3185.dasm artifacts/spmi/asm.benchmarks.run.linux.arm64.checked/diff/3185.dasm
29,30c29,32
<             tbz     w0, #0, G_M21790_IG12
<                         ;; size=20 bbWeight=1 PerfScore 5.50
---
>             tst     w0, #1
>             cset    x0, eq
>             cbnz    w0, G_M21790_IG12
>                         ;; size=28 bbWeight=1 PerfScore 6.50
1371c1373
< ; Total bytes of code 5244, prolog size 12, PerfScore 1499.40, instruction count 1311, allocated bytes for code 5244 (MethodHash=cdf6aae1) for method System.Security.Cryptography.OidLookup:InitializeLookupDictionaries() (FullOpts)
---
> ; Total bytes of code 5252, prolog size 12, PerfScore 1501.20, instruction count 1313, allocated bytes for code 5252 (MethodHash=cdf6aae1) for method System.Security.Cryptography.OidLookup:InitializeLookupDictionaries() (FullOpts)
1382c1384
<   Function Length      : 1311 (0x0051f) Actual length = 5244 (0x00147c)
---
>   Function Length      : 1313 (0x00521) Actual length = 5252 (0x001484)
The tool also produces a nice summary file, artifacts/spmi/diff_summary.md, in Markdown format. This is very useful for posting into GitHub comments or anywhere else that supports Markdown. It is split into sections which can be toggled to save screen space. Each section is then split per collection. The main sections are:
When pasted into GitHub, the summary above looks like this:
An alternative way of using asmdiffs is to build two different versions of .NET. These can then be passed to SPMI as the base and diff versions, allowing quick comparison of different RyuJIT changes.
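As a rough sketch of that workflow, assuming your checkout's superpmi.py exposes the -base_jit_path and -diff_jit_path arguments, the two builds can be supplied explicitly (the /path/to/... paths below are placeholders for your own builds):

$ # Compare two locally built JITs instead of using the downloaded rolling-build baseline.
$ python ./src/coreclr/scripts/superpmi.py asmdiffs \
    -base_jit_path /path/to/base/libclrjit.so \
    -diff_jit_path /path/to/diff/libclrjit.so \
    -mch_files ./artifacts/spmi/mch/4bceb905-d550-4a5d-b1eb-276fff68d183.linux.arm64/benchmarks.run.linux.arm64.checked.mch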
RyuJIT is quick to compile methods, but any new functionality will usually bring additional complexity. Compilation speed matters much less for a static compiler than for a JIT, so a static compiler has more scope to produce the best possible code. A JIT needs to compile a method and then get out of the way of program execution as quickly as possible, so the benefit of an optimization must be weighed against its compile-time cost. The time it takes for SPMI replay to run a collection is the time to compile each method plus a fixed overhead, so SPMI replay can be used as a proxy for measuring the speed of the compiler. This is not an exact recreation of real-world .NET execution, as the runtime VM is very different from the superpmi tool, but the compilation performed by the RyuJIT libclrjit is identical. The time difference between two SPMI replay runs can therefore be used as a percentage change in compilation throughput.
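A minimal sketch of that idea, assuming replay accepts a -jit_path argument in the same way asmdiffs takes explicit JIT paths, is to time two replay runs over the same collection (again, the /path/to/... paths are placeholders):

$ # Time the base JIT, then the patched JIT, over the same collection.
$ time python ./src/coreclr/scripts/superpmi.py replay -jit_path /path/to/base/libclrjit.so \
    -mch_files ./artifacts/spmi/mch/4bceb905-d550-4a5d-b1eb-276fff68d183.linux.arm64/benchmarks.run.linux.arm64.checked.mch
$ time python ./src/coreclr/scripts/superpmi.py replay -jit_path /path/to/diff/libclrjit.so \
    -mch_files ./artifacts/spmi/mch/4bceb905-d550-4a5d-b1eb-276fff68d183.linux.arm64/benchmarks.run.linux.arm64.checked.mch

The ratio of the two timings gives a rough percentage change in compile time; running each a few times helps average out noise.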
With a few changes to asmdiffs, it is possible to dump the assembly for every method in a collection. The resulting dumps can then be quickly searched, for example, to count the occurrences of a certain instruction or the number of branches. The changes required are left as an exercise for the reader. Patches are welcome if you want to add a new SPMI mode.
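Even without a new mode, the per-method .dasm files that asmdiffs already writes can be searched directly. A small sketch using the output directory from the earlier run (the instruction names are just examples):

$ # Count occurrences of a particular instruction (here cset) across all diff dumps.
$ grep -how 'cset' artifacts/spmi/asm.benchmarks.run.linux.arm64.checked/diff/*.dasm | wc -l
$ # Count compare-and-branch instructions such as cbz/cbnz/tbz/tbnz.
$ grep -hoE '\b(cbz|cbnz|tbz|tbnz)\b' artifacts/spmi/asm.benchmarks.run.linux.arm64.checked/diff/*.dasm | wc -l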
Shorter assembly does not mean a method will run quicker; for example, it could contain instructions which take longer to execute on the CPU. The assembly dump does contain a performance score for the method, but this is only a rough estimate based on the sum of the estimated relative execution time of each instruction. SPMI does not currently collate these scores across an entire collection; they have to be found manually per method. Again, patches are welcome.
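Until then, a throwaway shell pipeline can add them up. The sketch below assumes the per-method summary lines keep the "Total bytes of code ..., PerfScore ..." format seen in the diff above:

$ # Sum the per-method PerfScore estimates across one directory of dumps.
$ grep -h '; Total bytes of code' artifacts/spmi/asm.benchmarks.run.linux.arm64.checked/base/*.dasm \
    | sed -n 's/.*PerfScore \([0-9.]*\),.*/\1/p' \
    | awk '{sum += $1} END {printf "Total PerfScore: %.2f\n", sum}'

Running the same pipeline over the base and diff directories gives a crude overall comparison.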
The .NET test suites generally provide good coverage of RyuJIT. New code will usually come with new tests in the suite (unless the behaviour is already covered). But engineers are not perfect. Edge cases will be missed, either because the developer was unaware of them or because there were simply too many cases to test; sometimes a change lands with no new tests at all. It happens. The test suite can never be considered complete, and so requires additional testing. Fuzzers help to partially fill this gap. .NET has both Fuzzlyn and Antigen. In this blog post we cover Fuzzlyn, but the usage of Antigen is very similar.
In its simplest form, just give Fuzzlyn the folder containing the corerun binary from a .NET build and a length of time to run for. Five minutes is a good start.
This will now (across multiple threads) run through a loop of generating a random program, running it on the given corerun host in both debug and release configurations, and recording the seed as an example whenever the two runs disagree or crash:
$ cd Fuzzlyn
$ dotnet publish -c Release --self-contained -r linux-arm64
$ ./Fuzzlyn/bin/Release/net7.0/linux-arm64/publish/Fuzzlyn --seconds-to-run 300 --parallelism -1 -host /home/alahay01/dotnet/runtime_base/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/corerun
00:00:08/00:05:00 elapsed, 100 programs generated, 0 examples found
00:00:09/00:05:00 elapsed, 200 programs generated, 0 examples found
00:00:14/00:05:00 elapsed, 300 programs generated, 0 examples found
00:00:15/00:05:00 elapsed, 400 programs generated, 0 examples found
00:00:16/00:05:00 elapsed, 500 programs generated, 0 examples found
00:00:19/00:05:00 elapsed, 600 programs generated, 0 examples found
<SNIP>
00:04:52/00:05:00 elapsed, 18100 programs generated, 0 examples found
00:04:54/00:05:00 elapsed, 18000 programs generated, 0 examples found
00:04:55/00:05:00 elapsed, 18300 programs generated, 0 examples found
00:04:55/00:05:00 elapsed, 18200 programs generated, 0 examples found
00:04:59/00:05:00 elapsed, 18400 programs generated, 0 examples found
00:04:59/00:05:00 elapsed, 18500 programs generated, 0 examples found
The example .NET we built has a number of extra config options. One of those options causes the results of some if statements to be inverted. Obviously, this causes any program run with the flag to execute incorrectly, but in this example it is useful for making Fuzzlyn report an error.
$ export DOTNET_JitInvertIfConversion=1
$ ./Fuzzlyn/bin/Release/net7.0/linux-arm64/publish/Fuzzlyn --num-programs 100 -host /home/alahay01/dotnet/runtime_bools/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/corerun
Found example with seed 12967064399465141584
Found example with seed 3904486820975108555
00:01:12 elapsed, 100/100 programs generated, 2 examples found
Note that, due to the randomness, reproducing this example will generate a different seed, multiple seeds, or none at all.
We can re-run the failure above or view the code for it:
$ ./Fuzzlyn/bin/Release/net7.0/linux-arm64/publish/Fuzzlyn --seed 12967064399465141584 -host /home/alahay01/dotnet/runtime_bools/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/corerun
Found example with seed 12967064399465141584

$ ./Fuzzlyn/bin/Release/net7.0/linux-arm64/publish/Fuzzlyn --seed 12967064399465141584 -host /home/alahay01/dotnet/runtime_bools/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/corerun --output-source
// Generated by Fuzzlyn v1.6 on 2023-10-24 15:00:17
// Run on Arm64 Linux
// Seed: 12967064399465141584
using System.Runtime.CompilerServices;
public class Program
{
    public static Fuzzlyn.ExecutionServer.IRuntime s_rt;
    public static bool s_1 = true;
    public static uint s_2 = 4294967294U;
    public static ulong[] s_3 = new ulong[]{1UL, 16888081499558940079UL, 1UL, 14466453191820878323UL, 0UL};
    public static bool s_4 = true;
    public static ulong s_5 = 0UL;
    public static long s_6 = 1L;
    public static byte s_7 = 143;
    public static sbyte s_8 = -2;
    public static ushort s_9 = 1;
    public static ulong s_10 = 1UL;
    public static uint[] s_11 = new uint[]{4294967294U, 1U, 0U, 0U, 1U, 2280772701U};
    public static bool s_12 = false;
    public static byte s_13 = 7;
    public static int s_14 = -28519332;
<SNIP>
The full program can be compiled by itself outside of Fuzzlyn, but it is over 5000 lines long and looks a little strange (though perhaps less so once you consider its random nature). If that feels too big to debug easily, Fuzzlyn can reduce it: it incrementally bisects the program into smaller pieces until it finds the smallest piece that still produces the same error.
$ ./Fuzzlyn/bin/Release/net7.0/linux-arm64/publish/Fuzzlyn --seed 12967064399465141584 -host /home/alahay01/dotnet/runtime_bools/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/corerun --reduce
Simplifying Coarsely. Total elapsed: 00:00:33. Method 90/90.
Simplifying Statements. Total elapsed: 00:00:53. Iter: 187/187
Simplifying Expressions. Total elapsed: 00:00:57. Iter: 1029/1029
Simplifying Members. Total elapsed: 00:01:03. Iter: 10/10
Simplifying Statements. Total elapsed: 00:01:03. Iter: 15/15
Simplifying Expressions. Total elapsed: 00:01:03. Iter: 49/49
Simplifying Members. Total elapsed: 00:01:03. Iter: 7/7
Simplifying Statements. Total elapsed: 00:01:04. Iter: 9/9
Simplifying Expressions. Total elapsed: 00:01:04. Iter: 34/34
Simplifying Members. Total elapsed: 00:01:04. Iter: 7/7
Simplifying Statements. Total elapsed: 00:01:04. Iter: 9/9
Simplifying Expressions. Total elapsed: 00:01:04. Iter: 34/34
Simplifying Members. Total elapsed: 00:01:04. Iter: 7/7
// Generated by Fuzzlyn v1.6 on 2023-10-24 15:02:19
// Run on Arm64 Linux
// Seed: 12967064399465141584
// Reduced from 199.5 KiB to 0.4 KiB in 00:01:07
// Debug: Outputs 0
// Release: Outputs 2147483647
using System.Runtime.CompilerServices;

public class Program
{
    public static bool s_34;
    public static int s_40;
    public static void Main()
    {
        var vr1 = M58();
        System.Console.WriteLine(vr1);
    }

    public static int M58()
    {
        if (s_34)
        {
            return 2147483647;
        }

        return s_40;
    }
}
The whole process can take a couple of minutes, but the result is a program that is much easier to debug. The reduced form is also ideal for adding back into the test suite once your bug has been fixed.
There are a few gaps in using Fuzzlyn (or indeed, any fuzzing tool).
Firstly, if your .NET build is heavily broken then it can become tricky to test with Fuzzlyn, as every failing program reduces down to the same boilerplate entry function. In those cases, other tools are better at pinpointing the problem, and you should probably go and check SPMI first. In fact, you should always check that SPMI replay is clean before trying Fuzzlyn.
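A minimal sanity check, using the replay mode covered in part 1, is simply:

$ # Replay all downloaded collections with the locally built JIT; this should report clean before any fuzzing effort is spent.
$ python ./src/coreclr/scripts/superpmi.py replay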
Secondly, how long should Fuzzlyn run for before you can be confident it has not found any bugs? Is one minute enough, or ten? When debugging an issue we had with a peephole optimization patch, a corner-case issue inside the GC would only trigger once every 30 minutes of fuzzing. With random execution, it is difficult to have any guaranteed certainty.
That is everything I wanted to cover in this blog series. Over two posts we have talked about:
I hope that this blog series is useful and helps you on your .NET journey.