Hello Community,
We are evaluating the performance overhead introduced by MTE on the Pixel 8 Pro. We selected the SPEC CPU 2017 C++ rate benchmarks and pinned them to the Cortex-A715 cores. The StickyTags paper (ieeexplore.ieee.org/.../stamp.jsp) points out that frequent memory tagging (the STG instruction) is a major performance bottleneck of existing MTE-based solutions. However, our evaluation results are more nuanced and somewhat confusing. Specifically, we enable MTE tag memory for the stack and the heap separately (via mprotect(..., PROT_MTE...)) and evaluate different MTE check modes: ignore mode, async mode, sync mode, and tag checks disabled. Notably, the user-mode process does not execute any STG instructions, i.e., there is no frequent memory tagging. We ran each benchmark at least 3 times and report the average as the final result.
The results on the stack are shown below (stack-mte-ignore-tcf means MTE tag memory is enabled for the stack and tag check faults are ignored; enable-tco means MTE tag checks are disabled via "msr tco, #1"):
The percentage indicates the performance overhead relative to running without MTE tag memory enabled.
Q1: What causes the increased performance overhead when enabling TCO on 531.deepsjeng_r and 526.blender_r? For instance, the overhead of stack-mte-ignore-tcf on 526.blender_r is 0%, yet the overhead of stack-mte-ignore-tcf-enable-tco is 14.20%. According to the Arm® Architecture Reference Manual for A-profile architecture, enabling TCO disables tag checks, so we would expect it to only reduce overhead.
Q2: The outcome on 510.parest_r contrasts with that of 531.deepsjeng_r/526.blender_r: enabling TCO effectively reduces the overhead on 510.parest_r. For instance, while the overhead of stack-mte-ignore-tcf on 510.parest_r is 6.82%, the overhead of stack-mte-ignore-tcf-enable-tco is 2.35%. So why does enabling TCO have divergent effects on different benchmarks, increasing overhead on 531.deepsjeng_r and 526.blender_r while reducing it on 510.parest_r?
The results on the heap are shown below (heap-mte-ignore-tcf means MTE tag memory is enabled for the heap and tag check faults are ignored; enable-tco means MTE tag checks are disabled via "msr tco, #1"):
When MTE tag memory is enabled for the heap, most benchmarks slow down noticeably. The worst case is 508.namd_r, with more than 30% performance overhead. On 511.povray_r, enabling TCO also slows down performance, while on the other benchmarks it effectively reduces the overhead, with the exception of 541.leela_r.
Q3: Is the performance overhead shown above considered reasonable? StickyTags highlights frequent memory tagging (via the STG instruction) as a significant performance bottleneck in current MTE-based solutions, yet our measurements show that merely enabling MTE tag memory (via mprotect(..., PROT_MTE...)) for the heap, with no tagging at all, already incurs a noticeable performance overhead.
Any help would be very much appreciated. Thank you in advance.