CFP RCTX instruction on Neoverse-N3

Hello,

I am running on a Neoverse-N3 (AArch64) CPU and, as part of an experiment, I am trying to invalidate or partition the Branch Prediction Unit (BPU) state between different pieces of code.

The code runs inside a Linux kernel module at EL1 (kernel mode), and all the tested code segments execute at the same exception level (EL1); there is no switching between exception levels.

My goal is to ensure that each piece of code runs in an independent branch prediction context, so that the execution of one code segment does not influence the branch prediction state of another.

I attempted to execute the CFP RCTX instruction from EL1, but based on my observations the BPU state does not appear to be invalidated between the code segments.
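For reference, here is a minimal sketch of how such an attempt might look, using the CFP RCTX instruction named in the title. It is not my exact code. The helper names (rctx_operand, has_specres, cfp_rctx) are hypothetical, and the operand field positions are my reading of the Arm Architecture Reference Manual, so they should be checked against the manual. As far as I understand, the instruction identifies a context by exception level, security state, VMID and ASID, which is why I am unsure whether code segments sharing all of these can be separated at all.

```c
#ifdef __KERNEL__
#include <linux/types.h>
#else
#include <stdint.h>
#endif

/* CFP RCTX operand fields (assumed layout, verify against the Arm ARM). */
#define RCTX_ASID(x)  ((uint64_t)(x) & 0xffffULL)        /* bits [15:0]  */
#define RCTX_GASID    (1ULL << 16)                        /* all ASIDs    */
#define RCTX_EL(x)    (((uint64_t)(x) & 0x3ULL) << 24)   /* bits [25:24] */

/* Build the Xt operand for a given target EL and ASID selection. */
static inline uint64_t rctx_operand(unsigned int el, int all_asids,
                                    uint16_t asid)
{
    uint64_t v = RCTX_EL(el);

    if (all_asids)
        v |= RCTX_GASID;            /* ASID field is not used in this case */
    else
        v |= RCTX_ASID(asid);
    return v;
}

/* FEAT_SPECRES is reported in ID_AA64ISAR1_EL1.SPECRES, bits [43:40]. */
static inline int has_specres(uint64_t id_aa64isar1)
{
    return ((id_aa64isar1 >> 40) & 0xfULL) != 0;
}

#if defined(__KERNEL__) && defined(__aarch64__)
/*
 * CFP RCTX, Xt is an alias of SYS #3, C7, C3, #4, Xt. The DSB makes the
 * restriction complete; the ISB synchronizes it with later instructions.
 */
static inline void cfp_rctx(uint64_t ctx)
{
    asm volatile("sys #3, c7, c3, #4, %0\n\t"
                 "dsb sy\n\t"
                 "isb"
                 : : "r"(ctx) : "memory");
}
#endif
```

In this sketch, a call such as cfp_rctx(rctx_operand(1, 1, 0)) would be issued between two code segments, targeting EL1 for all ASIDs.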

So I have a few questions:

1. Is there any formal or reliable test/benchmark that can be used to verify that the BPU was successfully invalidated?
2. Could you provide an explanation or example of how this instruction should be correctly executed from EL1?

My overall goal is to run multiple code snippets sequentially in independent prediction contexts, such that their branch entries within the BPU do not interfere with one another.

Any clarification or pointers would be greatly appreciated.

Thank you.