I'm running a simple bare-metal application on the Armv8-R AEM FVP (FVP_BaseR_AEMv8R 11.20.15). For the exact same code on the model's Armv8-A counterpart (FVP_Base_RevC-2xAEMvA), I noticed that setting cache_state_modelled=0 considerably speeds up emulation. However, when I apply the same option to FVP_BaseR_AEMv8R, the application stops working. I would suspect the app's code if it were the other way around (i.e., if it stopped working when cache modelling was enabled), but I can't see how disabling it would affect the application. A bit of debugging led me to conclude that the exclusive ld/st instructions stopped working. Section "2.4.7 Global exclusive monitor in Fast Models" in the Fast Models Reference Guide 11.20 seems to imply that when cache_state_modelled=0, there is a backup implementation of the exclusive monitor that does not rely on the coherence protocol and therefore does not depend on cache state modelling. This seems to be true for FVP_Base_RevC-2xAEMvA but not for FVP_BaseR_AEMv8R 11.20.15. Any idea what might be going wrong?
Hi josecm,
Armv8-A and Armv8-R are different architectures, so the exact same software image (binary) cannot be run on both. Setting cache_state_modelled=0 can speed up the simulation because cache state is no longer modelled, which removes the associated overhead.
What kind of issue do you see with cache_state_modelled=0 on FVP_BaseR_AEMv8R? Can you please explain a little more about "the exclusive ld/st instructions stopped working"? I suspect that cache_state_modelled is a red herring. Enabling or disabling it changes the actual sequence of instruction execution, so it might be hiding an issue with the exclusive LD/ST instructions in the software.
If you still need any assistance, can you please create a support case through https://developer.arm.com/All%20Support%20Services ?
Well, you are almost right in that respect: the binaries are not *exactly* the same. However, contrary to what you are saying, one can run the same binary on Armv8-A and Armv8-R, as the base instruction set is the same; essentially, only the memory protection scheme differs. The single difference between the binaries I am using is that one enables the MMU (page tables are generated at compile time) while the other sets up the MPU, and this is selected through conditional compilation. So only a handful of instructions differ from binary to binary and, by the way, they execute well before the lock in question stalls. The ld/st exclusive instructions I'm mentioning are part of the spinlock code that *IS* common to both images, and I'm using the exact same compiler, so I fail to see how it could be a software issue as you suggest.
Also, there's the fact that the same Armv8-R binary works fine if cache_state_modelled=1.
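To make the discussion concrete, here is a minimal sketch of the kind of spinlock being described. It is not the actual code from the application; the names `demo_spinlock_t`, `demo_spin_lock`, `demo_spin_trylock` and `demo_spin_unlock` are made up for illustration. It assumes a C11 compiler, and that the target is built without LSE atomics, in which case compilers typically lower the test-and-set to a load-exclusive/store-exclusive retry loop. If the store-exclusive never succeeds, the acquire loop spins forever, which would look like a stalled lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock type, for illustration only. */
typedef struct {
    atomic_flag locked;
} demo_spinlock_t;

#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once to take the lock; returns true if it was acquired. */
static inline bool demo_spin_trylock(demo_spinlock_t *l)
{
    /* test_and_set returns the previous value: false means the lock was free. */
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Spin until the lock is acquired. */
static inline void demo_spin_lock(demo_spinlock_t *l)
{
    /* Assumption: on AArch64 without LSE this typically becomes an
     * exclusive load/store loop, so it depends on the exclusive monitor. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire)) {
        /* busy-wait */
    }
}

/* Release the lock. */
static inline void demo_spin_unlock(demo_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

A hand-written assembly spinlock would follow the same pattern. The point is only that the acquire path cannot complete unless the model's exclusive monitor lets the store-exclusive succeed.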