I've been experimenting with MTE (Memory Tagging Extension) and BTI (Branch Target Identification), and I was wondering whether there is any way to benchmark the performance of applications that use the instructions these extensions introduce. I see that the FVP supports up to Armv8.5-A, but as far as I know the FVP isn't cycle-accurate. If I can't emulate these instructions with accurate timing, does anyone know of a way to get a rough estimate of how many cycles each instruction might take? Similar work has been done for the pointer authentication instructions, where cycle counts for the PAC* and AUT* instructions were estimated based on the structure of the QARMA block cipher.