How can I emulate an ARM Cortex-M processor cycle-accurately?

Can I emulate an ARM Cortex-M processor cycle-accurately, ideally as part of a complete microcontroller (i.e., also emulating its Flash and RAM)? If so, how?
For example, I want to be able to measure how many CPU cycles it takes to execute a given part of a program, but without using real hardware.
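For reference, this is roughly how such a measurement is usually taken on real hardware, i.e. the number an emulator would have to reproduce. It is a minimal sketch, assuming an ARMv7-M or ARMv8-M Mainline core that implements the DWT cycle counter (Cortex-M0/M0+ do not have one). The helper names `cycles_between`, `cycle_counter_enable` and `measure_cycles` are hypothetical. The register addresses and bits (DEMCR.TRCENA, DWT_CTRL.CYCCNTENA, DWT_CYCCNT) are the architectural ones. Some devices (e.g. certain Cortex-M7 parts) may additionally require unlocking the DWT before it counts.

```c
#include <stdint.h>

/* Elapsed cycles between two reads of a free-running 32-bit counter.
 * Unsigned subtraction is modulo 2^32, so a single wrap-around is handled;
 * regions longer than 2^32 cycles are not. */
static inline uint32_t cycles_between(uint32_t start, uint32_t end)
{
    return end - start;
}

#if defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__) || defined(__ARM_ARCH_8M_MAIN__)
/* Core debug and DWT registers (only present on ARMv7-M / ARMv8-M Mainline,
 * and CYCCNT itself is optional even there). */
#define DEMCR      (*(volatile uint32_t *)0xE000EDFCu)
#define DWT_CTRL   (*(volatile uint32_t *)0xE0001000u)
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004u)

#define DEMCR_TRCENA       (1u << 24)
#define DWT_CTRL_CYCCNTENA (1u << 0)

/* Hypothetical helper: power up the DWT and start the cycle counter.
 * Some devices may also need the DWT lock access register unlocked first. */
static void cycle_counter_enable(void)
{
    DEMCR |= DEMCR_TRCENA;          /* enable trace/debug blocks incl. DWT */
    DWT_CYCCNT = 0;                 /* reset the counter */
    DWT_CTRL |= DWT_CTRL_CYCCNTENA; /* count core clock cycles */
}

/* Hypothetical helper: cycles spent in fn(), including the call overhead
 * and the counter reads themselves. */
static uint32_t measure_cycles(void (*fn)(void))
{
    uint32_t start = DWT_CYCCNT;
    fn();
    uint32_t end = DWT_CYCCNT;
    return cycles_between(start, end);
}
#endif
```

Note that on real silicon this count already includes effects such as Flash wait states, which is why an emulator that models only the core (and not the memory system) would not match it.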
It seems that Arm Cycle Models used to serve this purpose, but they are now end-of-life (community.arm.com/.../ip-exchange-and-cycle-models-end-of-life-update). Arm Virtual Hardware and Fixed Virtual Platforms, on the other hand, are based on Fast Models (aren't they?), which seem to be only functionally accurate, so they won't give me the correct number of elapsed cycles.