I'm looking for a tool to iterate faster on my ARM NEON optimizations entirely in software, i.e. without using any hardware or dev boards. I came across Arm Development Studio and its Fixed Virtual Platforms (FVPs). I am not very particular about cycle-count accuracy compared to real hardware. As long as I get consistent cycle counts across multiple runs of the simulation, that will be sufficient for me to optimize my code.
It would be good if I could select a Cortex-A series processor (say, the Cortex-A53 for now) and a memory model for the DRAM to go with it.
PS - there is also a PMU_AArch64 example provided with Arm Development Studio which you may find useful.
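For reference, here is a minimal sketch of timing a kernel with the AArch64 cycle counter (PMCCNTR_EL0) and checking run-to-run consistency. It is not taken from the PMU_AArch64 example. It assumes a bare-metal image running at EL1 or higher on the FVP, where the PMU registers are accessible; `BARE_METAL_EL1`, `sum_kernel`, `measure_kernel` and `struct bench_result` are hypothetical names. On other hosts it falls back to a monotonic clock in nanoseconds, which is not a cycle count.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical build flag: define BARE_METAL_EL1 for an FVP bare-metal image
 * running at EL1 or above, where PMU system registers can be written. */
#if defined(__aarch64__) && defined(BARE_METAL_EL1)
void cycle_counter_enable(void) {
    uint64_t pmcr;
    __asm__ volatile("mrs %0, pmcr_el0" : "=r"(pmcr));
    pmcr |= (1u << 0) | (1u << 2);           /* E: enable counters, C: reset cycle counter */
    __asm__ volatile("msr pmcr_el0, %0" :: "r"(pmcr));
    __asm__ volatile("msr pmccfiltr_el0, xzr"); /* clear filter so EL0/EL1 cycles are counted */
    __asm__ volatile("msr pmcntenset_el0, %0" :: "r"(1ull << 31)); /* enable PMCCNTR_EL0 */
    __asm__ volatile("isb");
}
uint64_t cycle_counter_read(void) {
    uint64_t v;
    __asm__ volatile("isb\n\tmrs %0, pmccntr_el0" : "=r"(v));
    return v;
}
#else
/* Fallback for hosts without direct PMU access: nanoseconds, NOT cycles. */
void cycle_counter_enable(void) {}
uint64_t cycle_counter_read(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/* Hypothetical kernel under test; replace with the NEON routine being tuned. */
uint32_t sum_kernel(const uint32_t *a, size_t n) {
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++) s += a[i];
    return s;
}

struct bench_result { uint64_t best, worst; uint32_t checksum; };

/* Runs the kernel `runs` times and records the fastest and slowest timing. */
struct bench_result measure_kernel(const uint32_t *a, size_t n, int runs) {
    assert(runs > 0);
    struct bench_result r = { UINT64_MAX, 0, 0 };
    cycle_counter_enable();
    for (int i = 0; i < runs; i++) {
        uint64_t t0 = cycle_counter_read();
        volatile uint32_t s = sum_kernel(a, n); /* volatile store keeps the call from being dropped */
        uint64_t t1 = cycle_counter_read();
        uint64_t d = t1 - t0;
        if (d < r.best) r.best = d;
        if (d > r.worst) r.worst = d;
        r.checksum = s;
    }
    return r;
}
```

Comparing `best` and `worst` gives a quick check of the run-to-run consistency asked about above. Since FVPs are not cycle-accurate models, these numbers are best treated as relative measures for comparing code variants rather than as predictions of real Cortex-A53 timing.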
Thanks Ronan. This is really useful.