Is the FVP accurate in terms of measuring performance of programs? Is it cycle accurate? If I use clock_gettime to measure time taken on applications, is it meaningful? If not, is there an accurate way to measure performance of programs on the FVP?
Hi Mohannad,

FVPs, and Fast Models in general, are functionally accurate, meaning that they execute all instructions correctly. However, they are not cycle accurate; a separate technology, Cycle Models, is available for that use case.

As the name implies, Fast Models (the technology from which the FVPs are built) are designed to execute code quickly, typically on the order of 100M instructions/sec, whereas Cycle Models run in the 10k-100k range.

Some high-level timing annotation can be applied to the FVP to change cache and memory access characteristics, etc. Use <fvp_executable> --list-params to see all the available options, then edit as appropriate. The effect of these can be seen in the --stat output. I tend to use this as a relative comparison rather than an absolute measure. Some further annotation (pipeline models, etc.) can be applied with the full Fast Models tool. Note that these annotations will impact the performance of the model.

For more information, see https://developer.arm.com/docs/100965/1110/timing-annotation
Hello Ronan,
Thank you for your reply.
That's interesting! Are there any Cycle Models that support ARMv8.3 and above? I took a look but could only find for the already released Cortex processors. I'll take a look at the timing annotation link and experiment with it a bit more on the FVP. I'll let you know if I have any more questions. As always, thank you very much for your help and support!
Mohannad Ismail
Sorry, I should have been clearer on that. As Cycle Models are derived from the actual RTL design, they only become available once the CPUs are released. I was just making the general point that these are the models to use for true cycle accuracy.
Hi Mohannad,
Adding some more detail to Ronan's answer on Fast Models Timing Annotation.
Each of the Fast Models CPU models has the parameters "cpi_mul" and "cpi_div". By default, Fast Models execute 1 instruction per clock tick (i.e., CPI = 1). These parameters can be used to modify that. For example, to get a CPI of 1.25, you would set cpi_mul=5, cpi_div=4. (Fast Models do not support real numbers as parameters, so to create a fraction you need the two integer parameters.)
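As a rough illustration of the cpi_mul/cpi_div arithmetic above, here is a minimal Python sketch that turns a target CPI into an integer pair and estimates the resulting tick count. The function names and the max_denominator limit are my own choices for illustration, not part of Fast Models, and the tick estimate only reflects the simple "CPI = cpi_mul / cpi_div" reading of the parameters.

```python
from fractions import Fraction


def cpi_to_params(cpi, max_denominator=1000):
    """Return (cpi_mul, cpi_div) as integers approximating the target CPI.

    Helper name and max_denominator are illustrative assumptions.
    """
    # Going through str() avoids binary floating-point noise for inputs like 1.25.
    frac = Fraction(str(cpi)).limit_denominator(max_denominator)
    if frac <= 0:
        raise ValueError("CPI must be positive")
    return frac.numerator, frac.denominator


def estimated_ticks(instructions, cpi_mul, cpi_div):
    """Estimate clock ticks for an instruction count, assuming CPI = cpi_mul / cpi_div."""
    return instructions * cpi_mul // cpi_div


mul, div = cpi_to_params(1.25)
print(mul, div)                             # 5 4
print(estimated_ticks(1_000_000, mul, div)) # 1250000
```

With the default CPI of 1 the pair is simply (1, 1), so the tick count equals the instruction count.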
Caches (and TLB page tables, etc.) in Fast Model CPUs have a set of parameters that define the estimated latency caused by accesses. By default, cache modelling is switched off in the Fast Model, as it does not affect software functionality. For the latency to be applied to accesses, cache modelling must be switched on by parameter. This can be done at the start of simulation or at a pre-defined clock count in the simulation.
Delays on downstream memory accesses are not directly supported on the FVP. These rely on annotating the delay on SystemC/TLM b_transport transactions between the CPU and downstream models. As the FVP does not use these, there is no way of inserting the delay; you would need to use Fast Models and build the platform from source. Alternatively, you could use the delay annotation on cache miss operations for the outermost cache (the L2 in the Base FVP) to include an estimated delay for a downstream memory access. Note: although memory is the most common use of TA, it can also be applied to peripherals, interconnects, or other components, as long as they use b_transport transactions between the CPU model and the component.
In general, using the available Timing Annotations requires that TA is enabled in the Fast Model. It is off by default. To enable it, set the environment variable "FASTSIM_DISABLE_TA" to 0 before starting the model.
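To make the steps above concrete, here is a minimal Python sketch that prepares an FVP launch with TA enabled. It only builds the command line and environment. The executable name and the parameter paths ("cluster0.cpu0.cpi_mul" and so on) are assumptions for illustration; check the real names with --list-params on your model. It also assumes parameters are passed with -C name=value.

```python
import os


def build_fvp_launch(executable, params, base_env=None):
    """Return (cmd, env) for starting an FVP with Timing Annotation enabled.

    Hypothetical helper: nothing is started here, it only assembles arguments.
    """
    env = dict(os.environ if base_env is None else base_env)
    # TA is off by default; setting this variable to 0 enables it.
    env["FASTSIM_DISABLE_TA"] = "0"

    cmd = [executable]
    for name, value in params.items():
        # Assumption: model parameters are set with "-C name=value".
        cmd += ["-C", f"{name}={value}"]
    cmd.append("--stat")  # print statistics when the model exits
    return cmd, env


cmd, env = build_fvp_launch(
    "FVP_Base_RevC-2xAEMvA",          # example executable name (assumption)
    {
        "cluster0.cpu0.cpi_mul": 5,   # parameter paths are assumptions
        "cluster0.cpu0.cpi_div": 4,
    },
)
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, env=env, check=True)
```

Keeping the launch as data makes it easy to run the same workload with two parameter sets and compare the --stat output relative to each other.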
Hello Rob,
Thank you very much for the detailed explanation. This will help! I will experiment with this further and will come back to ask if I have any questions.