Is the FVP accurate in terms of measuring performance of programs? Is it cycle accurate? If I use clock_gettime to measure time taken on applications, is it meaningful? If not, is there an accurate way to measure performance of programs on the FVP?
Hello Ronan,
Thank you for your reply.
That's interesting! Are there any Cycle Models that support ARMv8.3 and above? I took a look but could only find them for the already released Cortex processors. I'll take a look at the timing annotation link and experiment with it a bit more on the FVP. I'll let you know if I have any more questions. As always, thank you very much for your help and support!
Mohannad Ismail
Sorry, I should have been more clear on that. As Cycle Models are derived from the actual RTL design, these only become available once the CPUs are released. I was just making the point in general that these are the models for true cycle accuracy.
Hi Mohannad,
Adding some more detail to Ronan's answer on Fast Models Timing Annotation.
Each of the Fast Models CPU models has the parameters "cpi_mul" and "cpi_div". By default, Fast Models execute 1 instruction per clock tick (i.e., CPI = 1). These parameters can be used to modify that, e.g. to get a CPI of 1.25, you would set cpi_mul=5, cpi_div=4. (Fast Models don't support real numbers as parameters, so to create a fraction you need the two integer parameters.)
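To illustrate the arithmetic, here is a minimal Python sketch that reduces a target CPI to the integer pair. Only the parameter names cpi_mul and cpi_div come from the explanation above; the helper name cpi_to_params and the max_div cap are hypothetical, chosen just for this example.

```python
from fractions import Fraction


def cpi_to_params(cpi, max_div=1000):
    """Return (cpi_mul, cpi_div) such that cpi_mul / cpi_div approximates cpi.

    cpi may be a string, int, float or Fraction. It goes through str() first
    so that a value like 1.25 is read as the decimal "1.25" rather than as
    its binary floating-point expansion.
    max_div is an arbitrary cap on the divisor (an assumption of this sketch).
    """
    frac = Fraction(str(cpi)).limit_denominator(max_div)
    if frac <= 0:
        raise ValueError("CPI must be positive")
    # Fraction is always stored in lowest terms, so these are the
    # smallest integers that express the ratio.
    return frac.numerator, frac.denominator


if __name__ == "__main__":
    mul, div = cpi_to_params("1.25")
    print(f"cpi_mul={mul} cpi_div={div}")
```

For a CPI of 1.25 this gives the same 5 and 4 as in the example above. How the two values are then passed to the model (for instance, which component instance they are set on) depends on your platform and is not shown here.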
Caches (and TLB Page Tables, etc.) in Fast Model CPUs have a set of parameters that define the estimated latency caused by accesses. By default, cache modelling is switched off in the Fast Model, as it doesn't affect software functionality. For the latency to be applied to accesses, cache modelling must be switched on by parameter. This can be done at the start of simulation or at a pre-defined clock count in the simulation.
Delays on downstream memory accesses are not directly supported on the FVP. They rely on annotating a delay on the SystemC/TLM b_transport transactions between the CPU and downstream models. As the FVP does not use these transactions, there is no way of inserting the delay; you would need to use Fast Models and build the platform from source. On the FVP, you could instead use the delay annotation on cache miss operations for the outermost cache (the L2 in the Base FVP), set so that it includes an estimated delay for a downstream memory access. Note: although memory is the most usual target for TA, it could also be applied to peripherals, interconnects, or other components, as long as b_transport is used between the CPU model and the component.
In general, using any of the available Timing Annotations requires TA to be enabled in the Fast Model; it is off by default. To enable it, set the environment variable "FASTSIM_DISABLE_TA" to 0 before starting the model.
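As a rough sketch of that last step, a small Python launcher could set the variable only for the model process. The variable name FASTSIM_DISABLE_TA and the value 0 are taken from the paragraph above; the function names and the model command line in the usage part are placeholders, so substitute your own FVP executable and options.

```python
import os
import subprocess


def ta_environment(base_env=None):
    """Return a copy of the environment with Timing Annotation enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # Environment values are strings; "0" means "do not disable TA".
    env["FASTSIM_DISABLE_TA"] = "0"
    return env


def launch_model(command):
    """Run the model with TA enabled and return its exit code.

    command is the full argument list for the model executable.
    """
    return subprocess.run(command, env=ta_environment()).returncode


if __name__ == "__main__":
    # Placeholder command: replace with your actual model and arguments.
    model_command = ["./my_fvp_model", "--help"]
    try:
        print("model exited with", launch_model(model_command))
    except FileNotFoundError:
        print("placeholder model not found; edit model_command first")
```

Setting the variable in a copied environment, rather than exporting it globally in the shell, keeps the setting scoped to the one simulation run.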
Hello Rob,
Thank you very much for the detailed explanation. This will help! I will experiment with this further and will come back to ask if I have any questions.
Oh I see. Thanks for the clarification!