Good morning all,
I am profiling application code and am currently dealing with a timing-related issue.
1. I would like to understand how the pipeline works with the vfma instruction. The official documentation says vfma takes 3 cycles. For the following piece of ASM code, I measured 9 cycles of execution with my performance counter.
8000730: vfma.f32 s11, s7, s15
8000734: vfma.f32 s12, s15, s8
8000738: vfma.f32 s13, s15, s9
800073c: vfma.f32 s14, s15, s10
8000740: bne.n 800071a <dualClass_svmPredict_unroll4+0x6a>
I think that, due to pipelining, the timings are as follows:
vfma.f32 s11, s7, s15    3 cycles
vfma.f32 s12, s15, s8    2 cycles
vfma.f32 s13, s15, s9    2 cycles
vfma.f32 s14, s15, s10   2 cycles
Am I right?
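As a quick sanity check on the arithmetic, here is a minimal sketch that sums my guessed per-instruction costs and compares the total with the measured value. The per-instruction numbers are only my hypothesis from the breakdown above, not documented figures, and I am assuming the bne.n is not part of the 9 measured cycles.

```python
# Hypothesized cost of each vfma (my guess from the breakdown above,
# not values taken from the documentation).
hypothesized_cycles = [
    ("vfma.f32 s11, s7, s15", 3),
    ("vfma.f32 s12, s15, s8", 2),
    ("vfma.f32 s13, s15, s9", 2),
    ("vfma.f32 s14, s15, s10", 2),
]

# Value read from the performance counter for the four vfma instructions
# (assumption: the trailing bne.n is not included in this count).
measured_cycles = 9

total = sum(cycles for _, cycles in hypothesized_cycles)
print(f"hypothesized total: {total}, measured: {measured_cycles}")
print("consistent" if total == measured_cycles else "inconsistent")
```

The totals agree, but that only shows the breakdown is consistent with the measurement; other splits of the 9 cycles would match just as well.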
2. If the code is the following:
800073c: vfma.f32 s14, s15, s10
8000740: bne.n 800071a <dualClass_svmPredict_unroll4+0x6a>
Does the vfma then take 3 cycles to execute?
3. Why does vfma take 2 cycles, and not 1, when it is pipelined?
Thanks,