Program execution time on an ARM Cortex-A9 processor

Note: This was originally posted on 4th January 2013 at http://forums.arm.com

  I'm using an ARM Cortex-A9 and trying to read the value of the CCNT cycle counter from assembly code. I am following this post http://stackoverflow.com/questions/3247373/how-to-measure-program-execution-time-in-arm-cortex-a8-processor?answertab=oldest#tab-top . According to it, before I can read the counter I have to enable it, enable the divide-by-64 divider, and clear the overflow flags. These operations are performed by writing to the appropriate registers (for instance, the PMCR, the Performance Monitor Control Register). I printed the counter values in a loop to see how the overflow occurs, and I get this behavior:
1     (starts incrementing after it was reset to zero)
4650
4858
4943
5023
...
...     (incrementing...)
...
4293939054
4293939128    (overflow happens)
1602570
1602703
1602788
...
...
4293522911
4293522987
4293523062
4293523137
1186243
1186367
1186453
1186536
1186612
1186686
...
4293536300
4293536377
4293536456
4293536533
4293536612
1199090
1199209
1199295
1199373
1199453
1199530
....
and so forth.

  Accordingly, I have a set of questions:


  a) Which of the registers mentioned above are used by the Linux kernel? (And how reliable is that information for future kernel versions?) How safe is it to change their values?

  b) What is the exact CCNT frequency, and how can I get it? Unfortunately, I can't find the value in the processor spec. However, dmesg says:
   [ 0.000000] OMAP clocksource: GPTIMER2 at 24000000 Hz
   [ 0.000000] sched_clock: 32 bits at 24MHz, resolution 41ns, wraps every 178956ms
   [ 0.132855] Switching to clocksource gp timer 
  But measuring it manually against clock_gettime gives me 7 MHz. Why is it not 24 MHz as expected?

  c) In my first output, why does the counter restart after the overflow not from zero but from about 1 million?

  d) Why do I get wrong results without the divide-by-64 divider? The values start to jump like this:

  ...
134110099
134114934
134119656
302352300
302361825
302367135
...
2885588930
2885593776
2885598630
3053958670
3053966752
3053972232
...
261130096
261134909
429343853
429351487
429356735

  I'd appreciate any help. Thanks
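
For readers following along, the setup steps described in the question (enable the cycle counter, select the divide-by-64 divider, clear the overflow flag) can be sketched in C roughly as below. This is a sketch, not the code used in the question: the function names (`pmcr_value`, `ccnt_ticks_to_cycles`, `pmu_start`, `read_ccnt`) are made up for illustration, the bit positions follow the ARMv7-A PMU register descriptions, and the inline assembly is only compiled for 32-bit ARM targets. It assumes the code runs in a privileged mode, or that the kernel has already enabled user-mode access through PMUSERENR.

```c
#include <assert.h>
#include <stdint.h>

/* PMCR bit positions (ARMv7-A PMU). */
#define PMCR_E (1u << 0) /* enable all counters */
#define PMCR_P (1u << 1) /* reset the event counters */
#define PMCR_C (1u << 2) /* reset the cycle counter */
#define PMCR_D (1u << 3) /* cycle counter ticks once every 64 cycles */

/* Bit 31 selects the cycle counter in PMCNTENSET and PMOVSR. */
#define PMU_CCNT_BIT (1u << 31)

/* Build the value to write to PMCR (hypothetical helper). */
static inline uint32_t pmcr_value(int reset, int div64)
{
    uint32_t v = PMCR_E;
    if (reset)
        v |= PMCR_P | PMCR_C;
    if (div64)
        v |= PMCR_D;
    return v;
}

/* Convert raw CCNT ticks to CPU cycles for the chosen divider (1 or 64). */
static inline uint64_t ccnt_ticks_to_cycles(uint32_t ticks, unsigned divider)
{
    assert(divider == 1 || divider == 64);
    return (uint64_t)ticks * divider;
}

#if defined(__arm__)
/* Assumption: privileged mode, or user access enabled via PMUSERENR. */
static inline void pmu_start(int reset, int div64)
{
    /* PMCR: control register */
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 0" :: "r"(pmcr_value(reset, div64)));
    /* PMCNTENSET: enable the cycle counter */
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 1" :: "r"(PMU_CCNT_BIT));
    /* PMOVSR: writing 1 clears the cycle counter overflow flag */
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 3" :: "r"(PMU_CCNT_BIT));
}

static inline uint32_t read_ccnt(void)
{
    uint32_t v;
    /* PMCCNTR: cycle count register */
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(v));
    return v;
}
#endif
```

With the divider enabled, each CCNT tick stands for 64 CPU cycles, so raw values need to be scaled before they are compared with cycle counts taken without the divider.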
  • Note: This was originally posted on 7th January 2013 at http://forums.arm.com

    b - It does sort of depend on what you care about. Cycles are often preferable to time, as they're frequency independent. For example, it takes a Cortex-A9 X cycles to execute this code fragment. Allowing for memory system effects, that'll be true for all A9-based parts. If what you need is time, then I'm not sure you have much option but to use a system call. All the timers are memory mapped, and the kernel shouldn't allow direct access by user space applications.
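
If wall-clock time is what matters, the system-call route could look roughly like the sketch below, which uses POSIX `clock_gettime` with `CLOCK_MONOTONIC`. The helper names (`timespec_diff_ns`, `elapsed_ns_of`, `estimate_tick_hz`) are illustrative and not from this thread. The last helper shows how a counter's tick rate could be estimated by dividing a tick delta by the elapsed time, which is one way to cross-check a counter frequency.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Nanoseconds from *start to *end. */
static inline int64_t timespec_diff_ns(const struct timespec *start,
                                       const struct timespec *end)
{
    return (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL
         + (end->tv_nsec - start->tv_nsec);
}

/* Time one call of fn with the monotonic clock; returns elapsed ns. */
static inline int64_t elapsed_ns_of(void (*fn)(void))
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return timespec_diff_ns(&t0, &t1);
}

/* Estimated tick rate in Hz from a tick delta observed over elapsed_ns. */
static inline double estimate_tick_hz(uint64_t ticks, int64_t elapsed_ns)
{
    assert(elapsed_ns > 0);
    return (double)ticks * 1e9 / (double)elapsed_ns;
}
```

The estimate is only as good as the measurement window: a longer interval reduces the relative cost of the two `clock_gettime` calls, but it also makes it more likely that the process is scheduled out in between.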

    c - I'm sorry, I don't follow. Could you post a code snippet?

    But note, what is important is the rate at which you sample the CCNT, not the total number of times you sample it. Think of it this way... The processor is probably running at something close to 1 GHz, which gives around 1,000,000,000 cycles (ticks) per second. So it takes 1/1000th of a second for CCNT to be incremented by 1,000,000. Now your app is sampling the CCNT and printing its value; how long does one iteration of that loop take? Add in the fact that your app isn't running all the time, as the kernel sometimes switches it out to run something else... It could simply be that you just miss the lower count values.
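
The arithmetic above can be sketched in C as follows. The helpers (`ccnt_delta`, `ticks_to_seconds`, `wrap_period_seconds`) are illustrative names, and the 1 GHz clock is the same rough assumption as in the reply, not a measured value. `ccnt_delta` relies on unsigned 32-bit subtraction, so it stays correct across a single wrap of the counter between two samples.

```c
#include <assert.h>
#include <stdint.h>

/* Ticks elapsed between two CCNT samples. Valid only if the counter
   wrapped at most once in between (unsigned arithmetic is modulo 2^32). */
static inline uint32_t ccnt_delta(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Seconds represented by `ticks`, for a divider of 1 or 64 and an
   assumed CPU clock in Hz. */
static inline double ticks_to_seconds(uint64_t ticks, unsigned divider, double cpu_hz)
{
    assert(cpu_hz > 0.0);
    return (double)ticks * divider / cpu_hz;
}

/* Time for the 32-bit counter to wrap once. */
static inline double wrap_period_seconds(unsigned divider, double cpu_hz)
{
    return ticks_to_seconds(UINT64_C(1) << 32, divider, cpu_hz);
}
```

At an assumed 1 GHz, 1,000,000 ticks without the divider correspond to 1 ms, and the counter wraps after about 4.3 s without the divider or about 275 s with it. For the two samples around the first wrap in the question's output, `ccnt_delta(4293939128u, 1602570u)` gives 2,630,738 ticks, assuming exactly one wrap happened between them.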