
Pandaboard - ARM Cortex A9 - cache test

Note: This was originally posted on 10th January 2013 at http://forums.arm.com

Hi there, I ran a cache test on the Pandaboard (cross-compiled ARMv7-A Linux kernel 3.6.2-rt4), but it seems not to use the L1/L2 cache. My test program reads an array of chars many times, varying the array dimension: first array dim = 512, second array dim = 1024, and so on, doubling up to dim = 16777216.
The Cortex-A9 L1 cache should be 32 KB and the L2 cache up to 8 MB, so I expected slower read times at array dim = 16777216, since 16 MB is larger than 8 MB (the L2 cache size). Instead the nanoseconds-per-byte time is always the same, whether the array dim is 512 or 16777216. So I think either the cache is not working properly or all data are being fetched from RAM rather than from the cache. I don't know if this behaviour depends on the ARM VIPT d-cache. I attach the cache_test file; its output is shown below. I compiled and executed it this way:

#gcc -lrt cache_velocitest_prova-clockgettime_SUPERMOD.c -o cache_velocitest_prova-clockgettime_SUPERMOD.out

#./cache_velocitest_prova-clockgettime_SUPERMOD.out
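
Roughly, the timing loop looks like this (a simplified sketch rather than the exact attached file; the total read count and the wrapping sequential-read pattern here are placeholders):

/* Simplified sketch of the cache test -- not the attached file. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define TOTAL_READS (1UL << 30)  /* assumed: same number of reads for every size */

int main(void)
{
    struct timespec start, end;

    for (size_t size = 512; size <= 16777216UL; size *= 2) {
        volatile char *buf = malloc(size);
        if (buf == NULL)
            return 1;
        for (size_t i = 0; i < size; i++)      /* touch every byte once so pages are mapped */
            buf[i] = (char)i;

        clock_gettime(CLOCK_MONOTONIC, &start);
        unsigned long sum = 0;
        for (unsigned long i = 0; i < TOTAL_READS; i++)
            sum += buf[i % size];              /* sequential reads, wrapping around the array */
        clock_gettime(CLOCK_MONOTONIC, &end);

        long long ns = (long long)(end.tv_sec - start.tv_sec) * 1000000000LL
                     + (end.tv_nsec - start.tv_nsec);
        printf("Array of %8zu bytes: %lld nanoseconds (sum=%lu)\n",
               size, ns, sum);
        free((void *)buf);
    }
    return 0;
}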

Array of      512 bytes: 1380700484 nanoseconds
Array of     1024 bytes: 1359279485 nanoseconds
Array of     2048 bytes: 1348870161 nanoseconds
Array of     4096 bytes: 1343367477 nanoseconds
Array of     8192 bytes: 1341046176 nanoseconds
Array of    16384 bytes: 1339836722 nanoseconds
Array of    32768 bytes: 1338830707 nanoseconds
Array of    65536 bytes: 1338569092 nanoseconds
Array of   131072 bytes: 1338442816 nanoseconds
Array of   262144 bytes: 1338121853 nanoseconds
Array of   524288 bytes: 1338144144 nanoseconds
Array of  1048576 bytes: 1338410602 nanoseconds
Array of  2097152 bytes: 1338235811 nanoseconds
Array of  4194304 bytes: 1338208154 nanoseconds
Array of  8388608 bytes: 1338324248 nanoseconds
Array of 16777216 bytes: 1338262426 nanoseconds
  
Any idea why the nanoseconds-per-byte figure stays the same even when the data are bigger than the L2 cache?
Reply
  • Note: This was originally posted on 11th January 2013 at http://forums.arm.com

    The processor supports up to 8MB L2 cache, but your SoC (OMAP4430) only has 1MB of L2 cache. This is pretty normal for Cortex-A9 SoCs.

    Before looking further into it, it'd be good if you could confirm that your timing really works right. There's no good reason to use either long doubles or tv_nsec on its own, and with the former plus printf you're really asking for trouble from bad compiler implementations for the platform. It would be good to confirm that the start and end values printed actually make sense. If I were you I'd just calculate the number of microseconds elapsed as an integer.
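
    Something like this is what I mean: take the two clock_gettime() samples, combine tv_sec and tv_nsec, and keep the result as an integer number of microseconds (just a sketch; put your own read loop between the two calls):

    /* Sketch: elapsed time in integer microseconds -- no long doubles, no tv_nsec alone. */
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        struct timespec start, end;

        clock_gettime(CLOCK_MONOTONIC, &start);
        /* ... your array-read loop goes here ... */
        clock_gettime(CLOCK_MONOTONIC, &end);

        /* tv_nsec on its own wraps every second; it has to be combined with tv_sec. */
        long long elapsed_us = (long long)(end.tv_sec - start.tv_sec) * 1000000LL
                             + (end.tv_nsec - start.tv_nsec) / 1000;

        printf("start: %ld.%09ld  end: %ld.%09ld  elapsed: %lld us\n",
               (long)start.tv_sec, start.tv_nsec,
               (long)end.tv_sec,   end.tv_nsec, elapsed_us);
        return 0;
    }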

    If you need to be sure, try running a loop with a very long iteration count so you can compare it with the time elapsed on a watch.

    The whole thing does look suspiciously like some kind of data representation problem. The bottom digits of the number still change in a way that seems to follow the cache configuration.