
Slow performance on Samsung S3C6410

Note: This was originally posted on 18th January 2011 at http://forums.arm.com

Hi,

I am a software developer and I am trying to port our product to a new device: a Windows CE 6 device with an S3C6410 (ARM1176JZF-S) CPU. The problem is that Q-Bench benchmarks show it to be a very fast system, but our application actually runs very slowly on it.

I have spent a lot of time profiling various parts of our product, but the profiling reveals nothing. What I finally found is that the problem is the sheer amount of code: our .exe is ~10MB in size. I made tests in which I auto-generated a huge amount of code (~200,000 lines of C++, compiled with VS2005), and executing the resulting exe (~1.5MB) on this device shows a significant slowdown, 8 to 10 times compared with other devices (with slower CPUs). This auto-generated code does nothing with data; it just executes lots of functions that increment some variables.
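The post does not show the generator that was used, so the following is only a hypothetical sketch of that kind of test: it emits C++ source with many tiny functions (the names `func_N`, `g_counters` and `run_all` are invented here), each bumping its own counter, plus a driver that calls them all. Once the total code exceeds the 16 KiB instruction cache, each pass through `run_all()` is expected to refetch most of its instructions from memory.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// HYPOTHETICAL generator (not the original poster's tool).
// Emits a C++ translation unit containing `numFunctions` tiny functions, each
// incrementing its own element of a global counter array, plus a driver
// `run_all()` that calls every one of them in order.
std::string generateCacheStressSource(int numFunctions) {
    assert(numFunctions > 0);
    std::ostringstream out;
    // volatile stops the compiler from optimizing the increments away.
    out << "volatile unsigned g_counters[" << numFunctions << "];\n\n";
    for (int i = 0; i < numFunctions; ++i) {
        // Each function adds its own instructions to the final binary.
        out << "void func_" << i << "() { g_counters[" << i << "] = g_counters["
            << i << "] + 1; }\n";
    }
    out << "\nvoid run_all() {\n";
    for (int i = 0; i < numFunctions; ++i) {
        out << "    func_" << i << "();\n";
    }
    out << "}\n";
    return out.str();
}
```

Writing the returned string to a `.cpp` file and building it with the target compiler gives a test program; timing repeated calls to `run_all()` with a large count, and again with a count small enough to fit comfortably in the cache, separates instruction-fetch cost from raw CPU speed.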

My question is: what is the source of the problem? From what I know, this CPU has a 16 KiB instruction cache. Could it somehow be badly configured? I have no contact with the device manufacturer; I can only give some hints to its reseller, who may pass the information on.

Some more info:
Q-Bench Pro shows Cache Line == 8, while on other devices it is 32.
CeGetCacheInfo gives the results below:
dwL1Flags=0
dwL1ICacheSize=16384
dwL1ICacheLineSize=32
dwL1ICacheNumWays=4
dwL1DCacheSize=16384
dwL1DCacheLineSize=32
dwL1DCacheNumWays=4
dwL2Flags=0
dwL2ICacheSize=0
dwL2ICacheLineSize=0
dwL2ICacheNumWays=0
dwL2DCacheSize=0
dwL2DCacheLineSize=0
dwL2DCacheNumWays=0
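The CeGetCacheInfo figures are self-consistent, and a little arithmetic shows the cache geometry they imply. The sketch below uses a hypothetical struct that only mirrors the meaning of the `dwL1ICache*` fields; it is not the Windows CE structure. It derives 128 sets of 4 ways × 32 bytes, i.e. 512 lines. The ARM1176JZF-S L1 caches use 8-word (32-byte) lines, so one possible, unconfirmed reading of Q-Bench's "8" is that it reports the line length in 32-bit words rather than bytes. For scale, the ~1.5MB test exe is roughly 96 times the 16 KiB instruction cache (1.5 × 1024 KiB / 16 KiB), before subtracting any non-code sections.

```cpp
#include <cassert>

// Hypothetical stand-in for the L1 values printed above; it only borrows the
// meaning of the dwL1ICache* fields and is not the Windows CE structure itself.
struct L1CacheGeometry {
    unsigned sizeBytes;  // dwL1ICacheSize
    unsigned lineBytes;  // dwL1ICacheLineSize
    unsigned numWays;    // dwL1ICacheNumWays
};

// Sets = size / (line size * ways).
unsigned numSets(const L1CacheGeometry& c) {
    assert(c.lineBytes > 0 && c.numWays > 0);
    return c.sizeBytes / (c.lineBytes * c.numWays);
}

// Total lines the cache can hold = size / line size.
unsigned numLines(const L1CacheGeometry& c) {
    assert(c.lineBytes > 0);
    return c.sizeBytes / c.lineBytes;
}

// Line length in 32-bit words: 8 for a 32-byte line.
unsigned lineWords(const L1CacheGeometry& c) {
    return c.lineBytes / 4;
}
```

With the reported values (`{16384, 32, 4}`), this gives 128 sets, 512 lines and an 8-word line.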

Thank you for any help,
Martin
  • Note: This was originally posted on 21st January 2011 at http://forums.arm.com

    Do you have any quantitative benchmarks for your application which rate it on these three platforms?

    There are two numbers in your QBench results that are of interest. I don't know the internals of the benchmark or your application, so these are educated guesses.

    The first two devices have similar CPU-to-memory performance ratios, which hopefully means that performance scales with frequency across those two platforms (if you have a faster CPU, you need the memory system to speed up too). For the S3C6410, the CPU number is almost as high as the at550's, but the memory rating is only just over half of the at550's score. If your application is essentially memory bound because it isn't caching well, you should expect just over half the performance: you simply do not get the advantage of the faster CPU because memory is your bottleneck.
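The scaling argument above can be written as a simple two-part timing model. This is an assumption for illustration, not something QBench computes. A fraction `memFraction` of the run time on the reference platform is spent waiting on memory. That part scales with the memory-rating ratio, and the rest scales with the CPU-rating ratio. The ratios 0.95 and 0.55 used below are placeholders standing in for "almost as high" and "just over half"; they are not the actual benchmark figures. The model also ignores any overlap between computation and memory access.

```cpp
#include <cassert>

// Two-part model (an assumption): relative run time on the second platform is
// (CPU-bound share / CPU ratio) + (memory-bound share / memory ratio), with the
// reference platform's run time normalized to 1.0.
// Returns performance relative to the reference (1.0 = same speed).
double relativePerformance(double cpuRatio, double memRatio, double memFraction) {
    assert(cpuRatio > 0.0 && memRatio > 0.0);
    assert(memFraction >= 0.0 && memFraction <= 1.0);
    double relativeTime = (1.0 - memFraction) / cpuRatio + memFraction / memRatio;
    return 1.0 / relativeTime;
}
```

A fully memory-bound workload (`memFraction = 1.0`) runs at the memory ratio, about 0.55 here, which matches the "just over half" estimate. Under this model the slowdown can never exceed `1 / memRatio` (about 1.8 times for the placeholder value), so a much larger gap would suggest a factor that the two ratings do not capture.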

    Finally, the file I/O performance of the S3C6410 integration is dire compared to the first two platforms. If your benchmark has to spend time loading code or data from files, that is obviously not helping; again, the CPU will just sit idle if it cannot get data fast enough. If you do use file I/O, you may want to try removing it from your application (or loading from a RAM drive if the data fits) to rule it out as a possible cause.
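One way to separate I/O cost from processing cost is to read everything into memory up front and time only that read. This is a minimal sketch in standard C++11 (`std::chrono`, `std::ifstream`). VS2005 predates C++11, so on the Windows CE device itself the timing would need a platform call such as `GetTickCount` instead. The `LoadResult` and `loadFileTimed` names are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Contents of a file plus the wall-clock time it took to read them.
struct LoadResult {
    std::vector<char> data;
    double seconds;
};

// Reads an entire file into memory and times only the read, so that file I/O
// cost can be measured apart from whatever processing runs afterwards on the
// in-memory copy.
LoadResult loadFileTimed(const std::string& path) {
    assert(!path.empty());
    const auto start = std::chrono::steady_clock::now();
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("could not open " + path);
    }
    std::vector<char> data((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());
    const auto stop = std::chrono::steady_clock::now();
    LoadResult result;
    result.data = std::move(data);
    result.seconds = std::chrono::duration<double>(stop - start).count();
    return result;
}
```

Comparing `seconds` with the time spent processing the in-memory buffer shows which part dominates on each platform. Running the same test with its files on a RAM drive is the complementary check.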

    Iso