RVDS profiler for arm and exceptions, broken?

Note: This was originally posted on 24th March 2011 at http://forums.arm.com

Hello all,
I was advised by a co-worker to try ARM's RVDS for profiling my code. I followed the tutorial to load the profiler sample project for xvid that comes with RVDS. Then I created my own project that uses my code, and I compile and run it to see which functions need optimization on an ARM CPU.
However, I have the impression that this entire thing is very buggy and doesn't work properly. The moment I try to enable optimizations (-O3 -Otime) it doesn't work at all. This code works properly on a real device, so I know it works, but in the RVDS emulator it does not.
In short, when I compile with -O3, for example, and then run the profiler, it works properly.
Here's the output I get:
Attaching the profiler trace to component RTSM_EB_Cortex_A8.coretile.core.
Simulation is started
Loading symbols from amr-wb-fixed-test.axf...
Finished loading symbols.
Transferring target image
Enabled streaming trace.
ARM Profiler WARNING: Call chain max depth (511) exceeded. Call chain will be incorrect.
200 frames processed, time: 4.930
...

It runs fine, but at some point the green instruction graph drops, the exception graph goes up, and the progress bar starts to increase fast in orange.
This is the picture from when the instruction graph drops and the exception graph goes up: [removed].

That same build runs without any issues in the debugger, but in the profiler it has this problem.
Overall, it looks quite broken. I use a limited amount of the standard C library: malloc, free, string.h functions and the mem* functions. All of them resulted in exactly the same problem with the exception counter, so eventually I had to provide my own implementations of memcmp, memcpy, etc.; otherwise I wasn't able to use the profiler at all! I have hand-coded NEON code and I wanted to profile it to see what kind of improvement it gives me, but simply including that asm file in the build causes the same problem with the exception graph: it takes far too much time to run anything, and the profiler shows nonexistent functions at the end. The asm file was written for gas; I changed it to ARM's syntax, but it didn't help. The problem happens even if I do not call the asm-optimized function: simply including the object file in the linker's list produces a broken build.

Can anybody explain what I am doing wrong, or is armcc really this broken and unusable? That's the impression I got.


PS> I downloaded the latest compiler update. The release notes mention a fix for a problem where -O3 -Otime could result in invalid code... I was enthusiastic, thinking that I would finally be able to run my code, but it didn't help at all. This code works on real devices; I build it with GCC and with MS's compiler for WinCE.
Here are the options that I pass to the compiler: CFG = --arm --cpu=6K -Otime -g. I also tried amt7-a, cortex-a8, etc.; all have the same problem.
  • Note: This was originally posted on 25th March 2011 at http://forums.arm.com


    It's quite possible for invalid code to work on one target or with one compiler or at one optimization level and not work on another target or another compiler or another optimization level.  It's also quite possible that code is fine and you've encountered problems with the tools.

    I agree, but I'm quite confident that there is no problem with the code. It's production code taken from Android; it doesn't use any quirks and is pretty standard C code.



    I think the "ARM Profiler WARNING" can be ignored for the time being; it's just telling you that the call chain information will be incomplete.  I don't think it's having any effect on the execution.  You may be able to avoid the warning by disabling data compression at link-time (I vaguely remember data decompression bogusly causing this warning).


    Yes, you are right. I'm not worried about the warning. If I invoke the linker manually, I can pass "--datacompressor off" to disable compression and get rid of this warning.



    Do you mean that you can run (without profiling) the same image on the RTSM with no problem but when you try to profile it on the RTSM (without rebuilding) it has the exceptions problem?


    Yes, that's exactly what I mean. I used the same binary; it runs from the debugger and produces correct output at the end. It's most likely some sort of bug in the emulator.



    Unfortunately I don't think it's possible to debug during the profile run to find out what is causing the exception.  It might be worthwhile trying to include exception handlers that emit a message to find out which exception is happening and what the registers are (especially R14) when it happens.


    I don't know how to trap these exceptions or how to print any output about them. Is there a simple example that I could try?
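
    A minimal sketch of the reporting half of such a handler, in portable C so it can be tried on a host first. It assumes the faulting code was running in ARM (not Thumb) state, where the banked R14 is 4 bytes past the faulting instruction for Undefined Instruction and Prefetch Abort and 8 bytes past it for Data Abort. The names `exc_kind_t`, `exc_faulting_address` and `exc_format_report` are made up for illustration; installing the handlers in the vector table and getting R14 into `lr` depend on your startup code and are not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Exception kinds that usually indicate a fault (hypothetical enum). */
typedef enum {
    EXC_UNDEFINED,      /* Undefined Instruction */
    EXC_PREFETCH_ABORT, /* instruction fetch aborted */
    EXC_DATA_ABORT      /* load or store aborted */
} exc_kind_t;

/* Distance from the faulting instruction to the banked R14.
   ASSUMPTION: the fault happened in ARM state, not Thumb state. */
static const uint32_t lr_offset[] = { 4u, 4u, 8u };

static const char *const exc_name[] = {
    "Undefined Instruction", "Prefetch Abort", "Data Abort"
};

/* Address of the instruction that caused the exception. */
uint32_t exc_faulting_address(exc_kind_t kind, uint32_t lr)
{
    return lr - lr_offset[kind];
}

/* Write a one-line report into buf; returns what snprintf returns. */
int exc_format_report(exc_kind_t kind, uint32_t lr, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size,
                    "%s: R14=0x%08lx, faulting instruction at 0x%08lx",
                    exc_name[kind],
                    (unsigned long)lr,
                    (unsigned long)exc_faulting_address(kind, lr));
}
```

    On the target, each handler would call `exc_format_report` with its own kind and the R14 it captured, then print the buffer through whatever output channel the image already uses.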



      What happened when you used the versions of memcmp, etc. that are supplied with the tools?


    I had an identical problem: the exception graph would go high from the start, the instruction counter would go low, and the program would never complete. I wasn't sure what the source of the problem was, but when I replaced all calls to memcmp and the others, the problem was gone. If I simply plug the original function call back in at any place, the problem shows up again.
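
    For reference, byte-wise replacements of the kind described above can be as simple as the following sketch. The names `my_memcpy` and `my_memcmp` are hypothetical (the actual replacements used here were not posted), and the loops favour clarity over speed.

```c
#include <stddef.h>

/* Copy n bytes from src to dst one byte at a time; regions must not overlap. */
void *my_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--) {
        *d++ = *s++;
    }
    return dst;
}

/* Compare n bytes; returns <0, 0 or >0 like the standard memcmp. */
int my_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a;
    const unsigned char *q = b;

    for (; n != 0; --n, ++p, ++q) {
        if (*p != *q) {
            return (int)*p - (int)*q;
        }
    }
    return 0;
}
```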



       Do you mean you can build it with gcc and run it successfully on a real ARM target?
    Have you tried running and/or profiling the gcc-built version on the RTSM?
    Are you using your version of memcmp, etc. with the gcc-built version?


    I build mainly for Windows CE, simply because I really like the MS tools. Their ARM compiler is old, but it provides quite good performance in my benchmarks. I use cegcc to compile for WinCE as well (cegcc allows inline asm, whereas the MS compiler offers only a limited number of intrinsics), and when I use cegcc everything works fine without changing the standard functions, of course.


        The profiler may be able to help (after the exceptions problem is resolved) but the RTSM does not accurately model the timing of NEON instructions and memory access (and it is only approximate for others).  It is accurate about the number of times a function/instruction was executed.


    I don't see the -O3 -- is it somewhere else?

    Have you contacted ARM support (sw-support@arm.com) about this problem?



    I'm sure that this code does not have any bugs (buffer overruns, reading random memory, etc.) because I have used the same code on different operating systems for quite some time. I never had any problem except with the profiler from RVDS. This is a voice codec library (amr-wb), and the test program that I run basically runs some test vectors to verify that the output is binary exact. You don't see -O3 in the screenshot simply because I tried different settings.
    Simply by changing compiler settings I can make the build not runnable in the profiler. It does not work if I use cortex-a8 as the CPU target, but it works if I use --cpu=7-A.
    The code doesn't actually use any NEON at all (it's plain C). I have some NEON asm, but simply including these files in the build (even without referencing the NEON functions) makes the resulting binary non-runnable in the profiler.
    I did not contact ARM about this issue. I simply downloaded the trial to try their tools, and I posted about my experience here.