DS-5 OR RVDS, which one to use for profiling code

Note: This was originally posted on 13th June 2012 at http://forums.arm.com

I used RVDS in the past and it was great for profiling test code. I could see how many cycles each instruction took, data hazards, and so on, and I was very satisfied with it. But the RVDS profiler had serious bugs: there was no way to profile code that uses unaligned memory access (it just hangs, and ARM never replied about whether there was a fix), and NEON profiling was effectively non-existent (every NEON instruction takes 1 cycle in the RVDS profiler).

I tried the DS-5 trial and the CE version and couldn't even figure out how to do any profiling. Debugging was a world of pain to get working, and I don't think it ever worked properly (I followed all kinds of guides, including the ones pinned at the top of the forum). It's nowhere near the experience I had with RVDS, where I had profiling results within 20 minutes of registering for the trial. If it matters, even for Android my primary dev environment is VS2008, and I debug native code on Windows Mobile devices if I need to; all that clunky Eclipse feels like ... **censored** :)


THE QUESTION:

Should I keep spending time trying to get DS-5 profiling working (I would like to profile on an emulator or on a real device), or will Streamline be useless for me? Does it show the same level of detail as the profiler that comes with RVDS? My impression is that Streamline is more like the profiler that comes with Xcode and the iPhone SDK: it shows sampled usage for the whole app, but not opcode-level profiling info like RVDS (where I could see each instruction, how many cycles it took, and any register waits).

If DS-5 isn't good for that, can somebody recommend an alternative? My main target is Android phones, although I build my code for almost all devices that run on ARM.
Basically, what's the best tool for profiling ARM code? I would prefer RTSMs so that I could profile for different CPUs (as with RVDS), but if there is no good alternative I could just as well buy a development board or anything else that provides opcode-level profiling info. Any advice is appreciated. Thanks.

Ideally, I would like something similar to RVDS but fully working: 1) unaligned memory access fixed; 2) preferably running some kind of OS, so that I could use the files I test with (I had 250MB input files for my test runs, and in RVDS I had to embed all that data into the final executable, which was really annoying compared to every other platform I run my code on, where I could use files one way or another); 3) proper NEON profiling info instead of the 1 CPI nonsense that the RVDS profiler shows, something similar to the online ARM Cortex-A8 cycle counter tool.
  • Note: This was originally posted on 15th June 2012 at http://forums.arm.com

    Sam, Ellis, thank you very much for the replies.


    You are developing applications for Android. That makes a difference


    I develop code that runs on almost all major mobile OSs (plus more obscure targets such as some set-top boxes). I know which parts of the code take CPU time (from a sampling-based profiler), so I know what I need to work on, and I write simple test apps that take test input files and run the CPU-intensive code on that data. This way I can run the same test on every device, including the RVDS profiler. In RVDS I get good stats about instructions and cycles. I know the cycle info isn't very accurate, but it's more or less indicative for some parts of the code, and it is very useful to me. I remember a case where RVDS showed me badly generated code with heavy register inter-dependencies in a very performance-critical loop. I had to manually add temporary variables for the intermediates, and that gave me a 3-5% boost on the entire encoder simply by changing the C code.
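
The temporary-variable change described above can be illustrated with a minimal sketch. This is not the encoder code from the post: the function names and the sum-of-squares workload are hypothetical, chosen only to show a loop where every iteration waits on one register versus a version where two temporaries carry independent partial results.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Single accumulator: each addition depends on the previous one,
   so the loop body forms one long register dependency chain. */
uint32_t sum_sq_single(const int16_t *x, size_t n)
{
    uint32_t acc = 0;
    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        acc += (uint32_t)((int32_t)x[i] * x[i]);
    }
    return acc;
}

/* Two temporaries: even and odd elements accumulate independently,
   so consecutive multiply-adds do not wait on each other's result. */
uint32_t sum_sq_split(const int16_t *x, size_t n)
{
    uint32_t acc0 = 0, acc1 = 0;
    size_t i = 0;
    assert(x != NULL || n == 0);
    for (; i + 1 < n; i += 2) {
        acc0 += (uint32_t)((int32_t)x[i] * x[i]);
        acc1 += (uint32_t)((int32_t)x[i + 1] * x[i + 1]);
    }
    if (i < n) {                      /* odd element count: one left over */
        acc0 += (uint32_t)((int32_t)x[i] * x[i]);
    }
    return acc0 + acc1;               /* same result as the single version */
}
```

Both functions return the same value; whether the split version is actually faster depends on the core and on what the compiler already does, which is exactly what an instruction-level profile would show.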


    The documentation does rather assume that you are familiar with configuring and building a Linux kernel (this also applies to Android)


    I've built Linux and BSD kernels, but I've never built Android. I'm on a Windows workstation, so building Android is probably a world of pain. From the documentation it wasn't clear whether I need to rebuild the kernel, rebuild the entire image and re-flash the phone, or whether I only need the source to build the required module and add it to an existing phone. Now I understand that I need a full rebuild to get it running. Is that correct? Or can I simply build a kernel, copy it to the device, and boot it instead of the original one?


    This requires a hardware target that is capable of generating trace data, as well as a DSTREAM unit to allow the debugger to control the hardware


    Can you please suggest some capable hardware, and what is the price of the DSTREAM unit (my guess is a few thousand, right?).


    Also, I have a question about RTSMs. I downloaded the eval version of FastModels and built the Cortex-A8 example model myself using VS2008. The model that I built and the models that come with RVDS have something in common:
    I absolutely can't find a way to load from unaligned memory. For example, this code never works (it doesn't load the unaligned int, and it doesn't fall back to the old-style unaligned load behavior either):
    __asm main(){ ldr r0, [sp, #2] }

    First of all, when an RTSM accesses unaligned memory it jumps to PC 0x00000010 and then executes junk instructions, reporting millions of exceptions. Setting the unaligned-access and unaligned-trap bits in cp15 doesn't make any difference. That looks like a bug in FastModels. I ran the unaligned access example in the profiler (where it just starts reporting millions of exceptions and nothing else happens), and I ran it in the two available debuggers (a ghetto-looking one that comes with RVDS and a better one that comes with FastModels); in both debuggers, reading unaligned memory jumps to pc=0x10 or thereabouts.
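
For reference, the cp15 bits mentioned above live in the System Control Register (SCTLR, CP15 c1), and 0x10 is the Data Abort entry in the standard ARM exception vector table, which is consistent with the load itself faulting. Below is a minimal sketch of the bit manipulation only, assuming the ARMv6/ARMv7 layout (bit 1 = A, alignment fault checking; bit 22 = U, which reads as one on ARMv7). The helper names are hypothetical, and the privileged register access appears only as a comment because it cannot run outside a target.

```c
#include <assert.h>
#include <stdint.h>

#define SCTLR_A (1u << 1)    /* alignment fault checking enable */
#define SCTLR_U (1u << 22)   /* ARMv6 unaligned access model; reads as 1 on ARMv7 */

/* Return an SCTLR value that requests unaligned loads/stores to be allowed:
   alignment checking off, unaligned access model on. */
uint32_t sctlr_allow_unaligned(uint32_t sctlr)
{
    sctlr &= ~SCTLR_A;
    sctlr |= SCTLR_U;
    return sctlr;
}

/* Non-zero if the given SCTLR value has A clear and U set. */
int sctlr_unaligned_allowed(uint32_t sctlr)
{
    return (sctlr & SCTLR_A) == 0 && (sctlr & SCTLR_U) != 0;
}

/* On a target, in a privileged mode, the register is accessed with:
 *   MRC p15, 0, r0, c1, c0, 0   ; read SCTLR into r0
 *   MCR p15, 0, r0, c1, c0, 0   ; write r0 back to SCTLR
 */
```

Reading SCTLR back after the write and passing it to `sctlr_unaligned_allowed` is one way to confirm that the model actually accepted the setting before running the unaligned load.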


    Actually, that problem is what made me look for alternatives to RVDS. I'm tired of writing junk code to avoid the alignment issue only for the RVDS profiler, and I can't work at all on code that actually needs unaligned access for performance reasons (basically, it's faster to read two 16-bit shorts from an unaligned address with one load and then use the top and bottom halves than to load two registers and use them).
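
The two-shorts-in-one-load pattern can be sketched in portable C as follows. This is an illustration rather than the code from the post: the function names are hypothetical, `memcpy` stands in for the raw unaligned `ldr`, and which half holds the first short in memory depends on endianness (on a little-endian target it is the low half).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Load 32 bits from an address with no alignment guarantee.
   Compilers typically lower this memcpy to a single load on targets
   that permit unaligned access; that is an assumption, not a guarantee. */
uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/* Bottom 16 bits of the loaded word (first short on little-endian). */
uint16_t low_half(uint32_t w)
{
    return (uint16_t)(w & 0xFFFFu);
}

/* Top 16 bits of the loaded word (second short on little-endian). */
uint16_t high_half(uint32_t w)
{
    return (uint16_t)(w >> 16);
}
```

Calling `load_u32_unaligned(buf + 1)` on a byte buffer and then taking `low_half` and `high_half` yields both shorts from one memory access, which is the case the profiler needs to handle.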


    There's also a possibly useful blog entry about using Streamline on a Galaxy Nexus http://www.linaro.or...ing-aosp-4-0-4/


    Thanks for the link. It seems like the best way for me to get running on a Galaxy Nexus, and I'll try it when I have some free time.
    If it's not trivial to get Streamline running on a regular phone, what is the recommended hardware for Streamline? I guess an Android phone would be best, so that I could profile the entire app instead of limited test apps. Is there any phone whose GPU is compatible with Streamline?