DS-5 OR RVDS, which one to use for profiling code

Note: This was originally posted on 13th June 2012 at http://forums.arm.com

I used RVDS in the past and it was great for profiling test code. I was able to see how many cycles each instruction takes, data hazards, etc. I was very satisfied with it. BUT RVDS was pretty buggy in that regard: there was no way to profile code that uses unaligned memory access (it just hangs, and ARM never replied about whether there was a way to fix it), and NEON profiling was non-existent (every NEON instruction takes 1 cycle in the RVDS profiler).

I tried the DS-5 trial and CE versions and couldn't even figure out how to do any profiling at all. Debugging ... I must be dreaming; it was a world of pain to get anything working, and I don't think it was working properly (I followed all kinds of guides, the ones pinned at the top of the forum for example). It's nothing close to the experience I had with RVDS, where I had profiling results within 20 minutes of registering for the trial. If it matters, even for Android my primary dev environment is VS2009 and I debug native code on Windows Mobile devices if I need to; all that clunky Eclipse feels like ... **censored** :)


THE QUESTION:

Should I keep wasting time trying to get DS-5 profiling working (I would like to be able to profile on an emulator or on a real device), or will its Streamline be useless for me? Does it show the same detail as the profiler that comes with RVDS or not? For some reason I think Streamline is more like the profiler that comes with Xcode and the iPhone SDK: it shows sampled usage of the full app but not opcode-level profiling info like RVDS (e.g. I could see each instruction, how many cycles it took, and any register waits if there were any).

If DS-5 isn't good for that, maybe somebody can recommend an alternative solution? My main target is Android phones, although I build my code for almost all devices that run on ARM.
Basically, what's the best tool for profiling ARM code? I would prefer some RTSMs so I could profile for different CPUs (like with RVDS), but if there is no good alternative I could just as well buy a development board or anything else that could provide opcode-level profiling info. Please advise, anybody! Thanks

Ideally, I would like something similar to RVDS but fully working: 1) unaligned memory access fixed, preferably running some kind of OS so that 2) I could use the files that I use for testing (I had 250MB input files that I passed to my test runs, and in RVDS I had to embed all that data in the final executable, which was really annoying compared to all the platforms where I run my code and can use files one way or another). 3) Normal NEON profiling info, and not that 1 CPI nonsense that the RVDS profiler shows. Something similar to the ARM Cortex-A8 cycle counter online tool.
  • Note: This was originally posted on 13th June 2012 at http://forums.arm.com

    On top of that I'd like to add... maybe I'm a complete retard, but I find that "Setting up an Android target" from "ARM DS-5 Using ARM Streamline" is the dumbest guide ever. It reminds me of friends who call for help and then tell me what they see on their screen, talking to me as if I had their screen in front of my eyes.
    In the kernel configuration menu, use the arrow keys to navigate to the required submenu and press Enter


    WTF IS THAT BS?! Seriously, I'm trying to press arrow keys, but all I see is the web page moving. Where the hell am I supposed to press arrow keys??? That ridiculous mention of the location of the gator source... WTF IS THAT??? An older version mentioned installdir/arm... now it's something else, but I still don't get where the hell it's supposed to be! Is that the install dir of DS-5? What about ds5-ce then!
    To be able to use Streamline on any of the devices on my desk (I have like 50 phones lying around), do I need to build Android myself and reflash the phones???!?!? Is that what that guide says??...
    I've never built Android or any kernel modules, but it's strange to assume that somebody who simply wants to use a profiler only needs to be told to rebuild the kernel, with no freaking info, like it's a hello-world task that everybody knows by heart... No wonder there isn't a single clue on the web about how to set it up, use it and get any results from it... At least I'm not able to find anything at all!

    I'm very sorry, perhaps that guide forgot to mention that a mind-reading class was a prerequisite.
  • Note: This was originally posted on 15th June 2012 at http://forums.arm.com

    Sam, Ellis, thank you very much for the replies.


    You are developing applications for Android. That makes a difference


    I develop code that runs on almost all major mobile OSs (including other obscure targets like some set-top boxes etc). I know which parts of the code take CPU time (from a sampling-based profiler). That way I know what I need to work on, and I write simple test apps that take some test input files and run that CPU-intensive code on the data. This lets me run the same test on every device, including in the RVDS profiler. In RVDS I can see good stats about instructions and cycles. I know the cycle info isn't very accurate, but it's more or less indicative for some parts of the code. It is very useful to me at least. I remember a case where RVDS showed me some badly generated code with extreme register inter-dependencies in a very performance-critical loop. I had to manually add temporary variables for intermediates, and that gave me something like a 3-5% boost overall on the entire encoder simply by changing C code.
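
    To make the "temporary variables for intermediates" idea concrete, here is a minimal sketch in C. It is not the encoder loop from the post (that code isn't shown); `sum_chained` and `sum_split` are hypothetical names, and whether the split version is actually faster depends on the compiler and the core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Single accumulator: every add depends on the result of the previous add,
   so the loop forms one long register dependency chain. */
uint32_t sum_chained(const uint8_t *p, size_t n)
{
    assert(p != NULL || n == 0);
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += p[i];
    return acc;
}

/* Two temporaries: the chains through a and b are independent of each other,
   which gives the compiler and the pipeline room to overlap them. */
uint32_t sum_split(const uint8_t *p, size_t n)
{
    assert(p != NULL || n == 0);
    uint32_t a = 0, b = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        a += p[i];
        b += p[i + 1];
    }
    if (i < n)          /* odd element left over */
        a += p[i];
    return a + b;
}
```

    Both functions return the same result; only the shape of the dependency chain differs, which is the kind of thing an instruction-level profile makes visible.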


    The documentation does rather assume that you are familiar with  configuring and building a Linux kernel (this also applies to Android)


    I've built Linux and BSD kernels, but I've never built Android. I'm using a Windows workstation, so it's probably a world of pain to build Android on Windows. From the documentation it wasn't at all clear whether I need to rebuild the kernel, rebuild the entire image and re-flash the phone, or whether I only needed the source to build the required module so that I could add it to an existing phone. Now I understand that I need a full rebuild to get it running. Is that correct? Or can I simply build a kernel, copy it to the device and boot my kernel instead of the original one?


    This requires a hardware target that is capable of generating trace  data, as well as a DSTREAM unit to allow the debugger to control the hardware


    Can you please suggest some capable hardware, and what's the price of that DSTREAM unit (my guess is a few thousand, right?).


    Also, I have a question about RTSMs. I downloaded the eval version of FastModels and built the Cortex-A8 example model myself using VS2008. The model that I built and the models that come with RVDS have something in common:
    I ABSOLUTELY can't find a way to do an unaligned load, e.g. this code will never work (it won't load an unaligned int, and it won't do the old-style unaligned load either):
    __asm main(){ ldr r0, [sp, #2] }

    First of all, when accessing unaligned memory with RTSMs, it jumps to PC 0x00000010 and after that executes all the junk instructions there, showing millions of exceptions. Setting the unaligned access and unaligned trap bits in cp15 doesn't make any difference. That seems like a bug in FastModels. I tried to run that unaligned access example in the profiler (where it just starts showing millions of exceptions and nothing happens), and I tried to run that code in the 2 available debuggers (one ghetto-looking debugger that comes with RVDS and the other, better one that comes with FastModels), and in these debuggers reading unaligned memory jumps to pc=0x10 or something like that.


    Actually, that problem made me look for alternatives to RVDS: I'm tired of writing junk code to avoid that alignment issue only for the RVDS profiler. Also, I can't work at all with code that actually needs unaligned access for performance reasons (basically, it's faster to read two 16-bit shorts from an unaligned address with one load and then use the top and bottom parts instead of loading two registers and using them).
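
    The "two shorts from one unaligned load" trick can be sketched in portable C roughly as below. This is an illustration only, not code from the post: `load_u32_unaligned` and `load_two_u16` are hypothetical names, and the split into halves assumes a little-endian target (the usual case for Android on ARM). `memcpy` is used because it has no alignment requirement; a compiler targeting a core that allows unaligned loads will typically turn it into a single load, but that is up to the compiler.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Load 32 bits from an arbitrary (possibly unaligned) address. */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Fetch two adjacent 16-bit values with one 32-bit load, then split.
   Assumption: little-endian byte order, so the first short in memory
   ends up in the bottom half of the register. */
void load_two_u16(const void *p, uint16_t *first, uint16_t *second)
{
    assert(p != NULL && first != NULL && second != NULL);
    uint32_t v = load_u32_unaligned(p);
    *first  = (uint16_t)(v & 0xFFFFu);  /* bottom half */
    *second = (uint16_t)(v >> 16);      /* top half */
}
```

    Calling `load_two_u16(buf + 1, &a, &b)` on a byte buffer reads from an odd address, which is exactly the access pattern that the models described above refuse to run.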


    There's also a possibly useful blog entry about using Streamline on a Galaxy Nexus http://www.linaro.or...ing-aosp-4-0-4/


    Thanks for the link; it seems like the best way for me to get running on a Galaxy Nexus. I'll try that when I have some free time.
    If it's not trivial to get Streamline on a regular phone, what's the recommended HW to work with Streamline? I guess it would be best if I could use some Android phone for that, so I could profile the entire app instead of limited test apps. Is there any phone that has a GPU compatible with Streamline?
  • Note: This was originally posted on 15th June 2012 at http://forums.arm.com

    Shouldn't armcc be the best tool to provide that kind of info? Internally, it needs to weigh alternative instruction sequences based on their execution speeds and on the availability of dependent data that comes from previous instructions. On top of that, armcc does all that based on the configured CPU or architecture.

    It would be nice to have some kind of switch so that it could add extra instruction analysis to the generated asm listing, or process an asm file and generate similar info. Here I put a simple asm example; armcc should be able to give similar info!
  • Note: This was originally posted on 18th June 2012 at http://forums.arm.com


    It would be nice to have some kind of switch so that it could add extra instruction analysis to the generated asm listing, or process an asm file and generate similar info.


    Armasm has a configurable message about interlocks, see http://infocenter.arm.com/help/topic/com.arm.doc.dui0473g/CIAGIDIH.html
  • Note: This was originally posted on 18th June 2012 at http://forums.arm.com


    Now I understand that I need a full rebuild to get it running. Is that correct? Or can I simply build a kernel, copy it to the device and boot my kernel instead of the original one?
    ...
    If it's not trivial to get Streamline on a regular phone, what's the recommended HW to work with Streamline? I guess it would be best if I could use some Android phone for that, so I could profile the entire app instead of limited test apps.


    It's definitely non-trivial.  It's possible that some production phone has a kernel that is correctly configured, but rebuilding and replacing the kernel with one correctly configured for gator (the target part of Streamline) is probably the only way to be sure.  Also building gator requires matching kernel headers that may be difficult to find for a production phone.  Installing/running gator requires root access. The details of how to copy the kernel, etc. to the phone will vary from phone to phone.

    It's almost certainly easier to use Android on some development board.  Linaro has Android for at least i.MX53, Pandaboard, Snowball and Origen.  There are probably many other boards with Android support that I'm not as familiar with.


    Is there any phone that has a GPU compatible with Streamline?


    They are not phones, but all of the boards I mention above have a GPU.  With the correct drivers, Streamline can do GPU profiling on Mali GPUs (for example, Snowball and Origen). [But I'm getting a bit "outside my area of expertise" here.  (That's something my father used to say when he didn't know what the hell he was talking about.)]
  • Note: This was originally posted on 18th June 2012 at http://forums.arm.com


    Also, I have a question about RTSMs.[...]


    It's probably best to ask this in a separate thread (or ask support-sw@arm.com).  My first impression is that 0x10 is the Data Abort exception vector and the MMU can be configured to cause data aborts on unaligned accesses.
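
    As a rough aid for reading that symptom, the sketch below maps the classic ARM low-vector offsets to exception names and decodes the alignment-check bit of the CP15 control register (SCTLR.A, bit 1). The function names are hypothetical, the table assumes low vectors at address 0x0, and the memory-type side of the story (for example what happens while the MMU is off) is not modelled, so treat it as a starting point rather than a diagnosis.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Classic ARM exception vector offsets (low vectors, base address 0x0). */
static const struct {
    uint32_t    offset;
    const char *name;
} arm_vectors[] = {
    { 0x00, "Reset" },
    { 0x04, "Undefined Instruction" },
    { 0x08, "Supervisor Call" },
    { 0x0C, "Prefetch Abort" },
    { 0x10, "Data Abort" },
    { 0x18, "IRQ" },
    { 0x1C, "FIQ" },
};

/* Return the exception name for a vector address, or NULL if pc is not a vector. */
const char *arm_vector_name(uint32_t pc)
{
    for (size_t i = 0; i < sizeof arm_vectors / sizeof arm_vectors[0]; i++)
        if (arm_vectors[i].offset == pc)
            return arm_vectors[i].name;
    return NULL;
}

/* SCTLR.A (bit 1): when set, unaligned data accesses take an alignment fault,
   which is reported through the Data Abort vector. */
int sctlr_alignment_check_enabled(uint32_t sctlr)
{
    return (int)((sctlr >> 1) & 1u);
}
```

    With this table, `arm_vector_name(0x10)` gives "Data Abort", which matches the first impression above: the jump to pc=0x10 looks like the load itself faulting rather than the model executing random code.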