Arm NEON not able to understand the cycles?
wolfrum aurum
over 12 years ago
Note: This was originally posted on 25th March 2013 at
http://forums.arm.com
I am working on optimizing an FFT algorithm using NEON on ARM. My target is a BeagleBoard-xM, and I run my program directly on the board without any operating system. The board is supposed to run at 1 GHz, but I am nowhere near operating at that frequency. Currently I am having difficulty with the basic understanding of NEON. Could anyone please help me with this?
The following are sample programs I ran. LOOP CODE:
Loop Unrolled code:
The following are the results I ran for different frequencies
The above does not make any sense to me: why are the cycles per instruction different at different frequencies?
Gilead Kutnick
over 12 years ago
Note: This was originally posted on 29th March 2013 at
http://forums.arm.com
As far as the whole configuration for the DM3730 goes, I don't have any real experience with it, and I don't think you'll get a lot of help here. Maybe you should ask on TI's forums? For instance here:
http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/537.aspx
You could also try the BeagleBoard newsgroup
http://beagleboard.org/discuss
I think from what you've said it's clear at least that the block labeled local interconnect, running on ARM_FCLK, isn't connected to L3. That you have to set the two separate PLLs correctly proves that they're not on the same clock domain. You can happen to set L3 to a value that scales the way you want because you're using such low CPU clock speeds, but if you want to run the CPU at 1 GHz you won't be able to run L3 at half the CPU clock rate.
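To illustrate why separate clock domains can make the cycle counts differ between CPU frequencies, here is a minimal sketch in Python. It assumes a simple model that is not taken from the thread: each loop iteration costs a fixed number of core cycles plus some memory accesses whose latency is fixed in nanoseconds because the memory path does not scale with the CPU clock. The function name and all numbers are hypothetical, chosen only to show the trend.

```python
def cycles_per_iteration(cpu_hz, core_cycles, mem_accesses, mem_latency_ns):
    """CPU cycles for one loop iteration when memory latency is fixed in time.

    core_cycles    -- cycles spent executing instructions (scales with the CPU clock)
    mem_accesses   -- accesses per iteration that leave the CPU clock domain
    mem_latency_ns -- latency of one such access, constant in nanoseconds
    """
    # A fixed time delay costs more CPU cycles the faster the CPU clock runs.
    stall_cycles = mem_accesses * mem_latency_ns * 1e-9 * cpu_hz
    return core_cycles + stall_cycles


if __name__ == "__main__":
    # Hypothetical workload: 20 core cycles and 4 accesses of 50 ns each.
    for mhz in (100, 300, 600, 1000):
        cycles = cycles_per_iteration(mhz * 1e6, core_cycles=20,
                                      mem_accesses=4, mem_latency_ns=50.0)
        print(f"{mhz:5d} MHz: {cycles:7.1f} cycles per iteration")
```

Under this model the cycle count stays constant only if the memory clock is scaled together with the CPU clock, which is the situation you get by accident when both PLLs are set to matching ratios.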
I'm still not really sure why the performance seems to suggest your data isn't going through the L2 cache. Maybe the page tables aren't set up to allow this for the internal SRAM. That would make sense, since the SRAM is supposed to be shared, but it doesn't make sense that the data would still be cached in L1, which is what appears to be the case.
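One way to check the page-table theory is to read back the first-level descriptor that maps the SRAM and decode its memory attributes. The sketch below is an assumption-laden illustration, not the poster's setup: it assumes the ARMv7-A short-descriptor format with 1 MB section entries, TEX remap disabled, and that L1 follows the inner attributes while L2 follows the outer attributes. The function name and the example descriptor values are hypothetical.

```python
# Cache policy encoding used when TEX[2] is set (ARMv7-A, TEX remap disabled).
CACHE_POLICY = {
    0b00: "non-cacheable",
    0b01: "write-back, write-allocate",
    0b10: "write-through, no write-allocate",
    0b11: "write-back, no write-allocate",
}


def decode_section_attrs(descriptor):
    """Return (inner, outer) memory attributes of a first-level section descriptor."""
    if descriptor & 0b11 != 0b10:
        raise ValueError("not a section descriptor")
    b = (descriptor >> 2) & 1        # B bit
    c = (descriptor >> 3) & 1        # C bit
    tex = (descriptor >> 12) & 0b111  # TEX[2:0]
    if tex & 0b100:
        # TEX = 0b1BB: C and B select the inner policy, TEX[1:0] the outer policy.
        return CACHE_POLICY[(c << 1) | b], CACHE_POLICY[tex & 0b11]
    fixed = {
        (0b000, 0, 0): "strongly-ordered",
        (0b000, 0, 1): "shareable device",
        (0b000, 1, 0): "write-through, no write-allocate",
        (0b000, 1, 1): "write-back, no write-allocate",
        (0b001, 0, 0): "non-cacheable",
        (0b001, 1, 1): "write-back, write-allocate",
        (0b010, 0, 0): "non-shareable device",
    }
    kind = fixed.get((tex, c, b), "reserved or implementation defined")
    return kind, kind  # same attribute for inner and outer


if __name__ == "__main__":
    # Hypothetical entry: placeholder base address, TEX=0b100, C=1, B=1.
    entry = 0x40200000 | (0b100 << 12) | (1 << 3) | (1 << 2) | 0b10
    inner, outer = decode_section_attrs(entry)
    print("inner:", inner)
    print("outer:", outer)
```

An entry that decodes to a cacheable inner policy and a non-cacheable outer policy would be consistent with data being cached in L1 but bypassing L2, under the assumptions stated above.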
When I mentioned L2 cache in lockdown, I was referring to this feature:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0344k/Chdeghcb.html
If you use L2 in lockdown, you can treat it kind of like a scratchpad memory, but it still needs to be backed by some real RAM. Anyway, since you've confirmed you aren't doing this, it isn't really important.