NE10-Library -> FIR-Filter cycle counts: C-version faster than NEON-version?

CFriebel over 10 years ago

Hi,

I'm currently trying to measure cycle counts for FIR filtering with the NE10 library. My target is a Raspberry Pi 2 (ARM Cortex-A7) running Raspbian.

I enabled the Cortex-A7 performance counter register so that I can read the cycle count before and after the filter execution.
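For readers who want to reproduce this kind of measurement, here is a minimal sketch of reading the cycle counter from user space. It is not the code used in the post. It assumes a 32-bit ARMv7 build and that user-mode access to the counter (PMCCNTR) has already been enabled from kernel code, since the read traps otherwise. The helper names `read_cycle_counter` and `cycles_elapsed` are hypothetical, and the non-ARM fallback only exists so the sketch compiles elsewhere.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: returns the current value of a free-running counter.
 * On 32-bit ARM it reads PMCCNTR (CP15 c9, c13, 0). This assumes user-mode
 * access was enabled beforehand by privileged code; otherwise it traps. */
static inline uint32_t read_cycle_counter(void)
{
#if defined(__arm__)
    uint32_t cycles;
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(cycles));
    return cycles;
#else
    /* Fallback for non-ARM hosts: processor-time ticks, NOT CPU cycles. */
    return (uint32_t)clock();
#endif
}

/* Hypothetical helper: elapsed count between two readings. Unsigned
 * subtraction stays correct across a single 32-bit wraparound. */
static inline uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return (uint32_t)(end - start);
}
```

A measurement then takes one reading before the filter call and one after, and passes both to `cycles_elapsed()`. The two reads themselves cost a few cycles, which matters little at roughly 10000 cycles per call but would matter for very short kernels.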

I then tested both functions, "ne10_fir_float_neon()" and "ne10_fir_float_c()", and expected the NEON assembly version to be faster than the C version.

To my surprise, I seem to get better results with the plain C version. I checked with different block sizes and filter lengths, but in all my tests the C-only version has the smaller cycle count.
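For orientation, the work being timed is a block FIR: each output sample is a sum of products over the filter taps, so the cost scales with block size times number of taps. The sketch below is a plain direct-form reference, not NE10's implementation. The name `fir_reference` is hypothetical, and it assumes zero history before the first sample, whereas a real block filter carries state between calls.

```c
#include <stddef.h>

/* Hypothetical direct-form FIR reference (not the NE10 code):
 *   out[n] = sum over k of coeffs[k] * in[n - k]
 * Samples before in[0] are assumed to be zero, i.e. no state is kept
 * between blocks. */
static void fir_reference(const float *coeffs, size_t num_taps,
                          const float *in, float *out, size_t block_size)
{
    for (size_t n = 0; n < block_size; ++n) {
        float acc = 0.0f;
        /* Stop at k == n so that in[n - k] never indexes before in[0]. */
        for (size_t k = 0; k < num_taps && k <= n; ++k) {
            acc += coeffs[k] * in[n - k];
        }
        out[n] = acc;
    }
}
```

With a block size of 128 and 21 taps, the inner statement runs close to 128 x 21 = 2688 times, which is the denominator behind a "cycles per sample per tap" figure.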

For example, using a block size of 128 and 21 filter taps, I get these results:

using ne10_fir_float_neon(): average of 10212 cycles, which is ~3.8 cycles per sample per tap

using ne10_fir_float_c(): average of 8436 cycles, which is ~3.1 cycles per sample per tap
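The per-sample-per-tap figures follow from dividing the measured cycles by block size times tap count (128 x 21 = 2688). A small sketch of that normalization, with a hypothetical helper name:

```c
#include <stdint.h>

/* Hypothetical helper: normalize a cycle count by the number of
 * multiply-accumulate steps in one block (block_size * num_taps). */
static double cycles_per_sample_per_tap(uint32_t cycles,
                                        uint32_t block_size,
                                        uint32_t num_taps)
{
    return (double)cycles / ((double)block_size * (double)num_taps);
}
```

Calling it with the numbers from the post, `cycles_per_sample_per_tap(10212, 128, 21)` gives about 3.80 and `cycles_per_sample_per_tap(8436, 128, 21)` gives about 3.14, so the NEON run takes roughly 21% more cycles than the C run in this configuration.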

Is there a reason why the NEON version is slower than the C version on the Cortex-A7, and could that be different on another target, say the Cortex-A9?

Or could there be something wrong with my measurements, and the NEON version should always be faster? Or is it only faster for specific block sizes and filter lengths?

Or maybe I did something wrong and still have to activate NEON correctly?

I called "ne10_init()", and "ne10_HasNEON()" returns "NE10_OK", so this should be fine...

Thank you

Top replies

Matthew Du Puy over 10 years ago in reply to CFriebel (verified)

I've added this to the Ne10 issue tracker; hopefully it'll get more attention there. I'll monitor both the issue and this thread and post updates: FIR-Filter cycle counts: C-version faster than NEON-version...