Float vs Neon (with O3)

Hi Experts,

We are porting a people-counting application to the ZCU104 platform. The plan is to offload the ML part to the FPGA and run the pre/post-processing modules on the ARM CPU cores. When we run the application, the pre/post-processing modules take a lot of time, so we want to reimplement them using NEON intrinsics.

Here is the issue: when we compile the float code and the NEON code with the -O3 flag, we see the same latency numbers for both.

Can you please suggest any tips, or how to analyse this further?
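
One possible explanation, offered as an assumption rather than a diagnosis: at -O3, GCC enables its auto-vectorizer, so a simple scalar float loop may already be compiled to NEON instructions on AArch64, leaving little for hand-written intrinsics to gain. The sketch below shows the kind of pair being compared. The kernel (`scale_offset_*`) and the helper `kernels_agree` are hypothetical names for illustration and are not taken from the actual application. The intrinsics path is guarded so the file also builds on non-ARM hosts.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical kernel: out[i] = in[i] * scale + offset.
// Plain scalar loop; at -O3 GCC may auto-vectorize this on its own.
void scale_offset_scalar(const float* in, float* out, std::size_t n,
                         float scale, float offset) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * scale + offset;
    }
}

// Same kernel written with NEON intrinsics, four floats per iteration.
void scale_offset_neon(const float* in, float* out, std::size_t n,
                       float scale, float offset) {
    std::size_t i = 0;
#if defined(__ARM_NEON)
    const float32x4_t vscale = vdupq_n_f32(scale);
    const float32x4_t voffset = vdupq_n_f32(offset);
    for (; i + 4 <= n; i += 4) {
        float32x4_t v = vld1q_f32(in + i);              // load 4 floats
        v = vaddq_f32(vmulq_f32(v, vscale), voffset);   // v * scale + offset
        vst1q_f32(out + i, v);                          // store 4 floats
    }
#endif
    // Scalar tail (and the whole loop when NEON is not available).
    for (; i < n; ++i) {
        out[i] = in[i] * scale + offset;
    }
}

// Runs both kernels on the same input and compares the results with a
// small tolerance, since the compiler may fuse multiply and add differently.
bool kernels_agree(std::size_t n) {
    std::vector<float> in(n), a(n), b(n);
    for (std::size_t i = 0; i < n; ++i) {
        in[i] = 0.25f * static_cast<float>(i);
    }
    scale_offset_scalar(in.data(), a.data(), n, 1.5f, -2.0f);
    scale_offset_neon(in.data(), b.data(), n, 1.5f, -2.0f);
    for (std::size_t i = 0; i < n; ++i) {
        if (std::fabs(a[i] - b[i]) > 1e-4f * (1.0f + std::fabs(a[i]))) {
            return false;
        }
    }
    return true;
}
```

If both versions produce matching results and matching timings, comparing the generated assembly of the two functions would show whether the scalar loop was already vectorized.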

Thanks and Regards,

Raju


0 RajuK over 3 years ago in reply to Ben Clark

Hi Ben,

Thanks a lot for your time and your reply.

I am looking into the optimisation link you shared and will update you further on this.

You asked: "How are you compiling?"

g++ -O3 decode.cpp -o decode

decode.cpp is a C++ floating-point implementation in which I decode the bounding boxes.
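
To make the comparison concrete, a small timing harness can help. The sketch below is only an illustration: `decode_boxes` is a hypothetical stand-in for the real decode step (the actual code in decode.cpp is not shown in this thread), and `average_latency_us` is a helper name invented here. The GCC flags in the comments are existing options that switch the auto-vectorizer off or report which loops it vectorized.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Build variants worth comparing (GCC):
//   g++ -O3 decode.cpp -o decode                           // as in the post
//   g++ -O3 -fno-tree-vectorize decode.cpp -o decode_novec // auto-vectorizer off
//   g++ -O3 -fopt-info-vec-optimized decode.cpp -o decode  // report vectorized loops

// Hypothetical decode step: converts each box from centre form
// (cx, cy, w, h) to corner form (x1, y1, x2, y2).
void decode_boxes(const float* in, float* out, std::size_t num_boxes) {
    for (std::size_t i = 0; i < num_boxes; ++i) {
        const float cx = in[4 * i + 0];
        const float cy = in[4 * i + 1];
        const float half_w = 0.5f * in[4 * i + 2];
        const float half_h = 0.5f * in[4 * i + 3];
        out[4 * i + 0] = cx - half_w;
        out[4 * i + 1] = cy - half_h;
        out[4 * i + 2] = cx + half_w;
        out[4 * i + 3] = cy + half_h;
    }
}

// Calls fn once as a warm-up, then returns the average wall-clock time
// per call in microseconds over the given number of iterations.
template <typename F>
double average_latency_us(F&& fn, int iterations) {
    using clock = std::chrono::steady_clock;
    fn();  // warm-up call, not timed
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        fn();
    }
    const auto stop = clock::now();
    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

A call such as `average_latency_us([&] { decode_boxes(in, out, n); }, 1000)` gives one number per build. The output buffer should be read afterwards (for example, printed or summed) so the compiler cannot discard the timed work.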

Thanks and Regards,

Raju


0 Ben Clark over 3 years ago in reply to RajuK

But what are the results that are causing you concern? And what CPU are you targeting/testing on?

Have you got any code snippets to give us more of an idea about the example?