Float vs Neon (with O3)

Hi Experts,

We are trying to port a people-counting application to the ZCU104 platform. We want to offload the ML part to the FPGA and run the other pre/post-processing modules on the Arm CPU cores. When we ran the application, we saw that the pre/post-processing modules were taking a lot of time, so we wanted to implement them using NEON intrinsics.

Here is the issue: when we compile the float code and the NEON code with the -O3 flag, we see the same latency numbers for both.

Can you please suggest any tips, or how to analyse this further?

Thanks and Regards,

Raju 

  • Hi Ben,

    Thanks a lot for your time and your reply.

    I am looking into the optimisation link you shared and will update you further on this.

    You asked how I am compiling:

    g++ -O3 decode.cpp -o decode 

    Decode is a C++ floating-point implementation in which I decode the bounding box.

    Thanks and Regards,

    Raju
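For readers following the comparison, here is a minimal sketch of what a scalar float loop and a NEON-intrinsics version of the same loop might look like. The function names and the scale-and-offset operation are hypothetical stand-ins, not taken from decode.cpp. One possible reason for identical latency is that GCC enables auto-vectorization at -O3, so the plain float loop may already be compiled to NEON instructions; this is an assumption about the code in question, not a confirmed diagnosis.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical stand-in for one decode step: out[i] = in[i] * scale + offset.
// Plain scalar loop; at -O3 GCC is allowed to auto-vectorize it.
void decode_scalar(const std::vector<float>& in, std::vector<float>& out,
                   float scale, float offset) {
    assert(in.size() == out.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] * scale + offset;
    }
}

// Same computation written with NEON intrinsics, four floats per iteration.
// When NEON is not available, only the scalar tail loop below runs.
void decode_neon(const std::vector<float>& in, std::vector<float>& out,
                 float scale, float offset) {
    assert(in.size() == out.size());
    std::size_t i = 0;
#if defined(__ARM_NEON)
    const float32x4_t vscale = vdupq_n_f32(scale);
    const float32x4_t voffset = vdupq_n_f32(offset);
    for (; i + 4 <= in.size(); i += 4) {
        float32x4_t v = vld1q_f32(&in[i]);   // load 4 floats
        v = vmlaq_f32(voffset, v, vscale);   // voffset + v * vscale
        vst1q_f32(&out[i], v);               // store 4 floats
    }
#endif
    // Handle the remaining elements (or all of them without NEON).
    for (; i < in.size(); ++i) {
        out[i] = in[i] * scale + offset;
    }
}
```

To check whether the compiler already vectorized the scalar loop, one option is to add GCC's -fopt-info-vec to the compile command so it reports vectorized loops, or to rebuild the float version with -fno-tree-vectorize and compare timings against the intrinsics version.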
