Are there any tricks for efficiently utilising NEON on the Cortex-A35? I believe the Cortex-A35 uses in-order execution, so what is the correct way to load and process data?
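A pattern often suggested for in-order cores is to issue loads well before their results are needed, and to split the work across several independent accumulators. That way, consecutive NEON additions do not each wait on the previous result. The sketch below applies this to summing a float array. It is an illustration under assumptions, not a tuned Cortex-A35 kernel. The function name `sum_f32` is hypothetical, and the code assumes a compiler that defines `__ARM_NEON` and provides `arm_neon.h` when targeting NEON. On other targets it falls back to a plain scalar loop.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical example: sum n floats.
   On NEON builds, each 16-element block issues all four loads first and only
   then performs four adds into independent accumulators, so no add depends on
   the add immediately before it. */
float sum_f32(const float *src, size_t n)
{
    size_t i = 0;
    float total = 0.0f;
#if defined(__ARM_NEON)
    float32x4_t acc0 = vdupq_n_f32(0.0f);
    float32x4_t acc1 = vdupq_n_f32(0.0f);
    float32x4_t acc2 = vdupq_n_f32(0.0f);
    float32x4_t acc3 = vdupq_n_f32(0.0f);
    for (; i + 16 <= n; i += 16) {
        /* Loads for the whole block, issued before any use. */
        float32x4_t a = vld1q_f32(src + i);
        float32x4_t b = vld1q_f32(src + i + 4);
        float32x4_t c = vld1q_f32(src + i + 8);
        float32x4_t d = vld1q_f32(src + i + 12);
        /* Four independent dependency chains. */
        acc0 = vaddq_f32(acc0, a);
        acc1 = vaddq_f32(acc1, b);
        acc2 = vaddq_f32(acc2, c);
        acc3 = vaddq_f32(acc3, d);
    }
    /* Combine the accumulators, then reduce the four lanes. */
    float32x4_t acc = vaddq_f32(vaddq_f32(acc0, acc1), vaddq_f32(acc2, acc3));
    total = vgetq_lane_f32(acc, 0) + vgetq_lane_f32(acc, 1)
          + vgetq_lane_f32(acc, 2) + vgetq_lane_f32(acc, 3);
#endif
    /* Scalar tail (or the whole array on non-NEON builds). */
    for (; i < n; ++i)
        total += src[i];
    return total;
}
```

The compiler is free to reschedule intrinsics, so it is worth checking the generated assembly to confirm the loads actually end up ahead of their uses. Also, using several accumulators changes the order of the floating-point additions, so results can differ in the last bits from a straightforward scalar loop.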