Yes, the two NEON implementations are different, so I'd expect different performance numbers between the two cores. Can you give us an example of an algorithm you are trying, and how you are building it? The fact that you see absolutely no performance difference is "suspicious": I'd expect some difference, even if only a small one. Check that you are not running the same binary three times; that seems like the obvious explanation for three identical performance numbers =)
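
One quick way to rule out the "same binary" case is to compare checksums of the executables you are benchmarking. The following is only a rough sketch: the function names `sha256_of` and `find_identical` are made up for illustration, and I'm assuming you have one output file per build configuration to pass on the command line.

```python
import hashlib
import sys
from collections import defaultdict


def sha256_of(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large binaries do not need to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_identical(paths):
    """Group paths by digest and return only groups with more than one file."""
    groups = defaultdict(list)
    for path in paths:
        groups[sha256_of(path)].append(path)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    # Usage: pass the paths of the binaries you benchmarked (hypothetical names).
    for group in find_identical(sys.argv[1:]):
        print("identical binaries:", ", ".join(group))
```

If any group is printed, those files are byte-for-byte identical and the build options did not change anything. Different checksums do not prove the hot loop was compiled differently (embedded paths or timestamps can differ too), so comparing the disassembly of the inner loop is still worthwhile.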