Especially since my optimised C code is structured to do exactly what my ARM assembly code does, but obviously the C compiler didn't agree!