<Code>
#define BITS_TO_ROTATE 2
b=(a>>(8-BITS_TO_ROTATE))+(a<<BITS_TO_ROTATE);
After a lot of research and experimentation, we have found that this is often the best approach: it gives the compiler a chance to recognise the expression as a rotate, so it can generate an RRC/RLC or better instruction.
If the optimiser doesn't produce the correct instructions, you could also use inline ASM, which would guarantee that each step of the rotate executes in no more than a single cycle.
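As a rough sketch of the same idiom wrapped in a function: the name `rotl8`, the use of `uint8_t`, and the masking of the shift count are assumptions for illustration, not part of the snippet above. It uses `|` instead of `+` (the two halves never overlap, so the result is the same) and casts back to 8 bits because the shifts are performed on a promoted `int`. Whether a given compiler turns this into a rotate instruction still depends on the target and optimisation level.

```c
#include <stdint.h>

#define BITS_TO_ROTATE 2

/* Rotate an 8-bit value left by n bits.
 * n is reduced modulo 8, so n == 0 and n == 8 both return a unchanged.
 * The operands are promoted to int before shifting, so the final cast
 * discards the bits that were shifted above bit 7. */
static uint8_t rotl8(uint8_t a, unsigned n)
{
    n &= 7u;
    return (uint8_t)((a << n) | (a >> ((8u - n) & 7u)));
}
```

For example, `rotl8(0x81, BITS_TO_ROTATE)` gives `0x06`: the top two bits (`10`) wrap around to the bottom and the low bit moves up to bit 2.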