Are there any practical differences between the Arm Cortex-M0 and Cortex-M3 for the C programmer?

The Arm Cortex-M0 core (ARMv6-M) supports a subset of the instructions provided by the Cortex-M3 (ARMv7-M). Presumably these extra instructions provide better performance for some applications.
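For instance, ARMv7-M adds hardware divide (UDIV/SDIV), count leading zeros (CLZ) and bit-field extract/insert (UBFX, BFI), none of which ARMv6-M has. The sketch below shows two ordinary C functions whose Cortex-M3 translation can typically use UBFX and CLZ, while a Cortex-M0 build has to fall back on shift-and-mask sequences or a software routine. The function names are hypothetical, and the exact code generated depends on the GCC version and flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: extract 'width' bits starting at bit 'lsb'.
 * When inlined with constant arguments, GCC targeting Cortex-M3 can
 * typically emit a single UBFX; ARMv6-M (Cortex-M0) has no bit-field
 * instructions, so it needs a shift-and-mask sequence instead. */
static inline uint32_t extract_bits(uint32_t value, unsigned lsb, unsigned width)
{
    assert(width >= 1 && width <= 32 && lsb + width <= 32);
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (value >> lsb) & mask;
}

/* Hypothetical helper: index of the highest set bit, or -1 for zero.
 * __builtin_clz is a GCC builtin; on Cortex-M3 it can map to the CLZ
 * instruction, while on Cortex-M0 GCC has to use a software routine. */
static inline int highest_set_bit(uint32_t value)
{
    if (value == 0)
        return -1;               /* __builtin_clz(0) is undefined */
    return 31 - __builtin_clz(value);
}
```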

But does this have any implications for a developer writing performance-critical C code for either of these devices? Would you write performance-critical code differently knowing that you were developing for one instruction set rather than the other?

For example, I would write quite different code if I were targeting an 8-bit PIC than I would for a 32-bit ARM. And if I knew that my target hardware didn't have a single-cycle multiply instruction, I might use bit shifts rather than multiplication where possible (or only multiply and divide by powers of 2, so that the compiler can optimise them into shifts).
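The power-of-two case can be checked in plain C. In this sketch (function names hypothetical), unsigned division by a constant power of two and a right shift give identical results, so with optimisation enabled GCC emits the shift for either form and hand-writing it buys nothing. For signed operands the two are not equivalent: C division truncates toward zero, while GCC documents `>>` on a negative value as an arithmetic (sign-extending) shift, which rounds toward minus infinity.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned: x / 8u and x >> 3 agree for every x, so the compiler can
 * use a single shift for both forms. */
static uint32_t udiv8(uint32_t x)       { return x / 8u; }
static uint32_t udiv8_shift(uint32_t x) { return x >> 3; }

/* Signed: x / 8 truncates toward zero, but x >> 3 (an arithmetic shift
 * under GCC) rounds toward minus infinity, so the compiler must add
 * fix-up instructions for negative x when you write the division. */
static int32_t sdiv8(int32_t x)         { return x / 8; }
static int32_t sdiv8_shift(int32_t x)   { return x >> 3; }

/* The shift is only a valid replacement when x is known to be
 * non-negative; using an unsigned type in the first place lets the
 * compiler see that for itself. */
static int32_t sdiv8_nonneg(int32_t x)
{
    assert(x >= 0);
    return x >> 3;
}
```

On the multiply side, the Cortex-M0 multiplier is an implementation option: MULS takes either 1 or 32 cycles, depending on which the silicon vendor chose, so the datasheet of the specific part matters.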

Does GCC even make good use of the extra instructions?
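One way to check this empirically (a sketch, assuming an `arm-none-eabi-gcc` toolchain; the file and function names are hypothetical) is to compile the same function for each core and compare the assembly. A division by a run-time value is a good probe, because the Cortex-M3 has a hardware divider and the Cortex-M0 does not.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example for comparing code generation. Build twice and
 * diff the assembly, e.g.
 *
 *   arm-none-eabi-gcc -O2 -mthumb -mcpu=cortex-m0 -S avg.c -o avg-m0.s
 *   arm-none-eabi-gcc -O2 -mthumb -mcpu=cortex-m3 -S avg.c -o avg-m3.s
 *
 * For the Cortex-M3 the division should appear as a UDIV instruction;
 * for the Cortex-M0 GCC instead emits a call to the EABI run-time
 * helper __aeabi_uidiv. */
uint32_t average(const uint32_t *samples, uint32_t count)
{
    assert(count != 0);
    uint32_t sum = 0;
    for (uint32_t i = 0; i < count; i++)
        sum += samples[i];          /* may wrap if the total exceeds 32 bits */
    return sum / count;             /* run-time divisor: UDIV vs library call */
}
```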