Arm Compiler V5 code size compared to V6

I was under the impression that Arm Compiler V6 was meant to produce smaller and more efficient code than the V5 compiler. However, this does not seem to be the case for Cortex-M4.

Arm Compiler V5 with -O2 optimisation (without -Otime) and cross-module optimisation produces smaller code than Arm Compiler V6, regardless of which V6 optimisation level is used. I am not using the V6 link-time optimisation, as this seems to generate even larger code, causing my link to fail due to lack of space.

The only V6 optimisation level that comes close is -Oz, but even that produces 4729K of ROM code compared to 4696K with V5.
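
To put the gap in perspective, here is a quick calculation using only the two figures above. It assumes the 4696K figure is the V5 -O2 build, and the variable and function names are just for illustration:

```python
# ROM sizes quoted in the post, in the same "K" units.
# Assumption: 4696K is the V5 -O2 build, 4729K is the V6 -Oz build.
v6_oz_rom_k = 4729
v5_o2_rom_k = 4696


def size_gap(v6_k, v5_k):
    """Return (absolute difference in K, difference as a percentage of the V5 size)."""
    diff = v6_k - v5_k
    percent = 100.0 * diff / v5_k
    return diff, percent


diff_k, percent = size_gap(v6_oz_rom_k, v5_o2_rom_k)
print(f"V6 is larger by {diff_k}K ({percent:.2f}%)")
# prints: V6 is larger by 33K (0.70%)
```

So the difference is 33K, or roughly 0.7% of the image, which is small in relative terms but still matters when the ROM is nearly full.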

Does anyone have a similar experience or a solution? I am thinking of going back to V5, as I don't see any advantage in upgrading to V6.