The implementation-specific event 0x68 (instructions coming out of the core renaming stage) is probably the closest you are going to get on Cortex-A9. It counts all instructions as they are queued in the out-of-order pipeline, so it includes instructions that are speculatively issued and then thrown away because of a mispredicted branch or an exception. It will therefore give a high count, but it is at least an upper bound ...
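In case it helps, here is a rough sketch of how event 0x68 could be programmed into one of the event counters from bare-metal or kernel code. The CP15 encodings are the standard ARMv7-A PMU ones (PMSELR, PMXEVTYPER, PMXEVCNTR, PMCNTENSET, PMCR); the function and macro names are made up for illustration. It assumes a GCC/Clang toolchain, privileged execution (or user access already enabled through PMUSERENR), and that nothing else owns the counter you pick.

```c
#include <assert.h>
#include <stdint.h>

/* Event number quoted above: instructions coming out of the rename stage. */
#define A9_EVT_INST_RENAMED 0x68u

/* ARMv7 PMCR.E (bit 0) globally enables the counters. */
#define PMCR_E (1u << 0)

/* Enable bit for event counter idx; bit 31 is reserved for the cycle counter. */
static inline uint32_t pmu_counter_bit(unsigned idx)
{
    assert(idx < 31);
    return 1u << idx;
}

#if defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 7)
/* Hypothetical helper: select counter idx so the PMX* registers refer to it. */
static inline void pmu_select(unsigned idx)
{
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 5" :: "r"(idx)); /* PMSELR */
    __asm__ volatile("isb");
}

/* Hypothetical helper: point counter idx at event 0x68, zero it and start it. */
static inline void pmu_start_renamed_count(unsigned idx)
{
    uint32_t pmcr;

    pmu_select(idx);
    __asm__ volatile("mcr p15, 0, %0, c9, c13, 1" :: "r"(A9_EVT_INST_RENAMED)); /* PMXEVTYPER */
    __asm__ volatile("mcr p15, 0, %0, c9, c13, 2" :: "r"(0u));                  /* PMXEVCNTR = 0 */
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 1" :: "r"(pmu_counter_bit(idx))); /* PMCNTENSET */

    __asm__ volatile("mrc p15, 0, %0, c9, c12, 0" : "=r"(pmcr));                /* read PMCR */
    pmcr |= PMCR_E;
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 0" :: "r"(pmcr));                /* write PMCR */
}

/* Hypothetical helper: read the current value of counter idx. */
static inline uint32_t pmu_read_count(unsigned idx)
{
    uint32_t value;

    pmu_select(idx);
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 2" : "=r"(value));               /* PMXEVCNTR */
    return value;
}
#endif
```

You would call `pmu_start_renamed_count(0)` before the region of interest and `pmu_read_count(0)` after it, then treat the difference as an upper bound for the reasons given above. If you are running under Linux, asking perf for the raw event (something like `perf stat -e r68`) is likely to be less work than touching the registers yourself, assuming the kernel's PMU driver is enabled for your board.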
I'm pretty sure that event 0x08 is the instruction counter. So, if this event is not implemented, does anybody know how I can get a reasonably accurate instruction count?