When I query the program binary, I really do get a binary, with nothing human-readable in it. I was expecting to see the generated assembly code, the way Nvidia returns it. It's really difficult to write a maxFLOPS test without seeing that assembly. Moreover, the Midgard architecture is a mix of old-school VLIW and scalar, so I never know whether my code is being compiled to scalar or vector MULs.
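For anyone who wants to reproduce this, here is a minimal sketch of that kind of query. It assumes pyopencl is installed and is not the code from my actual test. The names `looks_like_text`, `dump_program_binaries` and the `scale` kernel are hypothetical, and the 95% printable-byte threshold is just a heuristic I picked for illustration.

```python
import string

# Bytes counted as readable text: printable ASCII, including common whitespace.
_TEXT_BYTES = frozenset(string.printable.encode("ascii"))


def looks_like_text(blob: bytes, threshold: float = 0.95) -> bool:
    """Heuristic (an assumption, not a vendor rule): treat the blob as
    readable text if nearly all of its bytes are printable ASCII."""
    blob = blob.rstrip(b"\x00")  # text binaries are often NUL-terminated
    if not blob:
        return False
    printable = sum(b in _TEXT_BYTES for b in blob)
    return printable / len(blob) >= threshold


# Hypothetical test kernel: one scalar multiply per work-item.
KERNEL_SRC = """
__kernel void scale(__global float *x, const float a) {
    size_t i = get_global_id(0);
    x[i] = x[i] * a;
}
"""


def dump_program_binaries(source: str = KERNEL_SRC):
    """Build `source` and yield (device name, binary bytes) for each device."""
    import pyopencl as cl  # imported lazily; needs a working OpenCL runtime

    ctx = cl.create_some_context(interactive=False)
    prg = cl.Program(ctx, source).build()
    # These two queries correspond to clGetProgramInfo with
    # CL_PROGRAM_DEVICES and CL_PROGRAM_BINARIES.
    devices = prg.get_info(cl.program_info.DEVICES)
    binaries = prg.get_info(cl.program_info.BINARIES)
    for dev, blob in zip(devices, binaries):
        yield dev.name, blob
```

Looping over `dump_program_binaries()` and printing `looks_like_text(blob)` for each device shows the difference: on Nvidia's OpenCL the blob is typically PTX text, whereas what I get back on Mali is opaque.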
Honestly, I wasn't expecting human-readable output from either of the vendors, especially Nvidia. Still, PTX code isn't really useful for my purposes.
I agree with you that a tool like AMD's ShaderAnalyzer is very useful. I was able to get close to peak FP performance in my matrix multiplication code using that tool. Without it, it's like trying to toss a coin from the top of a pond into a bucket down below: there's a lot of guesswork involved.