How to get vectorized instruction profiling info in Arm Forge

Hi Arm team,

We are exploring Arm tools to profile our vectorized applications.

We used the sample Neon program for matrix multiplication provided by Arm at

and compiled it using the flags mentioned in the above link. Disassembling the binary with objdump confirms that the program is properly vectorized.

In Arm Forge (as shown in the snapshot below), it reports the "time taken for each Neon intrinsic API".

Unfortunately, we are still not able to see the expected details for the vectorized instructions, such as the number of floating-point operations.

How do we get this information in Arm Forge?