Hi Arm team,
We are exploring Arm tools to profile our vectorized applications.
We used the sample Neon matrix multiplication program provided by Arm at https://developer.arm.com/documentation/102467/0100/Example---matrix-multiplication
and compiled it using the flags mentioned at that link. Disassembling the binary with objdump confirms that the program is properly vectorized.
In Arm Forge (as shown in the snapshot below), the profile reports the time taken for each Neon intrinsic API call.
Unfortunately, we are still not able to see the expected details about the vectorized instructions, such as the number of floating-point operations.
How do we get this information in Arm Forge?