Hi Arm team,
We are exploring Arm tools to profile our vectorized applications.
We used the sample Neon matrix multiplication program provided by Arm at https://developer.arm.com/documentation/102467/0100/Example---matrix-multiplication
and compiled it using the flags mentioned at the above link. The disassembly from objdump confirms that the program is properly vectorized.
In Arm Forge (as shown in the snapshot below), we can see the time taken for each Neon intrinsic call.
Unfortunately, we are still not able to see the expected details about the vectorized instructions, such as the number of floating-point operations.
How do we get this information in Arm Forge?
Regards
Manjunath