FVP for instruction count assessment

Dear Arm forum,

If I want to profile a program that will run on an Arm chip (such as the Cortex-A78), how far can I get using only simulation on an x86 PC (i.e. without the hardware)?

I know the simulation will not be cycle-accurate, but what about instruction counts?

What I am looking for is a tool that can report the instruction count for each function in the program, and ideally present the results as a tree diagram.

Can I use Streamline or another Arm tool for this purpose?

If so, could you point me to some documentation and a simple guide?

P.S. I have already opened support case #00386877 and tried to use Fastline for instruction count assessment, but I have not received an answer so far.

Thanks for your help.

