Top-Down microarchitectural analysis

Hi,

I was wondering if there is any documentation on how to analyze the ARM pipeline. I have access to ThunderX2 nodes, and I'd like to do the kind of bottleneck analysis that can be done on Intel chips. I can get the formulas to compute the different metrics for Skylake here: https://github.com/andikleen/pmu-tools/blob/master/skl_client_ratios.py
I checked the regular Arm docs (ARM Architecture Reference Manual ARMv8, for ARMv8-A architecture profile, and Programmer’s Guide for ARMv8-A) and did not find any information.
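For reference, here is a minimal sketch of the level-1 top-down breakdown as it is commonly computed on Intel Skylake-class cores. The function and parameter names are hypothetical; each parameter stands for a raw counter value, and the comments name the Intel event it is assumed to come from. The pipeline width of 4 is the Skylake value and is also an assumption for any other core. This is not a ThunderX2 formula: the equivalent ThunderX2 events are exactly what this thread is asking about.

```python
def topdown_level1(cycles, uops_not_delivered, uops_issued,
                   uops_retired_slots, recovery_cycles, width=4):
    """Return the four level-1 top-down fractions (Intel-style sketch).

    Assumed Intel Skylake events behind each argument:
      cycles              -> CPU_CLK_UNHALTED.THREAD
      uops_not_delivered  -> IDQ_UOPS_NOT_DELIVERED.CORE
      uops_issued         -> UOPS_ISSUED.ANY
      uops_retired_slots  -> UOPS_RETIRED.RETIRE_SLOTS
      recovery_cycles     -> INT_MISC.RECOVERY_CYCLES
    """
    # Total issue slots available: pipeline width times elapsed core cycles.
    slots = width * cycles

    # Slots where the frontend failed to deliver uops to the backend.
    frontend_bound = uops_not_delivered / slots

    # Slots wasted on uops that never retired, plus recovery bubbles.
    bad_speculation = (uops_issued - uops_retired_slots
                       + width * recovery_cycles) / slots

    # Slots that did useful work.
    retiring = uops_retired_slots / slots

    # Whatever remains is attributed to backend stalls (core or memory).
    backend_bound = 1.0 - (frontend_bound + bad_speculation + retiring)

    return {
        "frontend_bound": frontend_bound,
        "bad_speculation": bad_speculation,
        "retiring": retiring,
        "backend_bound": backend_bound,
    }


if __name__ == "__main__":
    # Made-up counter values, for illustration only.
    result = topdown_level1(cycles=1000, uops_not_delivered=400,
                            uops_issued=2200, uops_retired_slots=2000,
                            recovery_cycles=50)
    for name, fraction in result.items():
        print(f"{name}: {fraction:.2f}")
```

The four fractions sum to 1 by construction, because backend bound is defined as the remainder. Porting this to another core means finding counters that measure the same quantities, which may not exist one-to-one.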

Thanks,

0 Patrick Wohlschlegel over 3 years ago

Hello there,

Thanks for your query. I would be curious to learn a bit more about your application to make sure I point you in the right direction (for instance the programming language you use, the type of application and paradigms you rely on).

If you are working with C/C++, Fortran or Python codes, Arm Forge (and in particular Arm MAP) may be suitable for you. We did a webinar recently about this very topic. I encourage you to have a look here: https://www.brighttalk.com/webcast/17792/384060/top-down-performance-analysis

If you would like to give it a go, you can download a trial version of Forge on this page: https://developer.arm.com/tools-and-software/server-and-hpc/trials

I hope this helps. Let me know if I can be of further assistance.

Patrick

0 YHuerta over 3 years ago in reply to Patrick Wohlschlegel

Thanks for the links. I'll definitely take a look. I have a number of OpenMP benchmarks and I'm trying to understand their bottlenecks on the ThunderX2. I know my way around perf, so I am interested in figuring out which perf events to focus on. I'll take a look at the trial version.

yectli

0 YHuerta over 3 years ago in reply to YHuerta

I saw the video, thanks. I was hoping to find a resource like this:

https://github.com/torvalds/linux/tree/master/tools/perf/pmu-events/arch/x86

There you can get a breakdown and information on how the top-down metrics are computed. I like to understand what is going on. When I check the aarch64 equivalent, it is empty:

https://github.com/torvalds/linux/tree/master/tools/perf/pmu-events/arch/arm64/cavium/thunderx2

Is there a white paper, document, or URL where I could get more info on the ThunderX2 top-down approach? I'm trying to learn more about the pipeline, etc.

Thanks,

yectli

0 YHuerta over 3 years ago in reply to Patrick Wohlschlegel

The application is a set of benchmarks, SPEC OMP2012. The goal is to understand the underlying architecture, bottlenecks, etc. when running different computational kernels.

0 Patrick Wohlschlegel over 3 years ago in reply to YHuerta

Hi Yectli,

Thanks for clarifying. This is an interesting query and I do not believe such a document exists today. I have forwarded your request to my colleague Florent, who created the webinar and is one of our experts on the subject. We will do our best to assist you.

Best regards,

Patrick

0 YHuerta over 3 years ago in reply to Patrick Wohlschlegel

If you share the formulas in this forum, or in an Arm document, people can cite the URL in the references of white papers and other publications. That way people can share knowledge and it becomes easier to profile on aarch64. That is a win for everyone, Arm included. What is needed are the formulas for the main categories and subcategories: people want to see why their code is core bound or memory bound, and whether it is L1 bound, L2 bound, external memory bound, etc.

Thanks
