
Arm Forge Map fails to build a call stack

Hello to the community,

I want to use Arm Forge Map, which is installed on the cluster I work on. I've used Arm Forge DDT on the same cluster, and I didn't see any bug or problem with DDT there.
There seem to be problems with Arm Forge Map, though. I asked the cluster support team if they knew what was happening; they answered that they were aware of the problem, did not know how to fix it, and, for all I know, they might be in contact with Arm via a support license. But from my side it feels like they are not pursuing the problem with great determination ^^'

So my last hope lies here...
Here is the problem :

Arm Forge Map fails to build a proper call stack.

When running Arm Forge Map, whether from the graphical interface (on a dedicated node of the cluster with X11 forwarding, or with the Arm Forge Client) or from the CLI with `map --profile ...`, everything runs smoothly, without raising any error or warning that seems related to building the call stack. Once Arm Forge Map has finished, it generates a .map file that can be opened to read all the info about the job analysed by Arm Forge Map. And this is where the problem appears: it can read the source files, and it shows the main.cpp file in the code viewer window, but beyond the main function all it knows is a function called <unknown>, in which the code, of course, spends 100% of its time. So you lose quite a lot of what Arm Forge Map has to offer:
_ selecting part of the execution time in the graphs at the top does nothing
_ it does not show how much time is spent on each line, to the left of the corresponding line in the code viewer window

etc...
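For reference, here is roughly what the CLI run looks like (the binary name, arguments, and process count below are placeholders, not my exact command):

```shell
# Hypothetical invocation: profile an MPI run with Arm Forge MAP's
# "express launch" mode, which wraps the usual mpirun command.
map --profile mpirun -n 4 ./my_app

# Opening the resulting .map file afterwards in the GUI is where
# everything beyond main() shows up as <unknown>.
map my_app_profile.map
```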

Here is a screenshot :

And here is some technical info:

_ the cluster runs Arm Forge 22.0.2; the Arm Forge Client is 22.0.4.

_ the cluster runs on RHEL 8.8 (Ootpa)

_ the code is C++, multithreaded with Intel TBB and distributed with Intel MPI.

_ a typical run would be: 2 MPI processes per node, each with one dedicated CPU with 64 physical threads and one dedicated RAM socket of 128 GB
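A sketch of what such a launch looks like with Intel MPI (the binary name and node count are placeholders; the exact scheduler integration differs on my cluster):

```shell
# Hypothetical Intel MPI launch matching the layout above:
# 2 ranks per node on 2 nodes, each rank pinned to one socket
# (64 hardware threads and 128 GB of RAM per socket).
I_MPI_PIN_DOMAIN=socket mpirun -np 4 -ppn 2 ./my_app
```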

If I can give more info or answer questions, I'll be happy to. Thanks to anyone who stops by :)

Cheers,

Thibault