
Arm Forge MAP startup times out for multi-node or remote-host runs.

I am currently using Forge MAP to profile my HPC application on several Arm hosts.

It works well on a single node. But with 2 or more nodes, or when I assign the job to other nodes, the MPI job gets stuck.

The same thing happens with the bundled example arm/forge/22.1.3/examples/wave_c. Here is the output log:

----------------------

[root@softest001 examples]# map --profile mpirun -np 1 -hosts softest002 ./wave_c

Arm Forge 22.1.3 - Arm MAP

Topology file `/root/.allinea/session/topology-softest001-1-cefd8500' was not created within 30 seconds.

Topology file `/root/.allinea/session/topology-softest001-1-cefd8500' was not created within 1 minute.

MAP: The MPI processes are taking a long time to connect to Arm MAP (softest001:4242) during startup.

MAP:

MAP: Startup has timed out. Aborting.

MAP:

MAP: Try starting the program at the command prompt then contact the Arm Forge support team for assistance.

MAP:

MAP: Some processes may remain after aborting and need to be manually killed.

MAP:

MAP: You can disable this timeout by setting the ALLINEA_NO_TIMEOUT environment variable before you launch Arm Forge.

-------------------------
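The log shows the MPI processes failing to connect back to Arm MAP at softest001:4242. As a rough way to narrow this down, here is a minimal sketch of a reachability check that could be run on the remote host (softest002) while MAP is waiting. The helper name `can_connect` and the 5-second timeout are assumptions for illustration and are not part of Arm Forge; only the host name and port come from the log.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds.

    Hypothetical helper for illustration; not an Arm Forge API.
    """
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example call, using the host and port reported in the MAP log:
#   can_connect("softest001", 4242)
```

If this returns False on softest002 while MAP is waiting on softest001, that would suggest a firewall or host name resolution problem between the nodes rather than an issue with the application itself.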

I modified system.conf as described in the user guide's solution for "No shared home directory", but that did not help.

Does anyone have a clue what is going wrong?