I am currently using Forge MAP to profile my HPC application on several ARM hosts.
It works well on a single node. But with 2 or more nodes, or when I assign the job to a node other than the local one, the MPI job gets stuck.
The same problem occurs with the bundled example arm/forge/22.1.3/examples/wave_c. The output log is as follows:
----------------------
[root@softest001 examples]# map --profile mpirun -np 1 -hosts softest002 ./wave_c
Arm Forge 22.1.3 - Arm MAP
Topology file `/root/.allinea/session/topology-softest001-1-cefd8500' was not created within 30 seconds.
Topology file `/root/.allinea/session/topology-softest001-1-cefd8500' was not created within 1 minute.
MAP: The MPI processes are taking a long time to connect to Arm MAP (softest001:4242) during startup.
MAP:
MAP: Startup has timed out. Aborting.
MAP: Try starting the program at the command prompt then contact the Arm Forge support team for assistance.
MAP: Some processes may remain after aborting and need to be manually killed.
MAP: You can disable this timeout by setting the ALLINEA_NO_TIMEOUT environment variable before you launch Arm Forge.
-------------------------
I have modified system.conf following the "No shared home directory" solution in the user guide, but it didn't help.
Does anyone have a clue?
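
The timeout message names softest001:4242 as the address the MPI processes fail to reach, so one thing worth ruling out is plain network reachability from the compute node. The following is a minimal sketch, assuming bash with /dev/tcp support and the coreutils timeout command. check_port is a hypothetical helper name, and the port is presumably only open while MAP is still waiting for connections, so the check would have to run during that window.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if a TCP connection to host:port
# can be opened within 5 seconds, non-zero otherwise.
check_port() {
    local host="$1" port="$2"
    timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example, to be run on softest002 while MAP is waiting on softest001:
#   check_port softest001 4242 && echo "reachable" || echo "not reachable"
```

If the port is not reachable, a firewall rule or a host name resolution problem between the nodes would be a plausible cause, although this is a guess and not a confirmed diagnosis.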
Hi AllanC,
Can you please generate a debug log by adding --debug --log log.xml to the command line and send it to support-hpc-sw@arm.com? Many thanks :D
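
For reference, the launch command from the original post with those two options added might look like the following. The position of the options (directly after map, before mpirun) is an assumption, and log.xml would be written to the current working directory.

```shell
# Same launch as in the original post, with debug logging enabled.
# Assumption: the logging options go before the mpirun launch line.
map --profile --debug --log log.xml \
    mpirun -np 1 -hosts softest002 ./wave_c
```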
I have sent the log file. Thanks.