I am currently using Forge MAP to profile my HPC application on several Arm hosts.
It works well on a single node. But with 2 or more nodes, or when I assign the job to other nodes, the MPI job gets stuck.
The same problem appears with the bundled example arm/forge/22.1.3/examples/wave_c. The output log follows:
----------------------
[root@softest001 examples]# map --profile mpirun -np 1 -hosts softest002 ./wave_c
Arm Forge 22.1.3 - Arm MAP
Topology file `/root/.allinea/session/topology-softest001-1-cefd8500' was not created within 30 seconds.
Topology file `/root/.allinea/session/topology-softest001-1-cefd8500' was not created within 1 minute.
MAP: The MPI processes are taking a long time to connect to Arm MAP (softest001:4242) during startup.
MAP:
MAP: Startup has timed out. Aborting.
MAP: Try starting the program at the command prompt then contact the Arm Forge support team for assistance.
MAP: Some processes may remain after aborting and need to be manually killed.
MAP: You can disable this timeout by setting the ALLINEA_NO_TIMEOUT environment variable before you launch Arm Forge.
-------------------------
I have modified system.conf following the "No shared home directory" solution in the user guide, but it did not help.
Does anyone have a clue?
Hi AllanC,
Could you please generate a debug log by adding --debug --log log.xml to the command line and send it to support-hpc-sw@arm.com? Many thanks :D
I have sent the log file. Thanks.
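For anyone who hits the same timeout: the log above says the MPI processes could not connect back to Arm MAP at softest001:4242. One thing that may be worth ruling out is plain TCP reachability from the compute node to that host and port, for example a firewall or a hostname that does not resolve on the remote side. Below is a minimal sketch of such a check, assuming Python 3 is available on the compute node. The helper name can_connect is hypothetical and not part of Forge, and a successful connection only shows that the port is reachable, not that MAP itself is configured correctly.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical diagnostic helper, not part of Arm Forge.
    """
    try:
        # create_connection resolves the hostname and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example, to be run on the compute node (softest002 in the log above)
# while MAP is waiting for connections on the front-end node:
#   print(can_connect("softest001", 4242))
```

If this returns False while MAP is waiting, the problem is likely in the network or name resolution between the nodes rather than in the MPI launch itself.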