Hello,
I'm experimenting with different job launching methods (http://slurm.schedmd.com/mpi_guide.html#intel_mpi) and getting the following error only when I launch a job with srun (my code works fine with mpirun, mpirun -bootstrap=slurm, and mpiexec.hydra) AND use shm:dapl (it works fine with shm:tcp).
If I launch the job with
setenv I_MPI_PMI_LIBRARY /usr/lib64/libpmi.so
setenv I_MPI_FABRICS shm:dapl
srun -n 2 my_exec
I get
1: [1] trying to free memory block that is currently involved to uncompleted data transfer operation
1: free mem - addr=0x2b7a44547f70 len=1146388320
1: RTC entry - addr=0x2b7a4bc93a00 len=1254064 cnt=1
1: Assertion failed in file ../../i_rtc_cache.c at line 1338: 0
1: internal ABORT - process 1
0: [0] trying to free memory block that is currently involved to uncompleted data transfer operation
0: free mem - addr=0x2ab3a253ff90 len=2723413888
0: RTC entry - addr=0x2ab3a7aada80 len=1182864 cnt=1
0: Assertion failed in file ../../i_rtc_cache.c at line 1338: 0
0: internal ABORT - process 0
This error disappears if I set I_MPI_FABRICS to shm:tcp.
So what's the difference between srun and the other launching methods in this regard? I'd like to know whether this can be caused by a bug in my code (in which case I need to fix it), or whether it is just a configuration issue, so that simply avoiding srun would be sufficient.
Hi Seunghwa,
Thanks for getting in touch. This is more likely a configuration error than an issue with your application, although it's likely your application uses more memory than the defaults allow.
In your case, you're saying that using I_MPI_FABRICS=shm:dapl (which is running over your local InfiniBand software stack, likely OFED) works fine when doing mpirun, mpirun -bootstrap=slurm, and mpiexec.hydra. But doing the same with srun causes the "trying to free memory block" errors you see.
The main difference in all of these cases is the launch mechanism. When using mpirun/mpiexec.hydra, you're relying on the Intel MPI Library to start your job using the underlying SLURM startup method. But in the srun case, you're actually asking SLURM to start your MPI job by pulling in the appropriate Intel MPI libs and scripts. So the issue with srun is that some of the defaults on your system might be set differently as compared to when starting up with mpirun.
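To make that concrete, both of the following should start the same 2-rank job, just through different startup paths (this sketch simply reuses the commands from your post):
# Hydra startup: Intel MPI's mpiexec.hydra launches the ranks, using SLURM only underneath
mpirun -n 2 my_exec
# SLURM startup: srun launches the ranks, and Intel MPI talks to SLURM through the PMI library you point it at
setenv I_MPI_PMI_LIBRARY /usr/lib64/libpmi.so
srun -n 2 my_exec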
Do you know if your memory limits are set appropriately? Check out this forum thread which talks about how to set some of these limits. Furthermore, the same error was resolved here by setting log_num_mtt to 24.
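For example, you could check the locked-memory limit your job steps actually inherit and, if you're on a Mellanox mlx4 stack (that part is an assumption on my end, so adjust for your driver), the current log_num_mtt value:
# locked-memory limit as seen inside a job step launched by srun
srun -n 1 sh -c 'ulimit -l'
# current MTT setting for the mlx4 driver
cat /sys/module/mlx4_core/parameters/log_num_mtt
# to raise it, add this line to a file under /etc/modprobe.d/ and reload the mlx4_core module
options mlx4_core log_num_mtt=24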
I hope this helps. Let me know if updating your settings changes the outcome.
Regards,
~Gergana
Thanks Gergana,
The problem disappeared once I removed the "setenv I_MPI_PMI_LIBRARY /usr/lib64/libpmi.so" line and executed with "srun --mpi=pmi2 ..." instead of "srun ...".
For Open MPI, it seems --mpi=pmi2 should be used if PMI2 support is enabled. Is there something similar for Intel MPI?
"If the pmi2 support is enabled then the command line options '--mpi=pmi2' has to be specified on the srun command line." <= from http://slurm.schedmd.com/mpi_guide.html#open_mpi
I am also encountering another problem.
With srun --mpi=pmi2 on 128 or more nodes (1 MPI process per node; there is no error message up to 64 nodes),
I get "slurmstepd: error: tree_msg_to_stepds: host=g161, rc = 1" in MPI_Init_thread(), but the code seems to work fine. With mpirun or mpiexec, MPI_Init_thread() does not print any error message, but MPI communication is much slower.
Any idea?
Thank you very much!!!
-seunghwa
This turned out to be an issue with the system I am using, and it has now been fixed.
Thanks for the support!