Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

When I_MPI_FABRICS=shm, MPI_Bcast can't handle messages larger than 64 KB

杨_栋_
Beginner
I am running MPI on a single workstation (2 x E5-2690). When I export I_MPI_FABRICS=shm, MPI_Bcast hangs for messages larger than 64 KB. But when I export I_MPI_FABRICS={shm,tcp}, everything is fine. Is there a size limit for shm? Can I adjust it?
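For reference, this is roughly how I switch between the two cases on one node (a minimal sketch; the shm:tcp form shown for the second case is the documented intra-node:inter-node syntax and may differ slightly from the exact value I exported):

# shared memory only: MPI_Bcast hangs once the message exceeds ~64 KB
export I_MPI_FABRICS=shm
mpirun -n 3 ./a.out

# shared memory within the node, TCP as the inter-node fabric: works in my tests
export I_MPI_FABRICS=shm:tcp
mpirun -n 3 ./a.out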
5 Replies
Zhen_Z_Intel
Employee

Dear customer,

Your question is related to Intel MPI rather than MKL, so I will transfer your thread to the Intel MPI forum. Thank you.

Best regards,
Fiona

杨_栋_
Beginner

Fiona Z. (Intel) wrote:

Dear customer,

Your question is related to Intel MPI rather than MKL, so I will transfer your thread to the Intel MPI forum. Thank you.

Best regards,
Fiona

Thank you!

James_S
Employee

Hi Dong,

Which OS and Intel MPI version are you using? Could you please send me your MPI environment settings and the debug output produced with I_MPI_DEBUG=6 exported? Thanks.

Best Regards,

Zhuowei

杨_栋_
Beginner

Si, Zhuowei wrote:

Hi Dong,

Which OS and Intel MPI version are you using? Could you please send me your MPI environment settings and the debug output produced with I_MPI_DEBUG=6 exported? Thanks.

Best Regards,

Zhuowei

Hello Zhuowei,

Thanks for your help!

I tested my code on two workstations. One runs Ubuntu 16.04 LTS and the other runs Debian GNU/Linux 8.

Both workstations use Intel(R) MPI Library 2017 Update 2 for Linux.

My MPI environment is set in .bashrc like this:

export PATH=$PATH:/opt/intel/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/intel/mkl/lib/intel64:/opt/intel/lib/intel64
source /opt/intel/bin/compilervars.sh intel64
source /opt/intel/mkl/bin/mklvars.sh intel64
export INTEL_LICENSE_FILE=/opt/intel/licenses

This is my C++ code:

#include <mpi.h>
#include <cstdio>
#include <iostream>
#include <unistd.h>

int main(int argc, char *argv[])
{
  MPI_Init(&argc, &argv);

  int processor_id_temp;
  MPI_Comm_rank(MPI_COMM_WORLD, &processor_id_temp);
  const int processor_id = processor_id_temp;

  // BCAST_SIZE is defined at compile time, e.g. -DBCAST_SIZE=131072
  char *const buf = new char[BCAST_SIZE];
  sprintf(buf, "Hello! (from processor id %d)", processor_id);

  // Rank 0 forms its own communicator; all other ranks share a second one.
  const int color = (processor_id > 0 ? 1 : 0);

  MPI_Comm MPI_COMM_TEST;
  MPI_Comm_split(MPI_COMM_WORLD, color, processor_id, &MPI_COMM_TEST);

  // Broadcast BCAST_SIZE bytes from rank 0 of each split communicator.
  MPI_Bcast(buf, BCAST_SIZE, MPI_CHAR, 0, MPI_COMM_TEST);

  // Stagger the output so the lines from different ranks do not interleave.
  usleep(processor_id * 10000);

  std::cout << "processor id " << processor_id
            << ", color " << color
            << ": " << buf << std::endl;

  delete[] buf;

  MPI_Finalize();
  return 0;
}

This is the result on the Ubuntu workstation:

$ export I_MPI_FABRICS=shm
$ export I_MPI_DEBUG=6
$ for size in 32768 131072; do mpiicpc -DBCAST_SIZE=${size} mpi_comm_split.cpp; mpirun -n 3 ./a.out; echo; done
[0] MPI startup(): Intel(R) MPI Library, Version 2017 Update 2  Build 20170125 (id: 16752)
[0] MPI startup(): Copyright (C) 2003-2017 Intel Corporation.  All rights reserved.
[0] MPI startup(): Multi-threaded optimized library
[0] MPI startup(): shm data transfer mode
[1] MPI startup(): shm data transfer mode
[2] MPI startup(): shm data transfer mode
[0] MPI startup(): Device_reset_idx=8
[0] MPI startup(): Allgather: 3: 0-0 & 0-2147483647
[0] MPI startup(): Allgather: 1: 1-6459 & 0-2147483647
[0] MPI startup(): Allgather: 5: 6460-14628 & 0-2147483647
[0] MPI startup(): Allgather: 1: 14629-25466 & 0-2147483647
[0] MPI startup(): Allgather: 3: 25467-36131 & 0-2147483647
[0] MPI startup(): Allgather: 5: 0-2147483647 & 0-2147483647
[0] MPI startup(): Allgatherv: 1: 0-7199 & 0-2147483647
[0] MPI startup(): Allgatherv: 3: 0-2147483647 & 0-2147483647
[0] MPI startup(): Allreduce: 7: 0-4 & 0-2147483647
[0] MPI startup(): Allreduce: 1: 5-8 & 0-2147483647
[0] MPI startup(): Allreduce: 7: 9-32 & 0-2147483647
[0] MPI startup(): Allreduce: 1: 33-64 & 0-2147483647
[0] MPI startup(): Allreduce: 7: 65-341 & 0-2147483647
[0] MPI startup(): Allreduce: 1: 342-6656 & 0-2147483647
[0] MPI startup(): Allreduce: 7: 6657-8192 & 0-2147483647
[0] MPI startup(): Allreduce: 2: 8193-113595 & 0-2147483647
[0] MPI startup(): Allreduce: 7: 113596-132320 & 0-2147483647
[0] MPI startup(): Allreduce: 2: 132321-1318322 & 0-2147483647
[0] MPI startup(): Allreduce: 7: 0-2147483647 & 0-2147483647
[0] MPI startup(): Alltoall: 3: 0-25 & 0-2147483647
[0] MPI startup(): Alltoall: 4: 26-37 & 0-2147483647
[0] MPI startup(): Alltoall: 3: 38-1024 & 0-2147483647
[0] MPI startup(): Alltoall: 4: 1025-4096 & 0-2147483647
[0] MPI startup(): Alltoall: 2: 4097-70577 & 0-2147483647
[0] MPI startup(): Alltoall: 4: 0-2147483647 & 0-2147483647
[0] MPI startup(): Alltoallv: 1: 0-2147483647 & 0-2147483647
[0] MPI startup(): Alltoallw: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Barrier: 2: 0-2147483647 & 0-2147483647
[0] MPI startup(): Bcast: 1: 0-0 & 0-2147483647
[0] MPI startup(): Bcast: 8: 1-12746 & 0-2147483647
[0] MPI startup(): Bcast: 1: 12747-42366 & 0-2147483647
[0] MPI startup(): Bcast: 7: 0-2147483647 & 0-2147483647
[0] MPI startup(): Exscan: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Gather: 1: 0-0 & 0-2147483647
[0] MPI startup(): Gather: 3: 0-2147483647 & 0-2147483647
[0] MPI startup(): Gatherv: 1: 0-2147483647 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 4: 0-5 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 1: 6-128 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 3: 129-89367 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 2: 0-2147483647 & 0-2147483647
[0] MPI startup(): Reduce: 1: 0-0 & 0-2147483647
[0] MPI startup(): Reduce: 7: 1-39679 & 0-2147483647
[0] MPI startup(): Reduce: 1: 0-2147483647 & 0-2147483647
[0] MPI startup(): Scan: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Scatter: 1: 0-0 & 0-2147483647
[0] MPI startup(): Scatter: 3: 0-2147483647 & 0-2147483647
[0] MPI startup(): Scatterv: 0: 0-2147483647 & 0-2147483647
[1] MPI startup(): Recognition=2 Platform(code=32 ippn=2 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[2] MPI startup(): Recognition=2 Platform(code=32 ippn=2 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       4440     yd-ws1     {0,4}
[0] MPI startup(): 1       4441     yd-ws1     {1,5}
[0] MPI startup(): 2       4442     yd-ws1     {2,6}
[0] MPI startup(): Recognition=2 Platform(code=32 ippn=2 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[0] MPI startup(): I_MPI_DEBUG=6
[0] MPI startup(): I_MPI_FABRICS=shm
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_NUM=1
[0] MPI startup(): I_MPI_PIN_MAPPING=3:0 0,1 1,2 2
processor id 0, color 0: Hello! (from processor id 0)
processor id 1, color 1: Hello! (from processor id 1)
processor id 2, color 1: Hello! (from processor id 1)

[0] MPI startup(): Intel(R) MPI Library, Version 2017 Update 2  Build 20170125 (id: 16752)
[0] MPI startup(): Copyright (C) 2003-2017 Intel Corporation.  All rights reserved.
[0] MPI startup(): Multi-threaded optimized library
[0] MPI startup(): shm data transfer mode
[1] MPI startup(): shm data transfer mode
[2] MPI startup(): shm data transfer mode
[0] MPI startup(): Device_reset_idx=8
[0] MPI startup(): Allgather: 3: 0-0 & 0-2147483647
[0] MPI startup(): Allgather: 1: 1-6459 & 0-2147483647
[0] MPI startup(): Allgather: 5: 6460-14628 & 0-2147483647
[0] MPI startup(): Allgather: 1: 14629-25466 & 0-2147483647
[0] MPI startup(): Allgather: 3: 25467-36131 & 0-2147483647
[0] MPI startup(): Allgather: 5: 0-2147483647 & 0-2147483647
[0] MPI startup(): Allgatherv: 1: 0-7199 & 0-2147483647
[0] MPI startup(): Allgatherv: 3: 0-2147483647 & 0-2147483647
[0] MPI startup(): Allreduce: 7: 0-4 & 0-2147483647
[0] MPI startup(): Allreduce: 1: 5-8 & 0-2147483647
[0] MPI startup(): Allreduce: 7: 9-32 & 0-2147483647
[0] MPI startup(): Allreduce: 1: 33-64 & 0-2147483647
[0] MPI startup(): Allreduce: 7: 65-341 & 0-2147483647
[0] MPI startup(): Allreduce: 1: 342-6656 & 0-2147483647
[0] MPI startup(): Allreduce: 7: 6657-8192 & 0-2147483647
[0] MPI startup(): Allreduce: 2: 8193-113595 & 0-2147483647
[0] MPI startup(): Allreduce: 7: 113596-132320 & 0-2147483647
[0] MPI startup(): Allreduce: 2: 132321-1318322 & 0-2147483647
[0] MPI startup(): Allreduce: 7: 0-2147483647 & 0-2147483647
[0] MPI startup(): Alltoall: 3: 0-25 & 0-2147483647
[0] MPI startup(): Alltoall: 4: 26-37 & 0-2147483647
[0] MPI startup(): Alltoall: 3: 38-1024 & 0-2147483647
[0] MPI startup(): Alltoall: 4: 1025-4096 & 0-2147483647
[0] MPI startup(): Alltoall: 2: 4097-70577 & 0-2147483647
[0] MPI startup(): Alltoall: 4: 0-2147483647 & 0-2147483647
[0] MPI startup(): Alltoallv: 1: 0-2147483647 & 0-2147483647
[0] MPI startup(): Alltoallw: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Barrier: 2: 0-2147483647 & 0-2147483647
[0] MPI startup(): Bcast: 1: 0-0 & 0-2147483647
[0] MPI startup(): Bcast: 8: 1-12746 & 0-2147483647
[0] MPI startup(): Bcast: 1: 12747-42366 & 0-2147483647
[0] MPI startup(): Bcast: 7: 0-2147483647 & 0-2147483647
[0] MPI startup(): Exscan: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Gather: 1: 0-0 & 0-2147483647
[0] MPI startup(): Gather: 3: 0-2147483647 & 0-2147483647
[0] MPI startup(): Gatherv: 1: 0-2147483647 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 4: 0-5 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 1: 6-128 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 3: 129-89367 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 2: 0-2147483647 & 0-2147483647
[0] MPI startup(): Reduce: 1: 0-0 & 0-2147483647
[0] MPI startup(): Reduce: 7: 1-39679 & 0-2147483647
[0] MPI startup(): Reduce: 1: 0-2147483647 & 0-2147483647
[0] MPI startup(): Scan: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Scatter: 1: 0-0 & 0-2147483647
[0] MPI startup(): Scatter: 3: 0-2147483647 & 0-2147483647
[0] MPI startup(): Scatterv: 0: 0-2147483647 & 0-2147483647
[1] MPI startup(): Recognition=2 Platform(code=32 ippn=2 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[2] MPI startup(): Recognition=2 Platform(code=32 ippn=2 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       4468     yd-ws1     {0,4}
[0] MPI startup(): 1       4469     yd-ws1     {1,5}
[0] MPI startup(): 2       4470     yd-ws1     {2,6}
[0] MPI startup(): Recognition=2 Platform(code=32 ippn=2 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[0] MPI startup(): I_MPI_DEBUG=6
[0] MPI startup(): I_MPI_FABRICS=shm
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_NUM=1
[0] MPI startup(): I_MPI_PIN_MAPPING=3:0 0,1 1,2 2
processor id 0, color 0: Hello! (from processor id 0)

When BCAST_SIZE=131072, ranks 1 and 2 never reached the std::cout statement and had to be stopped with Ctrl+C.

James_S
Employee

Hi Dong,

Could you please try setting I_MPI_SHM_FBOX / I_MPI_SHM_LMT (https://software.intel.com/en-us/node/528902?language=es) and check whether that helps with the hang?
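For example, something like the following before rerunning (the values are only illustrative; please check the linked page for the settings supported by your Intel MPI version):

export I_MPI_FABRICS=shm
export I_MPI_DEBUG=6
# large-message transfer mechanism over shared memory
export I_MPI_SHM_LMT=shm
# turn the shared-memory fast boxes off for comparison
export I_MPI_SHM_FBOX=disable
mpirun -n 3 ./a.out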

Best Regards,

Zhuowei
