Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

SLURM + Intel MPI cannot use InfiniBand, only Ethernet (QLogic/Intel switch and interfaces)

Zeljko_M_
Beginner

Hi guys,

I have a SLURM cluster set up with Intel MPI and Ansys CFX.

Here are my settings for the jobs:

export I_MPI_DEBUG=5
export PSM_SHAREDCONTEXTS=1
export PSM_RANKS_PER_CONTEXT=4
export TMI_CONFIG=/etc/tmi.conf
export IPATH_NO_CPUAFFINITY=1
export I_MPI_DEVICE=rddsm
export I_MPI_FALLBACK_DEVICE=disable
export I_MPI_PLATFORM=bdw
export SLURM_CPU_BIND=none
export I_MPI_FABRICS=shm:tmi
export I_MPI_TMI_PROVIDER=psm
export I_MPI_FALLBACK=1

I also have the Intel MPI 5.0.3 module loaded, under CentOS 7.
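For context, the job is submitted roughly like this (a simplified sketch; the module name and the solver launch line are placeholders, not my exact commands):

#!/bin/bash
#SBATCH --job-name=cfx_run
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8

# placeholder name for our local Intel MPI 5.0.3 module
module load intelmpi/5.0.3

# ... the export block from above goes here ...

# placeholder for the actual Ansys CFX solver start; it runs through Intel MPI's mpirun
mpirun -n $SLURM_NTASKS ./start_cfx_solver.sh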

The simulation starts, but the traffic does not go through the ib0 interfaces.

This is the output from the debug:

[0] MPI startup(): Multi-threaded optimized library
[0] MPI startup(): shm and tmi data transfer modes
[8] MPI startup(): shm and tmi data transfer modes
[2] MPI startup(): shm and tmi data transfer modes
[10] MPI startup(): shm and tmi data transfer modes
[4] MPI startup(): shm and tmi data transfer modes
[12] MPI startup(): shm and tmi data transfer modes
[1] MPI startup(): shm and tmi data transfer modes
[9] MPI startup(): shm and tmi data transfer modes
[3] MPI startup(): shm and tmi data transfer modes
[15] MPI startup(): shm and tmi data transfer modes
[6] MPI startup(): shm and tmi data transfer modes
[14] MPI startup(): shm and tmi data transfer modes
[5] MPI startup(): shm and tmi data transfer modes
[11] MPI startup(): shm and tmi data transfer modes
[7] MPI startup(): shm and tmi data transfer modes
[13] MPI startup(): shm and tmi data transfer modes
[0] MPI startup(): Rank    Pid      Node name                 Pin cpu
[0] MPI startup(): 0       12614    qingclinf-01.hpc.cluster  {0,1,2,20,21}
[0] MPI startup(): 1       12615    qingclinf-01.hpc.cluster  {3,4,22,23,24}
[0] MPI startup(): 2       12616    qingclinf-01.hpc.cluster  {5,6,7,25,26}
[0] MPI startup(): 3       12617    qingclinf-01.hpc.cluster  {8,9,27,28,29}
[0] MPI startup(): 4       12618    qingclinf-01.hpc.cluster  {10,11,12,30,31}
[0] MPI startup(): 5       12619    qingclinf-01.hpc.cluster  {13,14,32,33,34}
[0] MPI startup(): 6       12620    qingclinf-01.hpc.cluster  {15,16,17,35,36}
[0] MPI startup(): 7       12621    qingclinf-01.hpc.cluster  {18,19,37,38,39}
[0] MPI startup(): 8       12441    qingclinf-02.hpc.cluster  {0,1,2,20,21}
[0] MPI startup(): 9       12442    qingclinf-02.hpc.cluster  {3,4,22,23,24}
[0] MPI startup(): 10      12443    qingclinf-02.hpc.cluster  {5,6,7,25,26}
[0] MPI startup(): 11      12444    qingclinf-02.hpc.cluster  {8,9,27,28,29}
[0] MPI startup(): 12      12445    qingclinf-02.hpc.cluster  {10,11,12,30,31}
[0] MPI startup(): 13      12446    qingclinf-02.hpc.cluster  {13,14,32,33,34}
[0] MPI startup(): 14      12447    qingclinf-02.hpc.cluster  {15,16,17,35,36}
[0] MPI startup(): 15      12448    qingclinf-02.hpc.cluster  {18,19,37,38,39}
[0] MPI startup(): I_MPI_DEBUG=5
[0] MPI startup(): I_MPI_FABRICS=shm:tmi
[0] MPI startup(): I_MPI_FALLBACK=1
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_DIST=10,21,21,10
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_MAP=qib0:0
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_NUM=2
[0] MPI startup(): I_MPI_PIN_MAPPING=8:0 0,1 3,2 5,3 8,4 10,5 13,6 15,7 18
[0] MPI startup(): I_MPI_PLATFORM=auto
[0] MPI startup(): I_MPI_TMI_PROVIDER=psm

 

But there is no traffic over InfiniBand; here is the ifconfig output for the ib0 interface:

        inet 10.0.2.1  netmask 255.255.255.0  broadcast 10.0.2.255
        inet6 fe80::211:7500:6e:de10  prefixlen 64  scopeid 0x20<link>
Infiniband hardware address can be incorrect! Please read BUGS section in ifconfig(8).
        infiniband 80:00:00:03:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  txqueuelen 256  (InfiniBand)
        RX packets 121  bytes 23835 (23.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 118  bytes 22643 (22.1 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

 

I have run chmod 666 on /dev/ipath and /dev/infiniband* on the compute nodes.

The /etc/tmi.conf file contains the library entry.
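For reference, the PSM entry in it looks something like this (the exact version number and library path may differ on your system):

psm 1.2 libtmip_psm.so " "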

Why do the simulations run OK but not communicate over InfiniBand? I can ping and ssh over InfiniBand, but MPI cannot use it.

Thanks in advance.

6 Replies
Zeljko_M_
Beginner

Greetings,

I have now configured DAPL.

The simulation starts and I get the following messages, but there is still no traffic over InfiniBand:

[12] MPI startup(): DAPL provider ofa-v2-qib0-1s
[6] MPI startup(): shm and dapl data transfer modes
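The provider name in that message should correspond to an entry in /etc/dat.conf, so a quick sanity check (assuming the default configuration file location) is:

grep ofa-v2-qib0-1s /etc/dat.conf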

 

Gregg_S_Intel
Employee

Try removing these: I_MPI_DEVICE, I_MPI_FALLBACK_DEVICE

 

And set I_MPI_FALLBACK to 0.
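In other words, something like this in the job environment (keeping your existing shm:tmi/psm settings):

unset I_MPI_DEVICE
unset I_MPI_FALLBACK_DEVICE
export I_MPI_FABRICS=shm:tmi
export I_MPI_TMI_PROVIDER=psm
export I_MPI_FALLBACK=0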

Zeljko_M_
Beginner

I have tried this, and also DAPL, TMI, and so on. It is strange: the OFED stack is installed, ibstatus looks OK, and the nodes can ping each other, but the applications will not communicate. Are there some settings in /etc/infiniband/openibd.conf that I need to make?
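One thing I could still try is to test the link outside of MPI with the OFED perftest tools (assuming the perftest package is installed), for example:

# on qingclinf-01 (server side)
ib_write_bw
# on qingclinf-02, pointing at the server's IPoIB address shown above
ib_write_bw 10.0.2.1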

Gregg_S_Intel
Employee

The I_MPI_DEVICE setting conflicts with the  I_MPI_FABRICS setting.  (I_MPI_DEVICE is deprecated.)

What error message do you get when MPI is forced not to fall back? (If it ran, something else is going on; maybe Ansys is overriding your settings.)

Try running the Intel MPI Benchmarks -- IMB-MPI1 is in the same directory as mpirun.
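For example, reusing the two nodes from your debug output (the process counts are just illustrative):

I_MPI_FABRICS=shm:tmi I_MPI_FALLBACK=0 mpirun -n 16 -ppn 8 -hosts qingclinf-01.hpc.cluster,qingclinf-02.hpc.cluster IMB-MPI1 PingPong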

 

Zeljko_M_
Beginner

Hey guys,

I think that the HPC cluster was working from the start.

When I use perfquery (the native InfiniBand port counters), I can see that packets are going through, but I cannot see them with ifconfig.

I think the TCP/IP stack (IPoIB) is meant for applications that want to use sockets over InfiniBand, whereas the solver has recognized the InfiniBand hardware and uses it natively, so the traffic goes through the InfiniBand stack without showing up on the ib0 interface.

I do not get an error. The simulation runs, and I get this:

[11] MPI startup(): shm and tmi data transfer modes

So although I see some data over Ethernet, I think those are only sockets on the Ethernet IPs used for exchanging control information, while the actual calculation traffic between the MPI processes goes over the InfiniBand interfaces natively.

How could I also confirm / benchmark this?

 perfquery
# Port counters: Lid 1 port 1 (CapMask: 0x200)
PortSelect:......................1
CounterSelect:...................0x0000
SymbolErrorCounter:..............0
LinkErrorRecoveryCounter:........0
LinkDownedCounter:...............0
PortRcvErrors:...................0
PortRcvRemotePhysicalErrors:.....0
PortRcvSwitchRelayErrors:........0
PortXmitDiscards:................0
PortXmitConstraintErrors:........0
PortRcvConstraintErrors:.........0
CounterSelect2:..................0x00
LocalLinkIntegrityErrors:........0
ExcessiveBufferOverrunErrors:....0
VL15Dropped:.....................0
PortXmitData:....................252512975
PortRcvData:.....................244666221
PortXmitPkts:....................1301352
PortRcvPkts:.....................1308427
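One way I could make this more concrete: snapshot the port counters, run a short benchmark with fallback disabled, and compare (the mpirun line is only illustrative):

perfquery > counters_before.txt
I_MPI_FABRICS=shm:tmi I_MPI_FALLBACK=0 mpirun -n 16 -ppn 8 -hosts qingclinf-01.hpc.cluster,qingclinf-02.hpc.cluster IMB-MPI1 PingPong
perfquery > counters_after.txt
diff counters_before.txt counters_after.txt   # PortXmitData/PortRcvData should jump if the traffic really goes over InfiniBand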

 

Gregg_S_Intel
Employee

If it runs with I_MPI_FABRICS=shm:tmi and I_MPI_FALLBACK=0, then the messages are going over InfiniBand.
