David_Race
Beginner

Dual Rail Performance

Hello,
I have a dual-rail FDR system with Sandy Bridge nodes. I have 16 processes assigned to each of four nodes and I am testing all-to-all performance. I have enabled dual rail with:
export I_MPI_OFA_NUM_ADAPTERS=2
export I_MPI_OFA_RAIL_SCHEDULER=ROUND_ROBIN
The ibstat shows
CA 'mlx4_0'
    CA type: MT4099
    Number of ports: 1
    Firmware version: 2.10.2370
    Hardware version: 0
    Node GUID: 0x001e6703003dd888
    System image GUID: 0x001e6703003dd88b
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 56
        Base lid: 78
        LMC: 0
        SM lid: 3
        Capability mask: 0x02514868
        Port GUID: 0x001e6703003dd889
        Link layer: InfiniBand
CA 'mlx4_1'
    CA type: MT4099
    Number of ports: 1
    Firmware version: 2.10.700
    Hardware version: 0
    Node GUID: 0x0002c90300333e90
    System image GUID: 0x0002c90300333e93
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 56
        Base lid: 779
        LMC: 0
        SM lid: 618
        Capability mask: 0x02514868
        Port GUID: 0x0002c90300333e91
        Link layer: InfiniBand
Where does Intel MPI pick up the name of the adapters to use for dual rail mode?
Thanks
David Race
4 Replies
Dmitry_K_Intel2
Employee

Hi David,
Intel MPI calls an IB verbs function to get the list of available IB devices. So Intel MPI doesn't work with adapter names and doesn't try to read any configuration file; that is handled by the libibverbs library.

Regards!
Dmitry
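
To see the device list that the verbs layer reports on a node, something like the following sketch may help. It assumes the `ibv_devices` utility that ships with libibverbs is installed; if it is not, the sketch falls back to listing `/sys/class/infiniband`. The function name `list_ib_devices` is only illustrative.

```shell
# Print the InfiniBand devices visible on this host.
list_ib_devices() {
    if command -v ibv_devices >/dev/null 2>&1; then
        # ibv_devices enumerates devices through the verbs library.
        ibv_devices || echo "ibv_devices reported an error"
    elif [ -d /sys/class/infiniband ]; then
        # Fallback: the kernel exposes one directory per device here.
        ls /sys/class/infiniband
    else
        echo "no InfiniBand devices visible on this host"
    fi
}

list_ib_devices
```

On the system described above, both mlx4_0 and mlx4_1 would be expected to appear in that list.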
David_Race
Beginner

I set I_MPI_DEBUG=10, then received
[16] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[16] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
[28] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[28] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
[32] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[32] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
[35] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[35] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
[36] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[36] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
[37] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[37] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
[38] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[40] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[40] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
[41] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[41] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
[42] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[42] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
[43] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[43] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
[48] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[48] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
From this and the Intel MPI documentation, does this mean I need an entry for ofa-v2-mlx4_1-1 in the /etc/dat.conf file?
I only have an entry for the first IB device in the /etc/dat.conf file.
Is this correct?
Thanks
David
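
To check which DAPL providers are actually registered on a node, a small helper along these lines could be used. It is a sketch that assumes the usual dat.conf layout, where the provider name (for example ofa-v2-mlx4_0-1) is the first whitespace-separated field of each non-comment line; the function name `list_dat_providers` is made up for illustration. It only reports what is in the file and says nothing about whether a fabric supports multiple rails.

```shell
# List the provider names (first field) registered in a DAT configuration file.
# Comment lines and blank lines are skipped.
list_dat_providers() {
    conf="${1:-/etc/dat.conf}"
    if [ -r "$conf" ]; then
        awk '!/^[[:space:]]*#/ && NF { print $1 }' "$conf"
    else
        echo "cannot read $conf" >&2
        return 1
    fi
}

# Show what is registered; the fallback message is printed when the file is absent.
list_dat_providers /etc/dat.conf || echo "no readable /etc/dat.conf on this host"
```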
karl_lehnberger
Beginner

Hi David,

afaik, you cannot use dapl for multirail. You'll have to use the shm:ofa fabric instead.

cheers.
Dmitry_K_Intel2
Employee

By default, Intel MPI uses the shm:dapl fabric.
To enable the OFA fabric, either set the environment variable I_MPI_FABRICS=shm:ofa or add the option '-genv I_MPI_FABRICS shm:ofa' to your mpirun command.

The multi-rail feature is available with the ofa fabric only, and only with versions 4.0.x.


Regards!
Dmitry
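
Putting the settings from this thread together, a minimal sketch of the environment for a dual-rail run might look like the following. The variable names and values are the ones quoted in the posts above. The mpirun line is left as a comment because the benchmark binary name (./alltoall_test) is a placeholder, not something given in the thread; the rank count of 64 follows from 16 processes on each of four nodes.

```shell
# Select the shm:ofa fabric instead of the default shm:dapl.
export I_MPI_FABRICS=shm:ofa
# Dual-rail settings from the original post.
export I_MPI_OFA_NUM_ADAPTERS=2
export I_MPI_OFA_RAIL_SCHEDULER=ROUND_ROBIN
# Optional: keep the debug output enabled to confirm which fabric is opened.
export I_MPI_DEBUG=10

# Print the I_MPI_* settings that mpirun will inherit.
env | grep '^I_MPI_' | sort

# Hypothetical launch line (binary name is a placeholder):
# mpirun -n 64 ./alltoall_test
```

With I_MPI_DEBUG set, the startup messages should no longer refer to the DAPL provider if the OFA fabric was selected.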