Is anyone working on adding support for Amazon's EFA libfabric provider? Described here:
They currently have a build of libfabric that supports their 100Gb/s network with OS bypass. I tried getting Intel MPI to use their libfabric, but it doesn't seem to work. I guess Intel MPI needs to be aware of the specific provider? That surprised me a bit since I thought the point of libfabric would be to provide abstracted access to the fabric.
If support for the EFA provider is in progress, I've got an HPC app that could benefit and would be interested in doing some testing.
Yes, we are working on adding support for EFA in Intel MPI. For now there is no out-of-the-box support for the EFA provider, but it can be enabled with a few manual steps. We plan to hide these steps in a future IMPI release.
After installation, it should be enough to run "source /opt/intel/impi/.../intel64/bin/mpivars.sh" and "export LD_LIBRARY_PATH=<ofi_install_path>/lib:$LD_LIBRARY_PATH" to set up the IMPI environment.
To launch an MPI job, first check that you have passwordless access between instances, then use, for example, this command:
I_MPI_DEBUG=1 mpiexec.hydra -n 2 -ppn 1 -f hostfile IMB-MPI1 pingpong
In the logs you should see "MPI startup(): libfabric provider: efa". This means that IMPI is using the expected provider.
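To make that check scriptable, here is a minimal sketch that pulls the provider name out of the startup log. The function name impi_provider is hypothetical; it only assumes the "MPI startup(): libfabric provider: ..." line format quoted above.

```shell
# Hypothetical helper: read an I_MPI_DEBUG log on stdin and print the
# libfabric provider named on the first "MPI startup()" provider line.
impi_provider() {
    sed -n 's/.*MPI startup(): libfabric provider: *//p' | head -n 1
}

# Usage (requires a working IMPI setup, so it is left commented out):
#   I_MPI_DEBUG=1 mpiexec.hydra -n 2 -ppn 1 -f hostfile IMB-MPI1 pingpong 2>&1 | impi_provider
```

If this prints anything other than "efa" (for example "tcp"), the job did not pick up the EFA provider.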
Thanks Mikhail. I'll give that a shot. I've already got a baseline of PingPong benchmark data for Intel MPI without EFA support, so will run that again with the EFA support.
I followed the steps above and everything seemed to install fine, but it isn't quite working. When I run, I can see it is using the libfabric I built, but it uses the tcp provider. The performance is not as good as I was seeing with the production Intel MPI 2019.
I tried forcing it to use the efa provider, but then it errors out:
[ec2-user@ip-172-31-23-250 ~]$ export FI_PROVIDER=efa
[ec2-user@ip-172-31-23-250 ~]$ mpiexec.hydra -n 2 -ppn 1 -hosts 172.31.23.250,172.31.25.29 ./benchmarks/imb_impi/IMB-MPI1 PingPong
MPI startup(): libfabric version: 1.8.0rc1
MPI startup(): libfabric provider: efa;ofi_rxd
Abort(1094799) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(666)......:
MPID_Init(922).............:
MPIDI_NM_mpi_init_hook(987): OFI address vector open failed (ofi_init.h:987:MPIDI_NM_mpi_init_hook:Invalid argument)
Abort(1094799) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(666)......:
MPID_Init(922).............:
MPIDI_NM_mpi_init_hook(987): OFI address vector open failed (ofi_init.h:987:MPIDI_NM_mpi_init_hook:Invalid argument)
I built the OFI libfabric off master, is that the right branch? I already had Intel MPI 2019.u4 installed, so when I ran that install script it just seemed to patch the install I already had. Should I have removed the install before running your install script?
> I built the OFI libfabric off master, is that the right branch?
For now, use the v1.8.0rc1 tag (git checkout v1.8.0rc1). There is work in progress on the libfabric side to fix IMPI/EFA initialization.
I verified 1.8.0rc1 and it works well on IMB, with the expected "efa" provider in the log.
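As a rough sketch of building that tag from source: the function name build_libfabric, the upstream GitHub URL, and the --enable-efa configure switch are assumptions not stated in this thread, and the install prefix is whatever you later use as <ofi_install_path>.

```shell
# Hypothetical helper: clone libfabric, check out the v1.8.0rc1 tag, and
# install it under the given prefix. Runs in a subshell so the caller's
# working directory is left unchanged.
build_libfabric() (
    prefix=${1:?usage: build_libfabric <install_prefix>}
    git clone https://github.com/ofiwg/libfabric.git &&
    cd libfabric &&
    git checkout v1.8.0rc1 &&
    ./autogen.sh &&
    ./configure --prefix="$prefix" --enable-efa &&  # assumed flag; configure should fail if EFA support cannot be built
    make -j"$(nproc)" &&
    make install
)

# Usage (commented out; needs network access and build tools):
#   build_libfabric "$HOME/ofi"
```

After the install, <ofi_install_path>/lib is the directory to prepend to LD_LIBRARY_PATH.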
> I already had Intel MPI 2019.u4 installed, so when I ran that install script it just seemed to patch the install I already had. Should I have removed the install before running your install script?
In this case the install script should patch the existing mpivars.sh. Make sure that mpivars.sh has the new "export ..." lines at the beginning (from section 3 of the install script), for example MPIR_CVAR_CH4_OFI_ENABLE_ATOMICS and I_MPI_TUNING_BIN.
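A quick way to confirm that the right mpivars.sh was patched is to grep it for those two variables. This is only a sketch: check_mpivars is a hypothetical name, and it checks just the two variables named above, not every line the install script may add.

```shell
# Hypothetical helper: report whether the given mpivars.sh exports the two
# variables mentioned above. Returns non-zero if either one is missing.
check_mpivars() {
    file=$1
    missing=0
    for var in MPIR_CVAR_CH4_OFI_ENABLE_ATOMICS I_MPI_TUNING_BIN; do
        if grep -Eq "^[[:space:]]*export[[:space:]]+${var}=" "$file"; then
            echo "found:   $var"
        else
            echo "missing: $var"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (commented out; the path depends on your installation):
#   check_mpivars /opt/intel/impi/.../intel64/bin/mpivars.sh
```

If more than one Intel MPI version is installed, run it against each version's mpivars.sh to see which one the script actually modified.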
Thanks, I got it working. The problem on my first attempt was that I had both Intel MPI 2018 and 2019 installed, and the script patched mpivars.sh from the 2018 version by accident. Once I fixed that, it all worked as expected. The latency looks better now: ~16us instead of ~25us with sockets. The bandwidth looks a little lower with EFA though, ~10% less than I measured with sockets in the pingpong test.
Looks like I get about the same pingpong results with Intel MPI+EFA as I get with OpenMPI+EFA using the Amazon stack, which I guess isn't surprising. The big advantage to Intel MPI for me is that it supports shared mem for intra node, which apparently Amazon's OpenMPI+EFA does not yet.
Thanks for the help. I'll see if my HPC app runs faster now!