I have been testing code built with Intel MPI (version 4.1.3, build 20140226) and the Intel compiler (version 15.0.1, build 20141023). When we attempt to run on 1024 or more total processes, we receive the following error:
MPI startup(): ofa fabric is not available and fallback fabric is not enabled
Runs with fewer than 1024 processes do not produce this error, and I also do not see it with 1024 processes when using OpenMPI and GCC.
I am using the High Performance Conjugate Gradient (HPCG) benchmark as my test code, although we have seen the same error with other test codes.
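For reference, the "fallback fabric is not enabled" part of the message appears to refer to the I_MPI_FALLBACK setting; a re-run with fallback allowed would look roughly like the sketch below (more a diagnostic than a fix, since it only masks the underlying fabric problem):
export I_MPI_FABRICS=shm:ofa
export I_MPI_FALLBACK=1      # allow Intel MPI to fall back to another fabric instead of aborting
srun ../../xhpcg > /dev/null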
Hi Jack,
Could you please provide more details about your MPI runs (IMPI environment variables, command-line options, OS/OFED versions, processor type, InfiniBand adapter name, number of hosts involved, and so on)?
Are you able to run with a newer Intel MPI Library (5.x)?
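For example, output from a run with the debug level raised would show which fabric and provider are actually being selected at startup; a minimal sketch (the launch line below is just a placeholder for your real command):
export I_MPI_DEBUG=5        # print fabric/provider selection during MPI startup
mpirun -n 1024 ./xhpcg      # or your equivalent srun invocation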
Artem,
Absolutely, thank you for the response.
I ran the tests with the following IMPI environment variable set:
I_MPI_FABRICS=shm:ofa
It was submitted through SLURM scheduling with the following batch script:
#!/bin/bash
#SBATCH --job-name=HPCGeval
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=batch
#SBATCH --time=02:00:00
#SBATCH --account=support
#SBATCH --nodes=64
#SBATCH --ntasks-per-node=16
#SBATCH --cpus-per-task=1
#SBATCH --exclusive
#SBATCH --constraint=hpcf2013
export KMP_AFFINITY=compact
export OMP_NUM_THREADS=1
srun ../../xhpcg > /dev/null
OS: Red Hat Enterprise Linux release 6.6 (Santiago); OFED: OFED-1.5.4.1
All tests were run on 64 nodes, each with two Intel Xeon E5-2650 v2 CPUs (16 cores per node) and a QLogic Corp. IBA7322 InfiniBand HCA (rev 02), connected to a QLogic 12800-180 switch.
We rely on another company to handle our Intel MPI licenses and updates, although I believe we will be upgrading to the Intel MPI Library 5.x soon.
Best,
Jack
Hi Jack,
Thanks for the clarification.
As far as I can see, you are using Intel True Scale (aka QLogic) adapters; 'shm:ofa' may work suboptimally on them.
You can use the 'tmi' ('shm:tmi') fabric instead, which is designed for such adapters.
Usage:
export I_MPI_FABRICS=shm:tmi
export TMI_CONFIG=<path_to_impi>/intel64/etc/tmi.conf
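In a SLURM batch script like the one above, that would look roughly as follows (the tmi.conf path is illustrative only and depends on where your Intel MPI installation actually lives):
export I_MPI_FABRICS=shm:tmi
export TMI_CONFIG=/opt/intel/impi/4.1.3/intel64/etc/tmi.conf   # adjust to your actual install path
srun ../../xhpcg > /dev/null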
Artem,
Thank you so much for your help; this solved the issue we were having, as well as another problem we had been seeing!
I'm just curious: do you have any idea why this problem only seemed to surface when going over 1023 processes?
Best,
Jack