- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Dear all,
Using mpiifort on a cluster results in : "open_hca: device mlx4_0 not found" for some group nodes while for others there is no error and mpiifort runs perfectly fine. All the nodes have the same hardware/software configuration. I already had a look at the similar topic at :
https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/topic/393416
And applied the proposed solution of commenting the ofa-v2-mlx4_0-1 and ofa-v2-mlx4_0-2 lines in /etc/dat.conf, but it did not solve the issue.
Would you have any idea of what might be wrong ? I attach the error log as well as ibstat output if it can help :
$ ibstat
CA 'mlx4_0'
CA type: MT4099
Number of ports: 1
Firmware version: 2.31.5050
Hardware version: 1
Node GUID: 0xf45214030090c050
System image GUID: 0xf45214030090c053
Port 1:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 1
LMC: 0
SM lid: 1
Capability mask: 0x0251486a
Port GUID: 0xf45214030090c051
Link layer: InfiniBand
Many thanks in advance,
Edrisse535717
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Please test with I_MPI_FABRICS=shm:ofa
--
Dmitry
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Dmitry,
Many thanks for your answer, I confirm you that adding I_MPI_FABRICS=shm:ofa solves the issue.
Best Regards,
Edrisse
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page