## Summary
Setting `I_MPI_FILESYSTEM_FORCE=ufs` with shared-file MPI-IO (N-to-1 pattern) on NFS filesystems causes `ADIO_OPEN` failures and `MPI_Abort` crashes on Intel MPI 2021.17 (oneAPI 2025.3). Additionally, Intel MPI 2021.11 (oneAPI 2024.0) exhibits shared-file ROMIO failures at multi-node scale regardless of environment variable settings. The upstream ROMIO variable `ROMIO_FSTYPE_FORCE="ufs:"` works correctly on 2021.17 as a workaround, but does not resolve the 2021.11 failures.
## Environment
- **OS:** Rocky Linux 8.9
- **Kernel:** 4.18.0 (x86_64)
- **Compiler:** Intel oneAPI 2022 (icc 2021.6.0 / ifort 2021.6.0)
- **Intel MPI versions tested:**
- 2021.10 (oneAPI 2024.0)
- 2021.11 (oneAPI 2024.0) — `Intel(R) MPI Library for Linux* OS, Version 2021.11`
- 2021.17 (oneAPI 2025.3) — `Intel(R) MPI Library for Linux* OS, Version 2021.17 Build 20251215`
- **Filesystem:** NFS v3 over RDMA, `nconnect=32`
- **Cluster:** Up to 8 nodes, 28 cores/node (up to 224 MPI ranks), InfiniBand interconnect (Mellanox ConnectX-6)
- **Job scheduler:** SLURM 25.11.0
- **Reproducers:**
- IOR 4.0.0 (`-a MPIIO` shared-file mode, without `-F`). IOR was compiled and dynamically linked against each Intel MPI version being tested (`ldd` verified `libmpi.so.12` resolves to the correct version).
- PNetCDF 1.14.1 fandc test (flush-and-close shared-file write), also compiled per-version.
## Steps to Reproduce
### Minimal reproducer using IOR shared-file mode
The bug can be reproduced with as few as 2 nodes. IOR must be compiled against the same Intel MPI version being tested.
```bash
#!/bin/bash
#SBATCH -N 2
#SBATCH -n 2
#SBATCH --ntasks-per-node=1
#SBATCH -p <partition>
module load intel/2022 intelmpi/2021.17
# Confirm version
mpirun --version
# Expected: Intel(R) MPI Library for Linux* OS, Version 2021.17 Build 20251215
# Test 1: Baseline shared-file (PASSES)
mpirun -np 2 \
./ior -a MPIIO -e -g -t 1m -b 64m -w \
-o /path/to/nfs/testfile_baseline
# Test 2: I_MPI_FILESYSTEM_FORCE=ufs on shared-file (CRASHES)
mpirun -np 2 \
-env I_MPI_FILESYSTEM_FORCE ufs \
./ior -a MPIIO -e -g -t 1m -b 64m -w \
-o /path/to/nfs/testfile_ufs
# Test 3: ROMIO_FSTYPE_FORCE workaround on shared-file (PASSES)
mpirun -np 2 \
-env ROMIO_FSTYPE_FORCE "ufs:" \
./ior -a MPIIO -e -g -t 1m -b 64m -w \
-o /path/to/nfs/testfile_romio
```
**Key points for reproduction:**
- IOR must be run **without** `-F` (i.e., shared-file / N-to-1 mode, not file-per-process)
- Variables must be passed via `mpirun -env VAR value` (space-separated, not `=`), to ensure propagation to remote ranks under SLURM
- The bug manifests with as few as 2 ranks on 2 nodes; it scales with node count (28 errors at 8 nodes / 224 ranks)
- IOR 4.0.0 was used; any MPI-IO application that opens a shared file via `MPI_File_open` should reproduce this
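Because `-env` propagation is easy to get wrong under SLURM, a quick probe can confirm the variable actually reaches every rank before attributing a failure to the library. A minimal sketch (the `probe.sh` name is illustrative; `PMI_RANK` is exported to each rank by Intel MPI's Hydra launcher, so it falls back to `local` when run outside `mpirun`):

```shell
# Write a tiny probe that reports what each rank actually sees.
cat > probe.sh <<'EOF'
#!/bin/sh
echo "$(hostname) rank=${PMI_RANK:-local} ROMIO_FSTYPE_FORCE=${ROMIO_FSTYPE_FORCE:-<unset>}"
EOF
chmod +x probe.sh

# Under the job (sketch): mpirun -np 2 -env ROMIO_FSTYPE_FORCE "ufs:" ./probe.sh
# Standalone smoke test of the probe itself:
ROMIO_FSTYPE_FORCE="ufs:" ./probe.sh
```

If any rank prints `<unset>`, the variable did not propagate and the test result is not meaningful.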
### Alternate reproducer using PNetCDF fandc
```bash
# Build PNetCDF 1.14.1 against Intel MPI 2021.11 or 2021.17
cd pnetcdf-1.14.1/test/fandc/
# Run with I_MPI_FILESYSTEM_FORCE=ufs on NFS mount
mpirun -np 224 \
-env I_MPI_FILESYSTEM_FORCE ufs \
./exe.intel2022_impi2021.17
```
## Observed Behavior
### Intel MPI 2021.17 (oneAPI 2025.3)
| Test Configuration | Scale | ADIO Errors | Outcome |
|---|---|---|---|
| Shared-file, no tuning (baseline) | 8 nodes / 224 PE | **0** | PASS — 4.1 GB/s write, 5.8 GB/s read |
| Shared-file + `I_MPI_FILESYSTEM_FORCE=ufs` | 2 nodes / 2 PE | **2** | **FAIL — MPI_Abort** |
| Shared-file + `I_MPI_FILESYSTEM_FORCE=ufs` | 8 nodes / 224 PE | **28** | **FAIL — MPI_Abort, total failure** |
| Shared-file + `I_MPI_FILESYSTEM_FORCE=ufs` + `I_MPI_FILESYSTEM_NFS_DIRECT=enable` | 8 nodes / 224 PE | **28** | **FAIL — MPI_Abort, total failure** |
| Shared-file + `ROMIO_FSTYPE_FORCE="ufs:"` | 8 nodes / 224 PE | **0** | PASS — 4.1 GB/s write, 5.7 GB/s read |
| File-per-process (`-F`) + `I_MPI_FILESYSTEM_FORCE=ufs` | 8 nodes / 224 PE | **0** | PASS — works correctly |
**Error output (2021.17, 2-node reproducer):**
```
ERROR: cannot open file: /path/to/nfs/testfile, MPI Other I/O error , error stack:
internal_File_open(3211): MPI_File_open(comm=0x84000002, filename=/path/to/nfs/testfile, amode=37, info=0x9c000000, fh=0x93ef00) failed
ADIO_OPEN(535)..........: open failed on a remote node, (aiori-MPIIO.c:236)
ERROR: cannot open file: /path/to/nfs/testfile, MPI File does not exist, error stack:
internal_File_open(3211): MPI_File_open(comm=0x84000002, filename=/path/to/nfs/testfile, amode=37, info=0x9c000000, fh=0x11f87e0) failed
ADIOI_UFS_OPEN(37)......: File /path/to/nfs/testfile does not exist, (aiori-MPIIO.c:236)
Abort(-1) on node 1 (rank 1 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 1
```
**Error output (2021.17, 8-node / 224-rank run):**
At larger scale (8 nodes, 224 MPI ranks), `I_MPI_FILESYSTEM_FORCE=ufs` causes a hard `MPI_Abort`: 28 ADIO_OPEN errors are emitted across the 8 nodes and the job terminates immediately. The error stack suggests that when the UFS module is selected via this variable, the shared-file open path fails on remote nodes: rank 0 appears to proceed, while the other ranks receive "File does not exist" errors.
Notably, the upstream ROMIO variable `ROMIO_FSTYPE_FORCE="ufs:"` produces zero errors and equivalent performance for the same workload. This suggests the issue may be specific to how `I_MPI_FILESYSTEM_FORCE` routes into the UFS module, rather than in ROMIO's UFS module itself.
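For reference, the ADIO error counts quoted in the tables were tallied from the job output with a simple grep; a sketch (the log filename is illustrative):

```shell
# Count ADIO_OPEN failures in a job log; prints 0 if the log is absent.
log=slurm-1234.out   # illustrative name; use your scheduler's output file
grep -c 'ADIO_OPEN' "$log" 2>/dev/null || echo 0
```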
### Intel MPI 2021.11 (oneAPI 2024.0) — Shared-File Baseline Affected
| Test Configuration | Tool | Scale | ADIO Errors | Outcome |
|---|---|---|---|---|
| Shared-file, no tuning (baseline) | PNetCDF fandc | 8 nodes / 224 PE | **140** | **FAIL — data corruption (cprnc diff = 1.010)** |
| Shared-file, no tuning (baseline) | IOR | 8 nodes / 224 PE | **28** | **FAIL** |
| Shared-file + `I_MPI_FILESYSTEM_FORCE=ufs` | IOR | 8 nodes / 224 PE | **56** | **FAIL** |
| Shared-file + `ROMIO_FSTYPE_FORCE="ufs:"` | PNetCDF fandc | 8 nodes / 224 PE | **140** | **FAIL — data corruption** |
| Shared-file, no tuning (baseline) | IOR (strace) | 2 nodes | crash | **FAIL — ADIO_OPEN crash** |
On 2021.11, shared-file MPI-IO fails at multi-node scale even without any tuning variables set. The `ADIO_OPEN(522)` error occurs regardless of environment variable settings — `I_MPI_FILESYSTEM_FORCE`, `ROMIO_FSTYPE_FORCE`, and `I_MPI_FILESYSTEM_NFS_DIRECT` all produce the same failure. With PNetCDF fandc at 8 nodes, this manifests as 140 non-fatal ADIO errors with **silent data corruption** (verified via the `cprnc` comparison utility, which reports diff = 1.010 where 0.000 is expected). The issue appears to lie in the shared-file ADIO open path and to have been introduced in this release.
**Error message (2021.11):**
```
MPI error (MPI_File_open): Unknown error class, error stack:
ADIO_OPEN(522): open failed on a remote node
```
Note that the line number in the error stack changed from 522 (2021.11) to 535 (2021.17), which suggests the relevant code path was modified between releases. The shared-file open failure persists in both versions, but is triggered under different conditions in each.
### Intel MPI 2021.10 — Works Correctly (Reference)
| Test Configuration | ADIO Errors | Outcome |
|---|---|---|
| Shared-file, no tuning (baseline) | **0** | PASS |
| Shared-file + `ROMIO_FSTYPE_FORCE="ufs:"` | **0** | PASS |
| PNetCDF fandc (4-node, 112 PE) | **0** | PASS — zero data diff |
| PNetCDF fandc (8-node, 224 PE) | **0** | PASS — zero data diff |
Intel MPI 2021.10 handles shared-file MPI-IO correctly at all tested scales with zero ADIO errors and zero data corruption, confirming this is a regression introduced in 2021.11.
## Cross-Version Summary
| Intel MPI Version | Shared-File Baseline | `I_MPI_FILESYSTEM_FORCE=ufs` | `ROMIO_FSTYPE_FORCE="ufs:"` |
|---|---|---|---|
| **2021.10** | PASS (0 errors) | N/A (uses older `I_MPI_EXTRA_*` syntax) | PASS (0 errors) |
| **2021.11** | ADIO_OPEN errors at multi-node scale + data corruption | ADIO_OPEN errors | ADIO_OPEN errors |
| **2021.17** | PASS (0 errors) | ADIO_OPEN errors (28 at 8 nodes) + MPI_Abort | PASS (0 errors) |
## Key Observations
1. **The issue first appears in Intel MPI 2021.11.** Version 2021.10 works correctly at all tested scales; 2021.11 exhibits shared-file MPI-IO failures on NFS at multi-node scale.
2. **2021.17 resolves the baseline failure but introduces a different issue.** The default NFS module's shared-file open works correctly again on 2021.17. However, setting `I_MPI_FILESYSTEM_FORCE=ufs` triggers the ADIO_OPEN failure on what would otherwise be a working baseline.
3. **The upstream ROMIO variable behaves differently.** `ROMIO_FSTYPE_FORCE="ufs:"` works correctly on 2021.17 for the same shared-file workload where `I_MPI_FILESYSTEM_FORCE=ufs` fails. This suggests the two variables may follow different code paths when selecting the UFS module.
4. **File-per-process mode is unaffected.** `I_MPI_FILESYSTEM_FORCE=ufs` works correctly for N-N (file-per-process) patterns on 2021.17. The issue is specific to shared-file (N-to-1) opens.
5. **Strace-verified.** All findings were confirmed via `strace` on individual MPI ranks, not just performance observation.
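The per-rank strace runs mentioned in observation 5 can be scripted with a small wrapper; a sketch (the `trace-rank.sh` name and output filenames are illustrative; `PMI_RANK` is exported by Intel MPI's launcher, so each rank writes its own trace file):

```shell
# trace-rank.sh: wrap the application so each MPI rank writes strace.rank<N>.
cat > trace-rank.sh <<'EOF'
#!/bin/sh
exec strace -f -e trace=open,openat -o "strace.rank${PMI_RANK:-0}" "$@"
EOF
chmod +x trace-rank.sh
# Usage under the job (sketch): mpirun -np 2 ./trace-rank.sh ./ior -a MPIIO ...
```

Grepping the per-rank traces for the shared file's path shows which ranks' `openat` calls fail.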
## Expected Behavior
`I_MPI_FILESYSTEM_FORCE=ufs` should select the UFS ROMIO module without introducing ADIO_OPEN failures, consistent with how `ROMIO_FSTYPE_FORCE="ufs:"` behaves. The shared-file open path should work identically regardless of which variable is used to select the UFS module.
## Current Workaround
Use `ROMIO_FSTYPE_FORCE="ufs:"` instead of `I_MPI_FILESYSTEM_FORCE=ufs` for all shared-file MPI-IO workloads on NFS:
```bash
mpirun -env ROMIO_FSTYPE_FORCE "ufs:" -np <N> ./application
```
For Intel MPI 2021.11 specifically, the only viable path is to downgrade to 2021.10, as the shared-file baseline itself is broken and no environment variable provides a workaround.
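On clusters where batch scripts must run against mixed Intel MPI modules, the workaround can be applied conditionally based on the reported version. A sketch (the `impi_ver` helper is ours, the banner format matches what `mpirun --version` printed in our logs, and the case list reflects only the versions we tested):

```shell
# Extract "2021.N" from an Intel MPI version banner (format as seen in our logs).
impi_ver() {
  printf '%s\n' "$1" | sed -n 's/.*Version \(2021\.[0-9][0-9]*\).*/\1/p'
}

banner="Intel(R) MPI Library for Linux* OS, Version 2021.17 Build 20251215"
# Under a job, use instead: banner=$(mpirun --version 2>&1)
extra_env=''
case "$(impi_ver "$banner")" in
  2021.17) extra_env='-env ROMIO_FSTYPE_FORCE ufs:' ;;  # apply the ROMIO workaround
  2021.11) echo "warning: shared-file MPI-IO broken on 2021.11; use 2021.10" >&2 ;;
esac
echo "extra_env=$extra_env"
# Then launch with: mpirun $extra_env -np <N> ./application
```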
## Request
We would appreciate guidance on the following:
1. Could the `I_MPI_FILESYSTEM_FORCE=ufs` code path for shared-file (N-to-1) MPI-IO on NFS be investigated in Intel MPI 2021.17+? The upstream `ROMIO_FSTYPE_FORCE="ufs:"` works for the same workload, suggesting the issue may be in how the Intel-specific variable routes into the UFS module.
2. Could the shared-file ROMIO regression in Intel MPI 2021.11 be investigated, and could Intel confirm whether versions 2021.12 through 2021.16 are also affected?
3. Would it be possible to document `ROMIO_FSTYPE_FORCE="ufs:"` as a recommended workaround for shared-file I/O on NFS in the interim?
We are happy to provide additional logs, strace output, or run further tests if that would help the investigation. Thank you for your time and for the excellent Intel MPI toolkit.
---
*Findings independently confirmed via both IOR (shared-file mode) and PNetCDF fandc test suite across multiple node counts (2, 4, 8 nodes) on a production HPC cluster with NFS v3 over RDMA storage.*