Hi all,
I'd like to report a wrong-result code-generation bug in ifx 2026 (oneAPI 2026.0.0) on Windows x64. A minimal reproducer, with a Visual Studio 2026 solution and a detailed README, is in this repo:
Repo: https://bitbucket.org/k2fem/intel-oneapi2026-matmul-transpose-bug/src/main/
TL;DR
subroutine compute_y(nm, T, X, Y)
  integer, intent(in) :: nm
  real(8), dimension(12, 3*nm), intent(in) :: T
  real(8), dimension(12, 3*nm), intent(in) :: X
  real(8), dimension(3*nm, 3*nm), intent(out) :: Y
  Y = matmul(transpose(T), X)
end subroutine
With nm = 5 and a single nonzero T(7,13) = 1, X(7,13) = 70, the expected result is Y(13,13) = 70. Under ifx 2026 with Release-mode optimisation (/O2), the routine returns Y(13,13) = 0. With /Od (Debug) the same routine returns the correct value.
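For readers who cannot reach the repo, the setup described above can be sketched as a single-file driver. This is a minimal sketch, not the repo's actual driver: the module name `repro_mod`, the program name, and the PASS/FAIL tolerance are assumptions. It is also an assumption that the bug still triggers when the routine sits in the same file as its caller, since inlining could change the generated code.

```fortran
! Sketch only: repro_mod, driver and the tolerance are illustrative names/values.
module repro_mod
  implicit none
contains
  subroutine compute_y(nm, T, X, Y)
    integer, intent(in) :: nm
    real(8), dimension(12, 3*nm), intent(in)    :: T
    real(8), dimension(12, 3*nm), intent(in)    :: X
    real(8), dimension(3*nm, 3*nm), intent(out) :: Y
    ! transpose used directly inside matmul, as in the report
    Y = matmul(transpose(T), X)
  end subroutine compute_y
end module repro_mod

program driver
  use repro_mod, only: compute_y
  implicit none
  integer, parameter :: nm = 5
  real(8) :: T(12, 3*nm), X(12, 3*nm), Y(3*nm, 3*nm)

  ! Single nonzero entry in each input, as described in the post
  T = 0.0d0
  X = 0.0d0
  T(7, 13) = 1.0d0
  X(7, 13) = 70.0d0

  call compute_y(nm, T, X, Y)

  ! Y(13,13) = sum over k of T(k,13)*X(k,13) = T(7,13)*X(7,13) = 70
  print '(a, f12.6, a)', 'Y(13,13) = ', Y(13, 13), ' (expected 70)'
  if (abs(Y(13, 13) - 70.0d0) > 1.0d-12) then
    print '(a)', 'FAIL'
  else
    print '(a)', 'PASS'
  end if
end program driver
```

If the single-file version passes at /O2, moving `compute_y` into its own source file may be closer to the layout in the repo.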
Toolchain
- Intel oneAPI Fortran compiler (ifx) 2026.0.0, x64
- Visual Studio 2026 Community 18.6.0, MSVC v145 (14.51)
- Windows 11 22621.6060
What's required to trigger it (both ingredients together)
- The array dummy has an explicit-shape extent that depends on another dummy argument (here: dimension(12, 3*nm) — the 3*nm extent depends on the integer dummy nm).
- The matmul expression uses transpose(T) directly inside the call (so the compiler is presumably trying to fuse the transpose with the matmul rather than materialise an intermediate array).
The reproducer ships two control variants that each remove exactly one of those ingredients and demonstrate that neither alone triggers the bug:
- compute_y_hardcoded keeps matmul(transpose(T), X) but hardcodes the shape as dimension(12, 15) → correct result.
- compute_y_materialised keeps the dummy-extent shape but assigns T_t = transpose(T) first, then computes Y = matmul(T_t, X) → correct result.
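The two controls could look roughly like the sketch below. The routine names come from the post, but the argument lists, the module name `controls_mod`, and the use of an automatic array for `T_t` are assumptions; the repo's versions may differ.

```fortran
! Sketch only: signatures and controls_mod are assumed, not taken from the repo.
module controls_mod
  implicit none
contains
  ! Control 1: same transpose-in-matmul expression, extents hardcoded (3*5 = 15)
  subroutine compute_y_hardcoded(T, X, Y)
    real(8), dimension(12, 15), intent(in)  :: T
    real(8), dimension(12, 15), intent(in)  :: X
    real(8), dimension(15, 15), intent(out) :: Y
    Y = matmul(transpose(T), X)
  end subroutine compute_y_hardcoded

  ! Control 2: dummy-dependent extents kept, transpose stored before matmul
  subroutine compute_y_materialised(nm, T, X, Y)
    integer, intent(in) :: nm
    real(8), dimension(12, 3*nm), intent(in)    :: T
    real(8), dimension(12, 3*nm), intent(in)    :: X
    real(8), dimension(3*nm, 3*nm), intent(out) :: Y
    real(8), dimension(3*nm, 12) :: T_t   ! automatic array (assumed choice)
    T_t = transpose(T)
    Y = matmul(T_t, X)
  end subroutine compute_y_materialised
end module controls_mod

program controls_demo
  use controls_mod
  implicit none
  integer, parameter :: nm = 5
  real(8) :: T(12, 15), X(12, 15), Y(15, 15)

  T = 0.0d0
  X = 0.0d0
  T(7, 13) = 1.0d0
  X(7, 13) = 70.0d0

  call compute_y_hardcoded(T, X, Y)
  print '(a, f12.6)', 'hardcoded:    Y(13,13) = ', Y(13, 13)

  call compute_y_materialised(nm, T, X, Y)
  print '(a, f12.6)', 'materialised: Y(13,13) = ', Y(13, 13)
end program controls_demo
```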
Symptom characterisation
The "fixed-extent" cells of Y (rows 1..12, which depend only on the hardcoded 12 extent of T) are computed correctly. Rows 13..15 of Y — the rows that only exist because of the dummy-extent 3*nm — are where the result is wrong. So it looks like the compiler is emitting code that walks transpose(T) with the wrong stride for the high-index cells.
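One way to check the row-by-row picture is to fill T and X with distinct values and compare each row of Y against a reference built from explicit loops. The sketch below is an illustration under assumptions: the fill pattern and the program name `check_rows` are made up here, and making `compute_y` an internal procedure may change whether the bug appears.

```fortran
! Sketch only: fill values and program layout are illustrative assumptions.
program check_rows
  implicit none
  integer, parameter :: nm = 5, n = 3*nm
  real(8) :: T(12, n), X(12, n), Y(n, n), Yref(n, n)
  integer :: i, j, k

  ! Distinct small integer values, so all sums are exact in double precision
  do j = 1, n
    do k = 1, 12
      T(k, j) = real(k + 100*j, 8)
      X(k, j) = real(j + 100*k, 8)
    end do
  end do

  call compute_y(nm, T, X, Y)

  ! Reference: Yref(i,j) = sum over k of T(k,i)*X(k,j), no intrinsics involved
  Yref = 0.0d0
  do j = 1, n
    do i = 1, n
      do k = 1, 12
        Yref(i, j) = Yref(i, j) + T(k, i)*X(k, j)
      end do
    end do
  end do

  ! Report each row of Y separately
  do i = 1, n
    if (any(Y(i, :) /= Yref(i, :))) then
      print '(a, i3, a)', 'row ', i, ': MISMATCH'
    else
      print '(a, i3, a)', 'row ', i, ': ok'
    end if
  end do

contains

  subroutine compute_y(nm, T, X, Y)
    integer, intent(in) :: nm
    real(8), dimension(12, 3*nm), intent(in)    :: T
    real(8), dimension(12, 3*nm), intent(in)    :: X
    real(8), dimension(3*nm, 3*nm), intent(out) :: Y
    Y = matmul(transpose(T), X)
  end subroutine compute_y

end program check_rows
```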
How to reproduce
Open reproducer.sln in VS 2026 with ifx 2026 installed, build & run Release|x64. Or from a shell:
& "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64 vs2026
& "C:\Program Files\Microsoft Visual Studio\18\Community\Common7\IDE\devenv.com" reproducer.sln /Build "Release|x64"
.\bin\Release64\reproducer.exe
Release output (bug):
Variant 1: bug version (dummy-extent + transpose-in-matmul)
  Y(13,13) = 0.000000 (expected 70) FAIL
Variant 2: control hardcoded dimension(12,15)
  Y(13,13) = 70.000000 (expected 70) PASS
Variant 3: control dummy-extent + materialised transpose
  Y(13,13) = 70.000000 (expected 70) PASS
Debug output: all three variants pass.
Originating site
Found while debugging a wrong-result failure in a structural-FEM solver. The original site is
Ke = matmul(transpose(T), matmul(Kp, T))
with T :: dimension(12, 3*(2+nm1+nm2)). This is a stiffness matrix Kp (12×12) sandwiched between a transfer matrix T (12×N) and its transpose, producing an element stiffness Ke (N×N). Cells of Ke whose row index falls in the part of T's extent that depends on nm1+nm2 came out zero. The reproducer reduces this to the smallest configuration that still misbehaves (one matmul, one transpose, one dummy integer in the shape, a single nonzero element).
Workarounds in production
Either hardcode the extents (often impractical), or materialise the transpose:
block
  real(8), allocatable :: T_t(:,:)
  allocate(T_t(size(T,2), size(T,1)))
  T_t = transpose(T)
  Y = matmul(T_t, X)
end block
Disabling optimisation on the affected routine also hides the bug but isn't a viable production fix.
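Applied to the originating expression, the materialised-transpose workaround might look like the sketch below. The routine name `compute_ke`, its argument list (a single extent `n` standing in for 3*(2+nm1+nm2)), and the demo values are all hypothetical; the solver's real routine is not shown in this thread.

```fortran
! Sketch only: compute_ke, its signature and the demo data are hypothetical.
module ke_mod
  implicit none
contains
  subroutine compute_ke(n, T, Kp, Ke)
    integer, intent(in)  :: n
    real(8), intent(in)  :: T(12, n), Kp(12, 12)
    real(8), intent(out) :: Ke(n, n)
    real(8), allocatable :: T_t(:,:)

    ! Store the transpose first so matmul never sees transpose(T) directly
    allocate(T_t(size(T, 2), size(T, 1)))
    T_t = transpose(T)
    Ke = matmul(T_t, matmul(Kp, T))
  end subroutine compute_ke
end module ke_mod

program ke_demo
  use ke_mod, only: compute_ke
  implicit none
  integer, parameter :: n = 15
  real(8) :: T(12, n), Kp(12, 12), Ke(n, n)
  integer :: k

  ! Kp = 2 * identity and a single nonzero T(7,13) = 1,
  ! so Ke(13,13) = T(7,13) * Kp(7,7) * T(7,13) = 2
  T = 0.0d0
  Kp = 0.0d0
  do k = 1, 12
    Kp(k, k) = 2.0d0
  end do
  T(7, 13) = 1.0d0

  call compute_ke(n, T, Kp, Ke)
  print '(a, f12.6, a)', 'Ke(13,13) = ', Ke(13, 13), ' (expected 2)'
end program ke_demo
```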
Happy to test patches or provide more detail. Thanks!
Hi, Michael:
I cannot open your repo:
https://bitbucket.org/k2fem/intel-oneapi2026-matmul-transpose-bug/src/main/
The page is 404.
How many files do we need to reproduce the issue? Can you just attach the needed files here?
It seems I only need your subroutine
subroutine compute_y(nm, T, X, Y)
  integer, intent(in) :: nm
  real(8), dimension(12, 3*nm), intent(in) :: T
  real(8), dimension(12, 3*nm), intent(in) :: X
  real(8), dimension(3*nm, 3*nm), intent(out) :: Y
  Y = matmul(transpose(T), X)
end subroutine
and the matmul(A,B) subroutine, and the driver to set up 3 variants to run.
Would you please provide the actual files instead of a repo link?
I apologise, I forgot to check whether the repo was accessible. I accidentally left it 'private'.
It should be accessible now.
I think it's easiest to clone the whole repo.
If that doesn't work, please let me know and I will post the files here.
Thank you.