ifx 2026 matmul + transpose codegen bug

michaelkonecny · 06-01-2026

Hi all,

I'd like to report a wrong-result code-generation bug in ifx 2026 (oneAPI 2026.0.0) on Windows x64. A minimal reproducer, with a Visual Studio 2026 solution and a detailed README, is in this repo:

Repo: https://bitbucket.org/k2fem/intel-oneapi2026-matmul-transpose-bug/src/main/

TL;DR

subroutine compute_y(nm, T, X, Y)
    integer, intent(in) :: nm
    real(8), dimension(12, 3*nm), intent(in)  :: T
    real(8), dimension(12, 3*nm), intent(in)  :: X
    real(8), dimension(3*nm, 3*nm), intent(out) :: Y

    Y = matmul(transpose(T), X)
end subroutine

With nm = 5 and a single nonzero T(7,13) = 1, X(7,13) = 70, the expected result is Y(13,13) = 70. Under ifx 2026 with Release-mode optimisation (/O2), the routine returns Y(13,13) = 0. With /Od (Debug) the same routine returns the correct value.
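For reference, the expected value follows directly from the definition of the product. Element-wise, with n_m standing for nm (so 3n_m = 15 here) and all other entries of T assumed zero as stated above:

```latex
Y_{ij} = \sum_{k=1}^{12} (T^{\mathsf{T}})_{ik}\, X_{kj} = \sum_{k=1}^{12} T_{ki}\, X_{kj}, \qquad 1 \le i, j \le 3 n_m
Y_{13,13} = T_{7,13}\, X_{7,13} = 1 \cdot 70 = 70, \qquad Y_{ij} = 0 \ \text{for } i \ne 13
```

Every term in the sum vanishes unless k = 7 and i = 13, so row 13 is the only row of Y that can be nonzero.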

Toolchain

Intel oneAPI Fortran compiler (ifx) 2026.0.0, x64
Visual Studio 2026 Community 18.6.0, MSVC v145 (14.51)
Windows 11 22621.6060

What's required to trigger it (both ingredients together)

The array dummy has an explicit-shape extent that depends on another dummy argument (here: dimension(12, 3*nm) — the 3*nm extent depends on the integer dummy nm).
The matmul expression uses transpose(T) directly inside the call (so the compiler is presumably trying to fuse the transpose with the matmul rather than materialise an intermediate array).

The reproducer ships two control variants that each remove exactly one of those ingredients and demonstrate that neither alone triggers the bug:

compute_y_hardcoded keeps matmul(transpose(T), X) but hardcodes the shape as dimension(12, 15) → correct result.
compute_y_materialised keeps the dummy-extent shape but assigns T_t = transpose(T) first, then computes Y = matmul(T_t, X) → correct result.
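A hypothetical single-file reconstruction of the two control variants, written only from the descriptions above (the repo's actual source may differ; the module name control_variants is an assumption). The materialised variant declares T_t the same way as the production workaround shown further down:

```fortran
! Hypothetical reconstruction of the two control variants described above;
! not copied from the repo.
module control_variants
    implicit none
    private
    public :: compute_y_hardcoded, compute_y_materialised
contains

    ! Control 1: keeps matmul(transpose(T), X), but the extents are fixed (nm = 5)
    subroutine compute_y_hardcoded(T, X, Y)
        real(8), dimension(12, 15), intent(in)  :: T
        real(8), dimension(12, 15), intent(in)  :: X
        real(8), dimension(15, 15), intent(out) :: Y
        Y = matmul(transpose(T), X)
    end subroutine compute_y_hardcoded

    ! Control 2: keeps the dummy-dependent extents, but forms transpose(T)
    ! in a separate array before the matmul
    subroutine compute_y_materialised(nm, T, X, Y)
        integer, intent(in) :: nm
        real(8), dimension(12, 3*nm), intent(in)  :: T
        real(8), dimension(12, 3*nm), intent(in)  :: X
        real(8), dimension(3*nm, 3*nm), intent(out) :: Y
        real(8), allocatable :: T_t(:, :)
        allocate(T_t(3*nm, 12))
        T_t = transpose(T)
        Y = matmul(T_t, X)
    end subroutine compute_y_materialised

end module control_variants
```

A driver would call both with nm = 5, T(7,13) = 1 and X(7,13) = 70 and compare Y(13,13) against 70.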

Symptom characterisation

The cells in rows 1..12 of Y are computed correctly; these are the row indices that also lie within the fixed extent 12 of T's first dimension. Rows 13..15 of Y, which exist only because 3*nm = 15 exceeds 12, are where the result is wrong. So it looks as if the compiler emits code that walks transpose(T) with the wrong stride for the high-index rows.
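To pin down which rows differ without relying on matmul or transpose at all, a plain-loop reference plus a per-row comparison can help. This is a hypothetical diagnostic sketch (the module name ty_reference and both routine names are assumptions, not from the repo):

```fortran
! Hypothetical diagnostic helpers: a plain-loop reference for
! Y = transpose(T) * X and a per-row comparison to locate differing rows.
module ty_reference
    implicit none
    private
    public :: reference_y, first_bad_row
contains

    ! Y(i,j) = sum_k T(k,i) * X(k,j), written as explicit loops
    ! (no matmul, no transpose)
    subroutine reference_y(nm, T, X, Y)
        integer, intent(in) :: nm
        real(8), dimension(12, 3*nm), intent(in)  :: T
        real(8), dimension(12, 3*nm), intent(in)  :: X
        real(8), dimension(3*nm, 3*nm), intent(out) :: Y
        integer :: i, j, k
        Y = 0.0d0
        do j = 1, 3*nm
            do i = 1, 3*nm
                do k = 1, 12
                    Y(i, j) = Y(i, j) + T(k, i) * X(k, j)
                end do
            end do
        end do
    end subroutine reference_y

    ! Index of the first row where Y and Yref differ by more than tol,
    ! or 0 if every row matches
    integer function first_bad_row(Y, Yref, tol) result(row)
        real(8), dimension(:, :), intent(in) :: Y, Yref
        real(8), intent(in) :: tol
        integer :: i
        row = 0
        do i = 1, size(Y, 1)
            if (maxval(abs(Y(i, :) - Yref(i, :))) > tol) then
                row = i
                return
            end if
        end do
    end function first_bad_row

end module ty_reference
```

With the reported Release output (Y(13,13) = 0 and nothing else nonzero), first_bad_row(Y, Yref, 1.0d-12) would return 13.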

How to reproduce

Open reproducer.sln in VS 2026 with ifx 2026 installed, build & run Release|x64. Or from a shell:

& "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64 vs2026
& "C:\Program Files\Microsoft Visual Studio\18\Community\Common7\IDE\devenv.com" reproducer.sln /Build "Release|x64"
.\bin\Release64\reproducer.exe

Release output (bug):

Variant 1: bug version (dummy-extent + transpose-in-matmul)
  Y(13,13) =     0.000000   (expected 70)
  FAIL
Variant 2: control hardcoded dimension(12,15)
  Y(13,13) =    70.000000   (expected 70)
  PASS
Variant 3: control dummy-extent + materialised transpose
  Y(13,13) =    70.000000   (expected 70)
  PASS

Debug output: all three variants pass.

Originating site

Found while debugging a wrong-result failure in a structural-FEM solver. The original site is

Ke = matmul(transpose(T), matmul(Kp, T))

with T :: dimension(12, 3*(2+nm1+nm2)), i.e. N = 3*(2+nm1+nm2). This computes the element stiffness Ke (N×N) by sandwiching a 12×12 stiffness Kp between the 12×N transfer matrix T and its transpose. Cells of Ke whose row index falls in the part of T's extent that depends on nm1+nm2 came out zero. The reproducer reduces this to the smallest configuration that still misbehaves: one matmul, one transpose, one dummy integer in the shape, and a single nonzero element.
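Element-wise, the original expression is the reproducer's Y = transpose(T) X with X = Kp T; the row index i of Ke runs over the columns of T, just as the affected rows of Y do in the reproducer:

```latex
(K_e)_{ij} = \sum_{k=1}^{12} \sum_{l=1}^{12} T_{ki}\, (K_p)_{kl}\, T_{lj}, \qquad 1 \le i, j \le N, \quad N = 3(2 + n_{m1} + n_{m2})
```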

Workarounds in production

Either hardcode the extents (often impractical), or materialise the transpose:

block
    ! Form transpose(T) explicitly instead of passing it straight into matmul
    real(8), allocatable :: T_t(:,:)
    allocate(T_t(size(T,2), size(T,1)))   ! shape 3*nm x 12
    T_t = transpose(T)
    Y = matmul(T_t, X)
end block

Disabling optimisation on the affected routine also hides the bug but isn't a viable production fix.
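Where the pattern appears at many call sites, the materialised-transpose workaround could be packaged once. This is a hypothetical helper (matmul_tn and tn_workaround are invented names, not a library API); note it uses assumed-shape dummies rather than the dummy-dependent explicit shapes, and whether ifx 2026 at /O2 compiles it correctly has not been checked here:

```fortran
! Hypothetical helper: packages the materialised-transpose workaround so
! call sites read like matmul(transpose(A), B). Not from the repo or a library.
module tn_workaround
    implicit none
    private
    public :: matmul_tn
contains

    ! Returns transpose(A) * B, forming transpose(A) in its own array first.
    ! Requires size(A, 1) == size(B, 1).
    function matmul_tn(A, B) result(C)
        real(8), dimension(:, :), intent(in) :: A, B
        real(8), dimension(size(A, 2), size(B, 2)) :: C
        real(8), allocatable :: A_t(:, :)
        allocate(A_t(size(A, 2), size(A, 1)))
        A_t = transpose(A)
        C = matmul(A_t, B)
    end function matmul_tn

end module tn_workaround
```

At the originating site this would read Ke = matmul_tn(T, matmul(Kp, T)).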

Happy to test patches or provide more detail. Thanks!

Shiquan_Su · 06-02-2026

Hi, Michael:

I cannot open your repo:

https://bitbucket.org/k2fem/intel-oneapi2026-matmul-transpose-bug/src/main/

The page returns a 404.

How many files do we need to reproduce the issue? Can you just attach the needed files here?

It seems I only need your subroutine

subroutine compute_y(nm, T, X, Y)
    integer, intent(in) :: nm
    real(8), dimension(12, 3*nm), intent(in)  :: T
    real(8), dimension(12, 3*nm), intent(in)  :: X
    real(8), dimension(3*nm, 3*nm), intent(out) :: Y

    Y = matmul(transpose(T), X)
end subroutine

and the

matmul(A,B)

intrinsic, plus the driver that sets up and runs the three variants.

Would you please provide the actual files instead of a repo link?

michaelkonecny · 06-02-2026

Apologies, I forgot to check whether the repo was accessible; I had accidentally left it private.

It should be accessible now.

I think it's easier to clone the whole repo.
If that doesn't work, please let me know and I will post the files here.

Thank you.