Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Regression in 2025.2? Warning 5474 for whole array assignment in omp parallel workshare

Mark_Lewy
Valued Contributor I

Consider this MRE:

 

    program w5474

    implicit none
    
    real, allocatable :: arr(:)

    allocate(arr(1000))

    !$omp parallel workshare
    arr = 0

    !$omp end parallel workshare

    end program w5474

 

When compiled with "ifx /Qopenmp w5474.f90" using Version 2025.2.0 Build 20250605 on Windows, this produces the following warning:

w5474.f90(10): warning #5474: Statement in an OpenMP PARALLEL WORKSHARE construct cannot be parallelized; the statement will only be executed by one thread of one team.
arr = 0
----^

Older versions of ifx, as well as ifort and gfortran, do not produce a warning.  This code looks as though it should be parallelised, so I'm assuming this is a regression?

4 Replies
Ron_Green
Moderator

ifx 2025.2.0 is just being more open and honest about what it is doing in this example.  The fact that ifort and older compilers did not report a warning here does not mean they actually applied WORKSHARE.

 

Using the old, unsupported ifort compiler's optimization report, you can see that it is also NOT doing workshare for this example.

ifort -O2 -xhost -qopenmp -qopt-report 5 ompwork.f90

If you look at the ompwork.optrpt file you will see the following:

    Report from: OpenMP optimizations [openmp]

OpenMP Construct at ompwork.f90(9,11)
remark #16204: OpenMP multithreaded code generation for SINGLE was successful
OpenMP Construct at ompwork.f90(9,11)
remark #16201: OpenMP DEFINED REGION WAS PARALLELIZED

    Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]

LOOP BEGIN at ompwork.f90(10,5)
   remark #25408: memset generated
   remark #15398: loop was not vectorized: loop was transformed to memset or memcpy
LOOP END

 

Breaking this down: the OpenMP optimization report shows that the OMP region was parallelized, BUT it was made SINGLE.  Why?  Keep reading ...
The LOOP at (10,5) was replaced by a call to MEMSET to set the memory to zero.

The compiler decided that 'arr = 0' is fastest as a single call to MEMSET.  So the OMP region sees a call to memset, and there is nothing for WORKSHARE to split up.

This is one situation to keep in mind when mixing OMP and array expressions: an array expression is NOT a loop in all situations.  When you use an array expression, you give the compiler the liberty to implement it any way it sees fit.  In this case, the compiler decided 'arr = 0' is best done by a simple call to memset.  So the OMP directives, particularly WORKSHARE, are simply not applicable, and it generated an OMP SINGLE for you.  Thank you, ifort!

At least ifx tells you it's only going to use 1 thread instead of leading you to believe it's actually going to use WORKSHARE.  It's not a loop; it's an array expression.

Older versions of ifx may or may not have given a warning here.  Warnings are added all the time to help alert users when they are making a mistake or may get unusual results.  Warnings improve over time.

Now, before we go on, another warning.  Doing this with ifx exposes another interesting optimization that you will find in its optimization report.  Look at your example closely: do you ever use the result of the expression 'arr = 0'?  Do you ever even do

Print *, arr(1)

?  No.  You never use the data.  LLVM does optimization flow analysis.  In this case you never use the result of the memset to arr, so it REMOVES YOUR DEAD CODE - it removes the call completely!  This optimization is called dead code elimination, and LLVM is really, really good at it, far better than ifort's IL0 optimizer.  This trips up a lot of people trying to time simple code kernels where the data is never used, printed, or assigned to other variables that are used later.

There is an anecdote from the old days about a sample like this compiled by the CDC compiler.  That compiler decided that the whole program was dead code, so the entire program was a NOOP and executed in zero time.  Hey, if you don't need the data, then it can simply do nothing!  A fun little bit of trivia on dead code elimination.
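If you are timing a kernel like this, a minimal sketch along these lines (my own illustration, not code from this thread) keeps the result alive so the optimizer cannot throw the work away.  Note it will still be a memset executed by one thread, as described above; the point here is only to defeat dead code elimination:

    program time_kernel
    use omp_lib, only: omp_get_wtime
    implicit none
    real, allocatable :: arr(:)
    double precision :: t0, t1

    allocate(arr(1000))

    t0 = omp_get_wtime()
    !$omp parallel workshare
    arr = 0
    !$omp end parallel workshare
    t1 = omp_get_wtime()

    ! Consume the result so the assignment cannot be removed as dead code
    print *, 'elapsed =', t1 - t0, '  arr(1) =', arr(1), '  sum =', sum(arr)

    end program time_kernel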

I will anticipate your next comment: "Yeah, but gfortran does ...".  That is their implementation making its own decisions.  An implementation is free to optimize an array expression however it wishes.  We make it a call to memset.  What is gfortran really doing with this array expression?  Maybe check an opt report or the assembly code.  Intel Fortran makes this a memset because, in its opinion, that is optimal.
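For what it's worth, one way to answer that question yourself (a sketch, assuming the same ompwork.f90 source as above) is to ask gfortran for its optimized intermediate dump or the assembly:

gfortran -O2 -fopenmp -fdump-tree-optimized ompwork.f90
gfortran -O2 -fopenmp -S ompwork.f90

and then search the resulting .optimized dump or ompwork.s for a memset call.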

 

Ron_Green
Moderator

I should also say that if you want "predictable" OpenMP behavior, use explicit loops and not array syntax.  Like you, I envision array expressions as loops, so I think loop semantics apply, including OMP directives controlling those loops.  But compilers may not see them as loops.
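For example, a minimal sketch of the explicit-loop version (illustrative only, not from the original post): with a loop work-sharing construct, the iterations are divided across threads by the directive itself, regardless of how each thread's chunk is ultimately lowered.

    program explicit_loop
    implicit none
    real, allocatable :: arr(:)
    integer :: i

    allocate(arr(1000))

    ! An explicit DO loop under a loop work-sharing construct: the iteration
    ! space is split across threads by the directive itself
    !$omp parallel do
    do i = 1, size(arr)
        arr(i) = 0.0
    end do
    !$omp end parallel do

    ! Use the result so dead code elimination does not remove the loop
    print *, arr(1)

    end program explicit_loop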

Mark_Lewy
Valued Contributor I

Thanks Ron for the comprehensive answer.  That all makes sense to me.  Having looked at the workshares in the original solution that trip the warning, most of them are array assignments of a scalar (so using memset) or copies (so using memcpy, I assume).  I also found some workshares that didn't produce the warning, for example calls to elemental functions with array arguments, which makes sense to me.
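For example, a minimal sketch of the two cases (illustrative only, not actual code from our solution; exact diagnostics will depend on the compiler version):

    program workshare_cases
    implicit none
    real, allocatable :: a(:), b(:)

    allocate(a(1000), b(1000))

    !$omp parallel workshare
    a = 0             ! whole-array assignment of a scalar: can be lowered to memset (the case that warns)
    b = exp(a) + 1.0  ! elemental intrinsic applied to an array operand: the work can be shared
    !$omp end parallel workshare

    ! Use the results so they are not eliminated as dead code
    print *, a(1), b(1)

    end program workshare_cases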

Ron_Green
Moderator

So many layers to this example.  Another consideration: when are the OMP directives processed, before or after optimization?  Compilers can vary in how they phase optimization relative to parallelization.
