Intel® Fortran Compiler

wrong results with SSE4.2 vectorization and ifort 19

gn164
Beginner

ifort 19 and 18 produce wrong results for the following code when SSE4.2 vectorization is enabled.

When compiled for AVX, the code produces the correct results.

Previous versions of the compiler (e.g. ifort 16) seem to produce the correct result for this code with SSE4.2.

The outputs for each set of compiler options are shown below. The executable runs on an Intel Xeon CPU E3-1240 v3 @ 3.40GHz.

corr.f

(removed by customer request)

main.f

(removed by customer request)

$INTEL16_HOME/bin/ifort -O3 -xSSE4.2 -o sseTest main.f corr.f

   67.00000       67.00000       67.00000       67.00000       67.00000    
   67.00000       67.00000       67.00000       67.00000       67.00000    
   67.00000

$INTEL19_HOME/bin/ifort -O3 -xSSE4.2 -o sseTest main.f corr.f

   67.00000       67.00000       67.00000       67.00000       67.00000    
   67.00000       67.00000       59.00000       67.00000       59.00000    
   59.00000 

$INTEL19_HOME/bin/ifort -O3 -xAVX -o sseTest main.f corr.f

(removed by customer request)

mecej4
Honored Contributor III

This bug is also seen with 18.0.5.274 and 2021.1.1.216 on Windows, with /QxSSE4.2 /O3.

The bug is not seen with 16.0.8.254 on Windows, using the same options.

Ron_Green
Moderator

This issue occurs only at O3 and only for SSE4.1 or SSE4.2, with the 18.x, 19.0.x, and 19.1.x compilers. I entered bug report CMPLRIL0-32474.

Another interesting tidbit: if you combine corr.f and main.f into one source file, the error goes away!

We'll get working on a fix.

cgg_distribution
Beginner

Hi Ronald,

Thanks. The error also goes away if the array sizes and strides that are passed to the function as arguments are instead defined within the function's scope.

I assume the generated code differs when these values are known to the vectorizer, and that combining the functions in the same file makes them visible as well, if some interprocedural optimization is performed by default within the translation unit.
