gn164
Beginner
333 Views

wrong results with SSE4.2 vectorization and ifort 19

 

ifort 18 and 19 produce wrong results for the following code when SSE4.2 vectorization is enabled.

When compiled with AVX, the code produces the correct results.

Previous versions of the compiler seem to produce the correct result for this code with SSE4.2.

The outputs for the different compiler versions and options are shown below. The executable runs on an Intel Xeon CPU E3-1240 v3 @ 3.40GHz.

corr.f

 

 

(removed by customer request - @gn164 , let me know if this is what you wanted. thanks! Mary T. intel.community@intel.com)

main.f

 

 

(removed by customer request)

$INTEL16_HOME/bin/ifort -O3 -xSSE4.2 -o sseTest main.f corr.f

 

 

   67.00000       67.00000       67.00000       67.00000       67.00000    
   67.00000       67.00000       67.00000       67.00000       67.00000    
   67.00000

 

 

 

$INTEL19_HOME/bin/ifort -O3 -xSSE4.2 -o sseTest main.f corr.f

 

 

   67.00000       67.00000       67.00000       67.00000       67.00000    
   67.00000       67.00000       59.00000       67.00000       59.00000    
   59.00000 

 

 

 

$INTEL19_HOME/bin/ifort -O3 -xAVX -o sseTest main.f corr.f

(removed by customer request)

 

 

3 Replies
mecej4
Black Belt
316 Views

This bug is also seen with 18.0.5.274 and 2021.1.1.216 on Windows, with /QxSSE4.2 /O3.

The bug is not seen with 16.0.8.254 on Windows, using the same options.

Ron_Green
Moderator
260 Views

This issue occurs only at O3, only with SSE4.1 or SSE4.2, and with the 18.x, 19.0.x and 19.1.x compilers. I entered a bug report, CMPLRIL0-32474.

Another interesting tidbit: if you combine corr.f and main.f into one source file, the error goes away!

We'll get working on a fix.

cgg_distribution
Beginner
253 Views

Hi Ronald,

Thanks. The error also goes away if the array sizes and strides that are passed to the function are instead defined within the function scope.

I assume the generated code differs when these values are known to the vectorizer. Combining the functions in the same file would also make them visible, if some interprocedural optimization is done by default within the translation unit.
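
To make the distinction concrete, here is a minimal free-form sketch of the two variants. It is not the removed corr.f/main.f reproducer and is not expected to trigger the bug. The module name `stride_demo`, the routines `fill_args` and `fill_local`, and all values are hypothetical, chosen only to show a size and stride arriving as dummy arguments versus being fixed as local constants.

```fortran
! Hypothetical illustration only; NOT the original reproducer.
module stride_demo
  implicit none
contains

  ! Variant A: length and stride arrive as dummy arguments, so their
  ! values are unknown when this routine is compiled on its own.
  subroutine fill_args(a, n, stride, val)
    integer, intent(in) :: n, stride
    real, intent(inout) :: a(*)
    real, intent(in) :: val
    integer :: i
    do i = 1, n
      a(1 + (i - 1) * stride) = val
    end do
  end subroutine fill_args

  ! Variant B: the same loop, with length and stride fixed as local
  ! named constants that are visible to the compiler.
  subroutine fill_local(a, val)
    integer, parameter :: n = 8, stride = 2
    real, intent(inout) :: a(*)
    real, intent(in) :: val
    integer :: i
    do i = 1, n
      a(1 + (i - 1) * stride) = val
    end do
  end subroutine fill_local

end module stride_demo

program compare
  use stride_demo
  implicit none
  real :: x(16), y(16)
  x = 0.0
  y = 0.0
  call fill_args(x, 8, 2, 1.0)
  call fill_local(y, 1.0)
  ! With a correct compiler both variants should fill the same elements.
  if (all(x == y)) then
    print *, 'variants agree'
  else
    print *, 'variants differ'
  end if
end program compare
```

In the original report the called routine lived in its own source file. To mimic that setup, the module would have to go in a separate file from the main program, since keeping everything in one file is exactly the case where the error was reported to disappear.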

 
