OpenMP parallel loop with ordered section is slow


In the attached C sample I try to parallelize formatted writes to a file: the formatting goes into a string buffer inside an OpenMP parallel for loop, and the buffer is then written to the file with fwrite inside an OpenMP ordered section. I see a nice speedup with gcc, but a slowdown with icc.

The attached code first writes the data serially for reference, then writes it in parallel and compares the timings.
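
The attachment is not reproduced here, so the following is only a minimal sketch of the pattern described above, not the original parawrite.c. The function names, the record format, the LINE_LEN bound, and the schedule(static, 1) clause are all assumptions made for illustration; the real code probably formats larger chunks per iteration.

```c
#include <assert.h>
#include <stdio.h>

#define LINE_LEN 64  /* hypothetical upper bound on one formatted record */

/* Serial reference: format and write each record directly. */
static void write_serial(FILE *f, const double *data, int n)
{
    for (int i = 0; i < n; i++)
        fprintf(f, "%d %.17g\n", i, data[i]);
}

/* Parallel variant: every iteration formats into a private buffer,
   then the ordered region writes the buffers in iteration order. */
static void write_parallel(FILE *f, const double *data, int n)
{
    /* schedule(static, 1) is an assumption, chosen so that neighbouring
       iterations land on different threads. */
    #pragma omp parallel for ordered schedule(static, 1)
    for (int i = 0; i < n; i++) {
        char buf[LINE_LEN];
        int len = snprintf(buf, sizeof buf, "%d %.17g\n", i, data[i]);
        assert(len > 0 && len < (int)sizeof buf);

        /* Only this write is serialized; the formatting above runs
           concurrently. */
        #pragma omp ordered
        fwrite(buf, 1, (size_t)len, f);
    }
}
```

Both functions should produce byte-identical files, since the ordered region forces the writes to happen in iteration order. Compiled without an OpenMP flag, the pragmas are ignored and write_parallel simply runs serially.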

$ icc -O2 -qopenmp parawrite.c
$ OMP_NUM_THREADS=6 ./a.out
Time elapsed serial: 6.723 s
Time elapsed parallel: 9.325 s

$ icc -v
icc version (gcc version 8.2.0 compatibility)

$ gcc -O2 -fopenmp parawrite.c
$ OMP_NUM_THREADS=6 ./a.out
Time elapsed serial: 6.721 s
Time elapsed parallel: 1.206 s

$ gcc --version
gcc (GCC) 8.2.0

Any ideas what is going on? BTW, I also see similarly bad behavior with equivalent Fortran code and ifort...

Best regards
Pavel Ondračka

Thanks for reporting this issue. I've filed an internal bug (CMPLRIL0-31316) to track this problem.

