We are running some experiments with the "-parallel" option on icc 18.0.0, using the flags -qopt-report=5 and -qopt-report-phase=all to get as much information as possible from the reports. However, the reports do not seem to give complete information about which OpenMP pragmas are automatically inserted by the "-parallel" option.
In the experiment, we first compiled a small application with a well-defined kernel (e.g., benchmark 'adi' in PolyBench/C) with icc and the "-parallel" option, together with the report options. When running the resulting binary, we saw a performance improvement of around 50%.
Then, we manually inserted into the original source code the OpenMP pragmas that the reports indicate "-parallel" added, and compiled again without "-parallel" (but with "-qopenmp"). When running the version with the manually inserted OpenMP pragmas, we got no performance benefit (i.e., the execution time was the same as for the serial version). Initially we assumed that "-parallel" was enabling other optimizations besides the OpenMP pragmas. However, changing the environment variable OMP_NUM_THREADS affected the performance of the binary compiled with "-parallel" (e.g., increasing the number of threads decreased the execution time), which suggests that the code was parallelized with OpenMP pragmas that did not appear in the report.
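To make the manual step concrete, here is a minimal sketch of what inserting such a pragma by hand can look like. It is not the PolyBench 'adi' kernel and not what icc generates internally; the array `a`, the size `N`, and the function `sweep_rows` are hypothetical names chosen for illustration. The assumption is that each row carries a dependence along `j` while the rows are independent, so only the outer loop is parallelized.

```c
#include <stdio.h>

#define N 512

/* Hypothetical data, not taken from PolyBench. */
static double a[N][N];

/* Each row has a recurrence along j (a[i][j] depends on a[i][j-1]),
 * but rows do not depend on each other, so the outer i loop is the
 * one that gets the pragma. The reduction clause keeps the checksum
 * free of data races. When the file is compiled without OpenMP
 * support, the pragma is ignored and the loop runs serially. */
double sweep_rows(void)
{
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        a[i][0] = 1.0;
        for (int j = 1; j < N; j++)
            a[i][j] = 0.5 * a[i][j - 1] + 1.0;
        sum += a[i][N - 1];
    }
    return sum;
}
```

Compiled with "-qopenmp" and without "-parallel", a loop like this should respond to OMP_NUM_THREADS; if the timing does not change, the pragma is probably not on the loop that dominates the run time.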
Could you confirm that the option '-qopt-report' might not show all the OpenMP pragmas inserted by the option "-parallel"? Is there any way to obtain this information?
It might be interesting if the opt-report were suitable for this purpose. However, I have doubts. Use of explicit #pragma omp will disable some optimizations, such as loop nest optimization.
You could examine the asm view, possibly with the help of Intel Parallel Advisor.
Tim P. wrote:
Use of explicit #pragma omp will disable some optimizations, such as loop nest optimization.
I understand. However, according to the report, when using the "-parallel" option in this example, icc does not put a #pragma omp around the main computation kernel. All the pragmas in the report are around some initialization code, hence the question. It seems that icc is parallelizing the main kernel with OpenMP, because of the performance behavior when changing the environment variable OMP_NUM_THREADS. However, the report does not contain any pragma around the main kernel.
I think I have found out what the problem was. I was compiling the code using the following flags:
icc ./adi-icc-parallel/adi.c ./utilities-serial/polybench.c -I./adi-icc-parallel -I./utilities-serial -parallel -qopt-report-phase=all -qopt-report:5 -g -O2 -D_Float128=__float128 -qopt-report-file=report.txt
The problem is in the last flag, -qopt-report-file=report.txt:
When I use this flag, it only generates one file. When I remove this flag, it generates two files, one for each object file (adi.optrpt and polybench.optrpt). It seems that when the flag is active, it will use the same filename for all report files, and overwrite the previous one, in succession. If I change the order of the input .c files and specify a report file, the report file will have the contents of the last object file.
Looking at the documentation for that flag, it says: "If you prefer another form of output, you can specify option -qopt-report-file or /Qopt-report-file." From that description, it was not clear to me that it would only keep the report for the last compiled object. I was expecting icc to concatenate all the reports into a single file.
It seems that when the flag is active, it will use the same filename for all report files, and overwrite the previous one, in succession.
Looks like a bug to me. I'll try to find out if this is a known issue.