Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

OpenMP and Implicit Optimizations

dead-paulie
Beginner
425 Views
When the "openmp" option is used as an option for the compiler and for the linker, are there any implicit optimizations that are enabled?

Thanks!

- paul
0 Kudos
8 Replies
KitturGanesh
Employee
425 Views
Hi Paul,

The -openmp option just enables generation of multithreaded code based on whatever OpenMP directives you have in the code. Whatever optimization level you've chosen (-O0, which helps with debugging, -O1, -O2, which is the default, or -O3) is what the optimizer will use, in addition to any other optimization switches you may have specified. So the -openmp option is mainly for parallelization of the code based on OpenMP directives. Also, if you generate OpenMP diagnostics reports, you'll see which code sections were parallelized.
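As a rough illustration of the point above (a sketch, not code from this thread): the function names `built_with_openmp` and `scale` are hypothetical. The `_OPENMP` macro is defined by the compiler only when OpenMP support is enabled, and the pragma is honored only in that case; otherwise the loop compiles as ordinary serial code.

```cpp
#include <vector>

// Reports whether this translation unit was compiled with OpenMP enabled.
// The _OPENMP macro is defined by the compiler only in that case.
bool built_with_openmp() {
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}

// Scales every element in place. The pragma below takes effect only when
// OpenMP is enabled; without it the compiler ignores the pragma and the
// loop runs serially. Each iteration touches a distinct element, so the
// result is the same either way.
void scale(std::vector<double>& v, double factor) {
    const long n = static_cast<long>(v.size());
#pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        v[i] *= factor;
    }
}
```

Calling `scale(v, 2.0)` on a small vector and comparing the results from builds with and without the OpenMP option is one quick way to confirm that a directive-based loop behaves identically in both configurations.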

-regards,
Kittur
0 Kudos
dead-paulie
Beginner
425 Views

Hello Kittur,

I am working with a code base that will be using OpenMP for parallelization. Currently, there are NO OpenMP directives in the code. The executable logs information to a text file. As an incremental verification test, I simply added the "-openmp" option to my makefiles for the C++ and Fortran compiles and for the executable link step. When I run the original version and the new version, there are differences in the text files!

Once again, I only added the "-openmp" option, and no OpenMP directives were added. So I hypothesized that some implicit optimizations might be enabled at compile time.

I would have thought that the code would behave the same if no OpenMP constructs were present. I am using version 09.01.052 of the C++ and Fortran compiler.

Thanks!

- paul

0 Kudos
TimP
Honored Contributor III
425 Views
OpenMP potentially adds a lot of code, such as function calls to the OpenMP run-time library, to implement each directive. In the main program, you get implicit OpenMP code even with no directives, but it doesn't have much effect until you have parallel regions (it may increase startup time).
OpenMP probably reduces optimization in critical regions and the like.
In Fortran, OpenMP changes the default local memory allocation to all automatic. If your code is incorrect and depends on default SAVE, or has uninitialized variables, the breakage will likely be exposed even at 1 thread.
For people who like to exaggerate OpenMP threaded scaling by not optimizing their code, OpenMP doesn't get in the way.
0 Kudos
dead-paulie
Beginner
425 Views
Is what you described explicitly documented anywhere?
0 Kudos
TimP
Honored Contributor III
425 Views
The effect of OpenMP on Fortran default memory allocation has been discussed on the Fortran forum. Without OpenMP, ifort makes local arrays SAVE by default; that default is incompatible with OpenMP, so -openmp implies the -auto option. The -save option, which makes local scalars SAVE by default as well, is also incompatible with OpenMP. A correct program will work the same either way, except that SAVE isn't supported in parallel regions. Likewise, C static variables are shared rather than per-thread, so code that relies on them doesn't work in OpenMP parallel regions. This has been discussed many times. I wouldn't expect it to be an issue when you run 1 thread, but Intel Thread Checker surely ought to complain.
The fact that the OpenMP option implies OpenMP library initialization in the main program is visible if you compare the asm code. I've never before seen anyone indicate this might be a surprise. A correct C or Fortran program will run the same whether or not you compile with the OpenMP option. I have advocated using the OpenMP option at link time when linking OpenMP-compiled libraries such as MKL, so the compiler can choose the OpenMP support libraries, even though the source code has no OpenMP directives. I'll concede there is a possibility that an incorrect program might run OK if you don't follow my advice, but will break if you do.
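The static-variable concern above can be sketched in C++ (an illustrative analogue, not code from this thread; all function names here are hypothetical). A function that keeps hidden state in a `static` local returns results that depend on every earlier call, and inside a parallel region all threads would share that one variable. Passing the state explicitly, or using a local accumulator with an OpenMP reduction, avoids the hidden dependency.

```cpp
#include <vector>

// Legacy-style helper (hypothetical): the running total lives in a static
// local, so the result depends on call history, and every thread in a
// parallel region would read and write the same storage.
double add_to_total_static(double x) {
    static double total = 0.0;
    total += x;
    return total;
}

// Reentrant alternative: the caller owns the state, so nothing is shared
// implicitly between calls or between threads.
double add_to_total(double& total, double x) {
    total += x;
    return total;
}

// Sums a vector with a local accumulator. With OpenMP enabled, the
// reduction clause gives each thread a private partial sum and combines
// them at the end; without OpenMP the pragma is ignored and the loop is
// an ordinary serial sum.
double sum(const std::vector<double>& v) {
    double total = 0.0;
    const long n = static_cast<long>(v.size());
#pragma omp parallel for reduction(+ : total)
    for (long i = 0; i < n; ++i) {
        total += v[i];
    }
    return total;
}
```

When auditing legacy code, functions shaped like `add_to_total_static` are the ones worth flagging first, since their behavior changes as soon as more than one thread calls them.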
0 Kudos
dead-paulie
Beginner
425 Views
Tim,

Your comments have been very informative. My legacy Fortran code will need to be checked. I also know of some C functions that use static arrays, so I will be sure to examine that code for potential problems.

Most of your advice seems to come from practical, in the weeds, experience. Do any of the man pages or manuals explicitly document these behaviors? I need to maintain a validation paper trail . . .

Thanks!
0 Kudos
TimP
Honored Contributor III
425 Views
If you have difficulty sorting out whether an array is used in a form that is safe for OpenMP parallel regions, Intel Thread Checker should be helpful. It should flag any attempt to modify a non-shareable array from a parallel region. Unfortunately, last I knew, Thread Checker would not deal with parallel regions in C called from Fortran, or vice versa. Likewise, Parallel Studio is meant to assist in parallelizing C++ projects (which don't include Fortran).
0 Kudos
KitturGanesh
Employee
425 Views
Hi,
I also talked to our OpenMP expert, and here's some more input. In principle, adding -openmp to the command line without any OpenMP pragmas should not make any noticeable difference at all. But there will be slight differences in the executable that may cause some differences in output. Could you attach a small test case that shows the kind of differences you are seeing? Is it just the precision of the floating-point results, or is the code actually executing different paths? BTW, these differences can be caused by various other things, including a bug in the original code itself.
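One way to start answering the precision-versus-different-paths question is to compare the logged values with a tolerance rather than character by character. The sketch below is only an illustration under assumptions: `nearly_equal` and the two summation helpers are hypothetical names, and the tolerance is an arbitrary choice. It shows that the same three terms summed in a different order can differ in the last bits while still agreeing to within rounding, whereas genuinely different values fail the comparison.

```cpp
#include <algorithm>
#include <cmath>

// True when a and b agree to within a relative tolerance.
// Hypothetical helper; pick rel_tol to suit the data being compared.
bool nearly_equal(double a, double b, double rel_tol) {
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= rel_tol * scale;
}

// The same three terms added in two different orders. Floating-point
// addition is not associative, so the two results may differ in the
// last bits even though both are valid roundings of the same sum.
double sum_left_to_right(double a, double b, double c) {
    return (a + b) + c;
}

double sum_right_to_left(double a, double b, double c) {
    return a + (b + c);
}
```

If every difference between the two log files passes a check like `nearly_equal(x, y, 1e-12)`, rounding is the likely explanation; differences that fail it, or lines that appear in only one file, point toward the code taking a different path.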

Questions to ponder: Is your code thoroughly debugged? Have you checked for out-of-bounds array references, references to deallocated memory, or code that creates its own threads?

Suggestions: We no longer support the old 9.1 compiler. You should update to the latest 11.0 and try it out. If you still see problems, you should file an issue (with a reproducer) in Premier so it can be looked at.

Hope the above helps. Let me know if you still need any more info or clarification, thanks.
-regards,
Kittur
0 Kudos