08-26-2009 01:06 AM
08-26-2009 03:22 PM
1: any loops with a subroutine or function call in them.
The trick then is to get the compiler to inline the call; the compiler might then attempt to parallelize the loop.
2: any loops with dependencies; there are quite a few variants.
The most common are loops with the following in them:
a(j) = ... a(j+1)
3: any loops with CDEC$ noparallel
Beware when using -parallel and -O3 on code with nested loops: the compiler may interchange the loops, and it is then not clear where the compiler directive needs to be placed. You have to look very carefully at the opt-report and decipher what the compiler is doing.
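To make item 2 concrete, here is a minimal sketch of the a(j) = ... a(j+1) pattern. The program name, array size, and the "+ 1" are made up for illustration. Running the same loop in reverse order stands in for iterations executing out of order, and shows that the result depends on iteration order, which is the property that stops the auto-parallelizer.

```fortran
program dependence_demo
  implicit none
  integer, parameter :: n = 8
  integer :: a(n), b(n), j

  a = (/ (j, j = 1, n) /)
  b = a

  ! Forward order: each a(j) reads a(j+1) before iteration j+1 overwrites it.
  do j = 1, n - 1
     a(j) = a(j+1) + 1
  end do

  ! Reverse order (a stand-in for out-of-order execution):
  ! each b(j) reads a b(j+1) that has already been overwritten.
  do j = n - 1, 1, -1
     b(j) = b(j+1) + 1
  end do

  print *, 'forward:', a
  print *, 'reverse:', b
  print *, 'results agree? ', all(a == b)
end program dependence_demo
```

The two arrays differ, so the compiler cannot reorder or split these iterations across threads without changing the answer unless it first copies the values being read.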
08-26-2009 10:53 PM
1. In order to parallelize loops with a subroutine or function call in them, it is not necessary to inline the call; it is sufficient for the call to be thread safe. The auto-parallelizer I developed handles this successfully. On my web site, two examples of the code I parallelized have function calls inside the loops being parallelized. Do you know if icc can parallelize this code?
2. The loop with dependencies in your example seems to be inherently non-parallelizable.
08-26-2009 11:31 PM
I seem to have misunderstood your email. You asked for loops that the Intel Fortran compiler would not parallelize.
Perhaps you meant parallelisable loops that the Fortran compiler would not parallelise.
As far as I know, ifort -parallel will not parallelize ANY loop with a function or subroutine call in it.
In the past, I have had to get the compiler to inline the call (-ip or -ipo). Once that is done, the compiler can determine if the inlined code is threadsafe or not.
Or, put explicit OpenMP directives around it after ensuring that the called subroutine is threadsafe.
I just tried it, and even with !DEC$ PARALLEL before a loop containing a subroutine call, the compiler reported that the loop was not parallelized: existence of parallel dependence.
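For the explicit OpenMP route mentioned above, a minimal sketch might look like the following. The module, the routine scale_add, and the array sizes are hypothetical; the routine is thread safe only because it touches nothing but its own arguments. The directive is an assertion by the programmer, and the compiler does not check it.

```fortran
module work_mod
  implicit none
contains
  ! Hypothetical thread-safe routine: reads x, writes y, no shared state.
  subroutine scale_add(x, y)
    real, intent(in)  :: x
    real, intent(out) :: y
    y = 2.0 * x + 1.0
  end subroutine scale_add
end module work_mod

program omp_call_demo
  use work_mod
  implicit none
  integer, parameter :: n = 1000
  real :: a(n), b(n)
  integer :: i

  do i = 1, n
     a(i) = real(i)
  end do

  ! Each iteration writes a distinct b(i), so iterations are independent.
  ! The loop index i is private to each thread by default.
  !$omp parallel do
  do i = 1, n
     call scale_add(a(i), b(i))
  end do
  !$omp end parallel do

  print *, b(1), b(n)
end program omp_call_demo
```

Without the OpenMP compiler option the directive lines are plain comments and the program runs serially with the same result.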
08-27-2009 12:11 AM
I finally found the reference in the Compiler docs I was looking for:
The compiler can only effectively analyze loops with a relatively simple structure. For example, the compiler cannot determine the thread safety of a loop containing external function calls because it does not know whether the function call might have side effects that introduce dependences. Fortran90 programmers can use the PURE attribute to assert that subroutines and functions contain no side effects. You can invoke interprocedural optimization with the -ipo (Linux* OS and Mac OS X) or /Qipo (Windows) compiler option. Using this option gives the compiler the opportunity to analyze the called function for side effects.
I tried declaring a PURE external subroutine in a module, but I still could not get the compiler to parallelize the loop containing the subroutine call using 11.0.074.
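The test code itself is not shown here, so the following is only a sketch of the kind of case being described, with hypothetical names throughout. It puts a PURE subroutine in a module and calls it from a loop whose iterations touch disjoint array elements, which is the situation the documentation says PURE is meant to help with.

```fortran
module pure_mod
  implicit none
contains
  ! PURE asserts no side effects: no I/O and no writes to module or global data.
  pure subroutine scale_add(x, y)
    real, intent(in)  :: x
    real, intent(out) :: y
    y = 2.0 * x + 1.0
  end subroutine scale_add
end module pure_mod

program pure_call_demo
  use pure_mod
  implicit none
  integer, parameter :: n = 1000
  real :: a(n), b(n)
  integer :: i

  do i = 1, n
     a(i) = real(i)
  end do

  ! Candidate loop for -parallel: iteration i reads a(i) and writes b(i) only.
  do i = 1, n
     call scale_add(a(i), b(i))
  end do

  print *, b(1), b(n)
end program pure_call_demo
```

Whether a given compiler version actually parallelizes the second loop has to be checked in its report; as noted above, 11.0.074 did not.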