dingjun_chencmgl_ca · 10-25-2013

Hi, dear Sir or Madam,

I am testing the Intel 2013 Fortran compiler (v14.0) with my Fortran application. I changed some OpenMP PARALLEL DO directives into the following OpenMP PARALLEL DO SIMD form and found no performance gain. If that is the case, why was this new feature introduced in the 2013 Fortran compiler? Could you explain in more detail so that I can use it properly? Thanks in advance.

!$OMP PARALLEL DO SIMD

do-loop

[!$OMP END PARALLEL DO SIMD]

TimP · 10-25-2013

In many cases, the compiler would perform simd vectorization regardless of whether the simd clause is present. In other cases, the compiler might correctly conclude that there is little benefit, and would vectorize only when you request it with simd. Examining the -opt-report or -vec-report6 output (/Qvec-report:6 on Windows) should give you some idea of whether simd was able to make a difference.

If you have the desirable situation of nested loops where you specify omp parallel do for the outer loop, and the inner loop is to be vectorized with simd, you should not add the simd clause to the outer loop. That would invoke "outer loop vectorization" which is something of a last resort for the situation where it's not possible to vectorize the inner loop.

If you want to test the compiler for the situation where the inner loop should be vectorized, you can leave the omp parallel do as is on the outer loop and set !$omp simd .... on the inner loop.
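As a rough sketch of that arrangement (the module nested_simd_demo, the subroutine scaled_add and its arrays are hypothetical, not taken from the original code), the outer loop carries !$omp parallel do with no simd clause, and the inner, unit-stride loop carries !$omp simd. When compiled without OpenMP support, the directives are treated as comments and the code simply runs serially.

```fortran
! Hypothetical module (not from the original poster's code) showing
! OpenMP threads on the outer loop and SIMD on the inner loop.
module nested_simd_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Computes c(i,j) = a(i,j) + s*b(i,j) for every element.
  ! Fortran arrays are column-major, so the first index i is the
  ! contiguous, unit-stride one: that is the loop worth vectorizing.
  subroutine scaled_add(a, b, c, s)
    real(dp), intent(in)  :: a(:,:), b(:,:)
    real(dp), intent(in)  :: s
    real(dp), intent(out) :: c(:,:)
    integer :: i, j

    ! Threads share out the columns. No simd clause here: adding one
    ! would request outer-loop vectorization instead.
    !$omp parallel do private(i)
    do j = 1, size(a, 2)
      ! Each thread vectorizes the inner loop over one column.
      !$omp simd
      do i = 1, size(a, 1)
        c(i, j) = a(i, j) + s * b(i, j)
      end do
    end do
    !$omp end parallel do
  end subroutine scaled_add
end module nested_simd_demo
```

Comparing the vectorization report for this layout against one with !$OMP PARALLEL DO SIMD on the outer loop is one way to see the difference described above.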

The 14.0.1 compiler has some situations where the legacy directive

!dir$ vector aligned

is no longer effective to improve performance of a loop with a conditional. I'm told this is unintentional, and might be changed back.

It might be replaced by e.g.

!$omp simd aligned(a,b,c,...)

where all the aligned arrays are named, in the hope of improving vectorization. Unfortunately, this doesn't improve performance over the old directive.

By the way, asserting alignment on the omp parallel do loop will work only where the loop count is properly related to the hardware vector length times the number of threads, and it also requires the array itself to be aligned.

Needless to say, you'll probably find a variety of opinions on this.

dingjun_chencmgl_ca · 10-29-2013

Thanks, Tim.

I am not clear about the following remark:

Z:\imex\bin\opt64vector\gmresp.f(344): (col. 7) remark: LOOP WAS VECTORIZED

But in fact, there is no such loop at line 344, column 7 of my source code. What does the remark above mean? I am confused about it.

Could you explain in more detail? Thanks again.

The following is the output from the Intel Fortran V14 compiler for Windows.

ifort.exe /Qauto /cm /nologo /w /MT /QxHost /arch:AVX /Qsimd /align:array64byte /Qprec-div- /Qopt-matmul /O3 /Qopenmp-simd /Qvec-report:6 /Qfpp /Qopenmp -c gmresp.f

ifort: command line remark #10382: option '/QxHOST' setting '/QxAVX'

Z:\imex\bin\opt64vector\gmresp.f(345): (col. 10) remark: vectorization support: reference ILVTCA has unaligned access

Z:\imex\bin\opt64vector\gmresp.f(345): (col. 10) remark: vectorization support: reference ILVTCA has aligned access

Z:\imex\bin\opt64vector\gmresp.f(345): (col. 10) remark: vectorization support: unaligned access used inside loop body

Z:\imex\bin\opt64vector\gmresp.f(344): (col. 7) remark: vectorization support: unroll factor set to 4

Z:\imex\bin\opt64vector\gmresp.f(344): (col. 7) remark: LOOP WAS VECTORIZED

Z:\imex\bin\opt64vector\gmresp.f(397): (col. 7) remark: vectorization support: reference GMRESP$NVARESC has unaligned access

Z:\imex\bin\opt64vector\gmresp.f(397): (col. 7) remark: vectorization support: unaligned access used inside loop body

Z:\imex\bin\opt64vector\gmresp.f(397): (col. 7) remark: LOOP WAS VECTORIZED

Z:\imex\bin\opt64vector\gmresp.f(397): (col. 7) remark: loop was not vectorized: not inner loop

Z:\imex\bin\opt64vector\gmresp.f(449): (col. 19) remark: loop was not vectorized: type conversion prohibits vectorization

Z:\imex\bin\opt64vector\gmresp.f(567): (col. 28) remark: loop was not vectorized: type conversion prohibits vectorization

Z:\imex\bin\opt64vector\gmresp.f(657): (col. 28) remark: loop was not vectorized: type conversion prohibits vectorization

Z:\imex\bin\opt64vector\gmresp.f(632): (col. 13) remark: loop was not vectorized: unsupported loop structure

Z:\imex\bin\opt64vector\gmresp.f(747): (col. 31) remark: loop was not vectorized: type conversion prohibits vectorization

Z:\imex\bin\opt64vector\gmresp.f(722): (col. 16) remark: loop was not vectorized: unsupported loop structure

Z:\imex\bin\opt64vector\gmresp.f(818): (col. 16) remark: loop was not vectorized: existence of vector dependence

Z:\imex\bin\opt64vector\gmresp.f(820): (col. 16) remark: vector dependence: assumed ANTI dependence between XMU line 820 and HMAT line 819

Z:\imex\bin\opt64vector\gmresp.f(819): (col. 16) remark: vector dependence: assumed FLOW dependence between HMAT line 819 and XMU line 820

Z:\imex\bin\opt64vector\gmresp.f(820): (col. 16) remark: vector dependence: assumed ANTI dependence between XMU line 820 and HMAT line 819

Z:\imex\bin\opt64vector\gmresp.f(819): (col. 16) remark: vector dependence: assumed FLOW dependence between HMAT line 819 and XMU line 820

Z:\imex\bin\opt64vector\gmresp.f(819): (col. 16) remark: vector dependence: assumed ANTI dependence between XMU line 819 and HMAT line 819

Z:\imex\bin\opt64vector\gmresp.f(819): (col. 16) remark: vector dependence: assumed FLOW dependence between HMAT line 819 and XMU line 819

Z:\imex\bin\opt64vector\gmresp.f(819): (col. 16) remark: vector dependence: assumed ANTI dependence between XMU line 819 and HMAT line 819

Z:\imex\bin\opt64vector\gmresp.f(819): (col. 16) remark: vector dependence: assumed FLOW dependence between HMAT line 819 and XMU line 819

Z:\imex\bin\opt64vector\gmresp.f(819): (col. 16) remark: vector dependence: assumed ANTI dependence between XMU line 819 and HMAT line 819

Z:\imex\bin\opt64vector\gmresp.f(819): (col. 16) remark: vector dependence: assumed FLOW dependence between HMAT line 819 and XMU line 820

Z:\imex\bin\opt64vector\gmresp.f(820): (col. 16) remark: vector dependence: assumed ANTI dependence between XMU line 820 and HMAT line 819

Z:\imex\bin\opt64vector\gmresp.f(819): (col. 16) remark: vector dependence: assumed FLOW dependence between HMAT line 819 and XMU line 820

Z:\imex\bin\opt64vector\gmresp.f(820): (col. 16) remark: vector dependence: assumed ANTI dependence between XMU line 820 and HMAT line 819

Z:\imex\bin\opt64vector\gmresp.f(908): (col. 31) remark: loop was not vectorized: type conversion prohibits vectorization

Z:\imex\bin\opt64vector\gmresp.f(904): (col. 19) remark: loop was not vectorized: not inner loop

Z:\imex\bin\opt64vector\gmresp.f(972): (col. 22) remark: loop was not vectorized: type conversion prohibits vectorization

Z:\imex\bin\opt64vector\gmresp.f(979): (col. 13) remark: vectorization support: reference HMAT has aligned access

Z:\imex\bin\opt64vector\gmresp.f(978): (col. 10) remark: vectorization support: unroll factor set to 4

Z:\imex\bin\opt64vector\gmresp.f(978): (col. 10) remark: LOOP WAS VECTORIZED

.....................................................................................

........................................................................................

dingjun_chencmgl_ca · 10-29-2013

By the way,

Some loops I do want to be vectorized. For those I can use the following, for example,

/******** an example ***********/

!dir$ OMP SIMD

do ks = ksts(icatis(ica)+1)+1, kstscf(ica)

tempc1 = tempc1 + v(ks)*vq(ks,itt)
end do

/******** the end of example ***********/

to direct the Fortran compiler.

But there are some loops that I do not want vectorized. Could you tell me how to direct the compiler in such a case?

Can I use the following

!dir$ OMP SIMD NOVECREMAINDER

directive to achieve that?

Thanks and I look forward to hearing from you.

TimP · 10-29-2013

dingjun.chencmgl.ca wrote:

!dir$ OMP SIMD

do ks = ksts(icatis(ica)+1)+1, kstscf(ica)

   tempc1 = tempc1 + v(ks)*vq(ks,itt)
end do

Can I use the following

!dir$ OMP SIMD NOVECREMAINDER

Your loop would be appropriate for

!$OMP SIMD reduction(+: tempc1)

if you want vectorization

!dir$ novector

if you want no vectorization.

!dir$ simd novecremainder

would not prevent vectorization of the main body of the loop; it would only prevent a vectorized remainder loop (such as you might see for AVX or MIC) for the left-over iterations that don't fill a complete vector.

Setting a simd directive on a loop but omitting clauses required by the context, such as reduction, produces unspecified results (may work on one platform and break on another).
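A minimal sketch of the two options for the loop in question, assuming its bounds ksts(icatis(ica)+1)+1 and kstscf(ica) have been hoisted into hypothetical integers k1 and k2, and the loop wrapped in hypothetical functions dot_simd and dot_novector. The first requests vectorization with the reduction clause; the second suppresses it with the Intel-specific !dir$ novector.

```fortran
! Hypothetical module: k1 and k2 stand for the loop bounds in the post;
! v, vq and itt have the same meaning as in the post.
module dot_reduction_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Vectorization requested. The reduction clause declares tempc1 as a
  ! sum carried across iterations, so partial sums can be kept per
  ! vector lane and combined when the loop ends.
  function dot_simd(v, vq, k1, k2, itt) result(dot)
    real(dp), intent(in) :: v(:), vq(:,:)
    integer,  intent(in) :: k1, k2, itt
    real(dp) :: dot, tempc1
    integer  :: ks
    tempc1 = 0.0_dp
    !$omp simd reduction(+: tempc1)
    do ks = k1, k2
      tempc1 = tempc1 + v(ks) * vq(ks, itt)
    end do
    dot = tempc1
  end function dot_simd

  ! Vectorization suppressed with the Intel !dir$ novector directive;
  ! compilers that do not support it generally treat the line as a comment.
  function dot_novector(v, vq, k1, k2, itt) result(dot)
    real(dp), intent(in) :: v(:), vq(:,:)
    integer,  intent(in) :: k1, k2, itt
    real(dp) :: dot
    integer  :: ks
    dot = 0.0_dp
    !dir$ novector
    do ks = k1, k2
      dot = dot + v(ks) * vq(ks, itt)
    end do
  end function dot_novector
end module dot_reduction_demo
```

Because the reduction lets the compiler reorder the additions, the SIMD result may differ from the serial one in the last bits for general floating-point data.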

TimP · 10-29-2013

The LOOP WAS VECTORIZED report usually refers to the file name and line which comes first inside the vectorized loop or array assignment.

I guess you have multi-rank array assignments, since the report shows remarks for both an inner and an outer loop at the same source line. The compiler would try to vectorize the innermost (smallest-stride) part, unless it seems desirable to expand the whole thing into a single vectorizable loop.
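To illustrate how one source line can hold several loops, here is a hypothetical pair of subroutines (add_whole and add_loops are illustrative names, not from the original code): a rank-2 array assignment and a roughly equivalent explicit loop nest. How ifort actually expands the assignment is up to the compiler; the comments only describe the usual pattern.

```fortran
! Hypothetical illustration of a multi-rank array assignment.
module array_assign_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! One source line, but the compiler expands it into a loop nest over
  ! both dimensions (or possibly one collapsed loop), so remarks for an
  ! inner and an outer loop can both point at this same line.
  subroutine add_whole(a, b, c)
    real(dp), intent(in)  :: a(:,:), b(:,:)
    real(dp), intent(out) :: c(:,:)
    c = a + b
  end subroutine add_whole

  ! Roughly equivalent explicit loop nest: the inner loop runs over the
  ! first (unit-stride) index, the part normally chosen for vectorization.
  subroutine add_loops(a, b, c)
    real(dp), intent(in)  :: a(:,:), b(:,:)
    real(dp), intent(out) :: c(:,:)
    integer :: i, j
    do j = 1, size(a, 2)
      do i = 1, size(a, 1)
        c(i, j) = a(i, j) + b(i, j)
      end do
    end do
  end subroutine add_loops
end module array_assign_demo
```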

If you are using array notation, it seems the loop you show would be better written with dot_product. You would then still have the option of setting !dir$ novector, but not of using OpenMP or SIMD directives.

The report also contains remarks ("type conversion prohibits vectorization") indicating that some loops or array assignments mix data types in a way unsuitable for vectorization.

John_Campbell · 10-29-2013

You could write the following and let ifort optimise from there:

tempc1 = dot_product ( V (ksts(icatis(ica)+1)+1:kstscf(ica)), &
vq(ksts(icatis(ica)+1)+1:kstscf(ica),itt) )

For clarity, I'd also include:
k1 = ksts(icatis(ica)+1)+1
k2 = kstscf(ica)
tempc1 = dot_product ( V (k1:k2), vq(k1:k2,itt) )

I'd expect ifort to optimise this by default.
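A self-contained version of the dot_product form, wrapped in a hypothetical function tempc1_dot that takes the hoisted bounds k1 and k2 as arguments. No directive is used, so the vectorization decision is left entirely to the compiler.

```fortran
! Hypothetical wrapper around the dot_product suggestion; k1, k2, v,
! vq and itt have the same meaning as in the post.
module dot_intrinsic_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  function tempc1_dot(v, vq, k1, k2, itt) result(tempc1)
    real(dp), intent(in) :: v(:), vq(:,:)
    integer,  intent(in) :: k1, k2, itt
    real(dp) :: tempc1
    ! vq(k1:k2, itt) is a contiguous section of one column, so both
    ! operands are unit-stride. An empty range (k2 < k1) gives zero,
    ! the same as the original loop.
    tempc1 = dot_product(v(k1:k2), vq(k1:k2, itt))
  end function tempc1_dot
end module dot_intrinsic_demo
```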

Auto vectorisation is probably easier than explicit SIMD specification, which I understand you are trying to test.

John

dingjun_chencmgl_ca · 10-30-2013

Could you tell me how to locate the lines the report refers to, for example line 404? I am sure it is not line 404 of my source code, so I cannot insert such a statement in my source. What does the line number really refer to? Thanks.

Rebuilding "grwritecn.obj" on host "DINGJUN.cgy.cmgl.ca"

======== Finished "symfcp.obj" on host "DINGJUN.cgy.cmgl.ca" ========

ifort.exe /Qauto /cm /nologo /w /MT /QxHost /arch:AVX /QaxAVX /align:array64byte /Qprec-div- /Qopt-matmul /O3 /Qopenmp-simd /Qparallel /Qguide /Qopt-subscript-in-range /Qfpp /Qopenmp -c symfcp.f

ifort: command line remark #10382: option '/QxHOST' setting '/QxAVX'

GAP REPORT LOG OPENED ON Wed Oct 30 14:09:39 2013

Z:\imex\bin\opt64vector\symfcp.f(404): remark #30525: (PAR) Insert a "!dir$ loop count min(256)" statement right before the loop at line 404 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 256 iterations.

Z:\imex\bin\opt64vector\symfcp.f(407): remark #30525: (PAR) Insert a "!dir$ loop count min(256)" statement right before the loop at line 407 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 256 iterations.

Z:\imex\bin\opt64vector\symfcp.f(422): remark #30519: (PAR) Insert a "!dir$ parallel" statement right before the loop at line 422 to parallelize the loop. [VERIFY] Make sure that these arrays in the loop do not have cross-iteration dependencies: ILIST. A cross-iteration dependency exists if a memory location is modified in an iteration of the loop and accessed (by a read or a write) in another iteration of the loop.

Z:\imex\bin\opt64vector\symfcp.f(422): remark #30525: (PAR) Insert a "!dir$ loop count min(512)" statement right before the loop at line 422 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 512 iterations.

Z:\imex\bin\opt64vector\symfcp.f(430): remark #30519: (PAR) Insert a "!dir$ parallel" statement right before the loop at line 430 to parallelize the loop. [VERIFY] Make sure that these arrays in the loop do not have cross-iteration dependencies: ILIST. A cross-iteration dependency exists if a memory location is modified in an iteration of the loop and accessed (by a read or a write) in another iteration of the loop.

Z:\imex\bin\opt64vector\symfcp.f(430): remark #30525: (PAR) Insert a "!dir$ loop count min(512)" statement right before the loop at line 430 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 512 iterations.

Z:\imex\bin\opt64vector\symfcp.f(517): remark #30525: (PAR) Insert a "!dir$ loop count min(256)" statement right before the loop at line 517 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 256 iterations.

Z:\imex\bin\opt64vector\symfcp.f(559): remark #30525: (PAR) Insert a "!dir$ loop count min(128)" statement right before the loop at line 559 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 128 iterations.

Z:\imex\bin\opt64vector\symfcp.f(681): remark #30519: (PAR) Insert a "!dir$ parallel" statement right before the loop at line 681 to parallelize the loop. [VERIFY] Make sure that these arrays in the loop do not have cross-iteration dependencies: ISTICA. A cross-iteration dependency exists if a memory location is modified in an iteration of the loop and accessed (by a read or a write) in another iteration of the loop.

Z:\imex\bin\opt64vector\symfcp.f(681): remark #30525: (PAR) Insert a "!dir$ loop count min(128)" statement right before the loop at line 681 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 128 iterations.

Z:\imex\bin\opt64vector\symfcp.f(750): remark #30525: (PAR) Insert a "!dir$ loop count min(256)" statement right before the loop at line 750 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 256 iterations.

Z:\imex\bin\opt64vector\symfcp.f(768): remark #30525: (PAR) Insert a "!dir$ loop count min(512)" statement right before the loop at line 768 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 512 iterations.

Z:\imex\bin\opt64vector\symfcp.f(798): remark #30519: (PAR) Insert a "!dir$ parallel" statement right before the loop at line 798 to parallelize the loop. [VERIFY] Make sure that these arrays in the loop do not have cross-iteration dependencies: ILIST. A cross-iteration dependency exists if a memory location is modified in an iteration of the loop and accessed (by a read or a write) in another iteration of the loop.

Z:\imex\bin\opt64vector\symfcp.f(798): remark #30525: (PAR) Insert a "!dir$ loop count min(128)" statement right before the loop at line 798 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 128 iterations.

Z:\imex\bin\opt64vector\symfcp.f(815): remark #30525: (PAR) Insert a "!dir$ loop count min(512)" statement right before the loop at line 815 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 512 iterations.

Z:\imex\bin\opt64vector\symfcp.f(819): remark #30519: (PAR) Insert a "!dir$ parallel" statement right before the loop at line 819 to parallelize the loop. [VERIFY] Make sure that these arrays in the loop do not have cross-iteration dependencies: ILIST. A cross-iteration dependency exists if a memory location is modified in an iteration of the loop and accessed (by a read or a write) in another iteration of the loop.

Z:\imex\bin\opt64vector\symfcp.f(819): remark #30525: (PAR) Insert a "!dir$ loop count min(128)" statement right before the loop at line 819 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 128 iterations.

Z:\imex\bin\opt64vector\symfcp.f(841): remark #30525: (PAR) Insert a "!dir$ loop count min(256)" statement right before the loop at line 841 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 256 iterations.

Z:\imex\bin\opt64vector\symfcp.f(845): remark #30519: (PAR) Insert a "!dir$ parallel" statement right before the loop at line 845 to parallelize the loop. [VERIFY] Make sure that these arrays in the loop do not have cross-iteration dependencies: ILIST. A cross-iteration dependency exists if a memory location is modified in an iteration of the loop and accessed (by a read or a write) in another iteration of the loop.

Z:\imex\bin\opt64vector\symfcp.f(845): remark #30525: (PAR) Insert a "!dir$ loop count min(128)" statement right before the loop at line 845 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 128 iterations.

Z:\imex\bin\opt64vector\symfcp.f(855): remark #30519: (PAR) Insert a "!dir$ parallel" statement right before the loop at line 855 to parallelize the loop. [VERIFY] Make sure that these arrays in the loop do not have cross-iteration dependencies: JLIST, JLCTNC. A cross-iteration dependency exists if a memory location is modified in an iteration of the loop and accessed (by a read or a write) in another iteration of the loop.

Z:\imex\bin\opt64vector\symfcp.f(855): remark #30525: (PAR) Insert a "!dir$ loop count min(128)" statement right before the loop at line 855 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 128 iterations.

Z:\imex\bin\opt64vector\symfcp.f(864): remark #30525: (PAR) Insert a "!dir$ loop count min(256)" statement right before the loop at line 864 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 256 iterations.

Z:\imex\bin\opt64vector\symfcp.f(868): remark #30519: (PAR) Insert a "!dir$ parallel" statement right before the loop at line 868 to parallelize the loop. [VERIFY] Make sure that these arrays in the loop do not have cross-iteration dependencies: ILIST. A cross-iteration dependency exists if a memory location is modified in an iteration of the loop and accessed (by a read or a write) in another iteration of the loop.

Z:\imex\bin\opt64vector\symfcp.f(868): remark #30525: (PAR) Insert a "!dir$ loop count min(128)" statement right before the loop at line 868 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 128 iterations.

Z:\imex\bin\opt64vector\symfcp.f(878): remark #30519: (PAR) Insert a "!dir$ parallel" statement right before the loop at line 878 to parallelize the loop. [VERIFY] Make sure that these arrays in the loop do not have cross-iteration dependencies: JLIST, JUCTNC. A cross-iteration dependency exists if a memory location is modified in an iteration of the loop and accessed (by a read or a write) in another iteration of the loop.

Z:\imex\bin\opt64vector\symfcp.f(878): remark #30525: (PAR) Insert a "!dir$ loop count min(128)" statement right before the loop at line 878 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 128 iterations.

Z:\imex\bin\opt64vector\symfcp.f(899): remark #30525: (PAR) Insert a "!dir$ loop count min(256)" statement right before the loop at line 899 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 256 iterations.

Z:\imex\bin\opt64vector\symfcp.f(917): remark #30525: (PAR) Insert a "!dir$ loop count min(512)" statement right before the loop at line 917 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 512 iterations.

Z:\imex\bin\opt64vector\symfcp.f(944): remark #30519: (PAR) Insert a "!dir$ parallel" statement right before the loop at line 944 to parallelize the loop. [VERIFY] Make sure that these arrays in the loop do not have cross-iteration dependencies: ILIST. A cross-iteration dependency exists if a memory location is modified in an iteration of the loop and accessed (by a read or a write) in another iteration of the loop.

Z:\imex\bin\opt64vector\symfcp.f(944): remark #30525: (PAR) Insert a "!dir$ loop count min(128)" statement right before the loop at line 944 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 128 iterations.

Z:\imex\bin\opt64vector\symfcp.f(993): remark #30525: (PAR) Insert a "!dir$ loop count min(256)" statement right before the loop at line 993 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 256 iterations.

Z:\imex\bin\opt64vector\symfcp.f(997): remark #30519: (PAR) Insert a "!dir$ parallel" statement right before the loop at line 997 to parallelize the loop. [VERIFY] Make sure that these arrays in the loop do not have cross-iteration dependencies: ILIST. A cross-iteration dependency exists if a memory location is modified in an iteration of the loop and accessed (by a read or a write) in another iteration of the loop.

Z:\imex\bin\opt64vector\symfcp.f(997): remark #30525: (PAR) Insert a "!dir$ loop count min(128)" statement right before the loop at line 997 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 128 iterations.

Z:\imex\bin\opt64vector\symfcp.f(1007): remark #30519: (PAR) Insert a "!dir$ parallel" statement right before the loop at line 1007 to parallelize the loop. [VERIFY] Make sure that these arrays in the loop do not have cross-iteration dependencies: JLIST, JABTNC. A cross-iteration dependency exists if a memory location is modified in an iteration of the loop and accessed (by a read or a write) in another iteration of the loop.

Z:\imex\bin\opt64vector\symfcp.f(1007): remark #30525: (PAR) Insert a "!dir$ loop count min(128)" statement right before the loop at line 1007 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 128 iterations.

Z:\imex\bin\opt64vector\symfcp.f(1026): remark #30525: (PAR) Insert a "!dir$ loop count min(256)" statement right before the loop at line 1026 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 256 iterations.

Z:\imex\bin\opt64vector\symfcp.f(1043): remark #30525: (PAR) Insert a "!dir$ loop count min(256)" statement right before the loop at line 1043 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 256 iterations.

Number of advice-messages emitted for this compilation session: 38.

END OF GAP REPORT LOG

Test 2013 Fortran Compiler (version 14.0) SIMD option