Intel® Fortran Compiler

Problem with the compilation on the Linux cluster

Marco_M_3
Beginner

I'm compiling code on our cluster (CentOS 7, with Intel Xeon CPUs) that I have already tested on Windows. Everything compiles fine, but the compiler gives me these errors:

LLL.f90(3927): (col. 38) remark: unroll pragma will be ignored due to loop cannot be unrolled
LLL.f90(4825): (col. 13) remark: nounroll pragma will be ignored due to nounroll_and_jam pragma expected
LLL.f90(4811): (col. 13) remark: nounroll pragma will be ignored due to nounroll_and_jam pragma expected
LLL.f90(4862): (col. 13) remark: nounroll pragma will be ignored due to nounroll_and_jam pragma expected
LLL.f90(4886): (col. 9) remark: nounroll pragma will be ignored due to nounroll_and_jam pragma expected
LLL.f90(4913): (col. 13) remark: nounroll pragma will be ignored due to nounroll_and_jam pragma expected
LLL.f90(4940): (col. 9) remark: nounroll pragma will be ignored due to nounroll_and_jam pragma expected
LLL.f90(4943): (col. 17) remark: unroll pragma will be ignored due to unroll_and_jam pragma expected
LLL.f90(4889): (col. 17) remark: unroll pragma will be ignored due to loop cannot be unrolled


I used exactly the same flags as for the Windows build, where I never received such a warning:

ifort -O3 -ipo-separate -unroll=50 -parallel -threads -qopt-prefetch=3 -qopt-matmul  -assume byterecl -qopenmp  -c LLL.f90

Since I don't want to change the entire code unless it's necessary, is it possible to keep using !DIR$ NOUNROLL and !DIR$ UNROLL=n instead of !DIR$ NOUNROLL_AND_JAM? Is there a specific flag to enforce this? I looked in the documentation, but I couldn't find anything.
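A minimal sketch of where each directive would sit in a two-level loop nest, assuming (this is not confirmed anywhere in the thread) that the remark "nounroll pragma will be ignored due to nounroll_and_jam pragma expected" means the plain NOUNROLL was placed on a loop that contains another loop, where the compiler looks for the *_AND_JAM form instead. The module and subroutine names (directive_demo, matvec) are hypothetical; the directives themselves are Intel Fortran general directives, and other compilers treat the !DIR$ lines as comments.

```fortran
! Hypothetical example: y = A*x with A stored column-major.
module directive_demo
  implicit none
contains
  subroutine matvec(a, x, y)
    real(8), intent(in)  :: a(:,:), x(:)
    real(8), intent(out) :: y(:)
    integer :: i, j

    y = 0.0d0
    ! Outer loop of the nest: assumed to be the target of the *_AND_JAM form.
    !DIR$ NOUNROLL_AND_JAM
    do j = 1, size(a, 2)
      ! Innermost loop: plain NOUNROLL (or UNROLL(n)) is placed here.
      !DIR$ NOUNROLL
      do i = 1, size(a, 1)
        y(i) = y(i) + a(i, j) * x(j)
      end do
    end do
  end subroutine matvec
end module directive_demo
```

If this reading is right, the existing NOUNROLL directives on innermost loops could stay as they are, and only those on outer loops would need the *_AND_JAM spelling.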

For reference, I used ifort 15 and ifort 17 to compile on Windows, and ifort 13, 15, and 17 to try to compile on Linux; I received the same errors with all three compilers on CentOS.

Thanks,

Marco


PS: NOUNROLL is necessary in those particular loops because of how the code is written, and to compare performance. Here are the compilation flags I used on Windows:

/nologo /MP /O3 /Qunroll:20 /Qparallel /Qopt-prefetch=3 /Qipo /Qopt-matmul /I"..\lib" /Qopenmp /module:"x64\Release\\" /object:"x64\Release\\" /Fd"x64\Release\vc120.pdb" /libs:static /threads /c

/MP is Windows-specific, but the rest are the same, and no warnings or errors were generated.


**EDIT** 

I tried changing the directives to NOUNROLL_AND_JAM and UNROLL_AND_JAM = 10. Now the errors become:

LLL.f90(3397): (col. 25) remark: unroll_and_jam pragma will be ignored due to (null)
LLL.f90(3397): (col. 25) remark: unroll_and_jam pragma will be ignored due to (null)
LLL.f90(4687): (col. 17) remark: unroll_and_jam pragma will be ignored due to (null)
LLL.f90(4125): (col. 25) remark: unroll_and_jam pragma will be ignored due to (null)
LLL.f90(4101): (col. 25) remark: unroll_and_jam pragma will be ignored due to (null)
LLL.f90(4183): (col. 25) remark: unroll_and_jam pragma will be ignored due to (null)


5 Replies
TimP
Honored Contributor III

Those aren't errors. They are simply warnings that a directive you entered will have no effect because it is redundant: there would be no unrolling at those positions even without your directive. In my experience, automatic unroll_and_jam rarely happens; it occurs only in situations matching those the compiler has been taught to recognize for major benchmarks. As the warnings indicate, you would not want additional unrolling where unroll_and_jam happens.

It seems unlikely that so much unrolling would be useful, even if the compiler were to implement it. In my experience, the default unrolling for a vectorized sum reduction is excessive in many situations, but it isn't reduced by any option that still permits vectorization. Setting /Qunroll:4 (-unroll=4 on Linux) can be beneficial in many cases where the default unroll is smaller and there is no parallelization. I assume you don't have an old CPU (one that doesn't support SSE4.2), which might benefit from more unrolling than current CPUs do.
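To make the term concrete: unroll-and-jam unrolls an outer loop and then fuses ("jams") the resulting copies of the inner loop into a single inner loop body. Below is a hand-written sketch of that transformation on a hypothetical matrix-vector nest (module and routine names are illustrative only), with the outer loop unrolled by 2 and a cleanup loop for an odd column count. It shows the shape of the code the transformation produces, not the compiler's actual output.

```fortran
module unroll_jam_demo
  implicit none
contains
  ! Reference version: plain two-level loop nest computing y = A*x.
  subroutine matvec_ref(a, x, y)
    real(8), intent(in)  :: a(:,:), x(:)
    real(8), intent(out) :: y(:)
    integer :: i, j
    y = 0.0d0
    do j = 1, size(a, 2)
      do i = 1, size(a, 1)
        y(i) = y(i) + a(i, j) * x(j)
      end do
    end do
  end subroutine matvec_ref

  ! Same nest with the outer j loop unrolled by 2 and the two copies of the
  ! inner i loop jammed into one body.
  subroutine matvec_uj2(a, x, y)
    real(8), intent(in)  :: a(:,:), x(:)
    real(8), intent(out) :: y(:)
    integer :: i, j, m, n
    n = size(a, 1)
    m = size(a, 2)
    y = 0.0d0
    do j = 1, m - mod(m, 2), 2
      do i = 1, n
        y(i) = y(i) + a(i, j) * x(j) + a(i, j + 1) * x(j + 1)
      end do
    end do
    ! Cleanup: handles the single leftover column when m is odd.
    do j = m - mod(m, 2) + 1, m
      do i = 1, n
        y(i) = y(i) + a(i, j) * x(j)
      end do
    end do
  end subroutine matvec_uj2
end module unroll_jam_demo
```

Each pass of the jammed inner loop reuses y(i) for two columns, which is the data-reuse benefit the transformation aims for; it also explains why stacking further plain unrolling on top of it is rarely wanted.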

Marco_M_3
Beginner

According to the compiler warning, the limit for unrolling is 16, so I chose 10, and now 2, to test both the directive in the code and the compiler flag.

The problem is that the compiler does seem to apply the unrolling requested by the compiler flag (the only warnings are for the explicit directives in the code).

 

Moreover, in my code, ignoring the NOUNROLL WILL result in wrong code because of how the OMP directive is written to take advantage of memory locality. So I need the compiler not to ignore that part.

Also, the CPUs on the cluster are 6 to 10 years old depending on the node, and the one on my desktop is a non-Intel architecture, so I might benefit from more unrolling.

Marco

Steve_Lionel
Honored Contributor III

The "due to (null)" bothers me: the IPO processing (which is generating these messages) should do a better job with its diagnostics.

Marco_M_3
Beginner

That makes two of us... :)

Do you have any idea why the compiler would expect unroll_and_jam instead of unroll? I tried to find an answer in the documentation, but I couldn't find one.

Steve_Lionel
Honored Contributor III

Maybe Tim will have some idea; all that stuff is a mystery to me.
