Compiler warning: remark: unroll pragma will be ignored due to unrolling factor mismatched

Richard_H_6 · ‎11-11-2013

Hi guys,

Do you know how to disable the following warning? It's annoying. The example is quite similar to the one in http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/cref_cls/common/cppref_pragma_unroll_nounroll.htm

void unroll(int a[], int b[], int c[], int d[])
{
  int i;
  #pragma unroll(4)
  for (i = 0; i < 16; i++) {
    b[i] = a[i] + 1;
    d[i] = c[i] + 1;
  }
}

$ icc --version
icc (ICC) 13.1.2 20130514

$ icc -c test.c -o test.o
test.c(6): (col. 3) remark: unroll pragma will be ignored due to unroll factor exists.
test.c(6): (col. 3) remark: unroll pragma will be ignored due to unrolling factor mismatched.
test.c(6): (col. 3) remark: unroll pragma will be ignored due to unrolling factor mismatched.
test.c(6): (col. 3) remark: unroll pragma will be ignored due to unroll factor exists.
test.c(6): (col. 3) remark: unroll pragma will be ignored due to unrolling factor mismatched.

The warning disappears if the loop trip count is changed to 32:

void unroll(int a[], int b[], int c[], int d[])
{
  int i;
  #pragma unroll(4)
  for (i = 0; i < 32; i++) {
    b[i] = a[i] + 1;
    d[i] = c[i] + 1;
  }
}

We actually have a pragma abstraction in our projects, so having unroll(4) makes sense for other compilers (e.g. TI cl6x).
Could you suggest a solution? At the very least, could you tell me how to disable the remark, since it has no remark number?
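
One way a pragma abstraction of this kind could avoid the remark is to expand the unroll hint to nothing unless a compiler that wants it is detected. The following is only a sketch under stated assumptions: the macro names DO_PRAGMA and LOOP_UNROLL are hypothetical, the TI compiler is assumed to be detected through __TI_COMPILER_VERSION__ and to spell the hint as UNROLL(n), and the loop body is assumed to index the arrays with [i].

```c
/* Hypothetical pragma abstraction; the macro names are illustrative only. */
#define DO_PRAGMA(x) _Pragma(#x)

#if defined(__TI_COMPILER_VERSION__)
/* Assumption: TI cl6x accepts the hint spelled as UNROLL(n). */
#define LOOP_UNROLL(n) DO_PRAGMA(UNROLL(n))
#else
/* icc and every other compiler: emit no pragma at all, so there is
   nothing for the compiler to ignore or to remark on. */
#define LOOP_UNROLL(n)
#endif

void unroll(int a[], int b[], int c[], int d[])
{
  int i;
  LOOP_UNROLL(4)
  for (i = 0; i < 16; i++) {
    b[i] = a[i] + 1;
    d[i] = c[i] + 1;
  }
}
```

The trade-off is that icc then relies entirely on its own unrolling heuristics for loops annotated this way.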

Thanks,
Richard

QIAOMIN_Q_ · ‎11-13-2013

This is due to a potential OUTPUT dependence in the code

b[i] = a[i] + 1;

d[i] = c[i] + 1;

So please add "#pragma simd" under the #pragma unroll(4) if you are sure that there is no aliasing among these four arrays/pointers.

Then the world is quiet.
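
For reference, the suggested placement would look roughly like the sketch below. The function name unroll_simd and the __INTEL_COMPILER guard are additions for illustration (the guard only keeps other compilers from seeing icc-specific pragmas), and the loop body is assumed to index the arrays with [i]. Whether a particular icc version then stays quiet is not something this sketch can promise.

```c
/* The caller must guarantee that a, b, c and d do not overlap;
   "#pragma simd" tells icc to vectorize without proving that itself. */
void unroll_simd(int a[], int b[], int c[], int d[])
{
  int i;
#ifdef __INTEL_COMPILER  /* illustrative guard, not part of the suggestion */
#pragma unroll(4)
#pragma simd
#endif
  for (i = 0; i < 16; i++) {
    b[i] = a[i] + 1;
    d[i] = c[i] + 1;
  }
}
```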

Thank you. -- QIAOMIN.Q

Intel Developer Support

Richard_H_6 · ‎11-13-2013

Hi QIAOMIN,

Thanks for your feedback. However, adding "#pragma simd" under the #pragma unroll(4) doesn't help. I added "-vec-report2" to the compiler options and the output is as follows.
$ icc -c -vec-report2 test.c -o test.o
test.c(7): (col. 3) remark: loop was not vectorized: low trip count.
test.c(7): (col. 3) warning #13379: loop was not vectorized with "simd"
test.c(7): (col. 3) remark: unroll pragma will be ignored due to unroll factor exists.
test.c(7): (col. 3) remark: unroll pragma will be ignored due to unrolling factor mismatched.
test.c(7): (col. 3) remark: unroll pragma will be ignored due to unrolling factor mismatched.
test.c(7): (col. 3) remark: unroll pragma will be ignored due to unroll factor exists.
test.c(7): (col. 3) remark: unroll pragma will be ignored due to unrolling factor mismatched.

Do you have other suggestions?

Thanks,
Richard

QIAOMIN_Q_ · ‎11-13-2013

Hello, there is no problem with compiler 14.0.1.

$ icc -V

Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 14.0.1.106 Build 20131008

Part of the optimization report is below:

//

<488977u.c;-1:-1;hpo_vectorization;unroll;0> HPO Vectorizer Report (unroll)

488977u.c(6:7-6:7):VEC:unroll: LOOP WAS VECTORIZED

<488977u.c;6:6;hlo_linear_trans;unroll;0>

//

Thanks,

Qiaomin

Richard_H_6 · ‎11-28-2013

Hi Qiaomin,

Thanks for your reply. I've verified that with icc 14.0.1. However, as you can imagine, I can't add "#pragma simd" in every place where "unroll" can be used. What do you think? Are there other solutions?

Thanks,
Richard

QIAOMIN_Q_ · ‎12-01-2013

Hello Richard

Actually, you don't need to add "#pragma simd" and "#pragma unroll" in all cases; the compiler will unroll loops based on its default heuristics. In this specific sample code, there is an assumed vector dependence among the four pointers (int a[], int b[], int c[], int d[]), so you see 'loop was not vectorized' in the vectorization report. Add "#pragma simd" or "#pragma vector always" only when you are sure that there is no pointer aliasing and no computational dependence in the loop.

The unroll pragma is supported only when option O3 is set, and adding -unroll-aggressive enables more aggressive unrolling heuristics.

However, you should add explicit simd and unroll pragmas only when needed, because in most cases the compiler does a good job by default on these two things. Unrolling a loop may also increase register pressure and code size in some cases.

Regards,

Qiao

Bernard · ‎12-02-2013

>>>,there are vector dependence among the four pointers -(int a[], int b[], int c[], int d[]) >>>

Do you mean pointer aliasing which cannot be known at compile time?

TimP · ‎12-02-2013

Intel compilers in the past haven't always unrolled automatically as much as is desirable. Prior to core-i7 "Nehalem," aggressive unrolling (by more than 4) could often be useful. Even with core-i7-2 and -3, non-vectorizable loops frequently benefited from unrolling by 4, even though the compiler chose not to unroll. With corei7-4 "Haswell" I don't see a benefit in many cases from unrolling by more than the Intel compiler chooses. I haven't seen documentation on why this would be. For corei7-2 and -3, the combined working of the loop stream detector and micro-op cache was improved so as to reduce the need for unrolling at compile time and to deliver full performance across a range of loop counts and instruction and data alignments, so maybe these have been improved further. As Qiao said, unroll directives matter less now.

The 14.0.1 compiler takes advantage of __restrict pointer definitions more often than previous icc versions did. If the compiler reports a dependence, it often means simply that there isn't sufficient information (such as a __restrict qualifier) to support disambiguation. #pragma omp simd is one of several ways to overrule the compiler's finding of potential aliasing.
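
As a rough illustration of supplying that information, the original loop could be declared with __restrict-qualified parameters as sketched below. The function name unroll_restrict is hypothetical, the loop body is assumed to index the arrays with [i], and whether a given icc version then vectorizes the loop has not been checked here.

```c
/* __restrict promises the compiler that, inside this function, the
   four pointers never refer to overlapping memory. Breaking that
   promise at a call site is undefined behavior. */
void unroll_restrict(const int *__restrict a, int *__restrict b,
                     const int *__restrict c, int *__restrict d)
{
  int i;
  for (i = 0; i < 16; i++) {
    b[i] = a[i] + 1;
    d[i] = c[i] + 1;
  }
}
```

Unlike a pragma on the loop, the qualifier sits in the function signature, so it documents the no-overlap requirement for callers as well.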

QIAOMIN_Q_ · ‎12-02-2013

When coding like this

#pragma unroll(4)
#pragma ivdep // when arrays a, b, c, d point to non-aliasing memory locations; alternatively, the restrict keyword can be used
#pragma vector aligned // when arrays a, b, c, d are aligned
for (i = 0; i < 16; i++) {
  b[i] = a[i] + 1;
  d[i] = c[i] + 1;
}

and compiling using $ icc 488977u.c -c -vec-report6 -O3

see the output

488977u.c(8): (col. 2) remark: vectorization support: reference b has aligned access
488977u.c(8): (col. 2) remark: vectorization support: reference a has aligned access
488977u.c(9): (col. 2) remark: vectorization support: reference d has aligned access
488977u.c(9): (col. 2) remark: vectorization support: reference c has aligned access
488977u.c(7): (col. 2) remark: loop was completely unrolled
488977u.c(7): (col. 2) remark: vectorization support: unroll factor set to 4
488977u.c(7): (col. 2) remark: LOOP WAS VECTORIZED

With no ivdep pragma specified, you get remarks like
488977u.c(6): (col. 2) remark: loop was not vectorized: existence of vector dependence
488977u.c(8): (col. 2) remark: vector dependence: assumed FLOW dependence between d line 8 and a line 7
488977u.c(7): (col. 2) remark: vector dependence: assumed ANTI dependence between a line 7 and d line 8

488977u.c(7): (col. 2) remark: vector dependence: assumed OUTPUT dependence between b line 7 and d line 8
488977u.c(8): (col. 2) remark: vector dependence: assumed OUTPUT dependence between d line 8 and b line 7

The compiler cannot safely vectorize a loop if there is even a potential dependency. Consider the following example:

for (i = 0; i < size; i++) { c[i] = a[i] * b[i]; }

In the above example, the compiler needs to determine whether, for some iteration i, c[i] might refer to the same memory location as a[i] or b[i] for a different iteration. (Such memory locations are sometimes said to be "aliased".) For example, if a[i] pointed to the same memory location as c[i-1], there would be a read-after-write dependency (FLOW dependence) as in the earlier example. If the compiler cannot exclude this possibility, it will not vectorize the loop unless you provide the compiler with hints.
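
The effect of such an overlap can be reproduced in plain C. The sketch below uses two hypothetical helper functions: mul_scalar is the loop as written, and mul_loads_first models a reordering that is only valid without aliasing, where every a[i] is read before any c[i] is written (roughly what a vectorized loop would do within one vector).

```c
/* The loop as written: iteration i reads a[i] after all earlier
   iterations have already stored to c. */
void mul_scalar(const int *a, const int *b, int *c, int size)
{
  int i;
  for (i = 0; i < size; i++)
    c[i] = a[i] * b[i];
}

/* Models a reordering that is only legal without aliasing: all loads
   of a happen before any store to c. For this sketch size must be <= 8. */
void mul_loads_first(const int *a, const int *b, int *c, int size)
{
  int tmp[8];
  int i;
  for (i = 0; i < size; i++)
    tmp[i] = a[i];
  for (i = 0; i < size; i++)
    c[i] = tmp[i] * b[i];
}
```

Called with a = buf and c = buf + 1 (so that a[i] is c[i-1]), buf initially {1, 0, 0, 0, 0}, every b[i] equal to 2 and size 4, mul_scalar leaves buf as {1, 2, 4, 8, 16} while mul_loads_first leaves {1, 2, 0, 0, 0}. Because the two orders disagree, the compiler has to rule out the overlap, or be told it cannot happen, before reordering.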

Bernard · ‎12-03-2013

Unrolling by more than four can increase register pressure and, as Tim mentioned, for small loops that fit in the LSD, which is coupled with the micro-op cache, the hardware can probably do a better job than aggressive unrolling.

Richard_H_6 · ‎12-04-2013

Hi all,

Thanks for all your input. From the above, I conclude that unroll() is not that useful but the restrict keyword is.
So I checked my code again. It seems icc won't complain if the restrict keyword is there.
However, I think I might have found an icc bug. Have a look at the following code.

void test01(float *__restrict a, float *__restrict b)
{
  int i;
  #pragma unroll(2)
  for (i = 0; i < 8; i++)
  {
    b[2*i] = a[2*i];
    b[2*i + 1] = a[2*i + 1];
  }
}

typedef float * __restrict DLB_CLVEC;
void test02(DLB_CLVEC a, DLB_CLVEC b)
{
  int i;
  #pragma unroll(2)
  for (i = 0; i < 8; i++)
  {
    b[2*i] = a[2*i];
    b[2*i + 1] = a[2*i + 1];
  }
}

$icc --version
icc (ICC) 14.0.1 20131008
$ icc -O3 -vec-report2 -c test.c
test.c(19): (col. 3) remark: LOOP WAS VECTORIZED
test.c(31): (col. 3) remark: loop was not vectorized: existence of vector dependence
test.c(31): (col. 3) remark: unroll pragma will be ignored due to unroll factor exists
test.c(31): (col. 3) remark: unroll pragma will be ignored due to unrolling factor mismatched
test.c(31): (col. 3) remark: unroll pragma will be ignored due to unrolling factor mismatched
test.c(31): (col. 3) remark: unroll pragma will be ignored due to unroll factor exists
test.c(31): (col. 3) remark: unroll pragma will be ignored due to unrolling factor mismatched

I expected test01 and test02 to behave exactly the same. However, it seems that icc ignores the restrict qualifier in the typedef.
What do you think?
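
If the typedef really does hide the qualifier from icc, one possible stopgap is to make the abstraction a macro instead, so that after preprocessing the signature is token-for-token the one used in test01. This is only a sketch: the names DLB_CLVEC_PTR and test03 are hypothetical, and the __INTEL_COMPILER guard is added just so other compilers never see the icc pragma.

```c
/* Hypothetical macro standing in for the typedef; it expands to the
   same tokens that test01 spells out by hand. */
#define DLB_CLVEC_PTR float *__restrict

void test03(DLB_CLVEC_PTR a, DLB_CLVEC_PTR b)
{
  int i;
#ifdef __INTEL_COMPILER  /* illustrative guard */
#pragma unroll(2)
#endif
  for (i = 0; i < 8; i++)
  {
    b[2*i] = a[2*i];
    b[2*i + 1] = a[2*i + 1];
  }
}
```

A macro is weaker than a typedef: in a declaration such as "DLB_CLVEC_PTR p, q;" only p would be a pointer, so it is best kept to one declarator at a time, as in the parameter list above.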

Thanks,
Richard

Bernard · ‎12-05-2013

Was your loop vectorized when you did not use the typedef shown in your code?

Richard_H_6 · ‎12-06-2013

Yes. It was vectorized correctly: "test.c(19): (col. 3) remark: LOOP WAS VECTORIZED". The difference seems to be due only to the typedef.

Richard

QIAOMIN_Q_ · ‎12-23-2013

Thanks. This bug ("typedef float * __restrict DLB_CLVEC;" doesn't take effect) has been entered in our bug-tracking system. I will keep you posted whenever there is progress on this.

Thank you.
--
QIAOMIN.Q
Intel Developer Support

User forums: http://software.intel.com/en-us/forums/

Richard_H_6 · ‎12-23-2013

Hi QIAOMIN,

Great!

Thanks,
Richard

QIAOMIN_Q_ · ‎01-20-2014

The problem only happens with icc, not icpc. The fix will ship in an upcoming release of Compiler 14.0. Thanks for submitting the issue.

Thank you.
--
QIAOMIN.Q
Intel Developer Support

Richard_H_6 · ‎01-24-2014

That's great!

Thanks,

Richard