loop unrolling issues on MIC

Alin_M_Elena
Beginner

Hi,

I am trying to play with an Intel MIC.

I have hit an issue which I traced back to unrolling.

It seems that the compiler fails to unroll loops that contain constants in their expressions. I have attached a small test program showing the issue.

The problem centres on this part:

t1=omp_get_wtime()
do i=1,ns
  a=a/real(n1,pr)
enddo
t1=omp_get_wtime()-t1

Here a is an array. If n1 is a parameter I get the first time below; if n1 is a normal variable I get the second time.

The second line of the output corresponds to writing the loop over a explicitly.

~/lavello/XeonPhi/arrays $ ./arrays.MIC
15.201463 0.71788597
15.435404 0.71788788
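
For readers without the attachment, here is a minimal sketch of the comparison described above. It is not the attached arrays.F90: the array length, repeat count, divisor value, initial data and the name n2 are all assumptions, and the arrays are taken to be complex because the thread later mentions complex(kind=pr). It needs the compiler's OpenMP option for omp_get_wtime.

```fortran
program arrays_sketch
  use omp_lib                              ! provides omp_get_wtime
  implicit none
  integer, parameter :: pr = kind(1.0d0)   ! assumed working precision
  integer, parameter :: n  = 100000        ! assumed array length
  integer, parameter :: ns = 1000          ! assumed repeat count (small enough to avoid underflow)
  integer, parameter :: n1 = 2             ! divisor as a named constant (parameter)
  integer :: n2                            ! same divisor as an ordinary variable (hypothetical name)
  complex(kind=pr) :: a(n), c(n)
  double precision :: t1, t2
  integer :: i

  n2 = 2
  a = cmplx(1.0_pr, 1.0_pr, pr)
  c = a

  ! Case 1: the divisor is a compile-time constant
  t1 = omp_get_wtime()
  do i = 1, ns
    a = a / real(n1, pr)
  end do
  t1 = omp_get_wtime() - t1

  ! Case 2: the divisor is only known at run time
  t2 = omp_get_wtime()
  do i = 1, ns
    c = c / real(n2, pr)
  end do
  t2 = omp_get_wtime() - t2

  print *, t1, t2
  ! Use the results so the timed loops cannot be discarded as dead code
  print *, sum(abs(a)), sum(abs(c))
end program arrays_sketch
```

The two timed loops differ only in how the divisor is declared, which is the distinction the timings above are meant to isolate.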

The same test written in C++ (with explicit loops only) gives:

~/lavello/XeonPhi/arrays $ ./arrays.xx
0.015044 0.0153539

Here is the vectorisation report for Fortran that shows the issue... the loops that contain expressions with parameters get an unroll factor of 2, while the ones containing normal variables get 8.

ifort -openmp -O3 -mmic -o arrays.MIC arrays.F90 -vec-report7
arrays.F90(20): (col. 5) remark: vectorization support: reference c has aligned access.
arrays.F90(19): (col. 5) remark: vectorization support: reference a has aligned access.
arrays.F90(18): (col. 3) remark: PARTIAL LOOP WAS VECTORIZED.
arrays.F90(24): (col. 5) remark: vectorization support: reference a has aligned access.
arrays.F90(24): (col. 5) remark: vectorization support: reference a has aligned access.
arrays.F90(24): (col. 5) remark: vectorization support: unroll factor set to 2.
arrays.F90(24): (col. 5) remark: LOOP WAS VECTORIZED.
arrays.F90(30): (col. 5) remark: vectorization support: reference c has aligned access.
arrays.F90(30): (col. 5) remark: vectorization support: reference c has aligned access.
arrays.F90(30): (col. 5) remark: vectorization support: unroll factor set to 8.
arrays.F90(30): (col. 5) remark: LOOP WAS VECTORIZED.
arrays.F90(37): (col. 5) remark: vectorization support: reference a has aligned access.
arrays.F90(37): (col. 5) remark: vectorization support: reference a has aligned access.
arrays.F90(36): (col. 4) remark: vectorization support: unroll factor set to 2.
arrays.F90(36): (col. 4) remark: LOOP WAS VECTORIZED.
arrays.F90(45): (col. 5) remark: vectorization support: reference c has aligned access.
arrays.F90(45): (col. 5) remark: vectorization support: reference c has aligned access.
arrays.F90(44): (col. 4) remark: vectorization support: unroll factor set to 8.
arrays.F90(44): (col. 4) remark: LOOP WAS VECTORIZED.

In the C++ case both loops have the same unroll factor:

[alin@phinally:~/lavello/XeonPhi/arrays]: icpc -o arrays.xx arrays.cxx -openmp -O3 -vec-report6 -mmic
arrays.cxx(13): (col. 10) remark: vectorization support: reference b has aligned access.
arrays.cxx(13): (col. 5) remark: vectorization support: reference a has aligned access.
arrays.cxx(12): (col. 3) remark: LOOP WAS VECTORIZED.
arrays.cxx(19): (col. 5) remark: vectorization support: reference a has aligned access.
arrays.cxx(19): (col. 5) remark: vectorization support: reference a has aligned access.
arrays.cxx(18): (col. 4) remark: vectorization support: unroll factor set to 8.
arrays.cxx(18): (col. 4) remark: LOOP WAS VECTORIZED.
arrays.cxx(17): (col. 3) remark: loop was not vectorized: not inner loop.
arrays.cxx(27): (col. 5) remark: vectorization support: reference b has aligned access.
arrays.cxx(27): (col. 5) remark: vectorization support: reference b has aligned access.
arrays.cxx(26): (col. 5) remark: vectorization support: unroll factor set to 8.
arrays.cxx(26): (col. 5) remark: LOOP WAS VECTORIZED.
arrays.cxx(25): (col. 3) remark: loop was not vectorized: not inner loop.

The versions of my compilers are:

[alin@phinally:~/lavello/XeonPhi/arrays]: ifort --version
ifort (IFORT) 13.0.1 20121010
Copyright (C) 1985-2012 Intel Corporation. All rights reserved.

[alin@phinally:~/lavello/XeonPhi/arrays]: icpc --version
icpc (ICC) 13.0.1 20121010
Copyright (C) 1985-2012 Intel Corporation. All rights reserved.

14 Replies
TimP
Honored Contributor III

I suppose you may need to examine generated code and see whether one case has been optimized down to a simple multiplication by a constant, but not another.  It's difficult to draw conclusions when you test the compiler's ability to optimize away unused code.  It looks like you are using a rather old compiler (in terms of MIC releases).
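
To make the two points above concrete, here is a small sketch, under assumptions, of what "a simple multiplication by a constant" would look like if written by hand, with the results printed so that neither loop is unused. The sizes, the divisor and the name rinv are hypothetical; whether the compiler performs this transformation itself can only be confirmed from the generated code.

```fortran
program reciprocal_sketch
  implicit none
  integer, parameter :: pr = kind(1.0d0)   ! assumed working precision
  integer, parameter :: n = 1000, ns = 100 ! assumed sizes
  integer, parameter :: n1 = 2             ! constant divisor
  complex(kind=pr) :: a(n), b(n)
  real(kind=pr) :: rinv                    ! hypothetical name for the hoisted reciprocal
  integer :: i

  a = cmplx(1.0_pr, 1.0_pr, pr)
  b = a

  ! Form 1: divide by the constant on every pass
  do i = 1, ns
    a = a / real(n1, pr)
  end do

  ! Form 2: compute the reciprocal once, then multiply
  rinv = 1.0_pr / real(n1, pr)
  do i = 1, ns
    b = b * rinv
  end do

  ! Printing a value that depends on both arrays keeps both loops live
  print *, maxval(abs(a - b))
end program reciprocal_sketch
```

For a divisor that is a power of two the two forms give identical values; for other divisors the reciprocal form can differ in the last bits, which is one reason a compiler may treat the two differently.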

Alin_M_Elena
Beginner

Hi Tim,

Thank you for your answer... I will try to get access to a newer compiler... but the code was compiled with the latest version and the same behaviour was obtained.

I do not understand your comment on unused code. Can you be more specific about what you mean by it?

I find it strange that it unrolls correctly in the case of C++ but not Fortran... and I also find it strange that unrolling seems to depend on whether a value is a constant or not (in the context of the loop neither of them changes value).

regards,

Alin

Alin_M_Elena
Beginner

I have updated the compiler... 

I see now there is a level 7 of reporting for vectorisation

[alin@phinally:~/lavello/XeonPhi/arrays]: ifort -openmp -O3 -mmic -o arrays.MIC arrays.F90 -vec-report6
arrays.F90(20): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(19): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(20): (col. 5) remark: vectorization support: reference c has aligned access.
arrays.F90(19): (col. 5) remark: vectorization support: reference a has aligned access.
arrays.F90(20): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(19): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(18): (col. 3) remark: PARTIAL LOOP WAS VECTORIZED.
arrays.F90(24): (col. 5) remark: vectorization support: unroll factor set to 2.
arrays.F90(24): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(24): (col. 5) remark: vectorization support: unaligned load will be scalarized.
arrays.F90(24): (col. 5) remark: vectorization support: reference a has aligned access.
arrays.F90(24): (col. 5) remark: vectorization support: reference a has aligned access.
arrays.F90(24): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(24): (col. 5) remark: vectorization support: unaligned load will be scalarized.
arrays.F90(24): (col. 5) remark: LOOP WAS VECTORIZED.
arrays.F90(23): (col. 3) remark: loop was not vectorized: not inner loop.
arrays.F90(30): (col. 5) remark: vectorization support: unroll factor set to 16.
arrays.F90(30): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(30): (col. 5) remark: vectorization support: unaligned load will be scalarized.
arrays.F90(30): (col. 5) remark: vectorization support: reference c has aligned access.
arrays.F90(30): (col. 5) remark: vectorization support: reference c has aligned access.
arrays.F90(30): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(30): (col. 5) remark: vectorization support: unaligned load will be scalarized.
arrays.F90(30): (col. 5) remark: LOOP WAS VECTORIZED.
arrays.F90(29): (col. 3) remark: loop was not vectorized: not inner loop.
arrays.F90(36): (col. 4) remark: vectorization support: unroll factor set to 2.
arrays.F90(37): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(37): (col. 10) remark: vectorization support: unaligned load will be scalarized.
arrays.F90(37): (col. 5) remark: vectorization support: reference a has aligned access.
arrays.F90(37): (col. 5) remark: vectorization support: reference a has aligned access.
arrays.F90(37): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(37): (col. 10) remark: vectorization support: unaligned load will be scalarized.
arrays.F90(36): (col. 4) remark: LOOP WAS VECTORIZED.
arrays.F90(35): (col. 3) remark: loop was not vectorized: not inner loop.
arrays.F90(44): (col. 4) remark: vectorization support: unroll factor set to 16.
arrays.F90(45): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(45): (col. 10) remark: vectorization support: unaligned load will be scalarized.
arrays.F90(45): (col. 5) remark: vectorization support: reference c has aligned access.
arrays.F90(45): (col. 5) remark: vectorization support: reference c has aligned access.
arrays.F90(45): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(45): (col. 10) remark: vectorization support: unaligned load will be scalarized.
arrays.F90(44): (col. 4) remark: LOOP WAS VECTORIZED.
arrays.F90(43): (col. 3) remark: loop was not vectorized: not inner loop.
[alin@phinally:~/lavello/XeonPhi/arrays]: ifort -openmp -O3 -mmic -o arrays.MIC arrays.F90 -vec-report7
arrays.F90(18): (col. 3) remark: VEC#00001NPNR 1.
arrays.F90(18): (col. 3) remark: VEC#00052 1.
arrays.F90(18): (col. 3) remark: VEC#00204 30.
arrays.F90(18): (col. 3) remark: VEC#00207 2.
arrays.F90(18): (col. 3) remark: VEC#00213 4.
arrays.F90(24): (col. 5) remark: VEC#00001NPNR 1.
arrays.F90(24): (col. 5) remark: VEC#00052 2.
arrays.F90(24): (col. 5) remark: VEC#00101UUSL 1.
arrays.F90(24): (col. 5) remark: VEC#00101UUSS 1.
arrays.F90(24): (col. 5) remark: VEC#00201 105.
arrays.F90(24): (col. 5) remark: VEC#00202 105.000000.
arrays.F90(24): (col. 5) remark: VEC#00203 12.000000.
arrays.F90(24): (col. 5) remark: VEC#00204 35.
arrays.F90(24): (col. 5) remark: VEC#00212 3.
arrays.F90(30): (col. 5) remark: VEC#00001NPNR 1.
arrays.F90(30): (col. 5) remark: VEC#00052 2.
arrays.F90(30): (col. 5) remark: VEC#00101UUSL 1.
arrays.F90(30): (col. 5) remark: VEC#00101UUSS 1.
arrays.F90(30): (col. 5) remark: VEC#00201 15.
arrays.F90(30): (col. 5) remark: VEC#00202 20.500000.
arrays.F90(30): (col. 5) remark: VEC#00203 12.820000.
arrays.F90(30): (col. 5) remark: VEC#00204 35.
arrays.F90(36): (col. 4) remark: VEC#00001NPNR 1.
arrays.F90(36): (col. 4) remark: VEC#00052 2.
arrays.F90(36): (col. 4) remark: VEC#00101UUSL 1.
arrays.F90(36): (col. 4) remark: VEC#00101UUSS 1.
arrays.F90(36): (col. 4) remark: VEC#00201 105.
arrays.F90(36): (col. 4) remark: VEC#00202 105.000000.
arrays.F90(36): (col. 4) remark: VEC#00203 12.000000.
arrays.F90(36): (col. 4) remark: VEC#00204 35.
arrays.F90(36): (col. 4) remark: VEC#00212 3.
arrays.F90(44): (col. 4) remark: VEC#00001NPNR 1.
arrays.F90(44): (col. 4) remark: VEC#00052 2.
arrays.F90(44): (col. 4) remark: VEC#00101UUSL 1.
arrays.F90(44): (col. 4) remark: VEC#00101UUSS 1.
arrays.F90(44): (col. 4) remark: VEC#00201 15.
arrays.F90(44): (col. 4) remark: VEC#00202 20.500000.
arrays.F90(44): (col. 4) remark: VEC#00203 12.820000.
arrays.F90(44): (col. 4) remark: VEC#00204 35.

Now it turns out that the good loop is unrolled even more...

~/lavello/XeonPhi/arrays $ ./arrays.MIC
15.195465 0.30842900
14.081736 0.30841279

To complicate things even more, if I disable unrolling the situation does not seem to change for the "bad" loop:

[alin@phinally:~/lavello/XeonPhi/arrays]: ifort -openmp -O3 -mmic -o arrays.MIC arrays.F90 -unroll0 -vec-report6
arrays.F90(20): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(19): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(20): (col. 5) remark: vectorization support: reference c has aligned access.
arrays.F90(19): (col. 5) remark: vectorization support: reference a has aligned access.
arrays.F90(20): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(19): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(18): (col. 3) remark: PARTIAL LOOP WAS VECTORIZED.
arrays.F90(24): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(24): (col. 5) remark: vectorization support: unaligned load will be scalarized.
arrays.F90(24): (col. 5) remark: vectorization support: reference a has aligned access.
arrays.F90(24): (col. 5) remark: vectorization support: reference a has aligned access.
arrays.F90(24): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(24): (col. 5) remark: vectorization support: unaligned load will be scalarized.
arrays.F90(24): (col. 5) remark: LOOP WAS VECTORIZED.
arrays.F90(23): (col. 3) remark: loop was not vectorized: not inner loop.
arrays.F90(30): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(30): (col. 5) remark: vectorization support: unaligned load will be scalarized.
arrays.F90(30): (col. 5) remark: vectorization support: reference c has aligned access.
arrays.F90(30): (col. 5) remark: vectorization support: reference c has aligned access.
arrays.F90(30): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(30): (col. 5) remark: vectorization support: unaligned load will be scalarized.
arrays.F90(30): (col. 5) remark: LOOP WAS VECTORIZED.
arrays.F90(29): (col. 3) remark: loop was not vectorized: not inner loop.
arrays.F90(37): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(37): (col. 10) remark: vectorization support: unaligned load will be scalarized.
arrays.F90(37): (col. 5) remark: vectorization support: reference a has aligned access.
arrays.F90(37): (col. 5) remark: vectorization support: reference a has aligned access.
arrays.F90(37): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(37): (col. 10) remark: vectorization support: unaligned load will be scalarized.
arrays.F90(36): (col. 4) remark: LOOP WAS VECTORIZED.
arrays.F90(35): (col. 3) remark: loop was not vectorized: not inner loop.
arrays.F90(45): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(45): (col. 10) remark: vectorization support: unaligned load will be scalarized.
arrays.F90(45): (col. 5) remark: vectorization support: reference c has aligned access.
arrays.F90(45): (col. 5) remark: vectorization support: reference c has aligned access.
arrays.F90(45): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(45): (col. 10) remark: vectorization support: unaligned load will be scalarized.
arrays.F90(44): (col. 4) remark: LOOP WAS VECTORIZED.
arrays.F90(43): (col. 3) remark: loop was not vectorized: not inner loop.
[alin@phinally:~/lavello/XeonPhi/arrays]: ifort -openmp -O3 -mmic -o arrays.MIC arrays.F90 -unroll0 -vec-report7
arrays.F90(18): (col. 3) remark: VEC#00001NPNR 1.
arrays.F90(18): (col. 3) remark: VEC#00052 1.
arrays.F90(18): (col. 3) remark: VEC#00204 30.
arrays.F90(18): (col. 3) remark: VEC#00207 2.
arrays.F90(18): (col. 3) remark: VEC#00213 4.
arrays.F90(24): (col. 5) remark: VEC#00001NPNR 1.
arrays.F90(24): (col. 5) remark: VEC#00052 2.
arrays.F90(24): (col. 5) remark: VEC#00101UUSL 1.
arrays.F90(24): (col. 5) remark: VEC#00101UUSS 1.
arrays.F90(24): (col. 5) remark: VEC#00201 70.
arrays.F90(24): (col. 5) remark: VEC#00202 17.500000.
arrays.F90(24): (col. 5) remark: VEC#00203 8.000000.
arrays.F90(24): (col. 5) remark: VEC#00204 25.
arrays.F90(24): (col. 5) remark: VEC#00212 2.
arrays.F90(30): (col. 5) remark: VEC#00001NPNR 1.
arrays.F90(30): (col. 5) remark: VEC#00052 2.
arrays.F90(30): (col. 5) remark: VEC#00101UUSL 1.
arrays.F90(30): (col. 5) remark: VEC#00101UUSS 1.
arrays.F90(30): (col. 5) remark: VEC#00201 10.
arrays.F90(30): (col. 5) remark: VEC#00202 2.500000.
arrays.F90(30): (col. 5) remark: VEC#00203 7.960000.
arrays.F90(30): (col. 5) remark: VEC#00204 25.
arrays.F90(36): (col. 4) remark: VEC#00001NPNR 1.
arrays.F90(36): (col. 4) remark: VEC#00052 2.
arrays.F90(36): (col. 4) remark: VEC#00101UUSL 1.
arrays.F90(36): (col. 4) remark: VEC#00101UUSS 1.
arrays.F90(36): (col. 4) remark: VEC#00201 70.
arrays.F90(36): (col. 4) remark: VEC#00202 17.500000.
arrays.F90(36): (col. 4) remark: VEC#00203 8.000000.
arrays.F90(36): (col. 4) remark: VEC#00204 25.
arrays.F90(36): (col. 4) remark: VEC#00212 2.
arrays.F90(44): (col. 4) remark: VEC#00001NPNR 1.
arrays.F90(44): (col. 4) remark: VEC#00052 2.
arrays.F90(44): (col. 4) remark: VEC#00101UUSL 1.
arrays.F90(44): (col. 4) remark: VEC#00101UUSS 1.
arrays.F90(44): (col. 4) remark: VEC#00201 10.
arrays.F90(44): (col. 4) remark: VEC#00202 2.500000.
arrays.F90(44): (col. 4) remark: VEC#00203 7.960000.
arrays.F90(44): (col. 4) remark: VEC#00204 25.
[alin@phinally:~/lavello/XeonPhi/arrays]: ssh mic0
~ $ cd ~/lavello/XeonPhi/arrays
~/lavello/XeonPhi/arrays $ ./arrays.MIC
14.075157 0.52774692
14.069236 0.52784920

I will try next to get the results with optimisation turned off..

regards,

Alin

Alin_M_Elena
Beginner

Here are the results with -unroll0, increasing the optimisation level from O0 to O2:

[alin@phinally:~/lavello/XeonPhi/arrays]: ifort -openmp -O0 -mmic -o arrays.MIC arrays.F90 -unroll0 -vec-report7
[alin@phinally:~/lavello/XeonPhi/arrays]: ssh mic0
~ $ cd ~/lavello/XeonPhi/arrays
~/lavello/XeonPhi/arrays $ ./arrays.MIC
31.092502 31.458152
31.185777 31.564384
~/lavello/XeonPhi/arrays $ Connection to mic0 closed.
[alin@phinally:~/lavello/XeonPhi/arrays]: ifort -openmp -O1 -mmic -o arrays.MIC arrays.F90 -unroll0 -vec-report7
[alin@phinally:~/lavello/XeonPhi/arrays]: ssh mic0
~ $ cd ~/lavello/XeonPhi/arrays
~/lavello/XeonPhi/arrays $ ./arrays.MIC
29.891860 29.940608
29.886266 29.938223
~/lavello/XeonPhi/arrays $ Connection to mic0 closed.
[alin@phinally:~/lavello/XeonPhi/arrays]: ifort -openmp -O2 -mmic -o arrays.MIC arrays.F90 -unroll0 -vec-report6
arrays.F90(20): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(19): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(20): (col. 5) remark: vectorization support: reference c has aligned access.
arrays.F90(19): (col. 5) remark: vectorization support: reference a has aligned access.
arrays.F90(20): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(19): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(18): (col. 3) remark: PARTIAL LOOP WAS VECTORIZED.
arrays.F90(24): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(24): (col. 5) remark: vectorization support: unaligned load will be scalarized.
arrays.F90(24): (col. 5) remark: vectorization support: reference a has aligned access.
arrays.F90(24): (col. 5) remark: vectorization support: reference a has aligned access.
arrays.F90(24): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(24): (col. 5) remark: vectorization support: unaligned load will be scalarized.
arrays.F90(24): (col. 5) remark: LOOP WAS VECTORIZED.
arrays.F90(23): (col. 3) remark: loop was not vectorized: not inner loop.
arrays.F90(30): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(30): (col. 5) remark: vectorization support: unaligned load will be scalarized.
arrays.F90(30): (col. 5) remark: vectorization support: reference c has aligned access.
arrays.F90(30): (col. 5) remark: vectorization support: reference c has aligned access.
arrays.F90(30): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(30): (col. 5) remark: vectorization support: unaligned load will be scalarized.
arrays.F90(30): (col. 5) remark: LOOP WAS VECTORIZED.
arrays.F90(29): (col. 3) remark: loop was not vectorized: not inner loop.
arrays.F90(37): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(37): (col. 10) remark: vectorization support: unaligned load will be scalarized.
arrays.F90(37): (col. 5) remark: vectorization support: reference a has aligned access.
arrays.F90(37): (col. 5) remark: vectorization support: reference a has aligned access.
arrays.F90(37): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(37): (col. 10) remark: vectorization support: unaligned load will be scalarized.
arrays.F90(36): (col. 4) remark: LOOP WAS VECTORIZED.
arrays.F90(35): (col. 3) remark: loop was not vectorized: not inner loop.
arrays.F90(45): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(45): (col. 10) remark: vectorization support: unaligned load will be scalarized.
arrays.F90(45): (col. 5) remark: vectorization support: reference c has aligned access.
arrays.F90(45): (col. 5) remark: vectorization support: reference c has aligned access.
arrays.F90(45): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(45): (col. 10) remark: vectorization support: unaligned load will be scalarized.
arrays.F90(44): (col. 4) remark: LOOP WAS VECTORIZED.
arrays.F90(43): (col. 3) remark: loop was not vectorized: not inner loop.
[alin@phinally:~/lavello/XeonPhi/arrays]: ifort -openmp -O2 -mmic -o arrays.MIC arrays.F90 -unroll0 -vec-report7
arrays.F90(18): (col. 3) remark: VEC#00001NPNR 1.
arrays.F90(18): (col. 3) remark: VEC#00052 1.
arrays.F90(18): (col. 3) remark: VEC#00204 30.
arrays.F90(18): (col. 3) remark: VEC#00207 2.
arrays.F90(18): (col. 3) remark: VEC#00213 4.
arrays.F90(24): (col. 5) remark: VEC#00001NPNR 1.
arrays.F90(24): (col. 5) remark: VEC#00052 2.
arrays.F90(24): (col. 5) remark: VEC#00101UUSL 1.
arrays.F90(24): (col. 5) remark: VEC#00101UUSS 1.
arrays.F90(24): (col. 5) remark: VEC#00201 70.
arrays.F90(24): (col. 5) remark: VEC#00202 17.500000.
arrays.F90(24): (col. 5) remark: VEC#00203 8.000000.
arrays.F90(24): (col. 5) remark: VEC#00204 25.
arrays.F90(24): (col. 5) remark: VEC#00212 2.
arrays.F90(30): (col. 5) remark: VEC#00001NPNR 1.
arrays.F90(30): (col. 5) remark: VEC#00052 2.
arrays.F90(30): (col. 5) remark: VEC#00101UUSL 1.
arrays.F90(30): (col. 5) remark: VEC#00101UUSS 1.
arrays.F90(30): (col. 5) remark: VEC#00201 10.
arrays.F90(30): (col. 5) remark: VEC#00202 2.500000.
arrays.F90(30): (col. 5) remark: VEC#00203 7.960000.
arrays.F90(30): (col. 5) remark: VEC#00204 25.
arrays.F90(36): (col. 4) remark: VEC#00001NPNR 1.
arrays.F90(36): (col. 4) remark: VEC#00052 2.
arrays.F90(36): (col. 4) remark: VEC#00101UUSL 1.
arrays.F90(36): (col. 4) remark: VEC#00101UUSS 1.
arrays.F90(36): (col. 4) remark: VEC#00201 70.
arrays.F90(36): (col. 4) remark: VEC#00202 17.500000.
arrays.F90(36): (col. 4) remark: VEC#00203 8.000000.
arrays.F90(36): (col. 4) remark: VEC#00204 25.
arrays.F90(36): (col. 4) remark: VEC#00212 2.
arrays.F90(44): (col. 4) remark: VEC#00001NPNR 1.
arrays.F90(44): (col. 4) remark: VEC#00052 2.
arrays.F90(44): (col. 4) remark: VEC#00101UUSL 1.
arrays.F90(44): (col. 4) remark: VEC#00101UUSS 1.
arrays.F90(44): (col. 4) remark: VEC#00201 10.
arrays.F90(44): (col. 4) remark: VEC#00202 2.500000.
arrays.F90(44): (col. 4) remark: VEC#00203 7.960000.
arrays.F90(44): (col. 4) remark: VEC#00204 25.
[alin@phinally:~/lavello/XeonPhi/arrays]: ssh mic0
~ $ cd ~/lavello/XeonPhi/arrays
~/lavello/XeonPhi/arrays $ ./arrays.MIC
14.073770 0.52780604
14.069090 0.52781415

robert-reed
Valued Contributor II

FYI I've moved your thread to the Intel Xeon Phi coprocessor forum where it will likely get the best exposure.

Alin_M_Elena
Beginner

Thank you Robert... this was exactly what I wanted to ask...

Can you delete or merge it with this one http://software.intel.com/en-us/forums/topic/405600 ?

regards,

Alin

Kevin_D_Intel
Employee

Hi Alin, I reproduced the reported behavior you noted with the latest compiler (where unrolling is set at 2 and 16). I'll dig a bit deeper from there and post when I know more.

Alin_M_Elena
Beginner

Thank you Kevin!

I have slimmed the code down a little to make it easier to debug...

I have split the contents of the two loops into two separate subroutines and added a dummy subroutine to prevent the unused-code behaviour Tim suggested.

The behaviour is the same. I will upload the files if needed.
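
A rough sketch of that layout, for readers without the files. The subroutine names divide2 and dividen are taken from the vectorisation report later in this thread; the argument lists, sizes and the body of dummy are assumptions, and in the real test each routine lives in its own file (loop2.F90, loopn.F90 and a separately compiled dummy) rather than in one file as here.

```fortran
program split_sketch
  implicit none
  integer, parameter :: pr = kind(1.0d0)   ! assumed working precision
  integer, parameter :: n = 1000, ns = 100 ! assumed sizes
  complex(kind=pr) :: a(n), c(n)
  integer :: i, n1
  external :: divide2, dividen, dummy

  n1 = 2
  a = cmplx(1.0_pr, 1.0_pr, pr)
  c = a

  do i = 1, ns
    call divide2(a, n)        ! divisor is a parameter inside the routine
    call dummy(a, n)          ! opaque call so the work cannot be optimised away
  end do

  do i = 1, ns
    call dividen(c, n, n1)    ! divisor is passed in as a variable
    call dummy(c, n)
  end do

  print *, sum(abs(a)), sum(abs(c))
end program split_sketch

subroutine divide2(x, n)
  implicit none
  integer, parameter :: pr = kind(1.0d0)
  integer, parameter :: n1 = 2             ! compile-time constant divisor
  integer, intent(in) :: n
  complex(kind=pr), intent(inout) :: x(n)
  x = x / real(n1, pr)
end subroutine divide2

subroutine dividen(x, n, n1)
  implicit none
  integer, parameter :: pr = kind(1.0d0)
  integer, intent(in) :: n, n1             ! run-time divisor
  complex(kind=pr), intent(inout) :: x(n)
  x = x / real(n1, pr)
end subroutine dividen

subroutine dummy(x, n)
  implicit none
  integer, parameter :: pr = kind(1.0d0)
  integer, intent(in) :: n
  complex(kind=pr), intent(inout) :: x(n)
  ! Intentionally empty; only effective as a barrier when compiled separately
end subroutine dummy
```

Keeping dummy in its own object file matters: if the compiler can see that it is empty, it is free to remove the call and the barrier is lost.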

regards,

Alin

Kevin_D_Intel
Employee

Alin, Thank you for the other test cases attached to the other thread. Those were more convenient to analyze.

With our newest (14.0) compiler (due for release in the next month), both loops report an unroll factor of 2; however, that is not what I see gating performance. loopn.f90 realizes the biggest performance boost from vectorization (even without unrolling); however, while loop2.f90 is also reportedly vectorized, the generated code for the two is rather different and, as noted, loop2.f90 is about 20 times slower. I directed your test cases to our vectorization developers (under the internal tracking id below) for deeper analysis and will let you know what I hear from them.

(Internal tracking id: DPD200246949)

Alin_M_Elena
Beginner

Thank you Kevin! I did not realise I posted the case in the other topic.

regards,
Alin 

Vladimir_Dergachev

I have seen very similar behaviour with the C compiler for Xeon Phi as well. It seems to fail to hoist common expressions out of the loop on most occasions. This is especially visible with chained pointers.

Kevin_D_Intel
Employee

If you can provide any test case(s) we can direct those to development. For this case the developers are progressing after identifying the root cause related to optimization of complex multiply/divide by a constant.
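
As background on why a complex divide by a real constant has room for different code sequences, the sketch below shows that such a divide is mathematically just a division of the real and imaginary parts. The values are made up for illustration, and this is not a description of what the compiler actually generates.

```fortran
program complex_divide_sketch
  implicit none
  integer, parameter :: pr = kind(1.0d0)   ! assumed working precision
  complex(kind=pr) :: z, q1, q2
  real(kind=pr) :: r

  z = cmplx(3.0_pr, -4.0_pr, pr)           ! illustrative values only
  r = 2.0_pr

  ! Direct complex-by-real division
  q1 = z / r

  ! The same result written as two independent real divisions
  q2 = cmplx(real(z, pr) / r, aimag(z) / r, pr)

  print *, q1
  print *, q2
end program complex_divide_sketch
```

If the real divisor is instead promoted to complex first, the operation becomes a full complex division, which is considerably more work per element.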

Alin_M_Elena
Beginner

Hi Kevin,

Out of curiosity, in the case above I have changed from complex(kind=pr) to real(kind=pr).

Now the vec report shows equal unroll factors of 4... but the times are again different, and this time the slower one seems to be the one using a variable. The difference is about a factor of 10, which to be honest I would not expect.

ifort -o arrays.MIC -openmp -mmic -O3 loop2.F90 loopn.F90 arrays.F90 dummy.mico -vec-report6
loop2.F90(7): (col. 3) remark: vectorization support: unroll factor set to 4.
loop2.F90(7): (col. 3) remark: vectorization support: unaligned store will be scalarized.
loop2.F90(7): (col. 3) remark: vectorization support: unaligned load will be scalarized.
loop2.F90(7): (col. 3) remark: vectorization support: reference x has aligned access.
loop2.F90(7): (col. 3) remark: vectorization support: reference x has aligned access.
loop2.F90(7): (col. 3) remark: vectorization support: unaligned store will be scalarized.
loop2.F90(7): (col. 3) remark: vectorization support: unaligned load will be scalarized.
loop2.F90(7): (col. 3) remark: LOOP WAS VECTORIZED.
loop2.F90(7): (col. 3) remark: vectorization support: unaligned store will be scalarized.
loop2.F90(7): (col. 3) remark: vectorization support: unaligned load will be scalarized.
loop2.F90(7): (col. 3) remark: vectorization support: unaligned store will be scalarized.
loop2.F90(7): (col. 3) remark: vectorization support: unaligned load will be scalarized.
loop2.F90(7): (col. 3) remark: vectorization support: reference x has aligned access.
loop2.F90(7): (col. 3) remark: vectorization support: reference x has aligned access.
loop2.F90(7): (col. 3) remark: vectorization support: reference x has aligned access.
loop2.F90(7): (col. 3) remark: vectorization support: reference x has aligned access.
loop2.F90(7): (col. 3) remark: vectorization support: unaligned store will be scalarized.
loop2.F90(7): (col. 3) remark: vectorization support: unaligned load will be scalarized.
loop2.F90(7): (col. 3) remark: vectorization support: unaligned store will be scalarized.
loop2.F90(7): (col. 3) remark: vectorization support: unaligned load will be scalarized.
loop2.F90(7): (col. 3) remark: REMAINDER LOOP WAS VECTORIZED.
loopn.F90(8): (col. 3) remark: vectorization support: unroll factor set to 4.
loopn.F90(8): (col. 3) remark: vectorization support: unaligned store will be scalarized.
loopn.F90(8): (col. 3) remark: vectorization support: unaligned load will be scalarized.
loopn.F90(8): (col. 3) remark: vectorization support: reference x has aligned access.
loopn.F90(8): (col. 3) remark: vectorization support: reference x has aligned access.
loopn.F90(8): (col. 3) remark: vectorization support: unaligned store will be scalarized.
loopn.F90(8): (col. 3) remark: vectorization support: unaligned load will be scalarized.
loopn.F90(8): (col. 3) remark: LOOP WAS VECTORIZED.
loopn.F90(8): (col. 3) remark: vectorization support: unaligned store will be scalarized.
loopn.F90(8): (col. 3) remark: vectorization support: unaligned load will be scalarized.
loopn.F90(8): (col. 3) remark: vectorization support: unaligned store will be scalarized.
loopn.F90(8): (col. 3) remark: vectorization support: unaligned load will be scalarized.
loopn.F90(8): (col. 3) remark: vectorization support: reference x has aligned access.
loopn.F90(8): (col. 3) remark: vectorization support: reference x has aligned access.
loopn.F90(8): (col. 3) remark: vectorization support: reference x has aligned access.
loopn.F90(8): (col. 3) remark: vectorization support: reference x has aligned access.
loopn.F90(8): (col. 3) remark: vectorization support: unaligned store will be scalarized.
loopn.F90(8): (col. 3) remark: vectorization support: unaligned load will be scalarized.
loopn.F90(8): (col. 3) remark: vectorization support: unaligned store will be scalarized.
loopn.F90(8): (col. 3) remark: vectorization support: unaligned load will be scalarized.
loopn.F90(8): (col. 3) remark: REMAINDER LOOP WAS VECTORIZED.
arrays.F90(20): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(19): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(20): (col. 5) remark: vectorization support: reference c has aligned access.
arrays.F90(19): (col. 5) remark: vectorization support: reference a has aligned access.
arrays.F90(20): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(19): (col. 5) remark: vectorization support: unaligned store will be scalarized.
arrays.F90(18): (col. 3) remark: PARTIAL LOOP WAS VECTORIZED.
arrays.F90(24): (col. 10) remark: vectorization support: call to function dividen_ cannot be vectorized.
arrays.F90(23): (col. 3) remark: loop was not vectorized: existence of vector dependence.
arrays.F90(30): (col. 10) remark: vectorization support: call to function divide2_ cannot be vectorized.
arrays.F90(29): (col. 3) remark: loop was not vectorized: existence of vector dependence.

~/lavello/XeonPhi/arrays/real $ ./arrays.MIC
0.17513299 0.19222021E-01

regards,

Alin

Kevin_D_Intel
Employee

Hi Alin - I confirmed those findings too and reported it to Development (see internal tracking id below) for deeper analysis. I’ll keep you posted.

(Internal tracking id: DPD200247006)
