vectorisation unsuccessful (ifort 13.1.3) [double complex arrays]

Todor_K_ · 01-29-2014

Dear all,

I have several 2D allocatable arrays of type complex(kind=kind(1.D0)).

I want to do this:

...

[fortran]
do j=1,dim2
   do i=1,dim1
      a(i,j) = b(i,j) / c(i,j)
   end do
end do
[/fortran]

...

When I add a !dir$ vector always directive before the inner loop, the compiler issues the remark "vectorization support: scalar type occupies entire vector". Is the problem that the type complex(kind=kind(1.d0)) is as wide as an xmm register (2 x double = 128 bits on x64 machines)? Is this a real problem? Complex numbers are still just two doubles, and at least + and - seem straightforwardly vectorisable. But these are not vectorised by the compiler either.
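To make the "just two doubles" point concrete, here is a hypothetical sketch. The split layout is an assumption for illustration, not something proposed in this thread. A complex add is one double add on the real parts and an independent double add on the imaginary parts. If those parts live in separate real arrays (the names ar, ai, br, bi, cr, ci are illustrative), a = b + c becomes plain double loops that SSE2 can in principle pack two per 128-bit xmm register. Whether ifort 13.1 actually reports these loops as vectorized has not been checked here.

[fortran]
program split_add
  ! Hypothetical "split" layout: real and imaginary parts are kept in
  ! separate real(dp) arrays, so a complex add becomes two independent
  ! element-wise additions over plain doubles.
  implicit none
  integer, parameter :: dp = kind(1.d0), n = 100
  complex(dp), allocatable :: a(:,:), b(:,:), c(:,:)
  real(dp), allocatable :: ar(:,:), ai(:,:), br(:,:), bi(:,:), cr(:,:), ci(:,:)

  allocate(a(n,n), b(n,n), c(n,n))
  allocate(ar(n,n), ai(n,n), br(n,n), bi(n,n), cr(n,n), ci(n,n))
  b = (2.0_dp, 1.0_dp)
  c = (4.0_dp, 1.0_dp)

  ! Interleaved layout: each complex(dp) element is one 128-bit pair of doubles
  a = b + c                      ! every element becomes (6, 2)

  ! Split layout: the same arithmetic, done on separate double arrays
  br = real(b);  bi = aimag(b)
  cr = real(c);  ci = aimag(c)
  ar = br + cr                   ! real parts only
  ai = bi + ci                   ! imaginary parts only

  ! Both layouts perform the same double additions, so they must agree exactly
  if (any(real(a) /= ar) .or. any(aimag(a) /= ai)) error stop 'layouts disagree'
  print *, a(1,1), ar(1,1), ai(1,1)
end program split_add
[/fortran]

Division does not split this cleanly, because each output part needs cross terms from both parts of b and c, which is part of why it is the harder case in this thread.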

Ron_Green · 01-29-2014

I'm not sure what version of the compiler you are using, as I can't reproduce the message you are seeing. But yes, the message is pretty clear that a single double complex element occupies an entire 128-bit xmm register. Hence 'vectorization' is not possible, since you are not able to process multiple data elements in parallel.

You'll note that if you use AVX via -xavx, you can get that inner loop to vectorize even without the !dir$ vector always (at least with the 14.0 compiler; again, I'm not sure how old a compiler you are using). AVX's 256-bit ymm registers hold two double complex elements.
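The size argument can be checked directly with storage_size, a Fortran 2008 intrinsic that returns the size of one element in bits. A minimal sketch, assuming a compiler that supports storage_size. The 128-bit (xmm, SSE) and 256-bit (ymm, AVX) register widths are hard-coded constants here, not queried from the hardware.

[fortran]
program complex_width
  ! Sketch: how many double complex elements fit in one SIMD register.
  ! Register widths are hard-coded assumptions for x86-64 (SSE xmm, AVX ymm).
  implicit none
  integer, parameter :: dp = kind(1.d0)
  integer, parameter :: xmm_bits = 128, ymm_bits = 256
  complex(dp) :: z
  real(dp)    :: r
  integer     :: elem_bits

  z = (1.0_dp, 0.0_dp)
  r = 1.0_dp
  elem_bits = storage_size(z)        ! bits in one complex(dp) element

  print '(a,i0)', 'bits per double complex element: ', elem_bits
  print '(a,i0)', 'elements per xmm register (SSE): ', xmm_bits / elem_bits
  print '(a,i0)', 'elements per ymm register (AVX): ', ymm_bits / elem_bits

  ! A complex(dp) value is stored as exactly two real(dp) values
  if (elem_bits /= 2 * storage_size(r)) error stop 'unexpected layout'
end program complex_width
[/fortran]

On x86-64 this should report 128 bits per element, so one element per xmm register and two per ymm register. That is why AVX gives the vectorizer two loop iterations per instruction, while SSE does not.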

There is a new KB article on this, for diagnostic 15015 "unsupported data type": http://software.intel.com/en-us/articles/fdiag15015

With the latest compiler (14.0):

[fortran]
program fq
implicit none
integer, parameter :: N=100
integer :: i,j
complex (kind=kind(1.D0)), allocatable :: a(:,:), b(:,:), c(:,:)

allocate (a(N,N))
allocate (b(N,N))
allocate (c(N,N))

b = (2.0,1.0)
c = (4.0,1.0)

do j=1,N
do i=1,N
a(i,j) = b(i,j) / c(i,j)
end do
end do

write(*,*) a(1,1)

end program fq
[/fortran]

$ ifort -O2 -xsse4.2 -vec-report3 fq.f90
fq.f90(11): (col. 1) remark: loop was not vectorized: unsupported data type
fq.f90(12): (col. 1) remark: loop was not vectorized: unsupported data type
fq.f90(16): (col. 21) remark: loop was not vectorized: unsupported data type

But with AVX:

$ ifort -O2 -xavx -vec-report3 fq.f90
fq.f90(11): (col. 1) remark: LOOP WAS VECTORIZED
fq.f90(11): (col. 1) remark: REMAINDER LOOP WAS VECTORIZED
fq.f90(12): (col. 1) remark: LOOP WAS VECTORIZED
fq.f90(12): (col. 1) remark: REMAINDER LOOP WAS VECTORIZED
fq.f90(16): (col. 21) remark: LOOP WAS VECTORIZED
fq.f90(16): (col. 21) remark: REMAINDER LOOP WAS VECTORIZED

Note that the forum formatting is messing up the line numbers: lines (11) and (12) are the initialization lines for b and c, and line (16) is the inner loop.

Ron_Green · 01-29-2014

My mistake, I didn't read your SUBJECT carefully: you're on 13.1.3.

The messages may differ, but the concept is the same, and -xavx will work with that compiler too.

Without -xavx, 13.1.3 gives me: "vectorization possible but seems inefficient".

Adding !dir$ vector always, I still get 'unsupported data type'.

Notice that 14.0 (with -xavx) did NOT require the vector always directive to vectorize this loop. The compiler is improving with age!

Adding to this: IF you are using double complex in your application, and IF you have an AVX-enabled processor, I would suggest upgrading to 14.0, since, as you can see, it automatically vectorizes loops that your 13.1 refused to. That could give a good performance improvement.

ron

TimP · 01-29-2014

I forgot one point: although SSE2 is now the default vectorization target, SSE3 or newer is needed for complex.

SSE3 code for double complex isn't reported as vectorization, even though it is optimized with SSE3 instructions, because it still works on one loop iteration at a time. As Ron indicated, AVX vectorization does get reported, since then each parallel operation covers 2 loop iterations.

Another aspect not covered by the vec-report is the distinction between -complex-limited-range, where division can actually be vectorized at the cost of cutting the working exponent range to roughly 10^±153, and the default, where complex division works over the full exponent range. The vec-report may tell you a loop is vectorized even though all the time is spent in carefully expanded scalar code that protects the full exponent range.
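Where the 10^±153 figure plausibly comes from: the textbook formula (a+bi)/(c+di) = ((ac+bd) + (bc-ad)i) / (c² + d²) squares the divisor's components. c² overflows once |c| exceeds sqrt(huge(1.d0)) ≈ 1.3e154, and it loses precision to underflow once |c| drops below about 1.5e-154. The sketch below contrasts that formula with Smith's algorithm, one well-known full-range method that first divides through by the larger of |c| and |d|. It is an illustration under assumptions: the function names naive_div and smith_div are hypothetical, and nothing here shows which formulas ifort actually uses in either mode.

[fortran]
program cdiv_range
  ! Sketch only: textbook complex division (fast, squared intermediates)
  ! versus Smith's algorithm (one common full-range method).
  ! Which formulas ifort uses internally is an assumption, not shown here.
  implicit none
  integer, parameter :: dp = kind(1.d0)
  complex(dp) :: x, y, q_naive, q_smith, q_ref

  ! Operands near 1e200, well beyond the ~1e154 limit of the textbook formula
  x = (2.0e200_dp, 1.0e200_dp)
  y = (4.0e200_dp, 1.0e200_dp)
  ! Same ratio at a harmless scale: (2,1)/(4,1) = (9/17, 2/17)
  q_ref = (2.0_dp, 1.0_dp) / (4.0_dp, 1.0_dp)

  q_naive = naive_div(x, y)    ! intermediates overflow: result is not finite
  q_smith = smith_div(x, y)    ! stays finite and close to q_ref

  print *, 'naive:', q_naive
  print *, 'smith:', q_smith
  if (abs(q_smith - q_ref) > 1.0e-14_dp * abs(q_ref)) error stop 'smith_div inaccurate'

contains

  pure function naive_div(x, y) result(q)
    ! Textbook formula: ((a*c + b*d) + (b*c - a*d)i) / (c**2 + d**2)
    complex(dp), intent(in) :: x, y
    complex(dp) :: q
    real(dp) :: a, b, c, d, den
    a = real(x); b = aimag(x); c = real(y); d = aimag(y)
    den = c*c + d*d              ! overflows once |c| or |d| exceeds ~1.3e154
    q = cmplx((a*c + b*d) / den, (b*c - a*d) / den, kind=dp)
  end function naive_div

  pure function smith_div(x, y) result(q)
    ! Smith's algorithm: scale by the ratio of the smaller to the larger
    ! divisor component, so no squares of the operands are formed.
    complex(dp), intent(in) :: x, y
    complex(dp) :: q
    real(dp) :: a, b, c, d, r, den
    a = real(x); b = aimag(x); c = real(y); d = aimag(y)
    if (abs(c) >= abs(d)) then
      r = d / c
      den = c + d*r
      q = cmplx((a + b*r) / den, (b - a*r) / den, kind=dp)
    else
      r = c / d
      den = c*r + d
      q = cmplx((a*r + b) / den, (b*r - a) / den, kind=dp)
    end if
  end function smith_div

end program cdiv_range
[/fortran]

With operands near 1e200, the textbook version overflows in its intermediates and returns a non-finite result (typically NaN), while Smith's version stays finite and matches (2,1)/(4,1). The extra comparison and branch per element is plausibly one reason a full-range divide is harder to vectorize.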

The "seems inefficient" report for SSE3 vectorization of a loop that does little but divide could be the compiler's recognition that the scalar full-range divide will work better with no vectorization at all.

Todor_K_ · 01-29-2014

Hello Tim, that's exactly the information I was looking for. Thanks a lot!

Thank you, also, for clarifying the terminology. I was less concerned with loop iterations and more with the number of "elementary chunks" (i.e. doubles in this case) one could treat atomically. It is nice that the compiler does the obvious and uses SSE even if the terminology is not applicable :). I didn't know about the -complex-limited-range flag, which seems quite useful. The limitations you mention are something I could certainly live with.

Thank you too, Ron; good to know about the difference the directive makes for the old compiler. I only got to the gist of the problem after rewriting the array syntax A=B/C as a do loop and forcing vectorisation with vector always. Before that, it was the same unhelpful message, "vectorization possible but seems inefficient". Version 14, it seems, gives much clearer information about this particular problem. Nice to know error reporting is progressing too. We'll see if I can get the Lords of the Cluster to upgrade.

Ron_Green · 01-30-2014

I am an advocate of array syntax and hate to see people resort to coding loops instead. So I tried the code with array syntax, AVX, and the 14.0 compiler:

[fortran]

$ more fq2.f90
program fq
implicit none
integer, parameter :: N=100
integer :: i,j
complex (kind=kind(1.D0)), allocatable :: a(:,:), b(:,:), c(:,:)

allocate (a(N,N))
allocate (b(N,N))
allocate (c(N,N))

b = (2.0,1.0)
c = (4.0,1.0)

a = b / c

write(*,*) a(1,1)

end program fq

[/fortran]

Indeed, the compiler recognized the vectorization opportunity in the array syntax and, with AVX, vectorized the loop:

[cpp]

$ ifort -O2 -xavx -vec-report3 fq2.f90
fq2.f90(11): (col. 1) remark: LOOP WAS VECTORIZED
fq2.f90(11): (col. 1) remark: REMAINDER LOOP WAS VECTORIZED
fq2.f90(12): (col. 1) remark: LOOP WAS VECTORIZED
fq2.f90(12): (col. 1) remark: REMAINDER LOOP WAS VECTORIZED
fq2.f90(14): (col. 1) remark: LOOP WAS VECTORIZED
fq2.f90(14): (col. 1) remark: REMAINDER LOOP WAS VECTORIZED

[/cpp]

As before, the forum formatting is messing with the actual line numbers, but line (14) is the a = b / c array-syntax statement.

ron