Double complex vectorization?

linux_f90 · ‎01-06-2006

I tried to vectorize my sample code, with limited success.

On an SSE3 Xeon, vectorization works for all data types except double complex.

(-xP option;

Intel Fortran Compiler for Intel EM64T-based applications, Version 8.1 Build 20040922)

On an SSE2 Xeon, vectorization works for all data types except any complex type.

(-xW or -xN option;

Intel Fortran Compiler for 32-bit applications, Version 8.0 Build 20031231Z)

The code and the -vec_report2 output are below.

Is the complex vectorization really implemented (as Intel claims)?

Any help will be greatly appreciated.

Thanks

***************************************************************

      program mymain
c
      implicit none
      integer, parameter :: nt=100000
c
      complex(8), dimension(nt) :: ca1,ca2      !! does not vectorize
c     double complex, dimension(nt) :: ca1,ca2  !! does not vectorize
c     complex*16, dimension(nt) :: ca1,ca2      !! does not vectorize
c     complex(4), dimension(nt) :: ca1,ca2      !! works with -xP only
c     complex, dimension(nt) :: ca1,ca2         !! works with -xP only
c
      real*8 ra1(nt),ra2(nt)
      integer ia1(nt),ia2(nt),i
c
      common /irc/ ia2,ra2,ca2
c
      do i=1,nt
         ia2(i)=ia1(i)
      enddo
      do i=1,nt
         ra2(i)=ra1(i)
      enddo
      do i=1,nt
         ca2(i)=ca1(i)
      enddo
c
      stop
      end

************************************************************************

ifort -c -O3 -xW -132 -vec_report2 mymain.f

mymain.f(17) : (col. 6) remark: LOOP WAS VECTORIZED.
mymain.f(20) : (col. 6) remark: LOOP WAS VECTORIZED.
mymain.f(24) : (col. 10) remark: loop was not vectorized: data type unsupported on given target architecture.

Steven_L_Intel1 · ‎01-06-2006

You can fit only one double complex value in an SSE register. What's the point?

linux_f90 · ‎01-06-2006

What do you mean by "What's the point?"

Intel claims that the vectorizer has supported double complex as well since v8.0.

As you can see from my example, -xN and -xW do not vectorize single complex either.

Steven_L_Intel1 · ‎01-06-2006

Would you please show me the text that says double complex operations can be vectorized, and tell me where you found it? Here is what the 8.0 release notes say:

A new generation of Intel Pentium 4 processors supports the Streaming SIMD Extensions 3 (SSE3) instruction set, which can improve performance of vectorized loops containing complex data types, float-to-integer conversions, and horizontal adds.

Note that it says "complex data types", not "double complex data types".

The "point" is that a single SSE register is 128 bits. It can hold four singles, two doubles or two (single) complexes. If I'm wrong here, I'm sure Aart will step in and tell me so, but as I understand it, you can load only a single double complex value in an SSE register, so only one operation could be performed and vectorization is pointless.
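The register arithmetic above can be checked with a short program. This is only a minimal sketch, not anything from the compiler documentation: the program name `sse_lanes` and the constant `reg_bits` are made up for illustration, and it assumes a compiler that provides the Fortran 2008 `storage_size` intrinsic and the `real32`/`real64` kinds from `iso_fortran_env`.

```fortran
program sse_lanes
  ! Hypothetical helper: counts how many values of each type fit in one
  ! 128-bit SSE register, using the storage size the compiler reports.
  use iso_fortran_env, only: real32, real64
  implicit none
  integer, parameter :: reg_bits = 128   ! width of one SSE (XMM) register
  real(real32)    :: s
  real(real64)    :: d
  complex(real32) :: cs
  complex(real64) :: cd

  ! storage_size returns the size of its argument in bits
  print '(a,i0)', 'single real    per register: ', reg_bits / storage_size(s)
  print '(a,i0)', 'double real    per register: ', reg_bits / storage_size(d)
  print '(a,i0)', 'single complex per register: ', reg_bits / storage_size(cs)
  print '(a,i0)', 'double complex per register: ', reg_bits / storage_size(cd)
end program sse_lanes
```

With the usual 32-bit and 64-bit IEEE kinds this should reproduce the counts quoted above (four, two, two, and one), which is why a double complex loop has a vector length of one.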

linux_f90 · ‎01-06-2006

I have a "White Paper" PDF dated July 19, 2005, which I downloaded yesterday from intel.com.

It is called "Optimizing Applications with the Intel C++ and Fortran Compilers" for Windows and Linux, updated for Intel Compilers 9.0.

Quote from page 10:

Vectorization and Loop Optimization

...

The Intel C++ and Fortran Compilers automatically vectorize code. The vectorizer supports the following features:

* Multiple data types: The vectorizer supports the float/double and char/short/int/long types (both signed and unsigned), as well as the _Complex float and _Complex double ...

This paper always specifies if there is something new for v9.0. So I assumed that the vectorization is the same for both v8.x and v9.x.

I just tried v9.0 on an SSE2 Xeon (Intel Fortran Compiler for 32-bit applications, Version 9.0 Build 20050809Z) without success: it did not vectorize single complex.

Even if double complex is not going to work (which is a great disappointment), why can't I vectorize single complex with -xN or -xW?

Steven_L_Intel1 · ‎01-06-2006

The SSE3 instructions designed to help vectorize complex operations are not generated when you use -xN or -xW, as those specify SSE2 support only.

Intel_C_Intel · ‎01-06-2006

Dear Linux_F90,

Under xP, operations on both single- and double-precision complex numbers are converted into SSE3 instructions. However, operations on the single-precision complex numbers are converted into SSE3 instructions by the vectorizer (with a vector length of two), while operations on the double-precision complex numbers are converted into SSE3 instructions by the code generator (with a trivial vector length of one). Since even the latter exploits SIMD parallelism, some people (not I) still refer to the latter conversion as vectorization, which may cause the confusion (since even under xP, the vectorization diagnostics will say that no vectorization occurs for double-precision complex loops). If you inspect the generated assembly, however, you will find efficient SSE3 instruction sequences for all sorts of operations on both single- and double-precision complex numbers under xP.

Hope this clarifies things.

Aart Bik (actually on sabbatical)
http://www.aartbik.com/

Steven_L_Intel1 · ‎01-06-2006

Ah, I knew Aart couldn't stay away... Thanks for the clarification.

linux_f90 · ‎01-06-2006

Steve, Aart:

thank you for the explanations.

Do you think I could gain anything by switching to double-precision reals and using arrays of twice the dimension?

Intel_C_Intel · ‎01-06-2006

Dear Linux_F90,

I guess you really want to know whether casting the double complex arrays into double FP arrays of twice the length will trick the compiler into vectorization under xW and xN as well. For simple operations (initialization, addition, subtraction), this trick will actually work. For the more elaborate, and more commonly occurring complex operations (multiplication, division, etc.), this trick requires an extensive rewriting that probably disables vectorization. Under xP, none of these rewritings will have a speed advantage whatsoever.

Aart Bik
http://www.aartbik.com/
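The trick discussed in the last two posts can be sketched as follows. This is a hedged illustration, not code from the thread: the names `cast_trick`, `da1`, `da2`, `dc`, and `cc` are made up, and overlaying `complex(8)` on `real(8)` with EQUIVALENCE assumes the common (but not standard-guaranteed) layout where each double complex is stored as two consecutive doubles, real part first. Whether a given compiler version actually vectorizes each loop has not been verified here; check with -vec_report2.

```fortran
program cast_trick
  ! Sketch only: view double complex arrays as double arrays of twice the length.
  implicit none
  integer, parameter :: nt = 100000
  complex(8) :: ca1(nt), ca2(nt), cc(nt)
  real(8)    :: da1(2*nt), da2(2*nt), dc(2*nt)
  integer    :: i
  ! Assumed layout: da1(2*i-1) = real part of ca1(i), da1(2*i) = imaginary part
  equivalence (ca1, da1), (ca2, da2), (cc, dc)

  do i = 1, nt
     ca1(i) = cmplx(real(i, 8), -real(i, 8), kind=8)
  end do

  ! Copy: a plain unit-stride loop over doubles
  do i = 1, 2*nt
     da2(i) = da1(i)
  end do

  ! Addition: real and imaginary parts are treated alike, so the
  ! same flat loop works (ca2 becomes 2*ca1)
  do i = 1, 2*nt
     da2(i) = da2(i) + da1(i)
  end do

  ! Multiplication: (a+bi)(c+di) = (ac-bd) + (ad+bc)i mixes the parts,
  ! so the rewrite needs stride-2 accesses and cross terms
  do i = 1, nt
     dc(2*i-1) = da1(2*i-1)*da2(2*i-1) - da1(2*i)*da2(2*i)
     dc(2*i)   = da1(2*i-1)*da2(2*i)   + da1(2*i)*da2(2*i-1)
  end do

  ! Compare the flat-array results against ordinary complex arithmetic
  if (all(ca2 == 2*ca1) .and. all(abs(cc - ca1*ca2) <= 1.0d-6*abs(cc))) then
     print *, 'flat-array results match complex arithmetic'
  else
     print *, 'mismatch'
  end if
end program cast_trick
```

The copy and addition loops are ordinary unit-stride loops over doubles, the kind of loop the thread reports as vectorized for `real*8`. The multiplication loop shows the "extensive rewriting" mentioned above: each result element needs both halves of two operands, which is the access pattern the SSE3 instructions under -xP were designed to handle.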