```fortran
c
      implicit none
      integer, parameter :: nt=100000
c
      complex(8), dimension(nt) :: ca1,ca2        !! does not vectorize
c     double complex, dimension(nt) :: ca1,ca2    !! does not vectorize
c     complex*16, dimension(nt) :: ca1,ca2        !! does not vectorize
c     complex(4), dimension(nt) :: ca1,ca2        !! vectorizes with xP only
c     complex, dimension(nt) :: ca1,ca2           !! vectorizes with xP only
c
      real*8 ra1(nt),ra2(nt)
      integer ia1(nt),ia2(nt),i
c
      common /irc/ ia2,ra2,ca2
c
      do i=1,nt
         ia2(i)=ia1(i)
      enddo
      do i=1,nt
         ra2(i)=ra1(i)
      enddo
      do i=1,nt
         ca2(i)=ca1(i)
      enddo
c
      stop
      end
```
```
mymain.f(20) : (col. 6) remark: LOOP WAS VECTORIZED.
mymain.f(24) : (col. 10) remark: loop was not vectorized: data type unsupported on given target architecture.
```
What do you mean by "What's the point?"?
Intel claims that the vectorizer has supported double complex as well since v8.0.
As you can see from my example, -xN and -xW do not vectorize single complex loops either.
A new generation of Intel Pentium 4 processors supports the Streaming SIMD Extensions 3 (SSE3) instruction set, which can improve performance of vectorized loops containing complex data types, float-to-integer conversions, and horizontal adds.
Note that it says "complex data types", not "double complex data types".
The "point" is that a single SSE register is 128 bits. It can hold four singles, two doubles or two (single) complexes. If I'm wrong here, I'm sure Aart will step in and tell me so, but as I understand it, you can load only a single double complex value in an SSE register, so only one operation could be performed and vectorization is pointless.
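The register arithmetic in the reply above can be checked with a short sketch. It assumes a 128-bit SSE register (hard-coded as the illustrative constant `sse_bits`) and uses the Fortran 2008 intrinsic `storage_size`, which reports the size of one element in bits; the module, function, and subroutine names are hypothetical.

```fortran
! Sketch: how many elements of a given size fit in one 128-bit SSE register.
! sse_bits, lanes and report_lanes are illustrative names, not a compiler API.
module sse_lanes
  implicit none
  integer, parameter :: sse_bits = 128   ! assumed width of one XMM register, in bits
contains
  ! Number of elements of elem_bits bits that one register can hold.
  pure integer function lanes(elem_bits)
    integer, intent(in) :: elem_bits
    lanes = sse_bits / elem_bits
  end function lanes

  ! Print the lane count for the four kinds discussed in this thread.
  subroutine report_lanes()
    real(4)    :: s
    real(8)    :: d
    complex(4) :: cs
    complex(8) :: cd
    ! storage_size is an inquiry function, so the variables need not be defined.
    print '(a,i2)', 'real(4)    lanes: ', lanes(storage_size(s))
    print '(a,i2)', 'real(8)    lanes: ', lanes(storage_size(d))
    print '(a,i2)', 'complex(4) lanes: ', lanes(storage_size(cs))
    print '(a,i2)', 'complex(8) lanes: ', lanes(storage_size(cd))
  end subroutine report_lanes
end module sse_lanes
```

On compilers where `real(4)` and `real(8)` occupy 32 and 64 bits, `report_lanes()` prints 4, 2, 2 and 1, matching the count in the reply: a single double complex value fills a whole register by itself.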
Dear Linux_F90,
Under xP, operations on both single- and double-precision complex numbers are converted into SSE3 instructions. However, operations on single-precision complex numbers are converted into SSE3 instructions by the vectorizer (with a vector length of two), whereas operations on double-precision complex numbers are converted into SSE3 instructions by the code generator (with a trivial vector length of one).

Since the latter still exploits SIMD parallelism, some people (not I) refer to it as vectorization as well, which may be the source of the confusion: even under xP, the vectorization diagnostics will report that no vectorization occurs for double-precision complex loops. If you inspect the generated assembly, however, you will find efficient SSE3 instruction sequences for all sorts of operations on both single- and double-precision complex numbers under xP.
Hope this clarifies things.
Aart Bik (actually on sabbatical)
http://www.aartbik.com/
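To make the "vector length of one" case concrete, the sketch below emulates in plain Fortran the kind of SSE3 sequence commonly used for one double complex multiply: duplicate the real part of `y` (MOVDDUP), multiply, swap the two halves of `x` (SHUFPD), multiply by the imaginary part of `y`, then combine with ADDSUBPD, which subtracts in the low lane and adds in the high lane. Each two-element `real(8)` array stands in for one 128-bit register. The module and function names are hypothetical, and the exact instructions ifort emits may differ; this is an illustration, not compiler output.

```fortran
! Emulation (not real SIMD code) of an SSE3-style double complex multiply.
! Each real(dp) array of size 2 models one XMM register:
! element 1 = low lane, element 2 = high lane.
module sse3_cmul
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  pure function cmul_sse3(x, y) result(z)
    complex(dp), intent(in) :: x, y
    complex(dp) :: z
    real(dp) :: xr(2), xs(2), t1(2), t2(2), r(2)
    xr = [real(x, dp), aimag(x)]           ! load x:               [x_re, x_im]
    xs = [xr(2), xr(1)]                    ! SHUFPD, swap lanes:   [x_im, x_re]
    t1 = xr * [real(y, dp), real(y, dp)]   ! MOVDDUP y_re, MULPD:  [x_re*y_re, x_im*y_re]
    t2 = xs * [aimag(y), aimag(y)]         ! broadcast y_im, MULPD: [x_im*y_im, x_re*y_im]
    r(1) = t1(1) - t2(1)                   ! ADDSUBPD: subtract in the low lane
    r(2) = t1(2) + t2(2)                   !           and add in the high lane
    z = cmplx(r(1), r(2), kind=dp)
  end function cmul_sse3
end module sse3_cmul
```

For example, with `x = (1.5, -2.0)` and `y = (3.0, 0.5)` the function returns `(5.5, -5.25)`, the same as the intrinsic `x*y`. All the arithmetic happens inside one register, which is why the diagnostics do not count it as vectorization.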
Steve, Aart:
Thank you for the explanations.
Do you think I could gain anything by switching to double-precision real arrays of twice the length?
Dear Linux_F90,
I guess you really want to know whether casting the double complex arrays into double FP arrays of twice the length will trick the compiler into vectorization under xW and xN as well. For simple operations (initialization, addition, subtraction), this trick will actually work. For the more elaborate and more commonly occurring complex operations (multiplication, division, etc.), this trick requires extensive rewriting that probably disables vectorization. Under xP, none of these rewritings will have any speed advantage whatsoever.
Aart Bik
http://www.aartbik.com/
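The "cast" described above can be written in standard Fortran 2003 without copying, using `c_loc` and `c_f_pointer` from `iso_c_binding` to view a double complex array as a real array of twice the length (re, im, re, im, ...). This is a minimal sketch for the addition case only. The module and subroutine names are hypothetical; it assumes `complex(c_double_complex)` is the same kind as `complex(8)`, as on common compilers, and relies on a complex value being stored as its real part followed by its imaginary part.

```fortran
! Sketch: view double complex arrays as real(c_double) arrays of length 2*n
! and do the addition as a plain real loop. Names are illustrative only.
module complex_as_real
  use iso_c_binding, only: c_double, c_double_complex, c_loc, c_f_pointer
  implicit none
contains
  ! c = a + b, computed on the interleaved real view. n must be positive,
  ! because c_loc requires an array of nonzero size.
  subroutine add_via_real_view(n, a, b, c)
    integer, intent(in) :: n
    complex(c_double_complex), intent(in),  target :: a(n), b(n)
    complex(c_double_complex), intent(out), target :: c(n)
    real(c_double), pointer :: ra(:), rb(:), rc(:)
    integer :: i
    ! Reinterpret each complex array as 2*n reals: re(1), im(1), re(2), ...
    call c_f_pointer(c_loc(a), ra, [2*n])
    call c_f_pointer(c_loc(b), rb, [2*n])
    call c_f_pointer(c_loc(c), rc, [2*n])
    ! Elementwise real addition: each lane adds re to re or im to im, so the
    ! result equals complex addition. A multiply could not be written this way.
    do i = 1, 2*n
      rc(i) = ra(i) + rb(i)
    end do
  end subroutine add_via_real_view
end module complex_as_real
```

Only lane-independent operations (copy, add, subtract, scaling by a real) map directly onto the real view. A complex multiply mixes the real and imaginary parts of each element, which is where the extensive rewriting mentioned above comes in.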