I'm having some trouble understanding why a seemingly simple loop doesn't get vectorized:
COMPLEX*16 VV,VH,HV,HH
...
DO 400 N=NMIN,NMAX
! some setup
DV1N=M*DV1(N)
DV2N=DV2(N)
CT11=DCMPLX(TR11(M1,N,NN),TI11(M1,N,NN))
CT22=DCMPLX(TR22(M1,N,NN),TI22(M1,N,NN))
CT12=DCMPLX(TR12(M1,N,NN),TI12(M1,N,NN))
CT21=DCMPLX(TR21(M1,N,NN),TI21(M1,N,NN))
CN1=CAL(N,NN)*FC
CN2=CAL(N,NN)*FS
D11=DV1N*DV1NN
D12=DV1N*DV2NN
D21=DV2N*DV1NN
D22=DV2N*DV2NN
! does not vectorize!
VV=VV+(CT11*D11+CT21*D21+CT12*D12+CT22*D22)*CN1
VH=VH+(CT11*D12+CT21*D22 +CT12*D11+CT22*D21)*CN2
HV=HV-(CT11*D21+CT21*D11+CT12*D22+CT22*D12)*CN2
HH=HH+(CT11*D22+CT21*D12+CT12*D21+CT22*D11)*CN1
400 CONTINUE
Compiling with:
ifort -msse4.1 -O3 --vec-report3
Gives:
vec2.f(214): (col. 16) remark: loop was not vectorized: unsupported data type.
Line 214 is the "DO 400 N=NMIN,NMAX" line. Unfortunately there's no other explanation of what the unsupported data type might be, and searching on Google turned up nothing to help me. This loop looks trivially parallel to me; it's literally just multiplying data from a few arrays and accumulating into 4 complex numbers. I'm just not sure how to expose that parallelism to the compiler.
3 Replies
REAL*4 and REAL*8 are performed with supported SSE instruction set (in hardware). REAL*16 is not supported by the hardware (SSE). REAL*16 is performed instead by software emulation and is not vectorizable. COMPLEX*16 is a composite of two REAL*16 variables so would not be vectorizable using SSE instructions.
Jim Dempsey
Quoting - jimdempseyatthecove
COMPLEX*16 is a composite of two REAL*16 variables so would not be vectorizable using SSE instructions.
I don't pretend to be an expert in this, but my understanding is that SSE4 has instructions to aid in vectorizing single-precision complex, but not double-precision complex. Given that a single value of the latter would fill an SSE register, this is not too surprising.
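One workaround that might be worth trying, offered only as a sketch: if the D and CN factors are real (the post does not show their declarations, so that is an assumption), each complex sum splits into two independent REAL*8 reductions over the real and imaginary parts. The program below uses made-up 1-D arrays and only two of the four terms, and checks that both forms give the same sum. Whether ifort actually vectorizes the split loop would still need to be confirmed from the vectorization report.

```fortran
program split_complex_sum
  implicit none
  integer, parameter :: dp = kind(0.0d0)
  integer, parameter :: n = 8
  ! Hypothetical 1-D stand-ins for TR11/TI11, TR22/TI22, D11, D22 and CN1.
  real(dp) :: tr11(n), ti11(n), tr22(n), ti22(n), d11(n), d22(n), cn1(n)
  complex(dp) :: vv
  real(dp) :: vv_re, vv_im
  integer :: i

  ! Made-up test data, not taken from the original code.
  do i = 1, n
     tr11(i) = 0.5_dp * i
     ti11(i) = -0.25_dp * i
     tr22(i) = 1.0_dp / i
     ti22(i) = 2.0_dp + i
     d11(i)  = 0.1_dp * i
     d22(i)  = 1.5_dp - 0.1_dp * i
     cn1(i)  = 1.0_dp / (i + 1)
  end do

  ! Original style: accumulate into a double-precision complex scalar.
  vv = (0.0_dp, 0.0_dp)
  do i = 1, n
     vv = vv + (cmplx(tr11(i), ti11(i), dp) * d11(i) &
              + cmplx(tr22(i), ti22(i), dp) * d22(i)) * cn1(i)
  end do

  ! Split style: two independent real reductions, valid only because
  ! d11, d22 and cn1 are real in this sketch.
  vv_re = 0.0_dp
  vv_im = 0.0_dp
  do i = 1, n
     vv_re = vv_re + (tr11(i)*d11(i) + tr22(i)*d22(i)) * cn1(i)
     vv_im = vv_im + (ti11(i)*d11(i) + ti22(i)*d22(i)) * cn1(i)
  end do

  print *, 'complex accumulation:', vv
  print *, 'split accumulation:  ', vv_re, vv_im
  print *, 'max difference:      ', &
       max(abs(real(vv) - vv_re), abs(aimag(vv) - vv_im))
end program split_complex_sum
```

The result can be reassembled afterwards with DCMPLX(vv_re, vv_im). A vectorized reduction may sum in a different order, so tiny rounding differences from the scalar loop would not be surprising.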
Tim is right - complex*16 is double-precision, not quad-precision. complex(16) would be quad-precision. The * notation indicates the total length in bytes of the datatype, not the size of each component.
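A quick way to see the difference between the two notations is to print the storage sizes. This is a minimal sketch that assumes a compiler whose kind numbers equal byte counts and which supports kind 16 (as ifort and gfortran on x86-64 do); kind numbers are compiler-specific, so it is not portable. There I would expect 16, 16 and 32 bytes respectively.

```fortran
program complex_sizes
  implicit none
  ! Old-style byte-length notation: the *total* size of the complex value.
  complex*16  :: z_star
  ! Kind notation: the kind of *each* component (assumed to be a byte
  ! count here, which is a compiler convention, not a standard guarantee).
  complex(8)  :: z_kind8
  complex(16) :: z_kind16

  ! storage_size (Fortran 2008) returns the size in bits.
  print '(a,i0)', 'COMPLEX*16  bytes: ', storage_size(z_star)   / 8
  print '(a,i0)', 'COMPLEX(8)  bytes: ', storage_size(z_kind8)  / 8
  print '(a,i0)', 'COMPLEX(16) bytes: ', storage_size(z_kind16) / 8
end program complex_sizes
```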