I was examining the assembly output of a WHERE construct and I have several questions.
Consider the following Fortran snippet:
    PROGRAM test
      LOGICAL :: mask(4) = (/ .FALSE., .TRUE., .TRUE., .FALSE. /)
      REAL :: b(4) = (/ 4.0, 5.0, 3.0, 2.0 /)
      REAL :: a(4)
      REAL :: c(4) = (/ 42.0, 47.0, 19.0, 52.0 /)
      a = 0.0
      WHERE (mask)
        a = c * b
      END WHERE
    END PROGRAM test

Compiling with /Ox /architecture:SSE2 produced the following (excerpted) code:
    movaps    xmm5, XMMWORD PTR TEST$C$0$0
    mulps     xmm5, XMMWORD PTR TEST$B$0$0
    movdqa    xmm1, XMMWORD PTR TEST$MASK$0$0
    pand      xmm1, XMMWORD PTR _2il0floatpacket$1
    cvtdq2ps  xmm2, xmm1
    pxor      xmm3, xmm3
    mov       eax, esp
    pxor      xmm0, xmm0
    cvtdq2ps  xmm4, xmm0
    cmpneqps  xmm4, xmm2
    movaps    xmm6, xmm4
    andps     xmm5, xmm4
    andnps    xmm6, xmm3
    orps      xmm6, xmm5
    movaps    XMMWORD PTR TEST$A$0$0, xmm6
    mov       esp, eax

Questions:
- When I run the same code with larger, fixed-size arrays, the optimizer doesn't create temporary arrays and use the value of mask to copy the values of b and c. Instead, it essentially repeats a subset of the above code segment. If the arrays are allocatable, things get more complex, but it looks like it uses mask to copy the appropriate values into temporary arrays. Is this always (or mostly) true?
- If mask is "block dense" in the sense that all the true values are grouped together, would a FORALL result in faster code? My sense is that for fixed-size arrays the answer would be "YES" if the arrays were large and for allocatable arrays the answer would be "MAYBE."
- Would using the MKL gemv routine be a good compromise? Would gemv be faster than the WHERE clause method? Would it ever be faster than the FORALL method?
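
For reference, here is a minimal sketch of the alternatives I have in mind. It assumes the true entries of mask form a single contiguous block whose bounds are known in advance. The names lo, hi, a_where, a_merge and a_block are hypothetical and only for illustration. The MERGE form resembles the compute-everything-then-blend pattern in the listing above, while the array section form touches only the block. The program only checks that the three forms agree; it makes no claim about which is faster, which would need timing on real array sizes.

```fortran
program where_vs_section
  implicit none
  integer, parameter :: n = 8
  logical :: mask(n)
  real    :: b(n), c(n)
  real    :: a_where(n), a_merge(n), a_block(n)
  integer :: i, lo, hi

  ! Hypothetical data: all true entries of mask sit in one block lo:hi.
  lo = 3
  hi = 6
  b = [(real(i), i = 1, n)]
  c = 2.0
  mask = .false.
  mask(lo:hi) = .true.

  ! 1. Masked assignment, as in the original snippet.
  a_where = 0.0
  where (mask)
    a_where = c * b
  end where

  ! 2. Compute the full product, then select per element with the mask.
  a_merge = 0.0
  a_merge = merge(c * b, a_merge, mask)

  ! 3. Assign only the block, using array sections instead of a mask.
  !    This is valid only because the block bounds are assumed known.
  a_block = 0.0
  a_block(lo:hi) = c(lo:hi) * b(lo:hi)

  ! All three forms should produce identical results.
  if (any(a_where /= a_merge) .or. any(a_where /= a_block)) then
    error stop "results differ"
  end if

  print '(8f6.1)', a_where
end program where_vs_section
```

If mask had several separate blocks, form 3 would need one section assignment per block, so it only pays off when the blocks are few and their bounds are cheap to obtain.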