Intel® Fortran Compiler

Optimization of a WHERE clause

Dishaw__Jim
I was examining the assembly output for a WHERE construct and have several questions.

Consider the following Fortran snippet:

PROGRAM test

LOGICAL :: mask(4) = (/ .FALSE., .TRUE., .TRUE., .FALSE. /)
REAL :: b(4) = (/4.0, 5.0, 3.0, 2.0/)
REAL :: a(4)
REAL :: c(4) = (/42.0, 47.0, 19.0, 52.0 /)
 
a = 0.0
WHERE(mask)
  a = c * b
END WHERE
END PROGRAM test
Compiling with /Ox /architecture:SSE2 produced the following (excerpted) code:
movaps    xmm5, XMMWORD PTR TEST$C$0$0
mulps     xmm5, XMMWORD PTR TEST$B$0$0
movdqa    xmm1, XMMWORD PTR TEST$MASK$0$0
pand      xmm1, XMMWORD PTR _2il0floatpacket$1
cvtdq2ps  xmm2, xmm1
pxor      xmm3, xmm3
mov       eax, esp
pxor      xmm0, xmm0
cvtdq2ps  xmm4, xmm0
cmpneqps  xmm4, xmm2
movaps    xmm6, xmm4
andps     xmm5, xmm4
andnps    xmm6, xmm3
orps      xmm6, xmm5
movaps    XMMWORD PTR TEST$A$0$0, xmm6
mov       esp, eax
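
Read lane by lane, the excerpt appears to multiply all four elements unconditionally (mulps), turn mask into an all-ones or all-zeros bit pattern per element (cmpneqps), and then blend the products with the zeros already stored in a (andps / andnps / orps). A minimal Fortran sketch of that reading, using the standard MERGE intrinsic, is below. The program name and the variable blended are illustrative only and do not come from the compiler output.

```fortran
PROGRAM where_as_blend
  IMPLICIT NONE
  LOGICAL :: mask(4) = (/ .FALSE., .TRUE., .TRUE., .FALSE. /)
  REAL :: b(4) = (/ 4.0, 5.0, 3.0, 2.0 /)
  REAL :: c(4) = (/ 42.0, 47.0, 19.0, 52.0 /)
  REAL :: a(4), blended(4)

  ! Original formulation: masked assignment.
  a = 0.0
  WHERE (mask)
    a = c * b
  END WHERE

  ! Blend formulation: compute every product, then select per element.
  ! Elements where mask is .FALSE. take the 0.0 that a already held.
  blended = MERGE(c * b, 0.0, mask)

  ! Prints T if both formulations give the same result.
  PRINT *, ALL(a == blended)
END PROGRAM where_as_blend
```

Both forms give the same values here. Whether the compiler emits identical instructions for them is something to check in the assembly listing rather than assume.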
Questions:
  1. When I compile the same code with larger, fixed-size arrays, the optimizer does not create temporary arrays and use the value of mask to copy values of b and c. Instead, it essentially repeats a subset of the above code segment. If the arrays are allocatable, things get more complex, but it looks like it uses mask to copy the appropriate values into temporary arrays. Is this always (or mostly) true?
  2. If mask is "block dense" in the sense that all the true values are grouped together, would a FORALL result in faster code? My sense is that for fixed-size arrays the answer would be "YES" if the arrays were large and for allocatable arrays the answer would be "MAYBE."
  3. Would using the MKL gemv routine be a good compromise? Would gemv be faster than the WHERE clause method? Would it ever be faster than the FORALL method?
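
To make question 2 concrete, here is a small sketch of the "block dense" case. It assumes the .TRUE. values of mask form one contiguous run, so the run's bounds can replace the mask. The names lo, hi, idx, a_where, a_forall and a_section are illustrative. The sketch only shows that the three forms are equivalent in result; it makes no claim about which is faster. Note also that FORALL was declared obsolescent in Fortran 2018, so the plain array-section form may be the more durable comparison.

```fortran
PROGRAM block_dense_sketch
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 4
  LOGICAL :: mask(n) = (/ .FALSE., .TRUE., .TRUE., .FALSE. /)
  REAL :: b(n) = (/ 4.0, 5.0, 3.0, 2.0 /)
  REAL :: c(n) = (/ 42.0, 47.0, 19.0, 52.0 /)
  REAL :: a_where(n), a_forall(n), a_section(n)
  INTEGER :: idx(n), i, lo, hi

  ! Bounds of the single contiguous run of .TRUE. values (assumption).
  idx = (/ (i, i = 1, n) /)
  lo = MINVAL(idx, MASK=mask)   ! first .TRUE. position
  hi = MAXVAL(idx, MASK=mask)   ! last .TRUE. position

  ! Form 1: masked assignment, as in the original program.
  a_where = 0.0
  WHERE (mask)
    a_where = c * b
  END WHERE

  ! Form 2: FORALL restricted to the dense block.
  a_forall = 0.0
  FORALL (i = lo:hi) a_forall(i) = c(i) * b(i)

  ! Form 3: array-section assignment over the same block.
  a_section = 0.0
  a_section(lo:hi) = c(lo:hi) * b(lo:hi)

  ! Prints T T if all three forms agree.
  PRINT *, ALL(a_where == a_forall), ALL(a_where == a_section)
END PROGRAM block_dense_sketch
```

Forms 2 and 3 skip the per-element mask test entirely, which is the intuition behind the question. Timing them against the WHERE version on realistic array sizes would be the way to confirm it.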
