I need to form the cross products of a single vector with a long string of other vectors (all vectors 3D). I have not been able to see how to vectorize this process in Fortran. C++ allows insertion of assembler, so something can be hand-crafted in that compiler. But it's not portable. Any suggestions?
Fortran allows the programmer to write out steps using vector syntax that the compiler could, in principle, translate directly into efficient SIMD code. However, I have never seen ifort perform such a translation. Can you perhaps be a bit more specific as to the KINDs of your data and how they are laid out in memory?
All the vectors involved use 4-byte floats. The layout in memory can be changed to make vectorization possible. But, for example, in other vectorized operations in the same section of code, the data is already laid out Vector(1:N, 1:3) rather than Vector(1:3,1:N). That is all the x coordinates are in contiguous memory. As with the y and z-coordinates.
Can you explain in a little more detail what your in's and out's are. From my read (which may be incorrect). Pseudo code
For Each 3PointVectorReference v in LongVectorOf3PointVectors
v = v cross Crossing3PointVector
IOW you seem to be asking for an in situ cross. Is this correct?
Or, is the output going to a different LongVectorOf3PointVectors?
You might want to make a variation on this:
! in some module ! AVX four-up double vector type TypeYMM SEQUENCE real(8) :: v(0:3) end type TypeYMM ... RECURSIVE SUBROUTINE CROSS_ymm_ymm_ymm(VA,VB,VC) use MOD_ALL ! contains types type(TypeYMM) :: VA(3), VB(3), VC(3) type(TypeYMM), automatic :: VS(3) if((loc(VC) .eq. loc(VA)) .or. (loc(VC) .eq. loc(VB))) then ! COMPUTE THE CROSS PRODUCT !DEC$ VECTOR ALWAYS VS(1)%v = VA(2)%v*VB(3)%v - VA(3)%v*VB(2)%v !DEC$ VECTOR ALWAYS VS(2)%v = VA(3)%v*VB(1)%v - VA(1)%v*VB(3)%v !DEC$ VECTOR ALWAYS VS(3)%v = VA(1)%v*VB(2)%v - VA(2)%v*VB(1)%v ! TRANSFER THE CROSS PRODUCT RESULT TO THE OUTPUT VECTOR !DEC$ VECTOR ALWAYS VC(1)%v = VS(1)%v !DEC$ VECTOR ALWAYS VC(2)%v = VS(2)%v !DEC$ VECTOR ALWAYS VC(3)%v = VS(3)%v else ! COMPUTE THE CROSS PRODUCT without temps !DEC$ VECTOR ALWAYS VC(1)%v = VA(2)%v*VB(3)%v - VA(3)%v*VB(2)%v !DEC$ VECTOR ALWAYS VC(2)%v = VA(3)%v*VB(1)%v - VA(1)%v*VB(3)%v !DEC$ VECTOR ALWAYS VC(3)%v = VA(1)%v*VB(2)%v - VA(2)%v*VB(1)%v endif END SUBROUTINE CROSS_ymm_ymm_ymm
The obvious thing to do is
cross_product = reshape([single(2)*vector(:,3)-single(3)*vector(:,2),single(3)*vector(:,1)-single(1)*vector(:,3),single(1)*vector(:,2)-single(2)*vector(:,1)],shape(vector))
which gives the compiler maximum information and freedom to vectorize. However, don't be surprised if given sufficient rope, the compiler manages to hang itself. Are you targeting systems with xmm, ymm, or zmm registers?
In a bored moment I ran some quick cross product tests. Vectorisation ( for real(4) ) did speed it up but even without I could do 600,000 cross products in less than 1ms on a basic laptop (I5 processor). Is it worth worrying about?