Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

Highlighted
##

David_DiLaura1

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-23-2016
01:00 PM

7 Views

SIMD cross product

Colleagues,

I need to form the cross products of a single vector with a long string of other vectors (all vectors 3D). I have not been able to see how to vectorize this process in Fortran. C++ allows insertion of assembler, so something can be hand-crafted in that compiler. But it's not portable. Any suggestions?

David

5 Replies

Highlighted
##

Fortran allows the programmer to write out steps using vector syntax that the compiler could, in principle, translate directly into efficient SIMD code. However, I have never seen ifort perform such a translation. Can you perhaps be a bit more specific as to the KINDs of your data and how they are laid out in memory?

JVanB

Valued Contributor II

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-23-2016
02:32 PM

7 Views

Fortran allows the programmer

Highlighted
##

All the vectors involved use 4-byte floats. The layout in memory can be changed to make vectorization possible. But, for example, in other vectorized operations in the same section of code, the data is already laid out Vector(1:N, 1:3) rather than Vector(1:3,1:N). That is all the x coordinates are in contiguous memory. As with the y and z-coordinates.

David_DiLaura1

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-23-2016
03:12 PM

7 Views

All the vectors involved use

Highlighted
##

jimdempseyatthecove

Black Belt

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-23-2016
04:02 PM

7 Views

Can you explain in a little

Can you explain in a little more detail what your in's and out's are. From my read (which may be incorrect). Pseudo code

LongVectorOf3PointVectors

Crossing3PointVector

For Each 3PointVectorReference v in LongVectorOf3PointVectors

v = v cross Crossing3PointVector

end for

IOW you seem to be asking for an in situ cross. Is this correct?

Or, is the output going to a different LongVectorOf3PointVectors?

You might want to make a variation on this:

! in some module ! AVX four-up double vector type TypeYMM SEQUENCE real(8) :: v(0:3) end type TypeYMM ... RECURSIVE SUBROUTINE CROSS_ymm_ymm_ymm(VA,VB,VC) use MOD_ALL ! contains types type(TypeYMM) :: VA(3), VB(3), VC(3) type(TypeYMM), automatic :: VS(3) if((loc(VC) .eq. loc(VA)) .or. (loc(VC) .eq. loc(VB))) then ! COMPUTE THE CROSS PRODUCT !DEC$ VECTOR ALWAYS VS(1)%v = VA(2)%v*VB(3)%v - VA(3)%v*VB(2)%v !DEC$ VECTOR ALWAYS VS(2)%v = VA(3)%v*VB(1)%v - VA(1)%v*VB(3)%v !DEC$ VECTOR ALWAYS VS(3)%v = VA(1)%v*VB(2)%v - VA(2)%v*VB(1)%v ! TRANSFER THE CROSS PRODUCT RESULT TO THE OUTPUT VECTOR !DEC$ VECTOR ALWAYS VC(1)%v = VS(1)%v !DEC$ VECTOR ALWAYS VC(2)%v = VS(2)%v !DEC$ VECTOR ALWAYS VC(3)%v = VS(3)%v else ! COMPUTE THE CROSS PRODUCT without temps !DEC$ VECTOR ALWAYS VC(1)%v = VA(2)%v*VB(3)%v - VA(3)%v*VB(2)%v !DEC$ VECTOR ALWAYS VC(2)%v = VA(3)%v*VB(1)%v - VA(1)%v*VB(3)%v !DEC$ VECTOR ALWAYS VC(3)%v = VA(1)%v*VB(2)%v - VA(2)%v*VB(1)%v endif END SUBROUTINE CROSS_ymm_ymm_ymm

Jim Dempsey

Highlighted
##

JVanB

Valued Contributor II

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-23-2016
04:07 PM

7 Views

The obvious thing to do is

The obvious thing to do is

cross_product = reshape([single(2)*vector(:,3)-single(3)*vector(:,2),single(3)*vector(:,1)-single(1)*vector(:,3),single(1)*vector(:,2)-single(2)*vector(:,1)],shape(vector))

which gives the compiler maximum information and freedom to vectorize. However, don't be surprised if given sufficient rope, the compiler manages to hang itself. Are you targeting systems with xmm, ymm, or zmm registers?

Highlighted
##

In a bored moment I ran some quick cross product tests. Vectorisation ( for real(4) ) did speed it up but even without I could do 600,000 cross products in less than 1ms on a basic laptop (I5 processor). Is it worth worrying about?

andrew_4619

Valued Contributor II

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-23-2016
06:43 PM

7 Views

In a bored moment I ran some

For more complete information about compiler optimizations, see our Optimization Notice.