The correct mnemonic is PSADBW (compute sum of absolute differences). This can be used to (sort of) perform a horizontal add under some circumstances. The code snippet I gave was incorrect and, after looking at it again, it was also incomplete. The idea was to use this instruction to count the bytes == 0xFF.
The code though bloats up to (untested code):
[plain]movaps xmm0,[somewhere] ; xmm0 = read of 4 floats
pxor xmm1,xmm1 ; xmm1 = 4 0.0's
cmpltps xmm0,xmm1 ; xmm0 = xmm0.lt.xmm1 indicators, FFFFFFFF where float < 0 (xmm1 still all 0's)
psadbw xmm0,xmm1 ; xmm0 (lsw of low half) = sum of absolute differences in bytes 0:7 of xmm0-xmm1
; xmm0 (lsw of high half) = sum of absolute differences in bytes 8:15 of xmm0-xmm1
; each FF byte adds 255, so words of xmm0 = {0,0,0,2040,0,0,0,2040} indicates all 4 floats were < 0
; now add the two halves
pshufd xmm2,xmm0,0EEh ; xmm2 low half = copy of xmm0 high half
paddw xmm0,xmm2 ; low word of xmm0 = 4080 (16*255) indicates all 4 floats were < 0
pextrw eax,xmm0,0 ; extract low word of xmm0 into eax
cmp eax,4080 ; ZF set when all 4 floats were < 0
[/plain]
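For anyone who would rather try the idea from a compiler than from an assembler, here is a minimal C sketch of the same byte-summing approach using SSE2 intrinsics. It is an illustration, not tested production code: it assumes an x86 target with SSE2, and the function names `all_negative_sad` and `all_negative_movmsk` are made up for this example. With a zero second operand, PSADBW adds 255 for every 0xFF byte, so the low word is 4080 (16*255) only when all four compare masks are set. The second function uses MOVMSKPS instead, which is a different and shorter technique, included only as a cross-check.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in the SSE ones) */
#include <stddef.h>

/* Hypothetical helper: 1 when all four floats at p are < 0, via PSADBW. */
static int all_negative_sad(const float *p)
{
    assert(p != NULL);
    const __m128  v    = _mm_loadu_ps(p);                    /* 4 floats, unaligned load */
    const __m128  mask = _mm_cmplt_ps(v, _mm_setzero_ps());  /* FFFFFFFF where v < 0 */
    /* Per 64-bit half: low word = 255 * (number of 0xFF bytes in that half). */
    const __m128i sad  = _mm_sad_epu8(_mm_castps_si128(mask),
                                      _mm_setzero_si128());
    const __m128i hi   = _mm_shuffle_epi32(sad, 0xEE);       /* high half copied to low half */
    const __m128i sum  = _mm_add_epi16(sad, hi);             /* low word = total of both halves */
    return _mm_extract_epi16(sum, 0) == 16 * 255;            /* 4080 only if all 16 bytes are FF */
}

/* Hypothetical cross-check: MOVMSKPS gathers the four mask sign bits. */
static int all_negative_movmsk(const float *p)
{
    assert(p != NULL);
    const __m128 mask = _mm_cmplt_ps(_mm_loadu_ps(p), _mm_setzero_ps());
    return _mm_movemask_ps(mask) == 0xF;
}
```

Both functions treat -0.0 as not less than zero, because the compare is a true floating-point comparison.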
>>How are the xmm registers manipulated directly in Fortran?
Your original post said you were using SSE instructions - so I assume assembler (which you can mix with FORTRAN).
If you want to stick with FORTRAN then do as one of the earlier replies suggested: use the logical functions on a union with the floats (real(4)'s), e.g.
! ARRAY I(1:4) OVERLAYS ARRAY OF 4 REAL(4)'S
IF(RSHFT(I(1),31)+RSHFT(I(2),31)+RSHFT(I(3),31)+RSHIFT(I(4),31) .EQ. 4) THEN...
Check the code out with optimizations enabled, this may do a good enough job.
Note, there is one "minor" error in the above. Floating point numbers have +0.0 and -0.0 (did you know this?). The above will include -0.0 in the set of .lt.0. You will have to decide if this is correct or not.
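The +0.0 / -0.0 difference is easy to see in a few lines. The sketch below is in C purely for illustration (the function names are hypothetical) and assumes 32-bit IEEE 754 floats. `sign_bit_count` mirrors the shift-by-31 expression above by summing the four sign bits, while `less_than_zero_count` does an ordinary floating-point comparison.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit floats");

/* Hypothetical helper: sum of the four sign bits (bit 31 of each float). */
static int sign_bit_count(const float x[4])
{
    assert(x != NULL);
    int count = 0;
    for (int k = 0; k < 4; ++k) {
        uint32_t bits;
        memcpy(&bits, &x[k], sizeof bits);  /* reinterpret the float's bits safely */
        count += (int)(bits >> 31);         /* logical shift: 1 when sign bit set */
    }
    return count;
}

/* Hypothetical helper: how many of the four floats compare < 0. */
static int less_than_zero_count(const float x[4])
{
    assert(x != NULL);
    int count = 0;
    for (int k = 0; k < 4; ++k)
        count += (x[k] < 0.0f);
    return count;
}
```

For the input {-1.0, -2.0, -0.0, -3.0} the sign-bit version reports 4 while the comparison reports 3, which is exactly the case you have to decide on.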
Jim Dempsey
I'm not sure from your description why you wouldn't use something like if(any(logicalarray)), leaving it up to the compiler to decide whether to use masking operations, or, as hinted in previous responses, use masking intrinsics explicitly.
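As a rough illustration of the "leave it to the compiler" route, here is the same kind of reduction written as a plain loop in C, the counterpart of Fortran's `ANY` and `ALL` intrinsics applied to a logical array. The function names are hypothetical, and whether a given compiler turns these loops into packed compares and mask operations depends on the compiler and its optimization flags, so check the generated code before relying on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when every element is < 0 (like ALL(x < 0.0)). */
static bool all_negative(const float *x, size_t n)
{
    assert(x != NULL || n == 0);
    bool all = true;
    for (size_t i = 0; i < n; ++i)
        all &= (x[i] < 0.0f);   /* branch-free accumulation, easy to vectorize */
    return all;
}

/* Hypothetical helper: true when at least one element is < 0 (like ANY(x < 0.0)). */
static bool any_negative(const float *x, size_t n)
{
    assert(x != NULL || n == 0);
    bool any = false;
    for (size_t i = 0; i < n; ++i)
        any |= (x[i] < 0.0f);
    return any;
}
```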
movaps xmm0,[somewhere] ; xmm0 = read of 4 floats
pxor xmm1,xmm1 ; xmm1 = 4 0.0's
cmpgeps xmm1,xmm0 ; xmm1 = .lt.0 indicators
pasdbw xmm2,xmm1 ; xmm2 (lsw) = sum of absolute differences in bytes
; 0 when all flags 0 or all flags FFFFFFFF (-1)
; +n when flags differ
paddd xmm2,xmm1 ; xmm2(lsdw) -1 only when all flags were FFFFFFFF
; .ge. 0 when not
movss temp,xmm2 ;
test temp,0
...
Jim Dempsey
Grrr
movaps xmm0,[somewhere] ; xmm0 = read of 4 floats
psrad xmm0,31 ; xmm0 = shift of sign bits through entire dword, FFFFFFFF where sign bit set (includes -0.0)
pxor xmm1,xmm1 ; xmm1 = all 0's (no compare needed, psrad already produced the indicators)
psadbw xmm0,xmm1 ; xmm0 (lsw of low half) = sum of absolute differences in bytes 0:7 of xmm0-xmm1
; xmm0 (lsw of high half) = sum of absolute differences in bytes 8:15 of xmm0-xmm1
; each FF byte adds 255, so words of xmm0 = {0,0,0,2040,0,0,0,2040} indicates all 4 sign bits were set
; now add the two halves
pshufd xmm2,xmm0,0EEh ; xmm2 low half = copy of xmm0 high half
paddw xmm0,xmm2 ; low word of xmm0 = 4080 (16*255) indicates all 4 sign bits were set
pextrw eax,xmm0,0 ; extract low word of xmm0 into eax
cmp eax,4080 ; ZF set when all 4 sign bits were set
Jim
