capens__nicolas

Replacement of packusdw
Hi all,

I'm looking for an efficient instruction sequence with the exact same functionality as the SSE 4.1 packusdw instruction.

First I tried to use packssdw by subtracting 0x8000 from the input and then adding it back in, but this doesn't work for all inputs. I've got a working implementation that uses pcmpgtd to do the saturation, but it's very long. Anyone got a better idea?

Thanks,

Nicolas
neni
Use your original idea of subtracting 0x8000 before packssdw and adding it back afterwards, but also do a packssdw on the original values, shift that result right arithmetically by 15 bits so the sign bit fills each word, and use it as a mask (using pandn) on the first result to zero the negative elements. Hopefully that's shorter than the pcmpgtd version.
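
A rough sketch of that sequence with SSE2 intrinsics, in case it helps. The helper name `packus_epi32_sse2` is made up for illustration, and the mapping I'm assuming is psubd/packssdw/paddw for the biased pack, then packssdw/psraw/pandn for the mask. Treat it as a starting point rather than a tuned implementation.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: emulates packusdw (signed 32-bit -> unsigned
 * 16-bit with saturation) using only SSE2 instructions. */
static __m128i packus_epi32_sse2(__m128i a, __m128i b)
{
    const __m128i bias32 = _mm_set1_epi32(0x8000);
    const __m128i bias16 = _mm_set1_epi16(-32768);  /* 0x8000 per word */

    /* Bias into signed range, pack with signed saturation, remove the bias.
     * This alone is wrong for very negative inputs, where the 32-bit
     * subtraction wraps around and the lane saturates high. */
    __m128i packed = _mm_packs_epi32(_mm_sub_epi32(a, bias32),
                                     _mm_sub_epi32(b, bias32));
    packed = _mm_add_epi16(packed, bias16);

    /* Pack the original values: each word keeps the sign of its input.
     * The arithmetic shift by 15 turns that sign into 0xFFFF or 0x0000. */
    __m128i negative = _mm_srai_epi16(_mm_packs_epi32(a, b), 15);

    /* pandn: clear every lane whose input was negative. */
    return _mm_andnot_si128(negative, packed);
}
```

The second packssdw is what covers the wrap-around case: the mask is taken from the unbiased input, so a negative lane is forced to zero no matter what the biased pack produced for it.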
