Intel® ISA Extensions

Replacement of packusdw

capens__nicolas
Hi all,

I'm looking for an efficient instruction sequence with the exact same functionality as the SSE 4.1 packusdw instruction.

First I tried to use packssdw by subtracting 0x8000 from the input and then adding it back in afterwards, but this doesn't work for all inputs. I've got a working implementation that uses pcmpgtd to do the saturating, but it's very long. Anyone got a better idea?

Thanks,

Nicolas
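
For readers following along, here is a rough scalar model of a single lane of the bias trick described in the question. It is only a sketch: the function names are made up for illustration, and the wrapping subtract relies on the usual two's-complement conversion that mainstream compilers provide. It suggests where the trick goes wrong: for inputs in [INT32_MIN, INT32_MIN + 0x7FFF] the 32-bit subtract wraps around to a large positive value, so the lane saturates to 0xFFFF instead of 0.

```c
#include <stdint.h>

/* Reference semantics of one packusdw lane: clamp a signed 32-bit
   value to the unsigned 16-bit range [0, 65535]. */
static uint16_t packus_lane(int32_t x)
{
    if (x < 0) return 0;
    if (x > 65535) return 65535;
    return (uint16_t)x;
}

/* One packssdw lane: clamp to the signed 16-bit range [-32768, 32767]. */
static int16_t packss_lane(int32_t x)
{
    if (x < -32768) return -32768;
    if (x > 32767) return 32767;
    return (int16_t)x;
}

/* The bias trick for one lane: psubd 0x8000, packssdw, paddw 0x8000.
   Both the subtract and the add wrap around, like the SSE2 instructions.
   (Hypothetical helper, not part of any library.) */
static uint16_t biased_lane(int32_t x)
{
    /* Wrapping 32-bit subtract; assumes two's-complement conversion. */
    int32_t shifted = (int32_t)((uint32_t)x - 0x8000u);
    int16_t packed = packss_lane(shifted);
    /* Wrapping 16-bit add. */
    return (uint16_t)((uint16_t)packed + 0x8000u);
}
```

Comparing biased_lane against packus_lane over the whole 32-bit range is a cheap way to check any candidate sequence before writing the vector version.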
1 Solution
neni
Use your original idea of subtracting 0x8000 before packssdw and adding it back afterwards, but also do packssdw on the original values, shift that result right by 15 bits (an arithmetic shift such as psraw, so the sign bit fills each negative element), and use it as a mask (using pandn) on the biased result to zero the negative elements. Hopefully that's shorter than the pcmpgtd version.
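
One possible reading of this suggestion, written with SSE2 intrinsics, is sketched below. The helper name packus_epi32_sse2 is hypothetical, and the use of an arithmetic shift (psraw) for the mask is an assumption about what was meant. It has not been benchmarked against the pcmpgtd version.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: aims to match _mm_packus_epi32(a, b) (packusdw)
   using only SSE2 instructions. */
static __m128i packus_epi32_sse2(__m128i a, __m128i b)
{
    const __m128i bias32 = _mm_set1_epi32(0x8000);
    const __m128i bias16 = _mm_set1_epi16(-32768);  /* 0x8000 per word */

    /* Bias trick: psubd 0x8000, packssdw, then paddw 0x8000 (wrapping). */
    __m128i biased = _mm_packs_epi32(_mm_sub_epi32(a, bias32),
                                     _mm_sub_epi32(b, bias32));
    biased = _mm_add_epi16(biased, bias16);

    /* packssdw on the unbiased inputs preserves each element's sign, so
       psraw by 15 yields 0xFFFF for negative elements and 0 otherwise. */
    __m128i neg = _mm_srai_epi16(_mm_packs_epi32(a, b), 15);

    /* pandn computes (~neg) & biased, which zeroes the negative elements,
       including the ones where the biased subtract wrapped around. */
    return _mm_andnot_si128(neg, biased);
}
```

That comes to two psubd, two packssdw, one paddw, one psraw, and one pandn, plus loading the two constants.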
