Intel® ISA Extensions

Replacement of packusdw

capens__nicolas
Hi all,

I'm looking for an efficient instruction sequence with the exact same functionality as the SSE 4.1 packusdw instruction.

First I tried to use packssdw by subtracting 0x8000 from the input and then adding it back in, but this doesn't work for all inputs. I've got a working implementation that uses pcmpgtd to do the saturation, but it's very long. Anyone got a better idea?

Thanks,

Nicolas
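To make the failure concrete, here is a scalar C model of a single lane. It is a sketch, and the helper names `packus_lane`, `packss_lane` and `biased_lane` are hypothetical. The conversion from `uint32_t` back to `int32_t` assumes the usual two's-complement wraparound that GCC, Clang and MSVC provide. The model shows that the bias trick is exact except for the 0x8000 inputs from INT32_MIN to INT32_MIN + 0x7FFF. For those inputs the 32-bit subtraction (psubd) wraps to a large positive value, so the result saturates to 65535 instead of 0.

```c
#include <stdint.h>

/* Reference semantics of one packusdw lane: clamp a signed 32-bit value
   to the unsigned 16-bit range [0, 65535]. */
static uint16_t packus_lane(int32_t x)
{
    if (x < 0) return 0;
    if (x > 65535) return 65535;
    return (uint16_t)x;
}

/* One packssdw lane: clamp a signed 32-bit value to [-32768, 32767]. */
static int16_t packss_lane(int32_t x)
{
    if (x < -32768) return -32768;
    if (x > 32767) return 32767;
    return (int16_t)x;
}

/* The bias trick, lane by lane: psubd 0x8000 (32-bit wraparound),
   packssdw, then paddw 0x8000 (16-bit wraparound). */
static uint16_t biased_lane(int32_t x)
{
    /* Unsigned subtraction wraps like psubd; converting back to int32_t
       assumes two's-complement wraparound (GCC/Clang/MSVC behaviour). */
    int32_t shifted = (int32_t)((uint32_t)x - 0x8000u);
    int16_t s = packss_lane(shifted);
    /* Adding 0x8000 modulo 2^16 undoes the bias. */
    return (uint16_t)((uint16_t)s + 0x8000u);
}
```

For every x from INT32_MIN + 0x8000 up to INT32_MAX the subtraction cannot wrap, so `biased_lane(x) == packus_lane(x)`. Only the bottom 0x8000 values differ.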
neni
Use your original idea of subtracting 0x8000, packing with packssdw and adding 0x8000 back afterwards. Also run packssdw on the original values, then shift the resulting sign bits right by 15 with an arithmetic shift (psraw) so that each negative element becomes an all-ones mask. Use that mask (via pandn) on the result to zero the negative elements. Hopefully that's shorter than the pcmpgtd version.
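A minimal sketch of that suggestion in SSE2 intrinsics, assuming an x86 target with SSE2 (baseline on x86-64). The function names `packus_epi32_sse2` and `packus_sse2_array` are hypothetical. Like packusdw, the low four output words come from `a` and the high four from `b`.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Emulates SSE4.1 packusdw with SSE2 only: bias by 0x8000, pack with
   signed saturation, un-bias, then zero lanes whose input was negative. */
static __m128i packus_epi32_sse2(__m128i a, __m128i b)
{
    const __m128i bias32 = _mm_set1_epi32(0x8000);
    const __m128i bias16 = _mm_set1_epi16(-0x8000); /* 0x8000 in each word */

    /* packssdw(x - 0x8000) + 0x8000 is correct except for x within 0x8000
       of INT32_MIN, where psubd wraps to a large positive value. */
    __m128i packed = _mm_packs_epi32(_mm_sub_epi32(a, bias32),
                                     _mm_sub_epi32(b, bias32));
    packed = _mm_add_epi16(packed, bias16);

    /* packssdw preserves each input's sign, so psraw by 15 turns every
       negative lane into 0xFFFF and every other lane into 0. */
    __m128i neg = _mm_srai_epi16(_mm_packs_epi32(a, b), 15);

    /* pandn computes ~neg & packed, zeroing the negative lanes. */
    return _mm_andnot_si128(neg, packed);
}

/* Convenience wrapper: packs in[0..3] and in[4..7] into out[0..7]. */
static void packus_sse2_array(const int32_t in[8], uint16_t out[8])
{
    __m128i a = _mm_loadu_si128((const __m128i *)in);
    __m128i b = _mm_loadu_si128((const __m128i *)(in + 4));
    _mm_storeu_si128((__m128i *)out, packus_epi32_sse2(a, b));
}
```

The sequence is two psubd, two packssdw, one paddw, one psraw and one pandn, plus the two constants. Adding 0x8000 to a 16-bit lane only flips its top bit, so `_mm_xor_si128` could replace `_mm_add_epi16` there. On SSE4.1 hardware the output can be checked against `_mm_packus_epi32` (compile with `-msse4.1`).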
