Hi all,
I'm looking for an efficient instruction sequence with the exact same functionality as the SSE 4.1 packusdw instruction.
First I tried to use packssdw by subtracting 0x8000 from the input and then adding it back in, but this doesn't work for all input. I've got a working implementation that uses pcmpgtd to do the saturating, but it's very long. Anyone got a better idea?
Thanks,
Nicolas
1 Solution
Quoting - c0d1f1ed
Hi all,
I'm looking for an efficient instruction sequence with the exact same functionality as the SSE 4.1 packusdw instruction.
First I tried to use packssdw by subtracting 0x8000 from the input and then adding it back in, but this doesn't work for all input. I've got a working implementation that uses pcmpgtd to do the saturating, but it's very long. Anyone got a better idea?
Thanks,
Nicolas
Use your original idea of subtracting 0x8000 before packssdw and adding it back afterwards, but also run packssdw on the original values. Shift each word of that second result right by 15 bits (an arithmetic shift, so the sign bit fills the whole word) and use it as a mask (using pandn) on the first result to zero the negative elements. Hopefully that's shorter than the pcmpgtd version.
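For reference, here is a rough sketch of how that sequence might look with SSE2 intrinsics. The function names `packus_epi32_sse2` and `pack_u16_from_i32` are made up for illustration and are not from the thread. The reason the mask is needed: for inputs below -2^31 + 0x8000 the 32-bit subtraction wraps around to a large positive value, so the biased pack saturates to 0xFFFF instead of 0. The sign of the original value survives the saturating pack, so it can be used to clear those lanes.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Sketch of an SSE2-only stand-in for packusdw (hypothetical name).
   Packs eight signed 32-bit values into unsigned 16-bit with saturation. */
static __m128i packus_epi32_sse2(__m128i a, __m128i b)
{
    const __m128i bias32 = _mm_set1_epi32(0x8000);
    const __m128i bias16 = _mm_set1_epi16((short)-32768); /* 0x8000 per word */

    /* psubd + packssdw + paddw: bias into signed range, pack, un-bias.
       Correct for every input except very negative values that wrap
       around in the 32-bit subtraction. */
    __m128i packed = _mm_packs_epi32(_mm_sub_epi32(a, bias32),
                                     _mm_sub_epi32(b, bias32));
    packed = _mm_add_epi16(packed, bias16);

    /* packssdw on the originals keeps the sign of each input;
       psraw by 15 turns it into 0xFFFF (negative) or 0x0000. */
    __m128i neg = _mm_srai_epi16(_mm_packs_epi32(a, b), 15);

    /* pandn: (~neg) & packed, so negative inputs become 0. */
    return _mm_andnot_si128(neg, packed);
}

/* Hypothetical convenience wrapper for trying it on plain arrays. */
static void pack_u16_from_i32(const int32_t in[8], uint16_t out[8])
{
    __m128i a = _mm_loadu_si128((const __m128i *)in);
    __m128i b = _mm_loadu_si128((const __m128i *)(in + 4));
    _mm_storeu_si128((__m128i *)out, packus_epi32_sse2(a, b));
}
```

On a machine with SSE 4.1, comparing the output against `_mm_packus_epi32` over the edge cases (0, 65535, 65536, -1, INT32_MIN, INT32_MAX) is a quick way to check that the two agree.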