I've been searching for days for the reverse of the PMOVMSKB instruction.
I want to collapse a 64-bit result down to 8 bits and then restore it again.
For example:
0xFF00FF00FF00FF00 -> (PMOVMSKB) -> 10101010b -> (reverse) -> 0xFF00FF00FF00FF00
Can anyone help? If I had hair I'd be pulling it out :-)
4 Replies
Just use a lookup table (LUT). It only needs 256 entries. So the reverse of "pmovmskb eax, mm0" becomes a simple "movq mm0, [LUT+eax*8]".
Although a 2 kB lookup table isn't much, here's an alternative that requires no table in case you have very poor L1 cache hit ratios:
[plain]movd mm0, eax
punpcklbw mm0, mm0
pshufw mm0, mm0, 0x00
pand mm0, [mask8040201008040201h]
pcmpeqb mm0, [mask8040201008040201h][/plain]
I hope this helps!
Nicolas
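For anyone who wants to check the sequence above without running MMX code, here is a minimal scalar sketch in C that models each step on a plain 64-bit integer. The function names `collapse_mask` and `expand_mask` are made up for illustration; they are not intrinsics or library calls, and this is only a model of the behaviour, not a replacement for the MMX instructions.

```c
#include <stdint.h>

/* Scalar model of "pmovmskb eax, mm0": gather the top bit of each of the
   8 bytes into an 8-bit mask (byte i supplies bit i). */
static uint8_t collapse_mask(uint64_t v)
{
    uint8_t m = 0;
    for (int i = 0; i < 8; i++)
        m |= (uint8_t)(((v >> (8 * i + 7)) & 1u) << i);
    return m;
}

/* Scalar model of the movd / punpcklbw / pshufw / pand / pcmpeqb sequence. */
static uint64_t expand_mask(uint8_t m)
{
    /* Byte i of this constant holds 1 << i (the 8040201008040201h mask). */
    const uint64_t bit_select = 0x8040201008040201ULL;

    /* movd + punpcklbw + pshufw: copy the mask byte into all 8 bytes. */
    uint64_t broadcast = (uint64_t)m * 0x0101010101010101ULL;

    /* pand: keep only bit i in byte i. */
    uint64_t isolated = broadcast & bit_select;

    /* pcmpeqb: byte i becomes 0xFF when it equals the selector byte. */
    uint64_t result = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t a = (uint8_t)(isolated >> (8 * i));
        uint8_t b = (uint8_t)(bit_select >> (8 * i));
        if (a == b)
            result |= 0xFFULL << (8 * i);
    }
    return result;
}
```

With the value from the original question, `collapse_mask(0xFF00FF00FF00FF00)` gives 10101010b (0xAA), and `expand_mask(0xAA)` gives back 0xFF00FF00FF00FF00.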
Thanks for this. I had hoped for a single instruction to do it, but this combination is fine.
Thanks again, Jim
You're welcome. Note that the LUT method really is a single-instruction solution. You can actually use the second method to fill in the table.
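As a rough illustration of the table approach, the C sketch below fills a 256-entry table once and then answers each query with a single indexed load, which is what "movq mm0, [LUT+eax*8]" does in the MMX version. The names `lut`, `mask_to_bytes`, `init_lut` and `inverse_pmovmskb` are hypothetical, and the fill routine here is a plain scalar loop standing in for the five-instruction sequence.

```c
#include <stdint.h>

/* 256 entries of 8 bytes each = 2 kB, one entry per possible mask value. */
static uint64_t lut[256];

/* Expand one 8-bit mask: bit i set means byte i becomes 0xFF, else 0x00. */
static uint64_t mask_to_bytes(uint8_t m)
{
    uint64_t result = 0;
    for (int i = 0; i < 8; i++) {
        if (m & (1u << i))
            result |= 0xFFULL << (8 * i);
    }
    return result;
}

/* Fill the table once at start-up. */
static void init_lut(void)
{
    for (unsigned m = 0; m < 256; m++)
        lut[m] = mask_to_bytes((uint8_t)m);
}

/* The per-call work is a single indexed load. */
static uint64_t inverse_pmovmskb(uint8_t m)
{
    return lut[m];
}
```

After calling `init_lut()` once, `inverse_pmovmskb(0xAA)` returns 0xFF00FF00FF00FF00.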
In my opinion the code should restore just the sign bits to the MM register, unless of course you want to use the 0xFF and 0x00 bytes as masks for AND or OR instructions later. If that is the case, though, I guess it would be much easier to shift right arithmetically by 7 bits (thus propagating the sign bit) instead of using PMOVMSKB in the first place. Or is this some sort of "compression"?