I'm going to try some SIMD byte manipulation, but I noticed that byte operations are missing.

I tried to do byte add/sub by treating the bytes as words or doublewords. It works, but I don't think it's a good idea. What should I do if I need this:

new_byte = (byte * 200 - 50) for each of 16 bytes within a simd reg?

I tried to map the bytes to words, but it's a waste of memory. Is there any other way?

thanks,

Tom

11 Replies

new_char = char * 20 / 150 + 40

If char is 255, new_char is 74, so there is no problem with overflow.

thanks

I suggest using a precalculated transformation table with 256 elements, and unrolling the loop to reduce branches.

char table[256] = {40, 40, 40, 40, 40, 40, 40, 40, 41, 41, 41, 41, 41, 41, 41, 42, ...., 74};

If the formula is fixed, a table is preferable to multiplying and dividing.
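A minimal sketch of the table approach, assuming the formula new_char = char * 20 / 150 + 40 from earlier in the thread. The names `table`, `init_table`, and `apply_table` are hypothetical, and the 4-way unroll is only one possible choice.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 256-entry table for c -> c * 20 / 150 + 40. */
uint8_t table[256];

/* Fill the table once; the arithmetic is done in int, so c * 20 cannot overflow. */
void init_table(void)
{
    for (int c = 0; c < 256; ++c)
        table[c] = (uint8_t)(c * 20 / 150 + 40);
}

/* Apply the table to n bytes, four per iteration to reduce loop branches. */
void apply_table(const uint8_t *src, uint8_t *dst, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i]     = table[src[i]];
        dst[i + 1] = table[src[i + 1]];
        dst[i + 2] = table[src[i + 2]];
        dst[i + 3] = table[src[i + 3]];
    }
    for (; i < n; ++i)  /* leftover bytes */
        dst[i] = table[src[i]];
}
```

Call `init_table()` once at startup, then `apply_table(src, dst, n)` for each buffer.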

A lookup table is a good idea, but I need to write SIMD code for other reasons too.

W O R D

00000011 00000001

B Y T E B Y T E

If I compute word_A + word_B, it's the same as computing byte_A0 + byte_B0 and byte_A1 + byte_B1 (at least as long as no byte sum exceeds 255, so nothing carries into the neighboring byte).

But * and / by shifting sound a bit harder, because there are no byte shift instructions. If I shift that word right:

00000011 00000001 --> shift right 2 bits--> 00000000 11000000

So the left byte is OK, but the right one is not.
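For what it's worth, SSE2 does provide per-byte add and subtract (`_mm_add_epi8`, `_mm_sub_epi8`, plus saturating variants such as `_mm_adds_epu8`); it is byte multiplies and byte shifts that have no direct instruction. The sketch below contrasts a per-byte add with a per-word add on the same data to show where the carry goes. The function names are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Add 16 bytes lane by lane; each byte wraps around on its own. */
void add_per_byte(const uint8_t a[16], const uint8_t b[16], uint8_t out[16])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi8(va, vb));
}

/* Add the same data as 8 words; a carry out of the low byte
   spills into the high byte of the same word. */
void add_per_word(const uint8_t a[16], const uint8_t b[16], uint8_t out[16])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi16(va, vb));
}
```

With a[0] = 0xFF and b[0] = 0x01 (all other bytes zero), the per-byte add leaves byte 1 at zero, while the per-word add carries a 1 into it.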

rshift2mask = 00111111 00111111

00000011 00000001 --> shift right 2 bits--> 00000000 11000000

After applying the mask with a bitwise AND:

00000000 00000000

You just need 8 masks for rshift and 8 for lshift.
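The shift-then-mask idea can be sketched with SSE2 roughly as follows. The helper name `shr_bytes` is hypothetical, and the mask is computed from the shift count instead of being read from a table of 8 constants.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Hypothetical helper: logical right shift of each of 16 bytes by n (0..7). */
void shr_bytes(const uint8_t in[16], uint8_t out[16], int n)
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);

    /* Shift the eight 16-bit lanes; the low n bits of each high byte
       leak into the top of the neighboring low byte. */
    __m128i shifted = _mm_srl_epi16(v, _mm_cvtsi32_si128(n));

    /* Clear the top n bits of every byte, e.g. 00111111 for n = 2. */
    __m128i mask = _mm_set1_epi8((char)(0xFF >> n));

    _mm_storeu_si128((__m128i *)out, _mm_and_si128(shifted, mask));
}
```

A left shift works the same way with `_mm_sll_epi16` and a mask that clears the low n bits of every byte.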

Using unsigned char arithmetic, the result of (char*20) cannot exceed 255.

The result of (x<256) / 150 in unsigned char arithmetic can only be 0 or 1.

Therefore your end result can only be a list of bytes containing 40 or 41.

While I won't write the code for you, the gist would be

multiply the 16 bytes by 20

compare result against 16 bytes of 150 producing a mask

negate the mask

add 40 to all bytes in result.

EDIT

However, you state:

>>if char is 255, new_char is 74, so no prob with overflow..

Therefore the original problem statement should have been stated more clearly as:

new_char = (char)((int)char * 20 / 150 + 40)

For this you would modify the above by first converting 8 uchars to 8 16-bit uints

then multiply uints by 20 to produce temps

zero results

loop on

compare temps against 150s to produce a mask

if mask is all zeros, exit

negate mask

add mask to results

subtract 150s from temps

and with mask

end loop

convert 16-bit results back to 8 chars (shuffle)

Jim Dempsey
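The steps in the post above could be sketched with SSE2 along these lines. This is one reading of the outline, not a tuned implementation: the name `scale8` is hypothetical, the widening uses `_mm_unpacklo_epi8` with zero, the final +40 from the first outline is applied after the loop, and the narrowing uses `_mm_packus_epi16` where the post suggests a shuffle.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Hypothetical helper: maps 8 bytes c -> c * 20 / 150 + 40,
   dividing by repeated subtraction as outlined in the post. */
void scale8(const uint8_t in[8], uint8_t out[8])
{
    const __m128i zero = _mm_setzero_si128();
    const __m128i v150 = _mm_set1_epi16(150);

    /* Convert 8 uchars to 8 unsigned 16-bit lanes. */
    __m128i bytes = _mm_loadl_epi64((const __m128i *)in);
    __m128i temps = _mm_unpacklo_epi8(bytes, zero);

    /* Multiply by 20; the largest value is 255 * 20 = 5100. */
    temps = _mm_mullo_epi16(temps, _mm_set1_epi16(20));
    __m128i results = zero;

    for (;;) {
        /* Lanes with temps >= 150 become 0xFFFF (-1), the rest 0. */
        __m128i mask = _mm_cmpgt_epi16(temps, _mm_set1_epi16(149));
        if (_mm_movemask_epi8(mask) == 0)
            break;
        /* Subtracting -1 adds 1, i.e. "negate mask, add to results". */
        results = _mm_sub_epi16(results, mask);
        /* Subtract 150 only in the lanes that are still >= 150. */
        temps = _mm_sub_epi16(temps, _mm_and_si128(v150, mask));
    }

    results = _mm_add_epi16(results, _mm_set1_epi16(40));

    /* Narrow to bytes; results are 40..74, so saturation never triggers. */
    _mm_storel_epi64((__m128i *)out, _mm_packus_epi16(results, zero));
}
```

The loop runs at most 34 times (5100 / 150), so for this formula a multiply-based division or the lookup table may well be faster.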

By "mul bytes", do you mean using the word multiply instruction? If I move 16 bytes into a register, I need to convert all 16 bytes to integers by shuffling data. Or do you mean converting before moving to the register?

Thanks for your replies, guys. I will try those solutions soon.

__m128i _mm_cvtepu8_epi16 (__m128i a);

This converts 8 uchars into 8 shorts (it requires SSE4.1).

If you have an earlier version of SSE (SSSE3), use

__m128i _mm_shuffle_epi8 (__m128i a, __m128i b);

Then shuffle can be used afterwards to convert back from 16-bit to 8-bit.

Properly constructed, you could load 16 bytes into SSE register then using shuffle, convert 8 of those to 16 bits, mung those 8, producing 8 results in SSE register, then convert the other 8 bytes to 16-bits, and mung those. IOW one 16-byte load, one 16-byte store (two passes to produce results).

Jim Dempsey
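A sketch of the "one 16-byte load, two passes, one 16-byte store" layout described above, under a few assumptions worth stating: it uses plain SSE2 unpack and pack with zero in place of `_mm_cvtepu8_epi16` and `_mm_shuffle_epi8`, so it needs no SSE4.1 or SSSE3 support; the names `scale_lanes` and `scale16` are hypothetical; and the division by 150 is done with a multiply-high by 437, a substitute technique that is exact for the multiples of 20 up to 5100 that occur here but not a general-purpose divide.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Hypothetical per-pass step on eight 16-bit lanes: x * 20 / 150 + 40. */
static __m128i scale_lanes(__m128i x)
{
    __m128i t = _mm_mullo_epi16(x, _mm_set1_epi16(20));   /* <= 5100 */
    /* (t * 437) >> 16 equals t / 150 for every t = 20 * c, c in 0..255. */
    __m128i q = _mm_mulhi_epu16(t, _mm_set1_epi16(437));
    return _mm_add_epi16(q, _mm_set1_epi16(40));
}

/* One 16-byte load, two 8-lane passes, one 16-byte store. */
void scale16(const uint8_t in[16], uint8_t out[16])
{
    const __m128i zero = _mm_setzero_si128();
    __m128i v = _mm_loadu_si128((const __m128i *)in);

    /* Zero-extend the low and high 8 bytes to 16-bit lanes. */
    __m128i lo = _mm_unpacklo_epi8(v, zero);
    __m128i hi = _mm_unpackhi_epi8(v, zero);

    lo = scale_lanes(lo);
    hi = scale_lanes(hi);

    /* Pack both halves back to 16 bytes in their original order. */
    _mm_storeu_si128((__m128i *)out, _mm_packus_epi16(lo, hi));
}
```

Because each pass works on a full register of 16-bit lanes, no extra memory is spent on a widened copy of the input.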

You're right, I managed it that way.

Thanks
