
Of what?

Your description sounds like you should be looking at CMP... and UCMP... (or VCMP... and VUCMP...)

Jim Dempsey

Signed or unsigned?

How do you handle saturation? ((src2 + (src2/10)) above max, (src2-(src2/10)) below min)

Jim Dempsey

src2/10 is problematic because there is no SSE/AVX instruction to divide integers, much less to divide unsigned chars. This is why I suggest you examine your requirements to see if you can use divide by 8 or divide by 16. These results can be produced using a shift and a mask (x/8 = (xmm >> 3) & maskOf31s), (x/16 = (xmm >> 4) & maskOf15s). IOW using two SSE instructions.

pseudo code

xmm2 = [src2]

xmm1 = [src1]

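xmm0 = xmm2 >> shift; (shift right by 3 or 4 bit positions, e.g. per 16-bit lane with PSRLW; SSE has no bit-granular shift of the whole 128-bit register, and the mask on the next line clears the bits that leak in from the neighboring byte)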

xmm0 = xmm0 & mask; (31's or 15's)

xmm3 = xmm2 + xmm0; (src2 + fraction of src2)

xmm4 = xmm2 - xmm0; (src2 - fraction of src2)

xmm3 = xmm1 - xmm3; (src1 - (src2 + fraction of src2))

xmm3 = .not. xmm3

xmm4 = xmm1 - xmm4; (src1 - (src2 - fraction of src2))

xmm4 = xmm3 | xmm4; (bitwise OR)

at this point any byte with the msb set is outside your range
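
For readers who want to try this with intrinsics, here is a minimal SSE2 sketch of the same range test. It is not a literal translation of the pseudo code above: it assumes unsigned bytes and a fraction of 1/8, and it swaps the subtract-and-test-msb step for saturating add/subtract plus min/max compares, so the bounds clamp at 0 and 255. The names div8_epu8, outside_range_epu8 and outside_mask16 are made up for illustration.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>

// Divide each of the 16 unsigned bytes by 8. The shift works on 16-bit lanes,
// so the AND with 31 clears the three bits that leak in from the upper byte.
static inline __m128i div8_epu8(__m128i x)
{
    const __m128i maskOf31s = _mm_set1_epi8(0x1F);
    return _mm_and_si128(_mm_srli_epi16(x, 3), maskOf31s);
}

// Returns 0xFF in every byte where src1 is outside
// [src2 - src2/8, src2 + src2/8], and 0x00 where it is inside.
static inline __m128i outside_range_epu8(__m128i src1, __m128i src2)
{
    const __m128i frac = div8_epu8(src2);
    const __m128i hi = _mm_adds_epu8(src2, frac);  // saturates at 255
    const __m128i lo = _mm_subs_epu8(src2, frac);  // saturates at 0
    // SSE2 has no unsigned byte compare, so use: a <= b  <=>  max(a, b) == b
    const __m128i le_hi = _mm_cmpeq_epi8(_mm_max_epu8(src1, hi), hi);
    const __m128i ge_lo = _mm_cmpeq_epi8(_mm_min_epu8(src1, lo), lo);
    const __m128i inside = _mm_and_si128(le_hi, ge_lo);
    return _mm_xor_si128(inside, _mm_set1_epi8((char)0xFF));
}

// Convenience wrapper: bit i of the result is set when byte i is out of range.
int outside_mask16(const unsigned char* src1, const unsigned char* src2)
{
    assert(src1 && src2);
    const __m128i a = _mm_loadu_si128((const __m128i*)src1);
    const __m128i b = _mm_loadu_si128((const __m128i*)src2);
    return _mm_movemask_epi8(outside_range_epu8(a, b));
}
```

Using saturating arithmetic here is one possible answer to the saturation question raised earlier in the thread; whether clamping is what you want depends on your requirements.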

Jim Dempsey


Why are you trying to convert to float?

It may be beneficial to the readers if you state what you are trying to accomplish (as opposed to how you think you can do it).

RE: fscanf

Can you read the whole array in one shot?

Then use your own conversion function to convert space delimited text integers to "char" and store via p++.

if(readBigArray(big_f, bigBuff)) exit(-1);

if(convert(bigBuff, big_pic, 4096*4096)) exit(-2);

...

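#include <ctype.h>

// Scan nc whitespace-delimited decimal integers from b and store each one
// as a single char in c. Returns the number of values NOT converted
// (0 on success), so the caller's if(convert(...)) test means "failed".
int convert(char* b, char* c, int nc)
{
    while(nc > 0)
    {
        int i = 0;
        // skip leading whitespace
        while(isspace((unsigned char)*b))
            ++b;
        // end of text or unexpected character
        if(!isdigit((unsigned char)*b))
            break;
        while(isdigit((unsigned char)*b))
            i = i*10 + *b++ - '0';
        *c++ = (char)i;
        --nc;
    }
    return nc;
}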

You can add additional error tests if you like.
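
As one example of such an additional test, a variant could reject values that do not fit in an unsigned char and return a distinct code when the text runs out early. The sketch below assumes unsigned 8-bit pixel values; convertChecked and its return codes are hypothetical and not part of the code above.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>

// Hypothetical variant of convert().
// Returns 0 on success, 1 if the text ran out (or hit a non-digit)
// before nc values were read, 2 if a value exceeds 255.
int convertChecked(const char* b, unsigned char* c, std::size_t nc)
{
    assert(b && c);
    for (std::size_t k = 0; k < nc; ++k)
    {
        // skip whitespace between values
        while (std::isspace((unsigned char)*b))
            ++b;
        if (!std::isdigit((unsigned char)*b))
            return 1;
        int i = 0;
        while (std::isdigit((unsigned char)*b))
        {
            i = i * 10 + (*b++ - '0');
            if (i > 255)
                return 2;  // checked per digit, so i cannot overflow
        }
        c[k] = (unsigned char)i;
    }
    return 0;
}
```

The caller can then tell "file too short" apart from "bad pixel value" instead of getting a single pass/fail result.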

Jim Dempsey

As stated earlier you can perform these divisions using the shift and mask (2 instructions) for all 16-bytes.

As opposed to 16 x (maybe 3 instructions).

Jim Dempsey

Even then, you will not have parallel division for bytes.

Basically, as Jim has already suggested, if you are looking for help it is better to state what you want to accomplish instead of trying to guess how when you are obviously not aware of architectural and instruction set limitations.

Or at least, the following produces what you have told us you want.

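#include <emmintrin.h>

struct div10_s
{
    // us[(hi << 8) | lo] holds hi/10 in its high byte and lo/10 in its
    // low byte, so one table lookup divides two packed bytes at once.
    unsigned short us[256*256];

    div10_s()
    {
        for(int Left=0; Left < 256; ++Left)
            for(int Right=0; Right < 256; ++Right)
                us[Left*256 + Right] = (unsigned short)(((Left / 10) << 8) | (Right / 10));
    } // div10_s()

    __m128i _mm_div10_epu8(__m128i a)
    {
        __declspec(align(16))
        unsigned short asShorts[8];
        // spill the register, look up each pair of bytes, reload
        _mm_store_si128((__m128i*)asShorts, a);
        for(int i = 0; i < 8; ++i)
            asShorts[i] = us[asShorts[i]];
        return *((__m128i*)asShorts);
    }
};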

You still should consider /8 or /16 as this can be performed entirely within the SSE instruction set.
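
If an exact /10 really is required, one alternative not discussed above is to replace the divide with a multiply and a shift. There is still no divide instruction involved: for x in 0..255, (x * 205) >> 11 equals x / 10, and the largest product (255 * 205 = 52275) fits in an unsigned 16-bit lane. A minimal sketch, assuming SSE2 and unsigned bytes; div10_epu8 and div10_bytes are hypothetical names:

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>

// Divide each of the 16 unsigned bytes by 10 using multiply and shift.
static inline __m128i div10_epu8(__m128i x)
{
    const __m128i zero = _mm_setzero_si128();
    const __m128i k205 = _mm_set1_epi16(205);
    // widen the bytes to 16-bit lanes so the product does not overflow
    __m128i lo = _mm_unpacklo_epi8(x, zero);
    __m128i hi = _mm_unpackhi_epi8(x, zero);
    // (x * 205) >> 11 per lane
    lo = _mm_srli_epi16(_mm_mullo_epi16(lo, k205), 11);
    hi = _mm_srli_epi16(_mm_mullo_epi16(hi, k205), 11);
    // narrow back to bytes; results are at most 25, so no saturation occurs
    return _mm_packus_epi16(lo, hi);
}

// Convenience wrapper for 16 bytes in memory (no alignment required).
void div10_bytes(const unsigned char* in, unsigned char* out)
{
    assert(in && out);
    const __m128i x = _mm_loadu_si128((const __m128i*)in);
    _mm_storeu_si128((__m128i*)out, div10_epu8(x));
}
```

This costs more instructions than the shift-and-mask for /8 or /16, but it stays in registers and avoids the table lookups.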

Jim Dempsey

