Of what?

Your description sounds like you should be looking at CMP... and UCMP... (or VCMP... and VUCMP...)

Jim Dempsey

Signed or unsigned?

How do you handle saturation? ((src2 + (src2/10)) above max, (src2-(src2/10)) below min)

Jim Dempsey

src2/10 is problematic because there is no SSE/AVX instruction to divide integers, much less unsigned chars. This is why I suggest you examine your requirements to see if you can use divide by 8 or divide by 16. These results can be produced using a shift and a mask: x/8 = (xmm >> 3) & maskOf31s, x/16 = (xmm >> 4) & maskOf15s. IOW, using two SSE instructions.
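
To make the shift-and-mask idea concrete, here is a minimal sketch using SSE2 intrinsics. The function names `div8_epu8` and `div16_epu8` are made up for this example and are not part of any library. It assumes unsigned bytes. Since SSE2 has no per-byte shift, the sketch shifts 16-bit lanes and relies on the mask to clear the bits that cross over from the neighboring byte.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Divide each of the 16 unsigned bytes in v by 8.
// The 16-bit lane shift drags the low 3 bits of each high byte into the
// top of the low byte; the mask of 31s clears them again.
static inline __m128i div8_epu8(__m128i v)
{
    return _mm_and_si128(_mm_srli_epi16(v, 3), _mm_set1_epi8(31));
}

// Same idea for division by 16: shift by 4, mask with 15s.
static inline __m128i div16_epu8(__m128i v)
{
    return _mm_and_si128(_mm_srli_epi16(v, 4), _mm_set1_epi8(15));
}
```

Both functions compile to two instructions plus a constant load, which matches the "two SSE instructions" count above.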

pseudo code

    xmm2 = [src2]
    xmm1 = [src1]
    xmm0 = xmm2 >> shift; (shift of 3 or 4 bit positions; the mask below clears bits pulled in from the neighboring byte)
    xmm0 = xmm0 & mask; (31's or 15's)
    xmm3 = xmm2 + xmm0; (src2 + fraction of src2)
    xmm4 = xmm2 - xmm0; (src2 - fraction of src2)
    xmm3 = xmm1 - xmm3; (src1 - (src2 + fraction of src2))
    xmm3 = .not. xmm3
    xmm4 = xmm1 - xmm4; (src1 - (src2 - fraction of src2))
    xmm4 = xmm3 .or. xmm4; (bitwise or)

At this point any byte with the msb set is outside your range.

Jim Dempsey
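
For readers who want something compilable, the range test can be sketched with SSE2 intrinsics as below. This is not a line-by-line translation of the pseudo code: it swaps in saturating add/subtract and an unsigned min/max compare, because reading the msb of a byte difference as a sign only holds while the differences fit in 7 bits. The function name `outside_range_epu8`, the choice of src2/8 as the tolerance, and the inclusive bounds are all assumptions made for the example.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Returns 0xFF in every byte lane where src1 lies outside
// [src2 - src2/8, src2 + src2/8], and 0x00 where it lies inside.
// Assumes unsigned bytes and inclusive bounds.
static inline __m128i outside_range_epu8(__m128i src1, __m128i src2)
{
    // frac = src2 / 8 per byte: 16-bit lane shift, then mask with 31s.
    __m128i frac = _mm_and_si128(_mm_srli_epi16(src2, 3), _mm_set1_epi8(31));

    // Saturating arithmetic clamps the bounds at 255 and 0 instead of wrapping.
    __m128i hi = _mm_adds_epu8(src2, frac);
    __m128i lo = _mm_subs_epu8(src2, frac);

    // Unsigned "src1 <= hi" is max(src1, hi) == hi;
    // unsigned "src1 >= lo" is min(src1, lo) == lo.
    __m128i le_hi = _mm_cmpeq_epi8(_mm_max_epu8(src1, hi), hi);
    __m128i ge_lo = _mm_cmpeq_epi8(_mm_min_epu8(src1, lo), lo);

    // inside = le_hi & ge_lo; outside = ~inside (XOR with all ones).
    __m128i inside = _mm_and_si128(le_hi, ge_lo);
    return _mm_xor_si128(inside, _mm_set1_epi8(-1));
}
```

Passing the result to `_mm_movemask_epi8` yields a 16-bit integer with one bit per out-of-range byte, which is convenient for a quick "any outside?" test.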

Why are you trying to convert to float?

It may be beneficial to the readers if you state what you are trying to accomplish (as opposed to how you think you can do it).

RE: fscanf

Can you read the whole array in one shot?

Then use your own conversion function to convert space delimited text integers to "char" and store via p++.

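    if(readBigArray(big_f, bigBuff)) exit(-1);
    if(convert(bigBuff, big_pic, 4096*4096)) exit(-2);
    ...
    // Needs <ctype.h> for isspace/isdigit.
    // Returns 0 on success, else the number of values left unconverted.
    int convert(char* b, char* c, int nc)
    {
        // scan and convert
        for(; nc > 0; --nc)
        {
            int i = 0;
            while(isspace((unsigned char)*b))
                ++b;
            if(!isdigit((unsigned char)*b))
                break; // not a number, or end of buffer
            while(isdigit((unsigned char)*b))
                i = i*10 + *b++ - '0';
            *c++ = (char)i;
        }
        return nc;
    }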

You can add additional error tests if you like.

Jim Dempsey

As stated earlier, you can perform these divisions with a shift and a mask (2 instructions) across all 16 bytes at once, as opposed to 16 x (maybe 3 instructions) when done one byte at a time.

Jim Dempsey

Even then, you will not have parallel division for bytes.

Basically, as Jim has already suggested, if you are looking for help it is better to state what you want to accomplish than to guess at how, when you are obviously not aware of the architectural and instruction set limitations.

Or, at the least, this produces what you have told us you want:

    struct div10_s
    {
        // Lookup table indexed by a pair of bytes packed into 16 bits:
        // us[(hi << 8) | lo] holds ((hi / 10) << 8) | (lo / 10).
        unsigned short us[256*256];

        div10_s()
        {
            for(int Left = 0; Left < 256; ++Left)
                for(int Right = 0; Right < 256; ++Right)
                    us[(Left << 8) | Right] = (unsigned short)(((Left / 10) << 8) | (Right / 10));
        } // div10_s()

        // Divides all 16 unsigned bytes of a by 10, two bytes per lookup.
        __m128i _mm_div10_epu8(__m128i a)
        {
            __declspec(align(16))
            unsigned short asShorts[8];
            _mm_store_si128((__m128i*)asShorts, a);
            for(int i = 0; i < 8; ++i)
                asShorts[i] = us[asShorts[i]];
            return _mm_load_si128((__m128i*)asShorts);
        }
    };

You still should consider /8 or /16 as this can be performed entirely within the SSE instruction set.

Jim Dempsey
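
If an exact /10 is required and the table lookup is too costly, another option that stays inside SSE2 is a multiply followed by a shift. This technique is not one proposed in the thread; it relies on the identity floor(x / 10) == (x * 205) >> 11, which holds for every x from 0 to 255. The function name `div10_mulshift_epu8` is made up for this sketch.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Divide each of the 16 unsigned bytes in v by 10 without a divide
// instruction, using floor(x / 10) == (x * 205) >> 11 for 0 <= x <= 255.
static inline __m128i div10_mulshift_epu8(__m128i v)
{
    const __m128i zero = _mm_setzero_si128();
    const __m128i k205 = _mm_set1_epi16(205);

    // Widen bytes to 16 bits so that x * 205 (at most 52275) fits in a lane.
    __m128i lo = _mm_unpacklo_epi8(v, zero);
    __m128i hi = _mm_unpackhi_epi8(v, zero);

    // Multiply, then logical shift right by 11 to get the quotient.
    lo = _mm_srli_epi16(_mm_mullo_epi16(lo, k205), 11);
    hi = _mm_srli_epi16(_mm_mullo_epi16(hi, k205), 11);

    // Quotients are at most 25, so the saturating pack loses nothing.
    return _mm_packus_epi16(lo, hi);
}
```

This costs more instructions than the /8 or /16 shift-and-mask, so the earlier advice still applies when an approximate tolerance is acceptable.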
