My master plan is all thwarted...
The data I need to sort fits quite nicely into one 64-bit and two 32-bit values. They're all positive, and their encoding is such that they can be sorted as if they were one big 128-bit integer. It was all planned from the start so that I could load 'em straight into SSE registers, do some easy comparison work, and get the sorting done in the blink of an eye.
Unfortunately, without the ability to compare unsigned integer values, I end up with this code as my best guess:
[cpp]
#include <emmintrin.h> /* SSE2 */

bool isLess(const sDisplayItem &A, const sDisplayItem &B)
{
    /* TODO: there should be a faster way... */
    int ab, ba;
    __m128i mX = _mm_set1_epi32( 0x80000000 );
    __m128i mA = _mm_sub_epi32( _mm_load_si128( (const __m128i *)&A), mX); /* make values signed */
    __m128i mB = _mm_sub_epi32( _mm_load_si128( (const __m128i *)&B), mX); /* make values signed */
    __m128i AB = _mm_cmplt_epi32(mA, mB); /* lanes where A < B */
    __m128i BA = _mm_cmpgt_epi32(mA, mB); /* lanes where A > B */
    ab = _mm_movemask_ps(_mm_castsi128_ps(AB)); /* one bit per 32-bit lane */
    ba = _mm_movemask_ps(_mm_castsi128_ps(BA));
    return ab > ba; /* the highest differing lane decides */
}
[/cpp]
Has anybody run into a similar situation before? Any tricks to share? Thanks.
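For readers wondering why `ab > ba` gives the right answer: the two masks never share a set bit, so whichever mask is numerically larger owns the highest differing lane, and that lane decides the 128-bit order. Here is a minimal scalar sketch of that reasoning. The function name `isLessScalarModel` and the array layout (index 3 as the most significant lane) are assumptions made for illustration, not something taken from the post.

```cpp
#include <cstdint>

// Hypothetical scalar model of the lane-mask trick.
// Assumed layout: a[3] is the most significant 32-bit lane, a[0] the least.
inline bool isLessScalarModel(const std::uint32_t a[4], const std::uint32_t b[4])
{
    unsigned lt = 0, gt = 0;
    for (int lane = 0; lane < 4; ++lane) {
        if (a[lane] < b[lane]) lt |= 1u << lane; // mirrors movemask of (A < B)
        if (a[lane] > b[lane]) gt |= 1u << lane; // mirrors movemask of (A > B)
    }
    // lt and gt are disjoint, so the larger one holds the highest set bit,
    // i.e. the most significant lane in which the inputs differ.
    return lt > gt;
}
```

Equal inputs give `lt == gt == 0`, so the function returns false, which is what a strict "less than" needs.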
1 Reply
A simpler version was found, using the SSE4.1 unsigned minimum (_mm_min_epu32).
But is this the best way?
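[cpp]
#include <smmintrin.h> /* SSE4.1 */

bool isLess(const sDisplayItem &A, const sDisplayItem &B)
{
    int ab, ba;
    __m128i mA = _mm_load_si128( (const __m128i *)&A);
    __m128i mB = _mm_load_si128( (const __m128i *)&B);
    __m128i mC = _mm_min_epu32(mA, mB);   /* per-lane unsigned minimum */
    __m128i AB = _mm_cmpeq_epi32(mA, mC); /* lanes where A <= B */
    __m128i BA = _mm_cmpeq_epi32(mB, mC); /* lanes where B <= A */
    ab = _mm_movemask_ps(_mm_castsi128_ps(AB));
    ba = _mm_movemask_ps(_mm_castsi128_ps(BA));
    return ab > ba; /* equal lanes set the same bit in both masks and cancel out */
}
[/cpp]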
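For anyone who wants to try the comparison without the original sDisplayItem definition, here is a self-contained SSE2-only sketch. The `Item` struct and its field names are hypothetical stand-ins, and the layout is an assumption: little-endian x86, least significant field first, 64-bit field in the upper half. It flips the sign bit with XOR rather than subtracting, which has the same effect on each 32-bit lane.

```cpp
#include <cstdint>
#include <emmintrin.h> // SSE2

// Hypothetical stand-in for sDisplayItem (the real layout is not shown here).
// Assumed: little-endian, most significant data in the highest lanes.
struct alignas(16) Item {
    std::uint32_t low;   // lane 0, least significant
    std::uint32_t mid;   // lane 1
    std::uint64_t high;  // lanes 2 and 3, most significant
};

inline bool isLessSse2(const Item &a, const Item &b)
{
    // Flipping the top bit maps unsigned order onto signed order per lane.
    const __m128i bias = _mm_set1_epi32(static_cast<int>(0x80000000u));
    __m128i va = _mm_xor_si128(
        _mm_load_si128(reinterpret_cast<const __m128i *>(&a)), bias);
    __m128i vb = _mm_xor_si128(
        _mm_load_si128(reinterpret_cast<const __m128i *>(&b)), bias);
    // One bit per 32-bit lane: bit 3 is the most significant lane.
    int lt = _mm_movemask_ps(_mm_castsi128_ps(_mm_cmplt_epi32(va, vb)));
    int gt = _mm_movemask_ps(_mm_castsi128_ps(_mm_cmpgt_epi32(va, vb)));
    return lt > gt; // the highest differing lane decides
}
```

Under those assumptions the function can be passed straight to `std::sort` as the comparator. Whether it beats the SSE4.1 version would have to be measured on the target hardware.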