Sorting an array of 128-bit keys

Jerome_M_Intel · 07-19-2010

My master plan has been thwarted... The data I need to sort fits neatly into one 64-bit value and two 32-bit values. They're all positive, and they're encoded so that they can be sorted as if they were one big 128-bit integer. I planned it that way from the start so that I could load them straight into SSE registers, do some simple comparisons, and get the sorting done in the blink of an eye. Unfortunately, SSE has no unsigned 32-bit integer compare (_mm_cmplt_epi32 and _mm_cmpgt_epi32 are signed), so my best guess so far biases every lane by 0x80000000 and then uses the signed compares:
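To make "sorted as if they were one big 128-bit integer" concrete, here is a minimal sketch of one possible layout plus a plain scalar comparator that can serve as a reference when testing the SSE versions. The field names lo/mid/hi and the helper isLessScalar are hypothetical; the post doesn't show the real sDisplayItem definition. On little-endian x86 the most significant part has to sit at the highest address for the SSE lane tricks to work, which is why the 64-bit field is placed last in this sketch.

```cpp
#include <cstdint>

// Hypothetical layout: field names and order are illustrative, not from the post.
// On a little-endian target the most significant part of the 128-bit key must be
// at the highest address, so the 64-bit field goes last here.
struct alignas(16) sDisplayItem
{
    std::uint32_t lo;   // least significant 32 bits
    std::uint32_t mid;  // next 32 bits
    std::uint64_t hi;   // most significant 64 bits
};

static_assert(sizeof(sDisplayItem) == 16, "key must fill exactly one SSE register");

// Hypothetical scalar reference: compares the fields as one unsigned 128-bit
// integer, most significant part first.
inline bool isLessScalar(const sDisplayItem &A, const sDisplayItem &B)
{
    if (A.hi != B.hi) return A.hi < B.hi;
    if (A.mid != B.mid) return A.mid < B.mid;
    return A.lo < B.lo;
}
```

A comparator like this can be passed straight to std::sort as well; it only does a few scalar compares, so it's worth benchmarking against the SSE versions.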

[cpp]
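    #include <emmintrin.h> // SSE2 intrinsics

    // Assumes sDisplayItem is 16 bytes and 16-byte aligned (required by _mm_load_si128),
    // with the most significant 32-bit word at the highest address (lane 3).
    bool isLess(const sDisplayItem &A, const sDisplayItem &B)
    {    /* TODO: there should be a faster way... */
        int ab,ba;

        /* flip the sign bit of every lane so that signed compares give unsigned order */
        __m128i mX = _mm_set1_epi32( (int)0x80000000 );
        __m128i mA = _mm_sub_epi32( _mm_load_si128( (const __m128i *)&A), mX);
        __m128i mB = _mm_sub_epi32( _mm_load_si128( (const __m128i *)&B), mX);

        __m128i AB = _mm_cmplt_epi32(mA, mB);    /* lanes where A < B */
        __m128i BA = _mm_cmpgt_epi32(mA, mB);    /* lanes where A > B */

        /* bit i of each mask is the sign bit of lane i */
        ab = _mm_movemask_ps(_mm_castsi128_ps(AB));
        ba = _mm_movemask_ps(_mm_castsi128_ps(BA));

        /* the masks share no bits, so the larger one owns the most significant differing lane */
        return ab>ba;
    }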
[/cpp]
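Why does ab > ba work? _mm_movemask_ps puts the sign bit of lane i into bit i, and lane 3 holds the most significant word. A lane can't be both less and greater, so the two masks never share a bit, and comparing them as plain integers is decided by the highest lane where A and B differ. The portable sketch below reproduces that mask logic lane by lane without SSE so it can be checked on any compiler; the helper name isLessViaMasks and the convention of passing keys as four 32-bit words (word 0 least significant) are assumptions made for illustration.

```cpp
#include <cstdint>

// Hypothetical portable emulation of the SSE2 version's mask logic.
// Keys are four 32-bit words, word 0 = least significant (same as lane 0).
inline bool isLessViaMasks(const std::uint32_t a[4], const std::uint32_t b[4])
{
    int ab = 0, ba = 0;
    for (int i = 0; i < 4; ++i)
    {
        ab |= (a[i] < b[i]) << i; // bit i set where lane i of A < lane i of B
        ba |= (a[i] > b[i]) << i; // bit i set where lane i of A > lane i of B
    }
    // ab and ba never share a bit, so the larger mask is the one that owns
    // the highest differing lane, i.e. the most significant differing word.
    return ab > ba;
}
```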

Has anybody run into a similar situation before? Any tricks to share? Thanks

Jerome_M_Intel · 07-19-2010

I found a simpler version that uses SSE4.1's _mm_min_epu32 (per-lane unsigned 32-bit minimum). But is this the best way?

[cpp]
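	#include <smmintrin.h> // SSE4.1 intrinsics (_mm_min_epu32)

	// Assumes 16-byte aligned items (required by _mm_load_si128),
	// with the most significant 32-bit word at the highest address (lane 3).
	bool isLess(const sDisplayItem &A, const sDisplayItem &B)
	{
		int ab,ba;

		__m128i mA = _mm_load_si128( (const __m128i *)&A);
		__m128i mB = _mm_load_si128( (const __m128i *)&B);
		__m128i mC = _mm_min_epu32(mA, mB);	/* per-lane unsigned minimum */

		__m128i AB = _mm_cmpeq_epi32(mA, mC);	/* lanes where A <= B */
		__m128i BA = _mm_cmpeq_epi32(mB, mC);	/* lanes where B <= A */

		ab = _mm_movemask_ps(_mm_castsi128_ps(AB));
		ba = _mm_movemask_ps(_mm_castsi128_ps(BA));

		/* equal lanes set the same bit in both masks, so the highest differing lane decides */
		return ab>ba;
	}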
[/cpp]
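The SSE4.1 version relies on the same mask comparison, except that its masks are non-strict: AB marks lanes where A <= B and BA marks lanes where B <= A. Equal lanes therefore set the same bit in both masks and drop out of the comparison. The portable sketch below mirrors what _mm_min_epu32 followed by _mm_cmpeq_epi32 computes per lane; the name isLessViaMin and the four-word key convention (word 0 least significant) are hypothetical, chosen only so the logic can be checked without SSE4.1 hardware.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical portable emulation of the SSE4.1 (_mm_min_epu32) version.
// Keys are four 32-bit words, word 0 = least significant (same as lane 0).
inline bool isLessViaMin(const std::uint32_t a[4], const std::uint32_t b[4])
{
    int ab = 0, ba = 0;
    for (int i = 0; i < 4; ++i)
    {
        const std::uint32_t c = std::min(a[i], b[i]); // what _mm_min_epu32 gives per lane
        ab |= (a[i] == c) << i; // bit i set where A <= B in lane i
        ba |= (b[i] == c) << i; // bit i set where B <= A in lane i
    }
    // The masks differ only in lanes where A and B differ, so the highest such
    // lane decides. If every lane is equal, ab == ba and the result is false,
    // as a strict-weak-ordering comparator for std::sort requires.
    return ab > ba;
}
```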