Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

Interleaved to Planar (16 bit)

reportbase
Beginner
454 Views

Is there a way to convert a 16-bit interleaved image (gray and alpha) to its equivalent planar buffers? The following code snippet is the only way that I could come up with. Any ideas on how to speed this up using IPP? I'm aware of "ippiCopy...", but it has no functions that split interleaved 16-bit images into planar data.

unsigned short gray[] = {};

unsigned char gray1[w*h];
unsigned char gray2[w*h];
for (int n = 0; n < w*h; ++n)
{
unsigned char* a = (unsigned char*)&gray[n];
gray1[n] = *a++;
gray2[n] = *a;
}

3 Replies
renegr
New Contributor I
454 Views
No, but it's easy to implement with SSE2 intrinsics. It's just a loop around three pairs of _mm_unpacklo_epi16/_mm_unpackhi_epi16 calls.
reportbase
Beginner
454 Views
Thanks. Sadly, I did a Google search for a code snippet but found nothing. I would be eternally in your debt if you could paste a snippet to get me started on how to do this with SSE. Thank you.
renegr
New Contributor I
455 Views

That should do it:

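[bash]/*
Created: ReneGr
*/
#include <emmintrin.h> // SSE2 intrinsics

// n is the number of 16-bit values in pSrc (two per pixel).
void Copy_16u_C2P2(const unsigned short* pSrc, unsigned short* pDst1, unsigned short* pDst2, int n)
{
  const __m128i* pSrcMM = (const __m128i*)pSrc;
  __m128i* pDstMM1 = (__m128i*)pDst1;
  __m128i* pDstMM2 = (__m128i*)pDst2;
  __m128i t1, t2, t3, t4;
  // Each iteration splits 16 interleaved values (8 pixels) into two planes.
  for( ; n>=16; n-=16, pSrcMM+=2, ++pDstMM1, ++pDstMM2)
  {
    t1 = _mm_unpacklo_epi16( pSrcMM[0], pSrcMM[1]); // g0 g4 a0 a4 g1 g5 a1 a5
    t2 = _mm_unpackhi_epi16( pSrcMM[0], pSrcMM[1]); // g2 g6 a2 a6 g3 g7 a3 a7
    t3 = _mm_unpacklo_epi16( t1, t2);               // g0 g2 g4 g6 a0 a2 a4 a6
    t4 = _mm_unpackhi_epi16( t1, t2);               // g1 g3 g5 g7 a1 a3 a5 a7
    t1 = _mm_unpacklo_epi16( t3, t4);               // g0 .. g7
    t2 = _mm_unpackhi_epi16( t3, t4);               // a0 .. a7
    _mm_store_si128( pDstMM1, t1);
    _mm_store_si128( pDstMM2, t2);
  }

  // Scalar version for the remaining values (fewer than 16)
  pSrc = (const unsigned short*)pSrcMM;
  pDst1 = (unsigned short*)pDstMM1;
  pDst2 = (unsigned short*)pDstMM2;
  for( ; n>=2; n-=2)
  {
    *pDst1++ = *pSrc++;
    *pDst2++ = *pSrc++;
  }
}
[/bash]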

Attention: the pointers must be 16-byte aligned. If they aren't, you could use _mm_loadu_si128 for loading and _mm_storeu_si128 for storing, but this will be much slower.
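
For buffers that can't be guaranteed 16-byte aligned, the unaligned variant mentioned above might look roughly like the sketch below. The function name copy_16u_c2p2_unaligned and the HAVE_SSE2 macro are made up for illustration, n is assumed to be the number of 16-bit values in src (two per pixel), and the scalar loop doubles as a fallback when SSE2 isn't available at compile time.

```c
#include <stddef.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  /* SSE2 intrinsics */
#define HAVE_SSE2 1     /* hypothetical macro, local to this sketch */
#endif

/* Hypothetical name. n = number of 16-bit values in src (two per pixel). */
void copy_16u_c2p2_unaligned(const unsigned short* src,
                             unsigned short* dst1,
                             unsigned short* dst2,
                             size_t n)
{
#ifdef HAVE_SSE2
    /* 16 interleaved values in, 8 values out per plane, no alignment needed. */
    for (; n >= 16; n -= 16, src += 16, dst1 += 8, dst2 += 8) {
        __m128i a  = _mm_loadu_si128((const __m128i*)src);
        __m128i b  = _mm_loadu_si128((const __m128i*)(src + 8));
        __m128i t1 = _mm_unpacklo_epi16(a, b);
        __m128i t2 = _mm_unpackhi_epi16(a, b);
        __m128i t3 = _mm_unpacklo_epi16(t1, t2);
        __m128i t4 = _mm_unpackhi_epi16(t1, t2);
        _mm_storeu_si128((__m128i*)dst1, _mm_unpacklo_epi16(t3, t4));
        _mm_storeu_si128((__m128i*)dst2, _mm_unpackhi_epi16(t3, t4));
    }
#endif
    /* Scalar path: the leftover tail, or the whole input without SSE2. */
    for (; n >= 2; n -= 2) {
        *dst1++ = *src++;
        *dst2++ = *src++;
    }
}
```

For a w*h image you would call it with n = 2*w*h. How much slower the unaligned loads and stores are depends on the CPU, so it's worth timing both versions on your target before picking one.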
