Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

Field copy with Alpha

BatterseaSteve
Beginner
795 Views

Hi 

I have a question

I have a source buffer of interlaced, interleaved YUV with alpha

i.e. UYAVYA - 1920x1080

I need to strip the alpha, extract one field, resize it to half and convert it to 420 planar

as a single buffer (i.e. not 3 separate planes but a single buffer of concatenated Y, U, V).

Can anyone recommend the fastest set of methods to do this? 

Cheers

Steve
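
For reference, the single concatenated buffer asked for here can be laid out as Y, then U, then V. Below is a minimal C sketch of the plane offsets. It assumes 8-bit samples, and it assumes that "half" means half the width and half the height of one 1920x540 field, i.e. 960x270. The struct and function names are made up for illustration and are not part of IPP.

```c
#include <assert.h>
#include <stddef.h>

/* Offsets of the three planes inside one contiguous 4:2:0 planar buffer
   (Y first, then U, then V). Illustrative names, not an IPP type. */
typedef struct {
    size_t y_offset;   /* always 0 */
    size_t u_offset;   /* right after the Y plane */
    size_t v_offset;   /* right after the U plane */
    size_t total_size; /* bytes needed for the whole buffer */
} Yuv420Layout;

static Yuv420Layout yuv420_layout(size_t width, size_t height)
{
    /* 4:2:0 halves chroma in both directions, so both must be even. */
    assert(width % 2 == 0 && height % 2 == 0);

    Yuv420Layout l;
    size_t y_size = width * height;
    size_t c_size = (width / 2) * (height / 2);
    l.y_offset = 0;
    l.u_offset = y_size;
    l.v_offset = y_size + c_size;
    l.total_size = y_size + 2 * c_size;
    return l;
}
```

Under those assumptions, yuv420_layout(960, 270) puts U at byte 259200 and V at byte 324000, and the whole buffer needs 388800 bytes.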

13 Replies
BatterseaSteve
Beginner

The main problem I have with this is actually the alpha strip, i.e. getting the UYAVYA into UYVY.

I tried the following:

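// One "row" per pixel: width 1, height 1920*1080
IppiSize uyvyRoi = {1,1920*1080};

this->uyvyBuffer = ippsMalloc_8u(1920*1080*sizeof(Ipp8u)*2);

// Copy one 16-bit value (2 bytes) per pixel: source step 3 bytes, destination step 2 bytes
ippiCopy_16u_C1R((const Ipp16u *)uyavyaBuffer, 3, (Ipp16u *)this->uyvyBuffer, 2, uyvyRoi);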

but this is incredibly slow (15msec!)

I am struggling to find anything else that will strip a single channel from a 3 channel image, or copy a 3 channel image to a 2 channel image.

Any help appreciated.

Steve

Sergey_K_Intel
Employee

Hi Steve,

For stripping alpha channel values you can probably use the ippiSwapChannels_8u_C4C3R function. It takes a dstOrder[3] array specifying the order of the output channels.

Regards,
Sergey 

Igor_A_Intel
Employee

Hi Steve,

Since you need a planar format in the end, you can try calling ippiCopy_8u_C4C1R 3 times: once each for Y, U and V.

regards, Igor
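
What a per-channel copy does can be sketched in plain C. This only illustrates the access pattern and is not IPP's implementation; extract_channel and extract_field_channel are hypothetical names. The second helper assumes that a field is every other row of the frame, so it starts at row 0 or row 1 and doubles the source stride.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy one channel of an interleaved 8-bit image into a single plane.
   src_step and dst_step are row strides in bytes. */
static void extract_channel(const uint8_t *src, size_t src_step,
                            size_t channels, size_t channel,
                            uint8_t *dst, size_t dst_step,
                            size_t width, size_t height)
{
    assert(channel < channels);
    for (size_t y = 0; y < height; ++y) {
        const uint8_t *s = src + y * src_step + channel;
        uint8_t *d = dst + y * dst_step;
        for (size_t x = 0; x < width; ++x)
            d[x] = s[x * channels];
    }
}

/* Extract one channel from one field of an interlaced frame.
   field is 0 (even rows) or 1 (odd rows). Doubling the source stride
   makes the row loop skip every other row of the frame. */
static void extract_field_channel(const uint8_t *frame, size_t frame_step,
                                  size_t channels, size_t channel, int field,
                                  uint8_t *dst, size_t dst_step,
                                  size_t width, size_t frame_height)
{
    assert(field == 0 || field == 1);
    extract_channel(frame + (size_t)field * frame_step, 2 * frame_step,
                    channels, channel, dst, dst_step,
                    width, frame_height / 2);
}
```

The same pointer-plus-doubled-step idea should carry over to any routine that takes a row step, which would let the field extraction happen in the same pass as the channel copy.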

BatterseaSteve
Beginner

Hi Sergey

Thanks for responding. As far as I can tell, ippiSwapChannels_8u_C4C3R will only convert 4-component data to 3-component data, and I need 3-component to 2-component.

Cheers

Steve

BatterseaSteve
Beginner

OK

I have most of this worked out - except the alpha strip

As I said above, I can use ippiCopy_16u_C1R - but this is taking a long time - 15msecs

Can anyone tell me why?

Cheers

SergeyKostrov
Valued Contributor II

>> As I said above, i can use ippiCopy_16u_C1R - but this is taking a long time - 15msecs
>>
>> Can anyone tell me why?

You didn't tell us anything about the hardware, that is, the CPU. My question is: are you using the right CPU dispatching DLL for the Image Processing domain?

BatterseaSteve
Beginner

Hi Sergey

Thanks for replying 

CPU is E2660, machine is a z820, dual CPU, 32 cores. So no slouch! Running Windows 7 64-bit.

Compiled with VS2010, 64-bit.

As far as I know I am using auto dispatching. 

I will check though. 

SergeyKostrov
Valued Contributor II

Take a look at the Intel® Integrated Performance Primitives for Windows* OS User's Guide:
...
Page 19
Intel® Integrated Performance Primitives Theory of Operation
Dispatching
...
There is a table, "Identification of Codes Associated with Processor-Specific Libraries".

BatterseaSteve
Beginner

Hi Sergey

I run ippInit since I am using static Libs

I check the cpuType from ippGetCpuType and it returns ippCpuAVX which is correct for my CPU I think.

Cheers

Steve

BatterseaSteve
Beginner

Hi

Digging a bit deeper reveals the problem to be due to the step size I am using.

If I run:

IppiSize uyvyRoi = {1,1920*1080};

uyavyaBuffer = ippsMalloc_8u(1920*1080*sizeof(Ipp8u)*3);

uyvyBuffer = ippsMalloc_8u(1920*1080*sizeof(Ipp8u)*2);

ippiCopy_16u_C1R((const Ipp16u *)uyavyaBuffer, 3, (Ipp16u *)uyvyBuffer, 2, uyvyRoi);

I get anything from 7-15msecs.

If I run:

IppiSize uyvyRoi = {1920,1080};

uyavyaBuffer = ippsMalloc_8u(1920*1080*sizeof(Ipp8u)*3);

uyvyBuffer = ippsMalloc_8u(1920*1080*sizeof(Ipp8u)*2);

ippiCopy_16u_C1R((const Ipp16u *)uyavyaBuffer, 1920*2, (Ipp16u *)uyvyBuffer, 1920*2, uyvyRoi);

I get 0.2msecs......

So the point is that trying to strip the alpha by treating the source buffer as a 3x(1920x1080) buffer and the dest as a 2x(1920x1080) buffer is bad. Which puts me back to square one...... Anyone got any other ideas? I can write a C loop that's quicker.

Cheers Steve
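
The two timings are consistent with the ROI shape: {1, 1920*1080} asks for about two million rows of one pixel each, so per-row overhead likely dominates, while {1920, 1080} is only 1080 long rows (but strips nothing). A plain C loop of the kind mentioned above might look like the sketch below. It assumes 8-bit samples packed as U Y A V Y A with no row padding; strip_alpha_uyavya is a hypothetical name, not an IPP function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Drop the third byte of every 3-byte pixel: U Y A V Y A -> U Y V Y.
   pixels is the total pixel count (e.g. 1920 * 1080). */
static void strip_alpha_uyavya(const uint8_t *src, uint8_t *dst, size_t pixels)
{
    assert(src != dst);
    for (size_t i = 0; i < pixels; ++i) {
        dst[0] = src[0];  /* U or V, alternating from pixel to pixel */
        dst[1] = src[1];  /* Y */
        src += 3;         /* skip the alpha byte */
        dst += 2;
    }
}
```

This walks the source once in order, so it has none of the per-row call overhead of a one-pixel-wide ROI.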

Igor_A_Intel
Employee

Hi Steve,

As I've already answered, you should use ippiCopy_8u_C3C1R 2 times (since you need a planar format in the end). This function extracts any channel you need from a C3 image (C4C1 does the same for a C4 image).

regards, Igor

BatterseaSteve
Beginner

Hi Igor

Thanks for replying and sorry - I did see your comment before and thought I had replied.

The problem with ippiCopy_8u_C3C1R is that it extracts the UYA VYA image into Y's and UV's.

I tried this as you suggested, but I need the Y's, the U's and the V's in separate buffers.

So the question remains as to how I extract the UV's into separate U and V buffers?

I have looked at the format conversions to see if there was one that would take NV12 422 and convert to YUV420p but I could not see one.

I've been on this a week now and am now delving into trying to use SSE!

Any thoughts gratefully received.

Cheers

Steve
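
The UV mixing described in this post follows from the layout: the first byte of consecutive pixels alternates between U and V, so extracting channel 0 gives U V U V and so on. One way to avoid the intermediate UV buffer is to write all three planes while walking the source once. The sketch below does that for a single row in plain C; split_uyavya_row is a hypothetical name, and it assumes 8-bit samples and an even row width. The chroma it produces is still 4:2:2 vertically, so the vertical reduction to 4:2:0 is a separate step that is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split one row of U Y A V Y A into separate Y, U and V rows.
   width is the number of pixels in the row and must be even, since
   each U/V sample is shared by a pair of pixels (4:2:2). */
static void split_uyavya_row(const uint8_t *src, size_t width,
                             uint8_t *y, uint8_t *u, uint8_t *v)
{
    assert(width % 2 == 0);
    for (size_t x = 0; x < width; x += 2) {
        const uint8_t *p = src + x * 3;  /* two pixels = 6 bytes */
        u[x / 2] = p[0];
        y[x]     = p[1];
        v[x / 2] = p[3];
        y[x + 1] = p[4];
        /* p[2] and p[5] are alpha and are skipped. */
    }
}
```

Calling this on every other row of the frame would also take care of the field extraction.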

BatterseaSteve
Beginner

Hi

I solved this using a bit of bespoke SSE to split the UV buffer into separate U and V buffers.

Total time seems acceptable.

Cheers

Steve
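
The SSE code itself was not posted in the thread. For readers who land here, below is a sketch of one common SSE2 way to split an interleaved UV buffer. It is not Steve's routine; deinterleave_uv is a hypothetical name, and the code falls back to a scalar loop when SSE2 is not available at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

/* Split interleaved U V U V ... bytes into separate U and V arrays.
   pairs is the number of (U, V) pairs. */
static void deinterleave_uv(const uint8_t *uv, uint8_t *u, uint8_t *v,
                            size_t pairs)
{
    assert(uv != NULL && u != NULL && v != NULL);
    size_t i = 0;
#if HAVE_SSE2
    const __m128i low_byte = _mm_set1_epi16(0x00FF);
    for (; i + 16 <= pairs; i += 16) {
        /* 32 source bytes hold 16 U and 16 V samples. */
        __m128i a = _mm_loadu_si128((const __m128i *)(uv + 2 * i));
        __m128i b = _mm_loadu_si128((const __m128i *)(uv + 2 * i + 16));
        /* Even bytes (U): keep the low byte of each 16-bit lane, then pack. */
        __m128i u16 = _mm_packus_epi16(_mm_and_si128(a, low_byte),
                                       _mm_and_si128(b, low_byte));
        /* Odd bytes (V): shift each 16-bit lane right by 8, then pack. */
        __m128i v16 = _mm_packus_epi16(_mm_srli_epi16(a, 8),
                                       _mm_srli_epi16(b, 8));
        _mm_storeu_si128((__m128i *)(u + i), u16);
        _mm_storeu_si128((__m128i *)(v + i), v16);
    }
#endif
    /* Scalar tail, and the whole job when SSE2 is unavailable. */
    for (; i < pairs; ++i) {
        u[i] = uv[2 * i];
        v[i] = uv[2 * i + 1];
    }
}
```

Unaligned loads and stores are used so the buffers do not have to be 16-byte aligned. With buffers from ippsMalloc_8u, which returns aligned memory, aligned variants may be worth measuring.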
