Jack_Chimasera
Beginner
65 Views

Haswell VPP colour-conversion performance issue

Hello

I've been developing a solution with a full in-GPU pipeline for a customer. It decodes multiple streams, blits the frames into a common RGB32 surface, uses VPP to convert that surface to NV12, and encodes the result into a new stream. The pipeline looks like this:

H.264 streams -> decoders -> NV12 surfaces -> blit -> RGB32 surface -> VPP conversion -> NV12 surface -> encoder

On an Ivy Bridge i3-3225 (HD4000) I got a result of 183fps (1080p, lowest quality, every parameter equal between tests).

On a new Haswell i5 (HD4600), I got a very disappointing result of 139fps, much lower than before.

Investigating further, I've found the culprit to be the VPP colour space conversion. When I removed just this operation from the pipeline, the old HD4000's performance moved up modestly to 201fps, while the HD4600 jumped to about twice its former performance, 263fps.

Is this a known issue? Are there plans to fix it? Is there a way around it?

regards

Jack Chimasera

4 Replies
Anthony_P_Intel
Employee

Hi Jack,

Did you see my response to question here? http://software.intel.com/en-us/forums/topic/476401

Also, some processing options that you may not intend to use might be getting applied, and these would affect performance. (See section 4.11 of the Developers Guide for a discussion of "Hint-based VPP filters".)

Jack_Chimasera
Beginner

Hello Tony

I've seen your response. However, I've currently lost access to the Haswell machine I was testing on, so it'll take another week until I can test it again.

Your comment here, that Haswell adds extra filters I didn't ask for, makes more sense than the comment there. I'll check that, thanks.

regards

Jack Chimasera

Jack_Chimasera
Beginner

Hello Tony

I have tried to disable all hint-based processing. However, only 3 filter types (the first 3 specified below) actually accepted the "do not use" setting, while the rest caused initialization to fail.

Running with these 3 options didn't help at all, I'm afraid. Performance stayed as bad as it was, far inferior to the Ivy Bridge i3-3225.

regards

Jack Chimasera

Anthony_P_Intel
Employee

Hi Jack,

We are still looking into this, and I believe there are a few factors at work here. One issue may be a bug in the drivers (please watch for updates soon), but another issue may be the architecture of the application. Please be sure that your code is written to allow full asynchronous use of available resources, as there are some implementation differences between the two platforms, and there are some known cases where synchronous workloads might be slower on the newer platform.
