
Performance of copy to system memory after hardware decoding

EtienneCo

Hello, I am trying to use VA-API in GStreamer to decode JPEGs in hardware before my application post-processes the decoded images in system memory.

I found the article on the performance bottleneck of copying from video memory to system memory (https://software.intel.com/content/www/us/en/develop/articles/copying-accelerated-video-decode-frame-buffers.html), but I am not sure how well optimized that copy is in the GStreamer elements that use VA-API.

For testing purposes, I ran two GStreamer pipelines on several test JPEGs on Ubuntu 20.10.
The first pipeline, using vaapijpegdec with the i965 driver, takes 2.9 seconds to run (I first tried the iHD driver, but with it the run took 19 seconds):

user@ark1220-desktop:~/testimages$ export LIBVA_DRIVER_NAME=i965
user@ark1220-desktop:~/testimages$ gst-launch-1.0 -v multifilesrc location="%03d.jpg" index=0 ! jpegparse ! vaapijpegdec ! filesink location=out
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Got context from element 'vaapidecode_jpeg0': gst.gl.GLDisplay=context, gst.gl.GLDisplay=(GstGLDisplay)"\(GstGLDisplayX11\)\ gldisplayx11-0";
Got context from element 'vaapidecode_jpeg0': gst.vaapi.Display=context, gst.vaapi.Display=(GstVaapiDisplay)"\(GstVaapiDisplayGLX\)\ vaapidisplayglx0";
/GstPipeline:pipeline0/GstJpegParse:jpegparse0.GstPad:src: caps = image/jpeg, parsed=(boolean)true, format=(string)I420, width=(int)688, height=(int)512, framerate=(fraction)1/1
/GstPipeline:pipeline0/GstVaapiDecode_jpeg:vaapidecode_jpeg0.GstPad:sink: caps = image/jpeg, parsed=(boolean)true, format=(string)I420, width=(int)688, height=(int)512, framerate=(fraction)1/1
Redistribute latency...
/GstPipeline:pipeline0/GstVaapiDecode_jpeg:vaapidecode_jpeg0.GstPad:src: caps = video/x-raw, format=(string)NV12, width=(int)688, height=(int)512, interlace-mode=(string)progressive, multiview-mode=(string)mono, multiview-flags=(GstVideoMultiviewFlagsSet)0:ffffffff:/right-view-first/left-flipped/left-flopped/right-flipped/right-flopped/half-aspect/mixed-mono, pixel-aspect-ratio=(fraction)1/1, chroma-site=(string)jpeg, colorimetry=(string)bt601, framerate=(fraction)1/1
/GstPipeline:pipeline0/GstFileSink:filesink0.GstPad:sink: caps = video/x-raw, format=(string)NV12, width=(int)688, height=(int)512, interlace-mode=(string)progressive, multiview-mode=(string)mono, multiview-flags=(GstVideoMultiviewFlagsSet)0:ffffffff:/right-view-first/left-flipped/left-flopped/right-flipped/right-flopped/half-aspect/mixed-mono, pixel-aspect-ratio=(fraction)1/1, chroma-site=(string)jpeg, colorimetry=(string)bt601, framerate=(fraction)1/1
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:02.924764261
Setting pipeline to NULL ...
Freeing pipeline ...

The second pipeline, using software decoding with jpegdec, takes 2.06 seconds to run:

$ gst-launch-1.0 -v multifilesrc location="%03d.jpg" index=0 ! jpegparse ! jpegdec ! fakesink

Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
/GstPipeline:pipeline0/GstJpegParse:jpegparse0.GstPad:src: caps = image/jpeg, parsed=(boolean)true, format=(string)I420, width=(int)688, height=(int)512, framerate=(fraction)1/1
/GstPipeline:pipeline0/GstJpegDec:jpegdec0.GstPad:sink: caps = image/jpeg, parsed=(boolean)true, format=(string)I420, width=(int)688, height=(int)512, framerate=(fraction)1/1
/GstPipeline:pipeline0/GstJpegDec:jpegdec0.GstPad:src: caps = video/x-raw, format=(string)I420, width=(int)688, height=(int)512, interlace-mode=(string)progressive, multiview-mode=(string)mono, multiview-flags=(GstVideoMultiviewFlagsSet)0:ffffffff:/right-view-first/left-flipped/left-flopped/right-flipped/right-flopped/half-aspect/mixed-mono, pixel-aspect-ratio=(fraction)1/1, chroma-site=(string)jpeg, colorimetry=(string)1:4:0:0, framerate=(fraction)1/1
/GstPipeline:pipeline0/GstFakeSink:fakesink0.GstPad:sink: caps = video/x-raw, format=(string)I420, width=(int)688, height=(int)512, interlace-mode=(string)progressive, multiview-mode=(string)mono, multiview-flags=(GstVideoMultiviewFlagsSet)0:ffffffff:/right-view-first/left-flipped/left-flopped/right-flipped/right-flopped/half-aspect/mixed-mono, pixel-aspect-ratio=(fraction)1/1, chroma-site=(string)jpeg, colorimetry=(string)1:4:0:0, framerate=(fraction)1/1
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:02.062320412
Setting pipeline to NULL ...
Freeing pipeline ...
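
As a rough sketch, the two timings can be compared by parsing the "Execution ended after" line that gst-launch-1.0 prints in the logs above. The helper name `parse_elapsed_seconds` is hypothetical and the regular expression assumes the H:MM:SS.fraction format seen in these two logs. Note that the first run writes to filesink and the second to fakesink, so the numbers are not a strictly like-for-like comparison.

```python
import re

# Matches lines such as "Execution ended after 0:00:02.924764261"
# (format assumed from the two logs quoted in this post).
_ELAPSED = re.compile(r"Execution ended after (\d+):(\d{2}):(\d{2})\.(\d+)")


def parse_elapsed_seconds(log: str) -> float:
    """Return the run time in seconds reported in gst-launch-1.0 output."""
    match = _ELAPSED.search(log)
    if match is None:
        raise ValueError("no 'Execution ended after' line found")
    hours, minutes, seconds, fraction = match.groups()
    whole = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
    return whole + float("0." + fraction)


# The two lines below are copied from the logs in this post.
hw = parse_elapsed_seconds("Execution ended after 0:00:02.924764261")
sw = parse_elapsed_seconds("Execution ended after 0:00:02.062320412")
print(f"vaapijpegdec: {hw:.3f} s, jpegdec: {sw:.3f} s, ratio: {hw / sw:.2f}")
```

The same function can be fed the captured stdout of any gst-launch-1.0 run, which makes it easier to repeat each pipeline several times and compare averages rather than single runs.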

Is this roughly the expected performance, i.e. does the copy from video memory to system memory take about as long as the decoding itself? Or is this level of performance unexpected, meaning the GStreamer pipeline should be optimized? This is on an E3940 CPU.
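
For a sense of scale on the copy itself, here is a minimal sketch of how many bytes one decoded 688x512 frame occupies in a 4:2:0 format such as NV12 or I420 (the two formats negotiated in the logs above). The function name `frame_bytes_420` is hypothetical, and the figure assumes tightly packed planes; real VA-API surfaces may add stride or alignment padding.

```python
def frame_bytes_420(width: int, height: int) -> int:
    """Bytes in one tightly packed 8-bit 4:2:0 frame (NV12 or I420)."""
    luma = width * height
    # Two chroma planes (or one interleaved plane), each subsampled 2x2.
    chroma = 2 * ((width + 1) // 2) * ((height + 1) // 2)
    return luma + chroma


print(frame_bytes_420(688, 512))  # 528384
```

At roughly half a megabyte per frame, the per-frame payload is small, so fixed per-frame overhead (surface mapping, synchronization) may matter as much as raw copy bandwidth; that is a guess to check by profiling, not a measured result.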

Thanks a lot
Etienne
