My MSDK version is Linux_16.1.64.1.11164. When I use VPP in my project (yv12->nv12->vpp->h264 encode), the speed is not sufficient for my needs.
My project's pipeline can be described briefly as follows:
yv12->nv12->vpp->h264 encode->mpegts
I ran some experiments with the samples and got the following results:
1. H.264 encode
./sample_encode_drm h264 -f 25 -b 2048 -w 720 -h 576 -i /home/wdg/video/beijing420p.yuv -o /home/wdg/video/intel.new.h264 -hw -u quality
libva info: VA-API version 0.34.0
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_0_32
libva info: va_openDriver() returns 0
Intel(R) Media SDK Encoding Sample Version 0.0.000.0000
Input file format YUV420
Output video AVC
Source picture:
Resolution 720x576
Crop X,Y,W,H 0,0,720,576
Destination picture:
Resolution 720x576
Crop X,Y,W,H 0,0,720,576
Frame rate 25.00
Bit rate(Kbps) 2048
Target usage quality
Memory type system
Media SDK impl hw
Media SDK version 1.6
Processing started
Intel(R) Media SDK Encoding Sample Version 0.0.000.0000
Input file format YUV420
Output video AVC
Source picture:
Resolution 720x576
Crop X,Y,W,H 0,0,720,576
Destination picture:
Resolution 720x576
Crop X,Y,W,H 0,0,720,576
Frame rate 25.00
Bit rate(Kbps) 2048
Target usage quality
Memory type system
Media SDK impl hw
Media SDK version 1.6
Frame number: 3000, fps:382.55, spend:7.84s
Processing finished
real 0m7.853s
user 0m1.932s
sys 0m0.920s
The fps is about 382.
2. VPP
./sample_vpp_drm -lib hw -sw 720 -sh 576 -scc yv12 -dw 720 -dh 576 -denoise 32 -i /home/wdg/video/beijing420p.yuv -o /home/wdg/video/out.yuv
libva info: VA-API version 0.34.0
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_0_32
libva info: va_openDriver() returns 0
Intel(R) Media SDK VPP Sample Version 0.0.000.0000
Input format YV12
Resolution 720x576
Crop X,Y,W,H 0,0,720,576
Frame rate 30.00
PicStruct progressive
Output format NV12
Resolution 720x576
Crop X,Y,W,H 0,0,720,576
Frame rate 30.00
PicStruct progressive
Video Enhancement Algorithms
Denoise ON
VideoAnalysis OFF
ProcAmp OFF
Detail OFF
ImgStab OFF
Memory type system
MediaSDK impl hw
MediaSDK ver 1.6
VPP started
Frame number: 3000
VPP finished
real 0m28.072s
user 0m20.269s
sys 0m3.336s
fps = 3000/28.072 = 106.87
3. HD encode
./sample_encode_drm h264 -f 25 -b 4000 -w 1920 -h 1080 -i /home/wdg/video/hd1080p_1000.yuv -o /home/wdg/video/intel.new.hd.h264 -hw -u balanced
Input file format YUV420
Output video AVC
Source picture:
Resolution 1920x1088
Crop X,Y,W,H 0,0,1920,1080
Destination picture:
Resolution 1920x1088
Crop X,Y,W,H 0,0,1920,1080
Frame rate 25.00
Bit rate(Kbps) 4000
Target usage balanced
Memory type system
Media SDK impl hw
Media SDK version 1.6
Processing started
Intel(R) Media SDK Encoding Sample Version 0.0.000.0000
Input file format YUV420
Output video AVC
Source picture:
Resolution 1920x1088
Crop X,Y,W,H 0,0,1920,1080
Destination picture:
Resolution 1920x1088
Crop X,Y,W,H 0,0,1920,1080
Frame rate 25.00
Bit rate(Kbps) 4000
Target usage balanced
Memory type system
Media SDK impl hw
Media SDK version 1.6
Frame number: 1000, fps:95.37, spend:10.49s
Processing finished
real 0m10.568s
user 0m2.924s
sys 0m0.696s
The fps is about 95.
4. HD VPP
./sample_vpp_drm -lib hw -sw 1920 -sh 1080 -scc yv12 -dw 1920 -dh 1080 -denoise 32 -i /home/wdg/video/hd1080p_1000.yuv -o /home/wdg/video/out.yuv
Input format YV12
Resolution 1920x1088
Crop X,Y,W,H 0,0,1920,1080
Frame rate 30.00
PicStruct progressive
Output format NV12
Resolution 1920x1088
Crop X,Y,W,H 0,0,1920,1080
Frame rate 30.00
PicStruct progressive
Video Enhancement Algorithms
Denoise ON
VideoAnalysis OFF
ProcAmp OFF
Detail OFF
ImgStab OFF
Memory type system
MediaSDK impl hw
MediaSDK ver 1.6
VPP started
Frame number: 1000
VPP finished
real 0m38.521s
user 0m23.905s
sys 0m4.044s
fps = 1000/38.521 = 25.96
In all the VPP tests I enabled only the denoise filter, and VPP is still slower than encode.
In my opinion, H.264 encoding is more complex than VPP with denoise only, so VPP should be faster than encode.
It seems that VPP in this version is inefficient. Maybe it is a software implementation?
Did I make a mistake somewhere?
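The VPP fps figures above are derived by dividing the frame count by the `real` time reported by `time`. A minimal sketch of that arithmetic follows; the function names are hypothetical, and the result is wall-clock throughput of the whole sample run (file reads and writes included), not of the denoise filter alone.

```python
import re

def real_seconds(time_line):
    """Convert a `time` output line such as 'real 0m28.072s' to seconds."""
    match = re.search(r"(\d+)m([\d.]+)s", time_line)
    if match is None:
        raise ValueError(f"unrecognized time line: {time_line!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

def throughput_fps(frames, time_line):
    """Frames processed per second of wall-clock time."""
    return frames / real_seconds(time_line)

# Frame counts and `real` times copied from the VPP runs above.
sd_vpp_fps = throughput_fps(3000, "real 0m28.072s")
hd_vpp_fps = throughput_fps(1000, "real 0m38.521s")
print(f"SD VPP: {sd_vpp_fps:.2f} fps")
print(f"HD VPP: {hd_vpp_fps:.2f} fps")
```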
In a completely synchronous setting, total time for your pipeline could be characterized as vpp+encode. However, Media SDK is asynchronous and many operations can happen simultaneously. In my tests (sorry, code isn't ready to distribute quite yet) an HD pipeline with denoise+encode executed in nearly the same time as encode alone.
Another thing to consider is I/O. With the encode sample, a compressed bitstream is written to disk. With the vpp sample the output is raw frames. It is a lot more data to move and the sample I/O is far from optimized. The time measured with the sample is mostly disk I/O, not denoise.
I understand that evaluating performance is a big part of figuring out if you're going to make the time to write the code. The performance story is clearer for transcode, since here all stages of the pipeline can work together as intended. We're hoping to address this example/documentation gap as soon as possible.
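To put the I/O point in numbers, here is a rough back-of-the-envelope sketch for the HD runs (1000 frames at 1920x1080). It assumes the VPP sample writes cropped 1920x1080 NV12 frames at 12 bits per pixel, that the encoder hits its 4000 Kbps target exactly, and that 1 Kbit is 1000 bits; the function names are hypothetical. Both samples read the same raw input, so the difference is on the output side.

```python
def raw_frame_bytes(width, height, bits_per_pixel=12):
    """Size of one 4:2:0 frame; YV12 and NV12 both use 12 bits per pixel."""
    return width * height * bits_per_pixel // 8

def raw_stream_bytes(width, height, frames):
    """Total size of an uncompressed 4:2:0 sequence."""
    return raw_frame_bytes(width, height) * frames

def encoded_stream_bytes(bitrate_kbps, frames, fps):
    """Approximate bitstream size, assuming the target bitrate is met exactly."""
    duration_s = frames / fps
    return int(bitrate_kbps * 1000 / 8 * duration_s)

# HD runs from the original post: 1000 frames, 25 fps, 4000 Kbps target.
hd_raw = raw_stream_bytes(1920, 1080, 1000)   # written by the VPP sample
hd_enc = encoded_stream_bytes(4000, 1000, 25) # written by the encode sample

print(f"raw NV12 output : {hd_raw / 1e6:.1f} MB")
print(f"H.264 bitstream : {hd_enc / 1e6:.1f} MB")
print(f"ratio           : {hd_raw / hd_enc:.1f}x")
```

Under these assumptions the VPP sample writes roughly 3.1 GB while the encode sample writes about 20 MB, which is why the two wall-clock times are not directly comparable.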
Yes, I forgot to consider I/O performance. I'll try again.
Thank you.
In reply to Jeffrey Mcallister (Intel):
I tested again using sample_encode. The sample applies a VPP resize when -dstw and -dsth differ from the source dimensions. The results are as follows:
./sample_encode_drm h264 -f 25 -b 2048 -w 720 -h 576 -i /home/wdg/video/beijing420p.yuv -o /home/wdg/video/intel.new.h264 -hw -u quality
libva info: VA-API version 0.34.0
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_0_32
libva info: va_openDriver() returns 0
Intel(R) Media SDK Encoding Sample Version 0.0.000.0000
Input file format YUV420
Output video AVC
Source picture:
Resolution 720x576
Crop X,Y,W,H 0,0,720,576
Destination picture:
Resolution 720x576
Crop X,Y,W,H 0,0,720,576
Frame rate 25.00
Bit rate(Kbps) 2048
Target usage quality
Memory type system
Media SDK impl hw
Media SDK version 1.6
Processing started
Hello!
Intel(R) Media SDK Encoding Sample Version 0.0.000.0000
Input file format YUV420
Output video AVC
Source picture:
Resolution 720x576
Crop X,Y,W,H 0,0,720,576
Destination picture:
Resolution 720x576
Crop X,Y,W,H 0,0,720,576
Frame rate 25.00
Bit rate(Kbps) 2048
Target usage quality
Memory type system
Media SDK impl hw
Media SDK version 1.6
Frame number: 3000, fps:425.65, spend:7.05s
Processing finished
./sample_encode_drm h264 -f 25 -b 2048 -w 720 -h 576 -i /home/wdg/video/beijing420p.yuv -o /home/wdg/video/intel.new.h264 -hw -u quality -dstw 640 -dsth 480
libva info: VA-API version 0.34.0
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_0_32
libva info: va_openDriver() returns 0
Intel(R) Media SDK Encoding Sample Version 0.0.000.0000
Input file format YUV420
Output video AVC
Source picture:
Resolution 720x576
Crop X,Y,W,H 0,0,720,576
Destination picture:
Resolution 640x480
Crop X,Y,W,H 0,0,640,480
Frame rate 25.00
Bit rate(Kbps) 2048
Target usage quality
Memory type system
Media SDK impl hw
Media SDK version 1.6
Processing started
Hello!
Intel(R) Media SDK Encoding Sample Version 0.0.000.0000
Input file format YUV420
Output video AVC
Source picture:
Resolution 720x576
Crop X,Y,W,H 0,0,720,576
Destination picture:
Resolution 640x480
Crop X,Y,W,H 0,0,640,480
Frame rate 25.00
Bit rate(Kbps) 2048
Target usage quality
Memory type system
Media SDK impl hw
Media SDK version 1.6
Frame number: 3000, fps:188.20, spend:15.94s
Processing finished
I expected the result to be about the same or faster (since the destination resolution is lower than the source), but it is not.
It seems that the method used in the sample is not efficient enough. Can you provide some details on how to use VPP+encode?
Regards
A simple thing to add which will improve performance is the -vaapi flag.
By default, the samples use system memory. System memory is best for software sessions. GPU memory (VAAPI surfaces for Linux) is best for hardware sessions. Here there is an implicit copy to the CPU between VPP and encode without -vaapi, which adds significant overhead.
The non-transcode samples all share the limitation of having I/O as the main bottleneck, which is why you might not see faster runtimes with resize as you might expect. The multi-transcode sample is better, but also is provided more as a functional than a performance demo. This is a recognized gap that we are working on.
Thanks!
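One way to compare the two memory modes is to time the same resize+encode command with and without -vaapi. The sketch below is only an illustration: the helper names are hypothetical, the flags are copied from the commands earlier in this thread, and it assumes the sample accepts -vaapi appended at the end of the command line. The timing call is left commented out so the snippet runs on a machine without the samples installed.

```python
import shlex
import subprocess
import time

def build_encode_cmd(src, dst, width, height, *, fps=25, bitrate_kbps=2048,
                     target_usage="quality", dst_size=None, use_vaapi=False,
                     binary="./sample_encode_drm"):
    """Assemble a sample_encode_drm command line from the flags used in this thread."""
    cmd = [binary, "h264", "-f", str(fps), "-b", str(bitrate_kbps),
           "-w", str(width), "-h", str(height), "-i", src, "-o", dst,
           "-hw", "-u", target_usage]
    if dst_size is not None:
        dst_w, dst_h = dst_size
        cmd += ["-dstw", str(dst_w), "-dsth", str(dst_h)]  # triggers the VPP resize
    if use_vaapi:
        cmd.append("-vaapi")  # assumption: the flag may be placed last
    return cmd

def time_cmd(cmd):
    """Run a command and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start

for use_vaapi in (False, True):
    cmd = build_encode_cmd("/home/wdg/video/beijing420p.yuv",
                           "/home/wdg/video/intel.new.h264",
                           720, 576, dst_size=(640, 480), use_vaapi=use_vaapi)
    print(shlex.join(cmd))
    # elapsed = time_cmd(cmd)  # uncomment where the samples are installed
```

Because the sample's file I/O is still part of the measurement, the difference between the two runs indicates the cost of the system-memory copy only approximately.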
I used the -vaapi flag, and it improved performance greatly.
Following the sample, I used system memory in my project, which is not a good choice for performance. I'll switch to video memory.
Thanks!