I am evaluating the GPU (HD505) encoding performance of an E3950 board that runs a custom Linux image:
- kernel 4.14 (with i915 patches)
- Media driver (master branch, HEAD)
- Media sdk (master branch, HEAD)
Both are built from source (from the open source GitHub repositories).
I am using the sample_encode tool to encode 100 YUV frames (4096x2160) into HEVC.
Compared to another board with an HD530 GPU, encoding the file takes about 4 times longer (~25 s vs ~6 s).
Profiling the system with VTune, I saw that sample_encode spent most of its time waiting for the GPU.
To make sure I did not miss anything while building the media stack for the image: do these results look reasonable to you?
I expected some performance loss on the E3950 board, but this difference seems excessive to me.
- Tags:
- Development Tools
- Graphics
- Intel® Media SDK
- Intel® Media Server Studio
- Media Processing
- Optimization
Hi TurtleCrazy,
No, sample_encode should not be that slow. It looks like you have an installation issue.
Yes, the E3950 is Apollo Lake; its graphics core should have a hardware video codec, which is supported through the Media SDK for Embedded Linux release. You can get the release package from the following site:
https://software.intel.com/en-us/media-sdk
But this release requires an older kernel, Kernel 4.1 if I remember correctly (you can check the release notes to confirm). If you want to use the latest kernel, you can build our open source stack by following this article:
https://software.intel.com/en-us/articles/build-and-debug-open-source-media-stack
Mark Liu
Hi,
Liu, Mark (Intel) wrote:
Hi TurtleCrazy,
No, sample_encode should not be that slow. It looks like you have an installation issue.
OK. Given that VTune reports the GPU/CPU concurrency as 89% GPU, what might I have missed?
The VTune drivers are installed; let me know which entries in the profiling reports may be relevant in my case.
Yes, the E3950 is Apollo Lake; its graphics core should have a hardware video codec, which is supported through the Media SDK for Embedded Linux release. You can get the release package from the following site:
https://software.intel.com/en-us/media-sdk
But this release requires an older kernel, Kernel 4.1 if I remember correctly (you can check the release notes to confirm).
Not only is the kernel outdated, so are the Yocto BSPs. I installed this release once for a quick look, and it seems faster than the open source stack, but not by much (~16 s vs ~20 s for the same test).
If you want to use the latest kernel, you can build our open source stack by following this article:
https://software.intel.com/en-us/articles/build-and-debug-open-source-media-stack
Mark Liu
That's what I did (see my previous message). The open source media stack seems to be running correctly.
Linux kernel 4.14.35 (with Intel patches for Yocto), plus the following i915 patches:
- https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?id=39ccc9852e2b46964c9c44eba52db57413ba6d27
- https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?id=b68763741aa29f2541c7ca58bcb0c2bb6cb5f449
From the Intel repositories on GitHub:
- Libva 2.1, https://github.com/intel/libva, master branch, rev: b3be72a5a110880f70626d7c3bed953cdde124b2
- Media driver, https://github.com/intel/media-driver, master branch, rev: b3be72a5a110880f70626d7c3bed953cdde124b2
- GmmLib, https://github.com/intel/gmmlib, master branch, rev: b3be72a5a110880f70626d7c3bed953cdde124b2
- MSDK, https://github.com/Intel-Media-SDK/MediaSDK, master branch, rev: b3be72a5a110880f70626d7c3bed953cdde124b2
Output:
Linux 4.14.35-intel-pk-standard #1 SMP PREEMPT Thu May 31 15:13:26 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

e3950:~# vainfo
error: can't connect to X server!
libva info: VA-API version 1.1.0
libva info: va_getDriverName() returns 0
libva info: User requested driver 'iHD'
libva info: Trying to open /usr/lib/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_1
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.1 (libva 2.1.1.pre1)
vainfo: Driver version: Intel iHD driver - 2.0.0
vainfo: Supported profile and entrypoints
  VAProfileNone : VAEntrypointVideoProc
  VAProfileNone : VAEntrypointStats
  VAProfileMPEG2Simple : VAEntrypointVLD
  VAProfileMPEG2Main : VAEntrypointVLD
  VAProfileH264Main : VAEntrypointVLD
  VAProfileH264Main : VAEntrypointEncSlice
  VAProfileH264Main : VAEntrypointFEI
  VAProfileH264Main : VAEntrypointEncSliceLP
  VAProfileH264High : VAEntrypointVLD
  VAProfileH264High : VAEntrypointEncSlice
  VAProfileH264High : VAEntrypointFEI
  VAProfileH264High : VAEntrypointEncSliceLP
  VAProfileVC1Simple : VAEntrypointVLD
  VAProfileVC1Main : VAEntrypointVLD
  VAProfileVC1Advanced : VAEntrypointVLD
  VAProfileJPEGBaseline : VAEntrypointVLD
  VAProfileJPEGBaseline : VAEntrypointEncPicture
  VAProfileH264ConstrainedBaseline: VAEntrypointVLD
  VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
  VAProfileH264ConstrainedBaseline: VAEntrypointFEI
  VAProfileH264ConstrainedBaseline: VAEntrypointEncSliceLP
  VAProfileVP8Version0_3 : VAEntrypointVLD
  VAProfileHEVCMain : VAEntrypointVLD
  VAProfileHEVCMain : VAEntrypointEncSlice
  VAProfileHEVCMain : VAEntrypointFEI
  VAProfileHEVCMain10 : VAEntrypointVLD
  VAProfileVP9Profile0 : VAEntrypointVLD

e3950:~# lsmod
Module Size Used by
intel_rapl 20480 0
pwm_lpss_pci 16384 0
x86_pkg_temp_thermal 16384 0
coretemp 16384 0
pwm_lpss 16384 1 pwm_lpss_pci
igb 172032 0
spi_pxa2xx_platform 24576 0
i915 1351680 1
mei_me 28672 0
mei 61440 1 mei_me
uio 16384 0

e3950:~# dmesg | grep i915
[ 0.000000] Kernel command line: BOOT_IMAGE=/bzImage root=PARTUUID=a1e761f4-3445-4474-b25f-213365295c23 rootwait rootfstype=ext4 console=ttyS0,115200 console=tty0 i915.modeset=1 i915.fastboot=1 nopti nokaslr nospectre_v2 spectre_v2=off
[ 3.913038] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[ 3.929474] [drm] Finished loading DMC firmware i915/bxt_dmc_ver1_07.bin (v1.7)
[ 5.059890] [drm] Initialized i915 1.6.0 20170818 for 0000:00:02.0 on minor 0
[ 5.475086] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device

e3950:~# time /opt/intel/media/sdk/samples/sample_encode h265 -i datatest/v.yuv -o t.h265 -w 4096 -h 2160
plugin_loader.h :186 [INFO] Plugin was loaded from GUID: { 0x6f, 0xad, 0xc7, 0x91, 0xa0, 0xc2, 0xeb, 0x47, 0x9a, 0xb6, 0xdc, 0xd5, 0xea, 0x9d, 0xa3, 0x47 } (Intel (R) Media SDK HW plugin for HEVC ENCODE)
libva info: VA-API version 1.1.0
libva info: va_getDriverName() returns 0
libva info: User requested driver 'iHD'
libva info: Trying to open /usr/lib/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_1
libva info: va_openDriver() returns 0
Encoding Sample Version 8.3.26.

Input file format YUV420
Output video HEVC
Source picture:
  Resolution 4096x2160
  Crop X,Y,W,H 0,0,4096,2160
Destination picture:
  Resolution 4096x2160
  Crop X,Y,W,H 0,0,4096,2160
Frame rate 30.00
Bit rate(Kbps) 60928
Gop size 0
Ref dist 0
Ref number 0
Idr Interval 0
Target usage balanced
Memory type system
Media SDK impl hw
Media SDK version 1.26

Processing started
Frame number: 1
Frame number: 100
Frame number: 100
plugin_loader.h :212 [INFO] MFXBaseUSER_UnLoad(session=0x0x6c3b70), sts=0
Processing finished

real 0m18.367s
user 0m3.096s
sys 0m1.869s
Note the gap between the real (wall-clock) time and the user + sys (CPU) time.
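The gap in the `time` output can be put into numbers. A minimal Python sketch, assuming bash's `time` output format (`real 0m18.367s`). For a mostly single-threaded process, the share of wall-clock time not spent on the CPU is a rough proxy for time spent blocked, which here is consistent with the VTune observation of waiting on the GPU; note it also includes file I/O waits, and if the process runs several busy threads, user + sys can overlap or even exceed real, so treat the figure as an estimate only.

```python
import re


def parse_time_output(text):
    """Parse bash `time` lines such as 'real 0m18.367s' into seconds."""
    times = {}
    for name, minutes, seconds in re.findall(r"\b(real|user|sys)\s+(\d+)m([\d.]+)s", text):
        times[name] = int(minutes) * 60 + float(seconds)
    return times


# Values copied from the sample_encode run in the log above.
log = "real 0m18.367s user 0m3.096s sys 0m1.869s"
t = parse_time_output(log)
cpu = t["user"] + t["sys"]   # total CPU time consumed by the process
busy = cpu / t["real"]       # fraction of wall-clock time spent on the CPU
print(f"CPU {cpu:.3f} s of {t['real']:.3f} s wall-clock: "
      f"{busy:.0%} on CPU, {1 - busy:.0%} waiting")
```

With these numbers, roughly a quarter of the elapsed time is CPU work, so the process spends most of the run waiting.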
To complete the picture: I have now also enabled additional i915 driver features, such as GuC and HuC, with these module options:
options i915 enable_fbc=1 enable_guc_loading=2 enable_guc_submission=2 enable_rc6=3
As a result, /sys/kernel/debug/dri/0/ reports:
i915_guc_load_status:
GuC firmware status:
  path: i915/bxt_guc_ver8_7.bin
  fetch: SUCCESS
  load: SUCCESS
  version wanted: 8.7
  version found: 8.7
  header: offset is 0; size = 128
  uCode: offset is 128; size = 140544
  RSA: offset is 140672; size = 256
GuC status 0x800330ed:
  Bootrom status = 0x76
  uKernel status = 0x30
  MIA Core status = 0x3
...
i915_huc_load_status:
HuC firmware status:
  path: i915/bxt_huc_ver01_07_1398.bin
  fetch: SUCCESS
  load: SUCCESS
  version wanted: 1.7
  version found: 1.7
  header: offset is 0; size = 128
  uCode: offset is 128; size = 154048
  RSA: offset is 154176; size = 256
HuC status 0x00006080:
However, this makes no difference to the encoding performance.
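Checking these debugfs entries by eye after every reboot is error-prone. A minimal Python sketch that extracts the fetch/load results and firmware versions; the two file names are the ones shown in the post above, reading them normally requires root and a mounted debugfs, and the regular expression assumes the field layout of this 4.14-era output, which may differ on other kernel versions.

```python
import re
from pathlib import Path

DEBUGFS = Path("/sys/kernel/debug/dri/0")
FW_RE = re.compile(
    r"(GuC|HuC) firmware status:\s*path:\s*(\S+)\s*fetch:\s*(\w+)\s*load:\s*(\w+)"
    r"\s*version wanted:\s*([\d.]+)\s*version found:\s*([\d.]+)"
)


def parse_fw_status(text):
    """Extract path, fetch/load results and versions from i915_*_load_status text."""
    m = FW_RE.search(text)
    if m is None:
        return None
    name, path, fetch, load, wanted, found = m.groups()
    return {
        "name": name, "path": path, "fetch": fetch, "load": load,
        "wanted": wanted, "found": found,
        # Healthy only if both steps succeeded and the expected version is loaded.
        "ok": fetch == load == "SUCCESS" and wanted == found,
    }


def check_all():
    for entry in ("i915_guc_load_status", "i915_huc_load_status"):
        print(entry, parse_fw_status((DEBUGFS / entry).read_text()))


if __name__ == "__main__":
    try:
        check_all()
    except OSError as exc:  # debugfs not mounted, no i915 device, or not root
        print(f"cannot read {DEBUGFS}: {exc}")
```

A loaded firmware only confirms that GuC/HuC are active; as the post notes, it does not by itself change encoding speed.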
I did the calculation based on your log:
you encode 100 frames of 4096x2160 YUV420 in 25 seconds, which is about 4 FPS.
This should be close to what we saw in our testing.
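The same frame-rate arithmetic, applied to all three timings mentioned in the thread, plus the raw input volume per second that has to reach the encoder. A small sketch that assumes the input is 8-bit YUV 4:2:0 (1.5 bytes per pixel); the log says "Input file format YUV420" but does not state the bit depth.

```python
def raw_frame_bytes(width, height):
    """Size of one 8-bit YUV 4:2:0 frame: a full-size luma plane plus two
    quarter-size chroma planes, i.e. 1.5 bytes per pixel."""
    return width * height * 3 // 2


def fps(frames, seconds):
    return frames / seconds


FRAMES = 100
frame = raw_frame_bytes(4096, 2160)  # bytes per input frame
timings = {
    "E3950 (reported)": 25.0,
    "E3950 (timed run in log)": 18.367,
    "HD530 board (reported)": 6.0,
}
for label, seconds in timings.items():
    rate = fps(FRAMES, seconds)
    print(f"{label}: {rate:.1f} fps, {rate * frame / 1e6:.0f} MB/s of raw YUV input")
```

Each 4096x2160 frame is about 13.3 MB, so with "Memory type system" every frame of that size is handled in system memory before encoding.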
You can improve it by tuning some of the parameters. I suggest trying the following arguments on the command line:
sample_encode h265 -hw -i datatest/v.yuv -o t.h265 -h 2160 -w 4096 -vaapi -u speed -async 3
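To compare the original and suggested command lines, and to see whether any single flag matters on its own, a minimal Python timing harness. The binary path and input file are taken from the log earlier in the thread; splitting the suggestion into one-flag variants is an assumption made for this sketch, not an Intel recommendation. It measures wall-clock time around the whole process, so start-up and file I/O are included.

```python
import os
import shlex
import subprocess
import time

# Paths from the log in this thread; adjust for your system.
SAMPLE_ENCODE = "/opt/intel/media/sdk/samples/sample_encode"
BASE_ARGS = ["h265", "-i", "datatest/v.yuv", "-o", "t.h265", "-w", "4096", "-h", "2160"]
FRAMES = 100

# Hypothetical comparison set: the original run, each suggested flag alone, all together.
VARIANTS = {
    "original": [],
    "-u speed": ["-u", "speed"],
    "-vaapi": ["-vaapi"],
    "-async 3": ["-async", "3"],
    "all suggested": ["-hw", "-vaapi", "-u", "speed", "-async", "3"],
}


def build_cmd(extra):
    return [SAMPLE_ENCODE, *BASE_ARGS, *extra]


def time_run(cmd):
    """Run one encode and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def main():
    for name, extra in VARIANTS.items():
        cmd = build_cmd(extra)
        elapsed = time_run(cmd)
        print(f"{name:>14}: {elapsed:6.2f} s, {FRAMES / elapsed:5.1f} fps   {shlex.join(cmd)}")


if __name__ == "__main__":
    if os.access(SAMPLE_ENCODE, os.X_OK):
        main()
    else:
        print(f"sample_encode not found at {SAMPLE_ENCODE}")
```

Running each variant more than once and comparing the best times would reduce noise from file caching.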
Let me know how much it improves.
Mark Liu
Liu, Mark (Intel) wrote:
I did the calculation based on your log:
you encode 100 frames of 4096x2160 YUV420 in 25 seconds, which is about 4 FPS.
This should be close to what we saw in our testing.
OK, thanks. That is the main point behind my question: I want to make sure that the distribution, and especially the encoding stack I built for this board, runs correctly. I asked because of the large gap between this board and the other one.
You can improve it by tuning some of the parameters. I suggest trying the following arguments on the command line:
sample_encode h265 -hw -i datatest/v.yuv -o t.h265 -h 2160 -w 4096 -vaapi -u speed -async 3
Let me know how much it improves.
Thanks. I do not have access to the board right now; I'll check this later.
One question: what does the "-vaapi" option stand for?
BTW, I'll now hand the board over to encoding experts, who will be happy to tweak the process according to their needs. :)
Regards,
These are the tweaks aimed at improving performance:
- -vaapi: uses video memory instead of system memory during encoding, which avoids the extra copying.
- -u: sets the target usage to speed (7), which may reduce quality somewhat but speeds up encoding.
- -async: the whole MSDK API follows an asynchronous model: frames keep being sent to the hardware without blocking, but a sync operation is eventually required. This argument specifies how many frames can be in the queue before the sync operation. 3 or 4 is optimal; more will not improve performance further.
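Why a small queue depth is enough can be seen with a toy timing model. This is a hypothetical simulation, not Media SDK code: the CPU spends `cpu_ms` preparing and submitting each frame, the GPU encodes frames one at a time in `gpu_ms` each, and once `depth` frames are in flight the application must sync on the oldest one before submitting another. The per-frame costs used below are made-up illustrative numbers.

```python
from collections import deque


def encode_time(n_frames, cpu_ms, gpu_ms, depth):
    """Total wall-clock time (ms) to encode n_frames with at most `depth` in flight."""
    cpu_t = 0.0          # time at which the CPU is free
    gpu_free = 0.0       # time at which the GPU finishes its current frame
    in_flight = deque()  # completion times of submitted, not yet synced frames
    for _ in range(n_frames):
        if len(in_flight) == depth:
            # Queue full: block until the oldest frame is done (the "sync").
            cpu_t = max(cpu_t, in_flight.popleft())
        cpu_t += cpu_ms                  # prepare and submit the next frame
        start = max(cpu_t, gpu_free)     # GPU processes frames in order
        gpu_free = start + gpu_ms
        in_flight.append(gpu_free)
    return max(cpu_t, gpu_free)          # final sync drains the queue


for depth in (1, 2, 3, 4, 8):
    print(depth, encode_time(100, cpu_ms=5, gpu_ms=50, depth=depth))
```

In this model, depth 1 pays CPU and GPU time serially, while any deeper queue overlaps them and the total settles at about `n_frames * gpu_ms`: once the GPU is the bottleneck, extra depth buys nothing. Real per-frame costs vary, which is one plausible reason a slightly deeper queue such as 3 or 4 is suggested in practice.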
Mark Liu
Liu, Mark (Intel) wrote:
You can improve it by tuning some of the parameters. I suggest trying the following arguments on the command line:
sample_encode h265 -hw -i datatest/v.yuv -o t.h265 -h 2160 -w 4096 -vaapi -u speed -async 3
Let me know how much it improves.
Mark Liu
Except for the "-u" flag, which may lower quality, these options do not make a significant difference to encoding duration. Furthermore, turning on the "-vaapi" flag may even increase the elapsed time; in fact, the process "user" time increased a lot in this case.