Hello.
The hardware is an Intel Xeon E3-1285L v4 CPU (Intel® Iris™ Pro Graphics P6300). The OS is CentOS 7.1 for SDK 2016 and CentOS 7.2 for SDK 2017.
I'm using it to encode 18 SD videos to H.264 in parallel.
With SDK 2016, the GPU load is around 70%.
After installing SDK 2017, the GPU load jumps to 90-95%, which is roughly a 20% performance drop. The hardware and the application software are the same; only the SDK version is different. Has anyone else seen such issues?
The image attached below shows some perf data from our internal regression tool on 1285 v4 (0x162a), 3.0 / 1.1 GHz parts. The S-curve (sorted ratio of performance) compares MSS 2017 (CentOS 7.2, numerator) against MSS 2016 (CentOS 7.1, denominator), running with frequency and power defaults. All workloads below use an N:N transcode model. Within measurement noise, most workloads are faster on 2017, and SD should show the same performance. We haven't seen any cases of a 20% reduction in performance.
So a few questions ...
1) You mention utilization; was performance impacted?
2) Did you collect GPU utilization stats using metrics monitor (which improved absolute results considerably between the two releases) or VTune?
3) Can you say more about your workload? For example: is it progressive to progressive, AVC to AVC, graphics to graphics memory; any use of VPP?
Hello,
Every input source (currently 6 sources) is encoded four times to H.264 High profile with the "balanced" target usage at different resolutions: 712x576, 640x512, 480x384, 360x288.
In this scenario I have a total of 24 encodes running in parallel.
> 1) you mention utilization; was performance impacted?
Yes. If I start encoding another source, the RENDER usage hits 100% and everything stops encoding properly.
> 2) Did you collect GPU utilization stats using metrics monitor (which improved absolute results considerably between the two releases) or vTune?
I measure GPU usage with two tools: intel_gpu_top and metrics_monitor from the SDK. Here are the results with the same hardware and user software:
SDK 2016 intel_gpu_top:
render busy: 69%: █████████████▉ render space: 1548/131072
task percent busy
CS: 68%: █████████████▋ vert fetch: 0 (0/sec)
GAM: 68%: █████████████▋ prim fetch: 0 (0/sec)
TSG: 66%: █████████████▎ VS invocations: 0 (0/sec)
VFE: 35%: ███████ GS invocations: 0 (0/sec)
TDG: 0%: GS prims: 0 (0/sec)
RS: 0%: CL invocations: 0 (0/sec)
VF: 0%: CL prims: 0 (0/sec)
SVG: 0%: PS invocations: 0 (0/sec)
GAFM: 0%: PS depth pass: 0 (0/sec)
SOL: 0%:
CL: 0%:
VS: 0%:
SF: 0%:
GAFS: 0%:
DS: 0%:
HS: 0%:
SDK 2016 metrics_monitor:
RENDER usage: 70.00, VIDEO usage: 70.00, VIDEO_E usage: 0.00 VIDEO2 usage: 65.00
RENDER usage: 72.00, VIDEO usage: 69.00, VIDEO_E usage: 0.00 VIDEO2 usage: 64.00
RENDER usage: 70.00, VIDEO usage: 64.00, VIDEO_E usage: 0.00 VIDEO2 usage: 63.00
RENDER usage: 71.00, VIDEO usage: 62.00, VIDEO_E usage: 0.00 VIDEO2 usage: 63.00
RENDER usage: 68.00, VIDEO usage: 65.00, VIDEO_E usage: 0.00 VIDEO2 usage: 61.00
SDK 2017 intel_gpu_top:
render busy: 80%: ████████████████ render space: 56/16384
task percent busy
CS: 80%: ████████████████ vert fetch: 0 (0/sec)
TSG: 75%: ███████████████ prim fetch: 0 (0/sec)
GAM: 49%: █████████▉ VS invocations: 0 (0/sec)
VFE: 39%: ███████▉ GS invocations: 0 (0/sec)
TDG: 0%: GS prims: 0 (0/sec)
RS: 0%: CL invocations: 0 (0/sec)
VF: 0%: CL prims: 0 (0/sec)
SVG: 0%: PS invocations: 0 (0/sec)
SF: 0%: PS depth pass: 0 (0/sec)
GAFS: 0%:
GAFM: 0%:
SOL: 0%:
DS: 0%:
VS: 0%:
GS: 0%:
SDK 2017 metrics_monitor:
RENDER usage: 84.00, VIDEO usage: 7.00, VIDEO_E usage: 0.00 VIDEO2 usage: 1.00 GT Freq: 1150.00
RENDER usage: 83.00, VIDEO usage: 9.00, VIDEO_E usage: 0.00 VIDEO2 usage: 3.00 GT Freq: 1150.00
RENDER usage: 86.00, VIDEO usage: 4.00, VIDEO_E usage: 0.00 VIDEO2 usage: 4.00 GT Freq: 1150.00
RENDER usage: 85.00, VIDEO usage: 6.00, VIDEO_E usage: 0.00 VIDEO2 usage: 1.00 GT Freq: 1150.00
RENDER usage: 84.00, VIDEO usage: 4.00, VIDEO_E usage: 0.00 VIDEO2 usage: 3.00 GT Freq: 1150.00
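For reference, collecting these numbers amounts to running the two tools and reading their periodic output, roughly along these lines (the log file name is arbitrary, and the metrics_monitor location depends on where the tool was installed or built from the MSS package):
sudo intel_gpu_top
./metrics_monitor | tee metrics.log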
> 3) Can you say more about your workload? For example: is it progressive to progressive, AVC to AVC, graphics to graphics memory; any use of VPP?
The input is MPEG-2 (as shown below). I've tried FFmpeg versions 2.8, 3.0.5, 3.1.6, and 3.2.2, but the results are the same. In their code I cannot see any video pre-processing (VPP) usage.
Input video as diagnosed by Mediainfo tool is:
Video
ID                        : 256 (0x100)
Menu ID                   : 1 (0x1)
Format                    : MPEG Video
Format version            : Version 2
Format profile            : Main@Main
Format settings, BVOP     : Yes
Format settings, Matrix   : Custom
Format settings, GOP      : M=2, N=12
Format settings, picture structure : Frame
Codec ID                  : 2
Duration                  : 10 s 200 ms
Bit rate mode             : Constant
Bit rate                  : 5 623 kb/s
Maximum bit rate          : 5 467 kb/s
Width                     : 720 pixels
Height                    : 576 pixels
Display aspect ratio      : 16:9
Frame rate                : 25.000 FPS
Standard                  : PAL
Color space               : YUV
Chroma subsampling        : 4:2:0
Bit depth                 : 8 bits
Scan type                 : Interlaced
Scan order                : Top Field First
Compression mode          : Lossy
Bits/(Pixel*Frame)        : 0.542
Time code of first frame  : 09:49:11:03
Time code source          : Group of pictures header
GOP, Open/Closed          : Open
Stream size               : 6.84 MiB (92%)
BRS/
Hello,
Which application do you use? Is it your internal application, or did you collect the data with sample_multi_transcode? If it was your own application, have you tried to replicate the result with sample_multi_transcode? Any difference?
At least one of the sources you are using is interlaced. What is the output: interlaced or progressive? In case you have progressive streams on the output, how many VPP deinterlace components are there in the pipeline? Considering your scenario, it would be reasonable to (a rough sketch follows the list):
- Have a single VPP deinterlace component right after the decoder
- Split the data flow after the VPP deinterlace component to feed the 6 VPP scale components
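In ffmpeg terms, that topology would look roughly like the sketch below. This is only an illustration of the structure, not a tested command: the input and output names are placeholders, the software yadif filter stands in for a hardware VPP deinterlace, and the per-branch encode settings would still need your real bitrates and GOP structure.
ffmpeg -i input.ts \
  -filter_complex "[0:v]yadif,split=4[v1][v2][v3][v4];[v1]scale=712:576[o1];[v2]scale=640:512[o2];[v3]scale=480:384[o3];[v4]scale=360:288[o4]" \
  -map "[o1]" -c:v h264_qsv out_712x576.ts \
  -map "[o2]" -c:v h264_qsv out_640x512.ts \
  -map "[o3]" -c:v h264_qsv out_480x384.ts \
  -map "[o4]" -c:v h264_qsv out_360x288.ts
The key point is that deinterlacing happens once per source, before the split, rather than once per output.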
Also, do you have SW memory between components somewhere in the pipeline for any reason?
Dmitry
I would also like to comment on the GPU engine usage data between the two releases:
RENDER usage: 71.00, VIDEO usage: 62.00, VIDEO_E usage: 0.00 VIDEO2 usage: 63.00                    # for 2016
RENDER usage: 84.00, VIDEO usage: 7.00, VIDEO_E usage: 0.00 VIDEO2 usage: 1.00 GT Freq: 1150.00     # for 2017
There are two notable things:
- An increase in GPGPU (RENDER) usage from ~70% to 85-90%
- A decrease in VDBOX 1 and 2 usage (VIDEO and VIDEO2) from ~65% to ~5-10%
The reason for the VDBOX usage drop is a completely different GPU task scheduling scheme introduced in MSS 2017, which is capable of managing inter-dependencies (a kernel-mode driver level change). So the reason for the ~65% VDBOX 1 and 2 utilization in MSS 2016 was that the VDBOXes were stalled waiting for dependencies (executed on the GPGPU) to be resolved. In MSS 2017 this was reworked, and the low VDBOX engine utilization you now see means those engines are free to execute something else.
That said, I have one more question: when you compared MSS 2016 and MSS 2017, did you configure the pipelines to produce data at fixed output rates, or were they permitted to transcode as fast as possible? If the latter, could you please provide elapsed times and CPU% data comparing MSS 2016 and MSS 2017?
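For example, GNU time can capture both in one run (the input/output names below are placeholders; wrap whatever transcode command you actually launch):
/usr/bin/time -v ffmpeg -i input.mpg -c:v h264_qsv -b:v 2500k out.ts
The "Elapsed (wall clock) time" and "Percent of CPU this job got" lines in its report are the two numbers of interest.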
Dmitry.
Hi,
SDK 2016 "top" output:
top - 10:35:37 up 184 days, 20:52,  1 user,  load average: 4.33, 3.90, 3.97
Tasks: 185 total,   2 running, 183 sleeping,   0 stopped,   0 zombie
%Cpu(s): 33.9 us,  4.2 sy,  0.0 ni, 61.6 id,  0.1 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem : 16351768 total, 11743684 free,  1470532 used,  3137552 buff/cache
KiB Swap:  1564668 total,  1564668 free,        0 used. 12257720 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
20203 root      20   0 1420144 251660  42168 R  57.7  1.5   2363:14 test
  327 root      20   0 1418096 248568  42472 S  54.7  1.5   8203:06 test
 1084 root      20   0 1418288 257196  43956 S  51.3  1.6   6457:13 test
  329 root      20   0 1417328 249608  42812 S  51.0  1.5   6923:24 test
19963 root      20   0 1286924 206528  33872 S  45.3  1.3   1937:12 test
 1372 root      20   0 1284108 201040  34608 S  43.0  1.2   5008:25 test
SDK 2017 "top" output:
top - 10:37:53 up 6 days, 17:14,  6 users,  load average: 3.31, 3.55, 3.48
Tasks: 220 total,   4 running, 216 sleeping,   0 stopped,   0 zombie
%Cpu(s): 31.1 us,  3.2 sy,  0.0 ni, 65.5 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
KiB Mem : 16185324 total,  3822944 free,  1470464 used, 10891916 buff/cache
KiB Swap:  3129340 total,  3129340 free,        0 used. 12694764 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
26273 root      20   0 2090420 149288  21432 R  47.5  0.9  16:57.85 test
26009 root      20   0 2089788 148748  21440 R  45.8  0.9  16:59.13 test
25899 root      20   0 2089788 150452  21400 S  45.5  0.9  16:28.78 test
25890 root      20   0 1892292 219528  21552 S  39.2  1.4  13:37.96 test
25896 root      20   0 1712796 132356  18052 R  38.2  0.8  13:25.99 test
25885 root      20   0 1847880 217068  21360 S  34.6  1.3  12:04.93 test
The output is interlaced, the same as the input. The speed is limited per resolution.
As I've mentioned earlier, I'm using ffmpeg for the tests. Here is an example ffmpeg pipeline:
ffmpeg -loglevel info -re -i 'udp://@239.238.1.0:7000?localaddr=172.18.0.9&fifo_size=100000&timeout=10&overrun_nonfatal=1' \
  -filter_complex [0:v]setdar=ratio=16/9:max=1000,split=3[out1][out2][out3] \
  -map [out1] -vcodec h264_qsv -profile:v high -preset medium -s 640x512 -b:v 1700k -minrate 1500k -maxrate 1900k -bufsize:v 2.8M -pix_fmt nv12 -g 25 -flags +cgop+ilme \
  -map 0:a:0 -c:a:0 mp2 -b:a:0 192000 -map 0:a:0 -c:a:1 aac -b:a:1 192000 -flush_packets 0 \
  -f mpegts -mpegts_flags pat_pmt_at_frames -mpegts_flags resend_headers 'udp://239.204.1.2:7000?localaddr=10.0.8.36&pkt_size=1316&buffer_size=65536' \
  -map [out2] -vcodec h264_qsv -profile:v high -preset medium -s 480x384 -b:v 900k -minrate 800k -maxrate 1100k -bufsize:v 2.8M -pix_fmt nv12 -g 25 -flags +cgop+ilme \
  -map 0:a:0 -c:a:0 mp2 -b:a:0 192000 -map 0:a:0 -c:a:1 aac -b:a:1 192000 -flush_packets 0 \
  -f mpegts -mpegts_flags pat_pmt_at_frames -mpegts_flags resend_headers 'udp://239.204.1.3:7000?localaddr=10.0.8.36&pkt_size=1316&buffer_size=65536' \
  -map [out3] -vcodec h264_qsv -profile:v high -preset medium -s 720x576 -b:v 2560k -minrate 1024k -maxrate 3072k -bufsize:v 2.8M -pix_fmt nv12 -g 25 -flags +cgop+ilme \
  -map 0:a:0 -c:a:0 mp2 -b:a:0 192000 -map 0:a:0 -c:a:1 aac -b:a:1 192000 -flush_packets 0 \
  -f mpegts 'udp://239.204.1.1:7000?localaddr=10.0.8.36&pkt_size=1316&buffer_size=65536'
Unfortunately I'm unable to use "multi_transcode", because some errors occur:
[root@transcoder-1 x64]# ./sample_multi_transcode -i::mpeg2 -i::../content/test_stream.mpeg2 -o::h264 out.h264
Multi Transcoding Sample Version 7.0.16053497
libva info: VA-API version 0.99.0
libva info: va_getDriverName() returns 0
libva info: User requested driver 'iHD'
libva info: Trying to open /opt/intel/mediasdk/lib64/iHD_drv_video.so
libva info: Found init function __vaDriverInit_0_32
libva info: va_openDriver() returns 0
Return on error: error code -2, /home/lab_msdk/buildAgentDir/buildAgent_MediaSDK4/git/mdp_msdk-samples/samples/sample_multi_transcode/src/pipeline_transcode.cpp 3372
Return on error: error code -2, /home/lab_msdk/buildAgentDir/buildAgent_MediaSDK4/git/mdp_msdk-samples/samples/sample_multi_transcode/src/sample_multi_transcode.cpp 277
What do you mean by "SW memory"?
Hello,
>> Unfortunately I'm unable to use "multi_transcode", because some errors occur
The correct command line would be:
./sample_multi_transcode -i::mpeg2 ../content/test_stream.mpeg2 -o::h264 out.h264 -hw
If you want to replicate your ffmpeg experiment, you need to use a par file instead of command-line arguments. Something like this:
$ ./sample_multi_transcode -par ffmpeg-1n.par
$ cat ffmpeg-1n.par
-i::mpeg2 ../content/test_stream.mpeg2 -o::sink -hw -async 1
-i::source -o::h264 out_640x512.264 -w 640 -h 512 -hw -async 1
-i::source -o::h264 out_480x384.264 -w 480 -h 384 -hw -async 1
-i::source -o::h264 out_720x576.264 -w 720 -h 576 -hw -async 1
You will also probably want to align other parameters like the GOP structure, bitrates, etc. Please refer to the sample_multi_transcode help (the -? option) and the sample manual for the list of supported encoding options (an example session line is sketched below). Let me know if you encounter any problems.
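For instance, assuming the -gop_size (GOP length in frames) and -b (bitrate in kbps) options, one session line roughly aligned to your 640x512 ffmpeg output could look like:
-i::source -o::h264 out_640x512.264 -w 640 -h 512 -gop_size 25 -b 1700 -hw -async 1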
>> As I've mentioned earlier, I'm using ffmpeg for the tests. Here is the example ffmpeg pipeline:
Unfortunately, I don't have experience with the ffmpeg support myself, so I can't say right away whether the behavior you observe is related to some mediasdk/ffmpeg integration specifics or not. I will try to find someone here at Intel who has worked on the ffmpeg integration. In the meantime, it would be helpful if you could check and confirm whether you see the problem with the mediasdk sample application - that would greatly narrow down the root cause.
Dmitry.
Hello,
Thank you for the help :). So:
SDK 2017 result:
RENDER usage: 14.00, VIDEO usage: 2.00, VIDEO_E usage: 0.00 VIDEO2 usage: 2.00 GT Freq: 1150.00
RENDER usage: 12.00, VIDEO usage: 1.00, VIDEO_E usage: 0.00 VIDEO2 usage: 1.00 GT Freq: 1150.00
RENDER usage: 14.00, VIDEO usage: 1.00, VIDEO_E usage: 0.00 VIDEO2 usage: 2.00 GT Freq: 300.00
RENDER usage: 13.00, VIDEO usage: 2.00, VIDEO_E usage: 0.00 VIDEO2 usage: 0.00 GT Freq: 1150.00
RENDER usage: 15.00, VIDEO usage: 1.00, VIDEO_E usage: 0.00 VIDEO2 usage: 2.00 GT Freq: 1100.00
RENDER usage: 13.00, VIDEO usage: 2.00, VIDEO_E usage: 0.00 VIDEO2 usage: 1.00 GT Freq: 1150.00
RENDER usage: 15.00, VIDEO usage: 0.00, VIDEO_E usage: 0.00 VIDEO2 usage: 0.00 GT Freq: 1150.00
RENDER usage: 13.00, VIDEO usage: 1.00, VIDEO_E usage: 0.00 VIDEO2 usage: 0.00 GT Freq: 1100.00
RENDER usage: 14.00, VIDEO usage: 2.00, VIDEO_E usage: 0.00 VIDEO2 usage: 0.00 GT Freq: 1150.00
RENDER usage: 15.00, VIDEO usage: 1.00, VIDEO_E usage: 0.00 VIDEO2 usage: 0.00 GT Freq: 1150.00
SDK 2016 result:
RENDER usage: 12.00, VIDEO usage: 12.00, VIDEO_E usage: 0.00 VIDEO2 usage: 5.00
RENDER usage: 12.00, VIDEO usage: 10.00, VIDEO_E usage: 0.00 VIDEO2 usage: 8.00
RENDER usage: 14.00, VIDEO usage: 12.00, VIDEO_E usage: 0.00 VIDEO2 usage: 7.00
RENDER usage: 15.00, VIDEO usage: 10.00, VIDEO_E usage: 0.00 VIDEO2 usage: 9.00
RENDER usage: 15.00, VIDEO usage: 14.00, VIDEO_E usage: 0.00 VIDEO2 usage: 9.00
RENDER usage: 13.00, VIDEO usage: 13.00, VIDEO_E usage: 0.00 VIDEO2 usage: 11.00
RENDER usage: 14.00, VIDEO usage: 12.00, VIDEO_E usage: 0.00 VIDEO2 usage: 7.00
RENDER usage: 15.00, VIDEO usage: 11.00, VIDEO_E usage: 0.00 VIDEO2 usage: 10.00
RENDER usage: 13.00, VIDEO usage: 10.00, VIDEO_E usage: 0.00 VIDEO2 usage: 9.00
RENDER usage: 12.00, VIDEO usage: 10.00, VIDEO_E usage: 0.00 VIDEO2 usage: 10.00
The par file used is:
-i::mpeg2 /root/tests/raw.mpeg2 -o::sink -fps 25 -hw -async 1
-i::source -o::h264 out_640x512.264 -w 640 -h 512 -gop_size 25 -b 1700 -hw -async 1
-i::source -o::h264 out_480x384.264 -w 480 -h 384 -gop_size 25 -b 900 -hw -async 1
-i::source -o::h264 out_720x576.264 -w 720 -h 576 -gop_size 25 -b 2500 -hw -async 1
With the ffmpeg pipeline from my previous post (over the same test input file), SDK 2017 (only one instance):
RENDER usage: 12.00, VIDEO usage: 0.00, VIDEO_E usage: 0.00 VIDEO2 usage: 3.00 GT Freq: 1150.00
RENDER usage: 11.00, VIDEO usage: 1.00, VIDEO_E usage: 0.00 VIDEO2 usage: 2.00 GT Freq: 1100.00
RENDER usage: 10.00, VIDEO usage: 1.00, VIDEO_E usage: 0.00 VIDEO2 usage: 0.00 GT Freq: 1150.00
RENDER usage: 9.00, VIDEO usage: 1.00, VIDEO_E usage: 0.00 VIDEO2 usage: 3.00 GT Freq: 1150.00
RENDER usage: 10.00, VIDEO usage: 0.00, VIDEO_E usage: 0.00 VIDEO2 usage: 0.00 GT Freq: 1100.00
RENDER usage: 10.00, VIDEO usage: 0.00, VIDEO_E usage: 0.00 VIDEO2 usage: 1.00 GT Freq: 1100.00
RENDER usage: 13.00, VIDEO usage: 2.00, VIDEO_E usage: 0.00 VIDEO2 usage: 0.00 GT Freq: 1150.00
RENDER usage: 8.00, VIDEO usage: 0.00, VIDEO_E usage: 0.00 VIDEO2 usage: 0.00 GT Freq: 1150.00
RENDER usage: 9.00, VIDEO usage: 0.00, VIDEO_E usage: 0.00 VIDEO2 usage: 1.00 GT Freq: 1150.00
RENDER usage: 11.00, VIDEO usage: 0.00, VIDEO_E usage: 0.00 VIDEO2 usage: 0.00 GT Freq: 1150.00
Hm....
What is the relation between "GT Freq" and RENDER? I've noticed that a higher frequency value sometimes means a lower RENDER usage % per instance.
Hello,
I found something. Every time the 2017 load is significantly higher than the 2016 load, some errors occur at startup. Here is an example core dump backtrace:
(gdb) where full
#0  0x00007fdb0a6bf976 in _dl_relocate_object () from /lib64/ld-linux-x86-64.so.2
No symbol table info available.
#1  0x00007fdb0a6c7b3c in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
No symbol table info available.
#2  0x00007fdb0a6c31b4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
No symbol table info available.
#3  0x00007fdb0a6c71ab in _dl_open () from /lib64/ld-linux-x86-64.so.2
No symbol table info available.
#4  0x00007fdb0a4b102b in dlopen_doit () from /lib64/libdl.so.2
No symbol table info available.
#5  0x00007fdb0a6c31b4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
No symbol table info available.
#6  0x00007fdb0a4b162d in _dlerror_run () from /lib64/libdl.so.2
No symbol table info available.
#7  0x00007fdb0a4b10c1 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
No symbol table info available.
#8  0x00007fdac14f84b0 in ?? () from /opt/intel/mediasdk/lib64/iHD_drv_video.so
No symbol table info available.
#9  0x00007fdac14cf8ce in ?? () from /opt/intel/mediasdk/lib64/iHD_drv_video.so
No symbol table info available.
#10 0x00007fdb03d8a168 in va_openDriver () from /lib64/libva.so.1
No symbol table info available.
#11 0x00007fdb03d8b048 in vaInitialize () from /lib64/libva.so.1
Or more frequently just a console message:
libva info: VA-API version 0.99.0
libva info: va_getDriverName() returns 0
libva info: User requested driver 'iHD'
libva info: Trying to open /opt/intel/mediasdk/lib64/iHD_drv_video.so
libva error: dlopen of /opt/intel/mediasdk/lib64/iHD_drv_video.so failed: /opt/intel/mediasdk/lib64/iHD_drv_video.so: undefined symbol: clock_getres, version GLIBC_2.2.5
libva info: va_openDriver() returns -1
libva info: VA-API version 0.99.0
libva info: va_getDriverName() returns 0
libva info: User requested driver 'iHD'
libva info: Trying to open /opt/intel/mediasdk/lib64/iHD_drv_video.so
libva error: dlopen of /opt/intel/mediasdk/lib64/iHD_drv_video.so failed: /opt/intel/mediasdk/lib64/iHD_drv_video.so: undefined symbol: clock_getres, version GLIBC_2.2.5
libva info: va_openDriver() returns -1
Maybe this is the root cause of my problem.
When no SIGSEGV is caught and no dlopen error occurs, the load is almost the same for SD channels with both SDK versions.
Hello,
Let me try to answer some of your questions.
>> What is the relation between "GT Freq" and RENDER?
GT Freq stands for GPU frequency. RENDER, VIDEO, VIDEO2, and VIDEO_E are parts of the GPU called GPU engines, each dedicated to specific functionality (they can also be referred to as GPGPU, VDBOX-1, VDBOX-2, and VEBOX). The fact that the GPU frequency fluctuates may mean either that you did not pin it to a specific value (recommended during benchmarking for stable results) or that you hit throttling. Mind that the CPU and GPU may not both be able to run in their turbo frequency ranges at the same time; this is highly dependent on the particular tasks you send to the GPU. Thus, you need to consider your workload and define the best strategy to negotiate between CPU and GPU frequencies. One strategy to consider for GPU-bound server workloads is to disable CPU Turbo Boost in the BIOS.
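For a quick experiment, the GPU frequency can usually be pinned through the i915 sysfs interface, and CPU turbo can be disabled through intel_pstate; for example (run as root; the card0 path is an assumption and may differ on your system):
echo 1150 > /sys/class/drm/card0/gt_max_freq_mhz
echo 1150 > /sys/class/drm/card0/gt_min_freq_mhz
cat /sys/class/drm/card0/gt_cur_freq_mhz
echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo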
>> When no SIGSEGV is caught and no dlopen error occurs, the load is almost the same for SD channels with both SDK versions.
The dlopen error "undefined symbol: clock_getres, version GLIBC_2.2.5" sounds like an environment or installation problem. With such an error the driver does not load, and I can hardly imagine how any GPU load would be possible. Maybe you have several HW components in the pipeline and the error happens for some of them but not for others. It is also possible that on this error ffmpeg simply falls back to software mode, only partly utilizing the GPU. Could you please pay attention to how you run your workloads: 1) under which user you run, and 2) which environment variables are set up when the workload fails and when it succeeds. My guess would be something like: the run is good under root, but it fails under a non-privileged user.
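A quick way to compare a failing and a working environment (run as the same user that launches the workload) might be something like:
ldd -r /opt/intel/mediasdk/lib64/iHD_drv_video.so | grep -i undefined
env | grep -E 'LIBVA|LD_LIBRARY_PATH'
vainfo
The first command checks whether the driver library resolves all of its symbols in that environment, and vainfo confirms whether the iHD driver loads at all.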
>> What do you mean by "SW memory"?
I meant system memory. That is where you may get inefficiencies in the pipeline: if you have two components, one working on the CPU and another on the GPU, they need to exchange frames, and there will be a copy operation between system memory and video memory. That is expensive. In light of your undefined-symbol error, that could be the reason: for example, if in MSS 2016 both components worked on the GPU, but in MSS 2017 the decoder failed to initialize and fell back to the CPU, we would still see GPU load, but we would also get this inefficiency.
Dmitry.