Hello,
I can run OpenVINO CPU inference on an E3950 (Ubuntu 16.04), but GPU inference is not working.
The OpenCL driver reports OpenCL 1.2, but it looks like OpenVINO needs 2.1.
[ INFO ] InferenceEngine:
API version ............ 2.1
Build .................. custom_releases/2019/R3_ac8584cb714a697a12f1f30b7a3b78a5b9ac5e05
Description ....... API
[ INFO ] Files were added: 1
[ INFO ] /opt/intel/openvino/deployment_tools/demo/car_1.bmp
[ INFO ] Loading device GPU
[ ERROR ] Failed to create plugin /opt/intel/openvino_2019.3.376/deployment_tools/inference_engine/lib/intel64/libclDNNPlugin.so for device GPU
Please, check your environment
Cannot load library '/opt/intel/openvino_2019.3.376/deployment_tools/inference_engine/lib/intel64/libclDNNPlugin.so': /opt/intel/openvino_2019.3.376/deployment_tools/inference_engine/lib/intel64/libclDNN64.so: symbol clCreateCommandQueueWithProperties, version OPENCL_2.0 not defined in file libOpenCL.so.1 with link time reference
clinfo: /usr/local/cuda-8.0/targets/x86_64-linux/lib/libOpenCL.so.1: no version information available (required by clinfo)
Number of platforms 1
Platform Name Intel(R) OpenCL HD Graphics
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 1.2
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_depth_images cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_media_block_io cl_intel_driver_diagnostics cl_intel_device_side_avc_motion_estimation cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_khr_fp64 cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_advanced_motion_estimation cl_intel_va_api_media_sharing
Platform Extensions function suffix INTEL
Platform Name Intel(R) OpenCL HD Graphics
Number of devices 1
Device Name Intel(R) Gen9 HD Graphics NEO
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 1.2 NEO
Driver Version 19.04.12237
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Profile FULL_PROFILE
Max compute units 18
Max clock frequency 650MHz
Device Partition (core)
Max number of sub-devices 0
Supported partition types None
Max work item dimensions 3
Max work item sizes 256x256x256
Max work group size 256
Preferred work group size multiple 32
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 1 / 1
half 8 / 8 (cl_khr_fp16)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (cl_khr_fp16)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 32, Little-Endian
Global memory size 3178455040 (2.96GiB)
Error Correction support No
Max memory allocation 1589227520 (1.48GiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Global Memory cache type Read/Write
Global Memory cache size 131072
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 99326720 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 4 bytes
Pitch alignment for 2D image buffers 4 bytes
Max 2D image size 16384x16384 pixels
Max 3D image size 16384x16384x2048 pixels
Max number of read image args 128
Max number of write image args 128
Local memory type Local
Local memory size 65536 (64KiB)
Max constant buffer size 1589227520 (1.48GiB)
Max number of constant args 8
Max size of kernel argument 1024
Queue properties
Out-of-order execution Yes
Profiling Yes
Prefer user sync for interop Yes
Profiling timer resolution 52ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
SPIR versions 1.2
printf() buffer size 4194304 (4MiB)
Built-in kernels block_motion_estimate_intel;block_advanced_motion_estimate_check_intel;block_advanced_motion_estimate_bidirectional_check_intel;
Motion Estimation accelerator version (Intel) 2
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_depth_images cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_media_block_io cl_intel_driver_diagnostics cl_intel_device_side_avc_motion_estimation cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_khr_fp64 cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_advanced_motion_estimation cl_intel_va_api_media_sharing
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform
clCreateContext(NULL, ...) [default] No platform
clCreateContext(NULL, ...) [other] Success [INTEL]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No platform
Is there another way to enable GPU inference?
Thanks.
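The first line of the clinfo output above is a dynamic-loader warning that names the libOpenCL.so.1 file clinfo actually loaded. A minimal sketch, assuming Python 3, of pulling those paths out of captured output; the function name `unversioned_libraries` is hypothetical and the pattern only covers warnings worded like the one in this post.

```python
import re

# Matches loader warnings of the form seen in the post:
#   <program>: <library path>: no version information available (required by <program>)
_WARNING = re.compile(
    r"^\S+: (?P<path>/\S+): no version information available \(required by [^)]+\)",
    re.MULTILINE,
)


def unversioned_libraries(output):
    """Return the library paths named in 'no version information available' warnings."""
    return [match.group("path") for match in _WARNING.finditer(output)]
```

Fed the clinfo output from this post, it returns the CUDA 8.0 copy of libOpenCL.so.1 rather than an Intel one.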
It looks like the issue may be related to a CUDA 8.0 installation:
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libOpenCL.so
Do you also have an NVIDIA GPU?
How many copies of libOpenCL.so are on the system?
Cheers,
nikos
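One way to answer the "how many copies" question is sketched below, assuming Python 3 on Linux. It walks a few directory trees for libOpenCL.so* files and tries to load each one to see whether it exports the symbol named in the error message. The function names (`find_opencl_libs`, `exports_symbol`, `report`) and the default search roots `/usr` and `/opt` are assumptions for illustration, not part of OpenVINO or any OpenCL tooling.

```python
import ctypes
import os

# Symbol named in the loader error from the original post.
SYMBOL = "clCreateCommandQueueWithProperties"


def find_opencl_libs(roots):
    """Walk each root and return the distinct libOpenCL.so* files found (symlinks resolved)."""
    found = set()
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name.startswith("libOpenCL.so"):
                    found.add(os.path.realpath(os.path.join(dirpath, name)))
    return sorted(found)


def exports_symbol(lib_path, symbol=SYMBOL):
    """Return True/False if the library exports symbol, or None if it cannot be loaded."""
    try:
        lib = ctypes.CDLL(lib_path)
    except OSError:
        return None
    # ctypes raises AttributeError for a missing symbol, so hasattr gives a yes/no answer.
    return hasattr(lib, symbol)


def report(roots=("/usr", "/opt")):
    """Print each libOpenCL.so* copy and whether it exports SYMBOL."""
    for path in find_opencl_libs(roots):
        print(path, "->", exports_symbol(path))
```

Calling `report()` prints one line per copy. A copy that prints `False` would be expected to produce the same "symbol ... not defined" failure if the loader picks it up first.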
FWIW, the E3950 has an HD505 GPU, which worked fine when I ran OpenVINO on GPU a while ago.
The HD505 also reports "Out-of-order execution Yes", so it should be fine once you fix the OpenCL environment.
nikos wrote:
Looks like the issue may have to do with some CUDA 8.0 installation
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libOpenCL.so
Do you also have an NVIDIA GPU?
How many libOpenCL.so in the system?
Cheers,
nikos
Thanks for your reply.
Yes, I also have an NVIDIA GPU, but I think the key problem may be that the OpenCL version on the E3950 is only 1.2.
How can I update OpenCL to 2.0 or 2.1?
I need to try this. I have an E3950 system I can test on at some point, but it will probably be over the weekend as I am busy with other projects.
I will update here as soon as I set it up.
Cheers,
nikos
FWIW, according to https://github.com/intel/clDNN, clDNN supports Intel® HD Graphics and Intel® Iris® Graphics, and the GPUs it is optimized for include Intel® HD Graphics 505:
---
Codename Skylake:
Intel® HD Graphics 510 (GT1, client market)
Intel® HD Graphics 515 (GT2, client market)
Intel® HD Graphics 520 (GT2, client market)
Intel® HD Graphics 530 (GT2, client market)
Intel® Iris® Graphics 540 (GT3e, client market)
Intel® Iris® Graphics 550 (GT3e, client market)
Intel® Iris® Pro Graphics 580 (GT4e, client market)
Intel® HD Graphics P530 (GT2, server market)
Intel® Iris® Pro Graphics P555 (GT3e, server market)
Intel® Iris® Pro Graphics P580 (GT4e, server market)
Codename Apollolake:
Intel® HD Graphics 500
Intel® HD Graphics 505
Hi, just to update here: I ran the latest OpenVINO release on my E3950 and benchmarked an SSD FP16 model, with no issues on the GPU device.
The CPU device runs much slower than -d GPU (as expected on this slow Atom).
My clinfo shows OpenCL 1.2, as in your case, so this is not the issue.
I believe you have an issue with a multi-GPU OpenCL installation. Please make sure the Intel libOpenCL.so is the one on your library path and try again.
Cheers,
nikos
Platform Name Intel(R) OpenCL HD Graphics
Number of devices 1
Device Name Intel(R) Gen9 HD Graphics NEO
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 1.2 NEO
Driver Version 19.13.12717
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Profile FULL_PROFILE

[ INFO ] Loading Inference Engine
[ INFO ] Device info:
GPU
clDNNPlugin version ......... 2.1
Build ........... 32974

Count: 3368 iterations
Duration: 60080.1 ms
Latency: 68.2936 ms
Throughput: 56.0585 FPS

Device: GPU
Metrics:
AVAILABLE_DEVICES : [ ]
SUPPORTED_METRICS : [ AVAILABLE_DEVICES SUPPORTED_METRICS FULL_DEVICE_NAME OPTIMIZATION_CAPABILITIES SUPPORTED_CONFIG_KEYS NUMBER_OF_WAITING_INFER_REQUESTS NUMBER_OF_EXEC_INFER_REQUESTS RANGE_FOR_ASYNC_INFER_REQUESTS RANGE_FOR_STREAMS ]
FULL_DEVICE_NAME : Intel(R) Gen9 HD Graphics
OPTIMIZATION_CAPABILITIES : [ FP32 BIN FP16 ]
SUPPORTED_CONFIG_KEYS : [ CLDNN_INT8_ENABLED CLDNN_MEM_POOL CLDNN_PLUGIN_PRIORITY CLDNN_PLUGIN_THROTTLE DUMP_KERNELS DYN_BATCH_ENABLED EXCLUSIVE_ASYNC_REQUESTS GPU_THROUGHPUT_STREAMS PERF_COUNT TUNING_MODE ]
NUMBER_OF_WAITING_INFER_REQUESTS : 0
NUMBER_OF_EXEC_INFER_REQUESTS : 0
RANGE_FOR_ASYNC_INFER_REQUESTS : { 1, 2, 1 }
RANGE_FOR_STREAMS : { 1, 2 }
Default values for device configuration keys:
CLDNN_INT8_ENABLED : NO
CLDNN_MEM_POOL : YES
CLDNN_PLUGIN_PRIORITY : 0
CLDNN_PLUGIN_THROTTLE : 0
DUMP_KERNELS : NO
DYN_BATCH_ENABLED : NO
EXCLUSIVE_ASYNC_REQUESTS : NO
GPU_THROUGHPUT_STREAMS : 1
PERF_COUNT : NO
TUNING_MODE : TUNING_DISABLED
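To illustrate why the order of library directories matters here, the sketch below (assuming Python 3) returns the first copy of a shared library found along a colon-separated search path such as `LD_LIBRARY_PATH`. This is a deliberate simplification: the real dynamic loader also consults RPATH/RUNPATH entries and the ld.so cache, so treat the result as a hint only. The function name `first_library_on_path` is hypothetical.

```python
import os


def first_library_on_path(soname, search_path):
    """Return the first existing <dir>/<soname> along a path-separated list of directories.

    Simplified model of loader lookup: directories are tried left to right and
    the first hit wins. Returns None if no directory contains soname.
    """
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        candidate = os.path.join(directory, soname)
        if os.path.exists(candidate):
            return candidate
    return None
```

For example, `first_library_on_path("libOpenCL.so.1", os.environ.get("LD_LIBRARY_PATH", ""))` shows which copy an `LD_LIBRARY_PATH` lookup alone would reach first. If a CUDA directory is listed ahead of the Intel one, the CUDA copy shadows it, which would match the clinfo warning in the first post.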
Just to add some more information, here is the GPU load while running inference on the E3950 HD505 GPU (captured with the intel_gpu_top command):
render busy: 94%: ██████████████████▉ render space: 122/16384
task percent busy
CS: 94%: ██████████████████▉ vert fetch: 0 (0/sec)
GAM: 91%: ██████████████████▎ prim fetch: 0 (0/sec)
TSG: 88%: █████████████████▋ VS invocations: 0 (0/sec)
VFE: 80%: ████████████████ GS invocations: 0 (0/sec)
GAFS: 10%: ██ GS prims: 0 (0/sec)
TDG: 5%: █ CL invocations: 0 (0/sec)
SF: 1%: ▎ CL prims: 0 (0/sec)
VS: 1%: ▎ PS invocations: 0 (0/sec)
URBM: 1%: ▎ PS depth pass: 0 (0/sec)
SVG: 0%:
VF: 0%:
CL: 0%:
SDE: 0%:
GAFM: 0%:
Let us know if you need any more help to resolve the OpenCL environment issue.