Solved: How to run openvino GPU inference on E3950?

Yannis__James · 01-07-2020

Hello,

I can run OpenVINO inference on the CPU of an E3950, but GPU inference is not working (Ubuntu 16.04).

The OpenCL runtime supports OpenCL 1.2, but OpenVINO needs 2.1:

[ INFO ] InferenceEngine:
   API version ............ 2.1
   Build .................. custom_releases/2019/R3_ac8584cb714a697a12f1f30b7a3b78a5b9ac5e05
   Description ....... API
[ INFO ] Files were added: 1
[ INFO ]     /opt/intel/openvino/deployment_tools/demo/car_1.bmp
[ INFO ] Loading device GPU
[ ERROR ] Failed to create plugin /opt/intel/openvino_2019.3.376/deployment_tools/inference_engine/lib/intel64/libclDNNPlugin.so for device GPU
Please, check your environment
Cannot load library '/opt/intel/openvino_2019.3.376/deployment_tools/inference_engine/lib/intel64/libclDNNPlugin.so': /opt/intel/openvino_2019.3.376/deployment_tools/inference_engine/lib/intel64/libclDNN64.so: symbol clCreateCommandQueueWithProperties, version OPENCL_2.0 not defined in file libOpenCL.so.1 with link time reference
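The error says that the libOpenCL.so.1 which got loaded does not provide clCreateCommandQueueWithProperties, an OpenCL 2.0 entry point. That can be tested directly. Below is a minimal Python sketch; exports_symbol is a hypothetical helper. Because ctypes looks symbols up with dlsym(), it ignores ELF symbol versions such as OPENCL_2.0, so it only approximates the dynamic linker's check.

```python
import ctypes


def exports_symbol(lib_name, symbol):
    """Return True/False depending on whether lib_name exports symbol,
    or None if the library cannot be loaded at all.

    ctypes resolves names with dlsym(), which ignores ELF symbol versions
    (e.g. OPENCL_2.0), so this only approximates the dynamic linker's check.
    """
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        return None  # not found or not loadable
    # Attribute access on a CDLL calls dlsym() and raises AttributeError if absent.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    # Check the default soname first: once a copy with soname libOpenCL.so.1 is
    # loaded, later lookups by that soname return the same already-loaded copy.
    candidates = [
        "libOpenCL.so.1",  # whatever the dynamic linker picks by default
        "/usr/local/cuda-8.0/targets/x86_64-linux/lib/libOpenCL.so.1",  # path from clinfo's warning
    ]
    for lib_name in candidates:
        print(lib_name, exports_symbol(lib_name, "clCreateCommandQueueWithProperties"))
```

If the CUDA copy prints False while another loader prints True, that points at a loader problem rather than at the GPU itself.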

clinfo: /usr/local/cuda-8.0/targets/x86_64-linux/lib/libOpenCL.so.1: no version information available (required by clinfo)
Number of platforms                               1
Platform Name                                   Intel(R) OpenCL HD Graphics
Platform Vendor                                 Intel(R) Corporation
Platform Version                                OpenCL 1.2
Platform Profile                                FULL_PROFILE
Platform Extensions                             cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_depth_images cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_media_block_io cl_intel_driver_diagnostics cl_intel_device_side_avc_motion_estimation cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_khr_fp64 cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_advanced_motion_estimation cl_intel_va_api_media_sharing
Platform Extensions function suffix             INTEL

Platform Name                                   Intel(R) OpenCL HD Graphics
Number of devices                                 1
Device Name                                     Intel(R) Gen9 HD Graphics NEO
Device Vendor                                   Intel(R) Corporation
Device Vendor ID                                0x8086
Device Version                                  OpenCL 1.2 NEO
Driver Version                                  19.04.12237
Device OpenCL C Version                         OpenCL C 1.2
Device Type                                     GPU
Device Profile                                  FULL_PROFILE
Max compute units                               18
Max clock frequency                             650MHz
Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
Max work item dimensions                        3
Max work item sizes                             256x256x256
Max work group size                             256
Preferred work group size multiple              32
Preferred / native vector sizes
    char                                                16 / 16
    short                                                8 / 8
    int                                                  4 / 4
    long                                                 1 / 1
    half                                                 8 / 8        (cl_khr_fp16)
    float                                                1 / 1
    double                                               1 / 1        (cl_khr_fp64)
Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations No
Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations No
Address bits                                    32, Little-Endian
Global memory size                              3178455040 (2.96GiB)
Error Correction support                        No
Max memory allocation                           1589227520 (1.48GiB)
Unified memory for Host and Device              Yes
Minimum alignment for any data type             128 bytes
Alignment of base address                       1024 bits (128 bytes)
Global Memory cache type                        Read/Write
Global Memory cache size                        131072
Global Memory cache line                        64 bytes
Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            99326720 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   4 bytes
    Pitch alignment for 2D image buffers          4 bytes
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             16384x16384x2048 pixels
    Max number of read image args                 128
    Max number of write image args                128
Local memory type                               Local
Local memory size                               65536 (64KiB)
Max constant buffer size                        1589227520 (1.48GiB)
Max number of constant args                     8
Max size of kernel argument                     1024
Queue properties
    Out-of-order execution                        Yes
    Profiling                                     Yes
Prefer user sync for interop                    Yes
Profiling timer resolution                      52ns
Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    SPIR versions                                 1.2
printf() buffer size                            4194304 (4MiB)
Built-in kernels                                block_motion_estimate_intel;block_advanced_motion_estimate_check_intel;block_advanced_motion_estimate_bidirectional_check_intel;
Motion Estimation accelerator version   (Intel)   2
Device Available                                Yes
Compiler Available                              Yes
Linker Available                                Yes
Device Extensions                               cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_depth_images cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_media_block_io cl_intel_driver_diagnostics cl_intel_device_side_avc_motion_estimation cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_khr_fp64 cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_advanced_motion_estimation cl_intel_va_api_media_sharing

NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
clCreateContext(NULL, ...) [default]            No platform
clCreateContext(NULL, ...) [other]              Success [INTEL]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No platform
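The first line of that clinfo output shows the OpenCL loader being picked up from /usr/local/cuda-8.0. To see which libOpenCL.so.1 a given binary resolves under the current environment, you can parse ldd output. ldd honours LD_LIBRARY_PATH, so it reflects the same lookup the program would perform. Below is a minimal Python sketch; parse_ldd and opencl_loader_for are hypothetical helpers.

```python
import shutil
import subprocess


def parse_ldd(output):
    """Map each soname in `ldd` output to the path it resolves to (None if not found)."""
    deps = {}
    for line in output.splitlines():
        if "=>" not in line:
            continue  # vdso and the dynamic loader itself have no "=>"
        name, _, rest = line.strip().partition(" => ")
        target = rest.split(" (")[0].strip()  # drop the "(0x...)" load address
        deps[name] = None if target == "not found" else target
    return deps


def opencl_loader_for(binary="clinfo"):
    """Return the libOpenCL.so.1 path that the dynamic linker picks for binary."""
    path = binary if "/" in binary else shutil.which(binary)
    if path is None:
        return None
    out = subprocess.run(["ldd", path], capture_output=True, text=True).stdout
    return parse_ldd(out).get("libOpenCL.so.1")
```

On the machine above, opencl_loader_for("clinfo") would be expected to return the cuda-8.0 path. You can also pass the full path of libclDNN64.so from the error message to check the plugin itself.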

Is there another way to activate GPU inference?

Thanks.

nikos1 · 01-08-2020

It looks like the issue may be caused by the CUDA 8.0 installation:

/usr/local/cuda-8.0/targets/x86_64-linux/lib/libOpenCL.so

Do you also have an NVIDIA GPU?

How many copies of libOpenCL.so are on the system?
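One way to answer the "how many copies" question is to walk the usual install prefixes and list every libOpenCL.so* together with the file its symlinks point to. Below is a minimal Python sketch. It assumes the copies live under /usr, /opt, or /lib (the CUDA copy above is under /usr/local). find_opencl_loaders is a hypothetical helper.

```python
import os


def find_opencl_loaders(roots=("/usr", "/opt", "/lib")):
    """Return sorted (path, resolved path) pairs for every libOpenCL.so* under roots."""
    hits = set()
    for root in roots:
        # onerror swallows permission errors from unreadable directories
        for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
            for name in filenames:
                if name.startswith("libOpenCL.so"):
                    path = os.path.join(dirpath, name)
                    hits.add((path, os.path.realpath(path)))
    return sorted(hits)
```

Printing each pair shows which entries are real loaders and which are only symlinks. More than one distinct resolved path means several OpenCL loaders are installed side by side.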

Cheers,

nikos

nikos1 · 01-08-2020

FWIW, the E3950 has an HD Graphics 505 GPU, which worked fine when I ran OpenVINO on the GPU a while ago.

The HD505 also reports "Out-of-order execution: Yes", as in your clinfo output, so it should work once you fix the OpenCL environment.
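To compare a field such as Out-of-order execution between machines without reading the full dump, you can turn the clinfo text into a dictionary. Below is a minimal Python sketch. It assumes clinfo's default human-readable layout, where a run of two or more spaces separates a field name from its value. A few long names, such as "Correctly-rounded divide and sqrt operations", are followed by only a single space, so they are skipped. clinfo_fields is a hypothetical helper.

```python
import re


def clinfo_fields(text):
    """Parse `clinfo` output into {field name: value}, keeping the first occurrence.

    Names are padded with runs of spaces, so a gap of 2+ spaces separates name
    and value. Indentation of nested lines (e.g. under "Queue properties") is dropped.
    """
    fields = {}
    for line in text.splitlines():
        parts = re.split(r"\s{2,}", line.strip(), maxsplit=1)
        if len(parts) == 2:
            fields.setdefault(parts[0], parts[1].strip())
    return fields
```

Feeding it the stdout of clinfo (for example from subprocess.run(["clinfo"], capture_output=True, text=True)) and looking up "Out-of-order execution" gives "Yes" for the device dumped at the top of this thread.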

Yannis__James · 01-08-2020

nikos wrote:
Looks like the issue may have to do with some CUDA 8.0 installation
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libOpenCL.so
Do you also have an NVIDIA GPU?
How many libOpenCL.so in the system?
Cheers,
nikos

Thanks for your reply.

Yes, I also have an NVIDIA GPU, but I think the key problem may be that the OpenCL version on the E3950 is only 1.2.

How can I update OpenCL to 2.0 or 2.1?
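Before trying to upgrade, it may help to separate the version numbers involved. The 2.1 in the log above is the Inference Engine API version. The missing symbol is tagged OPENCL_2.0 in the loader library. clinfo's device version is a third, separate number. The OpenCL specification formats CL_PLATFORM_VERSION and CL_DEVICE_VERSION as "OpenCL <major>.<minor> <vendor-specific information>", so a minimal Python parser could look like the sketch below. opencl_version is a hypothetical helper.

```python
import re


def opencl_version(version_string):
    """Parse a CL_PLATFORM_VERSION / CL_DEVICE_VERSION string into (major, minor).

    The expected format is "OpenCL <major>.<minor> <vendor-specific info>".
    Returns None if the string does not match, e.g. for "OpenCL C 1.2".
    """
    m = re.match(r"OpenCL (\d+)\.(\d+)", version_string.strip())
    return (int(m.group(1)), int(m.group(2))) if m else None
```

It returns (1, 2) for the "OpenCL 1.2 NEO" strings in both clinfo dumps in this thread. It returns None for the separate "OpenCL C 1.2" language-version line, which uses a different format.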

nikos1 · 01-08-2020

I need to try this. I have an E3950 system I can test on, but it will probably be over the weekend, as I am busy with other projects.

Will update here as soon as I set it up.

Cheers,

nikos

nikos1 · 01-08-2020

FWIW, according to the clDNN repository, clDNN supports Intel® HD Graphics and Intel® Iris® Graphics and is optimized for Intel® HD Graphics 505, among others:

https://github.com/intel/clDNN

---

Codename Skylake:

Intel® HD Graphics 510 (GT1, client market)

Intel® HD Graphics 515 (GT2, client market)

Intel® HD Graphics 520 (GT2, client market)

Intel® HD Graphics 530 (GT2, client market)

Intel® Iris® Graphics 540 (GT3e, client market)

Intel® Iris® Graphics 550 (GT3e, client market)

Intel® Iris® Pro Graphics 580 (GT4e, client market)

Intel® HD Graphics P530 (GT2, server market)

Intel® Iris® Pro Graphics P555 (GT3e, server market)

Intel® Iris® Pro Graphics P580 (GT4e, server market)

Codename Apollolake:

Intel® HD Graphics 500

Intel® HD Graphics 505

nikos1 · 01-11-2020

Hi, just to update here: I ran the latest OpenVINO release on my E3950 and benchmarked an SSD FP16 model on the GPU device with no issues.

The CPU device ran much slower than -d GPU, as expected on this slow Atom.

My clinfo shows OpenCL 1.2, as in your case, so that is not the issue.

I believe the problem is your multi-GPU OpenCL installation. Please make sure the Intel OpenCL .so is the one found in your library path and try again.
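Because the CUDA copy appears to be shadowing the other loader, one option is to launch OpenVINO with an LD_LIBRARY_PATH that drops the CUDA directories for that process only. Below is a minimal Python sketch; clean_ld_library_path is a hypothetical helper. It assumes the CUDA directory reaches the lookup through LD_LIBRARY_PATH; if it comes from an /etc/ld.so.conf.d entry instead, it has to be fixed there. Using /usr/lib/x86_64-linux-gnu as the preferred directory is also an assumption, based on where Ubuntu's ocl-icd-libopencl1 package installs libOpenCL.so.1.

```python
import os


def clean_ld_library_path(current, drop_substring="cuda", prefer=None):
    """Build an LD_LIBRARY_PATH value without entries containing drop_substring.

    If prefer is given it is put first, so a libOpenCL.so.1 in that directory
    is found before any other copy on the path.
    """
    entries = [p for p in (current or "").split(os.pathsep) if p and drop_substring not in p]
    if prefer:
        entries = [prefer] + [p for p in entries if p != prefer]
    return os.pathsep.join(entries)
```

For example, you could pass env=dict(os.environ, LD_LIBRARY_PATH=clean_ld_library_path(os.environ.get("LD_LIBRARY_PATH"), prefer="/usr/lib/x86_64-linux-gnu")) to subprocess.run when starting clinfo or benchmark_app. This leaves CUDA applications elsewhere on the system untouched.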

Cheers,

nikos

  Platform Name                                   Intel(R) OpenCL HD Graphics
Number of devices                                 1
  Device Name                                     Intel(R) Gen9 HD Graphics NEO
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 1.2 NEO 
  Driver Version                                  19.13.12717
  Device OpenCL C Version                         OpenCL C 1.2 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE

[ INFO ] Loading Inference Engine
[ INFO ] Device info:
         GPU
         clDNNPlugin version ......... 2.1
         Build ........... 32974

Count:      3368 iterations
Duration:   60080.1 ms
Latency:    68.2936 ms
Throughput: 56.0585 FPS
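As a sanity check on these numbers, throughput is simply iterations divided by wall time. Multiplying throughput by latency (Little's law) gives a rough idea of how many inference requests were in flight. Below is a small Python sketch. The concurrency estimate assumes the reported latency is close to the mean latency; that is an assumption, not something benchmark_app states here.

```python
# Figures copied from the benchmark_app run above.
count = 3368            # iterations
duration_ms = 60080.1   # total wall time in milliseconds
latency_ms = 68.2936    # reported latency in milliseconds

throughput_fps = count / (duration_ms / 1000.0)

# Little's law: requests in flight = arrival rate x time in system.
# Rough only: assumes the reported latency is close to the mean.
in_flight = throughput_fps * latency_ms / 1000.0

print(f"throughput = {throughput_fps:.4f} FPS")  # throughput = 56.0585 FPS
print(f"~{in_flight:.1f} requests in flight")    # ~3.8 requests in flight
```

The first line reproduces the reported 56.0585 FPS. The second (about 3.8) suggests that roughly four requests were running in parallel, although the actual request count is not shown in this output.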

	Device: GPU
	Metrics: 
		AVAILABLE_DEVICES : [  ]
		SUPPORTED_METRICS : [ AVAILABLE_DEVICES SUPPORTED_METRICS FULL_DEVICE_NAME OPTIMIZATION_CAPABILITIES SUPPORTED_CONFIG_KEYS NUMBER_OF_WAITING_INFER_REQUESTS NUMBER_OF_EXEC_INFER_REQUESTS RANGE_FOR_ASYNC_INFER_REQUESTS RANGE_FOR_STREAMS ]
		FULL_DEVICE_NAME : Intel(R) Gen9 HD Graphics
		OPTIMIZATION_CAPABILITIES : [ FP32 BIN FP16 ]
		SUPPORTED_CONFIG_KEYS : [ CLDNN_INT8_ENABLED CLDNN_MEM_POOL CLDNN_PLUGIN_PRIORITY CLDNN_PLUGIN_THROTTLE DUMP_KERNELS DYN_BATCH_ENABLED EXCLUSIVE_ASYNC_REQUESTS GPU_THROUGHPUT_STREAMS PERF_COUNT TUNING_MODE ]
		NUMBER_OF_WAITING_INFER_REQUESTS : 0
		NUMBER_OF_EXEC_INFER_REQUESTS : 0
		RANGE_FOR_ASYNC_INFER_REQUESTS : { 1, 2, 1 }
		RANGE_FOR_STREAMS : { 1, 2 }
	Default values for device configuration keys: 
		CLDNN_INT8_ENABLED : NO
		CLDNN_MEM_POOL : YES
		CLDNN_PLUGIN_PRIORITY : 0
		CLDNN_PLUGIN_THROTTLE : 0
		DUMP_KERNELS : NO
		DYN_BATCH_ENABLED : NO
		EXCLUSIVE_ASYNC_REQUESTS : NO
		GPU_THROUGHPUT_STREAMS : 1
		PERF_COUNT : NO
		TUNING_MODE : TUNING_DISABLED

nikos1 · 01-12-2020

Just to add some more information, here is the GPU load while running inference on the E3950's HD505 GPU (captured with the intel_gpu_top command).

Let us know if you need any more help to resolve the OpenCL environment issue.

                   render busy:  94%: ██████████████████▉                    render space: 122/16384
                          task  percent busy
                            CS:  94%: ██████████████████▉     vert fetch: 0 (0/sec)
                           GAM:  91%: ██████████████████▎     prim fetch: 0 (0/sec)
                           TSG:  88%: █████████████████▋   VS invocations: 0 (0/sec)
                           VFE:  80%: ████████████████     GS invocations: 0 (0/sec)
                          GAFS:  10%: ██                        GS prims: 0 (0/sec)
                           TDG:   5%: █                    CL invocations: 0 (0/sec)
                            SF:   1%: ▎                         CL prims: 0 (0/sec)
                            VS:   1%: ▎                    PS invocations: 0 (0/sec)
                          URBM:   1%: ▎                    PS depth pass: 0 (0/sec)
                           SVG:   0%:                      
                            VF:   0%:                      
                            CL:   0%:                      
                           SDE:   0%:                      
                          GAFM:   0%:
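If you want to log GPU utilization alongside a benchmark instead of watching it live, you can extract the per-unit percentages from a text snapshot like the one above. Below is a minimal Python sketch. It assumes the older intel_gpu_top text layout shown here, since newer versions of the tool print a different format. gpu_busy is a hypothetical helper.

```python
import re

# Matches lines such as "CS:  94%: ..." at the start of intel_gpu_top's text view;
# the right-hand columns (vert fetch, VS invocations, ...) are ignored.
_UNIT = re.compile(r"^\s*([A-Za-z ]+?):\s+(\d+)%")


def gpu_busy(text):
    """Return {unit: percent busy} parsed from an intel_gpu_top text snapshot."""
    busy = {}
    for line in text.splitlines():
        m = _UNIT.match(line)
        if m:
            busy[m.group(1).strip()] = int(m.group(2))
    return busy
```

On the snapshot above it would report, for example, 94 for both "render busy" and "CS", which indicates that the inference really is running on the GPU.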
