Hello,
I am trying to get the oneDNN examples to run. Everything builds fine, but I get the following error when I try to run the final executable:
Error in the example: Native API failed. Native API returns: -999 (Unknown OpenCL error code) -999 (Unknown OpenCL error code).
Example failed on CPU.
This happens when the program is run on the CPU only. When I try to run it on the GPU, it fails silently: it returns no error but stops executing. By modifying the `simple_model` example, I found that it exits while executing the following line at the start of the simple_net() function:
engine eng(engine_kind, 0);
First, I checked that my GPU drivers include OpenCL (they do), and then downloaded and installed the CPU runtime for OpenCL. The issue still persists.
I installed the oneAPI Base Toolkit rather than individual components, so I am unsure where the issue lies. My system information is as follows:
OS: Windows 10 Home - 19043.985
CPU: Intel(R) Core(TM) i5-7200U CPU, 2.5 GHz-2.7 GHz
GPU: Intel(R) HD Graphics 620
GPU driver version: 27.20.100.8854 (OpenGL stopped working when I installed 30.*.*.*)
Toolkit version: Base Toolkit 2021.2.0.2871
Unrelated to the above: is there a way to use the GPU from Python? If so, could you point me to the right resource for getting it running? Also, is there a way to run oneDNN in Python standalone? AFAIK, oneDNN can be used from Python only as a backend to popular frameworks like PyTorch and TensorFlow. Please correct me if I am wrong.
Hi,
Thank you for reaching out and providing the necessary information.
1. For the OpenCL error that you’re getting on the CPU, please share the output of the following checks:
o The output of sycl-ls or clinfo, to confirm that both the CPU and the GPU are detected.
o Also try running any simple dpcpp application, to confirm that the installation is correct.
2. We were able to run the simple_model sample; it executed without any errors. Please make sure you follow the same steps we used to run the sample on Windows:
o Open a oneAPI command prompt (setvars.bat is already sourced there).
o Run the command below in the folder where you want the project to be created:
oneapi-cli
o Select Create a project -> cpp -> Toolkit -> oneAPI Libraries -> oneDNN -> simple_model
o Go to the simple_model folder and build the sample:
cd simple_model
mkdir build
cd build
cmake -G Ninja ..
cmake --build .
o To execute the sample, run bin\cnn-inference-f32-cpp.exe (CPU) or bin\cnn-inference-f32-cpp.exe gpu (GPU).
Sample output (GPU):
Use time: 182.05 ms per iteration.
Example passed on GPU.
Sample output (CPU):
Use time: 27.94 ms per iteration.
Example passed on CPU.
3. Is there a way to use the GPU from Python? If so, can you point to the right resource for getting it running? Also, is there a way to run oneDNN in Python standalone?
o We’ll discuss this with the internal team and get back to you.
4. AFAIK, oneDNN can be used from Python only as a backend to popular frameworks like PyTorch and TensorFlow. Please correct me if I am wrong.
o Yes, you are right. TensorFlow has been directly optimized for Intel® architecture using the primitives of Intel® oneAPI Deep Neural Network Library (oneDNN) to maximize performance.
Regards
Gopika
C:\Program Files (x86)\Intel\oneAPI>clinfo Number of platforms: 5 Platform Profile: EMBEDDED_PROFILE Platform Version: OpenCL 1.2 Intel(R) FPGA SDK for OpenCL(TM), Version 20.3 Platform Name: Intel(R) FPGA Emulation Platform for OpenCL(TM) Platform Vendor: Intel(R) Corporation Platform Extensions: cl_khr_icd cl_khr_byte_addressable_store cl_intel_fpga_host_pipe cles_khr_int64 cl_khr_il_program cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics Platform Profile: FULL_PROFILE Platform Version: OpenCL 2.1 WINDOWS Platform Name: Intel(R) OpenCL Platform Vendor: Intel(R) Corporation Platform Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_khr_il_program cl_intel_unified_shared_memory_preview cl_intel_subgroups cl_intel_subgroups_char cl_intel_subgroups_short cl_intel_subgroups_long cl_intel_spirv_subgroups cl_intel_required_subgroup_size cl_intel_exec_by_local_thread cl_intel_vec_len_hint cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer Platform Profile: FULL_PROFILE Platform Version: OpenCL 2.1 Platform Name: Intel(R) OpenCL HD Graphics Platform Vendor: Intel(R) Corporation Platform Extensions: cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_fp64 cl_khr_subgroups 
cl_intel_spirv_device_side_avc_motion_estimation cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory_preview cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_intel_media_block_io cl_khr_3d_image_writes cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_intel_dx9_media_sharing cl_khr_dx9_media_sharing cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_intel_d3d11_nv12_media_sharing cl_intel_unified_sharing cl_intel_simultaneous_sharing Platform Profile: FULL_PROFILE Platform Version: OpenCL 2.1 AMD-APP (3240.6) Platform Name: AMD Accelerated Parallel Processing Platform Vendor: Advanced Micro Devices, Inc. Platform Extensions: cl_khr_icd cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_event_callback cl_amd_offline_devices Platform Profile: FULL_PROFILE Platform Version: OpenCL 2.1 WINDOWS Platform Name: Intel(R) OpenCL Platform Vendor: Intel(R) Corporation Platform Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_khr_il_program cl_intel_unified_shared_memory_preview cl_intel_subgroups cl_intel_subgroups_char cl_intel_subgroups_short cl_intel_subgroups_long cl_intel_spirv_subgroups cl_intel_required_subgroup_size cl_intel_exec_by_local_thread cl_intel_vec_len_hint cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer Platform Name: Intel(R) FPGA Emulation Platform for OpenCL(TM) Number of devices: 1 Device Type: CL_DEVICE_TYPE_ACCRLERATOR Vendor ID: 1172h Max 
compute units: 4 Max work items dimensions: 3 Max work items[0]: 67108864 Max work items[1]: 67108864 Max work items[2]: 67108864 Max work group size: 67108864 Preferred vector width char: 1 Preferred vector width short: 1 Preferred vector width int: 1 Preferred vector width long: 1 Preferred vector width float: 1 Preferred vector width double: 1 Native vector width char: 32 Native vector width short: 16 Native vector width int: 8 Native vector width long: 4 Native vector width float: 8 Native vector width double: 4 Max clock frequency: 2500Mhz Address bits: 64 Max memory allocation: 3186008064 Image support: No Max size of kernel argument: 3840 Alignment (bits) of base address: 1024 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: Yes Quiet NaNs: Yes Round to nearest even: Yes Round to zero: No Round to +ve and infinity: No IEEE754-2008 fused multiply-add: No Cache type: Read/Write Cache line size: 64 Cache size: 262144 Global memory size: 12744032256 Constant buffer size: 131072 Max number of constant args: 480 Local memory type: Global Local memory size: 262144 Kernel Preferred work group size multiple: 128 Error correction support: 0 Unified memory for Host and Device: 1 Profiling timer resolution: 100 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: Yes Queue on Host properties: Out-of-Order: Yes Profiling : Yes Platform ID: 0000019A3462EFB8 Name: Intel(R) FPGA Emulation Device Vendor: Intel(R) Corporation Device OpenCL C version: OpenCL C 1.2 Driver version: 2021.11.3.0.17_160000 Profile: EMBEDDED_PROFILE Version: OpenCL 1.2 Extensions: cl_khr_icd cl_khr_byte_addressable_store cl_intel_fpga_host_pipe cles_khr_int64 cl_khr_il_program cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics Platform Name: Intel(R) OpenCL Number of devices: 1 
Device Type: CL_DEVICE_TYPE_CPU Vendor ID: 8086h Max compute units: 4 Max work items dimensions: 3 Max work items[0]: 8192 Max work items[1]: 8192 Max work items[2]: 8192 Max work group size: 8192 Preferred vector width char: 1 Preferred vector width short: 1 Preferred vector width int: 1 Preferred vector width long: 1 Preferred vector width float: 1 Preferred vector width double: 1 Native vector width char: 32 Native vector width short: 16 Native vector width int: 8 Native vector width long: 4 Native vector width float: 8 Native vector width double: 4 Max clock frequency: 2500Mhz Address bits: 64 Max memory allocation: 3186008064 Image support: Yes Max number of images read arguments: 480 Max number of images write arguments: 480 Max image 2D width: 16384 Max image 2D height: 16384 Max image 3D width: 2048 Max image 3D height: 2048 Max image 3D depth: 2048 Max samplers within kernel: 480 Max size of kernel argument: 3840 Alignment (bits) of base address: 1024 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: Yes Quiet NaNs: Yes Round to nearest even: Yes Round to zero: No Round to +ve and infinity: No IEEE754-2008 fused multiply-add: No Cache type: Read/Write Cache line size: 64 Cache size: 262144 Global memory size: 12744032256 Constant buffer size: 131072 Max number of constant args: 480 Local memory type: Global Local memory size: 32768 Max pipe arguments: 16 Max pipe active reservations: 65535 Max pipe packet size: 1024 Max global variable size: 65536 Max global variable preferred total size: 65536 Max read/write image args: 480 Max on device events: 4294967295 Queue on device max size: 4294967295 Max on device queues: 4294967295 Queue on device preferred size: 4294967295 SVM capabilities: Coarse grain buffer: Yes Fine grain buffer: Yes Fine grain system: Yes Atomics: Yes Preferred platform atomic alignment: 64 Preferred global atomic alignment: 64 Preferred local atomic alignment: 0 Kernel Preferred work group 
size multiple: 128 Error correction support: 0 Unified memory for Host and Device: 1 Profiling timer resolution: 100 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: Yes Queue on Host properties: Out-of-Order: Yes Profiling : Yes Queue on Device properties: Out-of-Order: Yes Profiling : Yes Platform ID: 0000019A34656728 Name: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz Vendor: Intel(R) Corporation Device OpenCL C version: OpenCL C 2.0 Driver version: 2021.11.3.0.17_160000 Profile: FULL_PROFILE Version: OpenCL 2.1 (Build 0) Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_khr_il_program cl_intel_unified_shared_memory_preview cl_intel_subgroups cl_intel_subgroups_char cl_intel_subgroups_short cl_intel_subgroups_long cl_intel_spirv_subgroups cl_intel_required_subgroup_size cl_intel_exec_by_local_thread cl_intel_vec_len_hint cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer Platform Name: Intel(R) OpenCL HD Graphics Number of devices: 1 Device Type: CL_DEVICE_TYPE_GPU Vendor ID: 8086h Max compute units: 24 Max work items dimensions: 3 Max work items[0]: 256 Max work items[1]: 256 Max work items[2]: 256 Max work group size: 256 Preferred vector width char: 16 Preferred vector width short: 8 Preferred vector width int: 4 Preferred vector width long: 1 Preferred vector width float: 1 Preferred vector width double: 1 Native vector width char: 16 Native vector width short: 8 Native vector width int: 4 Native vector width long: 1 Native vector width float: 1 Native vector width double: 1 Max clock frequency: 1000Mhz Address bits: 64 Max memory allocation: 2548805632 Image support: Yes Max number of images read arguments: 128 Max number of images 
write arguments: 128 Max image 2D width: 16384 Max image 2D height: 16384 Max image 3D width: 16384 Max image 3D height: 16384 Max image 3D depth: 2048 Max samplers within kernel: 16 Max size of kernel argument: 2048 Alignment (bits) of base address: 1024 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: Yes Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: Yes Cache type: Read/Write Cache line size: 64 Cache size: 524288 Global memory size: 5097611264 Constant buffer size: 2548805632 Max number of constant args: 8 Local memory type: Scratchpad Local memory size: 65536 Max pipe arguments: 16 Max pipe active reservations: 1 Max pipe packet size: 1024 Max global variable size: 65536 Max global variable preferred total size: 2548805632 Max read/write image args: 128 Max on device events: 1024 Queue on device max size: 67108864 Max on device queues: 1 Queue on device preferred size: 131072 SVM capabilities: Coarse grain buffer: Yes Fine grain buffer: Yes Fine grain system: No Atomics: Yes Preferred platform atomic alignment: 64 Preferred global atomic alignment: 64 Preferred local atomic alignment: 64 Kernel Preferred work group size multiple: 32 Error correction support: 0 Unified memory for Host and Device: 1 Profiling timer resolution: 83 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: No Queue on Host properties: Out-of-Order: Yes Profiling : Yes Queue on Device properties: Out-of-Order: Yes Profiling : Yes Platform ID: 0000019A3467F070 Name: Intel(R) HD Graphics 620 Vendor: Intel(R) Corporation Device OpenCL C version: OpenCL C 2.0 Driver version: 27.20.100.8854 Profile: FULL_PROFILE Version: OpenCL 2.1 NEO Extensions: cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd 
cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_fp64 cl_khr_subgroups cl_intel_spirv_device_side_avc_motion_estimation cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory_preview cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_intel_media_block_io cl_khr_3d_image_writes cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_intel_dx9_media_sharing cl_khr_dx9_media_sharing cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_intel_d3d11_nv12_media_sharing cl_intel_unified_sharing cl_intel_simultaneous_sharing Platform Name: AMD Accelerated Parallel Processing Number of devices: 1 Device Type: CL_DEVICE_TYPE_GPU Vendor ID: 1002h Board name: AMD Radeon (TM) R5 M330 Device Topology: PCI[ B#1, D#0, F#0 ] Max compute units: 5 Max work items dimensions: 3 Max work items[0]: 1024 Max work items[1]: 1024 Max work items[2]: 1024 Max work group size: 256 Preferred vector width char: 4 Preferred vector width short: 2 Preferred vector width int: 1 Preferred vector width long: 1 Preferred vector width float: 1 Preferred vector width double: 1 Native vector width char: 4 Native vector width short: 2 Native vector width int: 1 Native vector width long: 1 Native vector width float: 1 Native vector width double: 1 Max clock frequency: 400Mhz Address bits: 32 Max memory allocation: 1597190963 Image support: Yes Max 
number of images read arguments: 128 Max number of images write arguments: 8 Max image 2D width: 16384 Max image 2D height: 16384 Max image 3D width: 2048 Max image 3D height: 2048 Max image 3D depth: 2048 Max samplers within kernel: 16 Max size of kernel argument: 1024 Alignment (bits) of base address: 2048 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: No Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: Yes Cache type: Read/Write Cache line size: 64 Cache size: 16384 Global memory size: 2147483648 Constant buffer size: 65536 Max number of constant args: 8 Local memory type: Scratchpad Local memory size: 32768 Max pipe arguments: 0 Max pipe active reservations: 0 Max pipe packet size: 0 Max global variable size: 0 Max global variable preferred total size: 0 Max read/write image args: 0 Max on device events: 0 Queue on device max size: 0 Max on device queues: 0 Queue on device preferred size: 0 SVM capabilities: Coarse grain buffer: No Fine grain buffer: No Fine grain system: No Atomics: No Preferred platform atomic alignment: 0 Preferred global atomic alignment: 0 Preferred local atomic alignment: 0 Kernel Preferred work group size multiple: 64 Error correction support: 0 Unified memory for Host and Device: 0 Profiling timer resolution: 1 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: No Queue on Host properties: Out-of-Order: No Profiling : Yes Queue on Device properties: Out-of-Order: No Profiling : No Platform ID: 00007FFDDE4DF000 Name: Hainan Vendor: Advanced Micro Devices, Inc. 
Device OpenCL C version: OpenCL C 1.2 Driver version: 3240.6 Profile: FULL_PROFILE Version: OpenCL 1.2 AMD-APP (3240.6) Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event cl_amd_liquid_flash Platform Name: Intel(R) OpenCL Number of devices: 1 Device Type: CL_DEVICE_TYPE_CPU Vendor ID: 8086h Max compute units: 4 Max work items dimensions: 3 Max work items[0]: 8192 Max work items[1]: 8192 Max work items[2]: 8192 Max work group size: 8192 Preferred vector width char: 1 Preferred vector width short: 1 Preferred vector width int: 1 Preferred vector width long: 1 Preferred vector width float: 1 Preferred vector width double: 1 Native vector width char: 32 Native vector width short: 16 Native vector width int: 8 Native vector width long: 4 Native vector width float: 8 Native vector width double: 4 Max clock frequency: 2500Mhz Address bits: 64 Max memory allocation: 3186008064 Image support: Yes Max number of images read arguments: 480 Max number of images write arguments: 480 Max image 2D width: 16384 Max image 2D height: 16384 Max image 3D width: 2048 Max image 3D height: 2048 Max image 3D depth: 2048 Max samplers within kernel: 480 Max size of kernel argument: 3840 Alignment (bits) of base address: 1024 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: Yes Quiet NaNs: Yes Round to nearest even: Yes Round to zero: No Round to +ve and infinity: No IEEE754-2008 fused multiply-add: No Cache type: Read/Write Cache line size: 64 Cache size: 262144 Global 
memory size: 12744032256 Constant buffer size: 131072 Max number of constant args: 480 Local memory type: Global Local memory size: 32768 Max pipe arguments: 16 Max pipe active reservations: 65535 Max pipe packet size: 1024 Max global variable size: 65536 Max global variable preferred total size: 65536 Max read/write image args: 480 Max on device events: 4294967295 Queue on device max size: 4294967295 Max on device queues: 4294967295 Queue on device preferred size: 4294967295 SVM capabilities: Coarse grain buffer: Yes Fine grain buffer: Yes Fine grain system: Yes Atomics: Yes Preferred platform atomic alignment: 64 Preferred global atomic alignment: 64 Preferred local atomic alignment: 0 Kernel Preferred work group size multiple: 128 Error correction support: 0 Unified memory for Host and Device: 1 Profiling timer resolution: 100 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: Yes Queue on Host properties: Out-of-Order: Yes Profiling : Yes Queue on Device properties: Out-of-Order: Yes Profiling : Yes Platform ID: 0000019A37A87CA8 Name: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz Vendor: Intel(R) Corporation Device OpenCL C version: OpenCL C 2.0 Driver version: 2021.11.3.0.17_160000 Profile: FULL_PROFILE Version: OpenCL 2.1 (Build 0) Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_khr_il_program cl_intel_unified_shared_memory_preview cl_intel_subgroups cl_intel_subgroups_char cl_intel_subgroups_short cl_intel_subgroups_long cl_intel_spirv_subgroups cl_intel_required_subgroup_size cl_intel_exec_by_local_thread cl_intel_vec_len_hint cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer
This is the output I got from running `clinfo` (the formatting seems to be broken, so I have attached the text file too). Two things I notice here:
- There are two CPU-type devices. I assume this is because the CPU is dual-core.
- An FPGA is detected even though no FPGA is attached to my laptop.
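Because the flattened `clinfo` dump is hard to scan, one way to pull out just the platform/device pairs is a small text filter. The following is only a sketch in standard C++: `DeviceEntry`, `trim`, and `summarize_clinfo` are hypothetical names, not part of any Intel tool, and it assumes the layout seen in this dump, where each device section reads `Platform Name: ... Number of devices: N Device Type: ...`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One device recovered from clinfo text (hypothetical helper type).
struct DeviceEntry {
    std::string platform;  // text after "Platform Name:"
    std::string type;      // token after "Device Type:", e.g. CL_DEVICE_TYPE_CPU
};

// Strip leading and trailing whitespace.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// For every "Device Type:" marker, look backwards for the nearest
// "Platform Name:" and take the text up to "Number of devices:" as the name.
// Works on flattened or line-broken text; entries that do not follow the
// assumed layout are skipped.
std::vector<DeviceEntry> summarize_clinfo(const std::string& text) {
    const std::string platform_key = "Platform Name:";
    const std::string count_key = "Number of devices:";
    const std::string type_key = "Device Type:";
    const char* ws = " \t\r\n";
    const std::size_t npos = std::string::npos;

    std::vector<DeviceEntry> devices;
    std::size_t search_from = 0;
    while (true) {
        const std::size_t type_pos = text.find(type_key, search_from);
        if (type_pos == npos) break;
        search_from = type_pos + type_key.size();

        // The device type is the next whitespace-delimited token.
        const std::size_t tok_begin = text.find_first_not_of(ws, search_from);
        if (tok_begin == npos) break;
        std::size_t tok_end = text.find_first_of(ws, tok_begin);
        if (tok_end == npos) tok_end = text.size();

        // The owning platform is the last "Platform Name:" before this device.
        const std::size_t plat_pos = text.rfind(platform_key, type_pos);
        if (plat_pos == npos) continue;
        const std::size_t name_begin = plat_pos + platform_key.size();
        const std::size_t name_end = text.find(count_key, name_begin);
        if (name_end == npos || name_end > type_pos) continue;

        devices.push_back({trim(text.substr(name_begin, name_end - name_begin)),
                           text.substr(tok_begin, tok_end - tok_begin)});
    }
    return devices;
}
```

Reading the attached text file into a `std::string`, passing it to `summarize_clinfo`, and printing each entry should give one line per device, which makes a duplicated platform easier to spot than in the raw dump.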
`sycl-ls` didn't work: it finished executing without producing any output.
I ran the DPC++ vector-add example when I first installed the Base Toolkit on my system, about 10 days ago. But when I tried it again just now, it threw an exception. The following is the output log from Visual Studio:
'vector-add-usm.exe' (Win32): Loaded 'C:\Users\4667r\Source\Repos\Base_Vector_Add1\x64\Debug\vector-add-usm.exe'. Symbols loaded.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\ntdll.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\kernel32.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\KernelBase.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Program Files (x86)\Intel\oneAPI\compiler\2021.2.0\windows\bin\sycld.dll'. Module was built without symbols.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\shlwapi.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\msvcrt.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\msvcp140d.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\vcruntime140d.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\ucrtbased.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\OpenCL.dll'. Module was built without symbols.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\cfgmgr32.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\ucrtbase.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\combase.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\rpcrt4.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\ole32.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\gdi32.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\win32u.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\gdi32full.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\msvcp_win.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\user32.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\advapi32.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\sechost.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Program Files (x86)\Intel\oneAPI\compiler\2021.2.0\windows\redist\intel64_win\compiler\svml_dispmd.dll'. Symbols loaded.
'vector-add-usm.exe' (Win32): Loaded 'C:\Program Files (x86)\Intel\oneAPI\compiler\2021.2.0\windows\redist\intel64_win\compiler\libmmdd.dll'. Symbols loaded.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\vcruntime140_1d.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\imm32.dll'.
The thread 0x28d8 has exited with code 0 (0x0).
'vector-add-usm.exe' (Win32): Loaded 'C:\Program Files (x86)\Intel\oneAPI\compiler\2021.2.0\windows\bin\pi_opencl.dll'. Module was built without symbols.
'vector-add-usm.exe' (Win32): Loaded 'C:\Program Files (x86)\Intel\oneAPI\compiler\2021.2.0\windows\bin\pi_level_zero.dll'. Module was built without symbols.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\ze_loader.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Program Files (x86)\Intel\oneAPI\compiler\2021.2.0\windows\lib\emu\intelocl64_emu.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Program Files (x86)\Intel\oneAPI\compiler\2021.2.0\windows\lib\emu\task_executor64_emu.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Program Files (x86)\Intel\oneAPI\compiler\2021.2.0\windows\lib\emu\cpu_device64_emu.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Program Files (x86)\Intel\oneAPI\compiler\2021.2.0\windows\lib\x64\intelocl64.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Program Files (x86)\Intel\oneAPI\compiler\2021.2.0\windows\lib\x64\task_executor64.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Program Files (x86)\Intel\oneAPI\compiler\2021.2.0\windows\lib\x64\cpu_device64.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\kernel.appcore.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\bcryptprimitives.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\clbcatq.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\AppXDeploymentClient.dll'.
'vector-add-usm.exe' (Win32): Unloaded 'C:\Windows\System32\AppXDeploymentClient.dll'
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\dxgi.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\ResourcePolicyClient.dll'.
'vector-add-usm.exe' (Win32): Unloaded 'C:\Windows\System32\ResourcePolicyClient.dll'
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\windows.storage.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\wldp.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\SHCore.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\DriverStore\FileRepository\u0366400.inf_amd64_4021c2cb607d5b92\B366217\amdhdl64.dll'.
'vector-add-usm.exe' (Win32): Unloaded 'C:\Windows\System32\DriverStore\FileRepository\u0366400.inf_amd64_4021c2cb607d5b92\B366217\amdhdl64.dll'
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\DriverStore\FileRepository\igdlh64.inf_amd64_25477efa0de18af8\igdrcl64.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\ws2_32.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\DriverStore\FileRepository\igdlh64.inf_amd64_25477efa0de18af8\igdgmm64.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\DXCore.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\opengl32.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\glu32.dll'.
'vector-add-usm.exe' (Win32): Unloaded 'C:\Windows\System32\glu32.dll'
'vector-add-usm.exe' (Win32): Unloaded 'C:\Windows\System32\opengl32.dll'
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\DriverStore\FileRepository\igdlh64.inf_amd64_25477efa0de18af8\igdfcl64.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\shell32.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\DriverStore\FileRepository\igdlh64.inf_amd64_25477efa0de18af8\igc64.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\DriverStore\FileRepository\u0366400.inf_amd64_4021c2cb607d5b92\B366217\amdocl64.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\setupapi.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\bcrypt.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\opengl32.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\glu32.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\version.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\DriverStore\FileRepository\u0366400.inf_amd64_4021c2cb607d5b92\B366217\atiadlxx.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\psapi.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\propsys.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\pdh.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\devobj.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\wintrust.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\crypt32.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\msasn1.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\winmm.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\dwmapi.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\DriverStore\FileRepository\u0366400.inf_amd64_4021c2cb607d5b92\B366217\atig6txx.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\DriverStore\FileRepository\u0366400.inf_amd64_4021c2cb607d5b92\B366217\amdocl12cl64.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\dbghelp.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\DriverStore\FileRepository\u0366400.inf_amd64_4021c2cb607d5b92\B366217\amd_comgr.dll'. Module was built without symbols.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\oleaut32.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Program Files (x86)\Common Files\Intel\Shared Libraries\intel64\intelocl64.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Program Files (x86)\Common Files\Intel\Shared Libraries\intel64\task_executor64.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Program Files (x86)\Common Files\Intel\Shared Libraries\intel64\cpu_device64.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\DriverStore\FileRepository\igdlh64.inf_amd64_25477efa0de18af8\ze_intel_gpu64.dll'.
Exception thrown at 0x00007FFE276A4B89 in vector-add-usm.exe: Microsoft C++ exception: cl::sycl::runtime_error at memory location 0x00000029A873F028.
Debug Error!
Program: ...7r\source\repos\Base_Vector_Add1\x64\Debug\vector-add-usm.exe
abort() has been called
(Press Retry to debug the application)
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\TextShaping.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\uxtheme.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\msctf.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\TextInputFramework.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\CoreUIComponents.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\CoreMessaging.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\ntmarta.dll'.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\WinTypes.dll'.
The thread 0xe84 has exited with code 3 (0x3).
The thread 0x2630 has exited with code 3 (0x3).
The thread 0x10c0 has exited with code 3 (0x3).
The thread 0x1e34 has exited with code 3 (0x3).
The thread 0x1d18 has exited with code 3 (0x3).
The program '[10444] vector-add-usm.exe' has exited with code 3 (0x3).
As for the oneDNN sample, I followed the exact same instructions and got the same unexpected error that I opened this thread for. It is now evident that something is wrong in DPC++ itself, but I cannot tell what it is from the log.
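One possible way to narrow down a log like the one above is to filter the module-load lines for the compute runtime DLLs. The sketch below is only an illustration: the helper names `loaded_module_name` and `compute_runtime_modules` are hypothetical, and the DLL list is copied from this log on the assumption, based on their names and paths, that they are the AMD OpenCL runtime, the Intel CPU runtime for OpenCL, and the Intel Level Zero GPU driver. It does not diagnose the -999 error by itself.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Extract the file name from a Visual Studio module-load line such as
//   'app.exe' (Win32): Loaded 'C:\path\to\module.dll'.
// Returns an empty string when the line is not a "Loaded" line.
std::string loaded_module_name(const std::string& line) {
    const std::string marker = "Loaded '";
    const std::size_t start = line.find(marker);
    if (start == std::string::npos) return "";

    const std::size_t path_begin = start + marker.size();
    const std::size_t path_end = line.find('\'', path_begin);
    if (path_end == std::string::npos) return "";

    const std::string path = line.substr(path_begin, path_end - path_begin);
    const std::size_t slash = path.find_last_of("\\/");
    return slash == std::string::npos ? path : path.substr(slash + 1);
}

// Return the compute-runtime DLLs found in the log, in load order.
// ASSUMPTION: the names below are taken from the log in this thread and are
// presumed to be OpenCL / Level Zero runtimes; adjust the list for other systems.
std::vector<std::string> compute_runtime_modules(
        const std::vector<std::string>& log_lines) {
    static const std::vector<std::string> known = {
        "amdocl12cl64.dll",    // presumably the AMD OpenCL runtime
        "intelocl64.dll",      // presumably the Intel CPU runtime for OpenCL
        "ze_intel_gpu64.dll",  // presumably the Intel Level Zero GPU driver
    };

    std::vector<std::string> found;
    for (const std::string& line : log_lines) {
        const std::string name = loaded_module_name(line);
        if (name.empty()) continue;
        for (const std::string& candidate : known) {
            if (name == candidate) {
                found.push_back(name);
                break;
            }
        }
    }
    return found;
}
```

Feeding the lines of the log above into `compute_runtime_modules` would list every runtime the process loaded before the `cl::sycl::runtime_error` was thrown, which may be useful context to include when reporting the problem.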
Hi,
Thank you for the update. Please try running the DPC++ sample on the CPU. If it runs, try running the same sample on the GPU. Confirm that the sample is actually running on the GPU by printing the selected device, as below:
std::cout << "Running on device: " << q.get_device().get_info<info::device::name>() << "\n";
If the selected device is the Intel GPU and the DPC++ sample still fails, reinstall the Intel oneAPI Base Toolkit and try running the oneDNN and DPC++ samples again.
You can download the latest Intel oneAPI Base Toolkit from this link: https://software.intel.com/content/www/us/en/develop/tools/oneapi/base-toolkit/download.html
Answers to the other queries from your first post are given below:
>Is there a way to use GPU using python? And if we can, can you please point to the right resource on how to get it running?
Popular frameworks such as PyTorch and TensorFlow are the way to use the GPU from Python. However, these options are not available for public use as of now; they are currently available only to NDA customers.
> Also, is there a way we can run onednn in python as standalone?
oneDNN does not have a Python wrapper as of now.
Regards
Gopika
Hello,
I finally reinstalled the Base Toolkit, but nothing changed. The DPC++ example still does not work. The same error keeps appearing for both CPU and GPU:
Error in the example: Native API failed. Native API returns: -999 (Unknown OpenCL error code) -999 (Unknown OpenCL error code). Example failed on CPU.
Hi,
Thank you for the update. Please delete the existing oneAPI directory, reinstall the Intel oneAPI Base Toolkit, and then try running the samples. If you still face issues after the clean install, we recommend raising the issue in the Intel oneAPI Base Toolkit forum, which is dedicated to Base Toolkit queries, and stating that the sample is not working.
The oneAPI directory to be deleted can be found in this path: C:\Program Files (x86)\Intel\
Intel OneAPI Base kit forum: https://community.intel.com/t5/Intel-oneAPI-Base-Toolkit/bd-p/oneapi-base-toolkit
Regards
Gopika
I already did that when I reinstalled previously, so I will post the issue on the Base Toolkit forum.
Thank you.
Hi,
Thank you for the update. Since you are raising the query in the Intel oneAPI Base Toolkit forum, may we stop monitoring this thread?
Regards
Gopika
Sure. No problem.
Hi,
Thank you for the confirmation. If you need any additional information, please submit a new question as this thread will no longer be monitored.
Regards
Gopika