Intel® oneAPI DL Framework Developer Toolkit
Get answers for developing new or customizing existing frameworks using common APIs.

Unknown OpenCL error when executing oneAPI DL examples

rahulv4667
Beginner

Hello,

I am trying to run the oneDNN examples. Everything builds fine, but I get the following error when I run the final executable:

Error in the example: Native API failed. Native API returns: -999 (Unknown OpenCL error code) -999 (Unknown OpenCL error code).
Example failed on CPU.

This happens when the program runs on the CPU only. When I try to run it on the GPU, it fails silently: it doesn't report any error, it just stops executing. By modifying the `simple_model` example, I found that it exits when executing the following line at the start of the simple_net() function:

engine eng(engine_kind, 0);
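
When a run stops without printing anything, the process exit code is often the only clue left. The following is a minimal Python sketch, not part of the oneDNN samples, that runs a command, captures its output, and reports the exit code. The helper names `run_and_report` and `describe_exit` are hypothetical, and the command to run is whatever executable the build produced.

```python
import subprocess


def run_and_report(cmd):
    """Run cmd (a list of arguments) and return (exit_code, stdout, stderr)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


def describe_exit(code):
    """Turn an exit code into a short human-readable note."""
    if code == 0:
        return "exited normally (0)"
    # Mask to 32 bits so that large or negative values print as 8 hex digits,
    # the form in which Windows status codes are usually looked up.
    return f"exited with code {code} (0x{code & 0xFFFFFFFF:08X})"
```

For example, call `run_and_report([r"path\to\example.exe", "gpu"])` (a placeholder path) and pass the returned code to `describe_exit`. A nonzero code means the process did not finish normally, even though nothing was printed.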

 

First, I checked whether my GPU driver ships with OpenCL (it does), and then I downloaded and installed the CPU runtime for OpenCL. The issue still persists.

I installed the oneAPI Base Toolkit rather than the individual components, so I am unsure where the issue lies. My system information follows:

OS: Windows 10 Home - 19043.985

CPU: Intel(R) Core(TM) i5-7200 CPU 2.5 GHz-2.7 GHz

GPU: Intel(R) HD 620 Graphics

GPU driver version: 27.20.100.8854 (OpenGL stopped working when I installed 30.*.*.*)

Toolkit version: Base Toolkit 2021.2.0.2871

Unrelated to the query above: is there a way to use the GPU from Python? If so, could you please point me to the right resource for getting it running? Also, is there a way to run oneDNN in Python standalone? AFAIK, oneDNN can be used from Python only as a backend to popular frameworks like PyTorch and TensorFlow. Please correct me if I am wrong.

Gopika_Intel
Moderator

Hi,

Thank you for reaching out and providing the necessary information.

1. For the OpenCL error that you’re getting on the CPU, please share the output of the following checks:

o  The output of sycl-ls or clinfo, so we can confirm that both the CPU and the GPU are detected.

o  Also, please try running any simple DPC++ (dpcpp) application to confirm that the installation is correct.

2. We were able to run the simple_model sample; it executed without any errors. Please make sure you follow the same steps we used to run the sample on Windows:

o  Open a oneAPI command prompt (setvars.bat will already have been run).

o  In the folder where you want the project to be built, run the following command:

o  oneapi-cli

o  Select Create a project -> cpp -> Toolkit -> oneAPI Libraries -> oneDNN -> simple_model

o  Go to the simple_model folder and run the following commands:

cd simple_model
mkdir build
cd build
cmake -G Ninja ..
cmake --build .

o  To execute the sample, run bin\cnn-inference-f32-cpp.exe (CPU) or bin\cnn-inference-f32-cpp.exe gpu (GPU).

Sample output (GPU):

Use time: 182.05 ms per iteration.

Example passed on GPU.

Sample output (CPU):

Use time: 27.94 ms per iteration.

Example passed on CPU.

3. Is there a way to use GPU using python? And if we can, can you please point to the right resource on how to get it running? Also, is there a way we can run onednn in python as standalone?

o  We’ll discuss this with the internal team and get back to you.

4. AFAIK, we can use python oneDNN only as a backend to popular frameworks like PyTorch and Tensorflow. Please correct me if I am wrong.

o  Yes, you are right. TensorFlow has been directly optimized for Intel® architecture using the primitives of Intel® oneAPI Deep Neural Network Library (oneDNN) to maximize performance.
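
The device check in item 1 can also be scripted. Below is a minimal Python sketch that scans clinfo text for platform names and device types. It assumes the layout this kind of clinfo output uses (a "Platform Name:" field followed by "Platform Vendor:" in the platform list, and a "Device Type: CL_DEVICE_TYPE_..." field per device), and the function name `summarize_clinfo` is hypothetical. It also lists platform names that occur more than once, which may be worth a look because it can mean the same runtime is registered twice.

```python
import re
from collections import Counter


def summarize_clinfo(text):
    """Summarize clinfo text: platform names, repeated names, device types."""
    # In the platform list, each "Platform Name" is followed by "Platform Vendor".
    # The per-device sections repeat "Platform Name" but follow it with
    # "Number of devices", so those repeats are not matched here.
    names = re.findall(r"Platform Name:\s+(.+?)\s+Platform Vendor:", text, flags=re.S)
    device_types = re.findall(r"Device Type:\s+(CL_DEVICE_TYPE_\w+)", text)
    counts = Counter(names)
    return {
        "platforms": names,
        "duplicates": sorted(name for name, n in counts.items() if n > 1),
        "device_types": device_types,
        "has_cpu": "CL_DEVICE_TYPE_CPU" in device_types,
        "has_gpu": "CL_DEVICE_TYPE_GPU" in device_types,
    }
```

The text can come from a saved file or from `subprocess.run(["clinfo"], capture_output=True, text=True).stdout`. If `has_cpu` or `has_gpu` is False, clinfo did not report a device of that type.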

Regards

Gopika

 

rahulv4667
Beginner

C:\Program Files (x86)\Intel\oneAPI>clinfo
Number of platforms:                             5
  Platform Profile:                              EMBEDDED_PROFILE
  Platform Version:                              OpenCL 1.2 Intel(R) FPGA SDK for OpenCL(TM), Version 20.3
  Platform Name:                                 Intel(R) FPGA Emulation Platform for OpenCL(TM)
  Platform Vendor:                               Intel(R) Corporation
  Platform Extensions:                           cl_khr_icd cl_khr_byte_addressable_store cl_intel_fpga_host_pipe cles_khr_int64 cl_khr_il_program cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 2.1 WINDOWS
  Platform Name:                                 Intel(R) OpenCL
  Platform Vendor:                               Intel(R) Corporation
  Platform Extensions:                           cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_khr_il_program cl_intel_unified_shared_memory_preview cl_intel_subgroups cl_intel_subgroups_char cl_intel_subgroups_short cl_intel_subgroups_long cl_intel_spirv_subgroups cl_intel_required_subgroup_size cl_intel_exec_by_local_thread cl_intel_vec_len_hint cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 2.1
  Platform Name:                                 Intel(R) OpenCL HD Graphics
  Platform Vendor:                               Intel(R) Corporation
  Platform Extensions:                           cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_fp64 cl_khr_subgroups cl_intel_spirv_device_side_avc_motion_estimation cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory_preview cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_intel_media_block_io cl_khr_3d_image_writes cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_intel_dx9_media_sharing cl_khr_dx9_media_sharing cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_intel_d3d11_nv12_media_sharing cl_intel_unified_sharing cl_intel_simultaneous_sharing
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 2.1 AMD-APP (3240.6)
  Platform Name:                                 AMD Accelerated Parallel Processing
  Platform Vendor:                               Advanced Micro Devices, Inc.
  Platform Extensions:                           cl_khr_icd cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_event_callback cl_amd_offline_devices
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 2.1 WINDOWS
  Platform Name:                                 Intel(R) OpenCL
  Platform Vendor:                               Intel(R) Corporation
  Platform Extensions:                           cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_khr_il_program cl_intel_unified_shared_memory_preview cl_intel_subgroups cl_intel_subgroups_char cl_intel_subgroups_short cl_intel_subgroups_long cl_intel_spirv_subgroups cl_intel_required_subgroup_size cl_intel_exec_by_local_thread cl_intel_vec_len_hint cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer

  Platform Name:                                 Intel(R) FPGA Emulation Platform for OpenCL(TM)
Number of devices:                               1
  Device Type:                                   CL_DEVICE_TYPE_ACCRLERATOR
  Vendor ID:                                     1172h
  Max compute units:                             4
  Max work items dimensions:                     3
    Max work items[0]:                           67108864
    Max work items[1]:                           67108864
    Max work items[2]:                           67108864
  Max work group size:                           67108864
  Preferred vector width char:                   1
  Preferred vector width short:                  1
  Preferred vector width int:                    1
  Preferred vector width long:                   1
  Preferred vector width float:                  1
  Preferred vector width double:                 1
  Native vector width char:                      32
  Native vector width short:                     16
  Native vector width int:                       8
  Native vector width long:                      4
  Native vector width float:                     8
  Native vector width double:                    4
  Max clock frequency:                           2500Mhz
  Address bits:                                  64
  Max memory allocation:                         3186008064
  Image support:                                 No
  Max size of kernel argument:                   3840
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               No
    Round to +ve and infinity:                   No
    IEEE754-2008 fused multiply-add:             No
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    262144
  Global memory size:                            12744032256
  Constant buffer size:                          131072
  Max number of constant args:                   480
  Local memory type:                             Global
  Local memory size:                             262144
  Kernel Preferred work group size multiple:     128
  Error correction support:                      0
  Unified memory for Host and Device:            1
  Profiling timer resolution:                    100
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:
    Execute OpenCL kernels:                      Yes
    Execute native function:                     Yes
  Queue on Host properties:
    Out-of-Order:                                Yes
    Profiling :                                  Yes
  Platform ID:                                   0000019A3462EFB8
  Name:                                          Intel(R) FPGA Emulation Device
  Vendor:                                        Intel(R) Corporation
  Device OpenCL C version:                       OpenCL C 1.2
  Driver version:                                2021.11.3.0.17_160000
  Profile:                                       EMBEDDED_PROFILE
  Version:                                       OpenCL 1.2
  Extensions:                                    cl_khr_icd cl_khr_byte_addressable_store cl_intel_fpga_host_pipe cles_khr_int64 cl_khr_il_program cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics

  Platform Name:                                 Intel(R) OpenCL
Number of devices:                               1
  Device Type:                                   CL_DEVICE_TYPE_CPU
  Vendor ID:                                     8086h
  Max compute units:                             4
  Max work items dimensions:                     3
    Max work items[0]:                           8192
    Max work items[1]:                           8192
    Max work items[2]:                           8192
  Max work group size:                           8192
  Preferred vector width char:                   1
  Preferred vector width short:                  1
  Preferred vector width int:                    1
  Preferred vector width long:                   1
  Preferred vector width float:                  1
  Preferred vector width double:                 1
  Native vector width char:                      32
  Native vector width short:                     16
  Native vector width int:                       8
  Native vector width long:                      4
  Native vector width float:                     8
  Native vector width double:                    4
  Max clock frequency:                           2500Mhz
  Address bits:                                  64
  Max memory allocation:                         3186008064
  Image support:                                 Yes
  Max number of images read arguments:           480
  Max number of images write arguments:          480
  Max image 2D width:                            16384
  Max image 2D height:                           16384
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    480
  Max size of kernel argument:                   3840
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               No
    Round to +ve and infinity:                   No
    IEEE754-2008 fused multiply-add:             No
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    262144
  Global memory size:                            12744032256
  Constant buffer size:                          131072
  Max number of constant args:                   480
  Local memory type:                             Global
  Local memory size:                             32768
  Max pipe arguments:                            16
  Max pipe active reservations:                  65535
  Max pipe packet size:                          1024
  Max global variable size:                      65536
  Max global variable preferred total size:      65536
  Max read/write image args:                     480
  Max on device events:                          4294967295
  Queue on device max size:                      4294967295
  Max on device queues:                          4294967295
  Queue on device preferred size:                4294967295
  SVM capabilities:
    Coarse grain buffer:                         Yes
    Fine grain buffer:                           Yes
    Fine grain system:                           Yes
    Atomics:                                     Yes
  Preferred platform atomic alignment:           64
  Preferred global atomic alignment:             64
  Preferred local atomic alignment:              0
  Kernel Preferred work group size multiple:     128
  Error correction support:                      0
  Unified memory for Host and Device:            1
  Profiling timer resolution:                    100
  Device endianess:                              Little
                                              Available:                                     Yes                                                                                                                      Compiler available:                            Yes                                                                                                                      Execution capabilities:                                                                                                                                                   Execute OpenCL kernels:                      Yes                                                                                                                        Execute native function:                     Yes                                                                                                                      Queue on Host properties:                                                                                                                                                 Out-of-Order:                                Yes                                                                                                                        Profiling :                                  Yes                                                                                                                      Queue on Device properties:                                                                                                                                               Out-of-Order:                                Yes                                                                                                                        Profiling :                                  Yes                                                                                                                      Platform ID:                                   0000019A34656728                                           
                                                              Name:                                          Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz                                                                                 Vendor:                                        Intel(R) Corporation                                                                                                     Device OpenCL C version:                       OpenCL C 2.0                                                                                                             Driver version:                                2021.11.3.0.17_160000                                                                                                    Profile:                                       FULL_PROFILE                                                                                                             Version:                                       OpenCL 2.1 (Build 0)                                                                                                     Extensions:                                    cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_khr_il_program cl_intel_unified_shared_memory_preview cl_intel_subgroups cl_intel_subgroups_char cl_intel_subgroups_short cl_intel_subgroups_long cl_intel_spirv_subgroups cl_intel_required_subgroup_size cl_intel_exec_by_local_thread cl_intel_vec_len_hint cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer                                                                                                                                                                                                                                                                                                         
                                                                              Platform Name:                                 Intel(R) OpenCL HD Graphics                                                                                            Number of devices:                               1                                                                                                                        Device Type:                                   CL_DEVICE_TYPE_GPU                                                                                                       Vendor ID:                                     8086h                                                                                                                    Max compute units:                             24                                                                                                                       Max work items dimensions:                     3                                                                                                                          Max work items[0]:                           256                                                                                                                        Max work items[1]:                           256                                                                                                                        Max work items[2]:                           256                                                                                                                      Max work group size:                           256                                                                                                                      Preferred vector width char:                   16                                                                                                                       Preferred vector width short:                  8                          
                                                                                              Preferred vector width int:                    4                                                                                                                        Preferred vector width long:                   1                                                                                                                        Preferred vector width float:                  1                                                                                                                        Preferred vector width double:                 1                                                                                                                        Native vector width char:                      16                                                                                                                       Native vector width short:                     8                                                                                                                        Native vector width int:                       4                                                                                                                        Native vector width long:                      1                                                                                                                        Native vector width float:                     1                                                                                                                        Native vector width double:                    1                                                                                                                        Max clock frequency:                           1000Mhz                                                                                                                  Address bits:                                  64         
                                                                                                              Max memory allocation:                         2548805632                                                                                                               Image support:                                 Yes                                                                                                                      Max number of images read arguments:           128                                                                                                                      Max number of images write arguments:          128                                                                                                                      Max image 2D width:                            16384                                                                                                                    Max image 2D height:                           16384                                                                                                                    Max image 3D width:                            16384                                                                                                                    Max image 3D height:                           16384                                                                                                                    Max image 3D depth:                            2048                                                                                                                     Max samplers within kernel:                    16                                                                                                                       Max size of kernel argument:                   2048                                                                                                                     Alignment (bits) of base address:         
     1024                                                                                                                     Minimum alignment (bytes) for any datatype:    128                                                                                                                      Single precision floating point capability                                                                                                                                Denorms:                                     Yes                                                                                                                        Quiet NaNs:                                  Yes                                                                                                                        Round to nearest even:                       Yes                                                                                                                        Round to zero:                               Yes                                                                                                                        Round to +ve and infinity:                   Yes                                                                                                                        IEEE754-2008 fused multiply-add:             Yes                                                                                                                      Cache type:                                    Read/Write                                                                                                               Cache line size:                               64                                                                                                                       Cache size:                                    524288                                                                                                                   Global memory size:       
                     5097611264                                                                                                               Constant buffer size:                          2548805632                                                                                                               Max number of constant args:                   8                                                                                                                        Local memory type:                             Scratchpad                                                                                                               Local memory size:                             65536                                                                                                                    Max pipe arguments:                            16                                                                                                                       Max pipe active reservations:                  1                                                                                                                        Max pipe packet size:                          1024                                                                                                                     Max global variable size:                      65536                                                                                                                    Max global variable preferred total size:      2548805632                                                                                                               Max read/write image args:                     128                                                                                                                      Max on device events:                          1024                                                                                                                     Queue on 
device max size:                      67108864                                                                                                                 Max on device queues:                          1                                                                                                                        Queue on device preferred size:                131072                                                                                                                   SVM capabilities:                                                                                                                                                         Coarse grain buffer:                         Yes                                                                                                                        Fine grain buffer:                           Yes                                                                                                                        Fine grain system:                           No                                                                                                                         Atomics:                                     Yes                                                                                                                      Preferred platform atomic alignment:           64                                                                                                                       Preferred global atomic alignment:             64                                                                                                                       Preferred local atomic alignment:              64                                                                                                                       Kernel Preferred work group size multiple:     32                                                                                                                
       Error correction support:                      0                                                                                                                        Unified memory for Host and Device:            1                                                                                                                        Profiling timer resolution:                    83                                                                                                                       Device endianess:                              Little                                                                                                                   Available:                                     Yes                                                                                                                      Compiler available:                            Yes                                                                                                                      Execution capabilities:                                                                                                                                                   Execute OpenCL kernels:                      Yes                                                                                                                        Execute native function:                     No                                                                                                                       Queue on Host properties:                                                                                                                                                 Out-of-Order:                                Yes                                                                                                                        Profiling :                                  Yes                                                                                               
                       Queue on Device properties:                                                                                                                                               Out-of-Order:                                Yes                                                                                                                        Profiling :                                  Yes                                                                                                                      Platform ID:                                   0000019A3467F070                                                                                                         Name:                                          Intel(R) HD Graphics 620                                                                                                 Vendor:                                        Intel(R) Corporation                                                                                                     Device OpenCL C version:                       OpenCL C 2.0                                                                                                             Driver version:                                27.20.100.8854                                                                                                           Profile:                                       FULL_PROFILE                                                                                                             Version:                                       OpenCL 2.1 NEO                                                                                                           Extensions:                                    cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size 
cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_fp64 cl_khr_subgroups cl_intel_spirv_device_side_avc_motion_estimation cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory_preview cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_intel_media_block_io cl_khr_3d_image_writes cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_intel_dx9_media_sharing cl_khr_dx9_media_sharing cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_intel_d3d11_nv12_media_sharing cl_intel_unified_sharing cl_intel_simultaneous_sharing                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       Platform Name:                                 AMD Accelerated Parallel Processing                                                                                    Number of devices:                               1                                                                                                                        Device Type:                                   CL_DEVICE_TYPE_GPU                                        
                                                               Vendor ID:                                     1002h                                                                                                                    Board name:                                    AMD Radeon (TM) R5 M330                                                                                                  Device Topology:                               PCI[ B#1, D#0, F#0 ]                                                                                                     Max compute units:                             5                                                                                                                        Max work items dimensions:                     3                                                                                                                          Max work items[0]:                           1024                                                                                                                       Max work items[1]:                           1024                                                                                                                       Max work items[2]:                           1024                                                                                                                     Max work group size:                           256                                                                                                                      Preferred vector width char:                   4                                                                                                                        Preferred vector width short:                  2                                                                                                                        Preferred vector width int:                    1                                         
                                                                               Preferred vector width long:                   1                                                                                                                        Preferred vector width float:                  1                                                                                                                        Preferred vector width double:                 1                                                                                                                        Native vector width char:                      4                                                                                                                        Native vector width short:                     2                                                                                                                        Native vector width int:                       1                                                                                                                        Native vector width long:                      1                                                                                                                        Native vector width float:                     1                                                                                                                        Native vector width double:                    1                                                                                                                        Max clock frequency:                           400Mhz                                                                                                                   Address bits:                                  32                                                                                                                       Max memory allocation:                         1597190963                
                                                                                               Image support:                                 Yes                                                                                                                      Max number of images read arguments:           128                                                                                                                      Max number of images write arguments:          8                                                                                                                        Max image 2D width:                            16384                                                                                                                    Max image 2D height:                           16384                                                                                                                    Max image 3D width:                            2048                                                                                                                     Max image 3D height:                           2048                                                                                                                     Max image 3D depth:                            2048                                                                                                                     Max samplers within kernel:                    16                                                                                                                       Max size of kernel argument:                   1024                                                                                                                     Alignment (bits) of base address:              2048                                                                                                                     Minimum alignment (bytes) for any datatype:    128       
  Single precision floating point capability
    Denorms:                                     No
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    16384
  Global memory size:                            2147483648
  Constant buffer size:                          65536
  Max number of constant args:                   8
  Local memory type:                             Scratchpad
  Local memory size:                             32768
  Max pipe arguments:                            0
  Max pipe active reservations:                  0
  Max pipe packet size:                          0
  Max global variable size:                      0
  Max global variable preferred total size:      0
  Max read/write image args:                     0
  Max on device events:                          0
  Queue on device max size:                      0
  Max on device queues:                          0
  Queue on device preferred size:                0
  SVM capabilities:
    Coarse grain buffer:                         No
    Fine grain buffer:                           No
    Fine grain system:                           No
    Atomics:                                     No
  Preferred platform atomic alignment:           0
  Preferred global atomic alignment:             0
  Preferred local atomic alignment:              0
  Kernel Preferred work group size multiple:     64
  Error correction support:                      0
  Unified memory for Host and Device:            0
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue on Host properties:
    Out-of-Order:                                No
    Profiling :                                  Yes
  Queue on Device properties:
    Out-of-Order:                                No
    Profiling :                                  No
  Platform ID:                                   00007FFDDE4DF000
  Name:                                          Hainan
  Vendor:                                        Advanced Micro Devices, Inc.
  Device OpenCL C version:                       OpenCL C 1.2
  Driver version:                                3240.6
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 1.2 AMD-APP (3240.6)
  Extensions:                                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event cl_amd_liquid_flash

  Platform Name:                                 Intel(R) OpenCL
Number of devices:                               1
  Device Type:                                   CL_DEVICE_TYPE_CPU
  Vendor ID:                                     8086h
  Max compute units:                             4
  Max work items dimensions:                     3
    Max work items[0]:                           8192
    Max work items[1]:                           8192
    Max work items[2]:                           8192
  Max work group size:                           8192
  Preferred vector width char:                   1
  Preferred vector width short:                  1
  Preferred vector width int:                    1
  Preferred vector width long:                   1
  Preferred vector width float:                  1
  Preferred vector width double:                 1
  Native vector width char:                      32
  Native vector width short:                     16
  Native vector width int:                       8
  Native vector width long:                      4
  Native vector width float:                     8
  Native vector width double:                    4
  Max clock frequency:                           2500Mhz
  Address bits:                                  64
  Max memory allocation:                         3186008064
  Image support:                                 Yes
  Max number of images read arguments:           480
  Max number of images write arguments:          480
  Max image 2D width:                            16384
  Max image 2D height:                           16384
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    480
  Max size of kernel argument:                   3840
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               No
    Round to +ve and infinity:                   No
    IEEE754-2008 fused multiply-add:             No
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    262144
  Global memory size:                            12744032256
  Constant buffer size:                          131072
  Max number of constant args:                   480
  Local memory type:                             Global
  Local memory size:                             32768
  Max pipe arguments:                            16
  Max pipe active reservations:                  65535
  Max pipe packet size:                          1024
  Max global variable size:                      65536
  Max global variable preferred total size:      65536
  Max read/write image args:                     480
  Max on device events:                          4294967295
  Queue on device max size:                      4294967295
  Max on device queues:                          4294967295
  Queue on device preferred size:                4294967295
  SVM capabilities:
    Coarse grain buffer:                         Yes
    Fine grain buffer:                           Yes
    Fine grain system:                           Yes
    Atomics:                                     Yes
  Preferred platform atomic alignment:           64
  Preferred global atomic alignment:             64
  Preferred local atomic alignment:              0
  Kernel Preferred work group size multiple:     128
  Error correction support:                      0
  Unified memory for Host and Device:            1
  Profiling timer resolution:                    100
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:
    Execute OpenCL kernels:                      Yes
    Execute native function:                     Yes
  Queue on Host properties:
    Out-of-Order:                                Yes
    Profiling :                                  Yes
  Queue on Device properties:
    Out-of-Order:                                Yes
    Profiling :                                  Yes
  Platform ID:                                   0000019A37A87CA8
  Name:                                          Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
  Vendor:                                        Intel(R) Corporation
  Device OpenCL C version:                       OpenCL C 2.0
  Driver version:                                2021.11.3.0.17_160000
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 2.1 (Build 0)
  Extensions:                                    cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_khr_il_program cl_intel_unified_shared_memory_preview cl_intel_subgroups cl_intel_subgroups_char cl_intel_subgroups_short cl_intel_subgroups_long cl_intel_spirv_subgroups cl_intel_required_subgroup_size cl_intel_exec_by_local_thread cl_intel_vec_len_hint cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer

 

This is the result I got from running `clinfo` (the formatting seems to be broken, so I have attached the text file too). Two things I notice here:

  • There are two CPU-type devices. I assume this is because the CPU is dual-core.
  • An FPGA is being detected even though no FPGA is attached to my laptop.
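
The two observations above can be checked mechanically. What follows is a minimal Python sketch, not part of any oneAPI tool, that groups the `Device Type` entries of a `clinfo` dump by `Platform Name`. The helper name `devices_per_platform` and the second platform in the sample are made up for illustration; the first sample platform reuses values from the output above.

```python
import re
from collections import defaultdict

# Matches "Platform Name:" or "Device Type:" followed by its value. The value
# ends at a newline or at a run of two or more whitespace characters, so the
# pattern also copes with dumps whose line breaks were flattened into padding.
FIELD = re.compile(r"(Platform Name|Device Type):\s+(.+?)(?=\s{2,}|\n|$)")


def devices_per_platform(clinfo_text):
    """Map each platform name to the device types listed under it."""
    result = defaultdict(list)
    platform = None
    for key, value in FIELD.findall(clinfo_text):
        if key == "Platform Name":
            platform = value
            result[platform]  # register the platform even if it has no devices
        elif platform is not None:
            result[platform].append(value)
    return dict(result)


# Sample input: the first platform is flattened the way the pasted output is;
# the second platform is hypothetical and uses normal line breaks.
sample = (
    "  Platform Name:      Intel(R) OpenCL      Number of devices:     1     "
    "  Device Type:     CL_DEVICE_TYPE_CPU     Vendor ID:    8086h   \n"
    "  Platform Name:   Hypothetical Platform\n"
    "Number of devices:   2\n"
    "  Device Type:   CL_DEVICE_TYPE_GPU\n"
    "  Device Type:   CL_DEVICE_TYPE_CPU\n"
)

for name, device_types in devices_per_platform(sample).items():
    print(name, "->", device_types)
```

If the real dump lists the two CPU devices under different platform names, that would suggest two CPU runtimes are registered rather than one device per core (the CPU entry above already reports `Max compute units: 4`). This is only a guess until the full attached file is checked.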

 

`sycl-ls` didn't work: it ran to completion without printing any output.
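
When a tool exits silently like this, its exit code and error stream can still carry information. The following is a small Python sketch for capturing them. The helper names `run_tool` and `summarize` are made up for illustration. `SYCL_PI_TRACE` is, as far as I know, a debugging variable of the DPC++ runtime in this release, so treat its use here as an assumption to verify against the toolkit documentation.

```python
import os
import subprocess


def run_tool(command, extra_env=None):
    """Run a command and return (exit_code, stdout, stderr) as text."""
    env = dict(os.environ)
    if extra_env:
        env.update(extra_env)  # add or override variables for this run only
    completed = subprocess.run(command, capture_output=True, text=True, env=env)
    return completed.returncode, completed.stdout, completed.stderr


def summarize(exit_code, stdout, stderr):
    """Turn a captured result into a short human-readable report."""
    lines = [f"exit code: {exit_code}"]
    lines.append("stdout: " + (stdout.strip() or "(no output)"))
    lines.append("stderr: " + (stderr.strip() or "(no output)"))
    return "\n".join(lines)


def check_sycl_ls():
    """Run sycl-ls with plugin tracing requested and report what came back."""
    try:
        # Assumption: SYCL_PI_TRACE=1 asks the runtime to log plugin discovery.
        result = run_tool(["sycl-ls"], {"SYCL_PI_TRACE": "1"})
    except FileNotFoundError:
        return "sycl-ls was not found on PATH; is the oneAPI environment set up?"
    return summarize(*result)
```

Calling `print(check_sycl_ls())` from a oneAPI command prompt shows whether the silence is a zero exit with empty output or an error that simply was not displayed.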

 

I ran the DPC++ vector-add example when I first installed the Base Toolkit on my system, about 10 days ago, and it worked. When I tried it again just now, it threw an exception. The following is the output log from Visual Studio:

 

'vector-add-usm.exe' (Win32): Loaded 'C:\Users\4667r\Source\Repos\Base_Vector_Add1\x64\Debug\vector-add-usm.exe'. Symbols loaded.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\ntdll.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\kernel32.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\KernelBase.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Program Files (x86)\Intel\oneAPI\compiler\2021.2.0\windows\bin\sycld.dll'. Module was built without symbols.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\shlwapi.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\msvcrt.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\msvcp140d.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\vcruntime140d.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\ucrtbased.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\OpenCL.dll'. Module was built without symbols.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\cfgmgr32.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\ucrtbase.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\combase.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\rpcrt4.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\ole32.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\gdi32.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\win32u.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\gdi32full.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\msvcp_win.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\user32.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\advapi32.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\sechost.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Program Files (x86)\Intel\oneAPI\compiler\2021.2.0\windows\redist\intel64_win\compiler\svml_dispmd.dll'. Symbols loaded.
'vector-add-usm.exe' (Win32): Loaded 'C:\Program Files (x86)\Intel\oneAPI\compiler\2021.2.0\windows\redist\intel64_win\compiler\libmmdd.dll'. Symbols loaded.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\vcruntime140_1d.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\imm32.dll'. 
The thread 0x28d8 has exited with code 0 (0x0).
'vector-add-usm.exe' (Win32): Loaded 'C:\Program Files (x86)\Intel\oneAPI\compiler\2021.2.0\windows\bin\pi_opencl.dll'. Module was built without symbols.
'vector-add-usm.exe' (Win32): Loaded 'C:\Program Files (x86)\Intel\oneAPI\compiler\2021.2.0\windows\bin\pi_level_zero.dll'. Module was built without symbols.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\ze_loader.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Program Files (x86)\Intel\oneAPI\compiler\2021.2.0\windows\lib\emu\intelocl64_emu.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Program Files (x86)\Intel\oneAPI\compiler\2021.2.0\windows\lib\emu\task_executor64_emu.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Program Files (x86)\Intel\oneAPI\compiler\2021.2.0\windows\lib\emu\cpu_device64_emu.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Program Files (x86)\Intel\oneAPI\compiler\2021.2.0\windows\lib\x64\intelocl64.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Program Files (x86)\Intel\oneAPI\compiler\2021.2.0\windows\lib\x64\task_executor64.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Program Files (x86)\Intel\oneAPI\compiler\2021.2.0\windows\lib\x64\cpu_device64.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\kernel.appcore.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\bcryptprimitives.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\clbcatq.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\AppXDeploymentClient.dll'. 
'vector-add-usm.exe' (Win32): Unloaded 'C:\Windows\System32\AppXDeploymentClient.dll'
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\dxgi.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\ResourcePolicyClient.dll'. 
'vector-add-usm.exe' (Win32): Unloaded 'C:\Windows\System32\ResourcePolicyClient.dll'
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\windows.storage.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\wldp.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\SHCore.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\DriverStore\FileRepository\u0366400.inf_amd64_4021c2cb607d5b92\B366217\amdhdl64.dll'. 
'vector-add-usm.exe' (Win32): Unloaded 'C:\Windows\System32\DriverStore\FileRepository\u0366400.inf_amd64_4021c2cb607d5b92\B366217\amdhdl64.dll'
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\DriverStore\FileRepository\igdlh64.inf_amd64_25477efa0de18af8\igdrcl64.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\ws2_32.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\DriverStore\FileRepository\igdlh64.inf_amd64_25477efa0de18af8\igdgmm64.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\DXCore.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\opengl32.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\glu32.dll'. 
'vector-add-usm.exe' (Win32): Unloaded 'C:\Windows\System32\glu32.dll'
'vector-add-usm.exe' (Win32): Unloaded 'C:\Windows\System32\opengl32.dll'
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\DriverStore\FileRepository\igdlh64.inf_amd64_25477efa0de18af8\igdfcl64.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\shell32.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\DriverStore\FileRepository\igdlh64.inf_amd64_25477efa0de18af8\igc64.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\DriverStore\FileRepository\u0366400.inf_amd64_4021c2cb607d5b92\B366217\amdocl64.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\setupapi.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\bcrypt.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\opengl32.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\glu32.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\version.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\DriverStore\FileRepository\u0366400.inf_amd64_4021c2cb607d5b92\B366217\atiadlxx.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\psapi.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\propsys.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\pdh.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\devobj.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\wintrust.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\crypt32.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\msasn1.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\winmm.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\dwmapi.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\DriverStore\FileRepository\u0366400.inf_amd64_4021c2cb607d5b92\B366217\atig6txx.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\DriverStore\FileRepository\u0366400.inf_amd64_4021c2cb607d5b92\B366217\amdocl12cl64.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\dbghelp.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\DriverStore\FileRepository\u0366400.inf_amd64_4021c2cb607d5b92\B366217\amd_comgr.dll'. Module was built without symbols.
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\oleaut32.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Program Files (x86)\Common Files\Intel\Shared Libraries\intel64\intelocl64.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Program Files (x86)\Common Files\Intel\Shared Libraries\intel64\task_executor64.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Program Files (x86)\Common Files\Intel\Shared Libraries\intel64\cpu_device64.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\DriverStore\FileRepository\igdlh64.inf_amd64_25477efa0de18af8\ze_intel_gpu64.dll'. 
Exception thrown at 0x00007FFE276A4B89 in vector-add-usm.exe: Microsoft C++ exception: cl::sycl::runtime_error at memory location 0x00000029A873F028.
Debug Error!

Program: ...7r\source\repos\Base_Vector_Add1\x64\Debug\vector-add-usm.exe

abort() has been called

(Press Retry to debug the application)
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\TextShaping.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\uxtheme.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\msctf.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\TextInputFramework.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\CoreUIComponents.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\CoreMessaging.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\ntmarta.dll'. 
'vector-add-usm.exe' (Win32): Loaded 'C:\Windows\System32\WinTypes.dll'. 
The thread 0xe84 has exited with code 3 (0x3).
The thread 0x2630 has exited with code 3 (0x3).
The thread 0x10c0 has exited with code 3 (0x3).
The thread 0x1e34 has exited with code 3 (0x3).
The thread 0x1d18 has exited with code 3 (0x3).
The program '[10444] vector-add-usm.exe' has exited with code 3 (0x3).

 

 

As for running the oneDNN sample, I followed the exact same instructions, which gave the unexpected error that I opened this thread for. It is now evident that something is wrong in DPC++ itself, but I am unable to tell what it is from the log.

Gopika_Intel
Moderator
1,277 Views

Hi,

Thank you for the update. Please try running the DPC++ sample on the CPU. If it runs, then try running the same sample on the GPU. Confirm that the sample is actually running on the GPU by printing the selected device, as below (q is the sample's queue object):

 

std::cout << "Running on device: " << q.get_device().get_info<info::device::name>() << "\n";

 

If the selected device is the Intel GPU and the DPC++ sample still fails, then reinstall the Intel oneAPI Base Toolkit and try running the oneDNN and DPC++ samples again.

The latest Intel oneAPI Base Toolkit can be downloaded from this link: https://software.intel.com/content/www/us/en/develop/tools/oneapi/base-toolkit/download.html
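
A side note on the debugger log earlier in this thread: it shows that a cl::sycl::runtime_error was thrown and that abort() was called, but not the message carried by the exception. A minimal sketch of one way to surface that message is below. It is plain C++ and assumes the thrown type derives from std::exception (believed to hold for the DPC++ exception classes, but treated as an assumption here). The name run_guarded and its parameters are hypothetical and are not part of oneAPI or oneDNN.

```cpp
#include <exception>
#include <functional>
#include <iostream>
#include <string>

// Hypothetical helper (not a oneAPI or oneDNN API): runs `body` and prints the
// message of any exception it throws, instead of letting the process abort.
// Returns 0 if `body` completed, 1 if it threw.
int run_guarded(const std::string &device_label,
                const std::function<void()> &body) {
    try {
        body();
    } catch (const std::exception &e) {
        // Assumption: the runtime's exceptions derive from std::exception,
        // so what() carries the underlying error text.
        std::cerr << "Failed on " << device_label << ": " << e.what() << "\n";
        return 1;
    } catch (...) {
        // Anything else is reported without a message.
        std::cerr << "Failed on " << device_label << ": unknown exception\n";
        return 1;
    }
    std::cout << "Passed on " << device_label << ".\n";
    return 0;
}
```

From main, the sample body would be passed in as a lambda, for example return run_guarded("CPU", [] { /* sample code */ }); so that a failure prints the exception text and returns a non-zero exit code.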

 

Answers to your first query are given below:

>Is there a way to use GPU using python? And if we can, can you please point to the right resource on how to get it running?

 

Popular frameworks such as PyTorch and TensorFlow are the way to use the GPU from Python. These options are not available for public use as of now; currently they are available only to NDA customers.

 

> Also, is there a way we can run onednn in python as standalone?

 

oneDNN does not have a Python wrapper as of now.

 

Regards

Gopika

 

rahulv4667
Beginner
1,255 Views

Hello,

 

I finally reinstalled the Base Toolkit, but nothing changed. The DPC++ example is still not working: the same error keeps appearing for both CPU and GPU.

 

Error in the example: Native API failed. Native API returns: -999 (Unknown OpenCL error code) -999 (Unknown OpenCL error code). Example failed on CPU.
Gopika_Intel
Moderator
1,239 Views

Hi,

Thank you for the update. Please delete the existing oneAPI directory, reinstall the Intel oneAPI Base Toolkit, and then try running the samples. If you still face issues after the clean install, we recommend raising the issue in the dedicated Intel oneAPI Base Toolkit forum, stating that the sample is not working.

The oneAPI directory to be deleted can be found in this path: C:\Program Files (x86)\Intel\

Intel oneAPI Base Toolkit forum: https://community.intel.com/t5/Intel-oneAPI-Base-Toolkit/bd-p/oneapi-base-toolkit

 

Regards

Gopika

 

rahulv4667
Beginner
1,233 Views

I already did that when I reinstalled previously, so I will post it on the Base Toolkit forum.

 

Thank you.

Gopika_Intel
Moderator
1,216 Views

Hi,

Thank you for the update. As you are raising the query in the Intel oneAPI Base Toolkit forum, can we discontinue monitoring this thread?

Regards

Gopika

 

rahulv4667
Beginner
1,205 Views

Sure. No problem. 

Gopika_Intel
Moderator
1,204 Views

Hi,

Thank you for the confirmation. If you need any additional information, please submit a new question as this thread will no longer be monitored.

Regards

Gopika

