Intel® oneAPI HPC Toolkit
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Fail to build openmp_reduction of oneAPI samples

LaurentPlagne
Novice

Hi,

I have installed the oneAPI Base and HPC Toolkits (2021.1.1), but make (after cmake) fails for the openmp_reduction sample. I am using Ubuntu 20.04 with a Gen9 GPU.

 

```

~/oneAPI-samples/DirectProgramming/C++/ParallelPatterns/openmp_reduction/build$ make
[ 50%] Building CXX object src/CMakeFiles/openmp_reduction.dir/main.cpp.o
In file included from /home/lolo/oneAPI-samples/DirectProgramming/C++/ParallelPatterns/openmp_reduction/src/main.cpp:11:
In file included from /opt/intel/oneapi/dev-utilities/2021.1.1/include/dpc_common.hpp:15:
In file included from /opt/intel/oneapi/compiler/2021.1.1/linux/bin/../include/sycl/CL/sycl.hpp:11:
In file included from /opt/intel/oneapi/compiler/2021.1.1/linux/bin/../include/sycl/CL/sycl/ONEAPI/atomic.hpp:11:
In file included from /opt/intel/oneapi/compiler/2021.1.1/linux/bin/../include/sycl/CL/sycl/ONEAPI/atomic_accessor.hpp:11:
In file included from /opt/intel/oneapi/compiler/2021.1.1/linux/bin/../include/sycl/CL/sycl/ONEAPI/atomic_enums.hpp:12:
In file included from /opt/intel/oneapi/compiler/2021.1.1/linux/bin/../include/sycl/CL/sycl/access/access.hpp:10:
In file included from /opt/intel/oneapi/compiler/2021.1.1/linux/bin/../include/sycl/CL/sycl/detail/common.hpp:121:
In file included from /opt/intel/oneapi/compiler/2021.1.1/linux/bin/../include/sycl/CL/sycl/exception.hpp:15:
/opt/intel/oneapi/compiler/2021.1.1/linux/bin/../include/sycl/CL/sycl/detail/pi.h:255:37: error: use of undeclared identifier 'CL_DEVICE_HOST_MEM_CAPABILITIES_INTEL'
PI_DEVICE_INFO_USM_HOST_SUPPORT = CL_DEVICE_HOST_MEM_CAPABILITIES_INTEL,

```

 

AbhishekD_Intel
Moderator

Hi Laurent,


Please confirm that you tried the openmp_reduction oneAPI sample linked below.

https://github.com/oneapi-src/oneAPI-samples/tree/master/DirectProgramming/C%2B%2B/ParallelPatterns/...


We tried this sample with the same HPC Toolkit version you are using and did not get the errors you mention. The build even produces an executable.

Please try running some other samples and let us know if you are getting the same errors with them. It seems that this issue is specific to your environment.


Please update us with your findings so that we get more details on this issue.



Warm Regards,

Abhishek


LaurentPlagne
Novice

Dear Abhishek, thank you for your message.

I pulled the latest version of the oneAPI samples. I was able to compile and run this sample on the DevCloud; I only have the problem on my local machine. I also believe that it is an environment problem (although the dpcpp examples work fine), and I suspect that it may be related to the OpenCL installation (driver?).

 

I hoped that the error message might help you spot the problem:

 

use of undeclared identifier 'CL_DEVICE_HOST_MEM_CAPABILITIES_INTEL' 
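
One way to test the suspicion about the OpenCL installation is to check which header directories actually contain the missing identifier. The following is a minimal sketch, not a confirmed diagnosis. The helper name find_definition is hypothetical, /usr/include/CL is an assumed location for system OpenCL headers, and the toolkit include path is derived from the error output above.

```shell
#!/bin/sh
# Hypothetical helper: list files under the given directories that
# contain the given identifier as a whole word.
# Usage: find_definition IDENTIFIER DIR [DIR...]
find_definition() {
    ident="$1"
    shift
    for dir in "$@"; do
        # Skip directories that do not exist on this machine.
        [ -d "$dir" ] || continue
        # -r recurse, -l print file names only, -w whole word, -F fixed string.
        # "|| true" keeps the loop going when a directory has no match.
        grep -rlwF -- "$ident" "$dir" 2>/dev/null || true
    done
}

# Example invocation (commented out; /usr/include/CL is an assumption):
#   find_definition CL_DEVICE_HOST_MEM_CAPABILITIES_INTEL \
#       /opt/intel/oneapi/compiler/2021.1.1/linux/include/sycl/CL \
#       /usr/include/CL
```

If the identifier appears only under the toolkit path, and an older OpenCL header directory is searched first during the build, that would be consistent with the error; it would still need to be confirmed against the actual compiler include order.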

 

LaurentPlagne
Novice

Hi, I just reinstalled a fresh system and I can now compile openmp_reduction.

Unfortunately, the execution gets stuck (I pressed Ctrl-C after 1 min):

./src/openmp_reduction 
Number of steps is 1000000

 

Note that the dpcpp samples run properly.

Could this be related to an OpenCL driver issue (Level0 being fine)?

AbhishekD_Intel
Moderator

Hi Laurent,

 

Regarding your first question (use of undeclared identifier 'CL_DEVICE_HOST_MEM_CAPABILITIES_INTEL'): there may be a problem with the toolkit installation in your environment, so please try reinstalling the toolkits. After installing the toolkits, also install the latest GPU drivers. The link below describes how to install the GPU software packages.

https://dgpu-docs.intel.com/installation-guides/index.html

 

For the second question (the execution gets stuck): we see the same issue when running the sample on an Intel GPU with the default Level0 runtime. The issue does not seem to come from the OpenCL drivers; it is related to Level0. For now, you can use the OpenCL runtime as a workaround to offload to the GPU. Please let us know whether that works for you.

To use the OpenCL runtime for GPU offloads, export LIBOMPTARGET_PLUGIN=OPENCL.

Meanwhile, we will look into the Level0 issue and let you know.

 

Also, please send us your debug logs so that we get more insight: export LIBOMPTARGET_DEBUG=2 and LIBOMPTARGET_PLUGIN=LEVEL0 before running the sample.
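
The two settings above can be put together in a small shell sketch. The variable names and values (LIBOMPTARGET_PLUGIN, LIBOMPTARGET_DEBUG, OPENCL, LEVEL0) are taken from this reply, and the binary path ./src/openmp_reduction from the earlier post. The helper names run_opencl and run_level0_debug, the log file name, and the 60-second timeout are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical helpers: run a command with the libomptarget settings
# mentioned in this thread. env sets the variables for that one command
# only, so the calling shell's environment stays unchanged.

# Workaround: offload through the OpenCL plugin.
run_opencl() {
    env LIBOMPTARGET_PLUGIN=OPENCL "$@"
}

# Diagnostics: select the Level0 plugin with debug level 2.
run_level0_debug() {
    env LIBOMPTARGET_PLUGIN=LEVEL0 LIBOMPTARGET_DEBUG=2 "$@"
}

# Example invocations (commented out; they need the built sample):
#   run_opencl ./src/openmp_reduction
#   # Redirect both output streams into a log file, and stop after
#   # 60 seconds in case the run hangs (timeout is from GNU coreutils).
#   run_level0_debug timeout 60 ./src/openmp_reduction > level0_debug.log 2>&1
```

Wrapping the settings per command, rather than exporting them, avoids leaving LIBOMPTARGET_PLUGIN=LEVEL0 set in the shell after the debug run.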

 

 

Warm Regards,

Abhishek

 

LaurentPlagne
Novice

Thank you for your answer.

 

The OpenCL driver works (although the speedup is not impressive):

 

formation6@formation6-Inspiron-7590:~/Projects/oneAPI-samples/DirectProgramming/C++/ParallelPatterns/openmp_reduction/build$ export LIBOMPTARGET_PLUGIN=OPENCL
formation6@formation6-Inspiron-7590:~/Projects/oneAPI-samples/DirectProgramming/C++/ParallelPatterns/openmp_reduction/build$ make run
Number of steps is 1000000
Cpu Seq calc: 		PI =3.14 in 0.00105 seconds
Host OpenMP:		PI =3.14 in 0.000873 seconds
Offload OpenMP:		PI =3.14 in 0.000889 seconds
success
Built target run

 

Here are the debug logs with the Level0 driver:

 

formation6@formation6-Inspiron-7590:~/Projects/oneAPI-samples/DirectProgramming/C++/ParallelPatterns/openmp_reduction/build$ export LIBOMPTARGET_PLUGIN=LEVEL0
formation6@formation6-Inspiron-7590:~/Projects/oneAPI-samples/DirectProgramming/C++/ParallelPatterns/openmp_reduction/build$ export LIBOMPTARGET_DEBUG=2
formation6@formation6-Inspiron-7590:~/Projects/oneAPI-samples/DirectProgramming/C++/ParallelPatterns/openmp_reduction/build$ make run
Libomptarget --> TargetOffloadPolicy = DEFAULT
Libomptarget --> Initialized OMPT
Libomptarget --> Loading RTLs...
Libomptarget --> Checking user-specified plugin 'libomptarget.rtl.level0.so'...
Libomptarget --> Loading library 'libomptarget.rtl.level0.so'...
Target LEVEL0 RTL --> Target device type is set to GPU
Target LEVEL0 RTL --> omp_get_thread_limit() returned 2147483647
Target LEVEL0 RTL --> omp_get_max_teams() returned 0
Libomptarget --> Successfully loaded library 'libomptarget.rtl.level0.so'!
Libomptarget --> Optional interface: __tgt_rtl_data_alloc_base
Libomptarget --> Optional interface: __tgt_rtl_data_alloc_user
Libomptarget --> Optional interface: __tgt_rtl_data_alloc_explicit
Libomptarget --> Optional interface: __tgt_rtl_data_alloc_managed
Libomptarget --> Optional interface: __tgt_rtl_data_delete_managed
Libomptarget --> Optional interface: __tgt_rtl_data_submit_nowait
Libomptarget --> Optional interface: __tgt_rtl_data_retrieve_nowait
Libomptarget --> Optional interface: __tgt_rtl_create_buffer
Libomptarget --> Optional interface: __tgt_rtl_release_buffer
Libomptarget --> Optional interface: __tgt_rtl_create_offload_queue
Libomptarget --> Optional interface: __tgt_rtl_release_offload_queue
Libomptarget --> Optional interface: __tgt_rtl_get_platform_handle
Libomptarget --> Optional interface: __tgt_rtl_get_device_handle
Libomptarget --> Optional interface: __tgt_rtl_init_ompt
Libomptarget --> Optional interface: __tgt_rtl_is_managed_ptr
Libomptarget --> Optional interface: __tgt_rtl_manifest_data_for_region
Libomptarget --> Optional interface: __tgt_rtl_run_target_team_nd_region
Libomptarget --> Optional interface: __tgt_rtl_run_target_region_nowait
Libomptarget --> Optional interface: __tgt_rtl_run_target_team_region_nowait
Libomptarget --> Optional interface: __tgt_rtl_run_target_team_nd_region_nowait
Target LEVEL0 RTL --> Looking for Level0 devices...
Target LEVEL0 RTL --> ZE_CALLER: zeInit ( ZE_INIT_FLAG_GPU_ONLY )
Target LEVEL0 RTL --> ZE_CALLEE: zeInit (
Target LEVEL0 RTL -->     flags = 1
Target LEVEL0 RTL --> )
Target LEVEL0 RTL --> Initialized L0, API 65536
Target LEVEL0 RTL --> ZE_CALLER: zeDriverGet ( &numDrivers, nullptr )
Target LEVEL0 RTL --> ZE_CALLEE: zeDriverGet (
Target LEVEL0 RTL -->     pCount = 0x00007ffdd3e7d1b4
Target LEVEL0 RTL -->     phDrivers = 0x0000000000000000
Target LEVEL0 RTL --> )
Target LEVEL0 RTL --> ZE_CALLER: zeDriverGet ( &numDrivers, driverHandles.data() )
Target LEVEL0 RTL --> ZE_CALLEE: zeDriverGet (
Target LEVEL0 RTL -->     pCount = 0x00007ffdd3e7d1b4
Target LEVEL0 RTL -->     phDrivers = 0x0000000001460730
Target LEVEL0 RTL --> )
Target LEVEL0 RTL --> Found 1 driver(s)!
Target LEVEL0 RTL --> ZE_CALLER: zeDeviceGet ( driverHandles[i], &numDevices, nullptr )
Target LEVEL0 RTL --> ZE_CALLEE: zeDeviceGet (
Target LEVEL0 RTL -->     hDriver = 0x0000000001460750
Target LEVEL0 RTL -->     pCount = 0x00007ffdd3e7d1d4
Target LEVEL0 RTL -->     phDevices = 0x0000000000000000
Target LEVEL0 RTL --> )
Target LEVEL0 RTL --> ZE_CALLER: zeDeviceGet ( driverHandles[i], &numDevices, devices.data() )
Target LEVEL0 RTL --> ZE_CALLEE: zeDeviceGet (
Target LEVEL0 RTL -->     hDriver = 0x0000000001460750
Target LEVEL0 RTL -->     pCount = 0x00007ffdd3e7d1d4
Target LEVEL0 RTL -->     phDevices = 0x00000000016863a0
Target LEVEL0 RTL --> )
Target LEVEL0 RTL --> ZE_CALLER: zeDeviceGetProperties ( device, &properties )
Target LEVEL0 RTL --> ZE_CALLEE: zeDeviceGetProperties (
Target LEVEL0 RTL -->     hDevice = 0x0000000001460910
Target LEVEL0 RTL -->     pDeviceProperties = 0x00007ffdd3e7d250
Target LEVEL0 RTL --> )
Target LEVEL0 RTL --> Found a GPU device, Name = Intel(R) Graphics [0x3e9b]
Target LEVEL0 RTL --> ZE_CALLER: zeDeviceGetComputeProperties ( device, &computeProperties )
Target LEVEL0 RTL --> ZE_CALLEE: zeDeviceGetComputeProperties (
Target LEVEL0 RTL -->     hDevice = 0x0000000001460910
Target LEVEL0 RTL -->     pComputeProperties = 0x00007ffdd3e7d1f8
Target LEVEL0 RTL --> )
Target LEVEL0 RTL --> Found 1 available devices.
Target LEVEL0 RTL --> ZE_CALLER: zeContextCreate ( Driver, &contextDesc, &context )
Target LEVEL0 RTL --> ZE_CALLEE: zeContextCreate (
Target LEVEL0 RTL -->     hDriver = 0x0000000001460750
Target LEVEL0 RTL -->     desc = 0x00007ffdd3e7d250
Target LEVEL0 RTL -->     phContext = 0x00007ffdd3e7d1f8
Target LEVEL0 RTL --> )
Target LEVEL0 RTL --> Initialized OMPT
Libomptarget --> Registering RTL libomptarget.rtl.level0.so supporting 1 devices!
Libomptarget --> RTLs loaded!
Target LEVEL0 RTL --> Target binary is VALID
Libomptarget --> Image 0x000000000040c9a0 is compatible with RTL libomptarget.rtl.level0.so!
Libomptarget --> RTL 0x00000000013480f0 has index 0!
Libomptarget --> Registering image 0x000000000040c9a0 with RTL libomptarget.rtl.level0.so!
Libomptarget --> Done registering entries!
Number of steps is 1000000
Libomptarget --> Call to omp_get_num_devices returning 1
Libomptarget --> Default TARGET OFFLOAD policy is now mandatory (devices were found)
Libomptarget --> Entering target region with entry point 0x000000000040c0a0 and device Id -1
Libomptarget --> Checking whether device 0 is ready.
Libomptarget --> Is the device 0 (local ID 0) initialized? 0
Target LEVEL0 RTL --> Initialize requires flags to 1
Target LEVEL0 RTL --> Initialized Level0 device 0
Libomptarget --> Device 0 is ready to use.
Target LEVEL0 RTL --> Device 0: Loading binary from 0x000000000040c9a0
Target LEVEL0 RTL --> Expecting to have 1 entries defined
Target LEVEL0 RTL --> Module compilation options: -cl-std=CL2.0 
Target LEVEL0 RTL --> ZE_CALLER: zeModuleCreate ( Context, Device, &moduleDesc, &module, &buildLog )
Target LEVEL0 RTL --> ZE_CALLEE: zeModuleCreate (
Target LEVEL0 RTL -->     hContext = 0x00000000016866a0
Target LEVEL0 RTL -->     hDevice = 0x0000000001460910
Target LEVEL0 RTL -->     desc = 0x00007ffdd3e7d5a0
Target LEVEL0 RTL -->     phModule = 0x00007ffdd3e7d570
Target LEVEL0 RTL -->     phBuildLog = 0x00007ffdd3e7d530
Target LEVEL0 RTL --> )
Target LEVEL0 RTL --> ZE_CALLER: zeModuleBuildLogDestroy ( buildLog )
Target LEVEL0 RTL --> ZE_CALLEE: zeModuleBuildLogDestroy (
Target LEVEL0 RTL -->     hModuleBuildLog = 0x0000000001686d20
Target LEVEL0 RTL --> )
Target LEVEL0 RTL --> Looking up device global variable '__omp_offloading_entries_table_size' of size 8 bytes on device 0.
Target LEVEL0 RTL --> ZE_CALLER: zeModuleGetGlobalPointer ( FuncGblEntries[DeviceId].Modules[0], Name, &TgtSize, &TgtAddr )
Target LEVEL0 RTL --> ZE_CALLEE: zeModuleGetGlobalPointer (
Target LEVEL0 RTL -->     hModule = 0x00000000016a2220
Target LEVEL0 RTL -->     pGlobalName = 0x00007f5bc9c32137
Target LEVEL0 RTL -->     pSize = 0x00007ffdd3e7d3f8
Target LEVEL0 RTL -->     pptr = 0x00007ffdd3e7d400
Target LEVEL0 RTL --> )
Target LEVEL0 RTL --> Warning: requested size 8 does not match 0
Target LEVEL0 RTL --> Global variable lookup succeeded.
Target LEVEL0 RTL --> Copied 8 bytes (tgt:0x00000000024ef068) -> (hst:0x00007ffdd3e7d490)
Target LEVEL0 RTL --> Looking up device global variable '__omp_offloading_entries_table' of size 40 bytes on device 0.
Target LEVEL0 RTL --> ZE_CALLER: zeModuleGetGlobalPointer ( FuncGblEntries[DeviceId].Modules[0], Name, &TgtSize, &TgtAddr )
Target LEVEL0 RTL --> ZE_CALLEE: zeModuleGetGlobalPointer (
Target LEVEL0 RTL -->     hModule = 0x00000000016a2220
Target LEVEL0 RTL -->     pGlobalName = 0x00007f5bc9c32223
Target LEVEL0 RTL -->     pSize = 0x00007ffdd3e7d3f8
Target LEVEL0 RTL -->     pptr = 0x00007ffdd3e7d400
Target LEVEL0 RTL --> )
Target LEVEL0 RTL --> Warning: requested size 40 does not match 0
Target LEVEL0 RTL --> Global variable lookup succeeded.
Target LEVEL0 RTL --> Copied 40 bytes (tgt:0x00000000024ef040) -> (hst:0x00000000029d6160)
Target LEVEL0 RTL --> Copied 62 bytes (tgt:0x00000000024ef000) -> (hst:0x0000000001686330)
Target LEVEL0 RTL --> Device offload table loaded:
Target LEVEL0 RTL --> 	0:	__omp_offloading_10302_1ac0ffd__Z21openmp_device_calc_pii_l60
Target LEVEL0 RTL --> ZE_CALLER: zeKernelCreate ( mainModule, &kernelDesc, &kernels[i] )
Target LEVEL0 RTL --> ZE_CALLEE: zeKernelCreate (
Target LEVEL0 RTL -->     hModule = 0x00000000016a2220
Target LEVEL0 RTL -->     desc = 0x00007ffdd3e7d550
Target LEVEL0 RTL -->     phKernel = 0x00000000024e5260
Target LEVEL0 RTL --> )
Target LEVEL0 RTL --> ZE_CALLER: zeKernelGetProperties ( kernels[i], &kernelProperties )
Target LEVEL0 RTL --> ZE_CALLEE: zeKernelGetProperties (
Target LEVEL0 RTL -->     hKernel = 0x0000000001babf40
Target LEVEL0 RTL -->     pKernelProperties = 0x00007ffdd3e7d5a0
Target LEVEL0 RTL --> )
^Cmake[3]: *** [src/CMakeFiles/run.dir/build.make:57: src/CMakeFiles/run] Interrupt
make[2]: *** [CMakeFiles/Makefile2:95: src/CMakeFiles/run.dir/all] Interrupt
make[1]: *** [CMakeFiles/Makefile2:102: src/CMakeFiles/run.dir/rule] Interrupt
make: *** [Makefile:118: run] Interrupt

 

Laurent

 

AbhishekD_Intel
Moderator

Hi Laurent,


Thanks for the confirmation.

We have already reported the Level0 issue to the responsible team, and it will be fixed very soon.


Please confirm whether we can stop monitoring this thread. You are always welcome to post a new thread if you have any other issues.



Warm Regards,

Abhishek


AbhishekD_Intel
Moderator

Hi,


As the issue in this thread is resolved, we will no longer monitor it. Please post a new thread if you have any other issues.


Warm Regards,

Abhishek

