Intel® oneAPI DPC++/C++ Compiler

Unable to offload to integrated or dedicated GPUs, number of devices returns 0

Alexandre1999

Hi, I was experimenting with the icpx compiler and OpenMP, attempting to offload a simple workload to either an integrated GPU (Intel® UHD Graphics 750, part of a Core i7-11700) or a dedicated GPU (an RTX 4070 Ti), but I have been unsuccessful with both. My distro is Linux Mint 21 (Vanessa).

Following tutorial instructions, I wrote a simple SAXPY program and compiled it as follows, making sure to enable the OpenMP features:

icpx -fiopenmp -fopenmp-targets=spir64 -g saxpy.cpp -o saxpy

Given the target is set to spir64, I expect it to be offloaded to the dedicated GPU.

I don't believe the issue is in the source code, but here it is:

#include <stdlib.h>
#include <stdio.h>
#include <omp.h>

#define SAXPY_SIZE (1024*1024*1024)

void saxpy(float a, float* x, float* y, int sz) {
    double runtime = omp_get_wtime();

    #pragma omp target device(1) map(to:x[0:sz]) map(tofrom:y[0:sz])
    {
        #pragma omp parallel for simd firstprivate(a)
        for (size_t i = 0; i < sz; i++)
        {
            y[i] = a * x[i] + y[i];
        }
        printf("%d\n", omp_is_initial_device());
    }

    runtime = omp_get_wtime() - runtime;
    printf("%8d kb SAXPY Runtime: %lf\n", sz/1024, runtime);
}

int main(void) {
    float a, *x, *y;
    x = (float*)malloc(sizeof(float) * SAXPY_SIZE);
    y = (float*)malloc(sizeof(float) * SAXPY_SIZE);

    printf("%d devices\n", omp_get_num_devices());

    for (size_t i = 4096; i <= SAXPY_SIZE; i *= 2)
    {
        saxpy(1.0, x, y, i);
    }
    
    return 0;
}

omp_get_num_devices() returns 0, and the omp_is_initial_device() call returns 1, indicating the code ends up running on the CPU. Furthermore, setting the OMP_TARGET_OFFLOAD environment variable to MANDATORY makes the program crash, again indicating the workload is not offloaded. I set the debug environment variable and got more detailed information:

omptarget --> Init offload library!
OMPT --> Entering connectLibrary (libomp)
OMPT --> OMPT: Trying to load library libiomp5.so
OMPT --> OMPT: Trying to get address of connection routine ompt_libomp_connect
OMPT --> OMPT: Library connection handle = 0x7fc4ebb003c0
omptarget --> Callback to __tgt_register_ptask_services with handlers 0x00007fc4ebaeef00 0x00007fc4ebaee7c0
OMPT --> Exiting connectLibrary (libomp)
omptarget --> Loading RTLs...
omptarget --> Attempting to load library 'libomptarget.rtl.level0.so'...
omptarget --> Unable to load library 'libomptarget.rtl.level0.so': libze_loader.so.1: cannot open shared object file: No such file or directory!
omptarget --> Attempting to load library 'libomptarget.rtl.opencl.so'...
omptarget --> Successfully loaded library 'libomptarget.rtl.opencl.so'!
Target OPENCL RTL --> Init OpenCL plugin!
Target OPENCL RTL --> Target device type is set to GPU
Target OPENCL RTL --> OMPT: Entering connectLibrary (libomptarget)
OMPT --> OMPT: Trying to load library libomptarget.so
OMPT --> OMPT: Trying to get address of connection routine ompt_libomptarget_connect
OMPT --> OMPT: Library connection handle = 0x7fc4eb64c610
OMPT --> Enter ompt_libomptarget_connect
OMPT --> Leave ompt_libomptarget_connect
Target OPENCL RTL --> OMPT: Exiting connectLibrary (libomptarget)
Target OPENCL RTL --> Start initializing OpenCL
Target OPENCL RTL --> Platform OpenCL 3.0 CUDA 12.3.68 has 1 Devices
Target OPENCL RTL --> Warning: Extension clGetMemAllocInfoINTEL is not found.
Target OPENCL RTL --> Warning: Extension clHostMemAllocINTEL is not found.
Target OPENCL RTL --> Warning: Extension clDeviceMemAllocINTEL is not found.
Target OPENCL RTL --> Warning: Extension clSharedMemAllocINTEL is not found.
Target OPENCL RTL --> Warning: Extension clMemFreeINTEL is not found.
Target OPENCL RTL --> Warning: Extension clMemBlockingFreeINTEL is not found.
Target OPENCL RTL --> Warning: Extension clSetKernelArgMemPointerINTEL is not found.
Target OPENCL RTL --> Warning: Extension clEnqueueMemcpyINTEL is not found.
Target OPENCL RTL --> Warning: Extension clEnqueueMemFillINTEL is not found.
Target OPENCL RTL --> Warning: Extension clGetDeviceGlobalVariablePointerINTEL is not found.
Target OPENCL RTL --> Warning: Extension clGetKernelSuggestedLocalWorkSizeKHR is not found.
Target OPENCL RTL --> Warning: Extension clGitsIndirectAllocationOffsets is not found.
omptarget --> Registered 'libomptarget.rtl.opencl.so' with 1 plugin visible devices!
omptarget --> Attempting to load library 'libomptarget.rtl.x86_64.so'...
omptarget --> Successfully loaded library 'libomptarget.rtl.x86_64.so'!
omptarget --> Registered 'libomptarget.rtl.x86_64.so' with 4 plugin visible devices!
omptarget --> RTLs loaded!
Target OPENCL RTL --> Target binary is a valid oneAPI OpenMP image.
omptarget --> Image 0x0000000000402140 is compatible with RTL libomptarget.rtl.opencl.so!
Target OPENCL RTL --> Initialize requires flags to 1
Target OPENCL RTL --> Initialize OpenCL device
Target OPENCL RTL --> Getting extensions for device 0
Target OPENCL RTL --> Device extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_opaque_fd cl_khr_external_memory_opaque_fd
Target OPENCL RTL --> Extension clGetDeviceGlobalVariablePointerINTEL enabled.
Target OPENCL RTL --> Error: Required USM extension is not found
omptarget --> Skip plugin known device 0: Failed to initialize device 0

omptarget --> Plugin adaptor 0x00000000016281c0 has index 0, exposes 0 out of 1 devices!
omptarget --> Registering image 0x0000000000402140 with RTL libomptarget.rtl.opencl.so!
omptarget --> Done registering entries!
omptarget --> Call to omp_get_num_devices returning 0
0 devices
omptarget --> Call to omp_get_num_devices returning 0
omptarget --> Entering target region for device 1 with entry point 0x0000000000402050
omptarget --> Call to omp_get_num_devices returning 0
omptarget --> omp_get_num_devices() == 0 but offload is manadatory
omptarget error: Run with
omptarget error: LIBOMPTARGET_DEBUG=1 to display basic debug information.
omptarget error: LIBOMPTARGET_DEBUG=2 to display calls to the compute runtime.
omptarget error: LIBOMPTARGET_INFO=4 to dump host-target pointer mappings.
omptarget error: Source location information not present. Compile with -g or -gline-tables-only.
omptarget fatal error 1: failure of target construct while offloading is mandatory
make: *** [Makefile:8: run] Aborted (core dumped)

I believe the crucial part is lines 48-49 of the log above ("Error: Required USM extension is not found", followed by device 0 being skipped). I more or less get what this means, but I have no clue how to address it.
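
As a cross-check on the log, the device count can be tested inside the program before a target region is entered. The following is only a minimal sketch and not part of the original post: visible_devices, choose_device and report_devices are hypothetical helper names. It assumes the device numbering of recent OpenMP versions, where offload devices are numbered 0 to omp_get_num_devices() - 1, so a device(1) clause names a second offload device only when at least two are visible.

```cpp
#include <cassert>
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of offload devices the OpenMP runtime exposes.
// Returns 0 when the file is built without OpenMP support.
int visible_devices() {
#ifdef _OPENMP
    return omp_get_num_devices();
#else
    return 0;
#endif
}

// Hypothetical helper: map a requested device number onto a usable one.
// Returns -1 when no offload device is visible; falls back to device 0
// when the request is outside the range [0, num_devices).
int choose_device(int requested, int num_devices) {
    assert(num_devices >= 0);
    if (num_devices == 0) return -1;
    if (requested < 0 || requested >= num_devices) return 0;
    return requested;
}

// Prints what the runtime exposes and how a request would be resolved.
void report_devices(int requested) {
    const int n = visible_devices();
    const int dev = choose_device(requested, n);
    std::printf("%d offload device(s) visible\n", n);
    if (dev < 0) {
        // With OMP_TARGET_OFFLOAD=MANDATORY the runtime aborts instead.
        std::printf("no offload device; a target region would fall back to the host\n");
    } else {
        std::printf("device(%d) requested, device(%d) would be used\n", requested, dev);
    }
}
```

Calling report_devices(1) at the start of main would show up front whether the device(1) clause can resolve to a GPU. With the log above, where omp_get_num_devices() returns 0, choose_device would return -1.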

As that didn't work, I attempted to target the integrated GPU instead, by setting -fopenmp-targets to spir64_gen. This first prompted me to install ocloc, after which a different error is shown:

$ icpx -fiopenmp -fopenmp-targets=spir64_gen -g saxpy.cpp -o saxpy
icpx: remark: Note that use of '-g' without any optimization-level option will turn off most compiler optimizations similar to use of '-O0'; use '-Rno-debug-disables-optimization' to disable this remark [-Rdebug-disables-optimization]
Error: Device name missing.
Command was: /usr/bin/ocloc -output /tmp/icpx-e2b4119f67/saxpy-9d0072.out -file /tmp/icpx-e2b4119f67/saxpy-31c585.spv -output_no_suffix -spirv_input -options "-g -cl-take-global-address -cl-match-sincospi"
icpx: error: gen compiler command failed with exit code 226 (use -v to see invocation)
Intel(R) oneAPI DPC++/C++ Compiler 2024.2.1 (2024.2.1.20240711)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2024.2/bin/compiler
Configuration file: /opt/intel/oneapi/compiler/2024.2/bin/compiler/../icpx.cfg
icpx: note: diagnostic msg: Error generating preprocessed source(s).
make: *** [Makefile:2: saxpy] Error 1

Again, I am not sure how to proceed.

I would like to get at least one of the two devices working, and would preferably like to experiment with both. I appreciate any help you can provide, and am available to provide additional information.

Alex_Y_Intel
Moderator

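1. Your dedicated GPU (an RTX 4070 Ti) is an NVIDIA GPU, so it won't work as an offload target here. Your log shows the OpenCL plugin finding only the CUDA platform and then skipping its device because the required USM extension is missing. The log also says the Level Zero plugin could not be loaded because libze_loader.so.1 was not found, which suggests the Intel GPU runtime is not installed and explains why the integrated GPU does not show up either.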

2. If you use -fopenmp-targets=spir64_gen, you are requesting AOT (ahead-of-time) compilation, so you need to pass a target device name with -Xs "-device <device name>". That is what the "Error: Device name missing." message from ocloc refers to.

For example:

JIT compilation:
icpx -fiopenmp -fopenmp-targets=spir64 source.cpp

AOT compilation:
icpx -fiopenmp -fopenmp-targets=spir64_gen -Xs "-device <dev>" src.cpp

<dev> is your target device; run 'ocloc compile --help' for the list of targets.

3. I tried your code on both Intel integrated and discrete GPUs, and it compiled and ran fine.
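
Once a device is visible, a smaller self-checking variant of the kernel makes it easier to confirm that a run produced correct results. The sketch below is an illustration under stated assumptions, not the code from the question: it omits the device clause so the default device is used, it uses a combined target teams distribute parallel for construct, and saxpy_is_correct is a hypothetical helper. Built without OpenMP, the pragma is ignored and the loop simply runs on the host.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// SAXPY on the default device: y[i] = a * x[i] + y[i].
// x is copied to the device; y is copied in and back out.
void saxpy(float a, const float* x, float* y, std::size_t n) {
    assert(x != nullptr && y != nullptr);
    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Hypothetical helper: fills x and y with constants, runs saxpy, and
// compares every element against the value computed on the host.
bool saxpy_is_correct(float a, float x0, float y0, std::size_t n) {
    std::vector<float> x(n, x0);
    std::vector<float> y(n, y0);
    saxpy(a, x.data(), y.data(), n);

    const float expected = a * x0 + y0;
    // Small relative tolerance, since device and host rounding may differ.
    const float tol = 1e-5f * (std::fabs(expected) + 1.0f);
    for (float v : y) {
        if (std::fabs(v - expected) > tol) return false;
    }
    return true;
}
```

Unlike the original program, this variant initializes both arrays before the kernel runs, so the output is well defined and can be checked, for example with saxpy_is_correct(2.0f, 1.0f, 2.0f, 4096).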

 
