Intel® oneAPI DPC++/C++ Compiler

Unable to offload to integrated or dedicated GPUs, number of devices returns 0

Alexandre1999

Hi, I was experimenting with the icpx compiler and OpenMP, attempting to offload a simple workload to either an integrated GPU (Intel® UHD Graphics 750, part of a Core i7-11700) or a dedicated GPU (an RTX 4070 Ti), but have been unsuccessful with both. My distro is Linux Mint 21 (Vanessa).

Following tutorial instructions, I wrote a simple SAXPY program, which I compile as follows, making sure to enable the OpenMP offload features:

icpx -fiopenmp -fopenmp-targets=spir64 -g saxpy.cpp -o saxpy

Given the target is set to spir64, I expect it to be offloaded to the dedicated GPU.

I don't believe the issue is in the code itself, but here is the source:

#include <stdlib.h>
#include <stdio.h>
#include <omp.h>

#define SAXPY_SIZE (1024*1024*1024)

void saxpy(float a, float* x, float* y, int sz) {
    double runtime = omp_get_wtime();

    #pragma omp target device(1) map(to:x[0:sz]) map(tofrom:y[0:sz])
    {
        #pragma omp parallel for simd firstprivate(a)
        for (size_t i = 0; i < sz; i++)
        {
            y[i] = a * x[i] + y[i];
        }
        printf("%d\n", omp_is_initial_device());
    }

    runtime = omp_get_wtime() - runtime;
    printf("%8d kb SAXPY Runtime: %lf\n", sz/1024, runtime);
}

int main(void) {
    float a, *x, *y;
    x = (float*)malloc(sizeof(float) * SAXPY_SIZE);
    y = (float*)malloc(sizeof(float) * SAXPY_SIZE);

    printf("%d devices\n", omp_get_num_devices());

    for (size_t i = 4096; i <= SAXPY_SIZE; i *= 2)
    {
        saxpy(1.0, x, y, i);
    }
    
    return 0;
}

omp_get_num_devices() returns 0, and the omp_is_initial_device() call returns 1, indicating that the code ends up running on the CPU. Furthermore, setting the OMP_TARGET_OFFLOAD environment variable to MANDATORY makes the program crash, again indicating that the workload is not offloaded. I set the debug environment variable and got more detailed output:

omptarget --> Init offload library!
OMPT --> Entering connectLibrary (libomp)
OMPT --> OMPT: Trying to load library libiomp5.so
OMPT --> OMPT: Trying to get address of connection routine ompt_libomp_connect
OMPT --> OMPT: Library connection handle = 0x7fc4ebb003c0
omptarget --> Callback to __tgt_register_ptask_services with handlers 0x00007fc4ebaeef00 0x00007fc4ebaee7c0
OMPT --> Exiting connectLibrary (libomp)
omptarget --> Loading RTLs...
omptarget --> Attempting to load library 'libomptarget.rtl.level0.so'...
omptarget --> Unable to load library 'libomptarget.rtl.level0.so': libze_loader.so.1: cannot open shared object file: No such file or directory!
omptarget --> Attempting to load library 'libomptarget.rtl.opencl.so'...
omptarget --> Successfully loaded library 'libomptarget.rtl.opencl.so'!
Target OPENCL RTL --> Init OpenCL plugin!
Target OPENCL RTL --> Target device type is set to GPU
Target OPENCL RTL --> OMPT: Entering connectLibrary (libomptarget)
OMPT --> OMPT: Trying to load library libomptarget.so
OMPT --> OMPT: Trying to get address of connection routine ompt_libomptarget_connect
OMPT --> OMPT: Library connection handle = 0x7fc4eb64c610
OMPT --> Enter ompt_libomptarget_connect
OMPT --> Leave ompt_libomptarget_connect
Target OPENCL RTL --> OMPT: Exiting connectLibrary (libomptarget)
Target OPENCL RTL --> Start initializing OpenCL
Target OPENCL RTL --> Platform OpenCL 3.0 CUDA 12.3.68 has 1 Devices
Target OPENCL RTL --> Warning: Extension clGetMemAllocInfoINTEL is not found.
Target OPENCL RTL --> Warning: Extension clHostMemAllocINTEL is not found.
Target OPENCL RTL --> Warning: Extension clDeviceMemAllocINTEL is not found.
Target OPENCL RTL --> Warning: Extension clSharedMemAllocINTEL is not found.
Target OPENCL RTL --> Warning: Extension clMemFreeINTEL is not found.
Target OPENCL RTL --> Warning: Extension clMemBlockingFreeINTEL is not found.
Target OPENCL RTL --> Warning: Extension clSetKernelArgMemPointerINTEL is not found.
Target OPENCL RTL --> Warning: Extension clEnqueueMemcpyINTEL is not found.
Target OPENCL RTL --> Warning: Extension clEnqueueMemFillINTEL is not found.
Target OPENCL RTL --> Warning: Extension clGetDeviceGlobalVariablePointerINTEL is not found.
Target OPENCL RTL --> Warning: Extension clGetKernelSuggestedLocalWorkSizeKHR is not found.
Target OPENCL RTL --> Warning: Extension clGitsIndirectAllocationOffsets is not found.
omptarget --> Registered 'libomptarget.rtl.opencl.so' with 1 plugin visible devices!
omptarget --> Attempting to load library 'libomptarget.rtl.x86_64.so'...
omptarget --> Successfully loaded library 'libomptarget.rtl.x86_64.so'!
omptarget --> Registered 'libomptarget.rtl.x86_64.so' with 4 plugin visible devices!
omptarget --> RTLs loaded!
Target OPENCL RTL --> Target binary is a valid oneAPI OpenMP image.
omptarget --> Image 0x0000000000402140 is compatible with RTL libomptarget.rtl.opencl.so!
Target OPENCL RTL --> Initialize requires flags to 1
Target OPENCL RTL --> Initialize OpenCL device
Target OPENCL RTL --> Getting extensions for device 0
Target OPENCL RTL --> Device extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_opaque_fd cl_khr_external_memory_opaque_fd
Target OPENCL RTL --> Extension clGetDeviceGlobalVariablePointerINTEL enabled.
Target OPENCL RTL --> Error: Required USM extension is not found
omptarget --> Skip plugin known device 0: Failed to initialize device 0

omptarget --> Plugin adaptor 0x00000000016281c0 has index 0, exposes 0 out of 1 devices!
omptarget --> Registering image 0x0000000000402140 with RTL libomptarget.rtl.opencl.so!
omptarget --> Done registering entries!
omptarget --> Call to omp_get_num_devices returning 0
0 devices
omptarget --> Call to omp_get_num_devices returning 0
omptarget --> Entering target region for device 1 with entry point 0x0000000000402050
omptarget --> Call to omp_get_num_devices returning 0
omptarget --> omp_get_num_devices() == 0 but offload is manadatory
omptarget error: Run with
omptarget error: LIBOMPTARGET_DEBUG=1 to display basic debug information.
omptarget error: LIBOMPTARGET_DEBUG=2 to display calls to the compute runtime.
omptarget error: LIBOMPTARGET_INFO=4 to dump host-target pointer mappings.
omptarget error: Source location information not present. Compile with -g or -gline-tables-only.
omptarget fatal error 1: failure of target construct while offloading is mandatory
make: *** [Makefile:8: run] Aborted (core dumped)

I believe the crucial part is lines 48-49 of the log above ("Error: Required USM extension is not found", followed by "Failed to initialize device 0"). I more or less understand what this means, but I have no clue how to address it.

As that didn't work, I attempted to target the integrated GPU instead by setting -fopenmp-targets to spir64_gen. This first prompted me to install ocloc, after which a different error appeared:

$ icpx -fiopenmp -fopenmp-targets=spir64_gen -g saxpy.cpp -o saxpy
icpx: remark: Note that use of '-g' without any optimization-level option will turn off most compiler optimizations similar to use of '-O0'; use '-Rno-debug-disables-optimization' to disable this remark [-Rdebug-disables-optimization]
Error: Device name missing.
Command was: /usr/bin/ocloc -output /tmp/icpx-e2b4119f67/saxpy-9d0072.out -file /tmp/icpx-e2b4119f67/saxpy-31c585.spv -output_no_suffix -spirv_input -options "-g -cl-take-global-address -cl-match-sincospi"
icpx: error: gen compiler command failed with exit code 226 (use -v to see invocation)
Intel(R) oneAPI DPC++/C++ Compiler 2024.2.1 (2024.2.1.20240711)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2024.2/bin/compiler
Configuration file: /opt/intel/oneapi/compiler/2024.2/bin/compiler/../icpx.cfg
icpx: note: diagnostic msg: Error generating preprocessed source(s).
make: *** [Makefile:2: saxpy] Error 1

Again, I am not sure how to proceed.

I would like to get at least one of the two devices working, and would preferably like to experiment with both. I appreciate any help you can provide, and am available to provide additional information.

Alex_Y_Intel

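1. Your dedicated GPU (an RTX 4070 Ti) is an NVIDIA GPU, so it won't work: in your log, the OpenCL plugin finds the CUDA platform but skips the device because the required USM extension is not found. Your log also says it can't find the Level Zero library (libze_loader.so.1).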

2. Using -fopenmp-targets=spir64_gen means you are requesting ahead-of-time (AOT) compilation, so you need to give it a target device name with -Xs "-device <device name>".

For example:

JIT compilation
icpx -fiopenmp -fopenmp-targets=spir64 source.cpp

AOT compilation
icpx -fiopenmp -fopenmp-targets=spir64_gen -Xs "-device <dev>" src.cpp

<dev> is your target device; run 'ocloc compile --help' for the list of supported targets.

3. I tried your code on both Intel integrated and discrete GPUs, and it compiled and ran fine.
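While sorting out the drivers, it may help to avoid hard-coding device(1). The following is only a rough sketch, not something from the posts above: it queries the device count first and offloads only when a device is actually visible. The helper names (offload_device_count, choose_device, run_demo) and the fallback policy are hypothetical. The sketch relies on OpenMP numbering offload devices from 0 to omp_get_num_devices()-1, and it compiles as plain serial code when OpenMP is disabled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of offload (non-host) devices; 0 when built without OpenMP.
inline int offload_device_count() {
#ifdef _OPENMP
    return omp_get_num_devices();
#else
    return 0;
#endif
}

// Hypothetical policy: offload devices are numbered 0 .. num_devices-1.
// Returns -1 when no device is visible (stay on the host), and falls back
// to device 0 when the requested number is out of range.
inline int choose_device(int requested, int num_devices) {
    if (num_devices <= 0) return -1;
    if (requested < 0 || requested >= num_devices) return 0;
    return requested;
}

// Computes y = a*x + y. Offloads only when `device` is >= 0.
// Returns true when the work ran on the host.
inline bool saxpy(float a, const float* x, float* y, std::size_t n, int device) {
    if (device < 0) {
        for (std::size_t i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
        return true;
    }
    int on_host = 1;
    #pragma omp target teams distribute parallel for device(device) \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (std::size_t i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
#ifdef _OPENMP
    // Ask the same device whether a target region there really runs off-host.
    #pragma omp target device(device) map(from: on_host)
    { on_host = omp_is_initial_device(); }
#endif
    return on_host != 0;
}

// Small driver: requests device 1, as in the original program, but lets
// choose_device() correct it. Returns true when every result equals 3*2+1.
inline bool run_demo(std::size_t n) {
    std::vector<float> x(n, 2.0f), y(n, 1.0f);
    const int count = offload_device_count();
    const bool on_host = saxpy(3.0f, x.data(), y.data(), n, choose_device(1, count));
    std::printf("%d devices, ran on %s\n", count, on_host ? "host" : "device");
    for (float v : y) {
        if (v != 7.0f) return false;
    }
    return true;
}
```

Calling run_demo(1024) from main and building with the JIT command shown in point 2 should report the device count and where the loop ran. With zero visible devices it stays on the host instead of aborting, which is useful when OMP_TARGET_OFFLOAD is not set to MANDATORY.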

 
