Hello
I am trying to learn how to compile and run GPU code using the Intel Fortran compiler ifx together with OpenMP. I compile the following program
program GPUtest
implicit none
!$omp target
!$omp end target
end program
with ifx version 2024.0 like so
ifx -o GPUtest.x GPUtest.f90 -qopenmp -fopenmp-targets=spir64
then set the debug environment variable and run the program
export LIBOMPTARGET_DEBUG=1
./GPUtest.x
I get
Libomptarget --> Init target library!
Libomptarget --> Callback to __tgt_register_ptask_services with handlers 0x00001484d1b14640 0x00001484d1b148c0
Libomptarget --> Initialized OMPT
Libomptarget --> Loading RTLs...
Libomptarget --> Loading library 'libomptarget.rtl.level0.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.level0.so': libze_loader.so.1: cannot open shared object file: No such file or directory!
Libomptarget --> Loading library 'libomptarget.rtl.opencl.so'...
Target OPENCL RTL --> Init OpenCL plugin!
Target OPENCL RTL --> Target device type is set to GPU
Libomptarget --> Successfully loaded library 'libomptarget.rtl.opencl.so'!
Target OPENCL RTL --> Start initializing OpenCL
Target OPENCL RTL --> Platform OpenCL 3.0 has 1 Devices
Target OPENCL RTL --> Extension clGetMemAllocInfoINTEL is found.
Target OPENCL RTL --> Extension clHostMemAllocINTEL is found.
Target OPENCL RTL --> Extension clDeviceMemAllocINTEL is found.
Target OPENCL RTL --> Extension clSharedMemAllocINTEL is found.
Target OPENCL RTL --> Extension clMemFreeINTEL is found.
Target OPENCL RTL --> Extension clMemBlockingFreeINTEL is found.
Target OPENCL RTL --> Extension clSetKernelArgMemPointerINTEL is found.
Target OPENCL RTL --> Extension clEnqueueMemcpyINTEL is found.
Target OPENCL RTL --> Extension clSetProgramSpecializationConstant is found.
Target OPENCL RTL --> Extension clGetDeviceGlobalVariablePointerINTEL is found.
Target OPENCL RTL --> Extension clGetKernelSuggestedLocalWorkSizeINTEL is found.
Target OPENCL RTL --> Warning: Extension clGitsIndirectAllocationOffsets is not found.
Libomptarget --> Registering RTL libomptarget.rtl.opencl.so supporting 1 devices!
Libomptarget --> Optional interface: __tgt_rtl_data_alloc_base
Libomptarget --> Optional interface: __tgt_rtl_data_realloc
Libomptarget --> Optional interface: __tgt_rtl_data_aligned_alloc
Libomptarget --> Optional interface: __tgt_rtl_get_device_name
Libomptarget --> Optional interface: __tgt_rtl_get_context_handle
Libomptarget --> Optional interface: __tgt_rtl_get_data_alloc_info
Libomptarget --> Optional interface: __tgt_rtl_init_ompt
Libomptarget --> Optional interface: __tgt_rtl_requires_mapping
Libomptarget --> Optional interface: __tgt_rtl_manifest_data_for_region
Libomptarget --> Optional interface: __tgt_rtl_add_build_options
Libomptarget --> Optional interface: __tgt_rtl_is_supported_device
Libomptarget --> Optional interface: __tgt_rtl_create_interop
Libomptarget --> Optional interface: __tgt_rtl_release_interop
Libomptarget --> Optional interface: __tgt_rtl_use_interop
Libomptarget --> Optional interface: __tgt_rtl_get_num_interop_properties
Libomptarget --> Optional interface: __tgt_rtl_get_interop_property_value
Libomptarget --> Optional interface: __tgt_rtl_get_interop_property_info
Libomptarget --> Optional interface: __tgt_rtl_get_interop_rc_desc
Libomptarget --> Optional interface: __tgt_rtl_is_accessible_addr_range
Libomptarget --> Optional interface: __tgt_rtl_notify_indirect_access
Libomptarget --> Optional interface: __tgt_rtl_is_private_arg_on_host
Libomptarget --> Optional interface: __tgt_rtl_set_function_ptr_map
Libomptarget --> Optional interface: __tgt_rtl_run_target_team_nd_region
Libomptarget --> Optional interface: __tgt_rtl_get_device_info
Libomptarget --> Optional interface: __tgt_rtl_get_device_from_ptr
Libomptarget --> Optional interface: __tgt_rtl_flush_queue
Libomptarget --> Optional interface: __tgt_rtl_sync_barrier
Libomptarget --> Optional interface: __tgt_rtl_async_barrier
Target OPENCL RTL --> Initialized OMPT
Libomptarget --> Loading library 'libomptarget.rtl.x86_64.so'...
Libomptarget --> Successfully loaded library 'libomptarget.rtl.x86_64.so'!
Libomptarget --> Registering RTL libomptarget.rtl.x86_64.so supporting 4 devices!
Libomptarget --> Optional interface: __tgt_rtl_data_alloc_base
Libomptarget --> Optional interface: __tgt_rtl_requires_mapping
Libomptarget --> Optional interface: __tgt_rtl_set_function_ptr_map
Libomptarget --> Loading library 'libomptarget.rtl.cuda.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.cuda.so': libomptarget.rtl.cuda.so: cannot open shared object file: No such file or directory!
Libomptarget --> Loading library 'libomptarget.rtl.aarch64.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.aarch64.so': libomptarget.rtl.aarch64.so: cannot open shared object file: No such file or directory!
Libomptarget --> Loading library 'libomptarget.rtl.ve.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.ve.so': libomptarget.rtl.ve.so: cannot open shared object file: No such file or directory!
Libomptarget --> Loading library 'libomptarget.rtl.amdgpu.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.amdgpu.so': libomptarget.rtl.amdgpu.so: cannot open shared object file: No such file or directory!
Libomptarget --> Loading library 'libomptarget.rtl.rpc.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.rpc.so': libomptarget.rtl.rpc.so: cannot open shared object file: No such file or directory!
Libomptarget --> RTLs loaded!
Target OPENCL RTL --> Target binary is a valid oneAPI OpenMP image.
Libomptarget --> Image 0x000000000047ff10 is compatible with RTL libomptarget.rtl.opencl.so!
Libomptarget --> RTL 0x0000000001aa9580 has index 0!
Libomptarget --> Registering image 0x000000000047ff10 with RTL libomptarget.rtl.opencl.so!
Libomptarget --> Done registering entries!
Libomptarget --> Entering target region for device 0 with entry point 0x000000000046a010
Libomptarget --> Call to omp_get_num_devices returning 1
Libomptarget --> Call to omp_get_num_devices returning 1
Libomptarget --> Call to omp_get_initial_device returning 1
Libomptarget --> Checking whether device 0 is ready.
Libomptarget --> Is the device 0 (local ID 0) initialized? 0
Target OPENCL RTL --> Initialize requires flags to 0
Target OPENCL RTL --> Initialize OpenCL device
Target OPENCL RTL --> Getting extensions for device 0
Target OPENCL RTL --> Device extensions: cl_khr_byte_addressable_store cl_khr_device_uuid cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_intel_split_work_group_barrier cl_khr_fp64 cl_khr_subgroups cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_linkonce_odr cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_ext_float_atomics cl_khr_external_memory cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_spirv_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info
Target OPENCL RTL --> Extension UnifiedSharedMemory enabled.
Target OPENCL RTL --> Extension DeviceAttributeQuery enabled.
Target OPENCL RTL --> Extension clGetDeviceGlobalVariablePointerINTEL enabled.
Target OPENCL RTL --> Extension clGetKernelSuggestedLocalWorkSizeINTEL enabled.
Target OPENCL RTL --> Device Properties:
Target OPENCL RTL --> -- Name : Intel(R) HD Graphics 620
Target OPENCL RTL --> -- PCI ID : 0x5916
Target OPENCL RTL --> -- Number of total EUs : 24
Target OPENCL RTL --> -- Number of threads per EU : 7
Target OPENCL RTL --> -- Number of EUs per subslice : 8
Target OPENCL RTL --> -- Number of subslices per slice: 3
Target OPENCL RTL --> -- Number of slices : 1
Target OPENCL RTL --> -- Local memory size (bytes) : 65536
Target OPENCL RTL --> -- Global memory size (bytes) : 13226147840
Target OPENCL RTL --> -- Cache size (bytes) : 786432
Target OPENCL RTL --> -- Max clock frequency (MHz) : 1050
Target OPENCL RTL --> -- Max workgroup size : 256
Target OPENCL RTL --> -- Max allocation size (bytes) : 4294959104
Libomptarget --> Device 0 is ready to use.
Target OPENCL RTL --> Device 0: Loading binary from 0x000000000047ff10
Target OPENCL RTL --> Expecting to have 1 entries defined
Target OPENCL RTL --> Base OpenCL compilation options: -cl-std=CL2.0
Target OPENCL RTL --> Base OpenCL linking options:
Target OPENCL RTL --> Created offload program from image #0.
Target OPENCL RTL --> Successfully linked 1 programs.
Target OPENCL RTL --> Warning: number of entries in host and device offload tables mismatch (1 != 3).
Target OPENCL RTL --> Device offload table loaded:
Target OPENCL RTL --> 0: _ZL14name_val_table_a630efcc66b7fb2188ebba28026c9b4c
Target OPENCL RTL --> 1: _ZL7pone_ld_929edd6e1a86c68f9f4403ca5efd8d7a
Target OPENCL RTL --> 2: __omp_offloading_10302_108040b_MAIN___l3
Target OPENCL RTL --> Kernel 0: Name = __omp_offloading_10302_108040b_MAIN___l3, NumArgs = 0
Libomptarget --> loop trip count is 0.
Libomptarget --> Launching target execution __omp_offloading_10302_108040b_MAIN___l3 with pointer 0x0000000001ad5500 (index=0).
Libomptarget --> Manifesting used target pointers:
Target OPENCL RTL --> omp_get_thread_limit() returned 2147483647
Target OPENCL RTL --> omp_get_max_teams() returned 0
Target OPENCL RTL --> Assumed kernel SIMD width is 32
Target OPENCL RTL --> Preferred team size is multiple of 32
Target OPENCL RTL --> Max number of teams is set to 1 (num_teams clause or no teams construct)
Target OPENCL RTL --> Team sizes = {32, 1, 1}
Target OPENCL RTL --> Number of teams = {1, 1, 1}
Target OPENCL RTL --> Started executing kernel.
Target OPENCL RTL --> Successfully finished kernel execution.
Libomptarget --> Unloading target library!
Libomptarget --> Clearing Interop Table
Libomptarget --> Unregistered image 0x000000000047ff10 from RTL 0x0000000001aa9580!
Libomptarget --> Done unregistering images!
Libomptarget --> Removing translation table for descriptor 0x000000000047fef0
Libomptarget --> Done unregistering library!
Target OPENCL RTL --> Deinit OpenCL plugin!
Target OPENCL RTL --> Closed RTL successfully
Libomptarget --> Deinit target library!
Why does it say
Libomptarget --> Unable to load library 'libomptarget.rtl.level0.so': libze_loader.so.1: cannot open shared object file: No such file or directory!
?
The library exists on my PC at /opt/intel/oneapi/compiler/2024.0/lib/libomptarget.rtl.level0.so
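One way to read that message: in a "cannot open shared object file" error, the name just before that phrase is the file the dynamic loader could not find, which is not necessarily the plugin named in quotes. The sketch below is only an illustration under that assumption; the function name classify_rtl_failures is made up here and is not part of any Intel tool. It separates plugins that are absent altogether from plugins that exist but depend on a library that is missing.

```shell
# Classify "Unable to load library" lines from LIBOMPTARGET_DEBUG=1 output.
# Reads the log on stdin and prints one line per plugin that failed to load:
#   <plugin> plugin-missing      the plugin file itself was not found
#   <plugin> needs <dependency>  the plugin exists, but a library it needs was not found
# classify_rtl_failures is a hypothetical helper name used for illustration.
classify_rtl_failures() {
  sed -n "s/^Libomptarget --> Unable to load library '\([^']*\)': \([^:]*\): cannot open shared object file.*/\1 \2/p" |
  while read -r plugin missing; do
    if [ "$plugin" = "$missing" ]; then
      printf '%s plugin-missing\n' "$plugin"
    else
      printf '%s needs %s\n' "$plugin" "$missing"
    fi
  done
}
```

It could be used as `LIBOMPTARGET_DEBUG=1 ./GPUtest.x 2>&1 | classify_rtl_failures`. For the log above it should report the level0 plugin as needing libze_loader.so.1, and the cuda, aarch64, ve, amdgpu and rpc plugins as missing outright.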
I had the same issue on a shared cluster, and the administrator told me that this is because ifx (and the other Intel compilers) only supports Intel GPUs. The other plugins that the log reports as missing (libomptarget.rtl.cuda.so, libomptarget.rtl.aarch64.so, libomptarget.rtl.ve.so, libomptarget.rtl.amdgpu.so and libomptarget.rtl.rpc.so) should take care of other cards, like NVIDIA.
In the ifx man page, the entry "-fopenmp-targets=triple (L*X only)" mentions no support for anything other than Intel devices.
Perhaps an Intel technician can confirm this? If it is true, it should be printed in bold letters at the top of every tutorial on the "target" construct, as it is a huge restriction.
I solved the problem by installing necessary GPU drivers. Go to https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2024-1/install-gpu-drivers.html#INSTALL-INTEL-GPU-DRIVERS
That sounds very promising, but could you be a bit more specific?
What libraries exactly did you install?
On what machine (desktop/shared cluster/laptop) and OS?
What graphics card did you manage to use with the "target" construct after installing those libraries? NVIDIA? AMD?
What compiler options/paths did you need to set to get it to work?
The link in your post takes me to the top of a very beefy manual that lists hardware support only for
I have Debian and an integrated Intel GPU. I followed the Ubuntu installation steps at https://dgpu-docs.intel.com/driver/installation.html#ubuntu-install-steps to configure the APT package manager to install GPU drivers.
wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | sudo gpg --dearmor --output /usr/share/keyrings/intel-graphics.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu jammy/lts/2350 unified" | sudo tee /etc/apt/sources.list.d/intel-gpu-jammy.list
Ubuntu Jammy Jellyfish is not Debian, but it worked. Then I installed the following drivers.
sudo apt update
sudo apt install intel-level-zero-gpu level-zero
There are other drivers as well, such as intel-opencl-icd; I don't know which ones you need. But I think that with the correct driver(s) you should be able to compile OpenMP offload code with ifx and run it on GPUs from several vendors. The webpage describes how it is done for other operating systems. Hope it helps!
Rasmus
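After installing the packages, a quick sanity check is to see whether libze_loader.so.1 now exists in one of the usual library directories. The snippet below is a minimal sketch: find_shared_lib is a hypothetical helper, and the directory list is an assumption about typical Debian/Ubuntu layouts, not something taken from Intel's documentation.

```shell
# Print the first path at which a shared library file exists, searching the
# directories given as arguments; return non-zero if it is in none of them.
# find_shared_lib is a hypothetical helper name used for illustration.
find_shared_lib() {
  lib_name=$1
  shift
  for dir in "$@"; do
    if [ -e "$dir/$lib_name" ]; then
      printf '%s\n' "$dir/$lib_name"
      return 0
    fi
  done
  return 1
}

# Example call; the directories are assumed, adjust them for your system.
find_shared_lib libze_loader.so.1 /usr/lib/x86_64-linux-gnu /usr/local/lib /usr/lib64 \
  || echo "libze_loader.so.1 not found in the listed directories"
```

This only checks that the file exists. What the dynamic loader actually searches is the linker cache plus LD_LIBRARY_PATH, so `ldconfig -p | grep libze_loader` is the more direct check on a system where ldconfig is available.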
Thanks a lot, that clears things up! I still hope to hear from Intel whether the construct can work on non-Intel GPU cards. I work on several large shared clusters with thousands of GPU cards, and not a single one was manufactured by Intel!
Solution:
1. Pull out credit card
2. Call Newegg or equivalent and order a GPU card
3. Wait 5 to 10 days depending on the USPS and their ability to find a place in the USA
4. Install GPU
5. Come back here and tell us the errors and then wait......