Intel® Fortran Compiler

Questions regarding GPUs and OCLOC

Arjen_Markus
Honored Contributor II

I want to experiment a bit with GPUs, but I am getting lost with respect to the actual hardware and the support from ifx. Here is the situation:

I work with a laptop running Windows. According to the Task Manager it has two GPUs: Intel UHD Graphics and an NVIDIA RTX A1000 Laptop GPU. I have no idea whether ifx supports the first (the second is certainly not supported). So I try to build a program that exploits the GPU via OpenMP offloading. So far, so good.

The option -Qopenmp-targets:spir64 does have an effect: with the environment variable LIBOMPTARGET_DEBUG set to 1, I get a lot of debugging information. If I unset that variable, the program hangs, and after interrupting it with Ctrl-C I get the message:

forrtl: error (200): program aborting due to control-C event
Image              PC                Routine            Line        Source
KERNELBASE.dll     00007FFE76522943  Unknown               Unknown  Unknown
KERNEL32.DLL       00007FFE76FB7614  Unknown               Unknown  Unknown
ntdll.dll          00007FFE787E26F1  Unknown               Unknown  Unknown
Libomptarget error: Host ptr 0x00007ff6f3f795ec does not have a matching target pointer.
Libomptarget error: Run with
Libomptarget error: LIBOMPTARGET_DEBUG=1 to display basic debug information.
Libomptarget error: LIBOMPTARGET_DEBUG=2 to display calls to the compute runtime.
Libomptarget error: LIBOMPTARGET_INFO=4 to dump host-target pointer mappings.
Libomptarget error: Source location information not present. Compile with -g or -gline-tables-only.
Libomptarget fatal error 1: failure of target construct while offloading is mandatory

My interpretation is that the Intel GPU is not actually used, cannot be connected, or is simply not supported. Well, that can happen. But while looking for an alternative (or better: for the list of supported devices), I came across the option -Qopenmp-targets:spir64_gen.

If I try that, I get:

Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2023.1.0 Build 20230320
Copyright (C) 1985-2023 Intel Corporation. All rights reserved.

ifx: warning #10441: The OpenCL offline compiler could not be found and is required for AOT compilation.See "https://www.intel.com/content/www/us/en/develop/documentation/oneapi-dpcpp-cpp-compiler-dev-guide-and-reference/top/compilation/ahead-of-time-compilation.html" for more information.
ifx: error #10037: could not find 'ocloc'
ifx: error #10401: error running 'Offline Compiler'

So I try to find out how to get ocloc. For Windows it ought to be part of the Intel DPC++/C++ installation. As far as I can tell from the output of icx on my laptop, that has been installed:

Intel(R) oneAPI DPC++/C++ Compiler for applications running on Intel(R) 64, Version 2023.1.0 Build 20230320
Copyright (C) 1985-2023 Intel Corporation. All rights reserved.

icx: error: no input files

But I cannot find a program "ocloc.exe" on the laptop, or anything that resembles that name.

So I am left with a couple of questions:

  • Is the Intel GPU I have supported by ifx or icx?
  • What do I need to do to get ocloc and thereby enable "spir64_gen", if that would be a solution?
26 Replies
jimdempseyatthecove
Honored Contributor III

>> Intel® HD Graphics 630

That "GPU" has native support for double precision; many other GPUs do not. If they simulate double precision in software, expect a slowdown. If they have no software support for double precision either, they should report this when failing.

Arjen, you could experiment with setting your reals to real(4). If this works, it indicates a lack of (simulated) double precision.

Jim Dempsey

Arjen_Markus
Honored Contributor II

Spot on! When I changed the kind to 4, the program worked fine. When I reset the size to its original value, the program crashed again, but n = 1600 does do the job, though with slight differences in the result.

Well, that is a valuable lesson learned.
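The "slight differences in the result" are what one would expect when switching from kind=8 to kind=4: single precision keeps roughly 7 significant decimal digits, double precision roughly 16. As a rough illustration (a sketch, not the Fortran matmul sample from this thread), the Python snippet below emulates real(4) arithmetic by rounding every input, product and partial sum to IEEE 754 binary32 with the standard struct module, and compares a dot product of length 1600 (the inner loop of a matrix multiplication) against the ordinary double-precision result. The vectors are arbitrary test data chosen for the illustration, and no fused multiply-add is assumed.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest IEEE 754 binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_f32(a, b):
    """Dot product with inputs, products and partial sums rounded to binary32,
    mimicking real(4) arithmetic (no fused multiply-add)."""
    acc = 0.0
    for x, y in zip(a, b):
        prod = to_f32(to_f32(x) * to_f32(y))
        acc = to_f32(acc + prod)
    return acc


def dot_f64(a, b):
    """Ordinary double-precision dot product, comparable to real(8)."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y
    return acc


if __name__ == "__main__":
    n = 1600  # same order as the matrix size discussed in the thread
    a = [1.0 / (i + 1) for i in range(n)]   # arbitrary illustration data
    b = [float(i % 7) + 0.1 for i in range(n)]
    s32 = dot_f32(a, b)
    s64 = dot_f64(a, b)
    print(f"single: {s32!r}")
    print(f"double: {s64!r}")
    print(f"relative difference: {abs(s32 - s64) / abs(s64):.2e}")
```

Running it prints both sums and their relative difference. For positive terms like these, the difference should stay below the worst-case bound of roughly n times the single-precision unit roundoff (about 1e-4 for n = 1600), which is the kind of "slight difference" that shows up when a real(8) program is rerun with real(4).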

Barbara_P_Intel
Employee

@Arjen_Markus Just to clarify... the matmul sample I provided works ok with Kind=4 and matrix size 2600. It also works ok with kind=8 and matrix size 1600. Is that correct?

I want to update the OpenMP offload tutorial, if needed, so that others who don't have a team like the one on this forum to help them don't run into the same problem.

Thanks!

Arjen_Markus
Honored Contributor II

Ah, no, not quite. Here is the list of cases:

  • With kind=8 (so double precision) the program fails with a runtime error, no matter what value of n I use.
  • With kind=4 (so single precision) the program works with n = 1600 or less, though I have not tried to find the maximum successful value.
  • With kind=4 and the original, larger matrix size, the program fails again, but apparently due to memory consumption rather than the more fundamental precision mismatch.

I have not yet tried to analyse the debug output, as suggested in the runtime error report.

As @jimdempseyatthecove mentioned, the original failure has to do with the use of double precision.
Arjen_Markus
Honored Contributor II

Actually, when I turned on the debug output (with double-precision matrices of a rather limited size, n=160), I got the very informative message:

Libomptarget --> Device 0 is ready to use.
Target LEVEL0 RTL --> Device 0: Loading binary from 0x00007ff63879c000
Target LEVEL0 RTL --> Expecting to have 2 entries defined
Target LEVEL0 RTL --> Base L0 module compilation options: -cl-std=CL2.0  
Target LEVEL0 RTL --> Found a single section in the image
Target LEVEL0 RTL --> Error: addModule:zeModuleCreate failed with error code 1879048196, ZE_RESULT_ERROR_MODULE_BUILD_FAILURE
Target LEVEL0 RTL --> Error: module creation failed
LEVEL0 message: Target build log:
LEVEL0 message:   ''
LEVEL0 message:   'error: Double type is not supported on this platform.'
LEVEL0 message:   'in kernel: 'MAIN__''
LEVEL0 message:   'error: backend compiler failed build.'
LEVEL0 message:   ''

I have attached the entire output, but this was the missing information that could have pointed us to the problem directly.
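With LIBOMPTARGET_DEBUG set, the few decisive lines are easy to miss in a long log. A minimal sketch for pulling them out, assuming the program's output has been redirected to a text file (for example `program.exe > debug.txt 2>&1` in cmd, which captures both streams). The script name `scan_omp_log.py` is hypothetical, and the search patterns are assumptions based only on the excerpt above; other driver or compiler versions may word their messages differently.

```python
import re
import sys

# Patterns based on the LIBOMPTARGET_DEBUG excerpt in this thread (assumption:
# other versions may phrase their messages differently).
ERROR_PATTERNS = [
    re.compile(r"\berror\b", re.IGNORECASE),      # "Error:", "error:", "Libomptarget error:"
    re.compile(r"not supported", re.IGNORECASE),  # e.g. "Double type is not supported ..."
]


def interesting_lines(lines):
    """Yield (line number, text) for log lines that look like errors."""
    for number, line in enumerate(lines, start=1):
        if any(p.search(line) for p in ERROR_PATTERNS):
            yield number, line.rstrip("\n")


def main(path):
    """Print the matching lines of the log file, prefixed by their line number."""
    with open(path, encoding="utf-8", errors="replace") as log:
        for number, text in interesting_lines(log):
            print(f"{number}: {text}")


if __name__ == "__main__":
    if len(sys.argv) > 1:
        main(sys.argv[1])
    else:
        print("usage: python scan_omp_log.py <debug-log-file>")
```

Usage: `python scan_omp_log.py debug.txt`. On the log above it would surface, among others, the `ZE_RESULT_ERROR_MODULE_BUILD_FAILURE` line and the "Double type is not supported on this platform" message.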

Barbara_P_Intel
Employee