Solved: OpenMP GPU offloading using Intel Fortran 19.0.5

yzh15 · ‎09-01-2020

Hi Everyone,

I'm trying to offload some computations to a GPU using the OpenMP 4.5 offload feature, but I can't get the code to compile:

1>ifort: error #10036: unable to run 'C:\PROGRA~2\INTELS~1\COMPIL~4\linux\bin\intel64\ifort.exe'
1>ifort: error #10340: problem encountered when performing target compilation

I used the options /Qopenmp and /Qopenmp-offload. The compiler version is Intel(R) Visual Fortran Compiler 19.0.5.281 [Intel(R) 64] and the platform is Windows 10.

I also tried the /Qnextgen option, following the article here:

https://software.intel.com/content/www/us/en/develop/documentation/get-started-with-cpp-fortran-compiler-openmp/top.html

But then I get new errors:

1>ifort: error #10408: The Intel LLVM Based compiler cannot be found in the expected location. Please check your installation or documentation for more information.
1>ifort: error #10036: unable to run 'C:\PROGRA~2\INTELS~1\COMPIL~4\windows\bin\ifx.exe'

Any suggestions would be appreciated. Thanks!

Steve_Lionel · ‎09-01-2020

As far as I know, the regular Intel compiler doesn't support offload to GPUs, only the (now discontinued) Intel Xeon Phi coprocessors.

/Qnextgen requires that you have the new beta HPC compiler installed. I don't think it supports GPU offload either.

Johannes_Rieke · ‎09-02-2020

Hi, out of curiosity: can you show us a minimal (non-)working example? I would be interested to see which OpenMP directives you use.

With PSXE2020u2 (19.1.2.254), /Qopenmp /Qopenmp-offload:host works fine for simple OpenMP directives:

program omp_test
  use omp_lib
  implicit none
  integer :: i
  !$omp parallel do default(none) private(i)
  do i = 1, 8
    write(*,*) omp_get_thread_num()
  end do
  !$omp end parallel do
end program omp_test

ifort /Qopenmp-offload:host /Qopenmp omp_test.f90

As far as I understand, /Qopenmp-offload:mic requires a Xeon Phi.

yzh15 · ‎09-02-2020

Hi, thanks for your time. The code is actually pretty simple:

real, allocatable :: a(:), b(:), c(:)
allocate(a(10), b(10), c(10))
a = 1.0
b = 2.0

call omp_set_num_threads(nthread)

!$omp target map(to: a, b) map(from:c)
!$omp parallel do private(i)
do i=1,10
c(i) = a(i) + b(i)
enddo
!$omp end target

The main goal is to move the computation to the device. My Fortran compiler builds a regular OpenMP program fine; it gives me an error only when I use the 'target' directive. I also tried the C program from the link I provided, and my C compiler (icl) compiles it successfully even with 'target'. So the problem seems to come only from the Fortran compiler.

Do you mean the Intel compiler's /Qopenmp-offload only supports the MIC architecture?

Thanks,

Johannes_Rieke · ‎09-02-2020

Hi, I can reproduce the error with your code on 19.1.2 if I choose /Qopenmp-offload. With /Qopenmp-offload:host it compiles and runs, whatever it offloads, and to where...

Maybe you can open a ticket for that. I assume that the target directive is ignored if host is given as the offload target. The path in the error message, which contains "linux" in a Windows compiler installation and points to an .exe, looks strange.

The complete code:

program main
  use omp_lib
  implicit none
  integer :: i
  integer, parameter :: nthread = 8
  real, allocatable :: a(:), b(:), c(:)  
  allocate(a(10), b(10), c(10))
  a = 1.0
  b = 2.0  
  call omp_set_num_threads(nthread)
  !$omp target map(to: a, b) map(from:c)
  !$omp parallel do private(i)
  do i=1,10
    c(i) = a(i) + b(i)
    write(*,*) omp_get_thread_num()
  enddo
  !$omp end target  
end program main

The error (ifort /Qopenmp /Qopenmp-offload omp_test.f90):

1>------ Build started: Project: omp_test, Configuration: Debug x64 ------
1>Compiling with Intel(R) Visual Fortran Compiler 19.1.2.254 [Intel(R) 64]...
1>omp_test.f90
1>ifort: error #10037: could not find 'C:\PROGRA~2\INTELS~1\CO1815~1\linux\bin\intel64\ifort.exe'
1>ifort: error #10340: problem encountered when performing target compilation
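
One way to test the assumption that the target region simply runs on the host with /Qopenmp-offload:host is to ask the OpenMP runtime. The sketch below is not from the posts above; it assumes only the standard OpenMP 4.5 routines omp_get_num_devices and omp_is_initial_device from omp_lib, and the program name offload_check is made up for illustration. It has not been verified against ifort 19.1.

```fortran
program offload_check
  use omp_lib
  implicit none
  logical :: on_host

  ! Number of non-host devices the runtime can see (0 means host only).
  write(*,*) 'devices available: ', omp_get_num_devices()

  on_host = .true.
  ! map(from:) copies the flag back from wherever the region executed.
  !$omp target map(from: on_host)
  on_host = omp_is_initial_device()
  !$omp end target

  if (on_host) then
    write(*,*) 'target region ran on the host'
  else
    write(*,*) 'target region ran on a device'
  end if
end program offload_check
```

Built with the same /Qopenmp /Qopenmp-offload:host options as above, this would presumably report zero devices and a host-side target region, which would confirm that nothing is actually offloaded.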

jimdempseyatthecove · ‎09-02-2020

Offloading with the Intel compiler is only supported for the Intel Xeon Phi Knights Corner (51xx and 71xx series) coprocessors. These coprocessors run a version of Linux and perform the compilation of the offloaded section of code using a version of the compiler inside the coprocessor. The error message indicates that the Linux version of the compiler (that is to be injected into the (missing) Xeon Phi) was not found on your system.

While this subject of Offloading is presented here...

I have the following suggestion for Intel that should be relatively easy to implement, and which I think will gain popularity from the users.

Make a derivative of your KNC OpenMP offload, that offloads NOT to an installed coprocessor, but rather offloads to a fabric attached host using the MPI API (hidden in the OpenMP directives).

While the programmer can convert an application from single process to multi-process it is non-trivial.
Additionally, converting an application to make use of co-arrays is also non-trivial.

IMHO, converting an OpenMP single-process application into an OpenMP multi-process one would be near trivial.

Jim Dempsey

Steve_Lionel · ‎09-02-2020

Jim, Intel did already implement what you suggest, some years ago. They called it "Cluster OpenMP". Nobody used it and it was quietly retired.

Just to make it clear for @yzh15, Intel compilers do not support OpenMP offload to GPUs (NVIDIA, AMD or anyone else).

The /Qopenmp-offload option requires that a separate toolkit for Xeon Phi development be installed. It included a completely separate compiler that is invoked by the ifort driver along with supporting software. If you don't have that, then the option will not work. I don't think this is a bug.

Just speculating here, given that Intel is developing a new series of coprocessors under the Xe-HPC name, it's possible that offloading to one of these could be in the future. I have zero inside knowledge of this, but it would make sense to me.

jimdempseyatthecove · ‎09-03-2020

>>Intel did already implement what you suggest, some years ago. They called it "Cluster OpenMP". Nobody used it...

I am going to assume that Intel marketing misguidedly targeted the HPC users that already had their applications written as MPI applications. IOW, more work to port to use "Cluster OpenMP".

My (resurrected) suggestion is targeted at the users whose applications are written for the desktop/workstation and where they may have additional desktops, workstations, and/or server(s) available .AND. they would like not to incur a large development effort to make use of the additional processing capacity.

Perhaps it is time for Intel marketing to survey their software vendors that produce OpenMP applications for use on desktop and workstations.

Jim Dempsey

Steve_Lionel · ‎09-03-2020

No, it wasn't targeted at MPI users, but at OpenMP applications that wanted to distribute across more processors than were in the local system without recoding. Given that OpenMP assumes a shared address space, it is not clear to me this is worth the effort. I would rather see people use coarrays.
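
For comparison, the vector add from earlier in this thread could be spread across images with coarrays roughly as follows. This is only a sketch under assumptions: the program name coarray_add and the chunking scheme are illustrative and not taken from any post here. Each image has its own address space, so image 1 has to gather the pieces explicitly, which is the contrast with OpenMP's shared-memory model.

```fortran
program coarray_add
  implicit none
  integer, parameter :: n = 10
  real :: a(n), b(n)
  real :: c(n)[*]          ! one copy of c on every image
  integer :: i, lo, hi, chunk, img

  a = 1.0
  b = 2.0
  c = 0.0

  ! Split 1..n into contiguous chunks, one per image.
  chunk = (n + num_images() - 1) / num_images()
  lo = (this_image() - 1) * chunk + 1
  hi = min(this_image() * chunk, n)

  ! Each image computes only its own chunk.
  do i = lo, hi
    c(i) = a(i) + b(i)
  end do

  sync all                 ! every image has finished its chunk

  ! Image 1 gathers the other images' chunks and prints the result.
  if (this_image() == 1) then
    do img = 2, num_images()
      lo = (img - 1) * chunk + 1
      hi = min(img * chunk, n)
      if (lo <= hi) c(lo:hi) = c(lo:hi)[img]
    end do
    write(*,*) c
  end if
end program coarray_add
```

With Intel Fortran on Windows this would be built with the /Qcoarray option; the number of images is chosen at run time rather than in the source.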

JohnNichols · ‎09-03-2020

I like the C# method of wrapping up a block of code and offloading it to another thread; with some timing it is easy to balance.

Using the GPU should be automatic. Our problem here is that competition gets in the way of real advancement.

I blame Borland myself for his cheap compilers.

yzh15 · ‎09-02-2020

Hi,

So you mean none of the Intel compilers actually supports offloading to a real GPU, e.g. an NVIDIA or AMD GPU? The /Qopenmp-offload option only supports offloading to Intel Xeon Phi devices?

Thanks very much!