Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

How to enable the GPU to run a FORTRAN program?

eliopoulos
Novice
5,826 Views

How can I offload a parallel do loop in a Fortran program to the GPU? I have tried "!$omp target teams distribute parallel do", but the loop does not run on the GPU. Any advice?

49 Replies
eliopoulos
Novice
797 Views

I have created a reproducer that produces the same errors as my program.

Barbara_P_Intel
Employee
724 Views

Ah! You use the matmul() intrinsic. When you compile with /Qopenmp on Windows or -qopenmp on Linux, the matmul() intrinsic is automatically parallelized using OpenMP.

That can be disabled with /Qopt-matmul- on Windows or -no-opt-matmul on Linux. See the Developer Guide.
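
For readers without the attachment, here is a minimal sketch of the kind of code this applies to. It is not taken from z.for: the program name, array names, and sizes are made up for illustration, and it is written in free form, so it would need a .f90 extension rather than .for.

```fortran
! Hypothetical sketch, not the poster's z.for.
! Build as in this thread, e.g. on Windows:  ifx /Qopenmp matmul_sketch.f90
! Adding /Qopt-matmul- should keep matmul() from being parallelized.
program matmul_sketch
  implicit none
  integer, parameter :: n = 3
  double precision :: a(n,n), b(n,n), c(n,n)

  a = 1.0d0
  b = 2.0d0

  ! With OpenMP enabled, the compiler may replace this intrinsic call
  ! with a multithreaded implementation.
  c = matmul(a, b)

  ! Every element of c equals n * 1.0 * 2.0, i.e. 6.0 for n = 3.
  print *, c(1,1)
end program matmul_sketch
```

The result is the same with or without the matmul optimization; only the threading of the multiply changes.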

So far I've just run this on CPU. iGPU is next.

Attached is an update of z.for called z.r1.for. Here's the output I got for CPU:

Q:\06129705>ifx /Qopenmp z.r1.for
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.1.0 Build 20240103
Copyright (C) 1985-2023 Intel Corporation. All rights reserved.

Microsoft (R) Incremental Linker Version 14.38.33130.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:z.r1.exe
-subsystem:console
-defaultlib:libiomp5md.lib
-nodefaultlib:vcomp.lib
-nodefaultlib:vcompd.lib
z.r1.obj

Q:\06129705>z.r1.exe
 Number of procs is            8
   6000000000000.00        6000000000000.00        6000000000000.00
   6000000000000.00        6000000000000.00        6000000000000.00
   6000000000000.00        6000000000000.00        6000000000000.00

Barbara_P_Intel
Employee
689 Views

I have the Intel GPU version running and printing the same answers as the CPU version. It's attached as z.r2.for.

Two MAJOR changes:

  • Double precision is not supported on Intel(R) Iris(R) Xe Graphics. I made the arrays real.
  • It's poor coding practice to use the same variable for the loop index and the end of the loop. This loop construct yields the wrong answer on Intel GPU.

!   do i=1,i

! Using this construct gives the same answer as running on CPU.

   do j=1,i
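
Put together, the two changes might look like the sketch below. This is an illustration only, not the contents of z.r2.for: the program name, the array a, the size n, and the loop body are assumptions. The points it shows are single precision data and an inner loop whose index (j) differs from its upper bound (i).

```fortran
! Hypothetical sketch, not the attached z.r2.for.
! Build for offload as in this thread:
!   ifx /Qopenmp /Qopenmp-targets=spir64 loop_index_sketch.f90
program loop_index_sketch
  implicit none
  integer, parameter :: n = 1000
  integer :: i, j
  real :: a(n), s          ! real, not double precision

  !$omp target teams distribute parallel do private(j, s) map(from: a)
  do i = 1, n
     s = 0.0
     do j = 1, i           ! inner index j is distinct from the bound i
        s = s + 1.0
     end do
     a(i) = s              ! a(i) ends up equal to real(i)
  end do
  !$omp end target teams distribute parallel do

  print *, a(1), a(n)
end program loop_index_sketch
```

Compiled without OpenMP, the directives are treated as comments and the loop runs serially on the CPU, which makes it easy to compare answers.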

When I run it in a oneAPI command window I type:

set LIBOMPTARGET_PLUGIN_PROFILE=T

to get a profile printout. I set that to be sure it runs on the Intel GPU. If no profile is printed, the program did not offload.
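
A second way to check, independent of the profile variable, is to ask the OpenMP runtime from inside a target region. The sketch below is an assumption-laden illustration rather than part of the attached files: the program name and the variable on_host are made up, while omp_is_initial_device() and omp_get_num_devices() are standard OpenMP routines from the omp_lib module. It must be compiled with OpenMP enabled.

```fortran
! Hypothetical sketch: report whether a target region ran on a device.
program offload_check_sketch
  use omp_lib
  implicit none
  logical :: on_host

  on_host = .true.

  ! Inside the target region, omp_is_initial_device() is .true. only
  ! when the region is executing on the host.
  !$omp target map(tofrom: on_host)
  on_host = omp_is_initial_device()
  !$omp end target

  print *, 'Devices visible to OpenMP: ', omp_get_num_devices()
  if (on_host) then
     print *, 'Target region ran on the host (no offload).'
  else
     print *, 'Target region ran on a device.'
  end if
end program offload_check_sketch
```

If this reports zero devices or a host run, the offload compile options or the GPU driver are the first things to look at.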

>set LIBOMPTARGET_PLUGIN_PROFILE=T
>ifx /Qopenmp /Qopenmp-targets=spir64 z.r2.for
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.1.0 Build 20240103
Copyright (C) 1985-2023 Intel Corporation. All rights reserved.

Microsoft (R) Incremental Linker Version 14.38.33130.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:z.r2.exe
-subsystem:console
-defaultlib:libiomp5md.lib
-nodefaultlib:vcomp.lib
-nodefaultlib:vcompd.lib
C:\Users\bperz\AppData\Local\Temp\1932873.obj
C:\Users\bperz\AppData\Local\Temp\19328414.o
-defaultlib:omptarget.lib

>z.r2.exe
 Number of procs is            8
  6.0000001E+12  6.0000001E+12  6.0000001E+12  6.0000001E+12  6.0000001E+12
  6.0000001E+12  6.0000001E+12  6.0000001E+12  6.0000001E+12
======================================================================================================================
LIBOMPTARGET_PLUGIN_PROFILE(LEVEL_ZERO) for OMP DEVICE(0) Intel(R) Iris(R) Xe Graphics, Thread 0
----------------------------------------------------------------------------------------------------------------------
Kernel 0                  : __omp_offloading_80ff4b02_5bc1261f_MAIN___l27
----------------------------------------------------------------------------------------------------------------------
                          : Host Time (msec)                        Device Time (msec)
Name                      :      Total   Average       Min       Max     Total   Average       Min       Max     Count
----------------------------------------------------------------------------------------------------------------------
Compiling                 :     397.03    397.03    397.03    397.03      0.00      0.00      0.00      0.00      1.00
DataAlloc                 :       0.58      0.06      0.00      0.42      0.00      0.00      0.00      0.00      9.00
DataRead (Device to Host) :       0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      1.00
DataWrite (Host to Device):       0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      1.00
Kernel 0                  :       0.51      0.51      0.51      0.51      0.10      0.10      0.10      0.10      1.00
Linking                   :       0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      1.00
OffloadEntriesInit        :       3.88      3.88      3.88      3.88      0.00      0.00      0.00      0.00      1.00
======================================================================================================================

There are some other comments in z.r2.for.

@eliopoulos, let me know how this works for you.

eliopoulos
Novice
673 Views

It works. I get the same output. Do higher-end Intel GPUs support double precision?

Barbara_P_Intel
Employee
668 Views

Yes, the higher end Intel GPUs support double precision.

The Intel(R) Iris(R) Xe Graphics GPUs are designed for gaming, not HPC workloads. Gamers don't need those extra bits for their graphics.


eliopoulos
Novice
664 Views

But /Qmkl is not supported because /size-llp64 is not supported by ifx, I guess.

Barbara_P_Intel
Employee
656 Views

I don't know how MKL works on Intel(R) Iris(R) Xe Graphics. You might ask on the MKL Forum; have a simple reproducer handy.

ifx definitely supports double precision on CPU and Intel GPUs that support double precision.
