How can I get the GPU to run a parallel do loop in a FORTRAN program? I have tried "!$omp target teams distribute parallel do", but the loop does not run on the GPU. Any advice?
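For reference, here is a minimal sketch of an offloaded loop. The module `offload_demo` and the subroutine `scaled_add` are hypothetical names, not taken from the thread's z.for. The arrays are single precision, following the advice later in the thread about Iris Xe Graphics. The sketch assumes it is built with the compile line used later in the thread (`ifx /Qopenmp /Qopenmp-targets=spir64`). If no offload target is given, OpenMP runs the target region on the host instead, and the program still works.

```fortran
! Hypothetical example; build as in the thread, e.g.
!   ifx /Qopenmp /Qopenmp-targets=spir64 offload_demo.f90
module offload_demo
  implicit none
  ! Working precision: single, since Iris Xe Graphics lacks FP64.
  integer, parameter :: wp = kind(1.0)
contains
  ! y = y + alpha*x, with the loop spread over device teams and threads.
  subroutine scaled_add(n, alpha, x, y)
    integer, intent(in) :: n
    real(wp), intent(in) :: alpha
    real(wp), intent(in) :: x(n)
    real(wp), intent(inout) :: y(n)
    integer :: k
    ! x is only read on the device; y is copied in and back out.
    !$omp target teams distribute parallel do map(to: x) map(tofrom: y)
    do k = 1, n
       y(k) = y(k) + alpha * x(k)
    end do
    !$omp end target teams distribute parallel do
  end subroutine scaled_add
end module offload_demo
```

The explicit `map` clauses make the host/device data movement visible. Without them, the compiler applies default mappings for the arrays.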
Ah! You use the matmul() intrinsic. When you compile with /Qopenmp on Windows or -qopenmp on Linux, the matmul() intrinsic is automatically parallelized using OpenMP.
That can be disabled with /Qopt-matmul- on Windows or -no-opt-matmul on Linux. See the Developer Guide.
So far I've just run this on CPU. iGPU is next.
Attached is an update of z.for called z.r1.for. Here's the output I got for CPU:
Q:\06129705>ifx /Qopenmp z.r1.for
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.1.0 Build 20240103
Copyright (C) 1985-2023 Intel Corporation. All rights reserved.
Microsoft (R) Incremental Linker Version 14.38.33130.0
Copyright (C) Microsoft Corporation. All rights reserved.
-out:z.r1.exe
-subsystem:console
-defaultlib:libiomp5md.lib
-nodefaultlib:vcomp.lib
-nodefaultlib:vcompd.lib
z.r1.obj
Q:\06129705>z.r1.exe
Number of procs is 8
6000000000000.00 6000000000000.00 6000000000000.00
6000000000000.00 6000000000000.00 6000000000000.00
6000000000000.00 6000000000000.00 6000000000000.00
I have the Intel GPU version running and printing the same answers as the CPU version. It's attached as z.r2.for.
Two MAJOR changes:
- Double precision is not supported on Intel(R) Iris(R) Xe Graphics, so I changed the arrays to real (single precision).
- It's poor coding practice to use the same variable for the loop index and the loop bound. This loop construct yields the wrong answer on the Intel GPU.
! do i=1,i
! Using this construct gives the same answer as running on CPU.
do j=1,i
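The two changes above can be combined in a sketch like the following. The module `loop_demo`, the function `triangle_sum`, and the kind parameter `wp` are hypothetical names for illustration. `wp` keeps the precision choice in one place, so switching back to `kind(1.0d0)` on a CPU or on a GPU with double-precision support is a one-line edit. The loop index `j` is kept separate from the loop bound `n`.

```fortran
module loop_demo
  implicit none
  ! Working precision in one place: kind(1.0) (single) for Iris Xe,
  ! kind(1.0d0) where double precision is available.
  integer, parameter :: wp = kind(1.0)
contains
  ! Sum 1 + 2 + ... + n inside a target region. The loop index j is a
  ! separate variable from the loop bound n (compare "do i=1,i").
  function triangle_sum(n) result(total)
    integer, intent(in) :: n
    real(wp) :: total
    integer :: j
    total = 0.0_wp
    ! Each thread accumulates a partial sum; reduction combines them
    ! and map(tofrom) returns the final value to the host.
    !$omp target teams distribute parallel do reduction(+: total) map(tofrom: total)
    do j = 1, n
       total = total + real(j, wp)
    end do
    !$omp end target teams distribute parallel do
  end function triangle_sum
end module loop_demo
```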
When I run it in a oneAPI command window, I type:
set LIBOMPTARGET_PLUGIN_PROFILE=T
to get a profile printout. I set it to make sure the program runs on the Intel GPU: if no profile is printed, the code did not offload.
>set LIBOMPTARGET_PLUGIN_PROFILE=T
>ifx /Qopenmp /Qopenmp-targets=spir64 z.r2.for
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.1.0 Build 20240103
Copyright (C) 1985-2023 Intel Corporation. All rights reserved.
Microsoft (R) Incremental Linker Version 14.38.33130.0
Copyright (C) Microsoft Corporation. All rights reserved.
-out:z.r2.exe
-subsystem:console
-defaultlib:libiomp5md.lib
-nodefaultlib:vcomp.lib
-nodefaultlib:vcompd.lib
C:\Users\bperz\AppData\Local\Temp\1932873.obj
C:\Users\bperz\AppData\Local\Temp\19328414.o
-defaultlib:omptarget.lib
>z.r2.exe
Number of procs is 8
6.0000001E+12 6.0000001E+12 6.0000001E+12 6.0000001E+12 6.0000001E+12
6.0000001E+12 6.0000001E+12 6.0000001E+12 6.0000001E+12
======================================================================================================================
LIBOMPTARGET_PLUGIN_PROFILE(LEVEL_ZERO) for OMP DEVICE(0) Intel(R) Iris(R) Xe Graphics, Thread 0
----------------------------------------------------------------------------------------------------------------------
Kernel 0 : __omp_offloading_80ff4b02_5bc1261f_MAIN___l27
----------------------------------------------------------------------------------------------------------------------
: Host Time (msec) Device Time (msec)
Name : Total Average Min Max Total Average Min Max Count
----------------------------------------------------------------------------------------------------------------------
Compiling : 397.03 397.03 397.03 397.03 0.00 0.00 0.00 0.00 1.00
DataAlloc : 0.58 0.06 0.00 0.42 0.00 0.00 0.00 0.00 9.00
DataRead (Device to Host) : 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
DataWrite (Host to Device): 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
Kernel 0 : 0.51 0.51 0.51 0.51 0.10 0.10 0.10 0.10 1.00
Linking : 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
OffloadEntriesInit : 3.88 3.88 3.88 3.88 0.00 0.00 0.00 0.00 1.00
======================================================================================================================
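Besides `LIBOMPTARGET_PLUGIN_PROFILE`, the program itself can report whether a target region really ran on the GPU. The following is a minimal sketch built around a hypothetical module `offload_check`. It calls the standard OpenMP routine `omp_is_initial_device()`, which returns `.true.` when called on the host. A `.false.` result from inside the target region therefore means the region was offloaded. The `!$` sentinel lines are compiled only when OpenMP is enabled.

```fortran
module offload_check
  ! omp_lib is only used when compiling with OpenMP (!$ sentinel).
  !$ use omp_lib
  implicit none
contains
  ! Returns .true. only if a target region actually ran on a device
  ! instead of falling back to the host.
  logical function ran_on_device()
    logical :: on_host
    on_host = .true.
    ! on_host is written on the device and copied back to the host.
    !$omp target map(from: on_host)
    !$ on_host = omp_is_initial_device()
    !$omp end target
    ran_on_device = .not. on_host
  end function ran_on_device
end module offload_check
```

Another option is the standard OpenMP environment variable `OMP_TARGET_OFFLOAD=MANDATORY`, which makes the program stop with an error instead of silently running target regions on the host.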
There are some other comments in z.r2.for.
@eliopoulos, let me know how this works for you.
It works. I get the same output. Do higher-end Intel GPUs support double precision?
Yes, the higher-end Intel GPUs support double precision.
The Intel(R) Iris(R) Xe Graphics GPUs are designed for gaming, not HPC workloads. Gamers don't need those extra bits for their graphics.
But /Qmkl is not supported because /size-llp64 is not supported by ifx, I guess.
I don't know how MKL works on Intel(R) Iris(R) Xe Graphics. You might ask on the MKL Forum. Have a simple reproducer handy. Here's the link to the MKL Forum.
ifx definitely supports double precision on CPU and Intel GPUs that support double precision.