Ah! You use the matmul() intrinsic. When you compile with /Qopenmp on Windows or -qopenmp on Linux, the matmul() intrinsic is automatically parallelized using OpenMP.
That can be disabled with /Qopt-matmul- on Windows or -qno-opt-matmul on Linux. See the Developer Guide.
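To see the effect for yourself, here is a minimal sketch. It is not the attached z.for: the program name, file name mm.f90, and array size are made up for illustration, and it is free-form source rather than fixed-form .for. It times a single matmul() call so you can compare a build with /Qopenmp against one that adds /Qopt-matmul-.

```fortran
! Hypothetical example; not the attached z.for.
! Build with the parallel matmul (Windows):  ifx /Qopenmp mm.f90
! Build with it disabled (Windows):          ifx /Qopenmp /Qopt-matmul- mm.f90
program mm
  use omp_lib
  implicit none
  integer, parameter :: n = 1000          ! illustrative size only
  double precision, allocatable :: a(:,:), b(:,:), c(:,:)
  double precision :: t0, t1

  allocate(a(n,n), b(n,n), c(n,n))
  a = 1.0d0
  b = 2.0d0

  ! Time only the intrinsic call.
  t0 = omp_get_wtime()
  c = matmul(a, b)
  t1 = omp_get_wtime()

  print *, 'Max OpenMP threads :', omp_get_max_threads()
  print *, 'c(1,1)             :', c(1,1)   ! every element equals 2*n
  print *, 'matmul time (s)    :', t1 - t0
end program mm
```

Timings will vary by machine, so treat any difference between the two builds as indicative only.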
So far I've just run this on CPU. iGPU is next.
Attached is an update of z.for called z.r1.for. Here's the output I got for CPU:
Q:\06129705>ifx /Qopenmp z.r1.for
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.1.0 Build 20240103
Copyright (C) 1985-2023 Intel Corporation. All rights reserved.
Microsoft (R) Incremental Linker Version 14.38.33130.0
Copyright (C) Microsoft Corporation. All rights reserved.
-out:z.r1.exe
-subsystem:console
-defaultlib:libiomp5md.lib
-nodefaultlib:vcomp.lib
-nodefaultlib:vcompd.lib
z.r1.obj
Q:\06129705>z.r1.exe
Number of procs is 8
6000000000000.00 6000000000000.00 6000000000000.00
6000000000000.00 6000000000000.00 6000000000000.00
6000000000000.00 6000000000000.00 6000000000000.00
I have the Intel GPU version running and printing the same answers as the CPU version. It's attached as z.r2.for.
Two MAJOR changes:
- double precision is not supported on Intel(R) Iris(R) Xe Graphics. I made the arrays real.
- It's poor coding practice to use the same variable for the loop index and the end of the loop. This loop construct yields the wrong answer on Intel GPU.
! do i=1,i
! Using this construct gives the same answer as running on CPU.
do j=1,i
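A small sketch of both changes together, for anyone following along. This is not the code in z.r2.for: the program, the array s, and the kind parameter wp are hypothetical names chosen for illustration. The idea is to keep the precision in one named constant, so switching back to double precision on a GPU that supports it is a one-line change, and to give the inner loop its own index so the bound is never the variable being iterated.

```fortran
! Hypothetical sketch; not the attached z.r2.for.
program tri
  implicit none
  ! Single precision for GPUs without FP64; use kind(1.0d0) where FP64 is available.
  integer, parameter :: wp = kind(1.0)
  integer, parameter :: n = 8
  real(wp) :: s(n)
  integer :: i, j

  s = 0.0_wp
  !$omp target teams distribute parallel do map(tofrom: s) private(j)
  do i = 1, n
     ! Separate index j; the bound i is not changed by this loop.
     do j = 1, i
        s(i) = s(i) + real(j, wp)
     end do
  end do
  !$omp end target teams distribute parallel do

  print *, s        ! s(i) is the sum 1 + 2 + ... + i
end program tri
```

Built without the OpenMP options, the directives are treated as comments and the same loops run on the CPU, which makes it easy to compare the two results.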
When I run it in a oneAPI command window I type:
set LIBOMPTARGET_PLUGIN_PROFILE=T
to get a profile printout. I set that to be sure it runs on the Intel GPU. If no profile is printed, the code did not offload.
>set LIBOMPTARGET_PLUGIN_PROFILE=T
>ifx /Qopenmp /Qopenmp-targets=spir64 z.r2.for
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.1.0 Build 20240103
Copyright (C) 1985-2023 Intel Corporation. All rights reserved.
Microsoft (R) Incremental Linker Version 14.38.33130.0
Copyright (C) Microsoft Corporation. All rights reserved.
-out:z.r2.exe
-subsystem:console
-defaultlib:libiomp5md.lib
-nodefaultlib:vcomp.lib
-nodefaultlib:vcompd.lib
C:\Users\bperz\AppData\Local\Temp\1932873.obj
C:\Users\bperz\AppData\Local\Temp\19328414.o
-defaultlib:omptarget.lib
>z.r2.exe
Number of procs is 8
6.0000001E+12 6.0000001E+12 6.0000001E+12 6.0000001E+12 6.0000001E+12
6.0000001E+12 6.0000001E+12 6.0000001E+12 6.0000001E+12
======================================================================================================================
LIBOMPTARGET_PLUGIN_PROFILE(LEVEL_ZERO) for OMP DEVICE(0) Intel(R) Iris(R) Xe Graphics, Thread 0
----------------------------------------------------------------------------------------------------------------------
Kernel 0 : __omp_offloading_80ff4b02_5bc1261f_MAIN___l27
----------------------------------------------------------------------------------------------------------------------
: Host Time (msec) Device Time (msec)
Name : Total Average Min Max Total Average Min Max Count
----------------------------------------------------------------------------------------------------------------------
Compiling : 397.03 397.03 397.03 397.03 0.00 0.00 0.00 0.00 1.00
DataAlloc : 0.58 0.06 0.00 0.42 0.00 0.00 0.00 0.00 9.00
DataRead (Device to Host) : 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
DataWrite (Host to Device): 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
Kernel 0 : 0.51 0.51 0.51 0.51 0.10 0.10 0.10 0.10 1.00
Linking : 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
OffloadEntriesInit : 3.88 3.88 3.88 3.88 0.00 0.00 0.00 0.00 1.00
======================================================================================================================
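Besides the profile printout, you can also check for offload from inside the program. The following is a minimal sketch using the standard OpenMP runtime routines omp_get_num_devices() and omp_is_initial_device(). The program name and the variable on_host are made up for illustration, and this code is not part of z.r2.for.

```fortran
! Hypothetical check; not part of the attached z.r2.for.
! Build (Windows): ifx /Qopenmp /Qopenmp-targets=spir64 check_offload.f90
program check_offload
  use omp_lib
  implicit none
  logical :: on_host

  on_host = .true.
  print *, 'Number of offload devices:', omp_get_num_devices()

  ! omp_is_initial_device() returns .true. when executing on the host.
  !$omp target map(tofrom: on_host)
  on_host = omp_is_initial_device()
  !$omp end target

  if (on_host) then
     print *, 'Target region ran on the host (no offload).'
  else
     print *, 'Target region ran on the device.'
  end if
end program check_offload
```

Setting the standard OpenMP environment variable OMP_TARGET_OFFLOAD=MANDATORY before running is another option: the program should then stop with an error instead of quietly falling back to the CPU when no device is available.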
There are some other comments in z.r2.for.
@eliopoulos, let me know how this works for you.
It works. I get the same output. Do higher-end Intel GPUs support double precision?
But /Qmkl is not supported because /size-llp64 is not supported by ifx, I guess.
I don't know how MKL works on Intel(R) Iris(R) Xe Graphics. You might ask on the MKL Forum. Have a simple reproducer handy. Here's the link to the MKL Forum.
ifx definitely supports double precision on CPU and on Intel GPUs whose hardware supports double precision.