Intel® Fortran Compiler

ifx OpenMP matrix multiplication

Alessandro_D_
New Contributor I

I am trying to run this example, which performs a matrix multiplication in parallel on the GPU using OpenMP offloading. Here is the code:

include "mkl_omp_offload.f90"

program matrix_multiply
  use omp_lib
  implicit none
  integer :: i, j, k, myid, m, n, istat
  real :: sup_norm, tmp
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: elapsed
  integer(8) :: t2, t1, rate
  character(16) :: str
  real, allocatable, dimension(:,:) :: a, b, c, c_serial
  !
  ! Different Intel GPUs have varying amounts of memory. If the program
  ! fails at runtime, try decreasing the value of "n".
  !
  n = 50

  call system_clock(t1, rate)

  ! Outside a parallel region, OMP_GET_THREAD_NUM() always returns 0 and
  ! OMP_GET_NUM_THREADS() returns 1, so the "CPU procs" line below prints 1.
  myid = OMP_GET_THREAD_NUM()
  if (myid .eq. 0) then
    print *, 'matrix size ', n
    print *, 'Number of CPU procs is ', OMP_GET_NUM_THREADS()
    print *, 'Number of OpenMP Device Available:', omp_get_num_devices()
    ! Report whether this target region actually ran on the device.
    !$omp target
    if (OMP_IS_INITIAL_DEVICE()) then
      print *, ' Running on CPU'
    else
      print *, ' Running on GPU'
    endif
    !$omp end target
  endif

  allocate( a(n,n), b(n,n), c(n,n), c_serial(n,n), stat=istat)
  if (istat/=0) error stop "Allocation of matrices FAILED!"

  ! Initialize matrices
  do j=1,n
    do i=1,n
      a(i,j) = real(i + j - 1)/n
      b(i,j) = real(i - j + 1)/n
    enddo
  enddo
  c = 0.0
  c_serial = 0.0

  !
  ! parallel device matrix multiplication.
  !

  call system_clock(t1)

  ! Copy a and b to the device, and c to and from it, for this data region.
  !$omp target data map(to: a, b) map(tofrom: c)
  ! Spread the collapsed (j,i) iterations across teams and threads on the device.
  !$omp target teams distribute parallel do collapse(2) private(j,i,k,tmp)
  do j=1,n
    do i=1,n
      tmp = 0.0
      do k=1,n
        tmp = tmp + a(i,k) * b(k,j)
      enddo
      c(i,j) = tmp
    enddo
  enddo
  !$omp end target data

  call system_clock(t2)
  elapsed = real(t2 - t1,dp)/real(rate,dp)
  write(*,'("GPU Device time (s) ",F7.3)') elapsed

  call system_clock(t1)
  ! serial compute matrix multiplication
  do j=1,n
    do i=1,n
      tmp = 0.0
      do k=1,n
        tmp = tmp + a(i,k) * b(k,j)
      enddo
      c_serial(i,j) = tmp
    enddo
  enddo

  call system_clock(t2)
  elapsed = real(t2 - t1,dp)/real(rate,dp)
  write(*,'("CPU Device time (s) ",F7.3)') elapsed

  ! verify result
  do j=1,n
    do i=1,n
      if (.not. isclose(c(i,j),c_serial(i,j),atol=1.0e-2) ) then
        print *,'FAILED, i, j, c_serial(i,j), c(i,j) ', i, j, c_serial(i,j), c(i,j)
        stop
      endif
    enddo
  enddo

  sup_norm = maxval(abs(c-c_serial))

  print *,'PASSED'

  write(*,*) "||c-c_serial|| = ", sup_norm


contains

  ! See https://numpy.org/doc/stable/reference/generated/numpy.isclose.html
  elemental function isclose(a,b,atol,rtol)
    real, intent(in) :: a, b
    real, intent(in), optional :: atol, rtol
    logical :: isclose

    real :: atol_, rtol_

    atol_ = 1.0e-5
    rtol_ = 1.0e-9

    if (present(atol)) atol_ = atol
    if (present(rtol)) rtol_ = rtol

    isclose = abs(a - b) <= (atol_ + rtol_*abs(b))
  end function

end program matrix_multiply
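Because `isclose` is declared `elemental`, it also accepts whole arrays, so the explicit verification loop could be written as a single array expression. The sketch below is not from the thread; it only illustrates that pattern on small random matrices. The program name `verify_sketch`, the size `n = 4`, and the deliberately injected mismatch at element (2,3) are arbitrary choices for illustration.

```fortran
! Sketch (not from the thread): whole-array verification with an elemental isclose.
! n, the random test data and the injected mismatch are hypothetical.
program verify_sketch
  implicit none
  integer, parameter :: n = 4
  real :: c(n,n), c_serial(n,n)
  integer :: loc(2), nbad

  call random_number(c_serial)
  c = c_serial
  c(2,3) = c(2,3) + 0.5          ! inject one deliberate mismatch

  ! isclose is elemental: array arguments are compared element by element,
  ! and the scalar atol is applied to every element.
  if (all(isclose(c, c_serial, atol=1.0e-2))) then
    print *, 'PASSED'
  else
    nbad = count(.not. isclose(c, c_serial, atol=1.0e-2))
    loc = maxloc(abs(c - c_serial))   ! position of the largest difference
    print *, 'FAILED: ', nbad, ' mismatch(es), largest at ', loc
  end if

  ! Internal check: exactly one element should differ, at (2,3).
  if (nbad /= 1 .or. any(loc /= [2, 3])) error stop 'unexpected verification result'

contains

  ! Same definition as in the program above.
  elemental function isclose(a, b, atol, rtol)
    real, intent(in) :: a, b
    real, intent(in), optional :: atol, rtol
    logical :: isclose
    real :: atol_, rtol_
    atol_ = 1.0e-5
    rtol_ = 1.0e-9
    if (present(atol)) atol_ = atol
    if (present(rtol)) rtol_ = rtol
    isclose = abs(a - b) <= (atol_ + rtol_*abs(b))
  end function isclose

end program verify_sketch
```

Run as written, it should report one mismatch located at element (2,3); removing the injected line makes it print PASSED before reaching the internal check, which then stops with an error because `nbad` is never set.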

I compile it with the following command:

ifx -fpp /Qopenmp /Qopenmp-targets:spir64 /Qmkl src\03_mm_GPU.f90 -o exe\run_win.exe

The build then fails with this output:

ifx: warning #10148: option '/size-llp64' not supported
ifx: warning #10148: option '/size-llp64' not supported
NMAKE : fatal error U1077: 'ifx -fpp /Qopenmp /Qopenmp-targets:spir64 /Qmkl src\03_mm_GPU.f90 -o exe\run_win.exe' : return code '0xc0000374'
Stop.
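A much smaller program can help separate a compiler or installation problem from a problem in the matrix code itself. The following is a minimal sketch, not taken from the thread, that only queries the OpenMP device count and runs one trivial target region. The program name `offload_check` and the sentinel value 42 are arbitrary choices. It could be built with the options that worked in the reply, e.g. `ifx /Qopenmp /Qopenmp-targets:spir64 offload_check.f90`.

```fortran
! Sketch (not from the thread): minimal OpenMP offload sanity check.
! The program name and the value 42 are arbitrary, hypothetical choices.
program offload_check
  use omp_lib
  implicit none
  logical :: on_host
  integer :: x

  print *, 'Devices available: ', omp_get_num_devices()
  print *, 'Default device:    ', omp_get_default_device()

  on_host = .true.
  x = 0

  ! Scalars are firstprivate in a target region by default, so map them
  ! with "from" to copy the values set on the device back to the host.
  !$omp target map(from: on_host, x)
  on_host = omp_is_initial_device()
  x = 42
  !$omp end target

  if (on_host) then
    print *, 'Target region ran on the host (no offload)'
  else
    print *, 'Target region ran on a device'
  end if

  if (x /= 42) error stop 'Unexpected value returned from target region'
  print *, 'x = ', x
end program offload_check
```

If this small program crashes the compiler in the same way, the problem is likely in the toolchain setup rather than in the matrix code. Note that when offloading is unavailable, OpenMP may silently run the target region on the host; setting the environment variable `OMP_TARGET_OFFLOAD=MANDATORY` makes the run fail instead.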

The code should run on my Intel graphics card:

GPU 0: Intel(R) UHD Graphics
Driver version: 31.0.101.4502
Driver date: 15/06/2023
DirectX version: 12 (FL 12.1)
Physical location: PCI bus 0, device 2, function 0
Utilization: 2%
Dedicated GPU memory
Shared GPU memory: 0.6/15.8 GB
GPU memory: 0.6/15.8 GB

The version of ifx that I am using is:
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.0.2 Build 20231213
Any help solving this problem would be greatly appreciated, thanks!

Barbara_P_Intel
Employee

I removed the first line, the include statement, since the included file isn't part of your post. I also removed the compiler option /Qmkl, since the program doesn't call any MKL routines.

I compiled successfully with

ifx /Qopenmp /Qopenmp-targets:spir64 matmul.f90

and it ran on my laptop.

Q:\tmp>matmul.exe
matrix size 50
Number of CPU procs is 1
Number of OpenMP Device Available: 1
Running on GPU
GPU Device time (s) 0.001
CPU Device time (s) 0.000
PASSED
||c-c_serial|| = 9.5367432E-06
