I am trying to run this example of a matrix multiplication offloaded to the GPU and run in parallel with OpenMP. Here is the code:
include "mkl_omp_offload.f90"
program matrix_multiply
    use omp_lib
    implicit none
    integer :: i, j, k, myid, m, n, istat
    real :: sup_norm, tmp
    integer, parameter :: dp = kind(1.0d0)
    real(dp) :: elapsed
    integer(8) :: t2, t1, rate
    character(16) :: str
    real, allocatable, dimension(:,:) :: a, b, c, c_serial
    !
    ! Different Intel GPUs have varying amounts of memory. If the program
    ! fails at runtime, try decreasing the value of "n".
    !
    n = 50
    call system_clock(t1, rate)
    myid = OMP_GET_THREAD_NUM()
    if (myid .eq. 0) then
        print *, 'matrix size ', n
        print *, 'Number of CPU procs is ', OMP_GET_NUM_THREADS()
        print *, 'Number of OpenMP Device Available:', omp_get_num_devices()
        !$omp target
        if (OMP_IS_INITIAL_DEVICE()) then
            print *, ' Running on CPU'
        else
            print *, ' Running on GPU'
        endif
        !$omp end target
    endif
    allocate( a(n,n), b(n,n), c(n,n), c_serial(n,n), stat=istat)
    if (istat/=0) error stop "Allocation of matrices FAILED!"
    ! Initialize matrices
    do j=1,n
        do i=1,n
            a(i,j) = real(i + j - 1)/n
            b(i,j) = real(i - j + 1)/n
        enddo
    enddo
    c = 0.0
    c_serial = 0.0
    !
    ! parallel device matrix multiplication.
    !
    call system_clock(t1)
    !$omp target data map(to: a, b) map(tofrom: c)
    !$omp target teams distribute parallel do collapse(2) private(j,i,k,tmp)
    do j=1,n
        do i=1,n
            tmp = 0.0
            do k=1,n
                tmp = tmp + a(i,k) * b(k,j)
            enddo
            c(i,j) = tmp
        enddo
    enddo
    !$omp end target data
    call system_clock(t2)
    elapsed = real(t2 - t1,dp)/real(rate,dp)
    write(*,'("GPU Device time (s) ",F7.3)') elapsed
    call system_clock(t1)
    ! serial compute matrix multiplication
    do j=1,n
        do i=1,n
            tmp = 0.0
            do k=1,n
                tmp = tmp + a(i,k) * b(k,j)
            enddo
            c_serial(i,j) = tmp
        enddo
    enddo
    call system_clock(t2)
    elapsed = real(t2 - t1,dp)/real(rate,dp)
    write(*,'("CPU Device time (s) ",F7.3)') elapsed
    ! verify result
    do j=1,n
        do i=1,n
            if (.not. isclose(c(i,j),c_serial(i,j),atol=1.0e-2) ) then
                print *,'FAILED, i, j, c_serial(i,j), c(i,j) ', i, j, c_serial(i,j), c(i,j)
                stop
            endif
        enddo
    enddo
    sup_norm = maxval(abs(c-c_serial))
    print *,'PASSED'
    write(*,*) "||c-c_serial|| = ", sup_norm
contains
    ! See https://numpy.org/doc/stable/reference/generated/numpy.isclose.html
    elemental function isclose(a,b,atol,rtol)
        real, intent(in) :: a, b
        real, intent(in), optional :: atol, rtol
        logical :: isclose
        real :: atol_, rtol_
        atol_ = 1.0e-5
        rtol_ = 1.0e-9
        if (present(atol)) atol_ = atol
        if (present(rtol)) rtol_ = rtol
        isclose = abs(a - b) <= (atol_ + rtol_*abs(b))
    end function
end program matrix_multiply
I compile it with the following command:
ifx -fpp /Qopenmp /Qopenmp-targets:spir64 /Qmkl src\03_mm_GPU.f90 -o exe\run_win.exe
Then I get this error:
ifx: warning #10148: option '/size-llp64' not supported
ifx: warning #10148: option '/size-llp64' not supported
NMAKE : fatal error U1077: 'ifx -fpp /Qopenmp /Qopenmp-targets:spir64 /Qmkl src\03_mm_GPU.f90 -o exe\run_win.exe' : return code '0xc0000374'
Stop.
The code should run on my Intel graphics card, which is
I removed the first line, the include statement, because you did not post that file here. I also removed the /Qmkl compiler option, since the program does not call any MKL routines.
I compiled successfully with
ifx /Qopenmp /Qopenmp-targets:spir64 matmul.f90
and it ran on my laptop.
Q:\tmp>matmul.exe
matrix size 50
Number of CPU procs is 1
Number of OpenMP Device Available: 1
Running on GPU
GPU Device time (s) 0.001
CPU Device time (s) 0.000
PASSED
||c-c_serial|| = 9.5367432E-06
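If the full program still fails to build or run, a smaller test can help tell a compiler problem apart from a runtime or driver problem. The sketch below is a hypothetical, minimal offload smoke test (the file name offload_check.f90 is an assumption, not from this thread). It uses only the omp_lib routines already used above: it reports whether a target region runs on the host or on a device, offloads a trivial loop, and checks the result on the host. It could be compiled the same way, e.g. ifx /Qopenmp /Qopenmp-targets:spir64 offload_check.f90.

```fortran
! offload_check.f90 (hypothetical name): minimal OpenMP offload smoke test
program offload_check
    use omp_lib
    implicit none
    integer, parameter :: n = 1000
    integer :: i
    logical :: on_host
    real, allocatable :: x(:), y(:)

    allocate(x(n), y(n))
    x = 1.0
    y = 0.0
    on_host = .true.

    print *, 'OpenMP devices available: ', omp_get_num_devices()

    ! Ask the target region itself where it executed; map(from:) copies the flag back.
    !$omp target map(from: on_host)
    on_host = omp_is_initial_device()
    !$omp end target

    if (on_host) then
        print *, 'Target region ran on the host (no offload)'
    else
        print *, 'Target region ran on a device'
    end if

    ! Offload a trivial loop: y = 2*x, computed wherever the target region runs.
    !$omp target teams distribute parallel do map(to: x) map(tofrom: y)
    do i = 1, n
        y(i) = 2.0 * x(i)
    end do
    !$omp end target teams distribute parallel do

    ! Verify on the host; 2.0*1.0 is exact in single precision.
    if (any(abs(y - 2.0) > 1.0e-6)) error stop 'Offload result mismatch'
    print *, 'PASSED'
end program offload_check
```

If no device is usable, OpenMP by default falls back to running the target regions on the host, so the check still prints PASSED; the "ran on" message shows which case you hit.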
