Hello! I am trying to run sgemm matrix multiplication on multiple physical cores and I am a little confused about how to do so.
Say I have obtained 9 physical cores from an HPC system and I want sgemm to use all of these cores for the matrix multiplication. I do not want additional multithreading within each of these 9 cores; I want the 9 cores, taken together, to do the work. In other words, the 9 cores are the threads to be used by sgemm. Below is some code I have written, which I believe implements what I want to do. Is this implementation correct?
program mkl_matrixmul
  use mpi
  implicit none
  integer :: N,max_threads,mkl_get_max_threads
  real, allocatable, dimension(:,:) :: A,B,C
  integer :: ierror,num_cores,my_rank
  double precision :: time1,time2
  CALL MPI_Init(ierror) !Flag for error
  CALL MPI_COMM_Size(MPI_COMM_WORLD,num_cores,ierror) !puts in the number of cores into num_cores
  CALL MPI_Comm_rank(MPI_COMM_WORLD,my_rank,ierror) !defining the variable for the rank of the core
  CALL MPI_BARRIER(MPI_COMM_WORLD,ierror)
  if(my_rank == 0)then
    !starting the timer
    time1 = MPI_Wtime()
  end if
  N = 61740
  Allocate(A(N,N),B(N,N),C(N,N))
  A = 1.0
  B = 2.0
  C = 0.0
  call mkl_set_num_threads(num_cores)
  call sgemm('N','N',N,N,N,1.0,A,N,B,N,1.0,C,N)
  CALL MPI_BARRIER(MPI_COMM_WORLD,ierror)
  if(my_rank == 0)then
    !printing the elapsed time
    time2 = MPI_Wtime()
    print *, 'elapsed time' , time2 - time1
    print *, C(1,2)
  end if
  CALL MPI_Finalize(ierror)
end program mkl_matrixmul
Also if it helps, I am using a Sandy Bridge node with 256 GB of memory.
Thank you,
Brandon
Hi Brandon,
I am not sure I understand your question correctly.
1) In general, MKL is multithreaded through the OpenMP run-time library. If you call MKL sgemm directly, as in the MKL Fortran samples (https://software.intel.com/sites/default/files/mkl_fortran_samples_05162016.zip), and compile with ifort your.for -mkl, sgemm will run on the 9 physical cores of one node automatically. You do not need to write any threading code.
2) You can use MPI processes and let process 0 call sgemm (please note that MKL also has an MPI version of sgemm, psgemm()). Then let sgemm use 9 OpenMP threads, for example with mkl_set_num_threads(9).
I think both ways should work, but both still use 9 threads on 9 cores. In your case, do you perhaps want to coordinate the OpenMP threads with the MPI processes (what we call OpenMP affinity)?
If so, consider OpenMP places or affinity as described in the OpenMP documentation, and see the MKL user guide, for example:
https://software.intel.com/en-us/node/599522 => control MPI and OpenMP number
https://software.intel.com/en-us/node/528552 => control OpenMP thread affinity to core.
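Option 2 above could be sketched roughly as follows. This is only a minimal illustration, not a tuned implementation: it assumes the program is linked against MKL (for example with ifort and -mkl) and that the job gives rank 0 access to all 9 cores, for instance by launching a single MPI process on the node. The matrix size n = 2000 and the program name are placeholders chosen for the example, and beta is set to 0.0 because C holds no prior result here.

```fortran
program sgemm_rank0
  use mpi
  implicit none
  integer, parameter :: n = 2000       ! illustrative size, much smaller than in the question
  integer, parameter :: nthreads = 9   ! assumption: one OpenMP thread per physical core
  real, allocatable :: a(:,:), b(:,:), c(:,:)
  integer :: ierror, my_rank
  double precision :: t1, t2

  call MPI_Init(ierror)
  call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierror)

  ! Only rank 0 allocates the matrices and calls sgemm;
  ! MKL spreads the work over its OpenMP threads.
  if (my_rank == 0) then
    allocate(a(n,n), b(n,n), c(n,n))
    a = 1.0
    b = 2.0
    c = 0.0
    call mkl_set_num_threads(nthreads)
    t1 = MPI_Wtime()
    ! C := 1.0*A*B + 0.0*C
    call sgemm('N', 'N', n, n, n, 1.0, a, n, b, n, 0.0, c, n)
    t2 = MPI_Wtime()
    print *, 'elapsed time', t2 - t1
    print *, c(1,2)   ! each entry is a sum of n products 1.0*2.0
    deallocate(a, b, c)
  end if

  ! Any other ranks simply wait here until rank 0 has finished.
  call MPI_Barrier(MPI_COMM_WORLD, ierror)
  call MPI_Finalize(ierror)
end program sgemm_rank0
```

If several MPI ranks are started on the same node, only rank 0 does the multiplication in this sketch, so the thread affinity settings from the links above decide whether its 9 threads actually land on 9 distinct cores.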
Best Regards,
Ying
Option number 2 is the one I believe I wanted. Thank you very much for your help.