Hello! I am trying to implement sgemm matrix multiplication on multiple physical cores and I am a little confused on how to do so.
Say I have obtained 9 physical cores from an HPC system and I want sgemm to use all of these cores to do the matrix multiplication. In this case I do not want to use multithreading on these 9 cores, only these 9 cores as a whole. So in a way I guess you could say that the 9 cores are the threads to be used by sgemm. Below is some code I have created, which I believe implements what I want to do. Is this implementation correct?
integer :: N,max_threads,mkl_get_max_threads
real, allocatable, dimension(:,:) :: A,B,C
integer :: ierror,num_cores,my_rank
double precision :: time1,time2
CALL MPI_Init(ierror) !Flag for error
CALL MPI_COMM_Size(MPI_COMM_WORLD,num_cores,ierror) !puts in the number of cores into num_cores
CALL MPI_Comm_rank(MPI_COMM_WORLD,my_rank,ierror) !defining the variable for the rank of the core
if(my_rank == 0)then
!starting the timer
time1 = MPI_Wtime()
N = 61740
A = 1.0
B = 2.0
C = 0.0
if(my_rank == 0)then
!printing the elapsed time
time2 = MPI_Wtime()
print *, 'elapsed time' , time2 - time1
print *, C(1,2)
end program mkl_matrixmul
Also if it helps, I am using a Sandy Bridge node with 256 GB of memory.
Not sure if I understand about your question correctly.
1) In generally, as MKL was mulithreaded by OpenMP run-time library, if you call mkl sgemm directly as the MKL fortran sample, https://software.intel.com/sites/default/files/mkl_fortran_samples_05162016.zip
and compile it if with ifort your.for -mkl. The sgemm can run with 9 physical cores on one nodes automatically. User don't need write threading code.
2) You can use MPI process, then let process 0 to call sgemm (please note, there is MPI version sgemm psgemm() in MKL). Then let sgemm use 9 OpenMP threads, say mkl_set_num_threads(9)
I guess the two ways should be ok. but they looks still being using 9 threads on 9 cores. In your case, may you want to cooridinate the OpenMP threads with MPI Process, (or we call it as OpenMP Affinity)?
If yes, you may consider the OpenMP Place or Affinity in OpenMP documentation, please see MKL user guide, like
https://software.intel.com/en-us/node/599522 => control MPI and OpenMP number
https://software.intel.com/en-us/node/528552 => control OpenMP thread affinity to core.