The code below behaves differently when built with /Qmkl:parallel versus /Qmkl:cluster. In both cases it is built for Windows 7 64-bit with the latest Intel compiler and libraries, and it is launched as an MPI job with mpiexec.exe -n 2 (that is, using only two ranks) on a dual six-core workstation.
When /Qmkl:parallel is used, the MKL calls on rank 0 take advantage of all 6 OpenMP threads there.
When /Qmkl:cluster is used, only one thread on rank 0 is used, and the run is therefore about six times slower.
Any idea how to get threaded behavior with /Qmkl:cluster?
Also, why does the workspace query return an LWORK more than twice as large in the /Qmkl:parallel case?
PROGRAM MAIN
   USE OMP_LIB
   USE MPI
   IMPLICIT NONE
   INTEGER(KIND=4) :: N,ALLOC_ERROR,INFO,LWORK,SEED_SIZE,IERR
   INTEGER(KIND=8) :: CLOCK_START,CLOCK_STOP,CLOCK_RATE,CLOCK_MAX
   INTEGER(KIND=4),ALLOCATABLE :: SEEDS(:)
   LOGICAL :: MPI_IS_INITIALIZED
   REAL(KIND=8) :: W(1)
   REAL(KIND=8),ALLOCATABLE :: A(:,:),TAU(:),WORK(:)
   CALL MPI_INITIALIZED(MPI_IS_INITIALIZED,IERR)
   IF (.NOT.MPI_IS_INITIALIZED) THEN
      CALL MPI_INIT(IERR)
   END IF
   WRITE(*,*) 'I am image ',THIS_IMAGE(),' and I can span ',OMP_GET_MAX_THREADS(),' OpenMP threads.'
   IF (THIS_IMAGE()==1) THEN
      N = 3000
      WRITE(*,*) 'N = ',N
      ALLOCATE(A(N,N),STAT=ALLOC_ERROR)
      IF (ALLOC_ERROR/=0) THEN
         ERROR STOP
      END IF
      CALL RANDOM_SEED(SIZE=SEED_SIZE)
      ALLOCATE(SEEDS(SEED_SIZE))
      SEEDS = 123456
      CALL RANDOM_SEED(PUT=SEEDS)
      CALL RANDOM_NUMBER(A)
      ALLOCATE(TAU(N),STAT=ALLOC_ERROR)
      ! Workspace query: LWORK = -1 makes DGEQRF return the optimal size in W(1)
      LWORK = -1
      CALL DGEQRF(N,N,A,N,TAU,W,LWORK,INFO)
      WRITE(*,*) 'LWORK = ',W(1)
      LWORK = INT(W(1))
      ALLOCATE(WORK(LWORK),STAT=ALLOC_ERROR)
      CALL SYSTEM_CLOCK(CLOCK_START,CLOCK_RATE,CLOCK_MAX)
      CALL DGEQRF(N,N,A,N,TAU,WORK,LWORK,INFO)
      CALL DORGQR(N,N,N,A,N,TAU,WORK,LWORK,INFO)
      CALL SYSTEM_CLOCK(CLOCK_STOP,CLOCK_RATE,CLOCK_MAX)
      WRITE(*,*) 'INFO = ',INFO
      WRITE(*,*) 'TIME = ',REAL(CLOCK_STOP-CLOCK_START,KIND=8)/REAL(CLOCK_RATE,KIND=8)
      WRITE(*,*) 'A(N,N) = ',A(N,N)
   END IF
END PROGRAM MAIN
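Since OMP_GET_MAX_THREADS only reports the OpenMP runtime's limit, not what MKL will actually use, it can also help to ask MKL directly. A minimal sketch, assuming Intel MKL's mkl_service module is available (this check is my addition, not part of the original program):

PROGRAM CHECK_MKL_THREADS
   USE MKL_SERVICE        ! provides MKL_GET_MAX_THREADS / MKL_SET_NUM_THREADS
   USE OMP_LIB
   IMPLICIT NONE
   ! Compare the OpenMP limit with the number of threads MKL itself will use;
   ! if a sequential MKL layer was linked in, the second value stays at 1
   ! regardless of the first.
   WRITE(*,*) 'OpenMP max threads: ',OMP_GET_MAX_THREADS()
   WRITE(*,*) 'MKL max threads:    ',MKL_GET_MAX_THREADS()
END PROGRAM CHECK_MKL_THREADS

If the second number prints 1 under /Qmkl:cluster, the slowdown is a linking issue rather than a runtime setting.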
Here is the output when using /Qmkl:cluster:
 I am image 2 and I can span 6 OpenMP threads.
 I am image 1 and I can span 6 OpenMP threads.
 N = 3000
 LWORK = 288096
 INFO = 0
 TIME = 6.38000000000000
 A(N,N) = -2.110006751937421E-002
Here is the output when using /Qmkl:parallel:
 I am image 2 and I can span 6 OpenMP threads.
 I am image 1 and I can span 6 OpenMP threads.
 N = 3000
 LWORK = 742977
 INFO = 0
 TIME = 0.920000000000000
 A(N,N) = -2.110006751937324E-002
Here is the build log (when using /Qmkl:cluster):
Compiling with Intel(R) Visual Fortran Compiler 17.0 [Intel(R) 64]...
ifort /nologo /O2 /I"C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017.0.048\windows\mpi\intel64\include" /Qopenmp /standard-semantics /module:"x64\Release\\" /object:"x64\Release\\" /Fd"x64\Release\vc120.pdb" /libs:dll /threads /Qmkl:cluster /c /Qcoarray:single /Qlocation,link,"C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\\bin\amd64" /Qm64 "D:\TEMP\QR_PERFORMANCE\MAIN.F90"
Linking...
Link /OUT:"x64\Release\QR_PERFORMANCE.exe" /INCREMENTAL:NO /NOLOGO /LIBPATH:"C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017.0.048\windows\mpi\intel64\lib\release_mt" /MANIFEST /MANIFESTFILE:"x64\Release\QR_PERFORMANCE.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /SUBSYSTEM:CONSOLE /IMPLIB:"D:\TEMP\QR_PERFORMANCE\x64\Release\QR_PERFORMANCE.lib" impi.lib -qm64 /qoffload-ldopts="-mkl=cluster" "x64\Release\MAIN.obj"
Embedding manifest...
mt.exe /nologo /outputresource:"D:\TEMP\QR_PERFORMANCE\x64\Release\QR_PERFORMANCE.exe;#1" /manifest "x64\Release\QR_PERFORMANCE.exe.intermediate.manifest"
QR_PERFORMANCE - 0 error(s), 0 warning(s)
Edit: I found the proper threading libraries to solve the problem:
mkl_scalapack_lp64_dll.lib mkl_intel_lp64_dll.lib mkl_core_dll.lib mkl_intel_thread_dll.lib mkl_blacs_lp64_dll.lib impi.lib
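For concreteness, the fix amounts to replacing the compiler-driven /Qmkl:cluster link (which pulls in the sequential MKL layer) with an explicit library list that selects the threaded layer, mkl_intel_thread_dll.lib. A sketch of such a link step, assuming the MKL 2017 intel64 import libraries are on the LIBPATH (the exact paths and the %MKLROOT% variable are assumptions, not from the original build log):

ifort /Qopenmp MAIN.obj ^
  mkl_scalapack_lp64_dll.lib mkl_intel_lp64_dll.lib mkl_core_dll.lib ^
  mkl_intel_thread_dll.lib mkl_blacs_lp64_dll.lib impi.lib ^
  /link /LIBPATH:"%MKLROOT%\lib\intel64"

In Visual Studio this corresponds to setting MKL support to "No" in the Fortran project properties and listing the libraries above as additional linker dependencies instead.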