How to use parallel MKL on Linux?

Letian_W_ · ‎11-05-2013

Folks,

I'm very new to parallel programming and need your help. I'm trying to solve a very large symmetric sparse generalized eigenvalue problem using the Extended Eigensolver. I have no problem solving a smaller problem on one thread with the "dfeast_scsrgv" subroutine, but I have no clue how to speed it up using the parallel capability.

My system: Linux, Intel 64
Software: Intel Composer XE 2013, MPICH compiled with XE 2013

Here is what I did:

1. Compile: mpif90 -mkl=parallel -o test_mpi.x test_sparse_solver.f90
2. Run: mpiexec -np 8 ./test_mpi.x

The run itself was OK, but my concern is whether I really used the parallel capability. For a smaller problem with 2000 equations, "-np 8" took longer than "-np 1". I realize I might need to change the source code, but I have no clue where to start. Could you give me a quick reference for getting it to run in parallel?

Much appreciated, and thanks in advance.

Letian

Here is my source code (Fortran 90):

    ! this routine tests the MKL sparse eigensolver
    implicit real*8 (a-h,o-z)
    real*8,allocatable::a(:),b(:)
    integer,allocatable::cola(:),rowa(:),colb(:),rowb(:)
    real*8,allocatable::e(:), x(:,:), res(:)
    integer fpm(128)
    real time_begin, time_end

    ! search subspace size and eigenvalue interval [emin, emax]
    m0=50
    emin=0.0
    emax=2e7
    fpm=0

    ! read matrices A and B in CSR format
    open(98,file='ifort98.dat',form='unformatted')
    read(98) n, na
    allocate (a(na),cola(na),rowa(n+1))
    read(98) (a(i),i=1,na)
    read(98) (cola(i),i=1,na)
    read(98) (rowa(i),i=1,n+1)
    read(98) n, nb
    allocate (b(nb),colb(nb),rowb(n+1))
    read(98) (b(i),i=1,nb)
    read(98) (colb(i),i=1,nb)
    read(98) (rowb(i),i=1,n+1)
    close(98)

    call CPU_time(time_begin)
    ! res holds the relative residuals and must be an array of length m0
    allocate (e(m0), x(n,m0), res(m0))
    call feastinit(fpm)
    print*,fpm
    call dfeast_scsrgv('U',n,a,rowa,cola,b,rowb,colb,fpm,epsout,loop,emin,emax,m0,e,x,m,res,info)
    print*,'info=',info
    print*,'m=',m
    print*,'loop=',loop
    print*,'epsout=',epsout

    ! write the frequencies sqrt(e)/(2*pi)
    open(10,file='test.out')
    do i=1,m
      write(10,*) 'mode',i,' Freq=', sqrt(e(i))*0.5/3.1415926535897932
    enddo
    close(10)

    deallocate (a,b,cola,rowa,colb,rowb,e,x,res)
    call cpu_time(time_end)
    print*,'Total CPU time=', time_end-time_begin
    stop
    end

Ying_H_Intel · ‎11-06-2013

Hi Letian,

That is a big question. I would suggest starting with MKL's internal parallelism.

In most cases, MKL already chooses a good level of parallelism for multi-core systems based on your system configuration and problem size. If you link against the threaded MKL library, your application is parallelized automatically.

For example, you could try PARDISO first and see how the performance changes with MKL_NUM_THREADS set to 1, 2, 4, or 8, using commands such as:

> ifort -mkl your.f90

> export MKL_NUM_THREADS=1

> ./a.out
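The thread-count comparison above can be scripted. What follows is only a minimal sketch, not a benchmark harness: `sweep_threads` is a hypothetical helper name, `./a.out` stands for whatever executable the compile step produced, and `date +%s%N` assumes GNU date as found on most Linux systems.

```shell
# Hypothetical helper: run one executable once per MKL thread count
# and print the wall-clock time of each run in milliseconds.
sweep_threads() {
    cmd="$1"
    shift
    for n in "$@"; do
        start=$(date +%s%N)                      # nanoseconds since the epoch (GNU date)
        MKL_NUM_THREADS="$n" "$cmd" > /dev/null  # thread count set for this run only
        end=$(date +%s%N)
        echo "MKL_NUM_THREADS=$n elapsed_ms=$(( (end - start) / 1000000 ))"
    done
}

# Example (assumes ./a.out was built with: ifort -mkl your.f90):
# sweep_threads ./a.out 1 2 4 8
```

For a small problem the elapsed time may not drop as threads are added, so it is probably worth repeating the sweep at the problem size you actually care about.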

(I'm not sure how MPI processes interact with MKL threading, which is based on OpenMP.)
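One thing that can be said with some confidence: each MPI process that calls threaded MKL may start its own set of OpenMP threads, so the total is roughly processes times threads per process. A common rule of thumb, which is an assumption here and not something established in this thread, is to keep that product at or below the number of cores. A small sketch with a hypothetical helper `threads_per_rank`:

```shell
# Hypothetical helper: threads each MPI rank may use so that
# ranks * threads does not exceed the core count (minimum 1).
threads_per_rank() {
    cores="$1"
    ranks="$2"
    t=$(( cores / ranks ))
    if [ "$t" -lt 1 ]; then
        t=1
    fi
    echo "$t"
}

# Example (assumption: the MPI launcher forwards environment variables to the ranks;
# note that nproc counts logical CPUs, which includes hyperthreads):
# export MKL_NUM_THREADS=$(threads_per_rank "$(nproc)" 2)
# mpiexec -np 2 ./test_mpi.x
```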

If you really need to parallelize your application yourself, you will need to learn one of the parallel programming methods, typically OpenMP, as in

http://software.intel.com/en-us/forums/topic/487697

or pthreads on Linux,

combined with the threaded MKL library (-lmkl_intel_thread -lmkl_core -liomp5).

You can search the forum or the MKL User's Guide. Here is one article on this topic for your reference:

http://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-using-intel-mkl-with-threaded-applications

Best Regards,

Ying

TimP · ‎11-06-2013

People who are interested in cpu_time for parallel benchmarks usually consider an increase as a favorable result, using it along with the elapsed time (e.g. from system_clock) to calculate "concurrency" (the ratio of cpu time to elapsed time).
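The concurrency ratio described above is simple arithmetic. As a minimal sketch (the helper name `concurrency` is hypothetical), it can be computed from the two timings a run reports, for instance the program's own cpu_time and system_clock figures, or the user+sys and real values printed by the shell's `time`:

```shell
# concurrency = CPU time / elapsed time, both in seconds.
# Hypothetical helper; prints the ratio with two decimals.
concurrency() {
    awk -v cpu="$1" -v wall="$2" 'BEGIN {
        if (wall <= 0) { print "undefined"; exit }
        printf "%.2f\n", cpu / wall
    }'
}

# Example: a run reporting 8.0 s of CPU time and 2.0 s elapsed
# kept about four cores busy on average:
# concurrency 8.0 2.0    # prints 4.00
```

With threaded code, cpu_time typically reports time summed over all threads, so a "Total CPU time" that grows with the thread count is not by itself a sign that the run got slower.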

The new compiler feature !$omp parallel do simd is particularly hoggish in terms of making big increases in CPU time, on the assumption that enough threads will be used to make a reduction in elapsed time.

Hyperthreading enthusiasts don't always care even about a reduction in elapsed time; they simply like to see a large concurrency figure.