Compile FORTRAN using parallel MKL - how to do it in linux?

Letian_W_ · ‎11-05-2013

Folks, I'm totally new to parallel programming, so I need your help. I'm trying to solve a sparse eigenvalue problem using the MKL Extended Eigensolver. It works fine for a small test problem in sequential mode. However, my matrix size is about 6 million, and it takes forever on a single core. I got stuck on how to use the parallel capability. I know the MKL library can run in parallel, correct?

My system: Intel 64 Linux
Software: Intel Composer XE 2013, with MPICH installed and compiled by XE 2013

Here is what I did:

1. Compile: mpif90 -mkl=parallel -o test_mpi.x test_sparse_eigen.f90
2. Run: mpiexec -np 8 ./test_mpi.x

However, for a smaller test problem, -np 8 took longer than -np 1. And when I print something, it is printed 8 times with the -np 8 option. I know I might need to add some lines to my code to use the parallel capability, but I really have no idea where to start. Does anybody have quick instructions and a sample file? Very much appreciated, and thanks in advance.

Attached is my source code (Fortran 90).

    !***********************************************
    !this routine test MKL sparse eigen solver
    implicit real*8 (a-h,o-z)
    real*8,allocatable::a(:),b(:)
    integer,allocatable::cola(:),rowa(:),colb(:),rowb(:)
    real*8,allocatable::e(:), x(:,:)
    integer fpm(128)
    real time_begin, time_end
    m0=50
    emin=0.0
    emax=2e7
    fpm=0
    open(98,file='ifort98.dat',form='unformatted')
    read(98) n, na
    allocate (a(na),cola(na),rowa(n+1))
    read(98) (a(i),i=1,na)
    read(98) (cola(i),i=1,na)
    read(98) (rowa(i),i=1,n+1)
    read(98) n, nb
    allocate (b(nb),colb(nb),rowb(n+1))
    read(98) (b(i),i=1,nb)
    read(98) (colb(i),i=1,nb)
    read(98) (rowb(i),i=1,n+1)
    close(98)
    call CPU_time(time_begin)
    allocate (e(m0), x(n,m0))
    call feastinit(fpm)
    print*,fpm
    call dfeast_scsrgv('U',n,a,rowa,cola,b,rowb,colb,fpm,epsout,loop,emin,emax,m0,e,x,m,res,info)
    print*,'info=',info
    print*,'m=',m
    print*,'loop=',loop
    print*,'epsout=',epsout
    open(10,file='test.out')
    do i=1,m
        write(10,*) 'mode',i,' Freq=', sqrt(e(i))*0.5/3.1415926535897932
    enddo
    close(10)
    deallocate (a,b,cola,rowa,colb,rowb,e,x)
    call cpu_time(time_end)
    print*,'Total CPU time=', time_end-time_begin
    stop
    end

Ying_H_Intel · ‎11-06-2013

Hi Letian,

What Fortran compiler are you using?

As your code does not use MPI, you can build it with ifort directly: ifort -mkl *.f90

You may also try PARDISO directly. See the performance tips in http://software.intel.com/en-us/articles/introduction-to-the-intel-mkl-extended-eigensolver

Best Regards,

Ying

TimP · ‎11-06-2013

If you are using MPI to run 8 separate copies of your test simultaneously, it may not be surprising if CPU time of each copy of the test increases relative to a single copy.

Letian_W_ · ‎11-06-2013

Thanks, Ying/TimP.

Now I recompiled my program per your suggestions:

ifort -mkl=parallel *.f90

Then I set multiple threads with "export OMP_NUM_THREADS=16" and reran the program. Without setting OMP_NUM_THREADS, my code ran for about 20 hours. Now, with 16 threads, the program has been running for almost 20 hours. It seems to me that the parallelism is not working. I am wondering whether I need to add some options when compiling. Please suggest. Thanks.

Letian

TimP · ‎11-06-2013

MKL tries to choose the best number of threads by default (1 per core) for those MKL functions which are built with threading. So it will not be surprising if setting the number of threads doesn't improve it.
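One way to check this is to compare the machine's CPU count with the thread settings in the environment, then time a run by wall clock. The following is only a minimal sketch, assuming a bash shell, GNU coreutils `nproc`, and an executable named `./a.out` (a hypothetical name; it is ifort's default output name). `MKL_NUM_THREADS` and `OMP_NUM_THREADS` are the environment variables MKL reads. Note that the Fortran `cpu_time` intrinsic typically reports CPU time summed over all threads, so the "real" time printed by `time` is the better measure of any speedup.

```shell
#!/usr/bin/env bash
# Number of logical CPUs available to this process (GNU coreutils).
ncores=$(nproc)
echo "logical CPUs: $ncores"

# Show any thread limits already set; unset means MKL picks its own default.
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS:-<unset>}"
echo "MKL_NUM_THREADS=${MKL_NUM_THREADS:-<unset>}"

# Hypothetical executable name (assumption, not from the thread).
# Compare wall-clock ("real") time for 1 thread and for one thread per logical CPU.
exe=./a.out
if [ -x "$exe" ]; then
    for t in 1 "$ncores"; do
        echo "--- MKL_NUM_THREADS=$t ---"
        ( export MKL_NUM_THREADS=$t; time "$exe" )
    done
else
    echo "$exe not found; build it first, e.g. with: ifort -mkl=parallel *.f90"
fi
```

If the "real" time is the same for both settings, the bottleneck is probably in a part of the solver that is not threaded, rather than in the compile options.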

Ying_H_Intel · ‎11-12-2013

Hi Letian,

If you run the top command and press 1, how many CPUs are running?

And if you'd like to see the parallel behavior of an MKL function, you may try the code in http://software.intel.com/en-us/articles/intel-mkl-103-getting-started. It is a dgemm example built with gcc or icc.

I posted the latest compile command at http://software.intel.com/comment/1768822#comment-1768822
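The same check can be done non-interactively. The following is a rough sketch, assuming Linux with the procps `ps`; the process ID defaults to the current shell so that the snippet runs as written, and `a.out` in the comment is a hypothetical executable name. Interactively, `top -H -p <pid>` shows the same per-thread view.

```shell
#!/usr/bin/env bash
# PID of the process to inspect; defaults to this shell so the sketch runs as-is.
# For the real job one might use: pid=$(pgrep -n a.out)   (a.out is a hypothetical name)
pid=${1:-$$}

# NLWP = number of threads (light-weight processes) in the process.
nthreads=$(ps -o nlwp= -p "$pid" | tr -d ' ')
echo "process $pid has $nthreads thread(s)"

# One line per thread: thread ID, processor it last ran on, CPU usage, command name.
ps -L -o tid,psr,pcpu,comm -p "$pid"
```

If the solver shows only one busy thread while the others sit near 0% CPU, the run is effectively sequential regardless of the thread count requested.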

Best Regards,

Ying