What Fortran compiler are you using?
Since your code does not use MPI, you can build with ifort directly: ifort -mkl *.f90
You may also try PARDISO directly. See the performance tips in http://software.intel.com/en-us/articles/introduction-to-the-intel-mkl-extended-eigensolver
Now I recompiled my program per your suggestions:
ifort -mkl=parallel *.f90
Then I set multiple threads with "export OMP_NUM_THREADS=16" and reran the program. Without setting OMP_NUM_THREADS, my code ran for about 20 hours. Now, with 16 threads, the program has already been running for almost 20 hours. It seems to me that the parallelism is not working. I wonder whether I need to add some options when compiling. Please suggest. Thanks.
By default, MKL tries to choose the best number of threads (1 per core) for those MKL functions that are built with threading. So it is not surprising if setting the number of threads explicitly doesn't improve the run time.
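To see what the threading environment looks like before launching, something like the following may help. It is only a rough sketch and assumes a POSIX shell on Linux; the function name show_thread_env is made up for illustration. MKL_NUM_THREADS and MKL_DYNAMIC are the MKL-specific variables, and for MKL calls MKL_NUM_THREADS takes precedence over OMP_NUM_THREADS.

```shell
#!/bin/sh
# show_thread_env is a hypothetical helper name, not part of MKL or ifort.
# It prints the online logical CPU count and the thread-related variables
# that MKL reads at run time.
show_thread_env() {
    echo "online logical CPUs: $(getconf _NPROCESSORS_ONLN)"
    echo "OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset}"
    echo "MKL_NUM_THREADS=${MKL_NUM_THREADS:-unset}"
    echo "MKL_DYNAMIC=${MKL_DYNAMIC:-unset}"
}

show_thread_env
```

If OMP_NUM_THREADS is already equal to the number of cores, or unset, the explicit setting and the MKL default should give roughly the same thread count.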
If you run top and then press 1 to show the per-CPU view, how many CPUs are busy?
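Besides top, you can ask the kernel how many threads the process actually has. A minimal sketch, assuming Linux with /proc mounted; count_threads is a made-up helper name, and $$ (the current shell) only stands in for the PID of your running program:

```shell
#!/bin/sh
# count_threads is a hypothetical helper: it prints the number of threads
# in the process with the given PID, read from /proc (Linux-specific).
count_threads() {
    awk '/^Threads:/ { print $2 }' "/proc/$1/status"
}

# Inspect this shell itself; replace $$ with the PID of your program.
echo "threads in PID $$: $(count_threads $$)"
```

A thread count above 1 only shows that threads were created, not that they are busy. Running top -H -p with the PID lists the CPU usage of each thread.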
And if you'd like to see the parallel behaviour of an MKL function, you may try the code in http://software.intel.com/en-us/articles/intel-mkl-103-getting-started. It is a dgemm example built with gcc or icc.
I show the latest compiler command at http://software.intel.com/comment/1768822#comment-1768822