Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

serial MKL with openmp

moortgatgmail_com
Hi,
I have the following problem: I want to use MKL, mainly for the Pardiso direct solver, in a relatively large flow simulator. For a part of the code unrelated to MKL/Pardiso, I want to use OpenMP to parallelize certain CPU-intensive loops. I can compile the code without MKL and see nearly optimal linear scaling of my parallelized loop when I compile with OpenMP. However, when I link to MKL, every other part of the code becomes significantly slower in terms of total CPU time. The wall clock time of a simulation may be slightly reduced, but it seems MKL makes some inefficient attempts at parallelization that result in a much higher CPU cost. As a result, I want to disable all MKL parallelization, if only to test the scaling of the parallelization that I implement explicitly myself.
This MKL behavior is strange, because I'm using the sequential MKL from the link advisor, I set both OMP_NUM_THREADS=1 and OMP_MAX_THREADS=1 in my .profile file, and I use

!$OMP PARALLEL NUM_THREADS(4) only for the loop I want to parallelize.

How can I use the -openmp flag in a makefile for individual modules/subroutines without it somehow applying to all MKL routines? I'm pasting the full makefile below for completeness, as well as the environment settings.

--J

.SUFFIXES: $(SUFFIXES) .f90

FC = /usr/bin/ifort
FAST = -O3 -m64 -AVX
FASTT = -O3 -m64 -AVX -openmp

OBJDIR = Obj/
MOD = Mod/
GEO = Geo/
DIFF = Diff/
COMM = Comm/
DGM = Dgm/
SOLVER = Solver/
FLUID = Fluid/
FLASH = Flash/

MKLROOT = /opt/intel/composer_xe_2011_sp1.9.289/mkl/

# MKL =-I$(MKLROOT)/include/intel64/lp64 -I$(MKLROOT)/include -L$(MKLROOT)/lib $(MKLROOT)/lib/libmkl_blas95_lp64.a $(MKLROOT)/lib/libmkl_lapack95_lp64.a -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread -lm
# MKL = -I$(MKLROOT)/include -L$(MKLROOT)/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm
MKL = -I$(MKLROOT)/include -L$(MKLROOT)/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lm
# MKL = -I$(MKLROOT)/include -L$(MKLROOT)/lib -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -openmp -lpthread -lm
# MKL = -I$(MKLROOT)/include -L$(MKLROOT)/lib -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lpthread -lm

OBJS = $(OBJDIR)mod_mesh.o $(OBJDIR)mod_initializations.o\
    $(OBJDIR)gas.o $(OBJDIR)mod_fluid.o\
    $(OBJDIR)mod_init_data.o $(OBJDIR)mod_variables.o\
    $(OBJDIR)mod_peng_rob_eos.o $(OBJDIR)mod_viscosity.o\
    $(OBJDIR)mod_linear_solver.o $(OBJDIR)mod_inv_bk.o\
    $(OBJDIR)mod_comp_matrix.o $(OBJDIR)mod_flash.o\
    $(OBJDIR)mod_comp_fluxes.o $(OBJDIR)mod_comp_flow.o\
    $(OBJDIR)mod_slope_limiter.o\
    $(OBJDIR)mod_time.o $(OBJDIR)mod_diffusion.o

TARGET = CHOMPFRS3D.e

all : $(TARGET)

CHOMPFRS3D.e : $(OBJS)
	$(FC) $(FASTT) -module $(MOD) $(MLKLIB) $(MKL) $(OBJDIR)$** -o $@

#1------------------------------------------------------
$(OBJDIR)gas.o : $(OBJDIR)mod_mesh.o $(OBJDIR)mod_initializations.o\
    $(OBJDIR)mod_fluid.o $(OBJDIR)mod_init_data.o\
    $(OBJDIR)mod_variables.o $(OBJDIR)mod_peng_rob_eos.o\
    $(OBJDIR)mod_viscosity.o $(OBJDIR)mod_inv_bk.o\
    $(OBJDIR)mod_linear_solver.o $(OBJDIR)mod_comp_matrix.o\
    $(OBJDIR)mod_comp_fluxes.o $(OBJDIR)mod_comp_flow.o\
    $(OBJDIR)mod_diffusion.o $(OBJDIR)mod_time.o gas.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

#=======================================================
$(OBJDIR)mod_initializations.o : $(OBJDIR)mod_mesh.o\
    $(OBJDIR)mod_fluid.o\
    $(OBJDIR)mod_init_data.o\
    $(OBJDIR)mod_variables.o\
    $(OBJDIR)mod_peng_rob_eos.o\
    $(OBJDIR)mod_viscosity.o\
    $(OBJDIR)mod_linear_solver.o\
    $(OBJDIR)mod_inv_bk.o\
    $(OBJDIR)mod_flash.o\
    $(OBJDIR)mod_comp_fluxes.o\
    $(COMM)mod_initializations.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

#-------------------------------------------------------
$(OBJDIR)mod_mesh.o : $(GEO)mod_mesh.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

$(GEO)mod_mesh.f90 : $(GEO)read_mesh.f90\
    $(GEO)comp_dist_vol.f90

#-------------------------------------------------------
$(OBJDIR)mod_fluid.o : $(OBJDIR)mod_mesh.o\
    $(FLUID)mod_fluid.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

#-------------------------------------------------------
$(OBJDIR)mod_peng_rob_eos.o : $(OBJDIR)mod_mesh.o $(OBJDIR)mod_fluid.o\
    $(OBJDIR)mod_variables.o\
    $(FLUID)mod_peng_rob_eos.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

#-------------------------------------------------------
$(OBJDIR)mod_init_data.o : $(OBJDIR)mod_mesh.o\
    $(OBJDIR)mod_peng_rob_eos.o\
    $(OBJDIR)mod_variables.o\
    $(FLUID)mod_init_data.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

#-------------------------------------------------------
$(OBJDIR)mod_variables.o : $(OBJDIR)mod_mesh.o\
    $(DGM)mod_variables.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

#-------------------------------------------------------
$(OBJDIR)mod_viscosity.o : $(OBJDIR)mod_mesh.o\
    $(OBJDIR)mod_fluid.o\
    $(OBJDIR)mod_variables.o\
    $(OBJDIR)mod_init_data.o\
    $(FLUID)mod_viscosity.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

#-------------------------------------------------------
$(OBJDIR)mod_flash.o : $(OBJDIR)mod_mesh.o\
    $(OBJDIR)mod_variables.o\
    $(OBJDIR)mod_init_data.o\
    $(OBJDIR)mod_fluid.o\
    $(OBJDIR)mod_viscosity.o\
    $(OBJDIR)mod_diffusion.o\
    $(FLASH)mod_flash.f90
	$(FC) $(FASTT) $? -c -module $(MOD) -o $@

$(FLASH)mod_flash.f90 : $(FLASH)stability.f90
$(FLASH)mod_flash.f90 : $(FLASH)flash2f.f90
$(FLASH)mod_flash.f90 : $(FLASH)flash3f.f90
$(FLASH)mod_flash.f90 : $(FLASH)flash2f_PR.f90
$(FLASH)mod_flash.f90 : $(FLASH)PressPMV.f90
$(FLASH)mod_flash.f90 : $(FLASH)eos.f90
$(FLASH)mod_flash.f90 : $(FLASH)PressPMV_PR.f90
$(FLASH)mod_flash.f90 : $(FLASH)flash_nodes.f90
$(FLASH)mod_flash.f90 : $(FLASH)flash_stability.f90
$(FLASH)mod_flash.f90 : $(FLASH)flash_stability_PR.f90

#-------------------------------------------------------
$(OBJDIR)mod_linear_solver.o : $(OBJDIR)mod_mesh.o\
    $(SOLVER)mod_linear_solver.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

#-------------------------------------------------------
$(OBJDIR)mod_slope_limiter.o : $(OBJDIR)mod_mesh.o\
    $(DGM)mod_slope_limiter.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

#-------------------------------------------------------
$(OBJDIR)mod_inv_bk.o : $(OBJDIR)mod_mesh.o\
    $(OBJDIR)mod_variables.o\
    $(OBJDIR)mod_init_data.o\
    $(DGM)mod_inv_bk.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

#-------------------------------------------------------
$(OBJDIR)mod_comp_flow.o : $(OBJDIR)mod_mesh.o\
    $(OBJDIR)mod_variables.o\
    $(OBJDIR)mod_slope_limiter.o\
    $(OBJDIR)mod_init_data.o\
    $(DGM)mod_comp_flow.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

#-------------------------------------------------------
$(OBJDIR)mod_time.o : $(OBJDIR)mod_mesh.o\
    $(OBJDIR)mod_fluid.o\
    $(OBJDIR)mod_variables.o\
    $(OBJDIR)mod_init_data.o\
    $(COMM)mod_time.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

#-------------------------------------------------------
$(OBJDIR)mod_comp_fluxes.o : $(OBJDIR)mod_mesh.o\
    $(OBJDIR)mod_variables.o\
    $(OBJDIR)mod_init_data.o\
    $(OBJDIR)mod_inv_bk.o\
    $(OBJDIR)mod_diffusion.o\
    $(DGM)mod_comp_fluxes.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

#-------------------------------------------------------
$(OBJDIR)mod_diffusion.o : $(OBJDIR)mod_mesh.o\
    $(OBJDIR)mod_variables.o\
    $(OBJDIR)mod_linear_solver.o\
    $(OBJDIR)mod_fluid.o\
    $(OBJDIR)mod_init_data.o\
    $(DIFF)mod_diffusion.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

#-------------------------------------------------------
$(OBJDIR)mod_comp_matrix.o : $(OBJDIR)mod_mesh.o\
    $(OBJDIR)mod_variables.o\
    $(OBJDIR)mod_init_data.o\
    $(OBJDIR)mod_peng_rob_eos.o\
    $(OBJDIR)mod_inv_bk.o\
    $(DGM)mod_comp_matrix.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

clean :
	@-rm Obj/*.o
	@-rm CHOMPFRS3D.e
	@-rm Mod/*.mod
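
In the makefile above, only mod_flash.o and the final link step are built with $(FASTT), which is the variable that adds -openmp; every other object uses $(FAST). That per-file pattern can be illustrated with a small dry run in shell that only prints the compile line each source would receive. This is a sketch under assumptions: the names OMP_SOURCES and compile_line are hypothetical, the flag list is abbreviated from the makefile, and nothing is actually compiled.

```shell
# Dry run: print the compile line each source file would get.
# FAST is abbreviated from the makefile; OMP_SOURCES and compile_line
# are hypothetical names used only for this sketch.
FAST="-O3 -m64"
OMP_SOURCES="mod_flash.f90"   # sources that contain OpenMP directives

compile_line() {
  src="$1"
  flags="$FAST"
  case " $OMP_SOURCES " in
    *" $src "*) flags="$FAST -openmp" ;;   # add OpenMP only where listed
  esac
  echo "ifort $flags -c $src"
}

for src in mod_mesh.f90 mod_flash.f90 mod_linear_solver.f90; do
  compile_line "$src"
done
```

Only the line for mod_flash.f90 carries -openmp. Whether MKL itself runs threaded is decided by the threading layer chosen on the link line (-lmkl_sequential versus -lmkl_intel_thread), not by the flags used to compile individual modules.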

------- .profile content below:

export OMP_NUM_THREADS=1

export OMP_MAX_THREADS=1

export TEC_RS_2009=/usr/tecRS_2009_R2

export FORT_FMT_RECL=2000

export DYLD_LIBRARY_PATH=/opt/intel/composer_xe_2011_sp1.9.289/compiler/lib:/opt/intel/composer_xe_2011_sp1.9.289/mkl/lib:/opt/intel/Compiler/11.1/088/Frameworks/mkl/lib/em64t:/opt/intel/Compiler/11.1/08$

export LD_LIBRARY_PATH=/opt/intel/composer_xe_2011_sp1.9.289/mkl/lib

export LIBRARY_PATH=/opt/intel/composer_xe_2011_sp1.9.289/compiler/lib:/opt/intel/composer_xe_2011_sp1.9.289/mkl/lib

export NLSPATH=/opt/intel/composer_xe_2011_sp1.9.289/mkl/lib/locale/%l_%t/%N

export MANPATH=/opt/intel/composer_xe_2011_sp1.9.289/man/en_US:/opt/local/share/man:/opt/local/man:

export INCLUDE=/opt/intel/composer_xe_2011_sp1.9.289/mkl/include

export FPATH=/opt/intel/composer_xe_2011_sp1.9.289/mkl/include

export CPATH=/opt/intel/composer_xe_2011_sp1.9.289/mkl/include

export KMP_AFFINITY=compact,1

3 Replies
Anonymous66
Valued Contributor I
Hi,

I am moving this issue to the MKL forum since your question is about MKL.

Regards,
Annalee
Intel Developer Support
Ying_H_Intel
Employee
Hi
Vladimir_Petrov__Int
New Contributor III
Hi,

Regarding your comment:

"This MKL behavior is strange, because I'm using the sequential MKL from the link advisor, and I set both OMP_NUM_THREADS=1 and OMP_MAX_THREADS=1 in my .profile file, and use

!$OMP PARALLEL NUM_THREADS(4) only for the loop I want to parallelize."

It seems you are preventing the parallelization yourself by setting OMP_NUM_THREADS to 1.
Could you please try setting it to 4 (keeping the link line the same)?
Also, I don't think you need OMP_MAX_THREADS at all.
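
The suggestion above can be tried for a single shell session without editing .profile. The lines below are a minimal sketch, assuming a POSIX-style shell; the executable name is taken from TARGET in the posted makefile, and the run line is left commented out because it depends on the local build.

```shell
# Try the suggested settings for the current shell session only.
unset OMP_MAX_THREADS        # the reply suggests this variable is not needed
export OMP_NUM_THREADS=4     # value suggested in the reply
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
# ./CHOMPFRS3D.e             # run the simulator from this same shell
```

Because the variables are changed only in the current session, opening a new login shell restores whatever .profile sets.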

Best regards,
Vladimir