log$fenv_access_off, __svml_pow2, __svml_log2, __svml_exp2, pow$fenv_access_off

moortgatgmail_com · ‎04-09-2012

Hi,

I'm profiling the performance of a fairly large finite element reservoir simulator code (with Shark on Mac OS X). I'm noticing that a large percentage of CPU time is spent in basic math functions, such as those in the subject line. About 17% of CPU time is spent in libSystem.B.dylib, even though the source code calls only a few of these functions. Is this expected behavior, or could there be an error in my BLAS/LAPACK linking or something like that?

I'm observing similar behavior when I link with either

1) MKL Blas and Lapack with:

MKL =-I$(MKLROOT)/include/intel64/lp64 -I$(MKLROOT)/include -L$(MKLROOT)/lib $(MKLROOT)/lib/libmkl_blas95_lp64.a $(MKLROOT)/lib/libmkl_lapack95_lp64.a -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread -lm

or

2) the BLAS/LAPACK libraries included in Mac OS X, with:

MKL =-I$(MKLROOT)/include/intel64/lp64 -I$(MKLROOT)/include -L$(MKLROOT)/lib -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread -lm

together with

-framework Accelerate

I would appreciate any suggestions to increase the CPU efficiency of these function calls.

--Joachim

TimP · ‎04-09-2012

blas95 is simply a convenience wrapper for the same underlying MKL functions, so similar performance is expected.
If the exp() and log() calls come from your own source code, as I would expect, MKL is unlikely to play any part in them. The __svml_ entries in your profile indicate that you are getting vectorization, which is good. Assuming you require the double precision math functions, the most likely way to improve performance is to avoid unnecessary calls to them. To give an ugly example, x**3. - x could be replaced by x*(x+1)*(x-1).