Hi,
I'm profiling a fairly large finite element reservoir simulator code (with Shark on Mac OS X). I'm noticing that a large percentage of CPU time is spent in basic math functions, such as those in the subject line. About 17% of CPU time is spent in libSystem.B.dylib, even though the source code calls only a few of these functions. Is this expected behavior, or could there be an error in my BLAS/LAPACK linking or something similar?
I'm observing similar behavior when I link with either
1) MKL Blas and Lapack with:
MKL =-I$(MKLROOT)/include/intel64/lp64 -I$(MKLROOT)/include -L$(MKLROOT)/lib $(MKLROOT)/lib/libmkl_blas95_lp64.a $(MKLROOT)/lib/libmkl_lapack95_lp64.a -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread -lm
or
2) the BLAS/LAPACK libraries included in Mac OS X, with
MKL =-I$(MKLROOT)/include/intel64/lp64 -I$(MKLROOT)/include -L$(MKLROOT)/lib -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread -lm
together with
-framework Accelerate
I would appreciate any suggestions for reducing the CPU time spent in these function calls.
--Joachim
1 Reply
blas95 is simply a convenience wrapper around the same underlying MKL functions, so similar performance is expected.
If the exp() and log() calls come from your own source code, as I would expect, it seems unlikely that MKL plays any part in them. You appear to be getting vectorization, which is good. Assuming you require the double precision math functions, the most likely way to improve performance is to avoid unnecessary calls to them. To give an ugly example, x**3. - x might be replaced by x*(x+1)*(x-1), which is algebraically identical but needs only multiplications and additions.