- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I've been doing some simple tests, turning on MKL_VERBOSE and comparing the times reported there against times measured in Python. My question is about the observed overheads.
The setup I used to generate these results is Ivy Bridge (2x12 cores, 24 cores total) on Edison at NERSC. I can run on Haswell too and I recall seeing similar overheads. I've installed IDP through Anaconda (Python 2.7) following these instructions:
https://software.intel.com/en-us/articles/using-intel-distribution-for-python-with-anaconda
See the attached script. For a 5k x 5k matrix, the following was typical output:
MKL_VERBOSE DSYRK(L,N,5000,5000,0x7fffffff4c70,0x2aaaf0000010,5000,0x7fffffff4c78,0x2aaafbebd010,5000) 335.77ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:24 WDiv:HOST:+0.000 MKL_VERBOSE DPOTRF(U,5000,0x2aab07d7a010,5000,0) 103.63ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:24 WDiv:HOST:+0.000 PYTHON_TIME 0.607532024384
The numbers from MKL_VERBOSE are obviously in ms, and my number (PYTHON_TIME, in seconds) is 6x longer. I've experimented with placing the time() calls directly around the call to potrf inside the scipy source but it just gives me the same answer. I haven't tried to dig deeper, and I am just wondering if (a) I am interpreting the MKL_VERBOSE result properly, and (b) whether you can suggest where the overheads is coming from (they don't appear to be fixed, they vary with matrix size).
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The scipy.linalg.cholesky is implemented via a call into Fortran code (see scipy/linalg/decomp_cholesky.py#L27). The input matrix, which is generated in the C layout thus needs to be converted to the F-layout.
Changing the function in your benchmark to convert the input to F-layout prior to measuring timing of the Cholesky decomposition like so:
def one_iteration( size ) : A = numpy.random.uniform( size = ( size, size ) ) ATA = numpy.dot(A.T, A).copy(order='F') start = time.time() scipy.linalg.cholesky( ATA, overwrite_a = True, check_finite = False ) stop = time.time() return stop - start
results in better agreement between MKL's and Python's time measurements.
MKL_VERBOSE DPOTRF(U,10000,0x7fd1c050f010,10000,0) 1.64s CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:16 WDiv:HOST:+0.000 PYTHON_TIME 1.7553684711456299
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Ah of course, thanks, that brings the two numbers into agreement.
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page