Intel® Distribution for Python*
Support and discussions for achieving faster Python* applications and core computational packages.
The Intel sign-in experience is changing in February to support enhanced security controls. If you sign in, click here for more information.

Large overheads on scipy.linalg.cholesky calls


I've been doing some simple tests, turning on MKL_VERBOSE and comparing the times reported there against times measured in Python.  My question is about the observed overheads.

The setup I used to generate these results is Ivy Bridge (2x12 cores, 24 cores total) on Edison at NERSC.  I can run on Haswell too and I recall seeing similar overheads.  I've installed IDP through Anaconda (Python 2.7) following these instructions:

See the attached script.  For a 5k x 5k matrix, the following was typical output:

MKL_VERBOSE DSYRK(L,N,5000,5000,0x7fffffff4c70,0x2aaaf0000010,5000,0x7fffffff4c78,0x2aaafbebd010,5000) 335.77ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:24 WDiv:HOST:+0.000
MKL_VERBOSE DPOTRF(U,5000,0x2aab07d7a010,5000,0) 103.63ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:24 WDiv:HOST:+0.000
PYTHON_TIME 0.607532024384

The numbers from MKL_VERBOSE are obviously in ms, and my number (PYTHON_TIME, in seconds) is 6x longer.  I've experimented with placing the time() calls directly around the call to potrf inside the scipy source but it just gives me the same answer.  I haven't tried to dig deeper, and I am just wondering if (a) I am interpreting the MKL_VERBOSE result properly, and (b) whether you can suggest where the overheads is coming from (they don't appear to be fixed, they vary with matrix size).

0 Kudos
2 Replies

The scipy.linalg.cholesky is implemented via a call into Fortran code (see scipy/linalg/ The input matrix, which is generated in the C layout thus needs to be converted to the F-layout.

Changing the function in your benchmark to convert the input to F-layout prior to measuring timing of the Cholesky decomposition like so:

def one_iteration( size ) :
    A   = numpy.random.uniform( size = ( size, size ) )
    ATA =, A).copy(order='F')
    start = time.time()
    scipy.linalg.cholesky( ATA, overwrite_a = True, check_finite = False )
    stop = time.time()
    return stop - start

results in better agreement between MKL's and Python's time measurements.

MKL_VERBOSE DPOTRF(U,10000,0x7fd1c050f010,10000,0) 1.64s CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:16 WDiv:HOST:+0.000
PYTHON_TIME 1.7553684711456299



Ah of course, thanks, that brings the two numbers into agreement.