Huge slowdown after upgrading from ComposerXE 2011.4.184 to sp1.9.289
I get a 30% performance degradation after upgrading from ComposerXE version 2011.4.184 to the latest version sp1.9.289.
My benchmark runs an iterative algorithm. I account for warm-up time and do around 300 repetitions for each problem size. The standard deviations are very low, but the difference in mean response time between the two ComposerXE versions is huge.
My computer setup is as follows (I also posted it in a separate thread):
/Users/bravegag/code/fastcode_project/code$ uname -a
Darwin Macintosh-4.local 11.3.0 Darwin Kernel Version 11.3.0: Thu Jan 12 18:47:41 PST 2012; root:xnu-1699.24.23~1/RELEASE_X86_64 x86_64
/Users/bravegag/code/fastcode_project/code$ gcc --version
gcc (GCC) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
No, nothing changed other than switching the symlinks in /opt/intel/ to make one version or the other the current one.
This is my thesis work: the implementation of an iterative algorithm that mutates some matrices and keeps solving them until convergence. So far, the MKL calls used are QR decomposition (LAPACKE_dgeqrf), solve (either the full LAPACKE_dgels, or the matrix-vector multiplication LAPACKE_dormqr followed by back-substitution with cblas_dtrsm), and matrix-matrix multiplication (cblas_dgemm).
These are my computer specs:
CPU manufacturer: Intel
Model name: Core 2 Duo T9900
CPU cores: 2
CPU core frequency: 3.06 GHz
Cycles/issue for floating-point additions (latency/throughput): 3/1
Cycles/issue for floating-point multiplications (latency/throughput): 5/1
Maximum theoretical scalar floating-point peak performance: 6 Gflop/s
L1 cache size: 32 KB
L2 cache size: 6144 KB
Could you please provide a few more details of your test case, such as the size of the matrices and other input parameters like the lwork size? It would be great if you could provide a simple reproducer, so we can check exactly this case on our side.
I am trying to figure out how to do this. If I isolate something specific, I might not be able to reproduce the slowdown, and I am hesitant to share the whole code base, although I can ask my professor. Is there a specific email address I could send the material to rather than making it public?
Would an executable built in Debug mode be enough for your analysis?
Alternatively, I could run a profiler, or any other tool you suggest, while I am benchmarking.
When you create a new thread or reply to an existing post, simply select the "Yes" button next to the label "Mark this post Private" at the bottom of the page, two rows above the Preview, Submit, and Cancel buttons.