Huge slowdown upgrading from ComposerXE 2011.4.184 to sp1.9.289

Azua_Garcia__Giovann · ‎03-23-2012

Hello,

I get a 30% performance degradation after upgrading from ComposerXE version 2011.4.184 to the latest version sp1.9.289.

My benchmark is an iterative algorithm. I account for warm-up time and run around 300 repetitions for each problem size. The standard deviations are really low, but the difference in mean response time between the two ComposerXE versions is huge.
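For readers who want to reproduce this kind of comparison, here is a minimal sketch of how per-repetition timings could be summarized. It is not the poster's code: `bench_stats`, `summarize`, and `relative_change` are hypothetical names, and the sketch assumes the timings are already collected in an array with the warm-up runs at the front. It reports the variance; the standard deviation is its square root.

```c
#include <assert.h>
#include <stddef.h>

/* Mean and sample variance of a set of timings (stddev = sqrt(variance)). */
typedef struct {
    double mean;
    double variance;
} bench_stats;

/* Summarize timing samples, skipping the first `warmup` entries.
 * Uses Welford's online update, which avoids the cancellation problems
 * of the naive sum-of-squares formula. */
bench_stats summarize(const double *samples, size_t n, size_t warmup)
{
    bench_stats s = {0.0, 0.0};
    double m2 = 0.0;   /* running sum of squared deviations from the mean */
    size_t count = 0;

    for (size_t i = warmup; i < n; ++i) {
        double delta = samples[i] - s.mean;
        ++count;
        s.mean += delta / (double)count;
        m2 += delta * (samples[i] - s.mean);
    }
    if (count > 1)
        s.variance = m2 / (double)(count - 1);
    return s;
}

/* Relative change of `after` versus `before`; 0.30 means 30% slower. */
double relative_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before;
}
```

Calling `summarize` once per MKL version and passing the two means to `relative_change` gives the kind of percentage quoted above.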

My computer setup is (I posted it in a separate thread):
/Users/bravegag/code/fastcode_project/code$ uname -a
Darwin Macintosh-4.local 11.3.0 Darwin Kernel Version 11.3.0: Thu Jan 12 18:47:41 PST 2012; root:xnu-1699.24.23~1/RELEASE_X86_64 x86_64

/Users/bravegag/code/fastcode_project/code$ gcc --version
gcc (GCC) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Compilation:
/usr/bin/gcc -DGTEST_HAS_TR1_TUPLE=0 --std=gnu99 -Wall -Wextra -Wshadow -Wstrict-prototypes -Wmissing-prototypes -g3 -ggdb3 -I/Users/bravegag/code/fastcode_project/code/third_party/googletest/include -I/Users/bravegag/code/fastcode_project/code/third_party/googletest -I/opt/intel/composerxe-2011.4.184/mkl/include -I/Users/bravegag/code/fastcode_project/code/src -I/Users/bravegag/code/fastcode_project/code/build -I/Users/bravegag/code/fastcode_project/code/third_party/genrmf -o CMakeFiles/submodularity.dir/src/matrix.c.o -c /Users/bravegag/code/fastcode_project/code/src/matrix.c

What could be wrong here?

TIA,
Best regards,
Giovanni

mecej4 · ‎03-23-2012

Was the change in MKL version the only one, or did you change GCC version, X-Code, etc.? Which MKL function is it that consumes the most time? Can you provide an example?

Azua_Garcia__Giovann · ‎03-23-2012

Hello Mecej4,

No, no changes other than switching the symlinks in /opt/intel/ to make one or the other the current one.

Basically, this is my thesis work: the implementation of an iterative algorithm that mutates some matrices and keeps solving them until convergence. The MKL calls so far are QR decomposition (LAPACKE_dgeqrf), solve (either the full LAPACKE_dgels, or MVM via LAPACKE_dormqr followed by back substitution with cblas_dtrsm), and MMM (cblas_dgemm).

These are my computer specs:
CPU manufacturer: Intel
Model name: Core 2 Duo T9900
CPU cores: 2
CPU-core frequency: 3.06 GHz
Cycles/issue for floating point additions (Latency/Throughput): 3/1
Cycles/issue for floating point multiplications (Latency/Throughput): 5/1
Maximum theoretical floating point scalar peak performance: 6 Gflop/s
L1 cache size: 32 KB
L2 cache size: 6144 KB
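
As a rough sanity check on these numbers, the sketch below shows how the scalar peak and an achieved rate could be computed. It assumes one addition and one multiplication can issue per cycle, which is how the throughput figures above read (3.06 GHz x 2 flops/cycle is about 6 Gflop/s), and it uses the standard 2*m*n*k operation count for a matrix-matrix product. `peak_gflops` and `gemm_gflops` are hypothetical helper names.

```c
#include <assert.h>

/* Scalar peak in Gflop/s: clock frequency (GHz) times flops issued per cycle. */
double peak_gflops(double freq_ghz, double flops_per_cycle)
{
    return freq_ghz * flops_per_cycle;
}

/* Achieved Gflop/s of an (m x k) by (k x n) matrix product that took
 * `seconds` of wall-clock time, using the 2*m*n*k operation count. */
double gemm_gflops(int m, int n, int k, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * m * n * k / seconds * 1e-9;
}
```

Dividing `gemm_gflops(...)` by `peak_gflops(3.06, 2.0)` for each MKL version shows how far each one sits from the scalar peak; vectorized code can exceed that figure.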

TIA,
Best regards,
Giovanni

Gennady_F_Intel · ‎03-23-2012

I can suggest only one thing in this case: compare the performance of each of these routines between the two versions and let us know the results.
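
One way such a per-routine comparison could be set up is sketched below. This is only an illustration: `time_routine`, `bench_fn`, `mm_ctx`, and `naive_mmm` are hypothetical names, and the naive matrix product is a stand-in so the sketch builds without MKL. In a real test the callback body would be the MKL call being measured (for example cblas_dgemm or LAPACKE_dgeqrf), and the same binary would be rebuilt against each MKL version.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in seconds (gettimeofday exists on both OS X and Linux). */
static double wall_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;
}

/* A routine under test: receives an opaque context pointer. */
typedef void (*bench_fn)(void *ctx);

/* Call fn `warmup` times untimed, then `reps` times timed.
 * Returns the mean wall-clock seconds per timed call. */
double time_routine(bench_fn fn, void *ctx, int warmup, int reps)
{
    assert(reps > 0);
    for (int i = 0; i < warmup; ++i)
        fn(ctx);
    double t0 = wall_seconds();
    for (int i = 0; i < reps; ++i)
        fn(ctx);
    return (wall_seconds() - t0) / reps;
}

/* Hypothetical stand-in kernel: row-major n x n product C = A * B. */
typedef struct {
    int n;
    const double *a;
    const double *b;
    double *c;
} mm_ctx;

void naive_mmm(void *p)
{
    mm_ctx *m = p;
    for (int i = 0; i < m->n; ++i) {
        for (int j = 0; j < m->n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < m->n; ++k)
                sum += m->a[i * m->n + k] * m->b[k * m->n + j];
            m->c[i * m->n + j] = sum;   /* overwrite, so repeats are idempotent */
        }
    }
}
```

Timing each routine separately with the same inputs under both versions should show whether the slowdown comes from one function or is spread across all of them.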

--Gennady

Alexander_K_Intel3 · ‎03-29-2012

Hello Giovanni,

Could you please provide a few more details about your test case, such as the sizes of the matrices and other input parameters like the lwork size?
It would be great if you could provide a simple reproducer, so we can check the exact case on our side.

Thanks,
Alexander

Azua_Garcia__Giovann · ‎03-30-2012

Hello Alexander,

I am trying to think how I can do this. If I try to isolate something specific, I might not be able to reproduce the slowdown, and I am concerned about sharing the whole code base; I can ask the Professor, though. Is there a specific email address I could send the material to, rather than making it public?

Would it be enough for your analysis to give you an executable built in Debug mode?

Another possibility: I can run a profiler, or whatever else you want me to, while I'm benchmarking.

Best regards,
Giovanni

mecej4 · ‎03-30-2012

To post private messages:

When you create a new thread or reply to an existing post, simply select the "Yes" button next to the label "Mark this post Private?" at the bottom of the page, two rows above the Preview, Submit, and Cancel buttons.