Hello,
I see a 30% performance degradation after upgrading from ComposerXE version 2011.4.184 to the latest version, sp1.9.289.
My benchmark consists of an iterative algorithm; I account for warm-up times and do around 300 repetitions for each problem size. The standard deviations are really low, but the difference in mean response time between the two ComposerXE versions is huge.
My computer setup is (I posted it in a separate thread):
/Users/bravegag/code/fastcode_project/code$ uname -a
Darwin Macintosh-4.local 11.3.0 Darwin Kernel Version 11.3.0: Thu Jan 12 18:47:41 PST 2012; root:xnu-1699.24.23~1/RELEASE_X86_64 x86_64
/Users/bravegag/code/fastcode_project/code$ gcc --version
gcc (GCC) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Compilation:
/usr/bin/gcc -DGTEST_HAS_TR1_TUPLE=0 --std=gnu99 -Wall -Wextra -Wshadow -Wstrict-prototypes -Wmissing-prototypes -g3 -ggdb3 -I/Users/bravegag/code/fastcode_project/code/third_party/googletest/include -I/Users/bravegag/code/fastcode_project/code/third_party/googletest -I/opt/intel/composerxe-2011.4.184/mkl/include -I/Users/bravegag/code/fastcode_project/code/src -I/Users/bravegag/code/fastcode_project/code/build -I/Users/bravegag/code/fastcode_project/code/third_party/genrmf -o CMakeFiles/submodularity.dir/src/matrix.c.o -c /Users/bravegag/code/fastcode_project/code/src/matrix.c
What could be wrong here?
TIA,
Best regards,
Giovanni
6 Replies
Was the change in MKL version the only one, or did you change GCC version, X-Code, etc.? Which MKL function is it that consumes the most time? Can you provide an example?
Hello Mecej4,
No, no changes other than switching the symlinks in /opt/intel/ to make one or the other the current one.
Basically this is my thesis work: the implementation of an iterative algorithm that mutates some matrices and keeps solving them until convergence. All the MKL calls so far are QR decomposition (LAPACKE_dgeqrf), solve (either the full LAPACKE_dgels, or a matrix-vector multiplication with LAPACKE_dormqr followed by back substitution with cblas_dtrsm), and matrix-matrix multiplication (cblas_dgemm).
These are my computer specs:
CPU manufacturer: Intel
Model name: Core 2 Duo T9900
CPU cores: 2
CPU-core frequency: 3.06 GHz
Cycles/issue for floating point additions (Latency/Throughput): 3/1
Cycles/issue for floating point multiplications (Latency/Throughput): 5/1
Maximum theoretical floating point scalar peak performance: 6 Gflop/s
L1 cache size: 32 KB
L2 cache size: 6144 KB
TIA,
Best regards,
Giovanni
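As a sanity check on the peak figure in the specs above: assuming each core can issue one scalar floating point addition and one multiplication per cycle (the throughput of 1 listed for each), the arithmetic works out as

```latex
\[
P_{\text{peak}}
  = 3.06\ \text{GHz} \times (1\ \text{add} + 1\ \text{mul})\ \frac{\text{flop}}{\text{cycle}}
  = 6.12\ \text{Gflop/s per core}
\]
```

which is consistent with the roughly 6 Gflop/s quoted.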
I can suggest only one thing in this case: compare the performance of each of these routines between the two versions and let us know the results.
--Gennady
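One way to follow this suggestion is to time each routine in isolation and build the same small program once against each MKL version. The sketch below is an assumption about how that could look, not code from this thread: `kernel_fn`, `time_kernel`, and `count_calls` are hypothetical names, and `count_calls` is only a placeholder standing in for a wrapper around a real call such as LAPACKE_dgeqrf or cblas_dgemm.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* A routine under test, wrapped so that all its inputs live in `ctx`. */
typedef void (*kernel_fn)(void *ctx);

/* Current wall-clock time in seconds (microsecond resolution). */
static double wall_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;
}

/* Average wall time per call of `fn`, measured over `reps` calls
 * after `warmup` untimed calls. */
double time_kernel(kernel_fn fn, void *ctx, int warmup, int reps)
{
    assert(fn != NULL && warmup >= 0 && reps > 0);
    for (int i = 0; i < warmup; ++i)
        fn(ctx);

    double t0 = wall_seconds();
    for (int i = 0; i < reps; ++i)
        fn(ctx);
    return (wall_seconds() - t0) / (double)reps;
}

/* Placeholder kernel: counts how often it was called. In a real test,
 * this would be a wrapper around a single MKL routine. */
void count_calls(void *ctx)
{
    ++*(int *)ctx;
}
```

Timing one wrapper per routine (QR, solve, triangular solve, matrix multiply) under both versions should show whether the slowdown is concentrated in one routine or spread across all of them.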
Hello Giovanni,
Could you please provide a few more details about your test case, such as the size of the matrices and other input parameters like the lwork size?
It would be great if you could provide a simple reproducer, so we can check exactly this case on our side.
Thanks,
Alexander
Hello Alexander,
I am trying to think how I can do this. If I try to isolate something specific, I might not be able to reproduce the slowdown, and I am concerned about sharing the whole code base; I can ask the Professor, though. Do you have a specific email address I could send the material to, rather than making it public?
Would it be enough for your analysis if I gave you an executable built in Debug mode?
Another possibility: I could run a profiler, or whatever else you want me to, while I'm benchmarking.
Best regards,
Giovanni
To post private messages:
When you create a new thread or reply to an existing post, simply select the "Yes" button next to the label "Mark this post Private?" at the bottom of the page, two rows above the Preview, Submit, and Cancel buttons.