After switching from 12.1 to Composer XE 2013 (Update 1, Windows 64-bit) I am seeing a consistent 10-15% slowdown across the board (the code is built and benchmarked on a quad-core Xeon). The C++ code is compiled with /O3, no auto-parallelization.
Is this a known issue to be fixed in an update?
Hi, reading this topic I decided to re-compile some of my own test codes with Intel 12.1.4.325, Intel 13.1.0.149, and MSVC 2008. Here are some results, in seconds:
| Test | 13.1.0.149 | 12.1.4.325 | MSVC 2008 |
|---|---|---|---|
| NNS brute | 43.87 | 52.55 | 70.40 |
| NNS NearPT | 0.363 | 0.411 | 0.2893 |
| NNS kdtree | 1.138 | 1.152 | 1.074 |
| Mixed code | 3.732 | 3.649 | 4.937 |
All compilations used equivalent optimization options, generating 32-bit code, running under Windows 7 on an i7 920, with the target processor set to SSE2 only. NNS is "Nearest Neighbor Search", in three flavors. "Mixed code" uses 64-bit integers and floating-point operations, including some transcendental functions.
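For readers unfamiliar with the benchmark names: a brute-force NNS kernel might look like the hypothetical sketch below (the actual benchmark source is not shown in this thread; the function name and point layout are assumptions for illustration).

```c
#include <stddef.h>
#include <float.h>

/* Return the index of the point in pts (n points, 3 doubles each)
   closest to q, by exhaustive squared-distance comparison. This is
   the kind of loop a "NNS brute" benchmark would time. */
size_t nns_brute(const double *restrict pts, size_t n,
                 const double *restrict q)
{
    size_t best = 0;
    double best_d2 = DBL_MAX;
    for (size_t i = 0; i < n; ++i) {
        double dx = pts[3*i]     - q[0];
        double dy = pts[3*i + 1] - q[1];
        double dz = pts[3*i + 2] - q[2];
        double d2 = dx*dx + dy*dy + dz*dz;
        if (d2 < best_d2) { best_d2 = d2; best = i; }
    }
    return best;
}
```

Loops like this one are exactly where vectorization quality differs between compilers, which is why the brute-force variant shows the largest spread in the table above.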
Hi Sergey. I was impressed by the times you reported with MinGW, so I tested the latest version available. But my results are very bad for the same test cases as in my previous post. I am not familiar with MinGW; my switches were -O2 -msse3. Results:
| Test | MinGW |
|---|---|
| NNS brute | 107.06 |
| NNS NearPT | 1.989 |
| NNS kdtree | 5.951 |
| Mixed code | 7.336 |

(Mixed code: very good for 64-bit ints, very bad for transcendentals and Taylor series.)
Could you please recommend better switches for maximum performance with SSE3 under MinGW?
Thanks.
The original issue I reported was a very specific one related to inlining inside a 'bottleneck' function. Once this specific issue was worked around, I found no other performance issues with compiler 13.1.
Hi Sergey, your results are astonishing to me! My source code is not C++, it is C99; this could make a great difference for the compiler optimizer (only my guess). In our case performance is very important because we develop medical physics systems for radiation therapy planning and image-guided neurosurgery; in some regions a solution takes several seconds or minutes. I frequently use a battery of test cases in C99 that resemble real problems. I will try some other codes with MinGW, but my first impressions of it were frustrating. For the Intel compiler I mostly use:
/O3 /Ob2 /Oi /Ot /Oy /Qip /GA /GF /MT /GS- /arch:SSE2 /fp:fast=2 /Qfp-speculation:fast /fp:double /Qparallel /Qstd=c99
The use of conservative SSE2 is to avoid problems with users keeping old hardware, and some with AMD CPUs. Some of these switches are unknown to me in MinGW, for example /Qparallel.
When I have new results I will post them here.
Armando
Armando Lazaro Alaminos Bouza wrote:
My source code is not C++, it is C99; this could make a great difference for the compiler optimizer (only my guess). In our case performance is very important because we develop medical physics systems for radiation therapy planning and image-guided neurosurgery; in some regions a solution takes several seconds or minutes. I frequently use a battery of test cases in C99 that resemble real problems. I will try some other codes with MinGW, but my first impressions of it were frustrating. For the Intel compiler I mostly use:
/O3 /Ob2 /Oi /Ot /Oy /Qip /GA /GF /MT /GS- /arch:SSE2 /fp:fast=2 /Qfp-speculation:fast /fp:double /Qparallel /Qstd=c99
The use of conservative SSE2 is to avoid problems with users keeping old hardware, and some with AMD CPUs. Some of these switches are unknown to me in MinGW, for example /Qparallel.
C99 vs. C++ makes no difference to the optimizer with the compilers mentioned here, unless your style changes (as it well might).
g++ accepts the C99 restrict qualifier if it is spelled __restrict (and Intel C++ for Linux accepts that spelling as well); ICL accepts restrict in C++ code with the option /Qrestrict. Both the Intel and GNU compilers make good use of restrict pointers (* __restrict ptr).
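A minimal C99 sketch of the restrict usage described above (the function name is hypothetical):

```c
/* With the restrict qualifiers the compiler may assume dst and src do
   not alias, which enables vectorization of the loop without a runtime
   overlap check. Compile as C99 (gcc -std=c99, ICL /Qstd=c99). */
void scale_add(float *restrict dst, const float *restrict src,
               float a, int n)
{
    for (int i = 0; i < n; ++i)
        dst[i] += a * src[i];
}

/* In C++ code the portable spelling is __restrict, accepted by both
   g++ and the Intel compiler:
   void scale_add(float *__restrict dst, const float *__restrict src,
                  float a, int n);                                     */
```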
You have a point in that some of the C99 features for optimization will not work with all C++ compilers.
The Intel 14.0 beta shows better optimization of certain STL constructs, as well as of plain C99 features enabled by __restrict, where the current released Intel compilers require #pragma ivdep.
I find such long option strings confusing. I will mention that /fp:double could prevent some optimizations on the float data type, due to the requirement to promote so many operations to double. It also prevents optimizations on sum reduction, where it would be better to write the double casts and reduction variables into your source code, so as to control how the promotion is applied.
If you use only double data types, I think /fp:double would have the same effect as /fp:source.
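The point about writing the reduction explicitly can be sketched as follows (a hypothetical example, not code from the thread): instead of relying on /fp:double to promote every float operation, declare only the accumulator as double.

```c
/* Sum an array of floats in double precision. Only the reduction is
   promoted; element loads and all other float math in the program stay
   in single precision, so /fp:source suffices and the optimizer is not
   forced to widen everything. */
double sum_floats(const float *x, int n)
{
    double acc = 0.0;              /* explicit double reduction variable */
    for (int i = 0; i < n; ++i)
        acc += (double)x[i];       /* explicit cast at the accumulation  */
    return acc;
}
```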
Thanks for the clarifications! I stopped using explicit restrict on the switches because the C99 option (/Qstd=c99) should include it. About "source" vs. "double" floating-point evaluation: I use "source" in code where single precision is acceptable, and "double" where I need to preserve all the bits of the representation. Of course, mixing has a performance penalty.
Hi Sergey, good point and topic! (source vs. double: precision vs. performance)
I learned about that three years ago when I migrated my projects from Watcom C (OpenWatcom at the time) to Intel. In Watcom every floating-point operation is processed with the traditional FPU (80 bits), and the compiler uses an FPU register as the accumulator. So my first .exe generated with Intel gave different and worse results. Using the switch /fp:double in Intel was enough to reach the same results, as far as they are physically relevant.
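The effect is easy to reproduce in a toy sketch (hypothetical, not from the thread): accumulating the same single-precision addends in a wider accumulator changes the result, which is why /fp:double was needed to match Watcom's 80-bit FPU behavior.

```c
#include <math.h>

/* Sum `reps` copies of `v` with both a float and a double accumulator
   and return the absolute difference between the two sums. The float
   accumulator rounds to single precision at every step, mimicking SSE
   scalar code; the double one mimics a wider FPU accumulator. */
double accumulator_drift(float v, int reps)
{
    float  accf = 0.0f;
    double accd = 0.0;
    for (int i = 0; i < reps; ++i) {
        accf += v;
        accd += v;
    }
    return fabs((double)accf - accd);
}
```

With a value like 0.1f (not exactly representable in binary) the drift grows with the number of additions; with exactly representable values such as 0.5f it stays zero.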
By the way, if you would like to try a good old C/C++ compiler, take a look at Watcom. I think it has no support for modern flavors of C++, but, for example, integer processing, bit operations, etc. are great. Its floating-point processing is weak, because it makes no use of SSEx. Another drawback (for me) is the lack of support for OpenMP. In the past (some 15 years ago) Watcom was the performance winner in most contests, ahead of Borland, MS, etc.