Hi,
I just downloaded a copy of the v10 compiler, and performance seems to have dropped quite noticeably, by about 15%. My test program (7000 iterations of relatively simple sine curve fitting) gives these timings:
Compiled using v9.1.049: runs in 55 seconds
Compiled using v10.0.023: runs in 63 seconds
The compile stage output is reporting the same loops being vectorized.
Is there any reason why v10 would be slower?
The compiler options I am using are:
-fpic -O3 -fomit-frame-pointer -march=pentium3 -mcpu=pentium3 -msse -funroll-loops -mtune=pentium3 -fp-model fast=2 -rcd -xK -ipo -w1 -vec-report3
The machine in question is a Pentium 3, as the options indicate. Is there something new in the way v10 behaves? Is there another optimizer flag I have to add somewhere to get those 15% back?
Thanks.
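For reference, the quoted slowdown follows from the two timings: (63 - 55) / 55 is roughly 14.5%. The sketch below shows one way such a comparison could be made repeatable when the same source is built with each compiler version. It is only an illustration: `slowdown_percent`, `best_cpu_seconds` and `dummy_kernel` are hypothetical names, and `dummy_kernel` is a placeholder workload, not the curve-fitting code from this post.

```c
#include <assert.h>
#include <time.h>

/* Relative slowdown of new_s versus old_s, in percent.
   Example: old_s = 55, new_s = 63 gives (63 - 55) / 55 * 100. */
double slowdown_percent(double old_s, double new_s)
{
    assert(old_s > 0.0);
    return (new_s - old_s) / old_s * 100.0;
}

/* Signature assumed for the code under test (hypothetical). */
typedef void (*kernel_fn)(void);

/* Written to by the placeholder kernel so the loop is not optimized away. */
static volatile double sink;

/* Placeholder workload; substitute the real fitting routine here. */
void dummy_kernel(void)
{
    double acc = 0.0;
    for (int i = 0; i < 100000; ++i)
        acc += i * 0.5;
    sink = acc;
}

/* Run the kernel `runs` times and return the smallest CPU time in seconds.
   Taking the minimum reduces noise from other activity on the machine.
   Returns -1.0 if runs is not positive. */
double best_cpu_seconds(kernel_fn kernel, int runs)
{
    double best = -1.0;
    for (int r = 0; r < runs; ++r) {
        clock_t t0 = clock();
        kernel();
        clock_t t1 = clock();
        double dt = (double)(t1 - t0) / CLOCKS_PER_SEC;
        if (best < 0.0 || dt < best)
            best = dt;
    }
    return best;
}
```

Building this once per compiler version with identical options, calling `best_cpu_seconds` on the real routine in each binary, and passing the two results to `slowdown_percent` gives a figure that can be compared directly with the 15% above.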
2 Replies
Did you see any loop that is vectorized with 9.1 but not with 10.0? Try /Qparallel too.
No, this is what surprised me. The compiler output is similar, and in terms of vectorized loops, all the same loops get vectorized. But the code then runs about 15% slower.
I haven't had a chance to test this on more than one machine yet (only have a P3 handy). I'll try it on a Core 2 soon.
I don't think the program I am playing with at the moment would benefit from the parallel options. The overhead of spawning threads would likely outweigh any benefits.