My worker code is a loop containing a number of computations, including transcendental functions, and the loop is vectorized (using #pragma ivdep).
I use the Intel Composer XE 2011 SP1 within MS Visual Studio 2010 Ultimate.
It is essentially C code, since it uses nothing C++ specific.
If I change the file name extension from .cpp to .c it becomes twice as fast.
I checked the asm code and it turns out that the fast asm code uses
call ___svml_pow2 instead of call ___svml_powf4
and
call ___svml_exp2 instead of call ___svml_expf4
Could that explain the speed difference?
Why does it call different svml modules when the file name extension is changed from .cpp to .c?
7 Replies
You have changed your source code from float to double data types. Maybe you have some conditional compilation there to make the switch according to language. In the C code you would presumably have called powf() and expf() explicitly or used for C++, that could bring about this change.
Please attach the relevant code snippets so we can try to reproduce the issue. If you have a small test case, that would be great.
Also, which compiler options did you use?
thanks,
Jennifer
I did not change the source code, just the file name extension.
I also did not change the includes.
I am not aware of any conditional compilation.
I do have powf and expf in my code.
I was just using #include
not tgmath.h.
I have also checked the preprocessor output, i.e. the .i file.
The fast version has
(float)pow((double)(T_primary), (double)((2.0f*path_ratio)))
The slow version has
powf(T_primary,(2.0f*path_ratio))
Changing expf to exp and powf to pow in the source code makes the .c version slightly slower, but seems to have no effect on the speed of the .cpp version.
The .c version still outperforms the .cpp version, but by somewhat less than a factor of 2.
I guess we solved it.
It turns out that one of the input arrays (T_primary) was completely zero.
The C compiler made use of the fact that those zeros were raised to a certain power.
powf(T_primary,(2.0f*path_ratio))
In this way, it could skip most of the difficult math, we think.
The C++ compiler was not making use of this.
It is worth noting that the C compiler skipped that complicated math only when the powf call was multiplied by another expression.
I.e.
powf(T_primary,some algebra)
was slower (about 0.26 arbitrary time units) than
powf(T_primary,some algebra)*expf(some other algebra).
The latter took about 0.11 arbitrary time units.
For the C++ compiler there was no difference. Both took about 0.26 arbitrary time units.
Now that we reverted to mostly nonzero data for T_primary, things have changed completely.
C++ is faster than C, as it should be, since it makes more efficient calls to svml:
call ___svml_powf4 should be faster than call ___svml_pow2, right?
call ___svml_expf4 should be faster than call ___svml_exp2, right?
Our loops take 0.40 arbitrary time units when compiled as C++ and 0.62 arbitrary time units when compiled as C.
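On the question of which call should be faster: assuming the usual SVML naming, where the trailing digit is the number of elements handled per call and the f marks single precision, ___svml_powf4 processes four floats per call and ___svml_pow2 processes two doubles. The arithmetic for a 128-bit SSE register can be sketched as follows (the helper names lanes_per_register and calls_needed are hypothetical):

```c
#include <stddef.h>

/* Width of one SSE register in bytes (128 bits). */
#define SSE_REGISTER_BYTES 16

/* Number of elements of the given size that fit in one SSE register. */
static size_t lanes_per_register(size_t element_size)
{
    return SSE_REGISTER_BYTES / element_size;
}

/* Calls needed to process n elements when each call handles `lanes` of them. */
static size_t calls_needed(size_t n, size_t lanes)
{
    return (n + lanes - 1) / lanes;  /* round up for a partial last vector */
}
```

For 1000 elements this gives 250 single-precision calls against 500 double-precision calls. That is in line with the nonzero-data timings above, although the cost per call also differs between the two precisions, so the call count alone does not predict the ratio.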
Hello,
I've tried to reproduce this problem, but I'm not seeing those effects. It's an interesting observation that we would like to understand.
Hence, like Jennifer, I'd like to ask whether you could condense the problem into a simple Visual Studio project so we can take a closer look.
In my opinion it's unlikely that you see such differences only because of changing the suffix. It's true that this makes the compiler switch to a different language standard, but that does not explain what you're seeing.
Also, I don't think that the values of the array can have an impact, provided they are not known at compile time. If they were known, the compiler would not need to make the calls at all; it would compute the values at compile time.
Thank you in advance & best regards,
Georg Zitzlsberger