- Using Intel CC 14.0 under Visual Studio 2013SP2
- atan2f()
  - with AVX: 3.915 sec.
  - with SSE2: 0.800 sec.
- atanf() is not affected
  - with AVX: 0.475 sec.
  - with SSE2: 0.626 sec.
- atan2f()
  - atan2() is widely used when calculating with complex numbers (to get the phase).
  - Double precision seems to be affected too, but the numbers are not as clear as with single precision.
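To illustrate the phase use case mentioned above, here is a minimal sketch of the kind of loop where atan2f matters. The function names `phase_of` and `phases` are hypothetical and not from the real project; the sketch only assumes standard C++ (`std::atan2` with float arguments, `std::complex`).

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>

// Phase (argument) of one complex sample.
// Note the argument order of atan2: imaginary part first, then real part.
inline float phase_of(const std::complex<float>& z) {
    return std::atan2(z.imag(), z.real());
}

// Phase of every sample in an array. A simple loop like this is what a
// vectorizing compiler may turn into calls to a vector math routine.
inline void phases(const std::complex<float>* in, float* out, std::size_t n) {
    assert(n == 0 || (in != nullptr && out != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = phase_of(in[i]);
    }
}
```

For a single value, `std::arg(z)` from `<complex>` gives the same result; the explicit atan2 form is shown because that is the call being timed in this thread.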
Simplified example code:
    #include <cmath>

    const int iterations = 100000;
    const int size = 2048;
    float* a = new float[size];
    float* b = new float[size];

    // Fill both arrays with constant inputs.
    for (int i = 0; i < size; ++i) {
        a[i] = 1.1f;
        b[i] = 2.2f;
    }

    // Timed loop 1: atan2f
    for (int j = 0; j < iterations; ++j) {
        for (int i = 0; i < size; ++i) {
            a[i] = atan2f(a[i], b[i]);
        }
    }

    // Timed loop 2: atanf
    for (int j = 0; j < iterations; ++j) {
        for (int i = 0; i < size; ++i) {
            a[i] = atanf(b[i]);
        }
    }
Options (simplified from real world project)
- using SSE:
/GS /Qopenmp /Qrestrict /Qansi-alias /W3 /Qdiag-disable:"4267" /Qdiag-disable:"4251" /Zc:wchar_t /Zi /O2 /Ob2 /Fd"Release\64\vc120.pdb" /fp:fast /Qstd=c++11 /Qipo /GF /GT /Zc:forScope /GR /Oi /MD /Fa"Release\64\" /EHsc /nologo /Fo"Release\64\" /Ot /Fp"Release\64\TestPlugin.pch"
- using AVX:
/Qopenmp /Qrestrict /Qansi-alias /W3 /Qdiag-disable:"4267" /Qdiag-disable:"4251" /Zc:wchar_t /Zi /O2 /Ob2 /Fd"Release\64\vc120.pdb" /fp:fast /Qstd=c++11 /Qipo /GF /GT /Zc:forScope /GR /arch:AVX /Oi /MD /Fa"Release\64\" /EHsc /nologo /Fo"Release\64\" /Ot /Fp"Release\64\TestPlugin.pch"
Your IPS support account is the way to submit issues where you require security, or wish to be able to track the response without depending on a volunteer from the Intel team. As this appears to be a library issue, it may not be the direct responsibility of Intel people who monitor this site regularly.
I'm still not getting a response from the SAVE step at IPS, and it's scheduled for down time at the end of the week. I thought perhaps my input might be helpful since I set it up to verify on VTune.
There seems to have been some sort of spam attack on Intel sites the last few days; why it's so important to some people to deny us the use of the sites beats me, if in fact there's a connection.
I have to say, I cannot reproduce the timing results for the SSE2 case anymore. Maybe that was a mistake of mine.
With 64-bit code and AVX, the comparison between Intel CC and VC++ is interesting:
- atan2f using VC++ is twice as fast as when using ICC (the missing optimization noted in my first post).
- atan2f using VC++ is twice as fast as atanf when using VC++ (?! - didn't notice that before, maybe related to SP3)
Using Intel CC 14.0, 64-bit, AVX (calling __svml_atanf8/__svml_atan2f8):
- ATan: 0.443 GFLOPS ( 0.462 sec.)
- ATan2: 0.052 GFLOPS ( 3.912 sec.)
Using VS2013SP3, 64-bit, AVX (calling atanf/atan2f):
- ATan: 0.051 GFLOPS ( 3.991 sec.)
- ATan2: 0.111 GFLOPS ( 1.847 sec.)
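The GFLOPS figures above are consistent with counting each atanf/atan2f call as one operation over the 100000 x 2048 calls of the example code. That counting rule is an inference from the posted numbers, not something stated in the thread, and the helper name `gflops` below is hypothetical. A minimal sketch of the arithmetic:

```cpp
#include <cassert>

// Throughput in GFLOPS, assuming one "operation" per math-function call.
// (Assumption: this is how the figures in the post were derived.)
inline double gflops(long long iterations, long long size, double seconds) {
    assert(seconds > 0.0);
    const double calls = static_cast<double>(iterations) * static_cast<double>(size);
    return calls / seconds / 1e9;
}
```

With the values from this thread, `gflops(100000, 2048, 0.462)` comes out at roughly 0.443 and `gflops(100000, 2048, 3.912)` at roughly 0.052, matching the Intel CC numbers listed above.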
Ok, needed to properly rebuild everything (knew that but forgot).
Here are the results when using 32-bit with SSE2 (calling __svml_atanf4/__svml_atan2f4):
- ATan: 0.333 GFLOPS ( 0.615 sec.)
- ATan2: 0.280 GFLOPS ( 0.731 sec.)
>>>Optimizing away 100000 iterations of atan with known initial input is possible at compile time, but seems very unlikely to me. If that is happening (say, if the timing results are implausible), one may add some rand() initialization and print the average of the resulting array "a".>>>
I suppose that ICC could optimize away the inner for-loop by removing the call statements from the run-time code. Of course, I do not expect further optimization such as compile-time array filling, which could eliminate the inner for-loop from the run-time code.
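The suggestion quoted above (random initialization plus printing the average of "a") could look roughly like the sketch below. It is not the code that was actually timed in this thread. The names `BenchResult` and `bench_atan2f` are hypothetical, and `<random>` and `<chrono>` are used here in place of rand() and whatever timer the original project used.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <random>
#include <vector>

struct BenchResult {
    double seconds;  // wall-clock time of the timed loop only
    double average;  // average of the output array; print it so the work is observable
};

// Times iterations * size calls of atan2 (float overload) on inputs that
// are not known at compile time.
inline BenchResult bench_atan2f(int iterations, int size, unsigned seed = 42) {
    assert(iterations > 0 && size > 0);

    // Random positive inputs, so the compiler cannot precompute results.
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> dist(0.1f, 10.0f);
    std::vector<float> a(size), b(size);
    for (int i = 0; i < size; ++i) {
        a[i] = dist(rng);
        b[i] = dist(rng);
    }

    const auto start = std::chrono::steady_clock::now();
    for (int j = 0; j < iterations; ++j) {
        for (int i = 0; i < size; ++i) {
            a[i] = std::atan2(a[i], b[i]);
        }
    }
    const auto stop = std::chrono::steady_clock::now();

    // Consume the results: the caller is expected to print the average.
    double sum = 0.0;
    for (float v : a) sum += v;
    return { std::chrono::duration<double>(stop - start).count(), sum / size };
}
```

Calling `bench_atan2f(100000, 2048)` and printing both fields reproduces the shape of the original test. If the reported time is implausibly small even with this setup, the generated assembly (the /Fa output) is the place to check whether the calls are still there.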
Filed as issue 6000062158: "Missing AVX-optimization of atan2f (__svml_atan2f8)"
(Intel C++ Compiler for Windows, Medium, 08/11/2014)
>>>I think that the compiler could go further in its optimization efforts and simply eliminate the inner loop by calculating the atan2 values and filling the array at compile time>>>
I made a mistake in the quoted sentence. Of course the compiler will not fill in a dynamically allocated array at compile time, because of the new operator.
My Intel premier account is blocked. I've been getting some help from the support team but still can't file this new issue. The site is scheduled down tonight, so we will be waiting another week or two on this.
Submitted as Intel Premier issue 6000063006 during today's uptime between IPS site modifications.
The issue was reported closed as a duplicate of another submission, without further comment.
Hello,
This issue is fixed with version 16.0.110.
AVX:
atan2(): 0.864195 seconds
atan(): 0.33743 seconds
SSE2:
atan2(): 0.93485 seconds
atan(): 0.457738 seconds
Bye,
Lars