Intel x86 (without vectorization): 3 seconds
MSVC x64: 5 seconds
Java x86/x64: 7 seconds
Intel x64 (with vectorization): 9.5 seconds
Intel x86 (with vectorization): 9.5 seconds
Intel x64 (without vectorization): 12 seconds
MSVC x86: 15 seconds (uhh)
The bottom line is that ICC seems to fail at vectorizing this simple loop, and
the x64 build is very slow.
Auto-parallelization fails as well. If enabled, the code uses about 70% CPU instead of 25%,
but it doesn't run any faster, and it is still about 3 times slower than the winning build (which runs on a single thread!).
This is really strange!
I want to write high-performance programs and thought ICC would come in handy, so I am a bit shocked by
these results. Maybe someone can shed some light on this?
PS: I don't think posting the entire question here is a good idea ;). Just follow the link, where you can find the source
code as well as disassemblies of the ICC and MSVC builds.
regards
chris
icc 12 expects you to try the #pragma reduction (+:.... for vectorization of sum reduction.
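For readers unfamiliar with reduction pragmas, here is a minimal sketch of an annotated sum loop. The pragma in the post above is truncated, so this is not a reconstruction of it: the sketch uses the standard OpenMP 4.0 spelling, `#pragma omp simd reduction(+:sum)`, which newer compilers accept when their OpenMP SIMD option is enabled (for example `-fopenmp-simd` on GCC and Clang). The function name `sum_reduction` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sums 32-bit ints into a 64-bit accumulator.
long long sum_reduction(const int* data, std::size_t n)
{
    assert(data != nullptr || n == 0);
    long long sum = 0;
    // Tells the compiler that 'sum' is a reduction variable, so partial sums
    // may be kept in vector lanes and combined after the loop.
    // Without OpenMP SIMD support enabled, the pragma is simply ignored.
    #pragma omp simd reduction(+:sum)
    for (std::size_t i = 0; i < n; ++i)
        sum += data[i];
    return sum;
}
```

Whether the loop is actually vectorized still depends on the compiler and its flags, so the vectorization report or the disassembly is the place to confirm it.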
It is not recommended to use STL templates when doing performance evaluations of different
C/C++ compilers because they have different implementations of STL.
Ideally, a test code has to be written in C and should be as generic as possible. This is because C++
operators, like '=', '+=', '*', '[]', etc., could be indirectly used. And of course all of them will have different
implementations depending on a C/C++ compiler.
In your test code C++ operators are not used since 'arrPtr' is used instead of 'arr'.
Did you test in Debug or Release configuration in all cases?
In your test code:
...
long long var = 0;
std::array<int, 1024> arr;
int *arrPtr = arr.data();
CHighPrecisionTimer timer;
for(int i = 0; i < 1024; i++)
    arrPtr[i] = i;
timer.Start();
for(int i = 0; i < 1024 * 1024 * 10; i++)
{
    for(int x = 0; x < 1024; x++)
    {
        var += arrPtr[x];
    }
}
timer.Stop();
...
there is a "hidden" cast from 'int' to 'long long' here:
...
var += arrPtr[x];
...
but I'm not sure that it affects performance results. I would try to compare with a type 'float' as well.
Java performance results are good, but I would rather compare performance of Java codes with
performance of .NET codes, instead of C/C++ codes.
Best regards,
Sergey
It would be easier to simply add a check of the result, to make certain that one is actually produced and that the change in data types doesn't affect it.
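A minimal sketch of such a check, assuming the array holds 0, 1, ..., n-1 as in the test code, so the expected total has a closed form. The names `repeated_sum`, `expected_sum` and `run_checked` are hypothetical, and `std::chrono::steady_clock` stands in for the original timer class. Using the result after the timed region also keeps the compiler from discarding the loop as dead code.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <vector>

// Hypothetical helper: repeats the inner sum 'reps' times with accumulator type Acc.
template <typename Acc>
Acc repeated_sum(const std::vector<int>& data, int reps)
{
    assert(reps >= 0);
    Acc total = 0;
    for (int r = 0; r < reps; ++r)
        for (int x : data)
            total += static_cast<Acc>(x);  // the int -> Acc conversion made explicit
    return total;
}

// Closed-form expected value when data = 0, 1, ..., n-1.
long long expected_sum(long long n, long long reps)
{
    return reps * (n * (n - 1) / 2);
}

// Times one run and reports whether the result matches the expected value.
template <typename Acc>
bool run_checked(const std::vector<int>& data, int reps)
{
    auto start = std::chrono::steady_clock::now();
    Acc result = repeated_sum<Acc>(data, reps);
    auto stop = std::chrono::steady_clock::now();
    long long us =
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
    bool ok = static_cast<long long>(result) ==
              expected_sum(static_cast<long long>(data.size()), reps);
    std::printf("time: %lld us, result %s\n", us, ok ? "ok" : "MISMATCH");
    return ok;
}
```

Calling `run_checked<long long>(data, reps)` and `run_checked<int>(data, reps)` on the same input shows whether a change of accumulator type alters the answer. The repetition count has to stay small enough that the narrower type does not overflow, since signed overflow is undefined behavior in C++.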
// The header names were lost from the original post; these are the ones the code below needs.
#include <windows.h>
#include <tchar.h>
#include <stdio.h>
#include <vector>
template <typename TValue>
struct ArrayList
{
private:
    std::vector<TValue> m_Entries;
public:
    template <typename TCallback>
    void Foreach(TCallback inCallback)
    {
        // Passes the index, which equals the stored value in this test.
        for(int i = 0, size = (int)m_Entries.size(); i < size; i++)
        {
            inCallback(i);
        }
    }
    void Add(TValue inValue)
    {
        m_Entries.push_back(inValue);
    }
};
int _tmain(int argc, _TCHAR* argv[])
{
    auto t = [&]() {};
    ArrayList<int> arr;
    int res = 0;
    for(int i = 0; i < 100; i++)
    {
        arr.Add(i);
    }
    long long freq, t1, t2;
    QueryPerformanceFrequency((LARGE_INTEGER*)&freq);
    QueryPerformanceCounter((LARGE_INTEGER*)&t1);
    for(int i = 0; i < 1000 * 1000 * 10; i++)
    {
        arr.Foreach([&](int v) {
            res += v;
        });
    }
    QueryPerformanceCounter((LARGE_INTEGER*)&t2);
    // Elapsed time in microseconds.
    printf("Time: %lld\n", ((t2-t1) * 1000000) / freq);
    // Use the result so the compiler cannot drop the loop.
    if(res == 4950)
        return -1;
    return 0;
}
@Sergey:
[SergeyK] Have you read my comments carefully? Please take a look at my comment
marked with '>>'
The hidden conversion is a point but should not cause any trouble since it is an increase of precision.
[SergeyK] What kind of precision could you get on integers?
>>...
>>In your test code C++ operators are not used since 'arrPtr' is used instead of 'arr'.
>>...
Best regards,
Sergey
