I'm converting applications from x86 to x64 so that I can measure performance differences (among a few other things), but yesterday morning I ran into a strange issue that I've spent today trying to explain.
The main question, which I probably won't get answered here, is why the x64 build is about 30% faster in Debug mode but 10-20% slower in Release. It's fairly strange and I want an explanation, which is why I've been working with VTune for the last 3-4 hours or more. It takes a while to learn how to run it and get the information I want, and now that I finally have the data I don't really know what to read out of it...
I'm running VTune integrated in VS2005. The application is single-threaded, and performance is tested through ACT, which records and plays back a certain sample.
To the question here:
What should I look for when comparing 32- and 64-bit builds: should I concentrate on Instructions Retired, Clockticks or CPI? I don't care too much about optimizing the code later on; the main issue is where the performance difference between 32 and 64 bits comes from.
Can I check Release builds the same way as Debug builds and get data per function, not only assembler code? How?
Architectural reasons are relatively rare. To find those with VTune, you would look (at least) for TLB misses, in addition to the defaults.
Normally, for analyzing with VTune, you would build with /Zi added to your release configuration options. /Zi doesn't have much effect on performance, provided that you include the appropriate /O switch, to prevent it from defaulting to /Od.
It has been checked and double checked that I've been using the same switches and I've also tried a lot of different settings to try to find a solution to the problem.
I'll check the switch tomorrow (I'm at home now), but the compiler settings for the 32- and 64-bit builds are the same. Not to mention that I've tried more or less every setting there is...
The other thing that could be happening is that, if you are comparing a Debug build to a Release build, typically the Debug build has optimizations turned off!
However, I'm with tim18, I don't see why it matters. The Release build is what you will "release". Correct?!
To analyze the Release build in the VTune analyzer, you just need to modify the Release build configuration so that debug information is generated.
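Since the comparison hinges on which configuration is actually being measured, it can help to have the application report its own build type next to each timing run. The following is only a minimal sketch: `describe_build` and the other function names are hypothetical helpers, while `_DEBUG` (set by the MSVC debug runtime switches) and `NDEBUG` are the usual macros. It does not report the /O level, so the project's command line still needs to be checked for that.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Pointer width in bits: 32 for an x86 build, 64 for an x64 build.
std::size_t pointer_bits() {
    return sizeof(void*) * 8;
}

// True when _DEBUG is defined, i.e. the MSVC debug runtime is selected.
bool is_debug_runtime() {
#ifdef _DEBUG
    return true;
#else
    return false;
#endif
}

// True when NDEBUG is defined; by convention this marks a release build
// and it also compiles assert() out.
bool asserts_disabled() {
#ifdef NDEBUG
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: a one-line label to log at start-up.
std::string describe_build() {
    std::string label = (pointer_bits() == 64) ? "64-bit" : "32-bit";
    label += is_debug_runtime() ? ", debug runtime" : ", release runtime";
    label += asserts_disabled() ? ", NDEBUG" : ", asserts on";
    return label;
}
```

Logging `describe_build()` once at start-up makes it harder to mix up a Debug and a Release measurement when four configurations are in play.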
It's something I haven't been able to explain yet, and since my job is to investigate the technology, I want an answer. I'm not primarily after the performance... just the answers and the right questions ;)
The worst case which I run into frequently is that -Op gives worse performance for float/single precision data with the 64-bit Windows compilers, because x87 code is not used for hidden widening of data types. This could degrade performance of mixed float and double arithmetic as well.
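To illustrate what "hidden widening" means here, the sketch below writes the widening out explicitly in source. It is an illustration under assumptions, not a reproduction of any particular compiler's code generation: with SSE code each `float` addition rounds to single precision, whereas x87 code may keep intermediates in a wider register. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Accumulate in float: with SSE code every addition rounds to single precision.
float sum_in_float(float step, std::size_t count) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        acc += step;
    }
    return acc;
}

// Accumulate in double: the widening that x87 code may do implicitly
// is spelled out here, at the cost of a float-to-double conversion.
double sum_widened(float step, std::size_t count) {
    double acc = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        acc += static_cast<double>(step);
    }
    return acc;
}

// Absolute error against count * step evaluated in double precision.
double abs_error(double value, float step, std::size_t count) {
    double exact = static_cast<double>(count) * static_cast<double>(step);
    return std::fabs(value - exact);
}
```

Comparing `abs_error(sum_in_float(0.1f, n), 0.1f, n)` with `abs_error(sum_widened(0.1f, n), 0.1f, n)` for a large `n` shows how much the two variants can differ numerically; timing both loops in each build would show whether the conversions matter for a given code base.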
Microsoft published plenty of documents on performance pitfalls to avoid when porting 32-bit Windows C++ to 64-bit. There are a number of cases where unintentionally forcing odd alignments will degrade 64-bit performance more than 32-bit performance. Needless to say, if you have a struct some of whose components have changed size, you have a big opportunity for problems. There may even be places where allowances were made by the 32-bit compiler for 64-bit data with odd alignments, leaving more room to gain by correcting it in the 64-bit build. There are optional Windows compiler 64-bit portability warning messages.
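The struct case can be sketched as follows. The three record types are hypothetical; the point is only that a pointer member is 4 bytes in a 32-bit build and 8 bytes in a 64-bit build, so padding and forced packing behave differently in the two builds. `#pragma pack` is assumed to be available, as it is in the Microsoft compiler.

```cpp
#include <cassert>
#include <cstddef>

// Natural layout: the compiler pads after 'tag' so 'payload' is aligned,
// and pads again after 'flag'. The padding grows with the pointer size.
struct Padded {
    char  tag;
    void* payload;
    char  flag;
};

// Same members, largest first: less padding is needed.
struct Reordered {
    void* payload;
    char  tag;
    char  flag;
};

// Forced 1-byte packing: 'payload' lands at offset 1, an odd alignment.
#pragma pack(push, 1)
struct Packed {
    char  tag;
    void* payload;
    char  flag;
};
#pragma pack(pop)

// Bytes of padding the compiler inserted into Padded.
std::size_t padding_bytes_in_padded() {
    return sizeof(Padded) - (sizeof(void*) + 2 * sizeof(char));
}

// Whether the pointer inside Packed sits on a pointer-sized boundary.
bool payload_is_aligned_in_packed() {
    return offsetof(Packed, payload) % sizeof(void*) == 0;
}
```

Printing `sizeof` and `offsetof` for the real structs in both the 32-bit and the 64-bit build is a cheap way to see which ones changed size or ended up with misaligned members.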
VTune could help you look for performance problems due to misalignments.
Just to clarify and avoid any misunderstanding: the main issue isn't the performance itself but why the application performs as it does. The 64-bit technology is interesting, to say the least, but converting a lot of applications isn't the best option if you have nothing to gain from it. When I conducted the initial testing in Debug mode, the application ran 30% faster as a 64-bit build. To finalize the test I ran a Release build and found that things had changed a little too much, so I had to investigate the reason behind it. I simply can't give any recommendations based on two completely different results without explaining why they differ, and since all the hardware and software is the same except for the build type, I'm in trouble. I've already tested a lot of different build settings and will try the recommendations given here too. I know it would be more correct to base a recommendation on Release builds, but those results still feel stranger than the 32-bit ones.
The reason I'm using VTune right now is to find out where most of the time goes. And if I can find out the difference between the 32- and 64-bit technologies I'll be satisfied. Maybe that would also give me the chance to revise the code ;)
As I stated in the first post, what would be the most important thing to look at when I'm looking for that difference? Instructions Retired, Clockticks or CPI? I simply want to know the best way to compare the two technologies and find the bottleneck...
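For reference, the three counters are related: CPI is Clockticks divided by Instructions Retired. Because the 32- and 64-bit builds execute different instruction streams for the same workload, CPI alone can move without the run time moving the same way, so Clockticks per function is the more direct thing to compare. A minimal sketch of that comparison follows; the `CounterTotals` type and function names are hypothetical, and the totals would come from your own VTune runs.

```cpp
#include <cassert>

// Event totals for one function (or the whole run) from one build.
struct CounterTotals {
    double clockticks;
    double instructions_retired;
};

// Cycles per instruction: Clockticks / Instructions Retired.
double cpi(const CounterTotals& totals) {
    assert(totals.instructions_retired > 0.0);
    return totals.clockticks / totals.instructions_retired;
}

// Above 1.0 means the candidate build spent more cycles on the same workload.
double clocktick_ratio(const CounterTotals& baseline, const CounterTotals& candidate) {
    assert(baseline.clockticks > 0.0);
    return candidate.clockticks / baseline.clockticks;
}

// Below 1.0 means the candidate build retired fewer instructions.
double instruction_ratio(const CounterTotals& baseline, const CounterTotals& candidate) {
    assert(baseline.instructions_retired > 0.0);
    return candidate.instructions_retired / baseline.instructions_retired;
}
```

A build can retire fewer instructions and still take more clockticks, in which case its CPI rises; that pattern points toward stalls (cache, TLB, I/O waits) rather than toward the amount of code executed.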
If I simply went by the order in which VTune lists them, I would say it looks like an I/O issue, but there might be other items further up that matter more for the difference between the two builds. I'm going to try adding some counters for that today and hopefully I will succeed ;)
Of course I've enabled the portability warnings option, and although the issue might still be there, it seems odd to me given that the 64-bit build is 30% faster in Debug mode. It's also a fact that when I run the Debug build the CPU load is steady at 25% (full load for a single thread on a dual Xeon with HTT), but in Release the CPU load varies hugely. Maybe I could find the reason for that with I/O counters...