Linux and Windows Compiler Differences

jhudd · ‎08-11-2010

Good morning,

I am a research scientist in atmospheric sciences and developed a universal kriging C application under Ubuntu 10.04 Linux. It uses OpenMP pragmas and runs very well on the six dual-core hardware platform.

The scientists who work with me only have access to Windows boxes, so I built a Windows executable using Visual Studio 2008 with no change to the source code. The results were not correct.

I built another Windows executable using the Intel C compiler for Windows with no change to the source code and it had the same result as the Visual Studio executable.

Finally, I built a Cygwin version of the Windows executable with no change to the source code. While it ran 12 times slower than the other executables, the output was identical to the Ubuntu Linux Intel compiler results.

Is there something I can do to make the Windows Intel C compiler work like the Linux Intel compiler?

Thank you,

John

aazue · ‎08-11-2010

Hi
If you use the TDM-GCC compiler (http://tdm-gcc.tdragon.net/download), you lose approximately 5 to 7% performance. ICC on Linux and ICC on Windows show almost the same difference (without OpenMP).
With very short threads, ICC with OpenMP is very slow, but it is equally slow on Linux and on Windows.
That is not a real problem; you just need to use TBB, which is better suited to short threads.

Without doubting your sincerity, I doubt that ICC on Linux and ICC on Windows can differ that much in performance.

Please send the specific source function where you observed this, so that it can be verified.

N.B.
If you want performance, building the latest GCC, 4.5.1, from source on your Ubuntu system is much better than the original 4.4.x version.
Regards

jhudd · ‎08-14-2010

Thank you for your time bustaf.

The MingW compiler does indeed make the executable run just as fast as the Visual Studio and Windows Intel icc compilers. However, the output is still not correct. This is not a performance issue. This is a runtime issue. The Intel Linux compiler correctly calculates the kriged data as does the Cygwin GCC. Here are the specs for the Cygwin GCC

Target: i686-pc-cygwin

Thread model: posix

gcc version 4.3.4 20090804 (release) 1 (GCC)

The command used to build the MingW and Cygwin executables was the same

gcc -O3 -o mkrigingz *.c -fopenmp

I'd be happy to share the code; however, the data required as input is large. Do you have an FTP site where I can upload a small dataset? The full dataset is 6GB, but I have a smaller 148MB set for sample runs.

John

aazue · ‎08-16-2010

Hi John
Sorry, with my limited command of your language I have some difficulty understanding exactly where you say it works and where it does not, on Windows versus Linux.

For each compiler, which of these applies?

MinGW (only msys.dll): slow, incorrect, correct
GCC in Cygwin (complete): slow, incorrect, correct
VC: slow, incorrect, correct
ICC Microsoft: slow, incorrect, correct

About MinGW on its own: do you get the same result with it afterwards?

When I use the TDM-GCC automatic installer, installation takes less than 3 minutes; then I open the cmd shell and everything is ready to build.
With a static build, a colleague on the network running Windows only needs to add msys.dll for the original program to work.
In the Windows cmd shell, the exact syntax for building with OpenMP is:

C:\MinGW32\bin>g++ -O2 -static -lpthread -fopenmp -lgomp source.cc -o program.exe

If I dual-boot the same machine into Linux and build exactly the same way, I get the same result, with performance increased by 5 to 7%. (Sorry, I wrote "decreased" earlier by mistake because of my limited command of the language; it is corrected here.)

The differences: LTO is absent from TDM-GCC, and affinity through sched.h does not work.
(Does LTO require libelf, while the linker here is COFF-based? Run ld -V on your Linux system; you will probably see ELF if your GNU compiler has LTO.)
You do still have auto-vectorization, and part of Graphite works the same way (-floop-interchange, etc.).

If I make the same comparison between ICC on Linux, ICC on Windows, and VC, I get approximately the same result, about 3 to 5% lower (on Windows Vista or 7, 32-bit or 64-bit, stripped down to a maximum of 32 to 37 permanently running services).

ICC on Windows works well, but making it coexist with several versions of VC is complex, and sometimes, when it is paired with the wrong version, the results are poor.
Kriging is a demanding domain for the machine and requires everything to be set up perfectly.

The problem is usually not the functions, the equations, or the processor's capabilities; it is flow control when handling very large files (the faults result from poor regulation).

For OpenMP with ICC, check the default environment settings for affinity and block time (presumably KMP_AFFINITY and KMP_BLOCKTIME).

Personally, I think that changing these parameters in several ways at once, both inside and outside the program, is catastrophic.

Also read this paper if you have time; it may help:
http://upc.lbl.gov/publications/ppopp141-hofmeyr.pdf

Warning: with ICC, OpenMP requires the runtime library, so a fully static build is not possible with this type of library.

About your 148 MB file:
There is no FTP service here, only a dedicated socket, signed on both sides, for moving source files.
I will see whether another way is possible.
Regards.

piet_de_weer · ‎08-19-2010

Hello John,

How incorrect is incorrect? Are we talking about rounding errors, or is the result completely off? And (just to be sure) are you certain that the end result that you call correct is indeed the correct result?

I've read here in the past that the Intel people have tried to make the Linux version of the compiler behave like gcc, and the Windows compiler like the Visual Studio compiler. Apparently that succeeded...

What happens if you turn all the optimizations off (in both versions)? If the cause is rounding errors, the behavior should be the same (and correct) on Linux and Windows.

TimP · ‎08-19-2010

Another possible issue is that icc (linux) inherits partial C99 support from gcc (even when not requested on command line), while MSVC has no C99 support, not even the bits which were optional in C89. ICL (windows) tries to be faithful to MSVC (when you set appropriate /fp: options). You may require specification of /Qstd=c99 and/or invocation of mathimf if you use float math functions, for example.

aazue · ‎08-19-2010

Hi
I ported a kriging source that originated on AIX in order to run a test (with only one source change: a long became an int).
I tested only with GCC on Linux and on Windows; I do not have ICC for Windows installed.
It is certainly different from what you have, but it may be indicative. The test simulates the analysis of a surface from projected graphic points (a graph animated in real time), which is a large calculation for the machine.
It was originally written to simulate the results of a sampler interface for an optical density (OD) sensor.
The result on the Linux side is better, by 12 to 18% (OpenMP is not used in this program).
I observed no faults with large floating-point values, negative or positive.

The machine I have at hand with both operating systems ready to use is a 32-bit Atom 270.
I ran only one short test on each side; I am afraid it will catch fire or melt...
the keyboard is hot.
When I have a 64-bit machine at hand, I may run this test with the ICC compiler on both operating systems.

As for compatibility through a specific language flag: maybe, I don't know, but I doubt it resolves everything.
OpenMP thread behavior already differs between VC, ICC, and GCC (and, for ICC and GCC, between the two operating systems),
so perfectly assured compatibility between two systems is impossible;
sometimes it requires changes to your source.
Identical results on two operating systems happen only in films, or for tourists maybe.
ICC is not even a standalone compiler, so that feat cannot be expected of it,
but a large part of the compatibility already works very well with Intel ICC.
Only a real test shows the truth.
Regards.

jhudd · ‎08-19-2010

Thank you all for your answers and offer for assistance,

No, it is not a rounding issue. You were right about the optimization.

I ran the GNU Debugger (GDB) and found that some of my variables had disappeared with the -O3 option under Windows. Of course, it worked perfectly with the 64-bit Linux compiler.

That was my first clue. I studied the man page for GCC extensively and discovered that the pointers are all set to 32 bits for a Win32 system. I wrote a little test program and, sure enough, the float, float pointer, and double pointer are 32 bits while the double is 64 bits. I was using double pointers to speed up the work.
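The test program itself was not posted. A minimal sketch of such a check, written in C89 style so that it should also build with Visual Studio 2008, might look like the following. The function names pointer_bits and print_type_sizes are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Width of a data pointer in bits: 32 on a Win32 build,
   64 on a typical 64-bit Linux build. */
int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* Print the widths of the types discussed above.
   Hypothetical helper, not the original test program. */
void print_type_sizes(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "float          : %u bits\n",
            (unsigned)(sizeof(float) * CHAR_BIT));
    fprintf(out, "double         : %u bits\n",
            (unsigned)(sizeof(double) * CHAR_BIT));
    fprintf(out, "float pointer  : %u bits\n",
            (unsigned)(sizeof(float *) * CHAR_BIT));
    fprintf(out, "double pointer : %u bits\n",
            (unsigned)(sizeof(double *) * CHAR_BIT));
}
```

Calling print_type_sizes(stdout) from main prints one line per type. Note that a pointer's width is the size of an address, not of the object it points to: a 32-bit double pointer still refers to a full 64-bit double, so the pointer width alone would not normally change computed values.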

All three, MinGW32 GCC, Intel ICL, and Microsoft Visual Studio VC, behave the same. The Cygwin GCC must behave differently.

The final result is that I'm rewriting the application for Win32 to be all FLOAT and to be careful when I call the atof() and sqrt() functions, which return doubles.

Thank you again,

John

jhudd · ‎08-19-2010

Well, looks like I wrote too soon. Hopeful thinking on my part.

Again using all FLOAT, the Cygwin compiler executable produced accurate results, while the MinGW32 and Intel ICL compilers produced results that were incorrect.

I've inserted the two images to show that they are very different. The "good" one agrees exactly with the Linux Intel compiler results and agrees with hand-drawn estimates by collaborating scientists.

Most of the time taken by the Cygwin executable is in reading the 5.6GB of data. The kriging of the results takes less than a second. It uses the open/read/close calls.

aazue · ‎08-19-2010

Hi
Is it possible for you to disable the OpenMP #pragma directives and test without them?
Regards
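
One way to try this without touching the source is simply to rebuild without the OpenMP flag (for GCC, leaving out -fopenmp), since compilers ignore OpenMP pragmas when OpenMP is not enabled. The sketch below illustrates the idea; sum_of_squares is a hypothetical stand-in for a loop in the kriging code, not taken from the actual application.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when the file was compiled with OpenMP enabled
   (for example gcc -fopenmp), otherwise 0. The _OPENMP macro
   is defined by the compiler only when OpenMP is active. */
int openmp_enabled(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical stand-in for a parallel loop in the application.
   Without OpenMP the pragma is ignored and the loop runs serially. */
double sum_of_squares(const double *values, int count)
{
    double total = 0.0;
    int i; /* signed index, which older OpenMP versions require */

    assert(values != NULL || count == 0);

#pragma omp parallel for reduction(+:total)
    for (i = 0; i < count; i++) {
        total += values[i] * values[i];
    }
    return total;
}
```

If the serial and parallel builds give different output for the same input, the threading is a likely suspect; if they agree and are both wrong, the cause is elsewhere.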

TimP · ‎08-20-2010

Quoting jhudd

Thank you all for your answers and offer for assistance,

That was my first clue. I studied the man page for GCC extensively and discovered that the pointers are all set to 32 bits for a Win32 system. I wrote a little test program and, sure enough, the float, float pointer, and double pointer are 32 bits while the double is 64 bits. I was using double pointers to speed up the work.

All three, MinGW32 GCC, Intel ICL, and Microsoft Visual Studio VC, behave the same. The Cygwin GCC must behave differently.

The final result is that I'm rewriting the application for Win32 to be all FLOAT and to be careful when I call the atof() and sqrt() functions, which return doubles.

None of the compilers you mentioned suffer from the possible limitation of C compilers of 3 decades ago of not optimizing array references. You're better off writing portable code which you can understand rather than fiddling with fetching pairs of floats as a double, if that's what you mean. All of the compilers you mentioned, except MSVC, support auto-vectorization which will automatically perform float operations 4 at a time.
Likewise, all (with possible exception of MSVC in C mode) support in-line expansion of sqrtf().
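
As an illustration of the portable style suggested here, a plain indexed loop over float arrays could look like the sketch below. The function name scale_and_add and the operation itself are hypothetical; whether a particular compiler actually vectorizes the loop depends on its version and optimization flags.

```c
#include <assert.h>
#include <stddef.h>

/* Computes out[i] = scale * a[i] + b[i] with ordinary array indexing.
   No pointer casts and no packing of two floats into a double:
   the optimizer is left to decide how many elements to process at once. */
void scale_and_add(float *out, const float *a, const float *b,
                   float scale, int count)
{
    int i;

    assert(count == 0 || (out != NULL && a != NULL && b != NULL));

    for (i = 0; i < count; i++) {
        out[i] = scale * a[i] + b[i];
    }
}
```

Code in this form gives the same results on every platform regardless of pointer width, and it stays readable.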

jhudd · ‎08-20-2010

Tim18 and bustaf,

Thank you for your thoughts. The change to floats helped the memory situation since I use malloc for the user's specified area. That was a good move.

I solved the problem by changing all the level 1 I/O to level 2. That is, the open, seek, read, write, and close were changed to fopen, fseek, fread, fwrite, and fclose.

I had narrowed it down to I/O then wrote a small program to identify the issues, changed the I/O, and it worked. I changed the logic in the bigger universal kriging application and it works just like the Linux Intel compiler.
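
The original I/O calls were not posted, so the following is only a sketch of what reading a block of floats with the level 2 (stdio) functions might look like. The names write_floats and read_floats_at, and the assumption that the file is a flat array of floats, are hypothetical. The files are opened in binary mode ("wb" and "rb"); this matters on Windows, where files opened in text mode have line endings translated, and it is one possible explanation for the difference that the thread does not confirm.

```c
#include <assert.h>
#include <stdio.h>

/* Write count floats to path in binary mode.
   Returns the number of floats written, or 0 on failure. */
size_t write_floats(const char *path, const float *data, size_t count)
{
    FILE *fp;
    size_t written;

    assert(path != NULL && data != NULL);
    fp = fopen(path, "wb");
    if (fp == NULL) {
        return 0;
    }
    written = fwrite(data, sizeof(float), count, fp);
    if (fclose(fp) != 0) {
        return 0;
    }
    return written;
}

/* Read count floats starting at element first_index.
   Assumes the byte offset fits in a long (below 2 GB where long is 32 bits).
   Returns the number of floats actually read. */
size_t read_floats_at(const char *path, long first_index,
                      float *out, size_t count)
{
    FILE *fp;
    size_t got = 0;

    assert(path != NULL && out != NULL);
    fp = fopen(path, "rb"); /* "b" prevents text-mode translation on Windows */
    if (fp == NULL) {
        return 0;
    }
    if (fseek(fp, first_index * (long)sizeof(float), SEEK_SET) == 0) {
        got = fread(out, sizeof(float), count, fp);
    }
    fclose(fp);
    return got;
}
```

Checking the return value of fread against the requested count is a cheap way to detect a short read, which is otherwise easy to miss when the output merely looks wrong.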

Intel Linux ICC and Cygwin GCC support level 1 I/O; other Windows compilers do not.

Thanks again,

John