*** Integration of Open Watcom C++ compiler - details, performance evaluation, etc ***
Welcome Back, Open Watcom C++ compiler!
At the end of 2015 a decision was made to integrate the Open Watcom C++ compiler v1.9 into a project I've been working on since 2009. I used the Watcom C++ compiler in the mid-1990s ( last century! ) and I know how strong it is when it comes to optimization of C and C++ code.
Honestly, I was concerned about the timing of the integration, that is, the end of the year with Christmas almost knocking at the door ( just two weeks before December 24th ). However, a significant portion of the integration was completed in about 6 hours, and I managed to compile the C/C++ sources and execute some test-cases.
Even though work is still in progress on stabilizing the code and solving some small technical problems, I can say that the legendary Watcom C++ compiler is Not at the top of the list of modern optimizing C/C++ compilers. First of all, version 1.9 is 32-bit only and does Not fully support, or does Not support at all, some modern technologies. There is No support for SSE 2.x, SSE 4.x, AVX, AVX2 or FMA instructions, OpenMP, Intel intrinsic functions, etc.
But don't be too frustrated: Open Watcom is an Open Source project now, the team is still working on it, and I hope that a new version of the Open Watcom C++ compiler will be released in the future.
I will follow up with more technical details and performance evaluation numbers for a set of scientific algorithms later. I will demonstrate how good the Open Watcom C++ compiler is compared to the Borland, MinGW, Microsoft, Intel and Turbo C++ compilers.
[ Watcom C++ compiler - STL support ]
STL is supported, but I didn't have time to do any tests or verification, and I don't think any time will be spent on it in the future.
[ Watcom C++ compiler - Errors, Warnings and Notes ]
In case of errors the compilation output is impressive and can be overwhelming.
For example, this is a small piece of C code with an induced error:
...
typedef union tagRTm128
{
abc
RTfloat m128_f32[4];
...
} RTm128;
...
Overall, the Watcom C++ compiler reports that:
...
...declaration specifiers are required to declare 'abc'
...
This is the complete compilation output:
...
------ Build started: Project: WccTestApp, Configuration: Release Win32 ------
Performing Makefile project actions
*** ScaLib Message: Compiling with Watcom C++ compiler v1.9.0 ***
*** ScaLib Message: Configuration - Desktop - _WIN32_WCC - RELEASE ( 32-bit ) ***
*** ScaLib Message: Advanced ICC v12 Bat-Configuration ***
Open Watcom C/C++32 Compile and Link Utility Version 1.9
Portions Copyright (c) 1988-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.
See http://www.openwatcom.org/ for details.
wpp386 WccTestApp.cpp -5r -fp5 -fpi87 -wx -d0 -s -oabil+mprt -xd -D_WIN32_WCC -DNDEBUG -i"C:\WorkLib\ICC2011\Compos~1\Mkl\Include" -wcd=007 -wcd=008 -wcd=013 -wcd=014 -wcd=086 -wcd=188 -wcd=367 -wcd=368 -wcd=369 -wcd=387 -wcd=389 -wcd=549 -wcd=628 -wcd=689 -wcd=716 -wcd=725 -wcd=726 -wcd=735
Open Watcom C++32 Optimizing Compiler Version 1.9
Portions Copyright (c) 1989-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.
See http://www.openwatcom.org/ for details.
../../Include/BaseSet.h(1076): Error! E336: col(25) declaration specifiers are required to declare 'abc'
../../Include/BaseSet.h(1076): Note! N393: col(25) included from ../../Include/CommonSet.h(26)
../../Include/BaseSet.h(1076): Note! N393: col(25) included from Stdphf.h(120)
../../Include/BaseSet.h(1076): Note! N393: col(25) included from WccTestApp.cpp(21)
../../Include/BaseSet.h(1076): Error! E006: col(17) syntax error; probable cause: missing ';'
../../Include/BaseSet.h(1086): Error! E412: col(66) only member functions can be declared const or volatile
../../Include/BaseSet.h(1086): Error! E264: col(66) user-defined conversion must be a non-static member function
../../Include/BaseSet.h(1086): Error! E029: col(86) symbol 'm128_f32' has not been declared
../../Include/BaseSet.h(1087): Error! E412: col(67) only member functions can be declared const or volatile
../../Include/BaseSet.h(1087): Error! E264: col(67) user-defined conversion must be a non-static member function
../../Include/BaseSet.h(1087): Error! E029: col(88) symbol 'm128_f32' has not been declared
../../Include/BaseSet.h(1088): Error! E498: col(11) syntax error before 'RTm128'; probable cause: incorrectly spelled type name
../../Include/DevIrtAL.h(820): Error! E135: col(41) 'friend', 'virtual' or 'inline' modifiers may only be used on functions
../../Include/DevIrtAL.h(820): Error! E336: col(41) declaration specifiers are required to declare 'RTm128'
../../Include/DevIrtAL.h(820): Error! E006: col(26) syntax error; probable cause: missing ';'
../../Include/RuntimeSet.h(370): Error! E498: col(49) syntax error before 'CBaseSet'; probable cause: incorrectly spelled type name
../../Include/RuntimeSet.h(401): Error! E498: col(48) syntax error before 'CBaseSet'; probable cause: incorrectly spelled type name
../../Include/RuntimeSet.h(447): Error! E498: col(47) syntax error before 'CBaseSet'; probable cause: incorrectly spelled type name
../../Include/TraceSet.h(54): Error! E498: col(47) syntax error before 'CBaseSet'; probable cause: incorrectly spelled type name
../../Include/DataSet.h(1810): Error! E498: col(46) syntax error before 'CBaseSet'; probable cause: incorrectly spelled type name
../../Include/TestSet.h(240): Error! E498: col(46) syntax error before 'CBaseSet'; probable cause: incorrectly spelled type name
../../Include/SortSet.h(85): Error! E498: col(46) syntax error before 'CBaseSet'; probable cause: incorrectly spelled type name
../../Include/CommonSet.h(271): Error! E498: col(48) syntax error before 'CBaseSet'; probable cause: incorrectly spelled type name
../../AppsSca/ScaLib/BaseSet.cpp(245): Error! E133: col(18) too many errors: compilation aborted
WccTestApp.cpp: no lines, included 159290, no warnings, 21 errors
Error: Compiler returned a bad status compiling "WccTestApp.cpp"
...
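For contrast with the induced error, here is a minimal sketch of a well-formed version of such a union. It assumes that RTfloat is a plain float typedef; only the m128_f32 member comes from the snippet in this post, the elided members are deliberately left out, and the RTm128Sum helper is hypothetical.

```cpp
// Assumption: RTfloat is a plain 32-bit float typedef in the original project.
typedef float RTfloat;

// Only the member shown in the post is reproduced; the elided ones are omitted.
typedef union tagRTm128
{
    RTfloat m128_f32[4];
} RTm128;

// Hypothetical helper, not from the original code: sums the four lanes.
inline RTfloat RTm128Sum( const RTm128 &v )
{
    return v.m128_f32[0] + v.m128_f32[1] + v.m128_f32[2] + v.m128_f32[3];
}
```

With the stray 'abc' token removed, the declaration parses cleanly, so the errors that cascade through the other headers in the log should not be triggered by it.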
[ List of Warnings of Watcom C++ compiler I was forced to disable ]
Many hundreds of Warnings and Notes were displayed by the compiler during the initial phase of the integration.
Some of these Warnings are there only to get the attention of a Software Engineer and can be disabled, for example:
Warning W007 // Declaration may not produce intended result
Warning W008 // Returning address of function argument or of auto or register variable
Warning W013 // Unreachable code
Warning W014 // No reference to symbol
Warning W086 // Definition of macro not identical to previous definition
Warning W188 // Base class is inherited with private access
Warning W367 // Conditional expression in if statement is always true
Warning W368 // Conditional expression in if statement is always false
Warning W369 // Selection expression in switch statement is a constant value
Warning W387 // Expression is useful only for its side effects
Warning W389 // Integral value may be truncated during assignment or initialization
Warning W549 // Sizeof operand contains compiler generated information
Warning W628 // Expression is not meaningful
Warning W689 // Conditional expression is always true (non-zero)
Warning W716 // Integral value may be truncated
Warning W725 // Repeats a "some text" of #pragma message ( "some text" ) directive
Warning W726 // No reference to formal parameter
Warning W735 // Single-line style comment continues on next line
Note 1: Warnings are disabled from a command line only. For example:
... -wcd=007 ...
Note 2: Note-like compilation messages are similar to Intel's Remark-like compilation messages.
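Some of the disabled warnings describe real hazards rather than noise. As an illustration of the kind of code that the W389 / W716 descriptions ( integral truncation ) refer to, here is a small sketch. Both helper functions are hypothetical, and I have not verified which exact diagnostic Watcom issues for each line.

```cpp
// Hypothetical helpers for illustration; not part of the original project.

// Implicit narrowing: the upper bits of 'value' are silently dropped.
inline unsigned char NarrowImplicit( unsigned int value )
{
    unsigned char result = value;   // the kind of line W389 / W716 describe
    return result;
}

// Explicit narrowing: same result, but the intent is visible in the code.
inline unsigned char NarrowExplicit( unsigned int value )
{
    return static_cast< unsigned char >( value & 0xFFu );
}
```

Both functions return the same value; the explicit form just documents that the truncation is intended, which is usually a better fix than disabling the warning globally.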
[ Compilation Output ( Debug ) of Watcom C++ compiler ( Integration with VS 2008 Professional Edition ) ]
This is an example of the Compilation Output ( Debug ) when the code compiles without any problems.
------ Build started: Project: WccTestApp, Configuration: Debug Win32 ------
Performing Makefile project actions
*** ScaLib Message: Compiling with Watcom C++ compiler v1.9.0 ***
*** ScaLib Message: Configuration - Desktop - _WIN32_WCC - DEBUG ( 32-bit ) ***
*** ScaLib Message: Advanced ICC v12 Bat-Configuration ***
Open Watcom C/C++32 Compile and Link Utility Version 1.9
Portions Copyright (c) 1988-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.
See http://www.openwatcom.org/ for details.
wpp386 WccTestApp.cpp -5r -fp5 -fpi87 -wx -d2 -od -D_WIN32_WCC -D_DEBUG -i"C:\WorkLib\ICC2011\Compos~1\Mkl\Include" -wcd=007 -wcd=008 -wcd=013 -wcd=014 -wcd=086 -wcd=188 -wcd=367 -wcd=368 -wcd=369 -wcd=387 -wcd=389 -wcd=549 -wcd=628 -wcd=689 -wcd=716 -wcd=725 -wcd=726 -wcd=735
Open Watcom C++32 Optimizing Compiler Version 1.9
Portions Copyright (c) 1989-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.
See http://www.openwatcom.org/ for details.
WccTestApp.cpp: 661 lines, included 242201, no warnings, no errors
wlink @__wcl__.lnk
Open Watcom Linker Version 1.9
Portions Copyright (c) 1985-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.
See http://www.openwatcom.org/ for details.
loading object files
searching libraries
creating a Windows NT character-mode executable
1 file(s) copied.
1 file(s) copied.
Could Not Find c:\WorkEnv\AppsWorkDev\AppsTst\WccTestApp\*.err
WccTestApp - 0 error(s), 0 warning(s)
========== Build: 1 succeeded, 0 failed, 1 up-to-date, 0 skipped ==========
[ Compilation Output ( Release ) of Watcom C++ compiler ( Integration with VS 2008 Professional Edition ) ]
This is an example of the Compilation Output ( Release ) when the code compiles without any problems.
------ Build started: Project: WccTestApp, Configuration: Release Win32 ------
Performing Makefile project actions
*** ScaLib Message: Compiling with Watcom C++ compiler v1.9.0 ***
*** ScaLib Message: Configuration - Desktop - _WIN32_WCC - RELEASE ( 32-bit ) ***
*** ScaLib Message: Advanced ICC v12 Bat-Configuration ***
Open Watcom C/C++32 Compile and Link Utility Version 1.9
Portions Copyright (c) 1988-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.
See http://www.openwatcom.org/ for details.
wpp386 WccTestApp.cpp -5r -fp5 -fpi87 -wx -d0 -s -oabil+mprt -xd -D_WIN32_WCC -DNDEBUG -i"C:\WorkLib\ICC2011\Compos~1\Mkl\Include" -wcd=007 -wcd=008 -wcd=013 -wcd=014 -wcd=086 -wcd=188 -wcd=367 -wcd=368 -wcd=369 -wcd=387 -wcd=389 -wcd=549 -wcd=628 -wcd=689 -wcd=716 -wcd=725 -wcd=726 -wcd=735
Open Watcom C++32 Optimizing Compiler Version 1.9
Portions Copyright (c) 1989-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.
See http://www.openwatcom.org/ for details.
WccTestApp.cpp: 661 lines, included 242201, no warnings, no errors
wlink @__wcl__.lnk
Open Watcom Linker Version 1.9
Portions Copyright (c) 1985-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.
See http://www.openwatcom.org/ for details.
loading object files
searching libraries
creating a Windows NT character-mode executable
1 file(s) copied.
1 file(s) copied.
Could Not Find c:\WorkEnv\AppsWorkDev\AppsTst\WccTestApp\*.err
WccTestApp - 0 error(s), 0 warning(s)
========== Build: 1 succeeded, 0 failed, 1 up-to-date, 0 skipped ==========
[ Watcom C++ compiler - Command line options - Comments ]
I had some issues when the '-of+', '-oi+', and '-ol+' optimization options were used at the same time.
I didn't try to optimize the code for 'space', that is, with the '-os' option.
A very interesting option is '-or' ( re-order instructions to avoid stalls ); a test-case will be needed in order to see how it works and what the possible performance improvements are.
Another very interesting option is '-ob' ( branch prediction ).
[ Watcom Linker - detected problems ]
No problems were detected, but the Watcom Linker uses more than 1.5GB of memory during the final phase of generating 32-bit binaries, and even on fast PCs it takes a couple of minutes to create an executable.
This is only my guess, but I think that the final phase of the Watcom Linker is very similar to Intel's IPO.
[ Support of Intel MKL libraries ]
The format of Watcom import / static libraries is incompatible with the format of Microsoft import / static libraries.
In order to call several MKL functions used in scientific algorithms, the Watcom 'Wlib.exe' utility was used to generate import libraries in the Watcom Linker format.
Here is a list of MKL DLLs I used to create Watcom Linker compatible import libraries:
mkl_rt.dll
mkl_core.dll
mkl_def.dll
mkl_p4.dll
mkl_sequential.dll
mkl_scalapack_core.dll
However, in production code only a 'LoadLibrary'-based solution is used to call MKL functions because it is flexible, portable and very efficient.
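A 'LoadLibrary'-based call could look roughly like the sketch below. It is not the production code: LoadSymbol and DotProduct are hypothetical helpers, cblas_ddot is used only as an example of an MKL function, MKL_INT is assumed to be a 32-bit int, and the default calling convention is assumed to match the one exported by the DLL. On non-Windows systems the loader is stubbed out so that the fallback loop runs.

```cpp
#include <cstddef>

#ifdef _WIN32
#include <windows.h>
#endif

// Signature of cblas_ddot as declared in the CBLAS interface
// ( assumption: MKL_INT is a 32-bit int ).
typedef double ( *DdotFn )( const int n, const double *x, const int incx,
                            const double *y, const int incy );

// Hypothetical helper: returns the address of 'symbol' in 'library',
// or NULL when the library or the symbol cannot be found.
inline void *LoadSymbol( const char *library, const char *symbol )
{
#ifdef _WIN32
    HMODULE module = LoadLibraryA( library );
    if( module == NULL )
        return NULL;
    return reinterpret_cast< void * >( GetProcAddress( module, symbol ) );
#else
    // Stub so that the sketch still compiles on non-Windows systems.
    ( void )library;
    ( void )symbol;
    return NULL;
#endif
}

// Calls cblas_ddot from mkl_rt.dll when it is available and falls back
// to a plain loop otherwise.
inline double DotProduct( const double *x, const double *y, int n )
{
    DdotFn ddot = reinterpret_cast< DdotFn >( LoadSymbol( "mkl_rt.dll", "cblas_ddot" ) );
    if( ddot != NULL )
        return ddot( n, x, 1, y, 1 );

    double sum = 0.0;
    for( int i = 0; i < n; ++i )
        sum += x[i] * y[i];
    return sum;
}
```

The point of this design is that no import library in any linker-specific format is needed at build time. A real implementation would load the DLL once and cache the function pointers instead of resolving them on every call.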
[ Watcom Debugger ]
It works well but its UI is very dated; I'll try to upload some screenshots later.
[ Processing when CrtDebugBreak() is used in Release or Debug Configurations ]
Note: CrtDebugBreak() function is also known as 'CCC3', or '3C3', or 'INT 3 -> RET'.
This is the output of a test-case in which CrtDebugBreak() was called in order to see how the 'atexit' function handles that call, or how it handles a fatal error in the code:
...
Application - WccTestApp - WIN32_WCC ( 32-bit ) - Release
Tests: Start
> Test0001 Start <
**********************************************
Configuration - WIN32_WCC ( 32-bit ) - Release
CTestSet::InitTestEnv - Passed
* CRuntimeSet Start *
> CRT Macros <
HrtPrefetchData< T0/T1/T2/NTA > - Passed
HrtClock - [ uiClock2 - uiClock1 ] Elapsed: 1.0000 sec
HrtRdtsc - [ uiClock2 - uiClock1 ] Elapsed: 1.0000 sec
HrtRdtsc - [ uiClock2 - uiClock1 ] Elapsed: 1594585452 clock cycles
Macro-Wrappers of HRT-Functions - Passed
IrtSetRoundingMode & CrtSetRoundingMode
IrtRdtsc - [ uiClock2 - uiClock1 ] Difference: 120513456 clock cycles
CrtRdtsc - [ uiClock2 - uiClock1 ] Difference: 122180876 clock cycles
IrtMalloc & CrtMalloc & IrtFree & CrtFree
IrtCalloc & CrtCalloc & IrtFree & CrtFree
IrtSfence & CrtSfence
IrtLfence & CrtLfence
IrtMfence & CrtMfence
IrtSetZeroPs128 & CrtSetZeroPs128
IrtSetZeroPd128 & CrtSetZeroPd128
IrtSetZeroSi128 & CrtSetZeroSi128
IrtSetZeroPs256 & CrtSetZeroPs256
IrtSetZeroPd256 & CrtSetZeroPd256
IrtSetZeroSi256 & CrtSetZeroSi256
Macro-Wrappers of IRT-Functions - Passed
Macro-Wrappers of CRT-Functions - Passed
Macro-Wrappers of QRT-Functions - Passed
Macro-Wrappers of PRT-Functions - Passed
SetDebugInfoLevel - Passed
GetDebugInfoLevel - Passed
SetMemoryTracerParams - Passed
DisplayMessage - Passed
The program encountered exception 0x80000003 at address 0x7c90120e and
cannot continue.
Exception fielded by 0x0040ef00
EAX=0x00000018 EBX=0x0041727a ECX=0xffffffff EDX=0x00410736
ESI=0xf40dff5c EDI=0x1041fc18 EBP=0x1041fc56 ESP=0x1041fa14
EIP=0x7c90120e EFL=0x00000202 CS =0x0000001b SS =0x00000023
DS =0x00000023 ES =0x00000023 FS =0x0000003b GS =0x00000000
Stack dump (SS:ESP)
0x0040358d 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
Press any key to continue...
It means that the VS 'Just-In-Time' debugging functionality is Not triggered ( exception 0x80000003 is the Windows breakpoint exception ), and a different technique is needed to debug code in Debug or Release Configurations.
[ Performance Evaluation (1) - Test code in C language ]
Here is a small generic test-case in C, used to evaluate the performance of a C++ compiler:
...
RTuint64 uiClock1;
RTuint64 uiClock2;
RTint t;
// CrtDebugBreak();
// CrtDebugLabel( 0x5555 );
uiClock1 = CrtRdtsc();
for( t = 0; t < _RTNUMBER_OF_TESTS_0016777216; t += 1 )
{
volatile RTfloat x = ( RTfloat )t;
volatile RTfloat y = x * x * x;
}
uiClock2 = CrtRdtsc();
// CrtDebugLabel( 0x7777 );
...
Note: All commented-out code lines were uncommented in order to get into the Debugger and grab the assembler code.
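For readers who want to try the loop without the project's Crt* wrappers, here is a self-contained sketch. It swaps CrtRdtsc() for the standard clock() function, which is portable but much coarser than a time-stamp counter, and it assumes that RTfloat and RTint are plain float and int typedefs. RunCubeTest is a hypothetical name.

```cpp
#include <ctime>

// Assumption: RTfloat / RTint are plain float / int typedefs in the project.
typedef float RTfloat;
typedef int   RTint;

// Runs the cube loop 'iterations' times and returns the elapsed processor
// time in seconds as measured by the standard clock() function.
inline double RunCubeTest( RTint iterations )
{
    const std::clock_t start = std::clock();

    for( RTint t = 0; t < iterations; t += 1 )
    {
        // 'volatile' is meant to keep the compiler from removing the work.
        volatile RTfloat x = ( RTfloat )t;
        volatile RTfloat y = x * x * x;
    }

    const std::clock_t stop = std::clock();
    return ( double )( stop - start ) / CLOCKS_PER_SEC;
}
```

Calling RunCubeTest( 16777216 ) matches the iteration count of the original test ( 1000000h in the assembler listings ).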
[ Watcom C++ compiler ]
...
00402ACC call dword ptr ds:[415120h] // CrtDebugBreak();
00402AD2 mov dword ptr [ebp+26h], 5555h // CrtDebugLabel( 0x5555 );
00402AD9 rdtsc // CrtRdtsc();
00402ADB mov ecx, eax
00402ADD mov ebx, edx
00402ADF xor eax, eax
00402AE1 fld dword ptr [ebp+2Ah]
00402AE4 fld dword ptr [ebp+46h]
00402AE7 mov dword ptr [ebp+72h], eax
00402AEA fild dword ptr [ebp+72h]
00402AED fst st(2)
00402AEF fmul st, st(2)
00402AF1 fmul st, st(2)
00402AF3 fstp st(1)
00402AF5 inc eax
00402AF6 cmp eax, 1000000h
00402AFB jl 00402AE7
00402AFD fstp dword ptr [ebp+46h]
00402B00 fstp dword ptr [ebp+2Ah]
00402B03 rdtsc // CrtRdtsc();
00402B05 mov dword ptr [ebp+2Eh], 7777h // CrtDebugLabel( 0x7777 );
00402B0C sub eax, ecx
00402B0E sbb edx, ebx
...
[ Output ]
...
CrtRdtsc - [ uiClock2 - uiClock1 ] Difference: 120554024 clock cycles
...
[ Microsoft C++ compiler ]
...
0024344D call dword ptr ds:[245030h] // CrtDebugBreak();
00243453 rdtsc // CrtRdtsc();
00243455 mov dword ptr [ebp-10h], eax
00243458 xor eax, eax
0024345A mov dword ptr [ebp-4], 5555h // CrtDebugLabel( 0x5555 );
00243461 mov ecx, edx
00243463 mov dword ptr [ebp-4], eax
00243466 jmp CRuntimeSet::RunTest+1C0h (243470h)
00243468 lea esp, [esp]
0024346F nop
00243470 fild dword ptr [ebp-4]
00243473 add eax, 1
00243476 cmp eax, 1000000h
0024347B fstp dword ptr [ebp-4]
0024347E fld dword ptr [ebp-4]
00243481 fmul dword ptr [ebp-4]
00243484 fmul dword ptr [ebp-4]
00243487 fstp dword ptr [ebp-4]
0024348A mov dword ptr [ebp-4], eax
0024348D jl CRuntimeSet::RunTest+1C0h (243470h)
0024348F rdtsc // CrtRdtsc();
00243491 sub eax, dword ptr [ebp-10h]
00243494 mov dword ptr [ebp-4], 7777h // CrtDebugLabel( 0x7777 );
0024349B sbb edx, ecx
...
[ Output ]
...
CrtRdtsc - [ uiClock2 - uiClock1 ] Difference: 186046772 clock cycles
...
[ Borland C++ compiler ]
...
00403022 call 00416E5A // CrtDebugBreak();
00403027 mov dword ptr [ebp-0A4h], 5555h // CrtDebugLabel( 0x5555 );
00403031 call 00405D34 // CrtRdtsc();
00403036 mov dword ptr [ebp-98h], eax
0040303C mov dword ptr [ebp-94h], edx
00403042 xor eax, eax
00403044 mov dword ptr [ebp-320h], eax
0040304A fild dword ptr [ebp-320h]
00403050 fstp dword ptr [ebp-0A8h]
00403056 fld dword ptr [ebp-0A8h]
0040305C fmul dword ptr [ebp-0A8h]
00403062 fmul dword ptr [ebp-0A8h]
00403068 fstp dword ptr [ebp-0ACh]
0040306E inc eax
0040306F cmp eax, 1000000h
00403074 jl 00403044
00403076 call 00405D34 // CrtRdtsc();
0040307B mov dword ptr [ebp-0A0h], eax
00403081 mov dword ptr [ebp-9Ch], edx
00403087 mov dword ptr [ebp-0B0h], 7777h // CrtDebugLabel( 0x7777 );
...
[ Output ]
...
CrtRdtsc - [ uiClock2 - uiClock1 ] Difference: 188474452 clock cycles
...
It seems that the VS compiler inserted an unconditional jump to CRuntimeSet::RunTest+1C0h (243470h). I suppose that this branch in the generated machine code ( not present in the Watcom output ) could be the reason for the slower performance of the MS compiler.
[ MinGW C++ compiler ]
...
004018F4 call dword ptr ds:[413220h] // CrtDebugBreak();
004018FA mov eax, 5555h // CrtDebugLabel( 0x5555 );
004018FF rdtsc // CrtRdtsc();
00401901 mov dword ptr [ebp-48h], eax
00401904 mov dword ptr [ebp-44h], edx
00401907 xor esi, esi
00401909 mov edi, dword ptr [ebp-48h]
0040190C mov ebx, dword ptr [ebp-44h]
0040190F nop
00401910 pxor xmm2, xmm2
00401914 cvtsi2ss xmm2, esi
00401918 add esi, 1
0040191B cmp esi, 1000000h
00401921 movss dword ptr [ebp-8Ch], xmm2
00401929 movss xmm6, dword ptr [ebp-8Ch]
00401931 movss xmm7, dword ptr [ebp-8Ch]
00401939 mulss xmm6, xmm7
0040193D movss xmm0, dword ptr [ebp-8Ch]
00401945 mulss xmm6, xmm0
00401949 movss dword ptr [ebp-88h], xmm6
00401951 jne _ZN11CRuntimeSet7RunTestEv+300h (401910h)
00401953 rdtsc // CrtRdtsc();
00401955 mov dword ptr [ebp-40h], eax
00401958 mov dword ptr [ebp-3Ch], edx
0040195B mov eax, dword ptr [ebp-40h]
0040195E mov edx, dword ptr [ebp-3Ch]
00401961 mov eax, 7777h // CrtDebugLabel( 0x7777 );
...
[ Output ]
...
CrtRdtsc - [ uiClock2 - uiClock1 ] Difference: 158981392 clock cycles
...
The Borland compiler produced almost the same assembly code and yet it is slower than Watcom. How many times did you run each of those compiler-specific tests? I suppose that the Borland test was not averaged over enough runs.
[ Intel C++ compiler ]
...
00402035 call dword ptr ds:[41A00Ch] // CrtDebugBreak();
0040203B mov dword ptr [ebp-64h], 5555h // CrtDebugLabel( 0x5555 );
00402042 rdtsc // CrtRdtsc();
00402044 mov ecx, eax
00402046 mov esi, edx
00402048 xor eax, eax
0040204A cvtsi2ss xmm0, eax
0040204E movss dword ptr [ebp-30h], xmm0
00402053 inc eax
00402054 movss xmm3, dword ptr [ebp-30h]
00402059 cmp eax, 1000000h
0040205E movss xmm1, dword ptr [ebp-30h]
00402063 mulss xmm3, xmm1
00402067 movss xmm2, dword ptr [ebp-30h]
0040206C mulss xmm3, xmm2
00402070 movss dword ptr [ebp-2Ch], xmm3
00402075 jl CRuntimeSet::RunTest+1BAh (40204Ah)
00402077 rdtsc // CrtRdtsc();
00402079 add esp, 0FFFFFFF4h
0040207C sub eax, ecx
0040207E mov dword ptr [ebp-60h], 7777h // CrtDebugLabel( 0x7777 );
00402085 sbb edx, esi
...
[ Output ]
...
CrtRdtsc - [ uiClock2 - uiClock1 ] Difference: 150922384 clock cycles
...
[ Performance Evaluation (1) - Summary ]
1. Watcom C++ compiler Test Executed in: 120,554,024 clock cycles
2. Intel C++ compiler Test Executed in: 150,922,384 clock cycles
3. MinGW C++ compiler Test Executed in: 158,981,392 clock cycles
4. Microsoft C++ compiler Test Executed in: 186,046,772 clock cycles
5. Borland C++ compiler Test Executed in: 188,474,452 clock cycles
Note 1: The Watcom C++ compiler completed the test in ~20% fewer clock cycles than the Intel C++ compiler.
Note 2: Take into account that timings are in CPU clock cycles and that the CrtRdtsc() function was used to get these performance values.
These values differ from run to run. A value in clock cycles can easily be converted to nanoseconds, microseconds, milliseconds, etc., by dividing it by the base CPU frequency in Hz and multiplying by a normalizing constant.
The Non-Deterministic nature of the SMT-based scheduler of the Windows operating system is clearly seen, and there is Nothing wrong here because this is how the SMT-based scheduler was designed by David Cutler.
It means that if the test-case is executed 10 times then the last 5 or 6 digits ( from the right ) of the value in clock cycles will be different.
Note 3: David Cutler was a Lead Software Engineer at Microsoft more than 25 years ago and he is the "Father" of the SMT-based Windows NT scheduler.
Note 4: SMT stands here for Symmetric Multithreading ( elsewhere the acronym usually means Simultaneous Multithreading ).
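The relative figures in the summary can be re-derived with a couple of lines of arithmetic. The sketch below is only a convenience for checking them; both function names are hypothetical.

```cpp
#include <cassert>

// Percentage of clock cycles saved by 'fast' relative to 'slow'.
inline double PercentFewerCycles( unsigned long long fast, unsigned long long slow )
{
    assert( slow > 0 && fast <= slow );
    return 100.0 * ( double )( slow - fast ) / ( double )slow;
}

// How many times longer 'slow' runs compared to 'fast'.
inline double SlowdownFactor( unsigned long long fast, unsigned long long slow )
{
    assert( fast > 0 );
    return ( double )slow / ( double )fast;
}
```

With the numbers from the summary, the Watcom build needs about 20.1% fewer clock cycles than the Intel build, and the Borland build takes about 1.56 times as many clock cycles as the Watcom build.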
[ Non-Deterministic nature of the SMT-based scheduler of Windows OSs ]
See the comment about the Non-Deterministic nature of the SMT-based scheduler of Windows OSs in the previous post.
This is how it looks in reality when ten tests are completed and all measurements are taken with nanosecond accuracy:
...
Pass 01 - [ uiClock2 - uiClock1 ] Difference: 151734056 clock cycles
Pass 02 - [ uiClock2 - uiClock1 ] Difference: 151648204 clock cycles
Pass 03 - [ uiClock2 - uiClock1 ] Difference: 151881784 clock cycles
Pass 04 - [ uiClock2 - uiClock1 ] Difference: 151807612 clock cycles
Pass 05 - [ uiClock2 - uiClock1 ] Difference: 151679396 clock cycles
Pass 06 - [ uiClock2 - uiClock1 ] Difference: 151793996 clock cycles
Pass 07 - [ uiClock2 - uiClock1 ] Difference: 151711436 clock cycles
Pass 08 - [ uiClock2 - uiClock1 ] Difference: 151787256 clock cycles
Pass 09 - [ uiClock2 - uiClock1 ] Difference: 151846488 clock cycles
Pass 10 - [ uiClock2 - uiClock1 ] Difference: 151611644 clock cycles
...
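The run-to-run variation of the ten passes above can be summarized as a minimum, a maximum and the spread between them. A minimal sketch follows; CycleStats and ComputeCycleStats are hypothetical names, not part of the test framework used in this thread.

```cpp
#include <cstddef>

// Lowest and highest measurement and the spread between them.
struct CycleStats
{
    unsigned long long lowest;
    unsigned long long highest;
    unsigned long long spread;   // highest - lowest
};

// Hypothetical helper; 'count' must be at least 1.
inline CycleStats ComputeCycleStats( const unsigned long long *values, std::size_t count )
{
    CycleStats stats = { values[0], values[0], 0 };
    for( std::size_t i = 1; i < count; ++i )
    {
        if( values[i] < stats.lowest )  stats.lowest = values[i];
        if( values[i] > stats.highest ) stats.highest = values[i];
    }
    stats.spread = stats.highest - stats.lowest;
    return stats;
}
```

For the ten passes listed above the spread is 270,140 clock cycles, roughly 0.18% of the fastest pass, which agrees with the remark that only the last 5 or 6 digits change.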
For that set of test-cases a value of 1594500000 clock cycles is equal to:
1 second, or
1000 milliseconds, or
1000000 microseconds, or
1000000000 nanoseconds
Then, in nanoseconds the same test results look like:
...
Pass 01 - [ uiClock2 - uiClock1 ] Difference: 95160900 nanoseconds
Pass 02 - [ uiClock2 - uiClock1 ] Difference: 95107057 nanoseconds
Pass 03 - [ uiClock2 - uiClock1 ] Difference: 95253549 nanoseconds
Pass 04 - [ uiClock2 - uiClock1 ] Difference: 95207031 nanoseconds
Pass 05 - [ uiClock2 - uiClock1 ] Difference: 95126620 nanoseconds
Pass 06 - [ uiClock2 - uiClock1 ] Difference: 95198492 nanoseconds
Pass 07 - [ uiClock2 - uiClock1 ] Difference: 95146714 nanoseconds
Pass 08 - [ uiClock2 - uiClock1 ] Difference: 95194265 nanoseconds
Pass 09 - [ uiClock2 - uiClock1 ] Difference: 95231412 nanoseconds
Pass 10 - [ uiClock2 - uiClock1 ] Difference: 95084129 nanoseconds
...
Note 1: For example, in the case of 'Pass 01' the value of 95160900 nanoseconds ( 0.095160900 seconds ) is calculated as follows:
151734056 cc * 1000000000 ns / 1594500000 cc ~= 95160900 ns
where,
'cc' stands for 'clock cycles', and
'ns' stands for 'nanoseconds'
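The same conversion written as a small function, as a sketch only. CyclesToNanoseconds is a hypothetical name; the result is truncated to whole nanoseconds, which matches the values listed above.

```cpp
#include <cassert>

// Converts a clock-cycle count to whole nanoseconds for a given CPU
// frequency in Hz ( fractions of a nanosecond are truncated ).
// Note: cycles * 1000000000 must fit in 64 bits, which holds for counts
// up to roughly 1.8e10 clock cycles.
inline unsigned long long CyclesToNanoseconds( unsigned long long cycles,
                                               unsigned long long frequencyHz )
{
    assert( frequencyHz > 0 );
    return cycles * 1000000000ULL / frequencyHz;
}
```

For example, CyclesToNanoseconds( 151734056, 1594500000 ) gives 95160900, the 'Pass 01' value.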
[ Memory Leaks Detection - WCC ]
Note: As you can see, file names and line numbers are Not displayed.
...
Tests: Completed
* Memory Block: 0 *
...(0)
Memory Block State: 3 - Released
* Memory Block: 1 *
...(0)
Memory Block State: 3 - Released
* Memory Block: 2 *
...(0)
Memory Block State: 3 - Released
Memory Blocks Allocated : 3
Memory Blocks Released : 3
Memory Blocks NOT Released: 0
Memory Tracer Integrity Verified - Memory Leaks NOT Detected
Deallocating Memory Tracer Data Table
Completed
...
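To show the idea behind such a report, here is a minimal sketch of a memory tracer table. It is not the implementation used in the project: the class, its fixed table size and the 'allocated' state value are hypothetical, and only the state value 3 ( Released ) is taken from the output above.

```cpp
#include <cstddef>
#include <cstdlib>

// State value 3 ( "Released" ) is taken from the output above; the other
// value is a hypothetical placeholder.
enum BlockState { kBlockAllocated = 1, kBlockReleased = 3 };

struct TracedBlock
{
    void      *address;
    BlockState state;
};

// Hypothetical fixed-size tracer table; not the author's implementation.
class MemoryTracer
{
public:
    MemoryTracer() : m_count( 0 ) {}

    // Allocates 'size' bytes and records the block in the table.
    void *Allocate( std::size_t size )
    {
        if( m_count >= kMaxBlocks )
            return NULL;
        void *p = std::malloc( size );
        if( p == NULL )
            return NULL;
        m_blocks[m_count].address = p;
        m_blocks[m_count].state = kBlockAllocated;
        ++m_count;
        return p;
    }

    // Frees a block previously returned by Allocate and marks it released.
    void Release( void *p )
    {
        for( std::size_t i = 0; i < m_count; ++i )
        {
            if( m_blocks[i].address == p && m_blocks[i].state == kBlockAllocated )
            {
                std::free( p );
                m_blocks[i].state = kBlockReleased;
                return;
            }
        }
    }

    // Number of blocks recorded in the table.
    std::size_t Allocated() const { return m_count; }

    // Number of blocks that were allocated but never released.
    std::size_t NotReleased() const
    {
        std::size_t leaks = 0;
        for( std::size_t i = 0; i < m_count; ++i )
            if( m_blocks[i].state != kBlockReleased )
                ++leaks;
        return leaks;
    }

private:
    static const std::size_t kMaxBlocks = 64;
    TracedBlock m_blocks[kMaxBlocks];
    std::size_t m_count;
};
```

A report like the one above would then print Allocated(), the number of released blocks and NotReleased() at exit, and flag a leak when the last value is not zero.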
