
Integration of Open Watcom C++ compiler - details, performance evaluation, etc

SergeyKostrov
Valued Contributor II
13,150 Views
Welcome back, Open Watcom C++ compiler!

At the end of 2015 a decision was made to integrate the Open Watcom C++ compiler v1.9 into a project I've been working on since 2009. I used the Watcom C++ compiler in the mid-1990s ( last century! ) and I know how strong it is when it comes to optimization of C and C++ code.

Honestly, I was concerned about the timing of the integration: it was the end of the year, with Christmas almost knocking at the door ( just two weeks before December 24th ). However, a significant portion of the integration was completed in about 6 hours, and I managed to compile the C/C++ sources and execute some test-cases.

Work on stabilizing the code and solving some small technical problems is still in progress, but I can already say that the legendary Watcom C++ compiler is Not at the top of the list of modern optimizing C/C++ compilers. First of all, version 1.9 is 32-bit only and does Not fully support, or does Not support at all, some modern technologies: there is No support for SSE 2.x, SSE 4.x, AVX, AVX2 and FMA instructions, OpenMP, Intel intrinsic functions, etc.

But don't be too frustrated: Open Watcom is an Open Source project now, the team is working, and I hope that a new version of the compiler will be released in the future.

I will follow up later with more technical details and performance evaluation numbers for a set of scientific algorithms. I will demonstrate how good the Open Watcom C++ compiler is compared to the Borland, MinGW, Microsoft, Intel and Turbo C++ compilers.
0 Kudos
90 Replies
SergeyKostrov
Valued Contributor II
2,315 Views
[ Watcom C++ compiler - STL support ] STL is supported, but I didn't have time to do any tests or verifications, and I don't expect to spend any time on it in the future.
0 Kudos
SergeyKostrov
Valued Contributor II
2,315 Views
[ Watcom C++ compiler - Errors, Warnings and Notes ]

When there are errors, the compilation output is impressive and can be overwhelming. For example, this is a small piece of C code with an induced error:

...
typedef union tagRTm128
{
    abc RTfloat m128_f32[4];
    ...
} RTm128;
...

In short, the Watcom C++ compiler reports that:

...
...declaration specifiers are required to declare 'abc'
...

This is the complete compilation output:

...
------ Build started: Project: WccTestApp, Configuration: Release Win32 ------
Performing Makefile project actions
*** ScaLib Message: Compiling with Watcom C++ compiler v1.9.0 ***
*** ScaLib Message: Configuration - Desktop - _WIN32_WCC - RELEASE ( 32-bit ) ***
*** ScaLib Message: Advanced ICC v12 Bat-Configuration ***
Open Watcom C/C++32 Compile and Link Utility Version 1.9
Portions Copyright (c) 1988-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.
See http://www.openwatcom.org/ for details.
wpp386 WccTestApp.cpp -5r -fp5 -fpi87 -wx -d0 -s -oabil+mprt -xd -D_WIN32_WCC -DNDEBUG -i"C:\WorkLib\ICC2011\Compos~1\Mkl\Include" -wcd=007 -wcd=008 -wcd=013 -wcd=014 -wcd=086 -wcd=188 -wcd=367 -wcd=368 -wcd=369 -wcd=387 -wcd=389 -wcd=549 -wcd=628 -wcd=689 -wcd=716 -wcd=725 -wcd=726 -wcd=735
Open Watcom C++32 Optimizing Compiler Version 1.9
Portions Copyright (c) 1989-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.
See http://www.openwatcom.org/ for details.
../../Include/BaseSet.h(1076): Error! E336: col(25) declaration specifiers are required to declare 'abc'
../../Include/BaseSet.h(1076): Note! N393: col(25) included from ../../Include/CommonSet.h(26)
../../Include/BaseSet.h(1076): Note! N393: col(25) included from Stdphf.h(120)
../../Include/BaseSet.h(1076): Note! N393: col(25) included from WccTestApp.cpp(21)
../../Include/BaseSet.h(1076): Error! E006: col(17) syntax error; probable cause: missing ';'
../../Include/BaseSet.h(1086): Error! E412: col(66) only member functions can be declared const or volatile
../../Include/BaseSet.h(1086): Error! E264: col(66) user-defined conversion must be a non-static member function
../../Include/BaseSet.h(1086): Error! E029: col(86) symbol 'm128_f32' has not been declared
../../Include/BaseSet.h(1087): Error! E412: col(67) only member functions can be declared const or volatile
../../Include/BaseSet.h(1087): Error! E264: col(67) user-defined conversion must be a non-static member function
../../Include/BaseSet.h(1087): Error! E029: col(88) symbol 'm128_f32' has not been declared
../../Include/BaseSet.h(1088): Error! E498: col(11) syntax error before 'RTm128'; probable cause: incorrectly spelled type name
../../Include/DevIrtAL.h(820): Error! E135: col(41) 'friend', 'virtual' or 'inline' modifiers may only be used on functions
../../Include/DevIrtAL.h(820): Error! E336: col(41) declaration specifiers are required to declare 'RTm128'
../../Include/DevIrtAL.h(820): Error! E006: col(26) syntax error; probable cause: missing ';'
../../Include/RuntimeSet.h(370): Error! E498: col(49) syntax error before 'CBaseSet'; probable cause: incorrectly spelled type name
../../Include/RuntimeSet.h(401): Error! E498: col(48) syntax error before 'CBaseSet'; probable cause: incorrectly spelled type name
../../Include/RuntimeSet.h(447): Error! E498: col(47) syntax error before 'CBaseSet'; probable cause: incorrectly spelled type name
../../Include/TraceSet.h(54): Error! E498: col(47) syntax error before 'CBaseSet'; probable cause: incorrectly spelled type name
../../Include/DataSet.h(1810): Error! E498: col(46) syntax error before 'CBaseSet'; probable cause: incorrectly spelled type name
../../Include/TestSet.h(240): Error! E498: col(46) syntax error before 'CBaseSet'; probable cause: incorrectly spelled type name
../../Include/SortSet.h(85): Error! E498: col(46) syntax error before 'CBaseSet'; probable cause: incorrectly spelled type name
../../Include/CommonSet.h(271): Error! E498: col(48) syntax error before 'CBaseSet'; probable cause: incorrectly spelled type name
../../AppsSca/ScaLib/BaseSet.cpp(245): Error! E133: col(18) too many errors: compilation aborted
WccTestApp.cpp: no lines, included 159290, no warnings, 21 errors
Error: Compiler returned a bad status compiling "WccTestApp.cpp"
...
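For comparison, here is a minimal sketch of the same declaration with the induced 'abc' token removed. It assumes that RTfloat is a plain 32-bit float (the project's real typedef is not shown in this thread), it declares only the one member visible in the post, and MakeRTm128 is a hypothetical helper added just to exercise the union.

```c
#include <assert.h>

/* Assumption: RTfloat is a 32-bit float. The real typedef lives in the
   project headers and is not shown in the thread. */
typedef float RTfloat;

/* The union from the post without the stray 'abc' token. Only the member
   that is visible in the post is declared here. */
typedef union tagRTm128
{
    RTfloat m128_f32[4];
} RTm128;

/* Hypothetical helper, present only to show the union in use. */
RTm128 MakeRTm128( RTfloat a, RTfloat b, RTfloat c, RTfloat d )
{
    RTm128 v;

    /* Four packed floats are expected to occupy exactly 4 * sizeof( RTfloat ). */
    assert( sizeof( RTm128 ) == 4 * sizeof( RTfloat ) );

    v.m128_f32[0] = a;
    v.m128_f32[1] = b;
    v.m128_f32[2] = c;
    v.m128_f32[3] = d;
    return v;
}
```

With the declaration in this form the single E336 error, and the cascade of follow-up errors it triggers in every header that uses RTm128, should not appear.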
0 Kudos
SergeyKostrov
Valued Contributor II
2,315 Views
[ List of Warnings of Watcom C++ compiler I was forced to disable ]

Many hundreds of Warnings and Notes were displayed by the compiler during the initial phase of the integration. Some of these Warnings are displayed only to get the attention of a Software Engineer and can be disabled. For example:

Warning W007 // Declaration may not produce intended result
Warning W008 // Returning address of function argument or of auto or register variable
Warning W013 // Unreachable code
Warning W014 // No reference to symbol
Warning W086 // Definition of macro not identical to previous definition
Warning W188 // Base class is inherited with private access
Warning W367 // Conditional expression in if statement is always true
Warning W368 // Conditional expression in if statement is always false
Warning W369 // Selection expression in switch statement is a constant value
Warning W387 // Expression is useful only for its side effects
Warning W389 // Integral value may be truncated during assignment or initialization
Warning W549 // Sizeof operand contains compiler generated information
Warning W628 // Expression is not meaningful
Warning W689 // Conditional expression is always true (non-zero)
Warning W716 // Integral value may be truncated
Warning W725 // Repeats a "some text" of #pragma message ( "some text" ) directive
Warning W726 // No reference to formal parameter
Warning W735 // Single-line style comment continues on next line

Note 1: Warnings are disabled from the command line only. For example: ... -wcd=007 ...
Note 2: Note-like compilation messages are similar to Intel's Remark-like compilation messages.
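Since every disabled warning needs its own '-wcd=NNN' option, the option string can be generated from a list of warning numbers instead of being maintained by hand. The following is only a sketch: BuildWcdOptions is a hypothetical helper, not part of the Watcom tools or of the project, and it relies on the C99 snprintf function.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes one "-wcd=NNN" option per warning number into
   'out', separated by single spaces. Returns the length of the resulting
   string, or -1 if the buffer is too small. */
int BuildWcdOptions( const int *warnings, int count, char *out, size_t outSize )
{
    size_t used = 0;
    int i;

    if( out == NULL || outSize == 0 )
        return -1;
    out[0] = '\0';

    for( i = 0; i < count; i += 1 )
    {
        /* Warning numbers are zero-padded to three digits, as in -wcd=007. */
        int n = snprintf( out + used, outSize - used, "%s-wcd=%03d",
                          ( i == 0 ) ? "" : " ", warnings[i] );

        /* snprintf reports the length it needed; treat truncation as failure. */
        if( n < 0 || ( size_t )n >= outSize - used )
            return -1;
        used += ( size_t )n;
    }
    return ( int )used;
}
```

Called with the numbers 7, 8 and 13, the helper produces the string "-wcd=007 -wcd=008 -wcd=013", which can then be appended to the wpp386 command line in a build script.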
0 Kudos
SergeyKostrov
Valued Contributor II
2,315 Views
[ Compilation Output ( Debug ) of Watcom C++ compiler ( Integration with VS 2008 Professional Edition ) ]

This is an example of the Compilation Output ( Debug ) when the code compiles without any problems.

------ Build started: Project: WccTestApp, Configuration: Debug Win32 ------
Performing Makefile project actions
*** ScaLib Message: Compiling with Watcom C++ compiler v1.9.0 ***
*** ScaLib Message: Configuration - Desktop - _WIN32_WCC - DEBUG ( 32-bit ) ***
*** ScaLib Message: Advanced ICC v12 Bat-Configuration ***
Open Watcom C/C++32 Compile and Link Utility Version 1.9
Portions Copyright (c) 1988-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.
See http://www.openwatcom.org/ for details.
wpp386 WccTestApp.cpp -5r -fp5 -fpi87 -wx -d2 -od -D_WIN32_WCC -D_DEBUG -i"C:\WorkLib\ICC2011\Compos~1\Mkl\Include" -wcd=007 -wcd=008 -wcd=013 -wcd=014 -wcd=086 -wcd=188 -wcd=367 -wcd=368 -wcd=369 -wcd=387 -wcd=389 -wcd=549 -wcd=628 -wcd=689 -wcd=716 -wcd=725 -wcd=726 -wcd=735
Open Watcom C++32 Optimizing Compiler Version 1.9
Portions Copyright (c) 1989-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.
See http://www.openwatcom.org/ for details.
WccTestApp.cpp: 661 lines, included 242201, no warnings, no errors
wlink @__wcl__.lnk
Open Watcom Linker Version 1.9
Portions Copyright (c) 1985-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.
See http://www.openwatcom.org/ for details.
loading object files
searching libraries
creating a Windows NT character-mode executable
1 file(s) copied.
1 file(s) copied.
Could Not Find c:\WorkEnv\AppsWorkDev\AppsTst\WccTestApp\*.err
WccTestApp - 0 error(s), 0 warning(s)
========== Build: 1 succeeded, 0 failed, 1 up-to-date, 0 skipped ==========
0 Kudos
SergeyKostrov
Valued Contributor II
2,315 Views
[ Compilation Output ( Release ) of Watcom C++ compiler ( Integration with VS 2008 Professional Edition ) ]

This is an example of the Compilation Output ( Release ) when the code compiles without any problems.

------ Build started: Project: WccTestApp, Configuration: Release Win32 ------
Performing Makefile project actions
*** ScaLib Message: Compiling with Watcom C++ compiler v1.9.0 ***
*** ScaLib Message: Configuration - Desktop - _WIN32_WCC - RELEASE ( 32-bit ) ***
*** ScaLib Message: Advanced ICC v12 Bat-Configuration ***
Open Watcom C/C++32 Compile and Link Utility Version 1.9
Portions Copyright (c) 1988-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.
See http://www.openwatcom.org/ for details.
wpp386 WccTestApp.cpp -5r -fp5 -fpi87 -wx -d0 -s -oabil+mprt -xd -D_WIN32_WCC -DNDEBUG -i"C:\WorkLib\ICC2011\Compos~1\Mkl\Include" -wcd=007 -wcd=008 -wcd=013 -wcd=014 -wcd=086 -wcd=188 -wcd=367 -wcd=368 -wcd=369 -wcd=387 -wcd=389 -wcd=549 -wcd=628 -wcd=689 -wcd=716 -wcd=725 -wcd=726 -wcd=735
Open Watcom C++32 Optimizing Compiler Version 1.9
Portions Copyright (c) 1989-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.
See http://www.openwatcom.org/ for details.
WccTestApp.cpp: 661 lines, included 242201, no warnings, no errors
wlink @__wcl__.lnk
Open Watcom Linker Version 1.9
Portions Copyright (c) 1985-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.
See http://www.openwatcom.org/ for details.
loading object files
searching libraries
creating a Windows NT character-mode executable
1 file(s) copied.
1 file(s) copied.
Could Not Find c:\WorkEnv\AppsWorkDev\AppsTst\WccTestApp\*.err
WccTestApp - 0 error(s), 0 warning(s)
========== Build: 1 succeeded, 0 failed, 1 up-to-date, 0 skipped ==========
0 Kudos
SergeyKostrov
Valued Contributor II
2,315 Views
[ Watcom C++ compiler - Command line options - Comments ] I had some issues when the '-of+', '-oi+', and '-ol+' optimization options were used at the same time. I didn't try to optimize code for space, that is, with the '-os' option. A very interesting option is '-or' ( re-order instructions to avoid stalls ); a test-case will be needed in order to see how it works and what the possible performance improvements are. Another very interesting option is '-ob' ( branch prediction ).
0 Kudos
SergeyKostrov
Valued Contributor II
2,315 Views
[ Watcom Linker - detected problems ] No problems detected, but the Watcom Linker uses more than 1.5GB of memory during the final phase of generating 32-bit binaries, and even on fast PCs it takes a couple of minutes to create an executable. It is only my guess, but I think that the final phase of the Watcom Linker is very similar to Intel's IPO.
0 Kudos
SergeyKostrov
Valued Contributor II
2,315 Views
[ Support of Intel MKL libraries ]

The format of Watcom import / static libraries is incompatible with the format of Microsoft import / static libraries. In order to call several MKL functions used in scientific algorithms, the Watcom 'Wlib.exe' utility was used to generate import libraries in the Watcom Linker format. Here is the list of MKL DLLs I used to create Watcom Linker compatible import libraries:

mkl_rt.dll
mkl_core.dll
mkl_def.dll
mkl_p4.dll
mkl_sequential.dll
mkl_scalapack_core.dll

However, in production code only a 'LoadLibrary' based solution is used to call MKL functions because it is flexible, portable and very efficient.
0 Kudos
SergeyKostrov
Valued Contributor II
2,315 Views
[ Watcom Debugger ] It works well, but its user interface is very dated. I'll try to upload some screenshots later.
0 Kudos
SergeyKostrov
Valued Contributor II
2,315 Views
[ Processing when CrtDebugBreak() is used in Release or Debug Configurations ]

Note: CrtDebugBreak() function is also known as 'CCC3', or '3C3', or 'INT 3 -> RET'.

This is a processing output of a test-case when CrtDebugBreak() was called in order to see how 'atexit' function handles that call, or how it handles a fatal error in codes:

...
Application - WccTestApp - WIN32_WCC ( 32-bit ) - Release
Tests: Start
> Test0001 Start <
**********************************************
Configuration - WIN32_WCC ( 32-bit ) - Release
CTestSet::InitTestEnv - Passed
* CRuntimeSet Start *
> CRT Macros <
HrtPrefetchData< T0/T1/T2/NTA > - Passed
HrtClock - [ uiClock2 - uiClock1 ] Elapsed: 1.0000 sec
HrtRdtsc - [ uiClock2 - uiClock1 ] Elapsed: 1.0000 sec
HrtRdtsc - [ uiClock2 - uiClock1 ] Elapsed: 1594585452 clock cycles
Macro-Wrappers of HRT-Functions - Passed
IrtSetRoundingMode & CrtSetRoundingMode
IrtRdtsc - [ uiClock2 - uiClock1 ] Difference: 120513456 clock cycles
CrtRdtsc - [ uiClock2 - uiClock1 ] Difference: 122180876 clock cycles
IrtMalloc & CrtMalloc & IrtFree & CrtFree
IrtCalloc & CrtCalloc & IrtFree & CrtFree
IrtSfence & CrtSfence
IrtLfence & CrtLfence
IrtMfence & CrtMfence
IrtSetZeroPs128 & CrtSetZeroPs128
IrtSetZeroPd128 & CrtSetZeroPd128
IrtSetZeroSi128 & CrtSetZeroSi128
IrtSetZeroPs256 & CrtSetZeroPs256
IrtSetZeroPd256 & CrtSetZeroPd256
IrtSetZeroSi256 & CrtSetZeroSi256
Macro-Wrappers of IRT-Functions - Passed
Macro-Wrappers of CRT-Functions - Passed
Macro-Wrappers of QRT-Functions - Passed
Macro-Wrappers of PRT-Functions - Passed
SetDebugInfoLevel - Passed
GetDebugInfoLevel - Passed
SetMemoryTracerParams - Passed
DisplayMessage - Passed
The program encountered exception 0x80000003 at address 0x7c90120e and cannot continue.
Exception fielded by 0x0040ef00 EAX=0x00000018 EBX=0x0041727a ECX=0xffffffff EDX=0x00410736 ESI=0xf40dff5c EDI=0x1041fc18 EBP=0x1041fc56 ESP=0x1041fa14 EIP=0x7c90120e EFL=0x00000202 CS =0x0000001b SS =0x00000023 DS =0x00000023 ES =0x00000023 FS =0x0000003b GS =0x00000000 Stack dump (SS:ESP) 0x0040358d 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 Press any key to continue... It means that when debugging is needed a VS 'Just-In-Time' functionality is Not used and a different technique is needed to debug codes in Debug or Release Configurations.
0 Kudos
SergeyKostrov
Valued Contributor II
2,315 Views
[ Performance Evaluation (1) - Test code in C language ]

Here is a small generic test-case in C, used to evaluate the performance of the code generated by a C++ compiler:

...
RTuint64 uiClock1;
RTuint64 uiClock2;
RTint t;

// CrtDebugBreak();
// CrtDebugLabel( 0x5555 );
uiClock1 = CrtRdtsc();
for( t = 0; t < _RTNUMBER_OF_TESTS_0016777216; t += 1 )
{
    volatile RTfloat x = ( RTfloat )t;
    volatile RTfloat y = x * x * x;
}
uiClock2 = CrtRdtsc();
// CrtDebugLabel( 0x7777 );
...

Note: All commented-out code lines were uncommented in order to break into the Debugger and to capture the assembler code.
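For readers who want to try the loop without the ScaLib headers, here is a rough stand-alone sketch. It makes several assumptions: RTfloat is replaced by float, the standard clock() function is swapped in for CrtRdtsc() ( so the resolution is far coarser than a cycle counter ), and the two volatile variables are declared outside the loop so that the last result can be returned. RunCubeLoop is a hypothetical name.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-alone version of the test loop.
   Runs 'count' iterations of y = x * x * x and returns the last y.
   If 'elapsedTicks' is not NULL it receives the elapsed clock() ticks. */
float RunCubeLoop( int count, clock_t *elapsedTicks )
{
    /* volatile forces the compiler to keep the loads and stores,
       as in the original test-case. */
    volatile float x = 0.0f;
    volatile float y = 0.0f;
    clock_t start;
    clock_t end;
    int t;

    start = clock();        /* stand-in for CrtRdtsc() */
    for( t = 0; t < count; t += 1 )
    {
        x = ( float )t;
        y = x * x * x;
    }
    end = clock();          /* stand-in for CrtRdtsc() */

    if( elapsedTicks != NULL )
        *elapsedTicks = end - start;
    return y;
}
```

Calling RunCubeLoop( 16777216, &ticks ) repeats the 0x1000000 iterations seen in the disassembly, and dividing ticks by CLOCKS_PER_SEC gives the time in seconds. The numbers obtained this way are Not directly comparable with the RDTSC-based values in this thread.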
0 Kudos
SergeyKostrov
Valued Contributor II
2,315 Views
[ Watcom C++ compiler ]

...
00402ACC call dword ptr ds:[415120h] // CrtDebugBreak();
00402AD2 mov dword ptr [ebp+26h], 5555h // CrtDebugLabel( 0x5555 );
00402AD9 rdtsc // CrtRdtsc();
00402ADB mov ecx, eax
00402ADD mov ebx, edx
00402ADF xor eax, eax
00402AE1 fld dword ptr [ebp+2Ah]
00402AE4 fld dword ptr [ebp+46h]
00402AE7 mov dword ptr [ebp+72h], eax
00402AEA fild dword ptr [ebp+72h]
00402AED fst st(2)
00402AEF fmul st, st(2)
00402AF1 fmul st, st(2)
00402AF3 fstp st(1)
00402AF5 inc eax
00402AF6 cmp eax, 1000000h
00402AFB jl 00402AE7
00402AFD fstp dword ptr [ebp+46h]
00402B00 fstp dword ptr [ebp+2Ah]
00402B03 rdtsc // CrtRdtsc();
00402B05 mov dword ptr [ebp+2Eh], 7777h // CrtDebugLabel( 0x7777 );
00402B0C sub eax, ecx
00402B0E sbb edx, ebx
...

[ Output ]

...
CrtRdtsc - [ uiClock2 - uiClock1 ] Difference: 120554024 clock cycles
...
0 Kudos
SergeyKostrov
Valued Contributor II
2,315 Views
[ Microsoft C++ compiler ]

...
0024344D call dword ptr ds:[245030h] // CrtDebugBreak();
00243453 rdtsc // CrtRdtsc();
00243455 mov dword ptr [ebp-10h], eax
00243458 xor eax, eax
0024345A mov dword ptr [ebp-4], 5555h // CrtDebugLabel( 0x5555 );
00243461 mov ecx, edx
00243463 mov dword ptr [ebp-4], eax
00243466 jmp CRuntimeSet::RunTest+1C0h (243470h)
00243468 lea esp, [esp]
0024346F nop
00243470 fild dword ptr [ebp-4]
00243473 add eax, 1
00243476 cmp eax, 1000000h
0024347B fstp dword ptr [ebp-4]
0024347E fld dword ptr [ebp-4]
00243481 fmul dword ptr [ebp-4]
00243484 fmul dword ptr [ebp-4]
00243487 fstp dword ptr [ebp-4]
0024348A mov dword ptr [ebp-4], eax
0024348D jl CRuntimeSet::RunTest+1C0h (243470h)
0024348F rdtsc // CrtRdtsc();
00243491 sub eax, dword ptr [ebp-10h]
00243494 mov dword ptr [ebp-4], 7777h // CrtDebugLabel( 0x7777 );
0024349B sbb edx, ecx
...

[ Output ]

...
CrtRdtsc - [ uiClock2 - uiClock1 ] Difference: 186046772 clock cycles
...
0 Kudos
SergeyKostrov
Valued Contributor II
2,315 Views
[ Borland C++ compiler ]

...
00403022 call 00416E5A // CrtDebugBreak();
00403027 mov dword ptr [ebp-0A4h], 5555h // CrtDebugLabel( 0x5555 );
00403031 call 00405D34 // CrtRdtsc();
00403036 mov dword ptr [ebp-98h], eax
0040303C mov dword ptr [ebp-94h], edx
00403042 xor eax, eax
00403044 mov dword ptr [ebp-320h], eax
0040304A fild dword ptr [ebp-320h]
00403050 fstp dword ptr [ebp-0A8h]
00403056 fld dword ptr [ebp-0A8h]
0040305C fmul dword ptr [ebp-0A8h]
00403062 fmul dword ptr [ebp-0A8h]
00403068 fstp dword ptr [ebp-0ACh]
0040306E inc eax
0040306F cmp eax, 1000000h
00403074 jl 00403044
00403076 call 00405D34 // CrtRdtsc();
0040307B mov dword ptr [ebp-0A0h], eax
00403081 mov dword ptr [ebp-9Ch], edx
00403087 mov dword ptr [ebp-0B0h], 7777h // CrtDebugLabel( 0x7777 );
...

[ Output ]

...
CrtRdtsc - [ uiClock2 - uiClock1 ] Difference: 188474452 clock cycles
...
0 Kudos
Bernard
Valued Contributor I
2,315 Views

It seems that the VS compiler inserted an unconditional jump to CRuntimeSet::RunTest+1C0h (243470h). I suppose that this branch, which is not present in the machine code generated by Watcom, could be the reason for the slower performance of the MS compiler.

0 Kudos
SergeyKostrov
Valued Contributor II
2,315 Views
[ MinGW C++ compiler ]

...
004018F4 call dword ptr ds:[413220h] // CrtDebugBreak();
004018FA mov eax, 5555h // CrtDebugLabel( 0x5555 );
004018FF rdtsc // CrtRdtsc();
00401901 mov dword ptr [ebp-48h], eax
00401904 mov dword ptr [ebp-44h], edx
00401907 xor esi, esi
00401909 mov edi, dword ptr [ebp-48h]
0040190C mov ebx, dword ptr [ebp-44h]
0040190F nop
00401910 pxor xmm2, xmm2
00401914 cvtsi2ss xmm2, esi
00401918 add esi, 1
0040191B cmp esi, 1000000h
00401921 movss dword ptr [ebp-8Ch], xmm2
00401929 movss xmm6, dword ptr [ebp-8Ch]
00401931 movss xmm7, dword ptr [ebp-8Ch]
00401939 mulss xmm6, xmm7
0040193D movss xmm0, dword ptr [ebp-8Ch]
00401945 mulss xmm6, xmm0
00401949 movss dword ptr [ebp-88h], xmm6
00401951 jne _ZN11CRuntimeSet7RunTestEv+300h (401910h)
00401953 rdtsc // CrtRdtsc();
00401955 mov dword ptr [ebp-40h], eax
00401958 mov dword ptr [ebp-3Ch], edx
0040195B mov eax, dword ptr [ebp-40h]
0040195E mov edx, dword ptr [ebp-3Ch]
00401961 mov eax, 7777h // CrtDebugLabel( 0x7777 );
...

[ Output ]

...
CrtRdtsc - [ uiClock2 - uiClock1 ] Difference: 158981392 clock cycles
...
0 Kudos
Bernard
Valued Contributor I
2,315 Views

The Borland compiler produced almost the same assembly code and yet it is slower than Watcom. How many times did you run all those compiler-specific tests? I suppose that the Borland result was not averaged over enough runs.

0 Kudos
SergeyKostrov
Valued Contributor II
2,315 Views
[ Intel C++ compiler ]

...
00402035 call dword ptr ds:[41A00Ch] // CrtDebugBreak();
0040203B mov dword ptr [ebp-64h], 5555h // CrtDebugLabel( 0x5555 );
00402042 rdtsc // CrtRdtsc();
00402044 mov ecx, eax
00402046 mov esi, edx
00402048 xor eax, eax
0040204A cvtsi2ss xmm0, eax
0040204E movss dword ptr [ebp-30h], xmm0
00402053 inc eax
00402054 movss xmm3, dword ptr [ebp-30h]
00402059 cmp eax, 1000000h
0040205E movss xmm1, dword ptr [ebp-30h]
00402063 mulss xmm3, xmm1
00402067 movss xmm2, dword ptr [ebp-30h]
0040206C mulss xmm3, xmm2
00402070 movss dword ptr [ebp-2Ch], xmm3
00402075 jl CRuntimeSet::RunTest+1BAh (40204Ah)
00402077 rdtsc // CrtRdtsc();
00402079 add esp, 0FFFFFFF4h
0040207C sub eax, ecx
0040207E mov dword ptr [ebp-60h], 7777h // CrtDebugLabel( 0x7777 );
00402085 sbb edx, esi
...

[ Output ]

...
CrtRdtsc - [ uiClock2 - uiClock1 ] Difference: 150922384 clock cycles
...
0 Kudos
SergeyKostrov
Valued Contributor II
2,315 Views
[ Performance Evaluation (1) - Summary ]

1. Watcom C++ compiler - Test executed in: 120,554,024 clock cycles
2. Intel C++ compiler - Test executed in: 150,922,384 clock cycles
3. MinGW C++ compiler - Test executed in: 158,981,392 clock cycles
4. Microsoft C++ compiler - Test executed in: 186,046,772 clock cycles
5. Borland C++ compiler - Test executed in: 188,474,452 clock cycles

Note 1: The code generated by the Watcom C++ compiler completed the test ~20% faster than the code generated by the Intel C++ compiler.

Note 2: Take into account that the timings are in CPU clock cycles and that the CrtRdtsc() function was used to get these values. The values are different on every run. A value in clock cycles can easily be converted to nanoseconds, microseconds, milliseconds, etc: divide it by the base CPU frequency in Hz and multiply it by a normalizing constant. The Non-Deterministic nature of the SMT-based scheduler of a Windows operating system was clearly seen, and there is Nothing wrong here because this is how the scheduler was designed by David Cutler. It means that if the test-case is executed 10 times, then the last 5 or 6 digits ( from the right ) of the value in clock cycles will be different.

Note 3: David Cutler was a Lead Software Engineer at Microsoft more than 25 years ago and he is the "Father" of the SMT-based Windows NT scheduler.

Note 4: SMT is used here to mean Symmetric Multithreading. Be aware that the acronym more commonly stands for Simultaneous Multithreading, and that the symmetric multi-processor design of Windows NT is usually abbreviated as SMP ( Symmetric Multiprocessing ).
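The ~20% figure in Note 1 can be reproduced from the clock cycle counts in the summary. Below is a small sketch of the arithmetic; PercentFewerCycles is a hypothetical helper name, and the percentage is expressed relative to the slower result.

```c
/* Hypothetical helper: returns by how many percent 'fast' is below 'slow',
   relative to 'slow'. Both values are clock cycle counts and 'slow' must
   not be zero. */
double PercentFewerCycles( unsigned long long fast, unsigned long long slow )
{
    return 100.0 * ( double )( slow - fast ) / ( double )slow;
}
```

For Watcom against Intel, PercentFewerCycles( 120554024ULL, 150922384ULL ) comes out slightly above 20, which matches Note 1. The same call can be used for any other pair of results in the list.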
0 Kudos
SergeyKostrov
Valued Contributor II
2,311 Views
[ Non-Deterministic nature of an SMT-based scheduler of Windows OSs ]

See the comment about the Non-Deterministic nature of an SMT-based scheduler of Windows OSs in the previous post. This is what it looks like in reality when ten tests are completed and all measurements are expressed with nanosecond resolution:

...
Pass 01 - [ uiClock2 - uiClock1 ] Difference: 151734056 clock cycles
Pass 02 - [ uiClock2 - uiClock1 ] Difference: 151648204 clock cycles
Pass 03 - [ uiClock2 - uiClock1 ] Difference: 151881784 clock cycles
Pass 04 - [ uiClock2 - uiClock1 ] Difference: 151807612 clock cycles
Pass 05 - [ uiClock2 - uiClock1 ] Difference: 151679396 clock cycles
Pass 06 - [ uiClock2 - uiClock1 ] Difference: 151793996 clock cycles
Pass 07 - [ uiClock2 - uiClock1 ] Difference: 151711436 clock cycles
Pass 08 - [ uiClock2 - uiClock1 ] Difference: 151787256 clock cycles
Pass 09 - [ uiClock2 - uiClock1 ] Difference: 151846488 clock cycles
Pass 10 - [ uiClock2 - uiClock1 ] Difference: 151611644 clock cycles
...

For that set of test-cases a value of 1594500000 clock cycles equals:

1 second, or
1000 milliseconds, or
1000000 microseconds, or
1000000000 nanoseconds

Then, in nanoseconds the same test results look like:

...
Pass 01 - [ uiClock2 - uiClock1 ] Difference: 95160900 nanoseconds
Pass 02 - [ uiClock2 - uiClock1 ] Difference: 95107057 nanoseconds
Pass 03 - [ uiClock2 - uiClock1 ] Difference: 95253549 nanoseconds
Pass 04 - [ uiClock2 - uiClock1 ] Difference: 95207031 nanoseconds
Pass 05 - [ uiClock2 - uiClock1 ] Difference: 95126620 nanoseconds
Pass 06 - [ uiClock2 - uiClock1 ] Difference: 95198492 nanoseconds
Pass 07 - [ uiClock2 - uiClock1 ] Difference: 95146714 nanoseconds
Pass 08 - [ uiClock2 - uiClock1 ] Difference: 95194265 nanoseconds
Pass 09 - [ uiClock2 - uiClock1 ] Difference: 95231412 nanoseconds
Pass 10 - [ uiClock2 - uiClock1 ] Difference: 95084129 nanoseconds
...

Note 1: For example, in the case of 'Pass 01' the value of 95160900 nanoseconds ( 0.095160900 seconds ) is calculated as follows:

151734056 cc * 1000000000 ns / 1594500000 cc ~= 95160900 ns

where 'cc' stands for 'clock cycles' and 'ns' stands for 'nanoseconds'.
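The conversion from Note 1 can be sketched in C as follows. CyclesToNanoseconds is a hypothetical name, the result is truncated to whole nanoseconds as in the listing above, and the value is split into whole seconds and a remainder so that the 64-bit intermediate product does not overflow for realistic CPU frequencies.

```c
#include <stdint.h>

/* Hypothetical helper: converts a clock cycle count to nanoseconds.
   'cyclesPerSecond' is the number of clock cycles counted in one second
   ( 1594500000 in this thread ); it must be non-zero and small enough that
   cyclesPerSecond * 1000000000 fits in 64 bits. */
uint64_t CyclesToNanoseconds( uint64_t cycles, uint64_t cyclesPerSecond )
{
    uint64_t wholeSeconds = cycles / cyclesPerSecond;
    uint64_t remainder    = cycles % cyclesPerSecond;

    /* Whole seconds are scaled directly; only the remainder goes through
       the multiply-then-divide step, and it is always below
       cyclesPerSecond. */
    return wholeSeconds * UINT64_C( 1000000000 )
         + remainder * UINT64_C( 1000000000 ) / cyclesPerSecond;
}
```

With the numbers from 'Pass 01', CyclesToNanoseconds( 151734056, 1594500000 ) returns 95160900, the same value as in the listing.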
0 Kudos
SergeyKostrov
Valued Contributor II
2,311 Views
[ Memory Leaks Detection - WCC ]

Note: As you can see file names and line numbers are Not displayed.

...
Tests: Completed
* Memory Block: 0 * ...(0) Memory Block State: 3 - Released
* Memory Block: 1 * ...(0) Memory Block State: 3 - Released
* Memory Block: 2 * ...(0) Memory Block State: 3 - Released
Memory Blocks Allocated : 3
Memory Blocks Released : 3
Memory Blocks NOT Released: 0
Memory Tracer Integrity Verified - Memory Leaks NOT Detected
Deallocating Memory Tracer Data Table Completed
...
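To illustrate the kind of bookkeeping behind such a report, here is a very small allocation tracer sketch. It is Not the project's Memory Tracer: TracedMalloc, TracedFree, TracedLeakCount and TRACER_MAX_BLOCKS are hypothetical names, the table has a fixed size, and, like the output above, it records no file names or line numbers.

```c
#include <stdlib.h>

#define TRACER_MAX_BLOCKS 64    /* hypothetical fixed table size */

static void *g_blocks[TRACER_MAX_BLOCKS];    /* address of every traced block */
static int   g_released[TRACER_MAX_BLOCKS];  /* 1 when the block was released */
static int   g_allocated = 0;                /* number of table entries used  */

/* Allocates a block and records it in the table. Returns NULL when the
   table is full or when malloc fails. */
void *TracedMalloc( size_t size )
{
    void *p;

    if( g_allocated >= TRACER_MAX_BLOCKS )
        return NULL;
    p = malloc( size );
    if( p != NULL )
    {
        g_blocks[g_allocated] = p;
        g_released[g_allocated] = 0;
        g_allocated += 1;
    }
    return p;
}

/* Releases a traced block and marks its entry as released.
   Unknown or already released pointers are ignored. */
void TracedFree( void *p )
{
    int i;

    if( p == NULL )
        return;
    for( i = 0; i < g_allocated; i += 1 )
    {
        if( g_blocks[i] == p && !g_released[i] )
        {
            g_released[i] = 1;
            free( p );
            return;
        }
    }
}

/* Returns the number of blocks that were allocated but not yet released. */
int TracedLeakCount( void )
{
    int i;
    int leaks = 0;

    for( i = 0; i < g_allocated; i += 1 )
    {
        if( !g_released[i] )
            leaks += 1;
    }
    return leaks;
}
```

Allocating three blocks with TracedMalloc and releasing all of them with TracedFree leaves TracedLeakCount() at 0, which corresponds to the "Memory Blocks NOT Released: 0" line in the report.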
0 Kudos