Integration of Open Watcom C++ compiler - details, performance evaluation, etc

SergeyKostrov · ‎01-10-2016

*** Integration of Open Watcom C++ compiler - details, performance evaluation, etc ***

Welcome Back, Open Watcom C++ compiler!

At the end of 2015 a decision was made to integrate Open Watcom C++ compiler v1.9 with a project I've been working on since 2009. I used the Watcom C++ compiler in the mid-1990s ( last century! ) and I know how superior it is when it comes to optimization of C and C++ code.

Honestly, I was concerned about the timing of the integration: it was the end of the year and Christmas was almost knocking at the door ( just two weeks before December 24th ). However, a significant portion of the integration was completed in about 6 hours, and I managed to compile the C/C++ sources and execute some test-cases.

Even though work is still in progress on stabilizing the code and solving some small technical problems, I can say that the Legendary Watcom C++ compiler is Not at the top of the list of Modern optimizing C/C++ compilers. First of all, version 1.9 is 32-bit only and does Not fully support, or does Not support At All, some Hot-Modern technologies. There is No support for SSE2, SSE4.x, AVX, AVX2 or FMA instructions, OpenMP, Intel intrinsic functions, etc.

But don't be too frustrated: the Open Watcom C++ compiler team is working, this is an Open Source Project now, and I hope that a new version of the Open Watcom C++ compiler will be released in the future.

I will follow up later with more technical details and performance evaluation numbers on a set of scientific algorithms. I will demonstrate how good the Open Watcom C++ compiler is compared to the Borland, MinGW, Microsoft, Intel and Turbo C++ compilers.

SergeyKostrov · ‎01-14-2016

[ Memory Leaks Detection - MinGW ]

Note: This is an example of a correct report and, as you can see, file names and line numbers are displayed.

...
Tests: Completed
* Memory Block: 0 * ../../AppsSca/ScaLib/CommonSet.cpp(383) Memory Block State: 3 - Released
* Memory Block: 1 * ../../AppsSca/ScaLib/Test/RuntimeSetTest.cpp(685) Memory Block State: 3 - Released
* Memory Block: 2 * ../../AppsSca/ScaLib/Test/RuntimeSetTest.cpp(713) Memory Block State: 3 - Released
Memory Blocks Allocated : 3
Memory Blocks Released : 3
Memory Blocks NOT Released: 0
Memory Tracer Integrity Verified - Memory Leaks NOT Detected
Deallocating Memory Tracer Data Table
Completed
...

SergeyKostrov · ‎01-14-2016

[ Watcom Pragma Aux Inline Assembler ( PAIA ) ]

Note 1: PAIA is an abbreviation I started to use as soon as I realized what Pragma Aux Inline Assembler is. It is Not an abbreviation from Watcom or Sybase Corporations, or from the Open Watcom C++ compiler team.

Attention, Assembler Language developers! Pragma Aux Inline Assembler ( PAIA ) is a very Powerful feature of the Watcom C++ compiler. Honestly, I didn't even expect to see it, and it was a great discovery. When special functions that depend on particular CPU instruction sets are needed, they can easily be implemented using PAIA. For example, I implemented several generic and fundamental special functions right after the initial integration of the Watcom C++ compiler was completed. Here are some of them:

Prefetch
Sqrt
Rdtsc
Nop
Hlt
Pause
Sfence
Lfence
Mfence
SetZeroPs128
SetZeroPd128
SetZeroSi128
SetZeroPs256
SetZeroPd256
SetZeroSi256

This is a test-case, implemented in C, for the HrtPrefetchData< T0/T1/T2/NTA > special functions, which wrap the Prefetch< T0/T1/T2/NTA > instructions using Pragma Aux Inline Assembler:

RTint main( RTvoid )
{
    ...
    RTint piAddress[ 128 ] = { 0x0 };

    HrtPrefetchDataT0( ( RTchar * )&piAddress[0] );
    HrtPrefetchDataT1( ( RTchar * )&piAddress[0] );
    HrtPrefetchDataT2( ( RTchar * )&piAddress[0] );
    HrtPrefetchDataNTA( ( RTchar * )&piAddress[0] );
    ...
    return ( RTint )0;
}

This is what the Prefetch< T0/T1/T2/NTA > instructions look like in a Debugger, in the binary code generated by the Watcom C++ compiler:

...
// HrtPrefetchDataT0( ( RTchar * )&piAddress[0] );
8D 85 D0 F6 FF FF    lea eax, [ piAddress ]
0F 18 08             prefetcht0 [ eax ]
// HrtPrefetchDataT1( ( RTchar * )&piAddress[0] );
8D 85 D0 F6 FF FF    lea eax, [ piAddress ]
0F 18 10             prefetcht1 [ eax ]
// HrtPrefetchDataT2( ( RTchar * )&piAddress[0] );
8D 85 D0 F6 FF FF    lea eax, [ piAddress ]
0F 18 18             prefetcht2 [ eax ]
// HrtPrefetchDataNTA( ( RTchar * )&piAddress[0] );
8D 85 D0 F6 FF FF    lea eax, [ piAddress ]
0F 18 00             prefetchnta [ eax ]
...

Everything is correct, as you can see, and it matches the binary code that modern MinGW, Microsoft and Intel C++ compilers generate for the same test-case. In these special functions, as compiled by the Watcom C++ compiler, there is No Junk binary code, regardless of which optimization options were used on the compiler command line to build executables for Debug and Release configurations.

Another very important point is that modern MinGW, Microsoft and Intel C++ compilers have 'Built-In' support for almost all Instruction Set Architectures ( aka ISA ), whereas the Watcom C++ compiler doesn't. But PAIA allows a Software Engineer to implement what is needed in All special cases.
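For comparison, here is a minimal sketch of how the same four wrappers might look on a GCC-based compiler such as MinGW, where no inline assembler is needed. The function names are borrowed from the test-case above, but the bodies are my assumption and not the project's actual implementation. The sketch relies on the GCC builtin '__builtin_prefetch( addr, rw, locality )'; on x86 targets with SSE enabled, locality values 3, 2, 1 and 0 are typically compiled to prefetcht0, prefetcht1, prefetcht2 and prefetchnta.

```cpp
// Hypothetical GCC/MinGW counterparts of the HrtPrefetchData* wrappers.
// Second argument 0 = the data is expected to be read, not written.
// Third argument = temporal locality hint, 3 ( keep in all cache levels )
// down to 0 ( non-temporal ).

static inline void HrtPrefetchDataT0( const char *pAddress )
{
    __builtin_prefetch( pAddress, 0, 3 );
}

static inline void HrtPrefetchDataT1( const char *pAddress )
{
    __builtin_prefetch( pAddress, 0, 2 );
}

static inline void HrtPrefetchDataT2( const char *pAddress )
{
    __builtin_prefetch( pAddress, 0, 1 );
}

static inline void HrtPrefetchDataNTA( const char *pAddress )
{
    __builtin_prefetch( pAddress, 0, 0 );
}
```

A prefetch is only a hint, so these calls never change the data they point at; whether an instruction is actually emitted depends on the target and the compiler options.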

SergeyKostrov · ‎01-14-2016

[ Watcom Pragma Aux Inline Assembler ( PAIA ) - Example 1 ]

For example, this is how the '_m_empty' intrinsic function is declared in the 'mmintrin.h' header file:

...
void _m_empty( void );
...
#pragma aux _m_empty = \
    ".586" \
    "emms"

This is how the '_m_empty' intrinsic function is used:

#include ...

void main( void )
{
    _m_empty();
}

As I've already mentioned, it is a very Powerful feature of the Watcom C++ compiler, because input arguments can be passed to a PAIA-based function and the function can also return a value of any type.

SergeyKostrov · ‎01-14-2016

[ Watcom Pragma Aux Inline Assembler ( PAIA ) - Example 2 ]

Attention, C and C++ Language developers! Another useful feature is that PAIA can be used to implement methods of a C structure or a C++ class. Here is a small example I created in order to test how it works:

...
#pragma aux FunctionAsm = \
    "xor eax, eax" \
    "add eax, edx" \
    "add eax, ebx" \
    parm [ eax ][ edx ][ ebx ] \
    value [ eax ] \
    ;
...
struct TestObj
{
    RTint __pragma( "FunctionAsm" ) Function( RTint a, RTint b );
};
...
RTint main( RTvoid )
{
    TestObj tObj;

    RTint iValue = tObj.Function( 11, 11 );
    CrtPrintf( RTU("Value=%ld\n"), ( RTint )iValue );

    return ( RTint )0;
}
...

SergeyKostrov · ‎01-14-2016

[ Watcom Pragma Aux Inline Assembler ( PAIA ) vs. Borland '__emit__(...)' statement ]

Pragma Aux Inline Assembler ( PAIA ) is a more Powerful feature than the similar feature created by Borland Corporation for its Borland and Turbo C++ compilers. Borland's '__emit__(...)' statement has more limitations, especially when arguments need to be passed to '__emit__(...)'-based functions.

Another disadvantage of '__emit__(...)'-based functions is that they are Not inlined, so there is always a call to the actual body of the function. These calls create additional performance overhead, and this is a negative thing for HPC applications.

Note: HPC stands for High Performance Computing.

SergeyKostrov · ‎01-14-2016

[ Watcom C++ compiler - Intel and AMD intrinsic functions support ]

There is only partial support for the Intel 64-bit, aka 'MMX', intrinsic functions declared in the 'mmintrin.h' header file. Here is a list of headers that "represent" different groups of intrinsic functions; 'No' means that the group is Not supported:

emmintrin.h    No
immintrin.h    No
intrin.h       No
mm3dnow.h      No
mmintrin.h     Yes ( Incomplete )
nmmintrin.h    No
pmmintrin.h    No
smmintrin.h    No
tmmintrin.h    No
wmmintrin.h    No
xmmintrin.h    No
zmmintrin.h    No

SergeyKostrov · ‎01-14-2016

[ Additional information for Intel and AMD headers with intrinsic functions ]

I know that "bare" names of header files could confuse even experienced Software Engineers or Software Developers, because there are more than ten different header files with intrinsic functions. Short descriptions of the headers are provided below, and I reordered the list:

xmmintrin.h    Principal header file for SSE intrinsics
intrin.h       Definitions and Declarations for platform specific intrinsics
mmintrin.h     Definitions and Declarations for use with compiler intrinsics ( MMX )
mm3dnow.h      AMD(R) 3D Now! intrinsics
emmintrin.h    Intel(R) SSE2 intrinsics
pmmintrin.h    Intel(R) SSE3 intrinsics
smmintrin.h    Intel(R) SSE4.1 intrinsics
nmmintrin.h    Intel(R) SSE4.2 intrinsics
tmmintrin.h    Intel(R) SSSE3 intrinsics ( HPI )
wmmintrin.h    Intel(R) AES intrinsics
immintrin.h    Intel(R) AVX intrinsics
zmmintrin.h    Intel(R) AVX2 intrinsics

Abbreviations are as follows:

SIMD  - Single Instruction Multiple Data
SSE   - Streaming SIMD Extensions
SSSE3 - Supplemental Streaming SIMD Extensions 3
HPI   - Horizontally Packed Intrinsics
AES   - Advanced Encryption Standard
AVX   - Advanced Vector Extensions
AVX2  - Advanced Vector Extensions 2

SergeyKostrov · ‎01-14-2016

[ Watcom C++ compiler - Incomplete support of Intel intrinsic functions in 'mmintrin.h' ]

Support is Incomplete because the Intel intrinsic functions from the group named 'Utility intrinsics' are Not declared. That group should contain 12 more intrinsic functions, and here they are:

...
/* Utility intrinsics */
__m64 __ICL_INTRINCC _mm_setzero_si64();
__m64 __ICL_INTRINCC _mm_set_pi32(int, int);
__m64 __ICL_INTRINCC _mm_set_pi16(short, short, short, short);
__m64 __ICL_INTRINCC _mm_set_pi8(char, char, char, char, char, char, char, char);
__m64 __ICL_INTRINCC _mm_set1_pi32(int);
__m64 __ICL_INTRINCC _mm_set1_pi16(short);
__m64 __ICL_INTRINCC _mm_set1_pi8(char);
__m64 __ICL_INTRINCC _mm_setr_pi32(int, int);
__m64 __ICL_INTRINCC _mm_setr_pi16(short, short, short, short);
__m64 __ICL_INTRINCC _mm_setr_pi8(char, char, char, char, char, char, char, char);
__m64 __ICL_INTRINCC _m_from_int64(__int64);
__int64 __ICL_INTRINCC _m_to_int64(__m64);
...

SergeyKostrov · ‎01-14-2016

[ Regarding a Lack of Support of modern Intel and AMD Instruction Sets and Intrinsic functions ]

Take into account that support for Any (!) 32-bit Intel or AMD intrinsic function could easily be added using the Pragma Aux Inline Assembler of the Watcom C++ compiler ( 32-bit only, as I've mentioned before ).

However, a major problem is that the gap between modern C++ compilers and the Watcom C++ compiler is significant when it comes to supporting Intel or AMD Instruction Set Architectures ( ISA ) released after the year 2000. So, as soon as there is complete support for all ISAs and SIMD technologies, that is 3D Now!, SSE, SSE2, SSE3, SSE4.x, HPI, AES, AVX, AVX2, etc, the Watcom C++ compiler will be more competitive, because that support could finally be qualified as 'Built-In'. Even though AVX or AVX2 instructions could be implemented with Pragma Aux Inline Assembler right now, without (!) modifications of the Watcom C++ compiler, that support can only be qualified as 'Declared', or as 'Not-Built-In', at the moment.

As a matter of fact, many Software Developers and Software Engineers try to use intrinsic functions as the solution to All performance problems. However, some algorithmic solutions, even if they are implemented in C without any intrinsic functions, can provide better performance improvements.

To finalize my comments, application of Intel or AMD intrinsic functions is Not a 'Panacea-Of-All-Performance-Problems'. In reality, if somebody has a 'Performance Problem', the advice typically given is:

'Try to Re-Implement codes with application of intrinsic functions'

instead of this advice:

'Try to use a Better Algorithm which solves the problem 2x, 4x, or 10x faster, and then improve its performance even more by applying Vectorization techniques or intrinsic functions'

The second advice is preferable, also because Raw application of Intel & AMD intrinsic functions makes C and C++ source code Non Portable.

SergeyKostrov · ‎01-14-2016

[ Watcom C++ compiler - Vectorization support ]

No. There is No support for code Vectorization.

See also the final statement of the previous post. Application of Vectorization is Not a 'Panacea-Of-All-Performance-Problems'.

SergeyKostrov · ‎01-14-2016

[ Watcom C++ compiler - OpenMP support ]

No. There is No support for OpenMP.

SergeyKostrov · ‎01-14-2016

[ Watcom C++ compiler - C++11 support ]

No. There is No support for the C++11 Standard of the C++ language.

SergeyKostrov · ‎01-14-2016

[ Watcom C++ compiler - C++ memory management operators ]

C++ operators 'new' and 'delete' are of course supported, but there is a significant deviation from a legacy C++ Standard: in Debug configuration the '__FILE__' and '__LINE__' macros can Not be passed as additional arguments to the C++ operators 'new' and 'delete'. It means that the highly portable Memory Leaks Detection subsystem currently used in the project doesn't record in which C or C++ source file, and on which line, memory was allocated or released.

This is what it looks like in reality and, as you can see, file names and line numbers are Not displayed:

...
Tests: Completed
* Memory Block: 0 * ...(0) Memory Block State: 3 - Released
* Memory Block: 1 * ...(0) Memory Block State: 3 - Released
* Memory Block: 2 * ...(0) Memory Block State: 3 - Released
Memory Blocks Allocated : 3
Memory Blocks Released : 3
Memory Blocks NOT Released: 0
Memory Tracer Integrity Verified - Memory Leaks NOT Detected
Deallocating Memory Tracer Data Table
Completed
...

Anyway, the report shows that all allocated memory blocks are released.
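For readers who haven't seen the technique, here is a minimal sketch of what passing '__FILE__' and '__LINE__' to operator 'new' usually looks like on compilers that accept it. This is Not the project's Memory Leaks Detection subsystem: 'HrtAllocRecord', 'g_lastAlloc' and 'HRT_TRACED_NEW' are hypothetical names, the sketch only remembers the most recent allocation, and it is written for a current C++ compiler ( it uses 'noexcept' ).

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical record of the most recent traced allocation.
struct HrtAllocRecord
{
    const char *file;
    int         line;
    std::size_t size;
};

static HrtAllocRecord g_lastAlloc = { "...", 0, 0 };

// Replace the plain global operators so that every block, traced or not,
// comes from malloc and goes back through free.
void *operator new( std::size_t size )
{
    void *p = std::malloc( size != 0 ? size : 1 );
    if( p == 0 )
        throw std::bad_alloc();
    return p;
}

void operator delete( void *p ) noexcept
{
    std::free( p );
}

// Placement form: the two extra arguments carry the source location.
void *operator new( std::size_t size, const char *file, int line )
{
    void *p = ::operator new( size );
    g_lastAlloc.file = file;
    g_lastAlloc.line = line;
    g_lastAlloc.size = size;
    return p;
}

// Matching placement delete; the compiler calls it only when a
// constructor throws during a traced 'new' expression.
void operator delete( void *p, const char *, int ) noexcept
{
    ::operator delete( p );
}

// Debug-only shorthand ( hypothetical name ).
#define HRT_TRACED_NEW new( __FILE__, __LINE__ )
```

With this in place, 'int *p = HRT_TRACED_NEW int( 7 );' records the file and line of the allocation, and a plain 'delete p;' releases the block. Array forms ( 'new[]' and 'delete[]' ) would need the same treatment.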

SergeyKostrov · ‎01-14-2016

[ Watcom C++ compiler - C Run-Time memory management functions ]

CRT-functions 'malloc'&'free', 'calloc'&'free' and 'alloca' are supported, but there is No support for CRT-functions which allocate aligned memory blocks. However, CRT memory management functions capable of allocating aligned memory blocks can easily be implemented, and I used the same solution that I've used for Borland, Turbo and early versions of MinGW C++ compilers.

Bernard · ‎01-14-2016

>>>See also a final statement of the previous post. Application of Vectorization is Not: 'Panacea-Of-All-Performance-Problems'.>>>

That is understood, insofar as the data set or the specific domain is not easily vectorisable.

SergeyKostrov · ‎01-16-2016

[ To Alexander ]

>>...
>>...Say, llvm ( as Intel Compiler ) already offer OpenCL support. And when
>>you suppose to add this feature to Open Watcom?

I don't work on the Open Watcom project, and I think the Open Watcom team doesn't have OpenCL support on its list of high priority tasks. I personally wouldn't vote for the support of C++11 and OpenCL technologies.

SergeyKostrov · ‎01-16-2016

[ To Alexander ]

>>...
>>...You sure that won't be a waste of time and other resources?

No, under no circumstances. The project I've been working on is a C library, with a very thin C++ layer, of Highly Portable scientific algorithms for HPC. Six different C/C++ compilers are supported as of today, including Open Watcom, and the total number of versions and updates for these C/C++ compilers is 12 ( plus 2 pending ):

/*
...
// Note 02: Visual C++: VS98 & VS2005 PE & VS2008 PE & VS2010 EE & VS2012 EE
...
01 Visual C++
02 MinGW       v3.4.2
03 MinGW       v4.8.1
04 MinGW       v4.9.0
05 MinGW       v4.9.2
06 MinGW       v5.1.0                Pending
07 Borland C++ v5.5.1
08 Turbo C++   v3.0.0
09 Intel C++   v7.1.x  Update 029
10 Intel C++   v8.1.x  Update 038
11 Intel C++   v12.1.x Update 7
12 Intel C++   v13.1.x Update 2
13 Watcom C++  v1.9.0
14 Watcom C++  v2.0.0                Pending ( 64-bit )
...
*/

Every integration of a new C/C++ compiler ( usually once in a year or two ) improves the overall quality of the code and helps to identify Not-So-Good-Implemented-Solutions, even if they are highly portable. However, the process of integration is very, very time consuming. The integration of the Watcom C++ compiler is No exception, and two small core subsystems of the library have been improved already.

SergeyKostrov · ‎01-16-2016

[ To Alexander ]

In the case of updates the situation is very different, because it is a less time consuming process:

Microsoft compilers are updated if an update for a VS is available on Windows Update. To my surprise, a couple of updates for VS 2005 PE were released during the last couple of years. Overall, VS 2005 & 2008 PEs are very stable, "light" and support development for Windows CE operating systems. There are No plans to upgrade to "Macrosoft" Visual Studios 2014 or 2015, etc. They are simply "monsters". In the case of Express Editions there are No complaints ( these Editions of VS serve as a kind of Verificator ).

Installation of updates for MinGW is a very pleasant process now. Really!

Borland compilers are in a "No-Updates-At-All" state, and workarounds have been found for all internal problems detected in the past, I mean compiler bugs ( especially in Turbo C++ ).

Intel has not been updated since version 13, that is since 2014, because the latest versions of the compiler do not support VS 2008 Professional Edition and, as far as I remember, standalone installations, without VS, can't be done.

A couple more things. Turbo C++ is the Primary "Normalizer" ( a Verificator ) of C and C++ features. If some code can not be compiled with Turbo C++ then it has to be simplified. That is why C++11 features will never be allowed on the project. For many of us C++11 is a performance "killer". The same applies to STL and many 3rd-party C++ libraries. Even if STL is a good library, C++ overheads affect performance significantly, and this is clearly seen when it comes to processing with boosted thread priorities, like Above Normal or Time Critical, and to measurements of time intervals with an accuracy of microseconds or less ( hundreds of nanoseconds ).

SergeyKostrov · ‎01-16-2016

Verification of versions for C++ Standards:

...
// Version of C++ Standard
//
// Note 01: These values are possible for the '__cplusplus' macro:
//
//      1        C++ compiler is qualified as Legacy
//      199711L  C++ compiler is qualified as Modern
//      201103L  C++ compiler is qualified as Modern ( C++11 )
...

Here is a summary:

...
# Diagnostics: __cplusplus = 199711L    // MGW - MinGW
# Diagnostics: __cplusplus = 199711L    // MSC - Microsoft
# Diagnostics: __cplusplus = 199711L    // ICC - Intel
# Diagnostics: __cplusplus = 1          // TCC - Borland
# Diagnostics: __cplusplus = 1          // BCC - Borland
# Diagnostics: __cplusplus = 1          // WCC - Watcom
...

The newest versions of modern C++ compilers should show:

...
# Diagnostics: __cplusplus = 201103L
...

SergeyKostrov · ‎01-16-2016

Regarding information about Open Watcom C++ compiler on http://en.wikipedia.org: Many web-links on http://en.wikipedia.org/wiki/Open_Watcom are broken!

SergeyKostrov · ‎01-16-2016

There is an 'Open Watcom V2 Fork' at http://open-watcom.github.io/open-watcom and this is an update from its page:

...
Bellow is list of main differences against Open Watcom 1.9

- New 2-phase build system, OW can be build by platform native C/C++ compiler or by itself
- Code generator properly initialize pointers by DLL symbol addresses
- DOS version of tools now support long file names (LFN) if appropriate LFN driver is loaded by DOS
- OW is ported to 64-bit hosts ( WIN64, Linux X64 )
- Librarian support X64 CPU object modules and libraries
- RDOS 32-bit C run-time compact memory model libraries are fixed
- Resource compiler and Resource editors support WIN64 executables
- OW text editor is now self containing, it can be used as standalone tool without any requirements for any additional files or configuration
- Broken C++ compiler pre-compiled header template support is fixed
- Many C++ compiler crashes are fixed
- Debugger has no length limit for any used environment variable
...