Software Problems and Issues related to Legacy software

SergeyKostrov
Valued Contributor II
[ Abstract ] Legacy Software ( LS ) will always be used by some software engineers and developers, even though large software companies such as Microsoft, Intel, NVIDIA and Oracle try to force upgrades to the latest versions of their software. Sometimes an upgrade is not possible and a customer keeps using LS for internal reasons. Examples of some problems with LS are given below, along with possible workarounds, since it is not possible to get a fix or an update for some versions of LS.
[ Problem #1 - Inefficient binary code generation when using the '_mm_mul_ps' intrinsic function ]

[ C code ]
...
_RTALIGNED RTfloat *mfA;
_RTALIGNED RTfloat *mfB;
_RTALIGNED RTfloat fR[4];
_RTALIGNED __m128 *mfR = ( __m128 * )&fR;

mfA = ( RTfloat * )&fA[ii][kk];
mfB = ( RTfloat * )&fB[jj][kk];
*mfR = _mm_mul_ps( *(( const __m128 * )mfA), *(( const __m128 * )mfB) );
...

[ Microsoft C++ compiler ( VS 2005 ) - Test result ]
...
Sub-Test 3.2 - Completed: 0.81200 secs
...
Note: ~3.64x faster than the code generated by the Intel C++ compiler

[ Intel C++ compiler v12.1.7 ( u371 ) - Test result ]
...
Sub-Test 3.2 - Completed: 2.95300 secs
...
Note: ~3.64x slower than the code generated by the Microsoft C++ compiler

[ Microsoft C++ compiler ( VS 2005 ) - Disassembled code ]
...
0040BA32 movaps xmm0, xmmword ptr [edx]
0040BA35 movaps xmm2, xmmword ptr [ecx]
0040BA38 mulps xmm0, xmm2
...

[ Intel C++ compiler v12.1.7 ( u371 ) - Disassembled code ]
...
004012FA movaps xmm1, xmmword ptr [ecx+ebx*4]
004012FE movaps xmm2, xmm0
00401301 mulps xmm1, xmmword ptr [eax+ebx*4]
...
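For comparison, the multiplication from Problem #1 can also be written with explicit aligned loads and an explicit store instead of pointer casts. This is only a minimal sketch: the helper names `is_aligned16` and `mul4_ps` are hypothetical, `_RTALIGNED` is assumed to request 16-byte alignment, and whether this form changes the code generated by Intel C++ compiler v12.1.7 has not been verified.

```cpp
#include <xmmintrin.h>  // SSE: _mm_load_ps, _mm_mul_ps, _mm_store_ps
#include <cassert>
#include <cstddef>

// Returns true when p sits on a 16-byte boundary, as movaps requires.
inline bool is_aligned16(const void* p)
{
    return (reinterpret_cast<std::size_t>(p) & 15u) == 0;
}

// r[0..3] = a[0..3] * b[0..3], using explicit aligned loads and an aligned store.
// All three pointers must be 16-byte aligned (hypothetical helper, not from the post).
inline void mul4_ps(const float* a, const float* b, float* r)
{
    assert(is_aligned16(a) && is_aligned16(b) && is_aligned16(r));
    const __m128 va = _mm_load_ps(a);      // aligned load of four floats
    const __m128 vb = _mm_load_ps(b);
    _mm_store_ps(r, _mm_mul_ps(va, vb));   // packed multiply, aligned store
}
```

A caller would pass rows of 16-byte aligned arrays, for example `mul4_ps( &fA[ii][kk], &fB[jj][kk], fR )`, provided `kk` is a multiple of 4 so that each address stays aligned.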
[ Problem #2 - No declaration for _mm_cvtss_f32 intrinsic function in VS 2005 PE ]

There is no declaration for the _mm_cvtss_f32 intrinsic function in VS 2005 Professional Edition ( PE ). VS 2005 PE supports the SSE and SSE2 instruction sets, but for some unexplained reason the function is not declared in the xmmintrin.h header file. A good example of where the function could be used is a very useful Horizontal Addition function for the SSE2 instruction set:
...
/* Horizontal Addition */
float add_horizontal( const F32vec4 &a )
{
    F32vec4 ftemp = _mm_add_ps( a, _mm_movehl_ps( a, a ));
    ftemp = _mm_add_ss( ftemp, _mm_shuffle_ps( ftemp, ftemp, 1 ) );
    return _mm_cvtss_f32( ftemp );
}
...
[ Problem #2 - Workaround ]

The workaround is very simple: read the lowest element of the result directly instead of calling _mm_cvtss_f32:
...
/* Horizontal Addition */
float add_horizontal( const F32vec4 &a )
{
    F32vec4 ftemp = _mm_add_ps( a, _mm_movehl_ps( a, a ));
    ftemp = _mm_add_ss( ftemp, _mm_shuffle_ps( ftemp, ftemp, 1 ) );
    return ( float )ftemp.m128_f32[0];
}
...
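The `m128_f32` member is specific to the Microsoft definition of `__m128`, so the workaround for Problem #2 is tied to that compiler. A possible alternative, sketched here under assumptions, stores the lowest lane with `_mm_store_ss`, a basic SSE intrinsic declared in xmmintrin.h. The function name `add_horizontal_store` is hypothetical, and the sketch takes a raw `__m128` rather than `F32vec4` so that it does not depend on fvec.h; it has not been tried with VS 2005 PE.

```cpp
#include <xmmintrin.h>  // SSE: _mm_add_ps, _mm_movehl_ps, _mm_add_ss, _mm_shuffle_ps, _mm_store_ss
#include <cassert>

// Horizontal addition of the four floats in v (hypothetical name, not from the post).
inline float add_horizontal_store(__m128 v)
{
    // Lanes 0 and 1 become v0+v2 and v1+v3.
    __m128 t = _mm_add_ps(v, _mm_movehl_ps(v, v));
    // Lane 0 becomes (v0+v2) + (v1+v3).
    t = _mm_add_ss(t, _mm_shuffle_ps(t, t, 1));
    float r;
    _mm_store_ss(&r, t);  // copy lane 0 to memory; stands in for _mm_cvtss_f32
    return r;
}

// Small self-check: 1 + 2 + 3 + 4 == 10, which is exact in float.
inline void add_horizontal_selfcheck()
{
    const __m128 v = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);  // lanes 0..3 hold 1, 2, 3, 4
    assert(add_horizontal_store(v) == 10.0f);
}
```

The store goes through memory, so it may cost slightly more than a register-only conversion; for a function called once per reduction that is unlikely to matter.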
[ Problem #3 - Internal Linker Error of Intel C++ Compiler version 12 ( 32-bit ) when option /Qfnalign:16 is used ]
...
Compiling with Intel(R) C++ Compiler XE 12.1.7.371 [IA-32]... (Intel C++ Environment)
IccTestApp.cpp
Linking... (Intel C++ Environment)
...
(0): internal error: backend signals
xilink: error #10014: problem during multi-file optimization compilation ( code 4 )
xilink: error #10014: problem during multi-file optimization compilation ( code 4 )
...
IccTestApp - 3 error(s), 1 warning(s), 0 remark(s)
...
[ Problem #3 - No Workaround ]

There is no workaround. The /Qfnalign:16 compiler option ( align all functions on a 16-byte boundary ) should not be used when C/C++ sources are compiled with Intel C++ compiler version 12.1.7 update 371.
[ Problem #4 - Double Unrolling - Intel C++ compiler version 12 ]

[ Single Unrolling - Yes ]

[ Command Line Options ]
.../Qunroll:4 /Qunroll-aggressive...

[ C Source Code - Rolled Loop 1-in-1 ]
...
for( j = 0; j < iRows; j += 1 )    // Rolled Iterations - 1-in-1 Iteration
{
    pfC[ iRows*i+j ] += pfA[ iRows*i+k ] * pfB[ iRows*k+j ];
}
...

[ Output ]
...
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 0.68700 secs
> Test1099 End <
Tests: Completed
...
Note: ~4.9x faster when compared with the Double Unrolling case ( Command Line options - Yes / Sources - Yes ). It is faster because the loop was vectorized:
...
..\Common\PrtTests.cpp(31772): (col. 4) remark: LOOP WAS VECTORIZED.
...
[ Problem #4 - Double Unrolling - Intel C++ compiler version 12 ]

[ Double Unrolling - Yes ]

[ Command Line Options ]
.../Qunroll:4 /Qunroll-aggressive...

[ C Source Code - UnRolled Loop 4-in-1 ]
...
for( j = 0; j < iRows; j += 4 )    // UnRolled Iterations - 4-in-1 Iteration
{
    pfC[ iRows*i+j   ] += pfA[ iRows*i+k ] * pfB[ iRows*k+j   ];
    pfC[ iRows*i+j+1 ] += pfA[ iRows*i+k ] * pfB[ iRows*k+j+1 ];
    pfC[ iRows*i+j+2 ] += pfA[ iRows*i+k ] * pfB[ iRows*k+j+2 ];
    pfC[ iRows*i+j+3 ] += pfA[ iRows*i+k ] * pfB[ iRows*k+j+3 ];
}
...

[ Output ]
...
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 3.34400 secs
> Test1099 End <
Tests: Completed
...
Note: ~4.9x slower when compared with the Single Unrolling case ( Post #8 / Command Line options - Yes / Sources - No ).
[ Problem #4 - Double Unrolling - Intel C++ compiler version 12 ]

[ Double Unrolling - Yes - Performance Summary ]

[ Performance Results - UnRolled Loop 2-in-1 ]
...
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IJK
Loop Blocking Divider : 1
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 2.59300 secs
> Test1099 End <
Tests: Completed
...

[ Performance Results - UnRolled Loop 4-in-1 ]
...
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IJK
Loop Blocking Divider : 1
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 2.73400 secs
> Test1099 End <
Tests: Completed
...

[ Performance Results - UnRolled Loop 8-in-1 ]
...
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IJK
Loop Blocking Divider : 1
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 3.28200 secs
> Test1099 End <
Tests: Completed
...
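The two inner-loop variants compared in Problem #4 can be placed side by side in a small self-contained sketch. It assumes `float` elements and square row-major matrices; the names `Matrix`, `mxmult_ikj_rolled`, `mxmult_ikj_unrolled4` and `unrolled_matches_rolled` are hypothetical. Unlike the 4-in-1 loop in the post, which relies on the row count being a multiple of 4 ( true for 1024 ), the unrolled variant here adds a remainder loop. The sketch only checks that both variants compute the same result; it does not reproduce the timings.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<float> Matrix;  // n x n, row-major; float is an assumption

// C += A * B in IKJ order with the inner loop left rolled (1-in-1).
inline void mxmult_ikj_rolled(const Matrix& A, const Matrix& B, Matrix& C, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                C[n*i + j] += A[n*i + k] * B[n*k + j];
}

// Same computation with the inner loop unrolled 4-in-1 by hand.
// The trailing loop handles the n % 4 leftover columns.
inline void mxmult_ikj_unrolled4(const Matrix& A, const Matrix& B, Matrix& C, std::size_t n)
{
    const std::size_t n4 = n - (n % 4);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
        {
            std::size_t j = 0;
            for (; j < n4; j += 4)
            {
                C[n*i + j    ] += A[n*i + k] * B[n*k + j    ];
                C[n*i + j + 1] += A[n*i + k] * B[n*k + j + 1];
                C[n*i + j + 2] += A[n*i + k] * B[n*k + j + 2];
                C[n*i + j + 3] += A[n*i + k] * B[n*k + j + 3];
            }
            for (; j < n; ++j)
                C[n*i + j] += A[n*i + k] * B[n*k + j];
        }
}

// Fills A and B with small integer values (exact in float) and checks that
// both variants produce identical results.
inline bool unrolled_matches_rolled(std::size_t n)
{
    assert(n > 0);
    Matrix A(n*n), B(n*n), C1(n*n, 0.0f), C2(n*n, 0.0f);
    for (std::size_t t = 0; t < n*n; ++t)
    {
        A[t] = static_cast<float>(t % 5);
        B[t] = static_cast<float>(t % 3);
    }
    mxmult_ikj_rolled(A, B, C1, n);
    mxmult_ikj_unrolled4(A, B, C2, n);
    return C1 == C2;
}
```

Given the results in the post, the rolled variant is the one to keep when /Qunroll:4 /Qunroll-aggressive is already on the command line, since the compiler reported it as vectorized.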
[ Problem #5 - Fatal error C1001: INTERNAL COMPILER ERROR - VS 2005 PE ]
...
#ifdef _RTDATASET_DISPLAYDATA_NO
#pragma warning ( disable : 4702 )    // Unreachable code
#endif
...
...
-------------------Configuration: ScaLib - Win32 Debug--------------------
Compiling...
Stdphf.cpp
*** ScaLib Message: Compiling with Visual Studio 98 ***
*** ScaLib Message: Configuration - Desktop - _WIN32_MSC - DEBUG ***
*** ScaLib Message: Compiling for Intel Processing Unit ( 32-bit ) ***
...
Compiling...
ScaLib.cpp
AlgorithmSet.cpp
AstroSet.cpp
BaseSet.cpp
c:\workenv\appsworkdev\appssca\scalib\dataset.h(1157) : fatal error C1001: INTERNAL COMPILER ERROR
(compiler file 'msc1.cpp', line 1794)
Please choose the Technical Support command on the Visual C++
Help menu, or open the Technical Support help file for more information
CommonSet.cpp
c:\workenv\appsworkdev\appssca\scalib\commonset.cpp(6) : fatal error C1033: cannot open program database 'c:\workenv\appsworkdev\appssca\scalib\debug\vc60.pdb'
...
...
_RTINLINE T * operator[]( RTint iIndex )
{
    return ( T * )m_ptData2D[ iIndex ];
};
...
[ Problem #6 - Inline assembler with OpenMP for-loops - Intel C++ compiler version 12 ]
...
Compiling with Intel(R) C++ Compiler XE 12.1.7.371 [IA-32]... (Intel C++ Environment)
IccTestApp.cpp
Linking... (Intel C++ Environment)
ipo: warning #11072: ignoring invalid linker directive 'NODEFAULTLIB:libcpmt.lib'
xilink: executing 'link'
Creating library Release/IccTestApp.lib and object Release/IccTestApp.exp
ipo_12365obj.obj : error LNK2019: unresolved external symbol "private: static void __cdecl std::locale::_Locimp::_Locimp_dtor(class std::locale::_Locimp *)" (?_Locimp_dtor@_Locimp@locale@std@@CAXPAV123@@Z) referenced in function "protected: virtual void * __thiscall std::locale::_Locimp::`scalar deleting destructor'(unsigned int)" (??_G_Locimp@locale@std@@MAEPAXI@Z)
Release/IccTestApp.exe : fatal error LNK1120: 1 unresolved externals
...
[ Problem #7 - Allocation of large static arrays - Borland C++ compiler v5.5.1 ]

----- Build started: Project: BccTestApp, Configuration: Release Win32 ------
Performing Makefile project actions
*** ScaLib Message: Compiling with Borland C++ compiler v5.5.1 ***
*** ScaLib Message: Configuration - Desktop - _WIN32_BCC - RELEASE ( 32-bit ) ***
*** ScaLib Message: Advanced ICC v12 Bat-Configuration ***
Borland C++ 5.5.1 for Win32 Copyright (c) 1993, 2000 Borland
BccTestApp.cpp:
*** ScaLib Message: Compiling with Borland C++ compiler v5.5.1 ***
HrtALLib.asm:
Turbo Assembler Version 5.0 Copyright (c) 1988, 1996 Borland International
Assembling file: HrtALLib.ASM to RELEASE\HrtALLib.obj
Error messages: None
Warning messages: None
Passes: 1
Turbo Incremental Link 5.00 Copyright (c) 1997, 2000 Borland
Fatal: Error detected (LME1508)
Fatal: Error detected (LME1508)
Fatal: Error detected (LME1508)
Fatal: Error detected (LME1508)
BccTestApp - 4 error(s), 0 warning(s)
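The post gives no workaround for Problem #7. Since the title ties the linker failure to large static arrays, one generic thing to try is moving the storage to the heap so the linker no longer has to lay out a large static data block. The sketch below is an assumption, not a confirmed fix: the class `Matrix2D` and the sizes `kRows` and `kCols` are hypothetical ( the post does not state the array sizes ), and it has not been tested with Borland C++ 5.5.1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dimensions; the post does not state the array sizes involved.
const std::size_t kRows = 1024;
const std::size_t kCols = 1024;

// Stands in for a declaration such as:  static float g_data[kRows][kCols];
// The elements live in one contiguous heap block requested at run time.
class Matrix2D
{
public:
    Matrix2D(std::size_t rows, std::size_t cols)
        : m_cols(cols), m_data(rows * cols, 0.0f)
    {
        assert(rows > 0 && cols > 0);
    }

    // Returns a pointer to the start of a row, so m[r][c] keeps working.
    float* operator[](std::size_t row) { return &m_data[row * m_cols]; }
    const float* operator[](std::size_t row) const { return &m_data[row * m_cols]; }

    std::size_t size() const { return m_data.size(); }

private:
    std::size_t m_cols;          // number of columns per row
    std::vector<float> m_data;   // rows * cols elements, zero-initialized
};
```

Existing code that indexes the static array as `g_data[r][c]` would need only the declaration changed, for example to `Matrix2D g_data( kRows, kCols );`, although the allocation then happens during program start-up instead of at link time.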