*** Fastest version of a CRT-function 'memset' ***
[ Abstract ]
Modern C++ compilers can inline most CRT functions. For example, the Microsoft and Intel
C++ compilers have an 'Enable Intrinsic Functions' option ( /Oi ). When that option is used,
the C++ compiler generates highly optimized binary code in place instead of emitting a call
to the CRT function in the run-time library.
Several C++ compilers were analyzed to evaluate how they handle
a simple call to the CRT function memset ( which initializes a block of memory with a value ).
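The test harness itself is not shown in this thread. The following is only a minimal sketch of how such a measurement could look, assuming the time-stamp counter is read with the `__rdtsc()` intrinsic before and after the call. The names `TimeMemsetOnce`, `RunCrtMemsetTest` and `kBlockSize` are hypothetical, and the block size of 72 bytes is an assumption inferred from the 32-bit Intel listing later in the thread ( four 16-byte stores plus one 8-byte store ).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

#if defined(_MSC_VER)
#include <intrin.h>      // __rdtsc with Microsoft-compatible compilers
#else
#include <x86intrin.h>   // __rdtsc with GCC, MinGW and Clang
#endif

// Hypothetical test size; the original post does not state it.
constexpr std::size_t kBlockSize = 72;

// Times one memset call and returns the elapsed time-stamp-counter ticks.
inline std::uint64_t TimeMemsetOnce(unsigned char* block, std::size_t size, int value)
{
    const std::uint64_t start = __rdtsc();
    std::memset(block, value, size);
    const std::uint64_t stop = __rdtsc();
    return stop - start;
}

// Repeats the measurement and prints each sample in the style of the listings.
inline void RunCrtMemsetTest(int runs)
{
    alignas(16) static unsigned char block[kBlockSize];
    for (int i = 0; i < runs; ++i)
    {
        const std::uint64_t cycles = TimeMemsetOnce(block, sizeof(block), 0);
        // Read the block back so the store is not treated as dead code.
        volatile unsigned char sink = block[0];
        (void)sink;
        std::printf("[ CrtMemset ] - Executed in %llu clock cycles\n",
                    static_cast<unsigned long long>(cycles));
    }
}
```

Calling `RunCrtMemsetTest( 10 )` would print ten samples. Note that nothing in this sketch stops the CPU from reordering work around `rdtsc`, so individual samples should be read as approximate.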
[ Performance Evaluation ( Release ) - Summary - 32-bit Windows XP SP3 ]
Microsoft C++ compiler ( VS2005 PE ) 32-bit
...
[ CrtMemset ] - Executed in 836 clock cycles
[ CrtMemset ] - Executed in 380 clock cycles
[ CrtMemset ] - Executed in 380 clock cycles
[ CrtMemset ] - Executed in 380 clock cycles
[ CrtMemset ] - Executed in 380 clock cycles
[ CrtMemset ] - Executed in 376 clock cycles
[ CrtMemset ] - Executed in 388 clock cycles
[ CrtMemset ] - Executed in 380 clock cycles
[ CrtMemset ] - Executed in 380 clock cycles
[ CrtMemset ] - Executed in 376 clock cycles
...
Borland C++ compiler v5.5.1 32-bit
...
[ CrtMemset ] - Executed in 256 clock cycles
[ CrtMemset ] - Executed in 492 clock cycles
[ CrtMemset ] - Executed in 180 clock cycles
[ CrtMemset ] - Executed in 180 clock cycles
[ CrtMemset ] - Executed in 190 clock cycles
[ CrtMemset ] - Executed in 172 clock cycles
[ CrtMemset ] - Executed in 172 clock cycles
[ CrtMemset ] - Executed in 172 clock cycles
[ CrtMemset ] - Executed in 180 clock cycles
[ CrtMemset ] - Executed in 180 clock cycles
...
Intel C++ compiler v12.1.7 ( u371 ) 32-bit
...
[ CrtMemset ] - Executed in 88 clock cycles
[ CrtMemset ] - Executed in 88 clock cycles
[ CrtMemset ] - Executed in 88 clock cycles
[ CrtMemset ] - Executed in 88 clock cycles
[ CrtMemset ] - Executed in 88 clock cycles
[ CrtMemset ] - Executed in 89 clock cycles
[ CrtMemset ] - Executed in 88 clock cycles
[ CrtMemset ] - Executed in 88 clock cycles
[ CrtMemset ] - Executed in 88 clock cycles
[ CrtMemset ] - Executed in 88 clock cycles
...
MinGW C++ compiler v5.1.0 32-bit
...
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 162 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 156 clock cycles
...
Watcom C++ compiler v2.0.0 32-bit
...
[ CrtMemset ] - Executed in 736 clock cycles
[ CrtMemset ] - Executed in 212 clock cycles
[ CrtMemset ] - Executed in 260 clock cycles
[ CrtMemset ] - Executed in 212 clock cycles
[ CrtMemset ] - Executed in 220 clock cycles
[ CrtMemset ] - Executed in 212 clock cycles
[ CrtMemset ] - Executed in 212 clock cycles
[ CrtMemset ] - Executed in 260 clock cycles
[ CrtMemset ] - Executed in 264 clock cycles
[ CrtMemset ] - Executed in 212 clock cycles
...
[ Performance Evaluation ( Release ) - Final Results - 32-bit Windows XP SP3 ]
Average is 088 clock cycles - Intel C++ compiler v12.1.7 ( u371 ) 32-bit
Average is 179 clock cycles - MinGW C++ compiler v5.1.0 32-bit
Average is 217 clock cycles - Borland C++ compiler v5.5.1 32-bit
Average is 280 clock cycles - Watcom C++ compiler v2.0.0 32-bit
Average is 426 clock cycles - Microsoft C++ compiler ( VS2005 PE ) 32-bit
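The averages above are consistent with an arithmetic mean over the ten listed samples, rounded to the nearest cycle. That is an inference from the numbers; the post does not show how they were computed. The sketch below ( hypothetical helper names `RoundedMean` and `Median` ) reproduces the calculation and also computes the median, which is less affected by a slow first run such as the 836-cycle sample in the VS2005 listing.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Arithmetic mean rounded to the nearest whole clock cycle.
// Assumes 'samples' is not empty.
inline long RoundedMean(const std::vector<long>& samples)
{
    long sum = 0;
    for (long s : samples) sum += s;
    return std::lround(static_cast<double>(sum) / static_cast<double>(samples.size()));
}

// Median of the samples; takes a copy so the caller's order is preserved.
// Assumes 'samples' is not empty.
inline long Median(std::vector<long> samples)
{
    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    return (n % 2 == 1) ? samples[n / 2]
                        : (samples[n / 2 - 1] + samples[n / 2]) / 2;
}
```

For the VS2005 samples ( 836, 380, 380, 380, 380, 376, 388, 380, 380, 376 ) the rounded mean is 426, matching the reported figure, while the median is 380.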
[ Performance Evaluation ( Debug ) - Summary - 64-bit Windows 7 SP1 ]
Microsoft C++ compiler ( VS2008 PE ) 64-bit
...
[ CrtMemset ] - Executed in 324 clock cycles
[ CrtMemset ] - Executed in 196 clock cycles
[ CrtMemset ] - Executed in 204 clock cycles
[ CrtMemset ] - Executed in 176 clock cycles
[ CrtMemset ] - Executed in 176 clock cycles
[ CrtMemset ] - Executed in 176 clock cycles
[ CrtMemset ] - Executed in 176 clock cycles
[ CrtMemset ] - Executed in 176 clock cycles
[ CrtMemset ] - Executed in 176 clock cycles
[ CrtMemset ] - Executed in 176 clock cycles
...
Intel C++ compiler v13.1.0 ( u149 ) 64-bit
...
[ CrtMemset ] - Executed in 336 clock cycles
[ CrtMemset ] - Executed in 56 clock cycles
[ CrtMemset ] - Executed in 52 clock cycles
[ CrtMemset ] - Executed in 60 clock cycles
[ CrtMemset ] - Executed in 60 clock cycles
[ CrtMemset ] - Executed in 44 clock cycles
[ CrtMemset ] - Executed in 60 clock cycles
[ CrtMemset ] - Executed in 60 clock cycles
[ CrtMemset ] - Executed in 60 clock cycles
[ CrtMemset ] - Executed in 60 clock cycles
...
MinGW C++ compiler v5.1.0 64-bit
...
[ CrtMemset ] - Executed in 252 clock cycles
[ CrtMemset ] - Executed in 216 clock cycles
[ CrtMemset ] - Executed in 216 clock cycles
[ CrtMemset ] - Executed in 216 clock cycles
[ CrtMemset ] - Executed in 204 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 180 clock cycles
[ CrtMemset ] - Executed in 204 clock cycles
[ CrtMemset ] - Executed in 180 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
...
[ Performance Evaluation ( Debug ) - Final Results - 64-bit Windows 7 SP1 ]
Average is 085 clock cycles - Intel C++ compiler v13.1.0 ( u149 ) 64-bit
Average is 196 clock cycles - Microsoft C++ compiler ( VS2008 PE ) 64-bit
Average is 204 clock cycles - MinGW C++ compiler v5.1.0 64-bit
[ Performance Evaluation ( Release ) - Summary - 64-bit Windows 7 SP1 ]
Microsoft C++ compiler ( VS2008 PE ) 64-bit
...
[ CrtMemset ] - Executed in 172 clock cycles
[ CrtMemset ] - Executed in 100 clock cycles
[ CrtMemset ] - Executed in 100 clock cycles
[ CrtMemset ] - Executed in 100 clock cycles
[ CrtMemset ] - Executed in 124 clock cycles
[ CrtMemset ] - Executed in 116 clock cycles
[ CrtMemset ] - Executed in 96 clock cycles
[ CrtMemset ] - Executed in 88 clock cycles
[ CrtMemset ] - Executed in 88 clock cycles
[ CrtMemset ] - Executed in 112 clock cycles
...
Intel C++ compiler v13.1.0 ( u149 ) 64-bit
...
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 24 clock cycles
[ CrtMemset ] - Executed in 24 clock cycles
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 24 clock cycles
...
MinGW C++ compiler v5.1.0 64-bit
...
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 24 clock cycles
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 48 clock cycles
...
[ Performance Evaluation ( Release ) - Final Results - 64-bit Windows 7 SP1 ]
Average is 027 clock cycles - Intel C++ compiler v13.1.0 ( u149 ) 64-bit
Average is 030 clock cycles - MinGW C++ compiler v5.1.0 64-bit
Average is 110 clock cycles - Microsoft C++ compiler ( VS2008 PE ) 64-bit
[ Intel C++ compiler v12.1.7 ( u371 ) 32-bit - Release ]
[ Optimized version ]
...
00402DEE rdtsc
00402DF0 mov dword ptr [ebp-208h], eax
00402DF6 mov dword ptr [ebp-204h], edx
00402DFC pxor xmm0, xmm0
00402E00 movaps xmmword ptr [ebp-1F8h], xmm0
00402E07 movaps xmmword ptr [ebp-1E8h], xmm0
00402E0E movaps xmmword ptr [ebp-1D8h], xmm0
00402E15 movaps xmmword ptr [ebp-1C8h], xmm0
00402E1C movq mmword ptr [ebp-1B8h], xmm0
00402E24 rdtsc
...
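Between the two `rdtsc` instructions, the optimized listing zeroes 72 bytes with four aligned 16-byte stores and one 8-byte store. A rough source-level equivalent written with SSE2 intrinsics is sketched below. The function name `ZeroBlock72` is hypothetical, and a compiler is free to pick different but equivalent store instructions ( for example `movdqa` instead of `movaps` ).

```cpp
#include <cassert>
#include <cstddef>
#include <emmintrin.h>  // SSE2 intrinsics

// Zeroes exactly 72 bytes: four aligned 16-byte stores plus one 8-byte store.
// 'block' must be 16-byte aligned, as the aligned stores require.
inline void ZeroBlock72(unsigned char* block)
{
    const __m128i zero = _mm_setzero_si128();                         // pxor xmm0, xmm0
    _mm_store_si128(reinterpret_cast<__m128i*>(block +  0), zero);    // 16-byte aligned store
    _mm_store_si128(reinterpret_cast<__m128i*>(block + 16), zero);
    _mm_store_si128(reinterpret_cast<__m128i*>(block + 32), zero);
    _mm_store_si128(reinterpret_cast<__m128i*>(block + 48), zero);
    _mm_storel_epi64(reinterpret_cast<__m128i*>(block + 64), zero);   // low 8 bytes only (movq)
}
```

Because the size is known at compile time, no loop, no call and no size check remain, which is why this version is so much cheaper than the non-optimized one.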
[ Intel C++ compiler v12.1.7 ( u371 ) 32-bit - Release ]
[ Non Optimized version - #pragma optimize( "", off ) was used ]
...
0040261A rdtsc
0040261C mov dword ptr [ebp-20h], eax
0040261F mov dword ptr [ebp-1Ch], edx
00402622 mov eax, dword ptr [ebp-20h]
00402625 mov edx, dword ptr [ebp-1Ch]
00402628 mov dword ptr [ebp-18h], eax
0040262B mov dword ptr [ebp-14h], edx
0040262E mov eax, dword ptr [ebp+8]
00402631 mov edx, dword ptr [ebp+0Ch]
00402634 mov ecx, dword ptr [ebp+10h]
00402637 mov edi, eax
00402639 mov eax, edx
0040263B and eax, 0FFFFh
00402640 mov ah, al
00402642 mov edx, eax
00402644 shl eax, 10h
00402647 or eax, edx
00402649 mov esi, ecx
0040264B shr ecx, 2
0040264E mov edx, edi
00402650 rep stos dword ptr es:[edi]
00402652 mov ecx, esi
00402654 and ecx, 3
00402657 rep stos byte ptr es:[edi]
00402659 mov eax, edx
0040265B mov dword ptr [ebp-30h], eax
0040265E mov eax, dword ptr [ebp-30h]
00402661 mov dword ptr [ebp-34h], eax
00402664 rdtsc
...
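The non-optimized listing follows the classic `rep stos` scheme: replicate the fill byte into all four bytes of a dword, store `count / 4` dwords, then store the remaining `count % 4` bytes, and return the original destination pointer. A portable C++ sketch of the same steps is shown below; the name `FillLikeRepStos` is hypothetical and the code illustrates the algorithm only, not the compiler's actual source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

inline void* FillLikeRepStos(void* dest, int value, std::size_t count)
{
    unsigned char* p = static_cast<unsigned char*>(dest);
    const std::uint32_t b = static_cast<unsigned char>(value);

    // Replicate the byte into a dword; same result as
    // mov ah, al / mov edx, eax / shl eax, 10h / or eax, edx.
    const std::uint32_t pattern = b * 0x01010101u;

    // rep stos dword: store count / 4 dwords.
    for (std::size_t i = count >> 2; i != 0; --i)
    {
        std::memcpy(p, &pattern, sizeof(pattern));
        p += sizeof(pattern);
    }

    // rep stos byte: store the remaining count % 4 bytes.
    for (std::size_t i = count & 3; i != 0; --i)
    {
        *p++ = static_cast<unsigned char>(b);
    }

    return dest;  // memset returns the destination pointer
}
```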
[ Intel C++ compiler v13.1.0 ( u149 ) 64-bit - Release - Binary code ]
...
000000013FEB3552 rdtsc
000000013FEB3554 shl rdx, 20h
000000013FEB3558 or rax, rdx
000000013FEB355B mov qword ptr [rbp+1D0h], rax
000000013FEB3562 vmovups xmmword ptr [r13+10h], xmm6
000000013FEB3568 vmovups xmmword ptr [r13+20h], xmm6
000000013FEB356E vmovups xmmword ptr [r13+30h], xmm6
000000013FEB3574 vmovups xmmword ptr [r13+40h], xmm6
000000013FEB357A vmovups xmmword ptr [r13+50h], xmm6
000000013FEB3580 vmovups xmmword ptr [r13+60h], xmm6
000000013FEB3586 vmovups xmmword ptr [r13], xmm6
000000013FEB358C rdtsc
000000013FEB358E shl rdx, 20h
000000013FEB3592 or rax, rdx
000000013FEB3595 mov qword ptr [rbp+1D8h], rax
000000013FEB359C lea rcx, [13FECF640h]
000000013FEB35A3 mov rdx, qword ptr [rbp+1D8h]
000000013FEB35AA mov rbx, qword ptr [rbp+1D0h]
000000013FEB35B1 sub rdx, rbx
000000013FEB35B4 vzeroupper
000000013FEB35B7 call 000000013FEB6950
...
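In the 64-bit listing each `rdtsc` is followed by `shl rdx, 20h` and `or rax, rdx`. `rdtsc` returns the counter split across EDX ( high 32 bits ) and EAX ( low 32 bits ), so these two instructions join the halves into one 64-bit value before it is stored. The memset itself is the seven 16-byte `vmovups` stores ( 112 bytes ). A small sketch of the combine step, with the hypothetical name `CombineTsc`:

```cpp
#include <cassert>
#include <cstdint>

// Joins the two 32-bit halves returned by rdtsc into one 64-bit tick count.
inline std::uint64_t CombineTsc(std::uint32_t eax_low, std::uint32_t edx_high)
{
    return (static_cast<std::uint64_t>(edx_high) << 32) | eax_low;  // shl rdx, 20h / or rax, rdx
}
```

The elapsed time is then the difference of two such values, which is what the `sub rdx, rbx` near the end of the listing computes.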
[ Conclusion ]
For the given test case, the Intel C++ compiler ( 32-bit and 64-bit versions ) generated the
most efficient binary code and achieved the fastest initialization of a block of memory
at run time.
The legacy 32-bit C++ compilers from Borland and Watcom outperformed the 32-bit Microsoft C++
compiler, but that does not mean they would be competitive in a more complex test:
modern C++ compilers have built-in support for SIMD instructions and these two legacy C++
compilers do not.
[ Command Line Options of C++ compilers ]
The command-line options of the C++ compilers used in these performance evaluations ( Release configurations ) are listed below.
[ Borland C++ compiler v5.5.1 32-bit ]
-d -O2 -w -D_WIN32_BCC -DNDEBUG -5 -nRelease -eBccTestApp.exe -I"C:\WorkLib\MKL\Include" -L"C:\WorkLib\MKL\Lib\Ia32Bcc" -lS:33554432 BccTestApp.cpp HrtALLib.asm
[ MinGW C++ compiler v5.1.0 32-bit ]
MgwTestApp.cpp
-DNDEBUG
-O3
-msse2
-mprfchw
-ffast-math
-fpeel-loops
-ftree-vectorizer-verbose=0
-ftree-vectorize
-fvect-cost-model
-fomit-frame-pointer
-flto
-fwhole-program
-fopenmp
-w
-I "C:/WorkLib/ICC2011/Composer XE/Mkl/Include"
-B "../../AppsSca"
"C:/WorkLib/ICC2011/Composer XE/Mkl/Lib/Ia32/mkl_rt.lib"
-Xlinker
--stack=67108864
[ MinGW C++ compiler v5.1.0 64-bit ]
MgwTestApp.cpp
-DNDEBUG
-O3
-mavx
-mprfchw
-ffast-math
-fpeel-loops
-ftree-vectorizer-verbose=0
-ftree-vectorize
-fvect-cost-model
-fomit-frame-pointer
-fwhole-program
-fopenmp
-w
-I "C:/WorkLib/ICC2013/Composer XE/Mkl/Include"
-B "../../AppsSca"
"C:/WorkLib/ICC2013/Composer XE/Mkl/Lib/Intel64/mkl_rt.lib"
-Xlinker
--stack=1073741824
[ Microsoft C++ compiler ( VS2005 PE ) 32-bit ]
[ Compiler ]
/O2 /Ob1 /Oi /Ot /Oy /GL /I "..\..\Include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_WIN32_MSC" /D "_VC80_UPGRADE=0x0710" /D "_UNICODE" /D "UNICODE" /GF /Gm /MT /GS- /fp:fast /GR- /openmp /Yu"Stdphf.h" /Fp"Release\MscTestApp.pch" /Fo"Release/" /Fd"Release/" /W4 /nologo /c /Wp64 /Zi /Gd /TP /wd4005 /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_ICC" /U "_WIN32_WCC" /errorReport:prompt /arch:SSE2
[ Linker ]
/OUT:"Release/MscTestApp.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"Release\MscTestApp.exe.intermediate.manifest" /NODEFAULTLIB:"../../Bin/Release/ScaLib.lib" /SUBSYSTEM:CONSOLE /STACK:268435456 /LARGEADDRESSAWARE /LTCG /MACHINE:X86 /ERRORREPORT:PROMPT kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib "..\..\bin\release\scalib.lib"
[ Microsoft C++ compiler ( VS2008 PE ) 64-bit ]
[ Compiler ]
/O2 /Ob1 /Oi /Ot /Oy /GL /I "..\..\Include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_WIN32_MSC" /D "_UNICODE" /D "UNICODE" /GF /Gm /MT /GS- /fp:fast /GR- /openmp /Yu"Stdphf.h" /Fp"x64\Release\ScaLibTestApp64.pch" /Fo"x64/Release/" /Fd"x64/Release/" /W4 /nologo /c /Zi /TP /wd4005 /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_ICC" /U "_WIN32_WCC" /errorReport:prompt
[ Linker ]
/OUT:"x64\Release/ScaLibTestApp64.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"x64\Release\ScaLibTestApp64.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /SUBSYSTEM:CONSOLE /STACK:1073741824 /LTCG /DYNAMICBASE:NO /MACHINE:X64 /ERRORREPORT:PROMPT kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib "..\..\bin\release\scalib64.lib"
[ Intel C++ compiler v12.1.7 ( u371 ) 32-bit ]
[ Compiler ]
/c /O3 /Ob1 /Oi /Ot /Oy /Qipo /I "..\..\Include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_WIN32_ICC" /D "INTEL_SUITE_VERSION=PE121_300" /D "_VC80_UPGRADE=0x0710" /D "_UNICODE" /D "UNICODE" /GF /MT /GS- /fp:fast=2 /GR- /Yu"Stdphf.h" /Fp"Release\IccTestApp.pch" /Fo"Release/" /W5 /nologo /Wp64 /Zi /Gd /TP /Qdiag-disable:2012 /Qdiag-disable:2013 /Qdiag-disable:2014 /Qdiag-disable:2015 /Qdiag-disable:2017 /Qdiag-disable:2021 /Qdiag-disable:2022 /Qdiag-disable:2304 /U "_WIN32_MSC" /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_WCC" /Qopenmp /Qfp-speculation:fast /Qopt-matmul /Qparallel /Qstd=c++0x /Qrestrict /Qdiag-disable:111,673,10121
/Wport /Qeffc++ /QxSSE2 /Qansi-alias /Qvec-report=0 /Qfma /Qunroll:8 /Qunroll-aggressive /Qopt-streaming-stores:always /Qopt-block-factor:128 /Qopt-mem-layout-trans:2 /Wport /Qeffc++ /QxSSE2 /Qansi-alias /Qvec-report=0 /Qfma /Qunroll:8 /Qunroll-aggressive /Qopt-streaming-stores:always /Qopt-block-factor:128 /Qopt-mem-layout-trans:2
[ Linker ]
kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /OUT:"Release/IccTestApp.exe" /INCREMENTAL:NO /nologo /MANIFEST /MANIFESTFILE:"Release\IccTestApp.exe.intermediate.manifest" /NODEFAULTLIB:"../../Bin/Release/ScaLib.lib" /TLBID:1 /SUBSYSTEM:CONSOLE /STACK:268435456 /LARGEADDRESSAWARE /MACHINE:X86 /qdiag-disable:111,673,10121
[ Intel C++ compiler v13.1.0 ( u149 ) 64-bit ]
[ Compiler ]
/c /O3 /Ob1 /Oi /Ot /Qipo /I "..\..\Include" /I "C:\WorkLib\ICC2013\Composer XE 2013\ipp\include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_WIN32_ICC" /D "INTEL_SUITE_VERSION=PE130_149" /D "_IPP_PARALLEL_DYNAMIC" /D "IPP_USE_CUSTOM" /D "_VC80_UPGRADE=0x0710" /D "_UNICODE" /D "UNICODE" /GF /MT /GS- /arch:AVX /fp:fast=2 /GR- /Yu"Stdphf.h" /Fp"x64\Release\IccTestApp64.pch" /Fo"x64/Release/" /Fd"x64/Release/" /W5 /nologo /Wp64 /Zi /TP /U "_WIN32_MSC" /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_WCC" /Qopenmp /Qfp-speculation:fast /Qopt-matmul /Qstd=c++0x /Qrestrict /Qansi-alias /Qdiag-disable:111,673,2012,2015,2960,10121 /Wport /Qeffc++ /QxAVX /Qansi-alias /Qvec-report=0 /Qfma /Qunroll /Qunroll-aggressive /Qopt-streaming-stores:always /Qipp /Qipp-link:dynamic /Qmkl
[ Linker ]
kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /OUT:"x64\Release/IccTestApp64.exe" /INCREMENTAL:NO /nologo /LIBPATH:"C:\WorkLib\ICC2013\Composer XE 2013\ipp\lib\intel64" /LIBPATH:"C:\WorkLib\ICC2013\Composer XE 2013\compiler\lib\intel64" /MANIFEST /MANIFESTFILE:"x64\Release\IccTestApp64.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /NODEFAULTLIB:"../../Bin/Release/ScaLib64.lib" /TLBID:1 /SUBSYSTEM:CONSOLE /STACK:1000000000 /LARGEADDRESSAWARE /DYNAMICBASE /NXCOMPAT /MACHINE:X64 /qdiag-disable:111,673,2012,2015,2960,10121 /qdiag-sc-dir:"My Inspector XE Results - IccTestApp"
[ Watcom C++ compiler v2.0.0 32-bit ]
WccTestApp.cpp -5r -fp5 -fpi87 -wx -d0 -s -oabil+mprt -xd -D_WIN32_WCC -DNDEBUG -feWccTestApp.exe -k268435456 -i"C:\WorkLib\ICC2011\Compos~1\Mkl\Include" -"libpath C:\WorkLib\ICC2011\Compos~1\Mkl\Lib\Ia32Wcc" -wcd=007 -wcd=008 -wcd=013 -wcd=014 -wcd=086 -wcd=188 -wcd=367 -wcd=368 -wcd=369 -wcd=387 -wcd=389 -wcd=549 -wcd=601 -wcd=628 -wcd=689 -wcd=716 -wcd=725 -wcd=726 -wcd=735
The main speedup in memset, when it is applicable, comes from the use of non-temporal ( streaming ) stores. memset is usually a library call, and the library is not necessarily provided by the compiler.
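As an illustration of what a non-temporal fill looks like at the source level, here is a minimal sketch using the SSE2 intrinsic `_mm_stream_si128`. The function name `StreamFill` is hypothetical, and the sketch assumes a 16-byte-aligned destination and a size that is a multiple of 16. It is not how any particular library implements memset, and streaming stores typically pay off only for blocks much larger than the cache, not for the small block measured in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // _mm_stream_si128, _mm_set1_epi8, _mm_sfence

// Fills 'count' bytes with 'value' using non-temporal (cache-bypassing) stores.
// Assumptions: 'dest' is 16-byte aligned and 'count' is a multiple of 16.
inline void StreamFill(void* dest, int value, std::size_t count)
{
    assert(reinterpret_cast<std::uintptr_t>(dest) % 16 == 0);
    assert(count % 16 == 0);

    const __m128i pattern = _mm_set1_epi8(static_cast<char>(value));
    __m128i* p = static_cast<__m128i*>(dest);

    for (std::size_t i = 0; i < count / 16; ++i)
    {
        _mm_stream_si128(p + i, pattern);  // movntdq: write without caching the line
    }

    _mm_sfence();  // make the streaming stores globally visible before returning
}
```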