Software Archive

Fastest version of a CRT-function 'memset'

SergeyKostrov
Valued Contributor II
[ Abstract ] Modern C++ compilers can inline most CRT functions. For example, the Microsoft and Intel C++ compilers have an 'Enable Intrinsic Functions' option ( /Oi ). When that option is used, the compiler generates highly optimized machine code in place of a call to the CRT function in the run-time DLL. Several C++ compilers were analyzed to evaluate how they handle a simple call to the CRT function memset ( which initializes a block of memory with a value ).
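A minimal sketch of how a measurement like this might be set up. The thread does not show the test source, so the names `ReadTicks` and `TimeCrtMemset` and the 72-byte buffer size are assumptions (72 bytes happens to match the stores visible in the 32-bit optimized listing later in the thread). On non-x86 targets the sketch falls back to a steady clock, which does not count clock cycles.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  #include <intrin.h>       // __rdtsc for Microsoft-compatible compilers
  #define HAVE_RDTSC 1
#elif defined(__i386__) || defined(__x86_64__)
  #include <x86intrin.h>    // __rdtsc for GCC, Clang and MinGW
  #define HAVE_RDTSC 1
#else
  #define HAVE_RDTSC 0
#endif

// Read a tick counter: the time-stamp counter on x86, a steady clock elsewhere.
inline std::uint64_t ReadTicks()
{
#if HAVE_RDTSC
    return __rdtsc();
#else
    // Fallback only: these are clock ticks, not CPU cycles.
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Hypothetical buffer size; the thread does not state the real one.
const std::size_t kBufferSize = 72;

// Time a single memset call on the given buffer and return the tick delta.
inline std::uint64_t TimeCrtMemset(unsigned char (&buffer)[kBufferSize])
{
    const std::uint64_t start = ReadTicks();
    std::memset(buffer, 0, sizeof(buffer));
    const std::uint64_t stop = ReadTicks();
    return stop - start;
}
```

Calling `TimeCrtMemset` ten times in a row and printing each delta would give output shaped like the summaries in the following posts. With optimization enabled a compiler is free to reorder or drop work around the timer reads, so the generated code should be inspected, as the author does later in the thread.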
SergeyKostrov
[ Performance Evaluation ( Release ) - Summary - 32-bit Windows XP SP3 ]

Microsoft C++ compiler ( VS2005 PE ) 32-bit
...
[ CrtMemset ] - Executed in 836 clock cycles
[ CrtMemset ] - Executed in 380 clock cycles
[ CrtMemset ] - Executed in 380 clock cycles
[ CrtMemset ] - Executed in 380 clock cycles
[ CrtMemset ] - Executed in 380 clock cycles
[ CrtMemset ] - Executed in 376 clock cycles
[ CrtMemset ] - Executed in 388 clock cycles
[ CrtMemset ] - Executed in 380 clock cycles
[ CrtMemset ] - Executed in 380 clock cycles
[ CrtMemset ] - Executed in 376 clock cycles
...

Borland C++ compiler v5.5.1 32-bit
...
[ CrtMemset ] - Executed in 256 clock cycles
[ CrtMemset ] - Executed in 492 clock cycles
[ CrtMemset ] - Executed in 180 clock cycles
[ CrtMemset ] - Executed in 180 clock cycles
[ CrtMemset ] - Executed in 190 clock cycles
[ CrtMemset ] - Executed in 172 clock cycles
[ CrtMemset ] - Executed in 172 clock cycles
[ CrtMemset ] - Executed in 172 clock cycles
[ CrtMemset ] - Executed in 180 clock cycles
[ CrtMemset ] - Executed in 180 clock cycles
...

Intel C++ compiler v12.1.7 ( u371 ) 32-bit
...
[ CrtMemset ] - Executed in 88 clock cycles
[ CrtMemset ] - Executed in 88 clock cycles
[ CrtMemset ] - Executed in 88 clock cycles
[ CrtMemset ] - Executed in 88 clock cycles
[ CrtMemset ] - Executed in 88 clock cycles
[ CrtMemset ] - Executed in 89 clock cycles
[ CrtMemset ] - Executed in 88 clock cycles
[ CrtMemset ] - Executed in 88 clock cycles
[ CrtMemset ] - Executed in 88 clock cycles
[ CrtMemset ] - Executed in 88 clock cycles
...

MinGW C++ compiler v5.1.0 32-bit
...
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 162 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 156 clock cycles
...

Watcom C++ compiler v2.0.0 32-bit
...
[ CrtMemset ] - Executed in 736 clock cycles
[ CrtMemset ] - Executed in 212 clock cycles
[ CrtMemset ] - Executed in 260 clock cycles
[ CrtMemset ] - Executed in 212 clock cycles
[ CrtMemset ] - Executed in 220 clock cycles
[ CrtMemset ] - Executed in 212 clock cycles
[ CrtMemset ] - Executed in 212 clock cycles
[ CrtMemset ] - Executed in 260 clock cycles
[ CrtMemset ] - Executed in 264 clock cycles
[ CrtMemset ] - Executed in 212 clock cycles
...
SergeyKostrov
[ Performance Evaluation ( Release ) - Final Results - 32-bit Windows XP SP3 ]
Average is 088 clock cycles - Intel C++ compiler v12.1.7 ( u371 ) 32-bit
Average is 179 clock cycles - MinGW C++ compiler v5.1.0 32-bit
Average is 217 clock cycles - Borland C++ compiler v5.5.1 32-bit
Average is 280 clock cycles - Watcom C++ compiler v2.0.0 32-bit
Average is 426 clock cycles - Microsoft C++ compiler ( VS2005 PE ) 32-bit
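The averages above are consistent with a rounded arithmetic mean of the ten runs listed in the summary post. A small sketch that reproduces two of them; the function name `RoundedAverage` and the variable names are illustrative and not taken from the author's test program.

```cpp
#include <numeric>
#include <vector>

// Arithmetic mean of per-run cycle counts, rounded to the nearest integer.
inline unsigned RoundedAverage(const std::vector<unsigned>& samples)
{
    if (samples.empty()) {
        return 0;
    }
    const unsigned long long sum =
        std::accumulate(samples.begin(), samples.end(), 0ULL);
    const unsigned long long n = samples.size();
    return static_cast<unsigned>((sum + n / 2) / n);  // round half up
}

// Per-run cycle counts copied from the 32-bit Release summary in this thread.
const std::vector<unsigned> kMsvc2005Runs  = {836, 380, 380, 380, 380,
                                              376, 388, 380, 380, 376};
const std::vector<unsigned> kIntel1217Runs = {88, 88, 88, 88, 88,
                                              89, 88, 88, 88, 88};
```

`RoundedAverage(kMsvc2005Runs)` gives 426 and `RoundedAverage(kIntel1217Runs)` gives 88, matching the posted figures. The mean includes the slower first run ( 836 cycles for VS2005 ); the median of the same ten VS2005 samples is 380.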
SergeyKostrov
[ Performance Evaluation ( Debug ) - Summary - 64-bit Windows 7 SP1 ]

Microsoft C++ compiler ( VS2008 PE ) 64-bit
...
[ CrtMemset ] - Executed in 324 clock cycles
[ CrtMemset ] - Executed in 196 clock cycles
[ CrtMemset ] - Executed in 204 clock cycles
[ CrtMemset ] - Executed in 176 clock cycles
[ CrtMemset ] - Executed in 176 clock cycles
[ CrtMemset ] - Executed in 176 clock cycles
[ CrtMemset ] - Executed in 176 clock cycles
[ CrtMemset ] - Executed in 176 clock cycles
[ CrtMemset ] - Executed in 176 clock cycles
[ CrtMemset ] - Executed in 176 clock cycles
...

Intel C++ compiler v13.1.0 ( u149 ) 64-bit
...
[ CrtMemset ] - Executed in 336 clock cycles
[ CrtMemset ] - Executed in 56 clock cycles
[ CrtMemset ] - Executed in 52 clock cycles
[ CrtMemset ] - Executed in 60 clock cycles
[ CrtMemset ] - Executed in 60 clock cycles
[ CrtMemset ] - Executed in 44 clock cycles
[ CrtMemset ] - Executed in 60 clock cycles
[ CrtMemset ] - Executed in 60 clock cycles
[ CrtMemset ] - Executed in 60 clock cycles
[ CrtMemset ] - Executed in 60 clock cycles
...

MinGW C++ compiler v5.1.0 64-bit
...
[ CrtMemset ] - Executed in 252 clock cycles
[ CrtMemset ] - Executed in 216 clock cycles
[ CrtMemset ] - Executed in 216 clock cycles
[ CrtMemset ] - Executed in 216 clock cycles
[ CrtMemset ] - Executed in 204 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 180 clock cycles
[ CrtMemset ] - Executed in 204 clock cycles
[ CrtMemset ] - Executed in 180 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
...
SergeyKostrov
[ Performance Evaluation ( Debug ) - Final Results - 64-bit Windows 7 SP1 ]
Average is 085 clock cycles - Intel C++ compiler v13.1.0 ( u149 ) 64-bit
Average is 196 clock cycles - Microsoft C++ compiler ( VS2008 PE ) 64-bit
Average is 204 clock cycles - MinGW C++ compiler v5.1.0 64-bit
SergeyKostrov
[ Performance Evaluation ( Release ) - Summary - 64-bit Windows 7 SP1 ]

Microsoft C++ compiler ( VS2008 PE ) 64-bit
...
[ CrtMemset ] - Executed in 172 clock cycles
[ CrtMemset ] - Executed in 100 clock cycles
[ CrtMemset ] - Executed in 100 clock cycles
[ CrtMemset ] - Executed in 100 clock cycles
[ CrtMemset ] - Executed in 124 clock cycles
[ CrtMemset ] - Executed in 116 clock cycles
[ CrtMemset ] - Executed in 96 clock cycles
[ CrtMemset ] - Executed in 88 clock cycles
[ CrtMemset ] - Executed in 88 clock cycles
[ CrtMemset ] - Executed in 112 clock cycles
...

Intel C++ compiler v13.1.0 ( u149 ) 64-bit
...
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 24 clock cycles
[ CrtMemset ] - Executed in 24 clock cycles
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 24 clock cycles
...

MinGW C++ compiler v5.1.0 64-bit
...
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 24 clock cycles
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 28 clock cycles
[ CrtMemset ] - Executed in 48 clock cycles
...
SergeyKostrov
[ Performance Evaluation ( Release ) - Final Results - 64-bit Windows 7 SP1 ]
Average is 027 clock cycles - Intel C++ compiler v13.1.0 ( u149 ) 64-bit
Average is 030 clock cycles - MinGW C++ compiler v5.1.0 64-bit
Average is 110 clock cycles - Microsoft C++ compiler ( VS2008 PE ) 64-bit
SergeyKostrov
[ Intel C++ compiler v12.1.7 ( u371 ) 32-bit - Release ]
[ Optimized version ]
...
00402DEE rdtsc
00402DF0 mov dword ptr [ebp-208h], eax
00402DF6 mov dword ptr [ebp-204h], edx
00402DFC pxor xmm0, xmm0
00402E00 movaps xmmword ptr [ebp-1F8h], xmm0
00402E07 movaps xmmword ptr [ebp-1E8h], xmm0
00402E0E movaps xmmword ptr [ebp-1D8h], xmm0
00402E15 movaps xmmword ptr [ebp-1C8h], xmm0
00402E1C movq mmword ptr [ebp-1B8h], xmm0
00402E24 rdtsc
...
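For readers who prefer source to disassembly, the store sequence in this listing ( one register zeroed, four aligned 16-byte stores, one 8-byte store, 72 bytes in total ) can be approximated with SSE2 intrinsics. This is a hand-written sketch, not the compiler's output or the author's code; `Block72` and `ZeroBlock72` are hypothetical names, and targets without SSE2 fall back to `memset`.

```cpp
#include <cstring>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
  #include <emmintrin.h>    // SSE2 intrinsics
  #define HAVE_SSE2 1
#else
  #define HAVE_SSE2 0
#endif

// Hypothetical 72-byte block, aligned so that 16-byte aligned stores are legal.
struct alignas(16) Block72
{
    unsigned char bytes[72];
};

// Zero the block with four aligned 16-byte stores plus one 8-byte store.
inline void ZeroBlock72(Block72& block)
{
#if HAVE_SSE2
    const __m128i zero = _mm_setzero_si128();            // like pxor xmm0, xmm0
    __m128i* p = reinterpret_cast<__m128i*>(block.bytes);
    _mm_store_si128(p + 0, zero);                        // bytes  0..15
    _mm_store_si128(p + 1, zero);                        // bytes 16..31
    _mm_store_si128(p + 2, zero);                        // bytes 32..47
    _mm_store_si128(p + 3, zero);                        // bytes 48..63
    _mm_storel_epi64(p + 4, zero);                       // bytes 64..71, like movq
#else
    std::memset(block.bytes, 0, sizeof(block.bytes));    // portable fallback
#endif
}
```

Because the size is known at compile time, the whole fill becomes a handful of straight-line stores with no loop and no call, which is consistent with the steady 88-cycle figures reported for this build.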
SergeyKostrov
[ Intel C++ compiler v12.1.7 ( u371 ) 32-bit - Release ]
[ Non Optimized version - #pragma optimize( "", off ) was used ]
...
0040261A rdtsc
0040261C mov dword ptr [ebp-20h], eax
0040261F mov dword ptr [ebp-1Ch], edx
00402622 mov eax, dword ptr [ebp-20h]
00402625 mov edx, dword ptr [ebp-1Ch]
00402628 mov dword ptr [ebp-18h], eax
0040262B mov dword ptr [ebp-14h], edx
0040262E mov eax, dword ptr [ebp+8]
00402631 mov edx, dword ptr [ebp+0Ch]
00402634 mov ecx, dword ptr [ebp+10h]
00402637 mov edi, eax
00402639 mov eax, edx
0040263B and eax, 0FFFFh
00402640 mov ah, al
00402642 mov edx, eax
00402644 shl eax, 10h
00402647 or eax, edx
00402649 mov esi, ecx
0040264B shr ecx, 2
0040264E mov edx, edi
00402650 rep stos dword ptr es:[edi]
00402652 mov ecx, esi
00402654 and ecx, 3
00402657 rep stos byte ptr es:[edi]
00402659 mov eax, edx
0040265B mov dword ptr [ebp-30h], eax
0040265E mov eax, dword ptr [ebp-30h]
00402661 mov dword ptr [ebp-34h], eax
00402664 rdtsc
...
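The non-optimized listing follows a classic pattern: replicate the fill byte into all four bytes of a 32-bit value, store count / 4 dwords with `rep stos dword`, then store the remaining count & 3 bytes with `rep stos byte`. A portable C++ sketch of the same logic follows; the function name `FillLikeRepStos` is made up for illustration, and plain loops stand in for the string instructions.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Fill `count` bytes at `dest` with the low byte of `value`, dword-first.
inline void* FillLikeRepStos(void* dest, int value, std::size_t count)
{
    unsigned char* p = static_cast<unsigned char*>(dest);

    // mov ah, al / shl eax, 10h / or eax, edx:
    // replicate the byte into all four bytes of a dword.
    const std::uint32_t byte = static_cast<unsigned char>(value);
    const std::uint32_t pattern = byte * 0x01010101u;

    // shr ecx, 2 / rep stos dword: write count / 4 dwords.
    for (std::size_t i = 0; i < count / 4; ++i) {
        std::memcpy(p, &pattern, sizeof(pattern));  // alignment-safe 4-byte store
        p += sizeof(pattern);
    }

    // and ecx, 3 / rep stos byte: write the remaining 0..3 bytes.
    for (std::size_t i = 0; i < (count & 3); ++i) {
        *p++ = static_cast<unsigned char>(byte);
    }

    return dest;  // like memset, return the original destination pointer
}
```

For any size the result should be byte-for-byte identical to `memset( dest, value, count )`; only the way the stores are grouped differs.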
SergeyKostrov
[ Intel C++ compiler v13.1.0 ( u149 ) 64-bit - Release Binary codes ]
...
000000013FEB3552 rdtsc
000000013FEB3554 shl rdx, 20h
000000013FEB3558 or rax, rdx
000000013FEB355B mov qword ptr [rbp+1D0h], rax
000000013FEB3562 vmovups xmmword ptr [r13+10h], xmm6
000000013FEB3568 vmovups xmmword ptr [r13+20h], xmm6
000000013FEB356E vmovups xmmword ptr [r13+30h], xmm6
000000013FEB3574 vmovups xmmword ptr [r13+40h], xmm6
000000013FEB357A vmovups xmmword ptr [r13+50h], xmm6
000000013FEB3580 vmovups xmmword ptr [r13+60h], xmm6
000000013FEB3586 vmovups xmmword ptr [r13], xmm6
000000013FEB358C rdtsc
000000013FEB358E shl rdx, 20h
000000013FEB3592 or rax, rdx
000000013FEB3595 mov qword ptr [rbp+1D8h], rax
000000013FEB359C lea rcx, [13FECF640h]
000000013FEB35A3 mov rdx, qword ptr [rbp+1D8h]
000000013FEB35AA mov rbx, qword ptr [rbp+1D0h]
000000013FEB35B1 sub rdx, rbx
000000013FEB35B4 vzeroupper
000000013FEB35B7 call 000000013FEB6950
...
SergeyKostrov
[ Conclusion ] For the given test case, the Intel C++ compiler ( 32-bit and 64-bit versions ) generated the most efficient machine code and achieved the fastest run-time initialization of a block of memory. The legacy 32-bit C++ compilers from Borland and Watcom outperformed the 32-bit Microsoft C++ compiler, but that does not mean they would be competitive in a more complex test: modern C++ compilers have built-in support for SIMD instructions and these two legacy compilers do not.
SergeyKostrov
[ Command Line Options of C++ compilers ] The command-line options of the C++ compilers used in these performance evaluations ( Release configurations ) are listed in the following posts.
SergeyKostrov
[ Borland C++ compiler v5.5.1 32-bit ]
-d -O2 -w -D_WIN32_BCC -DNDEBUG -5 -nRelease -eBccTestApp.exe -I"C:\WorkLib\MKL\Include" -L"C:\WorkLib\MKL\Lib\Ia32Bcc" -lS:33554432 BccTestApp.cpp HrtALLib.asm
SergeyKostrov
[ MinGW C++ compiler v5.1.0 32-bit ]
MgwTestApp.cpp -DNDEBUG -O3 -msse2 -mprfchw -ffast-math -fpeel-loops -ftree-vectorizer-verbose=0 -ftree-vectorize -fvect-cost-model -fomit-frame-pointer -flto -fwhole-program -fopenmp -w -I "C:/WorkLib/ICC2011/Composer XE/Mkl/Include" -B "../../AppsSca" "C:/WorkLib/ICC2011/Composer XE/Mkl/Lib/Ia32/mkl_rt.lib" -Xlinker --stack=67108864
SergeyKostrov
[ MinGW C++ compiler v5.1.0 64-bit ]
MgwTestApp.cpp -DNDEBUG -O3 -mavx -mprfchw -ffast-math -fpeel-loops -ftree-vectorizer-verbose=0 -ftree-vectorize -fvect-cost-model -fomit-frame-pointer -fwhole-program -fopenmp -w -I "C:/WorkLib/ICC2013/Composer XE/Mkl/Include" -B "../../AppsSca" "C:/WorkLib/ICC2013/Composer XE/Mkl/Lib/Intel64/mkl_rt.lib" -Xlinker --stack=1073741824
SergeyKostrov
[ Microsoft C++ compiler ( VS2005 PE ) 32-bit ]
[ Compiler ]
/O2 /Ob1 /Oi /Ot /Oy /GL /I "..\..\Include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_WIN32_MSC" /D "_VC80_UPGRADE=0x0710" /D "_UNICODE" /D "UNICODE" /GF /Gm /MT /GS- /fp:fast /GR- /openmp /Yu"Stdphf.h" /Fp"Release\MscTestApp.pch" /Fo"Release/" /Fd"Release/" /W4 /nologo /c /Wp64 /Zi /Gd /TP /wd4005 /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_ICC" /U "_WIN32_WCC" /errorReport:prompt /arch:SSE2
[ Linker ]
/OUT:"Release/MscTestApp.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"Release\MscTestApp.exe.intermediate.manifest" /NODEFAULTLIB:"../../Bin/Release/ScaLib.lib" /SUBSYSTEM:CONSOLE /STACK:268435456 /LARGEADDRESSAWARE /LTCG /MACHINE:X86 /ERRORREPORT:PROMPT kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib "..\..\bin\release\scalib.lib"
SergeyKostrov
[ Microsoft C++ compiler ( VS2008 PE ) 64-bit ]
[ Compiler ]
/O2 /Ob1 /Oi /Ot /Oy /GL /I "..\..\Include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_WIN32_MSC" /D "_UNICODE" /D "UNICODE" /GF /Gm /MT /GS- /fp:fast /GR- /openmp /Yu"Stdphf.h" /Fp"x64\Release\ScaLibTestApp64.pch" /Fo"x64/Release/" /Fd"x64/Release/" /W4 /nologo /c /Zi /TP /wd4005 /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_ICC" /U "_WIN32_WCC" /errorReport:prompt
[ Linker ]
/OUT:"x64\Release/ScaLibTestApp64.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"x64\Release\ScaLibTestApp64.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /SUBSYSTEM:CONSOLE /STACK:1073741824 /LTCG /DYNAMICBASE:NO /MACHINE:X64 /ERRORREPORT:PROMPT kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib "..\..\bin\release\scalib64.lib"
SergeyKostrov
[ Intel C++ compiler v12.1.7 ( u371 ) 32-bit ]
[ Compiler ]
/c /O3 /Ob1 /Oi /Ot /Oy /Qipo /I "..\..\Include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_WIN32_ICC" /D "INTEL_SUITE_VERSION=PE121_300" /D "_VC80_UPGRADE=0x0710" /D "_UNICODE" /D "UNICODE" /GF /MT /GS- /fp:fast=2 /GR- /Yu"Stdphf.h" /Fp"Release\IccTestApp.pch" /Fo"Release/" /W5 /nologo /Wp64 /Zi /Gd /TP /Qdiag-disable:2012 /Qdiag-disable:2013 /Qdiag-disable:2014 /Qdiag-disable:2015 /Qdiag-disable:2017 /Qdiag-disable:2021 /Qdiag-disable:2022 /Qdiag-disable:2304 /U "_WIN32_MSC" /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_WCC" /Qopenmp /Qfp-speculation:fast /Qopt-matmul /Qparallel /Qstd=c++0x /Qrestrict /Qdiag-disable:111,673,10121 /Wport /Qeffc++ /QxSSE2 /Qansi-alias /Qvec-report=0 /Qfma /Qunroll:8 /Qunroll-aggressive /Qopt-streaming-stores:always /Qopt-block-factor:128 /Qopt-mem-layout-trans:2 /Wport /Qeffc++ /QxSSE2 /Qansi-alias /Qvec-report=0 /Qfma /Qunroll:8 /Qunroll-aggressive /Qopt-streaming-stores:always /Qopt-block-factor:128 /Qopt-mem-layout-trans:2
[ Linker ]
kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /OUT:"Release/IccTestApp.exe" /INCREMENTAL:NO /nologo /MANIFEST /MANIFESTFILE:"Release\IccTestApp.exe.intermediate.manifest" /NODEFAULTLIB:"../../Bin/Release/ScaLib.lib" /TLBID:1 /SUBSYSTEM:CONSOLE /STACK:268435456 /LARGEADDRESSAWARE /MACHINE:X86 /qdiag-disable:111,673,10121
SergeyKostrov
[ Intel C++ compiler v13.1.0 ( u149 ) 64-bit ]
[ Compiler ]
/c /O3 /Ob1 /Oi /Ot /Qipo /I "..\..\Include" /I "C:\WorkLib\ICC2013\Composer XE 2013\ipp\include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_WIN32_ICC" /D "INTEL_SUITE_VERSION=PE130_149" /D "_IPP_PARALLEL_DYNAMIC" /D "IPP_USE_CUSTOM" /D "_VC80_UPGRADE=0x0710" /D "_UNICODE" /D "UNICODE" /GF /MT /GS- /arch:AVX /fp:fast=2 /GR- /Yu"Stdphf.h" /Fp"x64\Release\IccTestApp64.pch" /Fo"x64/Release/" /Fd"x64/Release/" /W5 /nologo /Wp64 /Zi /TP /U "_WIN32_MSC" /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_WCC" /Qopenmp /Qfp-speculation:fast /Qopt-matmul /Qstd=c++0x /Qrestrict /Qansi-alias /Qdiag-disable:111,673,2012,2015,2960,10121 /Wport /Qeffc++ /QxAVX /Qansi-alias /Qvec-report=0 /Qfma /Qunroll /Qunroll-aggressive /Qopt-streaming-stores:always /Qipp /Qipp-link:dynamic /Qmkl
[ Linker ]
kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /OUT:"x64\Release/IccTestApp64.exe" /INCREMENTAL:NO /nologo /LIBPATH:"C:\WorkLib\ICC2013\Composer XE 2013\ipp\lib\intel64" /LIBPATH:"C:\WorkLib\ICC2013\Composer XE 2013\compiler\lib\intel64" /MANIFEST /MANIFESTFILE:"x64\Release\IccTestApp64.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /NODEFAULTLIB:"../../Bin/Release/ScaLib64.lib" /TLBID:1 /SUBSYSTEM:CONSOLE /STACK:1000000000 /LARGEADDRESSAWARE /DYNAMICBASE /NXCOMPAT /MACHINE:X64 /qdiag-disable:111,673,2012,2015,2960,10121 /qdiag-sc-dir:"My Inspector XE Results - IccTestApp"
SergeyKostrov
[ Watcom C++ compiler v2.0.0 32-bit ]
WccTestApp.cpp -5r -fp5 -fpi87 -wx -d0 -s -oabil+mprt -xd -D_WIN32_WCC -DNDEBUG -feWccTestApp.exe -k268435456 -i"C:\WorkLib\ICC2011\Compos~1\Mkl\Include" -"libpath C:\WorkLib\ICC2011\Compos~1\Mkl\Lib\Ia32Wcc" -wcd=007 -wcd=008 -wcd=013 -wcd=014 -wcd=086 -wcd=188 -wcd=367 -wcd=368 -wcd=369 -wcd=387 -wcd=389 -wcd=549 -wcd=601 -wcd=628 -wcd=689 -wcd=716 -wcd=725 -wcd=726 -wcd=735
TimP
Honored Contributor III

The main speedup in memset, when it's applicable, is from the use of non-temporal/streaming stores.  It's usually a library call, not necessarily provided by the compiler.
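To illustrate what non-temporal ( streaming ) stores look like in source form, here is a minimal sketch using SSE2 intrinsics. It is not taken from any CRT or compiler library; `ZeroStreaming` is a hypothetical name, and the sketch assumes the destination is 16-byte aligned and the size is a multiple of 16. Targets without SSE2 fall back to `memset`.

```cpp
#include <cstddef>
#include <cstring>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
  #include <emmintrin.h>    // SSE2 intrinsics, including _mm_stream_si128
  #define HAVE_SSE2 1
#else
  #define HAVE_SSE2 0
#endif

// Zero `bytes` bytes at `dest` with non-temporal 16-byte stores.
// Assumption of this sketch: `dest` is 16-byte aligned and `bytes` is a
// multiple of 16; a production routine would also handle head and tail bytes.
inline void ZeroStreaming(void* dest, std::size_t bytes)
{
#if HAVE_SSE2
    __m128i* p = static_cast<__m128i*>(dest);
    const __m128i zero = _mm_setzero_si128();
    for (std::size_t i = 0; i < bytes / 16; ++i) {
        _mm_stream_si128(p + i, zero);   // store with a non-temporal hint
    }
    _mm_sfence();                        // order the streaming stores before later stores
#else
    std::memset(dest, 0, bytes);         // portable fallback
#endif
}
```

Streaming stores tend to help when the buffer is too large to stay in cache and will not be read again soon. For blocks as small as the ones in the listings above, ordinary stores are usually the better choice.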
