Fastest version of a CRT-function 'memset'

SergeyKostrov
Valued Contributor II
1,745 Views
[ Abstract ] Modern C++ compilers can inline many CRT functions. For example, the Microsoft and Intel C++ compilers have an 'Enable Intrinsic Functions' option ( /Oi ). When that option is used, the compiler can generate optimized inline machine code instead of calling the CRT function from a run-time dynamic link library. Several C++ compilers were analyzed to see how they handle a simple call to the CRT function memset ( which initializes a block of memory with a value ).
SergeyKostrov
[ Test Case - C codes ]
...
typedef struct tagALIGNOFDATA
{
    RTint iAlignofValue[9];
    RTtchar *pszTypeName[9];
} ALIGNOFDATA;

ALIGNOFDATA aod;
CrtMemset( &aod, 0x0, sizeof( ALIGNOFDATA ) );
...
...
_RTINLINE RTvoid * CrtMemset( RTvoid *pvDest, RTint iValue, RTsize_t iCount )
{
    _RTvolatile RTuint64 uiClock1 = IrtRdtsc();
    memset( pvDest, iValue, iCount );
    _RTvolatile RTuint64 uiClock2 = IrtRdtsc();
    CrtPrintf( RTU("[ CrtMemset ] - Executed in %u clock cycles\n"), ( RTuint )( uiClock2 - uiClock1 ) );
    return ( RTvoid * )pvDest;
}
...
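The test case relies on project-specific macros and helpers ( RTvoid, IrtRdtsc, CrtPrintf and so on ) whose definitions are not shown in the thread. For readers who want to try the same measurement, here is a minimal sketch in plain C. The names read_ticks and timed_memset are hypothetical stand-ins for IrtRdtsc and CrtMemset, and the clock() fallback for non-x86 targets is my assumption, not something from the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>          /* __rdtsc on Microsoft-compatible compilers */
#define HAVE_RDTSC 1
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <x86intrin.h>       /* __rdtsc on GCC, Clang and MinGW */
#define HAVE_RDTSC 1
#else
#define HAVE_RDTSC 0
#endif

/* Hypothetical stand-in for IrtRdtsc(): reads the time-stamp counter on x86
   and falls back to clock() ticks on other targets (an assumption). */
static uint64_t read_ticks(void)
{
#if HAVE_RDTSC
    return (uint64_t)__rdtsc();
#else
    return (uint64_t)clock();
#endif
}

/* Hypothetical stand-in for CrtMemset(): fills the block and stores the
   elapsed tick count in *elapsed instead of printing it. */
static void *timed_memset(void *dest, int value, size_t count, uint64_t *elapsed)
{
    assert(dest != NULL && elapsed != NULL);

    uint64_t t1 = read_ticks();
    memset(dest, value, count);
    uint64_t t2 = read_ticks();

    *elapsed = t2 - t1;
    return dest;
}
```

A caller would pass a buffer and print the result, for example with printf( "%llu", ( unsigned long long )elapsed ). Unlike the original, which casts the difference to a 32-bit value for %u, this sketch keeps the full 64-bit difference.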
SergeyKostrov
[ Evaluation was done using eight C++ compilers ]

// 32-bit C++ compilers
Microsoft C++ compiler ( VS2005 PE ) 32-bit
Borland C++ compiler v5.5.1 32-bit
Intel C++ compiler v12.1.7 ( u371 ) 32-bit
MinGW C++ compiler v5.1.0 32-bit
Watcom C++ compiler v2.0.0 32-bit

// 64-bit C++ compilers
Microsoft C++ compiler ( VS2008 PE ) 64-bit
Intel C++ compiler v13.1.0 ( u149 ) 64-bit
MinGW C++ compiler v5.1.0 64-bit
SergeyKostrov
[ Microsoft C++ compiler ( VS2005 PE ) 32-bit - Debug Binary codes ]
...
100143EE  rdtsc
100143F0  mov dword ptr [uiClock1], eax
100143F3  mov dword ptr [ebp-8], edx
100143F6  mov eax, dword ptr [iCount]
100143F9  push eax
100143FA  mov ecx, dword ptr [iValue]
100143FD  push ecx
100143FE  mov edx, dword ptr [pvDest]
10014401  push edx
10014402  call @ILT+400(_memset) (10011195h)
10014407  add esp, 0Ch
1001440A  rdtsc
...
SergeyKostrov
[ Microsoft C++ compiler ( VS2005 PE ) 32-bit - Release Binary codes ]
...
00403583  rdtsc
00403585  push 48h
00403587  lea ecx, [esp+4Ch]
0040358B  push 0
0040358D  mov dword ptr [esp+20h], eax
00403591  mov dword ptr [esp+24h], edx
00403595  push ecx
00403596  call 004066B0
0040359B  rdtsc
...
SergeyKostrov
[ Borland C++ compiler v5.5.1 32-bit - Debug Binary codes ]
...
00403D77  call 004047FC
00403D7C  mov dword ptr [ebp-8], eax
00403D7F  mov dword ptr [ebp-4], edx
00403D82  push dword ptr [ebp+10h]
00403D85  push dword ptr [ebp+0Ch]
00403D88  push dword ptr [ebp+8]
00403D8B  call 00405AB4
00403D90  add esp, 0Ch
00403D93  call 004047FC
...
SergeyKostrov
[ Borland C++ compiler v5.5.1 32-bit - Release Binary codes ]
...
00402E9E  call 00404550
00402EA3  mov dword ptr [ebp-19Ch], eax
00402EA9  mov dword ptr [ebp-198h], edx
00402EAF  push 48h
00402EB1  push 0
00402EB3  lea eax, [ebp-2F4h]
00402EB9  push eax
00402EBA  call 00405838
00402EBF  add esp, 0Ch
00402EC2  call 00404550
...
SergeyKostrov
[ Intel C++ compiler v12.1.7 ( u371 ) 32-bit - Debug Binary codes ]
...
00402438  rdtsc
0040243A  mov dword ptr [ebp-20h], eax
0040243D  mov dword ptr [ebp-1Ch], edx
00402440  mov byte ptr [ebp-2Ch], 1
00402444  mov eax, dword ptr [ebp-20h]
00402447  mov edx, dword ptr [ebp-1Ch]
0040244A  mov dword ptr [uiClock1], eax
0040244D  mov dword ptr [ebp-14h], edx
00402450  add esp, 0FFFFFFF4h
00402453  mov eax, dword ptr [pvDest]
00402456  mov dword ptr [esp], eax
00402459  mov eax, dword ptr [iValue]
0040245C  mov dword ptr [esp+4], eax
00402460  mov eax, dword ptr [iCount]
00402463  mov dword ptr [esp+8], eax
00402467  call memset (414770h)
0040246C  add esp, 0Ch
0040246F  mov dword ptr [ebp-28h], eax
00402472  rdtsc
...
SergeyKostrov
[ Intel C++ compiler v12.1.7 ( u371 ) 32-bit - Release Binary codes ]
...
00402DEE  rdtsc
00402DF0  mov dword ptr [ebp-208h], eax
00402DF6  mov dword ptr [ebp-204h], edx
00402DFC  pxor xmm0, xmm0
00402E00  movaps xmmword ptr [ebp-1F8h], xmm0
00402E07  movaps xmmword ptr [ebp-1E8h], xmm0
00402E0E  movaps xmmword ptr [ebp-1D8h], xmm0
00402E15  movaps xmmword ptr [ebp-1C8h], xmm0
00402E1C  movq mmword ptr [ebp-1B8h], xmm0
00402E24  rdtsc
...
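This is the first listing with no call between the two rdtsc instructions: the 72 bytes ( 48h ) are cleared by four 16-byte stores followed by one 8-byte store. A rough C equivalent using SSE2 intrinsics is sketched below. The function name zero_72_bytes is hypothetical, and I use unaligned stores so the sketch works for any pointer, whereas the listing uses movaps on a 16-byte aligned stack slot. Targets without SSE2 fall back to memset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>       /* SSE2 intrinsics */
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

enum { BLOCK_SIZE = 72 };    /* 48h, the size passed in the 32-bit listings */

/* Hypothetical helper: clears exactly BLOCK_SIZE bytes at dest. */
static void zero_72_bytes(void *dest)
{
    assert(dest != NULL);
#if HAVE_SSE2
    unsigned char *p = (unsigned char *)dest;
    __m128i zero = _mm_setzero_si128();              /* like pxor xmm0, xmm0 */

    _mm_storeu_si128((__m128i *)(p +  0), zero);     /* bytes  0..15 */
    _mm_storeu_si128((__m128i *)(p + 16), zero);     /* bytes 16..31 */
    _mm_storeu_si128((__m128i *)(p + 32), zero);     /* bytes 32..47 */
    _mm_storeu_si128((__m128i *)(p + 48), zero);     /* bytes 48..63 */
    _mm_storel_epi64((__m128i *)(p + 64), zero);     /* bytes 64..71, like movq */
#else
    memset(dest, 0, BLOCK_SIZE);                     /* portable fallback */
#endif
}
```

The store sizes add up as 4 x 16 + 8 = 72 bytes, which matches the sizeof( ALIGNOFDATA ) value pushed as 48h in the other 32-bit release listings.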
SergeyKostrov
[ MinGW C++ compiler v5.1.0 32-bit - Debug Binary codes ]
...
00406459  rdtsc
0040645E  mov dword ptr [ebp-10h], eax
00406461  mov dword ptr [ebp-0Ch], edx
00406464  mov eax, dword ptr [ebp+10h]
00406467  mov dword ptr [esp+8], eax
0040646B  mov eax, dword ptr [ebp+0Ch]
0040646E  mov dword ptr [esp+4], eax
00406472  mov eax, dword ptr [ebp+8]
00406475  mov dword ptr [esp], eax
00406478  call 00406124
0040647D  rdtsc
...
SergeyKostrov
[ MinGW C++ compiler v5.1.0 32-bit - Release Binary codes ]
...
00401F78  rdtsc
00401F7A  mov dword ptr [esp+100h], eax
00401F81  mov dword ptr [esp+104h], edx
00401F88  mov ebx, dword ptr [esp+100h]
00401F8F  mov edx, dword ptr [esp+104h]
00401F96  mov eax, dword ptr [esp+0F8h]
00401F9D  mov dword ptr [esp], 40B89Ch
00401FA4  mov ecx, dword ptr [esp+0FCh]
00401FAB  sub ebx, eax
00401FAD  mov dword ptr [esp+4], ebx
00401FB1  call 004079DC
00401FB6  rdtsc
...
SergeyKostrov
[ Watcom C++ compiler v2.0.0 32-bit - Debug Binary codes ]
...
0040560F  rdtsc
00405611  mov ecx, eax
00405613  mov eax, edx
00405615  mov dword ptr [ebp-110h], ecx
0040561B  mov dword ptr [ebp-10Ch], eax
00405621  mov ebx, dword ptr [ebp-108h]
00405627  mov edx, dword ptr [ebp-104h]
0040562D  mov eax, dword ptr [ebp-100h]
00405633  call 0040A1A0
00405638  rdtsc
...
SergeyKostrov
[ Watcom C++ compiler v2.0.0 32-bit - Release Binary codes ]
...
00402D13  rdtsc
00402D15  mov dword ptr [esp+1D0h], eax
00402D1C  mov dword ptr [esp+1D4h], edx
00402D23  mov ebx, 48h
00402D28  mov eax, esp
00402D2A  xor edx, edx
00402D2C  call 00404D50
00402D31  rdtsc
...
SergeyKostrov
[ Microsoft C++ compiler ( VS2008 PE ) 64-bit - Debug Binary codes ]
...
0000000180003F0C  rdtsc
0000000180003F0E  shl rdx, 20h
0000000180003F12  or rax, rdx
0000000180003F15  mov qword ptr [uiClock1], rax
0000000180003F1A  mov r8, qword ptr [iCount]
0000000180003F1F  mov edx, dword ptr [iValue]
0000000180003F23  mov rcx, qword ptr [pvDest]
0000000180003F28  call memset (18000B1A2h)
0000000180003F2D  rdtsc
...
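The 64-bit listings all follow rdtsc with shl rdx, 20h and or rax, rdx. The rdtsc instruction returns the counter split across EDX ( high 32 bits ) and EAX ( low 32 bits ), so those two instructions merge the halves into one 64-bit value. The same step can be written in C as follows. The function name combine_tsc and the sample register values are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Merges the two 32-bit halves returned by rdtsc into one 64-bit count:
   the shift corresponds to "shl rdx, 20h", the OR to "or rax, rdx". */
static uint64_t combine_tsc(uint32_t eax_low, uint32_t edx_high)
{
    return ((uint64_t)edx_high << 32) | eax_low;
}

/* Usage check with made-up register values. */
static void combine_tsc_check(void)
{
    assert(combine_tsc(0x89ABCDEFu, 0x01234567u) == UINT64_C(0x0123456789ABCDEF));
}
```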
SergeyKostrov
[ Microsoft C++ compiler ( VS2008 PE ) 64-bit - Release Binary codes ]
...
0000000140003A8C  rdtsc
0000000140003A8E  shl rdx, 20h
0000000140003A92  lea rcx, [rbp+10h]
0000000140003A96  or rax, rdx
0000000140003A99  xor edx, edx
0000000140003A9B  lea r8d, [rdx+70h]
0000000140003A9F  mov qword ptr [rbp], rax
0000000140003AA3  call 000000014000BBC0
0000000140003AA8  rdtsc
...
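Note that the size argument here is 70h ( 112 bytes ), while the 32-bit release listings pass 48h ( 72 bytes ). That difference is consistent with the ALIGNOFDATA layout: nine 4-byte integers plus nine pointers, where the pointers grow from 4 to 8 bytes and the second array needs 4 bytes of padding for 8-byte alignment. The sketch below reproduces the arithmetic. It assumes RTint maps to a 4-byte int, RTtchar to a character type, and natural alignment; none of those typedefs are shown in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as ALIGNOFDATA from the test case, with the RT* typedefs
   assumed to map to plain int and char (an assumption). */
typedef struct {
    int   iAlignofValue[9];
    char *pszTypeName[9];
} AlignOfData;

/* Size of that layout for a given int size and pointer size, assuming each
   member is aligned to its own size (natural alignment). */
static size_t expected_size(size_t int_size, size_t ptr_size)
{
    size_t align  = ptr_size > int_size ? ptr_size : int_size;
    size_t ints   = 9 * int_size;
    size_t padded = (ints + align - 1) / align * align;  /* pad before pointers */
    size_t total  = padded + 9 * ptr_size;
    return (total + align - 1) / align * align;          /* tail padding */
}

/* Checks the two sizes seen in the listings. */
static void expected_size_check(void)
{
    assert(expected_size(4, 4) == 0x48);  /* 32-bit: 36 + 36 = 72 */
    assert(expected_size(4, 8) == 0x70);  /* 64-bit: 36 + 4 + 72 = 112 */
}
```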
SergeyKostrov
[ Intel C++ compiler v13.1.0 ( u149 ) 64-bit - Debug Binary codes ]
...
000000013FBF291C  rdtsc
000000013FBF291E  shl rdx, 20h
000000013FBF2922  or rax, rdx
000000013FBF2925  mov qword ptr [rbp+8], rax
000000013FBF2929  mov byte ptr [rbp], 1
000000013FBF292D  mov rax, qword ptr [rbp+8]
000000013FBF2931  mov qword ptr [uiClock1], rax
000000013FBF2935  mov rax, qword ptr [pvDest]
000000013FBF2939  mov edx, dword ptr [iValue]
000000013FBF293C  mov rcx, qword ptr [iCount]
000000013FBF2940  mov qword ptr [rbp+40h], rcx
000000013FBF2944  mov rcx, rax
000000013FBF2947  mov rax, qword ptr [rbp+40h]
000000013FBF294B  mov r8, rax
000000013FBF294E  call memset (13FC08BE0h)
000000013FBF2953  mov qword ptr [rbp+18h], rax
000000013FBF2957  rdtsc
...
SergeyKostrov
[ Intel C++ compiler v13.1.0 ( u149 ) 64-bit - Release Binary codes ]
...
000000013F683552  rdtsc
000000013F683554  shl rdx, 20h
000000013F683558  or rax, rdx
000000013F68355B  mov qword ptr [rbp+1D0h], rax
000000013F683562  vmovups xmmword ptr [r13+10h], xmm6
000000013F683568  vmovups xmmword ptr [r13+20h], xmm6
000000013F68356E  vmovups xmmword ptr [r13+30h], xmm6
000000013F683574  vmovups xmmword ptr [r13+40h], xmm6
000000013F68357A  vmovups xmmword ptr [r13+50h], xmm6
000000013F683580  vmovups xmmword ptr [r13+60h], xmm6
000000013F683586  vmovups xmmword ptr [r13], xmm6
000000013F68358C  rdtsc
...
SergeyKostrov
[ MinGW C++ compiler v5.1.0 64-bit - Debug Binary codes ]
...
000000000040728C  call 0000000000407260
0000000000407291  mov qword ptr [rbp-8], rax
0000000000407295  mov rdx, qword ptr [rbp+20h]
0000000000407299  mov eax, dword ptr [rbp+18h]
000000000040729C  mov r8, rdx
000000000040729F  mov edx, eax
00000000004072A1  mov rcx, qword ptr [rbp+10h]
00000000004072A5  call 0000000000406DC8
00000000004072AA  call 0000000000407260
...
SergeyKostrov
[ MinGW C++ compiler v5.1.0 64-bit - Release Binary codes ]
...
0000000000402DD6  rdtsc
0000000000402DD8  shl rdx, 20h
0000000000402DDC  or rax, rdx
0000000000402DDF  mov qword ptr [rbp+38h], rax
0000000000402DE3  mov rdx, qword ptr [rbp+38h]
0000000000402DE7  mov rcx, qword ptr [rbp+30h]
0000000000402DEB  sub edx, ecx
0000000000402DED  lea rcx, [40B938h]
0000000000402DF4  call 0000000000407690
0000000000402DFB  rdtsc
...
SergeyKostrov
[ Performance Evaluation ( Debug ) - Summary - 32-bit Windows XP SP3 ]

Microsoft C++ compiler ( VS2005 PE ) 32-bit
...
[ CrtMemset ] - Executed in 316 clock cycles
[ CrtMemset ] - Executed in 424 clock cycles
[ CrtMemset ] - Executed in 388 clock cycles
[ CrtMemset ] - Executed in 344 clock cycles
[ CrtMemset ] - Executed in 336 clock cycles
[ CrtMemset ] - Executed in 336 clock cycles
[ CrtMemset ] - Executed in 336 clock cycles
[ CrtMemset ] - Executed in 372 clock cycles
[ CrtMemset ] - Executed in 336 clock cycles
[ CrtMemset ] - Executed in 332 clock cycles
...

Borland C++ compiler v5.5.1 32-bit
...
[ CrtMemset ] - Executed in 536 clock cycles
[ CrtMemset ] - Executed in 188 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 188 clock cycles
[ CrtMemset ] - Executed in 188 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 188 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 188 clock cycles
...

Intel C++ compiler v12.1.7 ( u371 ) 32-bit
...
[ CrtMemset ] - Executed in 344 clock cycles
[ CrtMemset ] - Executed in 296 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 180 clock cycles
[ CrtMemset ] - Executed in 180 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 180 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 180 clock cycles
[ CrtMemset ] - Executed in 172 clock cycles
...

MinGW C++ compiler v5.1.0 32-bit
...
[ CrtMemset ] - Executed in 728 clock cycles
[ CrtMemset ] - Executed in 412 clock cycles
[ CrtMemset ] - Executed in 324 clock cycles
[ CrtMemset ] - Executed in 320 clock cycles
[ CrtMemset ] - Executed in 328 clock cycles
[ CrtMemset ] - Executed in 320 clock cycles
[ CrtMemset ] - Executed in 324 clock cycles
[ CrtMemset ] - Executed in 324 clock cycles
[ CrtMemset ] - Executed in 320 clock cycles
[ CrtMemset ] - Executed in 328 clock cycles
...

Watcom C++ compiler v2.0.0 32-bit
...
[ CrtMemset ] - Executed in 784 clock cycles
[ CrtMemset ] - Executed in 268 clock cycles
[ CrtMemset ] - Executed in 260 clock cycles
[ CrtMemset ] - Executed in 264 clock cycles
[ CrtMemset ] - Executed in 264 clock cycles
[ CrtMemset ] - Executed in 464 clock cycles
[ CrtMemset ] - Executed in 256 clock cycles
[ CrtMemset ] - Executed in 264 clock cycles
[ CrtMemset ] - Executed in 264 clock cycles
[ CrtMemset ] - Executed in 260 clock cycles
...
SergeyKostrov
[ Performance Evaluation ( Debug ) - Final Results - 32-bit Windows XP SP3 ]

Average is 208 clock cycles - Intel C++ compiler v12.1.7 ( u371 ) 32-bit
Average is 221 clock cycles - Borland C++ compiler v5.5.1 32-bit
Average is 335 clock cycles - Watcom C++ compiler v2.0.0 32-bit
Average is 352 clock cycles - Microsoft C++ compiler ( VS2005 PE ) 32-bit
Average is 373 clock cycles - MinGW C++ compiler v5.1.0 32-bit
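These averages can be reproduced from the ten samples per compiler listed in the summary post, taking the arithmetic mean and rounding to the nearest cycle. The sketch below does that in C. The array names and the rounded_average helper are hypothetical, and the rounding rule is my assumption, chosen because it matches all five reported figures.

```c
#include <assert.h>
#include <stddef.h>

enum { RUNS = 10 };

/* Samples copied from the debug summary, in clock cycles. */
static const unsigned intel_debug[RUNS]     = { 344, 296, 184, 180, 180, 184, 180, 184, 180, 172 };
static const unsigned borland_debug[RUNS]   = { 536, 188, 184, 188, 188, 184, 188, 184, 184, 188 };
static const unsigned watcom_debug[RUNS]    = { 784, 268, 260, 264, 264, 464, 256, 264, 264, 260 };
static const unsigned microsoft_debug[RUNS] = { 316, 424, 388, 344, 336, 336, 336, 372, 336, 332 };
static const unsigned mingw_debug[RUNS]     = { 728, 412, 324, 320, 328, 320, 324, 324, 320, 328 };

/* Arithmetic mean of n samples, rounded to the nearest whole cycle. */
static unsigned rounded_average(const unsigned *samples, size_t n)
{
    assert(samples != NULL && n > 0);

    unsigned long sum = 0;
    for (size_t i = 0; i < n; ++i) {
        sum += samples[i];
    }
    return (unsigned)((sum + n / 2) / n);
}
```

For example, rounded_average( intel_debug, RUNS ) gives 208. Keep in mind that the first call is much slower than the rest for several compilers ( 536, 728 and 784 cycles ), so a plain mean over ten runs is sensitive to that warm-up sample.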