*** Fastest version of a CRT-function 'memset' ***
[ Abstract ]
Modern C++ compilers can inline most CRT functions. For example, the Microsoft and Intel
C++ compilers have an 'Enable Intrinsic Functions' option ( /Oi ). When that option is used,
the compiler may generate highly optimized inline code instead of calling the CRT function
from a Run-Time Dynamic Link Library.
Several C++ compilers were analyzed to evaluate how each one handles a simple call to the
CRT function memset ( initializes a block of memory with a value ).
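A minimal sketch of the two common ways to request the intrinsic form of memset, assuming MSVC's #pragma intrinsic (a per-file alternative to the /Oi switch) and GCC/Clang's default builtin handling. Record and ClearRecord are hypothetical names, and the struct only mirrors the shape of the test structure below with standard types assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(_MSC_VER)
/* File-scoped counterpart of /Oi for memset: MSVC may now expand memset
   calls in this file inline instead of calling the CRT function. */
#pragma intrinsic(memset)
#endif

/* Same shape as the post's ALIGNOFDATA, assuming RTint -> int and
   RTtchar -> char (hypothetical stand-in). */
typedef struct {
    int   values[9];
    char *names[9];
} Record;

/* sizeof *r is a compile-time constant, so a compiler that treats memset as
   an intrinsic/builtin is free to replace this call with a fixed sequence of
   stores. GCC and Clang treat memset as a builtin unless -fno-builtin is
   given; whether inlining actually happens depends on the compiler and the
   optimization level. */
static void ClearRecord(Record *r)
{
    assert(r != NULL);
    memset(r, 0, sizeof *r);
}
```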
[ Test Case - C codes ]
...
typedef struct tagALIGNOFDATA
{
    RTint iAlignofValue[9];
    RTtchar *pszTypeName[9];
} ALIGNOFDATA;
ALIGNOFDATA aod;
/* Zero the whole structure through the timed wrapper */
CrtMemset( &aod, 0x0, sizeof( ALIGNOFDATA ) );
...
...
/* Wraps memset between two time-stamp counter reads and prints the difference */
_RTINLINE RTvoid * CrtMemset( RTvoid *pvDest, RTint iValue, RTsize_t iCount )
{
    _RTvolatile RTuint64 uiClock1 = IrtRdtsc();
    memset( pvDest, iValue, iCount );
    _RTvolatile RTuint64 uiClock2 = IrtRdtsc();
    CrtPrintf( RTU("[ CrtMemset ] - Executed in %u clock cycles\n"),
        ( RTuint )( uiClock2 - uiClock1 ) );
    return ( RTvoid * )pvDest;
}
...
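For readers without the RT* headers used above, here is a standalone sketch of the same measurement. It assumes RTint maps to int, RTtchar to char, and IrtRdtsc to the compiler's __rdtsc intrinsic (intrin.h on MSVC, x86intrin.h on GCC/Clang); TimedMemset, RunMemsetTest and READ_TSC are hypothetical names. Note that RDTSC is not a serializing instruction, so for a call this short the counts should be treated as approximate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#define READ_TSC() ((uint64_t)__rdtsc())
#elif defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>
#define READ_TSC() ((uint64_t)__rdtsc())
#else
#include <time.h>
/* Non-x86 fallback (assumption): clock() ticks, not CPU cycles. */
#define READ_TSC() ((uint64_t)clock())
#endif

/* Standard-type stand-in for the post's ALIGNOFDATA. */
typedef struct tagALIGNOFDATA
{
    int   iAlignofValue[9];
    char *pszTypeName[9];
} ALIGNOFDATA;

/* Times one memset call and returns the elapsed counter difference. */
static uint64_t TimedMemset(void *pvDest, int iValue, size_t iCount)
{
    volatile uint64_t uiClock1 = READ_TSC();
    memset(pvDest, iValue, iCount);
    volatile uint64_t uiClock2 = READ_TSC();
    return uiClock2 - uiClock1;
}

/* Repeats the measurement nRuns times, printing in the post's format. */
static void RunMemsetTest(ALIGNOFDATA *paod, int nRuns)
{
    assert(paod != NULL && nRuns > 0);
    for (int i = 0; i < nRuns; ++i) {
        uint64_t uiCycles = TimedMemset(paod, 0x0, sizeof(ALIGNOFDATA));
        printf("[ CrtMemset ] - Executed in %llu clock cycles\n",
               (unsigned long long)uiCycles);
    }
}
```

Calling RunMemsetTest( &aod, 10 ) produces ten lines like the ones in the summaries further down; unlike CrtMemset, TimedMemset returns the count instead of the destination pointer, which keeps printing out of the timed region.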
[ Evaluation was done using eight C++ compilers ]
// 32-bit C++ compilers
Microsoft C++ compiler ( VS2005 PE ) 32-bit
Borland C++ compiler v5.5.1 32-bit
Intel C++ compiler v12.1.7 ( u371 ) 32-bit
MinGW C++ compiler v5.1.0 32-bit
Watcom C++ compiler v2.0.0 32-bit
// 64-bit C++ compilers
Microsoft C++ compiler ( VS2008 PE ) 64-bit
Intel C++ compiler v13.1.0 ( u149 ) 64-bit
MinGW C++ compiler v5.1.0 64-bit
[ Microsoft C++ compiler ( VS2005 PE ) 32-bit - Debug Binary codes ]
...
100143EE rdtsc
100143F0 mov dword ptr [uiClock1], eax
100143F3 mov dword ptr [ebp-8], edx
100143F6 mov eax, dword ptr [iCount]
100143F9 push eax
100143FA mov ecx, dword ptr [iValue]
100143FD push ecx
100143FE mov edx, dword ptr [pvDest]
10014401 push edx
10014402 call @ILT+400(_memset) (10011195h)
10014407 add esp, 0Ch
1001440A rdtsc
...
[ Microsoft C++ compiler ( VS2005 PE ) 32-bit - Release Binary codes ]
...
00403583 rdtsc
00403585 push 48h
00403587 lea ecx, [esp+4Ch]
0040358B push 0
0040358D mov dword ptr [esp+20h], eax
00403591 mov dword ptr [esp+24h], edx
00403595 push ecx
00403596 call 004066B0
0040359B rdtsc
...
[ Borland C++ compiler v5.5.1 32-bit - Debug Binary codes ]
...
00403D77 call 004047FC
00403D7C mov dword ptr [ebp-8], eax
00403D7F mov dword ptr [ebp-4], edx
00403D82 push dword ptr [ebp+10h]
00403D85 push dword ptr [ebp+0Ch]
00403D88 push dword ptr [ebp+8]
00403D8B call 00405AB4
00403D90 add esp, 0Ch
00403D93 call 004047FC
...
[ Borland C++ compiler v5.5.1 32-bit - Release Binary codes ]
...
00402E9E call 00404550
00402EA3 mov dword ptr [ebp-19Ch], eax
00402EA9 mov dword ptr [ebp-198h], edx
00402EAF push 48h
00402EB1 push 0
00402EB3 lea eax, [ebp-2F4h]
00402EB9 push eax
00402EBA call 00405838
00402EBF add esp, 0Ch
00402EC2 call 00404550
...
[ Intel C++ compiler v12.1.7 ( u371 ) 32-bit - Debug Binary codes ]
...
00402438 rdtsc
0040243A mov dword ptr [ebp-20h], eax
0040243D mov dword ptr [ebp-1Ch], edx
00402440 mov byte ptr [ebp-2Ch], 1
00402444 mov eax, dword ptr [ebp-20h]
00402447 mov edx, dword ptr [ebp-1Ch]
0040244A mov dword ptr [uiClock1], eax
0040244D mov dword ptr [ebp-14h], edx
00402450 add esp, 0FFFFFFF4h
00402453 mov eax, dword ptr [pvDest]
00402456 mov dword ptr [esp], eax
00402459 mov eax, dword ptr [iValue]
0040245C mov dword ptr [esp+4], eax
00402460 mov eax, dword ptr [iCount]
00402463 mov dword ptr [esp+8], eax
00402467 call memset (414770h)
0040246C add esp, 0Ch
0040246F mov dword ptr [ebp-28h], eax
00402472 rdtsc
...
[ Intel C++ compiler v12.1.7 ( u371 ) 32-bit - Release Binary codes ]
...
00402DEE rdtsc
00402DF0 mov dword ptr [ebp-208h], eax
00402DF6 mov dword ptr [ebp-204h], edx
00402DFC pxor xmm0, xmm0
00402E00 movaps xmmword ptr [ebp-1F8h], xmm0
00402E07 movaps xmmword ptr [ebp-1E8h], xmm0
00402E0E movaps xmmword ptr [ebp-1D8h], xmm0
00402E15 movaps xmmword ptr [ebp-1C8h], xmm0
00402E1C movq mmword ptr [ebp-1B8h], xmm0
00402E24 rdtsc
...
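In the Intel 32-bit release listing there is no call between the two rdtsc instructions: pxor clears xmm0, four movaps stores write 4 x 16 = 64 bytes and a final movq writes 8 more, 72 bytes in total. Below is a minimal SSE2-intrinsics sketch of that same store pattern, assuming an x86 target with SSE2; Zero72Sse2 is a hypothetical name. Unaligned stores are used because the caller's alignment is not assumed; the compiler could presumably use movaps because it knew the stack slot was 16-byte aligned.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Zeroes exactly 72 bytes, the 32-bit size of the test structure. */
static void Zero72Sse2(void *pvDest)
{
    unsigned char *p = (unsigned char *)pvDest;
    const __m128i zero = _mm_setzero_si128();     /* like pxor xmm0, xmm0 */
    assert(pvDest != NULL);
    _mm_storeu_si128((__m128i *)(p + 0),  zero);  /* bytes  0..15 */
    _mm_storeu_si128((__m128i *)(p + 16), zero);  /* bytes 16..31 */
    _mm_storeu_si128((__m128i *)(p + 32), zero);  /* bytes 32..47 */
    _mm_storeu_si128((__m128i *)(p + 48), zero);  /* bytes 48..63 */
    _mm_storel_epi64((__m128i *)(p + 64), zero);  /* bytes 64..71, like movq */
}
```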
[ MinGW C++ compiler v5.1.0 32-bit - Debug Binary codes ]
...
00406459 rdtsc
0040645E mov dword ptr [ebp-10h], eax
00406461 mov dword ptr [ebp-0Ch], edx
00406464 mov eax, dword ptr [ebp+10h]
00406467 mov dword ptr [esp+8], eax
0040646B mov eax, dword ptr [ebp+0Ch]
0040646E mov dword ptr [esp+4], eax
00406472 mov eax, dword ptr [ebp+8]
00406475 mov dword ptr [esp], eax
00406478 call 00406124
0040647D rdtsc
...
[ MinGW C++ compiler v5.1.0 32-bit - Release Binary codes ]
...
00401F78 rdtsc
00401F7A mov dword ptr [esp+100h], eax
00401F81 mov dword ptr [esp+104h], edx
00401F88 mov ebx, dword ptr [esp+100h]
00401F8F mov edx, dword ptr [esp+104h]
00401F96 mov eax, dword ptr [esp+0F8h]
00401F9D mov dword ptr [esp], 40B89Ch
00401FA4 mov ecx, dword ptr [esp+0FCh]
00401FAB sub ebx, eax
00401FAD mov dword ptr [esp+4], ebx
00401FB1 call 004079DC
00401FB6 rdtsc
...
[ Watcom C++ compiler v2.0.0 32-bit - Debug Binary codes ]
...
0040560F rdtsc
00405611 mov ecx, eax
00405613 mov eax, edx
00405615 mov dword ptr [ebp-110h], ecx
0040561B mov dword ptr [ebp-10Ch], eax
00405621 mov ebx, dword ptr [ebp-108h]
00405627 mov edx, dword ptr [ebp-104h]
0040562D mov eax, dword ptr [ebp-100h]
00405633 call 0040A1A0
00405638 rdtsc
...
[ Watcom C++ compiler v2.0.0 32-bit - Release Binary codes ]
...
00402D13 rdtsc
00402D15 mov dword ptr [esp+1D0h], eax
00402D1C mov dword ptr [esp+1D4h], edx
00402D23 mov ebx, 48h
00402D28 mov eax, esp
00402D2A xor edx, edx
00402D2C call 00404D50
00402D31 rdtsc
...
[ Microsoft C++ compiler ( VS2008 PE ) 64-bit - Debug Binary codes ]
...
0000000180003F0C rdtsc
0000000180003F0E shl rdx, 20h
0000000180003F12 or rax, rdx
0000000180003F15 mov qword ptr [uiClock1], rax
0000000180003F1A mov r8, qword ptr [iCount]
0000000180003F1F mov edx, dword ptr [iValue]
0000000180003F23 mov rcx, qword ptr [pvDest]
0000000180003F28 call memset (18000B1A2h)
0000000180003F2D rdtsc
...
[ Microsoft C++ compiler ( VS2008 PE ) 64-bit - Release Binary codes ]
...
0000000140003A8C rdtsc
0000000140003A8E shl rdx, 20h
0000000140003A92 lea rcx, [rbp+10h]
0000000140003A96 or rax, rdx
0000000140003A99 xor edx, edx
0000000140003A9B lea r8d, [rdx+70h]
0000000140003A9F mov qword ptr [rbp], rax
0000000140003AA3 call 000000014000BBC0
0000000140003AA8 rdtsc
...
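The byte counts in the listings follow from the structure layout: the 32-bit builds pass 48h ( 72 ) bytes, while the 64-bit MSVC release build loads 70h ( 112 ) into r8d and the Intel 64-bit release build issues seven 16-byte stores ( 112 bytes ). A small sketch of that arithmetic, assuming RTint is a 4-byte int, RTtchar * is an ordinary data pointer, and pointer alignment equals pointer size; ExpectedAlignofDataSize is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Standard-type stand-in for the post's ALIGNOFDATA. */
typedef struct tagALIGNOFDATA
{
    int   iAlignofValue[9];
    char *pszTypeName[9];
} ALIGNOFDATA;

/* 32-bit:  9*4 + 9*4                          =  72 = 0x48
   64-bit:  9*4 = 36, padded to 40 for 8-byte pointers,
            40 + 9*8                           = 112 = 0x70 */
static size_t ExpectedAlignofDataSize(void)
{
    size_t ints  = 9 * sizeof(int);
    size_t align = sizeof(char *);  /* assumes alignment == pointer size */
    size_t start = (ints + align - 1) / align * align;  /* round up */
    return start + 9 * sizeof(char *);
}
```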
[ Intel C++ compiler v13.1.0 ( u149 ) 64-bit - Debug Binary codes ]
...
000000013FBF291C rdtsc
000000013FBF291E shl rdx, 20h
000000013FBF2922 or rax, rdx
000000013FBF2925 mov qword ptr [rbp+8], rax
000000013FBF2929 mov byte ptr [rbp], 1
000000013FBF292D mov rax, qword ptr [rbp+8]
000000013FBF2931 mov qword ptr [uiClock1], rax
000000013FBF2935 mov rax, qword ptr [pvDest]
000000013FBF2939 mov edx, dword ptr [iValue]
000000013FBF293C mov rcx, qword ptr [iCount]
000000013FBF2940 mov qword ptr [rbp+40h], rcx
000000013FBF2944 mov rcx, rax
000000013FBF2947 mov rax, qword ptr [rbp+40h]
000000013FBF294B mov r8, rax
000000013FBF294E call memset (13FC08BE0h)
000000013FBF2953 mov qword ptr [rbp+18h], rax
000000013FBF2957 rdtsc
...
[ Intel C++ compiler v13.1.0 ( u149 ) 64-bit - Release Binary codes ]
...
000000013F683552 rdtsc
000000013F683554 shl rdx, 20h
000000013F683558 or rax, rdx
000000013F68355B mov qword ptr [rbp+1D0h], rax
000000013F683562 vmovups xmmword ptr [r13+10h], xmm6
000000013F683568 vmovups xmmword ptr [r13+20h], xmm6
000000013F68356E vmovups xmmword ptr [r13+30h], xmm6
000000013F683574 vmovups xmmword ptr [r13+40h], xmm6
000000013F68357A vmovups xmmword ptr [r13+50h], xmm6
000000013F683580 vmovups xmmword ptr [r13+60h], xmm6
000000013F683586 vmovups xmmword ptr [r13], xmm6
000000013F68358C rdtsc
...
[ MinGW C++ compiler v5.1.0 64-bit - Debug Binary codes ]
...
000000000040728C call 0000000000407260
0000000000407291 mov qword ptr [rbp-8], rax
0000000000407295 mov rdx, qword ptr [rbp+20h]
0000000000407299 mov eax, dword ptr [rbp+18h]
000000000040729C mov r8, rdx
000000000040729F mov edx, eax
00000000004072A1 mov rcx, qword ptr [rbp+10h]
00000000004072A5 call 0000000000406DC8
00000000004072AA call 0000000000407260
...
[ MinGW C++ compiler v5.1.0 64-bit - Release Binary codes ]
...
0000000000402DD6 rdtsc
0000000000402DD8 shl rdx, 20h
0000000000402DDC or rax, rdx
0000000000402DDF mov qword ptr [rbp+38h], rax
0000000000402DE3 mov rdx, qword ptr [rbp+38h]
0000000000402DE7 mov rcx, qword ptr [rbp+30h]
0000000000402DEB sub edx, ecx
0000000000402DED lea rcx, [40B938h]
0000000000402DF4 call 0000000000407690
0000000000402DFB rdtsc
...
[ Performance Evaluation ( Debug ) - Summary - 32-bit Windows XP SP3 ]
Microsoft C++ compiler ( VS2005 PE ) 32-bit
...
[ CrtMemset ] - Executed in 316 clock cycles
[ CrtMemset ] - Executed in 424 clock cycles
[ CrtMemset ] - Executed in 388 clock cycles
[ CrtMemset ] - Executed in 344 clock cycles
[ CrtMemset ] - Executed in 336 clock cycles
[ CrtMemset ] - Executed in 336 clock cycles
[ CrtMemset ] - Executed in 336 clock cycles
[ CrtMemset ] - Executed in 372 clock cycles
[ CrtMemset ] - Executed in 336 clock cycles
[ CrtMemset ] - Executed in 332 clock cycles
...
Borland C++ compiler v5.5.1 32-bit
...
[ CrtMemset ] - Executed in 536 clock cycles
[ CrtMemset ] - Executed in 188 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 188 clock cycles
[ CrtMemset ] - Executed in 188 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 188 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 188 clock cycles
...
Intel C++ compiler v12.1.7 ( u371 ) 32-bit
...
[ CrtMemset ] - Executed in 344 clock cycles
[ CrtMemset ] - Executed in 296 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 180 clock cycles
[ CrtMemset ] - Executed in 180 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 180 clock cycles
[ CrtMemset ] - Executed in 184 clock cycles
[ CrtMemset ] - Executed in 180 clock cycles
[ CrtMemset ] - Executed in 172 clock cycles
...
MinGW C++ compiler v5.1.0 32-bit
...
[ CrtMemset ] - Executed in 728 clock cycles
[ CrtMemset ] - Executed in 412 clock cycles
[ CrtMemset ] - Executed in 324 clock cycles
[ CrtMemset ] - Executed in 320 clock cycles
[ CrtMemset ] - Executed in 328 clock cycles
[ CrtMemset ] - Executed in 320 clock cycles
[ CrtMemset ] - Executed in 324 clock cycles
[ CrtMemset ] - Executed in 324 clock cycles
[ CrtMemset ] - Executed in 320 clock cycles
[ CrtMemset ] - Executed in 328 clock cycles
...
Watcom C++ compiler v2.0.0 32-bit
...
[ CrtMemset ] - Executed in 784 clock cycles
[ CrtMemset ] - Executed in 268 clock cycles
[ CrtMemset ] - Executed in 260 clock cycles
[ CrtMemset ] - Executed in 264 clock cycles
[ CrtMemset ] - Executed in 264 clock cycles
[ CrtMemset ] - Executed in 464 clock cycles
[ CrtMemset ] - Executed in 256 clock cycles
[ CrtMemset ] - Executed in 264 clock cycles
[ CrtMemset ] - Executed in 264 clock cycles
[ CrtMemset ] - Executed in 260 clock cycles
...
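[ Performance Evaluation ( Debug ) - Final Results - 32-bit Windows XP SP3 ]
( Each average is the mean of the ten measurements listed in the summary, rounded to the nearest clock cycle )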
Average is 208 clock cycles - Intel C++ compiler v12.1.7 ( u371 ) 32-bit
Average is 221 clock cycles - Borland C++ compiler v5.5.1 32-bit
Average is 335 clock cycles - Watcom C++ compiler v2.0.0 32-bit
Average is 352 clock cycles - Microsoft C++ compiler ( VS2005 PE ) 32-bit
Average is 373 clock cycles - MinGW C++ compiler v5.1.0 32-bit
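These averages match the plain arithmetic mean of the ten measurements per compiler in the summary, with the slower first run included, rounded to the nearest cycle. A small helper that reproduces them is sketched below; MeanCycles is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n clock-cycle samples, rounded to the nearest cycle
   (an exact .5 rounds up). */
static unsigned MeanCycles(const unsigned *samples, size_t n)
{
    unsigned long long sum = 0;
    assert(samples != NULL && n > 0);
    for (size_t i = 0; i < n; ++i)
        sum += samples[i];
    return (unsigned)((sum + n / 2) / n);
}
```

For example, the Intel samples sum to 2084 cycles, giving 208, and the MinGW samples sum to 3728 cycles, giving 373.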
