difference in CPU time for two similar lines, thanks

utrbcutrbc · ‎07-07-2011

Hi all. There is a while loop in my program, where IterZNext, IterZ are pointers to nodes in a list. The nodes in the list are of type struct with a field called "Index".

[cpp]double xx = 20.0;
double yy = 10000.0;
double zz;		
while (IterZNext!=NULL && NextIndex<=NewIndex)
{
	IterZ=IterZNext;
	IterZNext = IterZ->Next;
	if (IterZNext!=NULL)
	{
		zz = xx + yy;
                NextIndex1 = IterZNext->Index; // line (*)
		NextIndex = IterZNext->Index;  // line (**)
		IterZNext->Index;
	}
}[/cpp]

When I profiled my program, I found the line (*)

[cpp]NextIndex1 = IterZNext->Index;[/cpp]

consumes most of CPU time (2.193s), while the line (**)

[cpp]NextIndex = IterZNext->Index;
[/cpp]

which is all most the same with the line (*) only uses 0.093s. I used the Intel VTune Amplifier to see the assembly of these two lines, which is as follows:

[bash]Address	Line	Assembly	               CPU Time Instructions Retired
Line (*):
0x1666	561	mov eax, dword ptr [ebp-0x44]	0.015s	50,000,000
0x1669	561	mov ecx, dword ptr [eax+0x8]		
0x166c	561	mov dword ptr [ebp-0x68], ecx	2.178s	1,614,000,000

Line (**):
0x166f	562	mov byte ptr [ebp-0x155], 0x1	0.039s	80,000,000
0x1676	562	mov eax, dword ptr [ebp-0x44]	0.027s	44,000,000
0x1679	562	mov ecx, dword ptr [eax+0x8]		
0x167c	562	mov dword ptr [ebp-0x5c], ecx	0.026s	94,000,000[/bash]

If I change the order of the line (*) and the line (**), then the program changes to

[cpp]double xx = 20.0;
double yy = 10000.0;
double zz;		
while (IterZNext!=NULL && NextIndex<=NewIndex)
{
	IterZ=IterZNext;
	IterZNext = IterZ->Next;
	if (IterZNext!=NULL)
	{
		zz = xx + yy;
                NextIndex = IterZNext->Index;  // line (**)
                NextIndex1 = IterZNext->Index; // line (*)
		IterZNext->Index;
	}
}[/cpp]

and the result for assembly changes to

[bash]Address	Line	Assembly	CPU Time	Instructions Retired
Line (**):
0x1666	560	mov byte ptr [ebp-0x155], 0x1	0.044s	84,000,000
0x166d	560	mov eax, dword ptr [ebp-0x44]	0.006s	2,000,000
0x1670	560	mov ecx, dword ptr [eax+0x8]	0.001s	4,000,000
0x1673	560	mov dword ptr [ebp-0x5c], ecx	1.193s	1,536,000,000

Line (*):
0x1676	561	mov eax, dword ptr [ebp-0x44]	0.052s	128,000,000
0x1679	561	mov ecx, dword ptr [eax+0x8]		
0x167c	561	mov dword ptr [ebp-0x68], ecx	0.034s	112,000,000[/bash]

In this case, line (**) uses most of CPU time (1.245s) while line (*) only uses 0.086s.

Could someone tell me:
(1) Why does it take so long to make the first assignment? Notice that the line zz=xx+yy only uses 0.058s. Is this related to the cache misses? since all nodes in the list are dynamically genereated.
(2) Why is there huge difference in CPU time between this two lines?

Kirill_R_Intel · ‎07-15-2011

Hi,

This operation is memory write. As it takes so long, it's likely not cached. The second timethe memoryline seems to bealready in cache so write works faster. Cache misses is good thing to monitor, as well as other cache and memory events to get more details about the case. It also maybe interesting to see how it behaves on diffefent compilers, e.g. Intel Compiler and Visual Studio.

sun__lei · ‎10-22-2019

Kirill R. (Intel) wrote:
Hi, I have post a new question "confusing result" in this forum. Could you help me?
This operation is memory write. As it takes so long, it's likely not cached. The second timethe memoryline seems to bealready in cache so write works faster. Cache misses is good thing to monitor, as well as other cache and memory events to get more details about the case. It also maybe interesting to see how it behaves on diffefent compilers, e.g. Intel Compiler and Visual Studio.