- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi all. There is a while loop in my program, where IterZNext, IterZ are
pointers to nodes in a list. The nodes in the list are of type struct
with a field called "Index".
When I profiled my program, I found the line (*)
and the result for assembly changes to
Could someone tell me:
(1) Why does it take so long to make the first assignment? Notice that the line zz=xx+yy only uses 0.058s. Is this related to the cache misses? since all nodes in the list are dynamically genereated.
(2) Why is there huge difference in CPU time between this two lines?
[cpp]double xx = 20.0; double yy = 10000.0; double zz; while (IterZNext!=NULL && NextIndex<=NewIndex) { IterZ=IterZNext; IterZNext = IterZ->Next; if (IterZNext!=NULL) { zz = xx + yy; NextIndex1 = IterZNext->Index; // line (*) NextIndex = IterZNext->Index; // line (**) IterZNext->Index; } }[/cpp]
When I profiled my program, I found the line (*)
[cpp]NextIndex1 = IterZNext->Index;[/cpp]consumes most of CPU time (2.193s), while the line (**)
[cpp]NextIndex = IterZNext->Index;which is all most the same with the line (*) only uses 0.093s. I used the Intel VTune Amplifier to see the assembly of these two lines, which is as follows:
[/cpp]
[bash]Address Line Assembly CPU Time Instructions Retired Line (*): 0x1666 561 mov eax, dword ptr [ebp-0x44] 0.015s 50,000,000 0x1669 561 mov ecx, dword ptr [eax+0x8] 0x166c 561 mov dword ptr [ebp-0x68], ecx 2.178s 1,614,000,000 Line (**): 0x166f 562 mov byte ptr [ebp-0x155], 0x1 0.039s 80,000,000 0x1676 562 mov eax, dword ptr [ebp-0x44] 0.027s 44,000,000 0x1679 562 mov ecx, dword ptr [eax+0x8] 0x167c 562 mov dword ptr [ebp-0x5c], ecx 0.026s 94,000,000[/bash]If I change the order of the line (*) and the line (**), then the program changes to
[cpp]double xx = 20.0; double yy = 10000.0; double zz; while (IterZNext!=NULL && NextIndex<=NewIndex) { IterZ=IterZNext; IterZNext = IterZ->Next; if (IterZNext!=NULL) { zz = xx + yy; NextIndex = IterZNext->Index; // line (**) NextIndex1 = IterZNext->Index; // line (*) IterZNext->Index; } }[/cpp]
and the result for assembly changes to
[bash]Address Line Assembly CPU Time Instructions Retired Line (**): 0x1666 560 mov byte ptr [ebp-0x155], 0x1 0.044s 84,000,000 0x166d 560 mov eax, dword ptr [ebp-0x44] 0.006s 2,000,000 0x1670 560 mov ecx, dword ptr [eax+0x8] 0.001s 4,000,000 0x1673 560 mov dword ptr [ebp-0x5c], ecx 1.193s 1,536,000,000 Line (*): 0x1676 561 mov eax, dword ptr [ebp-0x44] 0.052s 128,000,000 0x1679 561 mov ecx, dword ptr [eax+0x8] 0x167c 561 mov dword ptr [ebp-0x68], ecx 0.034s 112,000,000[/bash]In this case, line (**) uses most of CPU time (1.245s) while line (*) only uses 0.086s.
Could someone tell me:
(1) Why does it take so long to make the first assignment? Notice that the line zz=xx+yy only uses 0.058s. Is this related to the cache misses? since all nodes in the list are dynamically genereated.
(2) Why is there huge difference in CPU time between this two lines?
Link Copied
2 Replies
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
This operation is memory write. As it takes so long, it's likely not cached. The second timethe memoryline seems to bealready in cache so write works faster. Cache misses is good thing to monitor, as well as other cache and memory events to get more details about the case. It also maybe interesting to see how it behaves on diffefent compilers, e.g. Intel Compiler and Visual Studio.
This operation is memory write. As it takes so long, it's likely not cached. The second timethe memoryline seems to bealready in cache so write works faster. Cache misses is good thing to monitor, as well as other cache and memory events to get more details about the case. It also maybe interesting to see how it behaves on diffefent compilers, e.g. Intel Compiler and Visual Studio.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Kirill R. (Intel) wrote:Hi, I have post a new question "confusing result" in this forum. Could you help me?
This operation is memory write. As it takes so long, it's likely not cached. The second timethe memoryline seems to bealready in cache so write works faster. Cache misses is good thing to monitor, as well as other cache and memory events to get more details about the case. It also maybe interesting to see how it behaves on diffefent compilers, e.g. Intel Compiler and Visual Studio.

Reply
Topic Options
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page