Software Tuning, Performance Optimization & Platform Monitoring

Loop Versioning in Intel compiler

zakaria-bendifallah
Hi all,
I have a question about loop versioning in the Intel compiler.
I compiled SPEC 2006 with ICC -g3, and for one of the
codes (bwaves, to be specific) the compiler applied
loop versioning. The generated code was:

=================================================================
404802: 48 f7 c2 0f 00 00 00 test $0xf,%rdx
404809: 0f 84 71 00 00 00 je 404880
40480f: 90 nop
404810: f2 42 0f 10 1c c3 movsd (%rbx,%r8,8),%xmm3
404816: f2 42 0f 10 64 c3 10 movsd 0x10(%rbx,%r8,8),%xmm4
40481d: 66 42 0f 16 5c c3 08 movhpd 0x8(%rbx,%r8,8),%xmm3
404824: 66 42 0f 16 64 c3 18 movhpd 0x18(%rbx,%r8,8),%xmm4
40482b: 66 43 0f 59 1c c1 mulpd (%r9,%r8,8),%xmm3
404831: 66 43 0f 59 64 c1 10 mulpd 0x10(%r9,%r8,8),%xmm4
404838: 66 0f 58 d3 addpd %xmm3,%xmm2
40483c: 66 0f 58 cc addpd %xmm4,%xmm1
404840: f2 42 0f 10 6c c3 20 movsd 0x20(%rbx,%r8,8),%xmm5
404847: f2 42 0f 10 74 c3 30 movsd 0x30(%rbx,%r8,8),%xmm6
40484e: 66 42 0f 16 6c c3 28 movhpd 0x28(%rbx,%r8,8),%xmm5
404855: 66 42 0f 16 74 c3 38 movhpd 0x38(%rbx,%r8,8),%xmm6
40485c: 66 43 0f 59 6c c1 20 mulpd 0x20(%r9,%r8,8),%xmm5
404863: 66 43 0f 59 74 c1 30 mulpd 0x30(%r9,%r8,8),%xmm6
40486a: 66 0f 58 d5 addpd %xmm5,%xmm2
40486e: 66 0f 58 ce addpd %xmm6,%xmm1
404872: 49 83 c0 08 add $0x8,%r8
404876: 4c 3b c1 cmp %rcx,%r8
404879: 72 95 jb 404810
40487b: eb 4e jmp 4048cb
40487d: 48 89 f6 mov %rsi,%rsi
404880: 42 0f 28 1c c3 movaps (%rbx,%r8,8),%xmm3
404885: 42 0f 28 64 c3 10 movaps 0x10(%rbx,%r8,8),%xmm4
40488b: 66 43 0f 59 1c c1 mulpd (%r9,%r8,8),%xmm3
404891: 66 43 0f 59 64 c1 10 mulpd 0x10(%r9,%r8,8),%xmm4
404898: 66 0f 58 d3 addpd %xmm3,%xmm2
40489c: 66 0f 58 cc addpd %xmm4,%xmm1
4048a0: 42 0f 28 6c c3 20 movaps 0x20(%rbx,%r8,8),%xmm5
4048a6: 42 0f 28 74 c3 30 movaps 0x30(%rbx,%r8,8),%xmm6
4048ac: 66 43 0f 59 6c c1 20 mulpd 0x20(%r9,%r8,8),%xmm5
4048b3: 66 43 0f 59 74 c1 30 mulpd 0x30(%r9,%r8,8),%xmm6
4048ba: 66 0f 58 d5 addpd %xmm5,%xmm2
4048be: 66 0f 58 ce addpd %xmm6,%xmm1
4048c2: 49 83 c0 08 add $0x8,%r8
4048c6: 4c 3b c1 cmp %rcx,%r8
4048c9: 72 b5 jb 404880
4048cb: 49 3b cc cmp %r12,%rcx
4048ce: 0f 83 64 14 00 00 jae 405d38
4048d4: f2 0f 10 1c cb movsd (%rbx,%rcx,8),%xmm3
=========================================================================
The first version starts at 40480f.
The second starts at 404880.

Two lines look strange to me:
40480f: a nop
40487d: a mov %rsi,%rsi

The jump to the second version of the loop goes to
404880, not to the mov instruction at 40487d.
So I would like to know why these two instructions were generated,
and why the second one is a mov and not a nop.

Thanks in advance for your answers :)
Patrick_F_Intel1
Employee
Hello Zakaria,
I see you've already posted this message to the Intel C compiler forum http://software.intel.com/en-us/forums/intel-c-compiler/.
That is the appropriate forum.
Pat
SergeyKostrov
Valued Contributor II
I know that my post is too late...

...
404809: 0f 84 71 00 00 00 je 404880
40480F: 90 nop
404810: f2 42 0f 10 1c c3 movsd (%rbx,%r8,8),%xmm3
...

It seems to me the nop instruction simply provides alignment for the movsd instruction. Try dividing 40480F by 4!

40480F (base 16) = 4212751 (base 10)

4212751 / 4 = 1053187.75
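
The same check can be run on all four addresses from the listing. This is only a rough sketch: the 16-byte boundary is my assumption (it is not stated anywhere in the thread), and `padding_needed` is a hypothetical helper name. Under that assumption, both loop tops (404810 and 404880) land on a boundary, the nop at 40480F is 1 byte short of one, and the mov at 40487D is 3 bytes short, which happens to match the 3-byte encoding `48 89 f6` of `mov %rsi,%rsi`.

```python
# Assumption: loop tops are padded to a 16-byte boundary.
# The thread itself only suggests dividing by 4.
ALIGNMENT = 16


def padding_needed(address: int, alignment: int = ALIGNMENT) -> int:
    """Return how many bytes separate `address` from the next multiple of `alignment`."""
    return (-address) % alignment


# Addresses copied from the disassembly in the original post.
addresses = [
    ("nop", 0x40480F),
    ("first loop top", 0x404810),
    ("mov %rsi,%rsi", 0x40487D),
    ("second loop top", 0x404880),
]

for label, address in addresses:
    remainder = address % ALIGNMENT
    print(f"{label:16s} {address:#x}  mod {ALIGNMENT} = {remainder:2d}  "
          f"padding to boundary = {padding_needed(address)}")
```

If that reading is right, the mov would be acting as a 3-byte filler that leaves %rsi unchanged, but that is an interpretation of the numbers, not something confirmed by the compiler team in this thread.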