Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Loop Versioning in Intel compiler

zakaria-bendifallah
Hi all,
I have a question about loop versioning in the Intel compiler.
I compiled SPEC 2006 with ICC using -g3, and for one of the
benchmarks (bwaves, to be specific) the compiler applied loop
versioning. The generated code was:

=================================================================
404802: 48 f7 c2 0f 00 00 00 test $0xf,%rdx
404809: 0f 84 71 00 00 00 je 404880
40480f: 90 nop
404810: f2 42 0f 10 1c c3 movsd (%rbx,%r8,8),%xmm3
404816: f2 42 0f 10 64 c3 10 movsd 0x10(%rbx,%r8,8),%xmm4
40481d: 66 42 0f 16 5c c3 08 movhpd 0x8(%rbx,%r8,8),%xmm3
404824: 66 42 0f 16 64 c3 18 movhpd 0x18(%rbx,%r8,8),%xmm4
40482b: 66 43 0f 59 1c c1 mulpd (%r9,%r8,8),%xmm3
404831: 66 43 0f 59 64 c1 10 mulpd 0x10(%r9,%r8,8),%xmm4
404838: 66 0f 58 d3 addpd %xmm3,%xmm2
40483c: 66 0f 58 cc addpd %xmm4,%xmm1
404840: f2 42 0f 10 6c c3 20 movsd 0x20(%rbx,%r8,8),%xmm5
404847: f2 42 0f 10 74 c3 30 movsd 0x30(%rbx,%r8,8),%xmm6
40484e: 66 42 0f 16 6c c3 28 movhpd 0x28(%rbx,%r8,8),%xmm5
404855: 66 42 0f 16 74 c3 38 movhpd 0x38(%rbx,%r8,8),%xmm6
40485c: 66 43 0f 59 6c c1 20 mulpd 0x20(%r9,%r8,8),%xmm5
404863: 66 43 0f 59 74 c1 30 mulpd 0x30(%r9,%r8,8),%xmm6
40486a: 66 0f 58 d5 addpd %xmm5,%xmm2
40486e: 66 0f 58 ce addpd %xmm6,%xmm1
404872: 49 83 c0 08 add $0x8,%r8
404876: 4c 3b c1 cmp %rcx,%r8
404879: 72 95 jb 404810
40487b: eb 4e jmp 4048cb
40487d: 48 89 f6 mov %rsi,%rsi
404880: 42 0f 28 1c c3 movaps (%rbx,%r8,8),%xmm3
404885: 42 0f 28 64 c3 10 movaps 0x10(%rbx,%r8,8),%xmm4
40488b: 66 43 0f 59 1c c1 mulpd (%r9,%r8,8),%xmm3
404891: 66 43 0f 59 64 c1 10 mulpd 0x10(%r9,%r8,8),%xmm4
404898: 66 0f 58 d3 addpd %xmm3,%xmm2
40489c: 66 0f 58 cc addpd %xmm4,%xmm1
4048a0: 42 0f 28 6c c3 20 movaps 0x20(%rbx,%r8,8),%xmm5
4048a6: 42 0f 28 74 c3 30 movaps 0x30(%rbx,%r8,8),%xmm6
4048ac: 66 43 0f 59 6c c1 20 mulpd 0x20(%r9,%r8,8),%xmm5
4048b3: 66 43 0f 59 74 c1 30 mulpd 0x30(%r9,%r8,8),%xmm6
4048ba: 66 0f 58 d5 addpd %xmm5,%xmm2
4048be: 66 0f 58 ce addpd %xmm6,%xmm1
4048c2: 49 83 c0 08 add $0x8,%r8
4048c6: 4c 3b c1 cmp %rcx,%r8
4048c9: 72 b5 jb 404880
4048cb: 49 3b cc cmp %r12,%rcx
4048ce: 0f 83 64 14 00 00 jae 405d38
4048d4: f2 0f 10 1c cb movsd (%rbx,%rcx,8),%xmm3
=========================================================================
The first version starts at 40480f
and the second at 404880.
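For readers less familiar with loop versioning, here is a rough C-level sketch of what the listing appears to do. `test $0xf,%rdx` checks whether the low four bits of an address are zero, and `je 404880` jumps to the version that uses `movaps` (aligned 16-byte loads) when they are. Otherwise, the `movsd`/`movhpd` version runs. Both loops compute a dot-product-like sum of products, 8 doubles per iteration, into two accumulators (`%xmm2` and `%xmm1`). This sketch makes two assumptions that the thread does not confirm. First, the tested register holds the address of the array read through `%rbx`. Second, the array read through `%r9` is 16-byte aligned in both versions, because `mulpd` with a memory operand requires an aligned address. The names `dot_versioned` and `step8` are hypothetical. C cannot express the choice of load instruction, so the two bodies are written identically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: one unrolled step of 8 doubles, mirroring the
   four 2-lane loads per iteration in the listing (add $0x8,%r8). */
static void step8(const double *x, const double *y, size_t i,
                  double *s0, double *s1)
{
    /* lanes 0-1 and 4-5 feed one accumulator (%xmm2 in the listing) */
    *s0 += x[i] * y[i] + x[i + 1] * y[i + 1]
         + x[i + 4] * y[i + 4] + x[i + 5] * y[i + 5];
    /* lanes 2-3 and 6-7 feed the other accumulator (%xmm1) */
    *s1 += x[i + 2] * y[i + 2] + x[i + 3] * y[i + 3]
         + x[i + 6] * y[i + 6] + x[i + 7] * y[i + 7];
}

/* Hypothetical name. Returns the sum of x[k]*y[k] for k < n and reports
   which loop version ran (1 = aligned, 0 = unaligned). */
double dot_versioned(const double *x, const double *y, size_t n,
                     int *took_aligned_path)
{
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;

    /* Assumption: y is always 16-byte aligned, since the listing uses
       mulpd with a memory operand on it in both versions. */
    assert(((uintptr_t)y & 0xF) == 0);

    if (((uintptr_t)x & 0xF) == 0) {   /* like test $0xf + je 404880 */
        /* aligned version: the compiler can load x with movaps */
        *took_aligned_path = 1;
        for (; i + 8 <= n; i += 8)
            step8(x, y, i, &s0, &s1);
    } else {
        /* unaligned version: x is loaded with movsd + movhpd pairs */
        *took_aligned_path = 0;
        for (; i + 8 <= n; i += 8)
            step8(x, y, i, &s0, &s1);
    }

    /* scalar remainder, presumably what follows 4048cb */
    for (; i < n; i++)
        s0 += x[i] * y[i];

    return s0 + s1;
}
```

If `x` is 16-byte aligned, the first branch runs. If `x` is offset by one double, the second branch runs. Both branches return the same value, which is the point of versioning: only the instruction selection differs between the two copies of the loop.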

There are two strange lines:
40480f: a nop
40487d: a mov %rsi,%rsi

The jump to the second version of the loop goes to
404880, not to the mov instruction at 40487d!
So I would like to know why these two instructions were generated,
and why the second one is a mov and not a nop.

Thanks in advance for your answers :)
Patrick_F_Intel1
Employee
Hello Zakaria,
I see you've already posted this message to the Intel C compiler forum http://software.intel.com/en-us/forums/intel-c-compiler/.
That is the appropriate forum.
Pat
SergeyKostrov
Valued Contributor II
I know that my post is too late...

...
404809: 0f 84 71 00 00 00 je 404880
40480F: 90 nop
404810: f2 42 0f 10 1c c3 movsd (%rbx,%r8,8),%xmm3
...

It seems to me the nop instruction simply provides alignment for the movsd instruction at 404810. Try dividing 40480F by 4!

40480F (base 16) = 4212751 (base 10)

4212751 / 4 = 1053187.75
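The division shows that 40480F is not even 4-byte aligned. A slightly more complete check is sketched below. It assumes the compiler is aligning loop heads (branch targets) to 16 bytes, which the listing suggests but nobody in the thread confirms. The sketch computes how many filler bytes each address needs to reach the next 16-byte boundary. `pad_to` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of padding bytes needed to move addr up to
   the next multiple of align (align must be a power of two). Returns 0 if
   addr is already aligned. */
static uint64_t pad_to(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (align - (addr & (align - 1))) & (align - 1);
}
```

For these addresses, `pad_to(0x40480F, 16)` is 1, which matches the 1-byte nop (`90`). `pad_to(0x40487D, 16)` is 3, which matches the 3-byte `mov %rsi,%rsi` (`48 89 f6`). After the padding, both loop heads, 404810 and 404880, land on 16-byte boundaries. This suggests that the mov is simply a 3-byte filler. A 64-bit move of a register to itself changes no register value and no flags.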