<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic 0000000000405410 &amp;lt;__intel in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975802#M5614</link>
    <description>0000000000405410 &amp;lt;__intel_ssse3_rep_memcpy&amp;gt;:
  405410:	48 89 f8             	mov    %rdi,%rax
  405413:	48 81 fa 90 00 00 00 	cmp    $0x90,%rdx
  40541a:	73 34                	jae    405450 &amp;lt;__intel_ssse3_rep_memcpy+0x40&amp;gt;
  40541c:	40 38 fe             	cmp    %dil,%sil
  40541f:	76 19                	jbe    40543a &amp;lt;__intel_ssse3_rep_memcpy+0x2a&amp;gt;
  405421:	48 01 d6             	add    %rdx,%rsi
  405424:	48 01 d7             	add    %rdx,%rdi
  405427:	4c 8d 1d 4a 3c 00 00 	lea    0x3c4a(%rip),%r11        # 409078 &amp;lt;.L_2il0floatpacket.29+0x40c&amp;gt;
  40542e:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  405432:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  405436:	ff e2                	jmpq   *%rdx
  405438:	0f 0b                	ud2    
  40543a:	4c 8d 1d f7 39 00 00 	lea    0x39f7(%rip),%r11        # 408e38 &amp;lt;.L_2il0floatpacket.29+0x1cc&amp;gt;
  405441:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  405445:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  405449:	ff e2                	jmpq   *%rdx
  40544b:	0f 0b                	ud2    
  40544d:	0f 1f 00             	nopl   (%rax)
  405450:	40 38 fe             	cmp    %dil,%sil
  405453:	7e 5b                	jle    4054b0 &amp;lt;__intel_ssse3_rep_memcpy+0xa0&amp;gt;
  405455:	f3 0f 6f 06          	movdqu (%rsi),%xmm0
  405459:	49 89 f8             	mov    %rdi,%r8
  40545c:	48 83 e7 f0          	and    $0xfffffffffffffff0,%rdi
  405460:	48 83 c7 10          	add    $0x10,%rdi
  405464:	49 89 f9             	mov    %rdi,%r9
  405467:	4d 29 c1             	sub    %r8,%r9
  40546a:	4c 29 ca             	sub    %r9,%rdx
  40546d:	4c 01 ce             	add    %r9,%rsi
  405470:	49 89 f1             	mov    %rsi,%r9
  405473:	49 83 e1 0f          	and    $0xf,%r9
  405477:	0f 84 93 00 00 00    	je     405510 &amp;lt;__intel_ssse3_rep_memcpy+0x100&amp;gt;
  40547d:	8b 0d 85 56 20 00    	mov    0x205685(%rip),%ecx        # 60ab08 &amp;lt;__libirc_data_cache_size&amp;gt;
  405483:	48 39 ca             	cmp    %rcx,%rdx
  405486:	0f 83 24 18 00 00    	jae    406cb0 &amp;lt;__intel_ssse3_rep_memcpy+0x18a0&amp;gt;
  40548c:	4c 8d 1d 25 3e 00 00 	lea    0x3e25(%rip),%r11        # 4092b8 &amp;lt;.L_2il0floatpacket.29+0x64c&amp;gt;
  405493:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  40549a:	4f 63 0c 8b          	movslq (%r11,%r9,4),%r9
  40549e:	4d 01 d9             	add    %r11,%r9
  4054a1:	41 ff e1             	jmpq   *%r9
  4054a4:	0f 0b                	ud2    
  4054a6:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  4054ad:	00 00 00 
  4054b0:	8b 0d 52 56 20 00    	mov    0x205652(%rip),%ecx        # 60ab08 &amp;lt;__libirc_data_cache_size&amp;gt;
  4054b6:	48 d1 e1             	shl    %rcx
  4054b9:	48 39 ca             	cmp    %rcx,%rdx
  4054bc:	0f 87 7e 19 00 00    	ja     406e40 &amp;lt;__intel_ssse3_rep_memcpy+0x1a30&amp;gt;
  4054c2:	48 01 d7             	add    %rdx,%rdi
  4054c5:	48 01 d6             	add    %rdx,%rsi
  4054c8:	f3 0f 6f 46 f0       	movdqu -0x10(%rsi),%xmm0
  4054cd:	4c 8d 47 f0          	lea    -0x10(%rdi),%r8
  4054d1:	49 89 f9             	mov    %rdi,%r9
  4054d4:	49 83 e1 0f          	and    $0xf,%r9
  4054d8:	4c 31 cf             	xor    %r9,%rdi
  4054db:	4c 29 ce             	sub    %r9,%rsi
  4054de:	4c 29 ca             	sub    %r9,%rdx
  4054e1:	49 89 f1             	mov    %rsi,%r9
  4054e4:	49 83 e1 0f          	and    $0xf,%r9
  4054e8:	0f 84 c2 00 00 00    	je     4055b0 &amp;lt;__intel_ssse3_rep_memcpy+0x1a0&amp;gt;
  4054ee:	4c 8d 1d 03 3e 00 00 	lea    0x3e03(%rip),%r11        # 4092f8 &amp;lt;.L_2il0floatpacket.29+0x68c&amp;gt;
  4054f5:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  4054fc:	4f 63 0c 8b          	movslq (%r11,%r9,4),%r9
  405500:	4d 01 d9             	add    %r11,%r9
  405503:	41 ff e1             	jmpq   *%r9
  405506:	0f 0b                	ud2    
  405508:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  40550f:	00 
  405510:	49 89 d1             	mov    %rdx,%r9
  405513:	49 c1 e9 08          	shr    $0x8,%r9
  405517:	49 01 d1             	add    %rdx,%r9
  40551a:	8b 0d ec 55 20 00    	mov    0x2055ec(%rip),%ecx        # 60ab0c &amp;lt;__libirc_data_cache_size_half&amp;gt;
  405520:	49 39 c9             	cmp    %rcx,%r9
  405523:	0f 83 87 17 00 00    	jae    406cb0 &amp;lt;__intel_ssse3_rep_memcpy+0x18a0&amp;gt;
  405529:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  405530:	66 0f 6f 0e          	movdqa (%rsi),%xmm1
  405534:	66 0f 7f 0f          	movdqa %xmm1,(%rdi)
  405538:	0f 28 56 10          	movaps 0x10(%rsi),%xmm2
  40553c:	0f 29 57 10          	movaps %xmm2,0x10(%rdi)
  405540:	0f 28 5e 20          	movaps 0x20(%rsi),%xmm3
  405544:	0f 29 5f 20          	movaps %xmm3,0x20(%rdi)
  405548:	0f 28 66 30          	movaps 0x30(%rsi),%xmm4
  40554c:	0f 29 67 30          	movaps %xmm4,0x30(%rdi)
  405550:	0f 28 4e 40          	movaps 0x40(%rsi),%xmm1
  405554:	0f 29 4f 40          	movaps %xmm1,0x40(%rdi)
  405558:	0f 28 56 50          	movaps 0x50(%rsi),%xmm2
  40555c:	0f 29 57 50          	movaps %xmm2,0x50(%rdi)
  405560:	0f 28 5e 60          	movaps 0x60(%rsi),%xmm3
  405564:	0f 29 5f 60          	movaps %xmm3,0x60(%rdi)
  405568:	0f 28 66 70          	movaps 0x70(%rsi),%xmm4
  40556c:	0f 29 67 70          	movaps %xmm4,0x70(%rdi)
  405570:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  405577:	48 8d b6 80 00 00 00 	lea    0x80(%rsi),%rsi
  40557e:	48 8d bf 80 00 00 00 	lea    0x80(%rdi),%rdi
  405585:	73 a9                	jae    405530 &amp;lt;__intel_ssse3_rep_memcpy+0x120&amp;gt;
  405587:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  40558c:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  405593:	48 01 d6             	add    %rdx,%rsi
  405596:	48 01 d7             	add    %rdx,%rdi
  405599:	4c 8d 1d d8 3a 00 00 	lea    0x3ad8(%rip),%r11        # 409078 &amp;lt;.L_2il0floatpacket.29+0x40c&amp;gt;
  4055a0:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  4055a4:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  4055a8:	ff e2                	jmpq   *%rdx
  4055aa:	0f 0b                	ud2    
  4055ac:	0f 1f 40 00          	nopl   0x0(%rax)
  4055b0:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  4055b7:	0f 28 4e f0          	movaps -0x10(%rsi),%xmm1
  4055bb:	0f 29 4f f0          	movaps %xmm1,-0x10(%rdi)
  4055bf:	0f 28 56 e0          	movaps -0x20(%rsi),%xmm2
  4055c3:	0f 29 57 e0          	movaps %xmm2,-0x20(%rdi)
  4055c7:	0f 28 5e d0          	movaps -0x30(%rsi),%xmm3
  4055cb:	0f 29 5f d0          	movaps %xmm3,-0x30(%rdi)
  4055cf:	0f 28 66 c0          	movaps -0x40(%rsi),%xmm4
  4055d3:	0f 29 67 c0          	movaps %xmm4,-0x40(%rdi)
  4055d7:	0f 28 6e b0          	movaps -0x50(%rsi),%xmm5
  4055db:	0f 29 6f b0          	movaps %xmm5,-0x50(%rdi)
  4055df:	0f 28 6e a0          	movaps -0x60(%rsi),%xmm5
  4055e3:	0f 29 6f a0          	movaps %xmm5,-0x60(%rdi)
  4055e7:	0f 28 6e 90          	movaps -0x70(%rsi),%xmm5
  4055eb:	0f 29 6f 90          	movaps %xmm5,-0x70(%rdi)
  4055ef:	0f 28 6e 80          	movaps -0x80(%rsi),%xmm5
  4055f3:	0f 29 6f 80          	movaps %xmm5,-0x80(%rdi)
  4055f7:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  4055fe:	48 8d 7f 80          	lea    -0x80(%rdi),%rdi
  405602:	48 8d 76 80          	lea    -0x80(%rsi),%rsi
  405606:	73 af                	jae    4055b7 &amp;lt;__intel_ssse3_rep_memcpy+0x1a7&amp;gt;
  405608:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  40560d:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  405614:	48 29 d7             	sub    %rdx,%rdi
  405617:	48 29 d6             	sub    %rdx,%rsi
  40561a:	4c 8d 1d 17 38 00 00 	lea    0x3817(%rip),%r11        # 408e38 &amp;lt;.L_2il0floatpacket.29+0x1cc&amp;gt;
  405621:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  405625:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  405629:	ff e2                	jmpq   *%rdx
  40562b:	0f 0b                	ud2    
  40562d:	0f 1f 00             	nopl   (%rax)
  405630:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  405637:	0f 28 4e ff          	movaps -0x1(%rsi),%xmm1
  40563b:	0f 28 56 0f          	movaps 0xf(%rsi),%xmm2
  40563f:	0f 28 5e 1f          	movaps 0x1f(%rsi),%xmm3
  405643:	0f 28 66 2f          	movaps 0x2f(%rsi),%xmm4
  405647:	0f 28 6e 3f          	movaps 0x3f(%rsi),%xmm5
  40564b:	0f 28 76 4f          	movaps 0x4f(%rsi),%xmm6
  40564f:	0f 28 7e 5f          	movaps 0x5f(%rsi),%xmm7
  405653:	44 0f 28 46 6f       	movaps 0x6f(%rsi),%xmm8
  405658:	44 0f 28 4e 7f       	movaps 0x7f(%rsi),%xmm9
  40565d:	48 8d b6 80 00 00 00 	lea    0x80(%rsi),%rsi
  405664:	66 45 0f 3a 0f c8 01 	palignr $0x1,%xmm8,%xmm9
  40566b:	44 0f 29 4f 70       	movaps %xmm9,0x70(%rdi)
  405670:	66 44 0f 3a 0f c7 01 	palignr $0x1,%xmm7,%xmm8
  405677:	44 0f 29 47 60       	movaps %xmm8,0x60(%rdi)
  40567c:	66 0f 3a 0f fe 01    	palignr $0x1,%xmm6,%xmm7
  405682:	0f 29 7f 50          	movaps %xmm7,0x50(%rdi)
  405686:	66 0f 3a 0f f5 01    	palignr $0x1,%xmm5,%xmm6
  40568c:	0f 29 77 40          	movaps %xmm6,0x40(%rdi)
  405690:	66 0f 3a 0f ec 01    	palignr $0x1,%xmm4,%xmm5
  405696:	0f 29 6f 30          	movaps %xmm5,0x30(%rdi)
  40569a:	66 0f 3a 0f e3 01    	palignr $0x1,%xmm3,%xmm4
  4056a0:	0f 29 67 20          	movaps %xmm4,0x20(%rdi)
  4056a4:	66 0f 3a 0f da 01    	palignr $0x1,%xmm2,%xmm3
  4056aa:	0f 29 5f 10          	movaps %xmm3,0x10(%rdi)
  4056ae:	66 0f 3a 0f d1 01    	palignr $0x1,%xmm1,%xmm2
  4056b4:	0f 29 17             	movaps %xmm2,(%rdi)
  4056b7:	48 8d bf 80 00 00 00 	lea    0x80(%rdi),%rdi
  4056be:	0f 83 6c ff ff ff    	jae    405630 &amp;lt;__intel_ssse3_rep_memcpy+0x220&amp;gt;
  4056c4:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  4056c9:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  4056d0:	48 01 d7             	add    %rdx,%rdi
  4056d3:	48 01 d6             	add    %rdx,%rsi
  4056d6:	4c 8d 1d 9b 39 00 00 	lea    0x399b(%rip),%r11        # 409078 &amp;lt;.L_2il0floatpacket.29+0x40c&amp;gt;
  4056dd:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  4056e1:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  4056e5:	ff e2                	jmpq   *%rdx
  4056e7:	0f 0b                	ud2    
  4056e9:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  4056f0:	0f 28 4e ff          	movaps -0x1(%rsi),%xmm1
  4056f4:	0f 28 56 ef          	movaps -0x11(%rsi),%xmm2
  4056f8:	66 0f 3a 0f ca 01    	palignr $0x1,%xmm2,%xmm1
  4056fe:	0f 29 4f f0          	movaps %xmm1,-0x10(%rdi)
  405702:	0f 28 5e df          	movaps -0x21(%rsi),%xmm3
  405706:	66 0f 3a 0f d3 01    	palignr $0x1,%xmm3,%xmm2
  40570c:	0f 29 57 e0          	movaps %xmm2,-0x20(%rdi)
  405710:	0f 28 66 cf          	movaps -0x31(%rsi),%xmm4
  405714:	66 0f 3a 0f dc 01    	palignr $0x1,%xmm4,%xmm3
  40571a:	0f 29 5f d0          	movaps %xmm3,-0x30(%rdi)
  40571e:	0f 28 6e bf          	movaps -0x41(%rsi),%xmm5
  405722:	66 0f 3a 0f e5 01    	palignr $0x1,%xmm5,%xmm4
  405728:	0f 29 67 c0          	movaps %xmm4,-0x40(%rdi)
  40572c:	0f 28 76 af          	movaps -0x51(%rsi),%xmm6
  405730:	66 0f 3a 0f ee 01    	palignr $0x1,%xmm6,%xmm5
  405736:	0f 29 6f b0          	movaps %xmm5,-0x50(%rdi)
  40573a:	0f 28 7e 9f          	movaps -0x61(%rsi),%xmm7
  40573e:	66 0f 3a 0f f7 01    	palignr $0x1,%xmm7,%xmm6
  405744:	0f 29 77 a0          	movaps %xmm6,-0x60(%rdi)
  405748:	44 0f 28 46 8f       	movaps -0x71(%rsi),%xmm8
  40574d:	66 41 0f 3a 0f f8 01 	palignr $0x1,%xmm8,%xmm7
  405754:	0f 29 7f 90          	movaps %xmm7,-0x70(%rdi)
  405758:	44 0f 28 8e 7f ff ff 	movaps -0x81(%rsi),%xmm9
  40575f:	ff 
  405760:	66 45 0f 3a 0f c1 01 	palignr $0x1,%xmm9,%xmm8
  405767:	44 0f 29 47 80       	movaps %xmm8,-0x80(%rdi)
  40576c:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  405773:	48 8d 7f 80          	lea    -0x80(%rdi),%rdi
  405777:	48 8d 76 80          	lea    -0x80(%rsi),%rsi
  40577b:	0f 83 6f ff ff ff    	jae    4056f0 &amp;lt;__intel_ssse3_rep_memcpy+0x2e0&amp;gt;
  405781:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  405786:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  40578d:	48 29 d7             	sub    %rdx,%rdi
  405790:	48 29 d6             	sub    %rdx,%rsi
  405793:	4c 8d 1d 9e 36 00 00 	lea    0x369e(%rip),%r11        # 408e38 &amp;lt;.L_2il0floatpacket.29+0x1cc&amp;gt;
  40579a:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  40579e:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  4057a2:	ff e2                	jmpq   *%rdx
  4057a4:	0f 0b                	ud2    
  4057a6:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  4057ad:	00 00 00 
  4057b0:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  4057b7:	0f 28 4e fe          	movaps -0x2(%rsi),%xmm1
  4057bb:	0f 28 56 0e          	movaps 0xe(%rsi),%xmm2
  4057bf:	0f 28 5e 1e          	movaps 0x1e(%rsi),%xmm3
  4057c3:	0f 28 66 2e          	movaps 0x2e(%rsi),%xmm4
  4057c7:	0f 28 6e 3e          	movaps 0x3e(%rsi),%xmm5
  4057cb:	0f 28 76 4e          	movaps 0x4e(%rsi),%xmm6
  4057cf:	0f 28 7e 5e          	movaps 0x5e(%rsi),%xmm7
  4057d3:	44 0f 28 46 6e       	movaps 0x6e(%rsi),%xmm8
  4057d8:	44 0f 28 4e 7e       	movaps 0x7e(%rsi),%xmm9
  4057dd:	48 8d b6 80 00 00 00 	lea    0x80(%rsi),%rsi
  4057e4:	66 45 0f 3a 0f c8 02 	palignr $0x2,%xmm8,%xmm9
  4057eb:	44 0f 29 4f 70       	movaps %xmm9,0x70(%rdi)
  4057f0:	66 44 0f 3a 0f c7 02 	palignr $0x2,%xmm7,%xmm8
  4057f7:	44 0f 29 47 60       	movaps %xmm8,0x60(%rdi)
  4057fc:	66 0f 3a 0f fe 02    	palignr $0x2,%xmm6,%xmm7
  405802:	0f 29 7f 50          	movaps %xmm7,0x50(%rdi)
  405806:	66 0f 3a 0f f5 02    	palignr $0x2,%xmm5,%xmm6
  40580c:	0f 29 77 40          	movaps %xmm6,0x40(%rdi)
  405810:	66 0f 3a 0f ec 02    	palignr $0x2,%xmm4,%xmm5
  405816:	0f 29 6f 30          	movaps %xmm5,0x30(%rdi)
  40581a:	66 0f 3a 0f e3 02    	palignr $0x2,%xmm3,%xmm4
  405820:	0f 29 67 20          	movaps %xmm4,0x20(%rdi)
  405824:	66 0f 3a 0f da 02    	palignr $0x2,%xmm2,%xmm3
  40582a:	0f 29 5f 10          	movaps %xmm3,0x10(%rdi)
  40582e:	66 0f 3a 0f d1 02    	palignr $0x2,%xmm1,%xmm2
  405834:	0f 29 17             	movaps %xmm2,(%rdi)
  405837:	48 8d bf 80 00 00 00 	lea    0x80(%rdi),%rdi
  40583e:	0f 83 6c ff ff ff    	jae    4057b0 &amp;lt;__intel_ssse3_rep_memcpy+0x3a0&amp;gt;
  405844:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  405849:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  405850:	48 01 d7             	add    %rdx,%rdi
  405853:	48 01 d6             	add    %rdx,%rsi
  405856:	4c 8d 1d 1b 38 00 00 	lea    0x381b(%rip),%r11        # 409078 &amp;lt;.L_2il0floatpacket.29+0x40c&amp;gt;
  40585d:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  405861:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  405865:	ff e2                	jmpq   *%rdx
  405867:	0f 0b                	ud2    
  405869:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  405870:	0f 28 4e fe          	movaps -0x2(%rsi),%xmm1
  405874:	0f 28 56 ee          	movaps -0x12(%rsi),%xmm2
  405878:	66 0f 3a 0f ca 02    	palignr $0x2,%xmm2,%xmm1
  40587e:	0f 29 4f f0          	movaps %xmm1,-0x10(%rdi)
  405882:	0f 28 5e de          	movaps -0x22(%rsi),%xmm3
  405886:	66 0f 3a 0f d3 02    	palignr $0x2,%xmm3,%xmm2
  40588c:	0f 29 57 e0          	movaps %xmm2,-0x20(%rdi)
  405890:	0f 28 66 ce          	movaps -0x32(%rsi),%xmm4
  405894:	66 0f 3a 0f dc 02    	palignr $0x2,%xmm4,%xmm3
  40589a:	0f 29 5f d0          	movaps %xmm3,-0x30(%rdi)
  40589e:	0f 28 6e be          	movaps -0x42(%rsi),%xmm5
  4058a2:	66 0f 3a 0f e5 02    	palignr $0x2,%xmm5,%xmm4
  4058a8:	0f 29 67 c0          	movaps %xmm4,-0x40(%rdi)
  4058ac:	0f 28 76 ae          	movaps -0x52(%rsi),%xmm6
  4058b0:	66 0f 3a 0f ee 02    	palignr $0x2,%xmm6,%xmm5
  4058b6:	0f 29 6f b0          	movaps %xmm5,-0x50(%rdi)
  4058ba:	0f 28 7e 9e          	movaps -0x62(%rsi),%xmm7
  4058be:	66 0f 3a 0f f7 02    	palignr $0x2,%xmm7,%xmm6
  4058c4:	0f 29 77 a0          	movaps %xmm6,-0x60(%rdi)
  4058c8:	44 0f 28 46 8e       	movaps -0x72(%rsi),%xmm8
  4058cd:	66 41 0f 3a 0f f8 02 	palignr $0x2,%xmm8,%xmm7
  4058d4:	0f 29 7f 90          	movaps %xmm7,-0x70(%rdi)
  4058d8:	44 0f 28 8e 7e ff ff 	movaps -0x82(%rsi),%xmm9
  4058df:	ff 
  4058e0:	66 45 0f 3a 0f c1 02 	palignr $0x2,%xmm9,%xmm8
  4058e7:	44 0f 29 47 80       	movaps %xmm8,-0x80(%rdi)
  4058ec:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  4058f3:	48 8d 7f 80          	lea    -0x80(%rdi),%rdi
  4058f7:	48 8d 76 80          	lea    -0x80(%rsi),%rsi
  4058fb:	0f 83 6f ff ff ff    	jae    405870 &amp;lt;__intel_ssse3_rep_memcpy+0x460&amp;gt;
  405901:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  405906:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  40590d:	48 29 d7             	sub    %rdx,%rdi
  405910:	48 29 d6             	sub    %rdx,%rsi
  405913:	4c 8d 1d 1e 35 00 00 	lea    0x351e(%rip),%r11        # 408e38 &amp;lt;.L_2il0floatpacket.29+0x1cc&amp;gt;
  40591a:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  40591e:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  405922:	ff e2                	jmpq   *%rdx
  405924:	0f 0b                	ud2    
  405926:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  40592d:	00 00 00 
  405930:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  405937:	0f 28 4e fd          	movaps -0x3(%rsi),%xmm1
  40593b:	0f 28 56 0d          	movaps 0xd(%rsi),%xmm2
  40593f:	0f 28 5e 1d          	movaps 0x1d(%rsi),%xmm3
  405943:	0f 28 66 2d          	movaps 0x2d(%rsi),%xmm4
  405947:	0f 28 6e 3d          	movaps 0x3d(%rsi),%xmm5
  40594b:	0f 28 76 4d          	movaps 0x4d(%rsi),%xmm6
  40594f:	0f 28 7e 5d          	movaps 0x5d(%rsi),%xmm7
  405953:	44 0f 28 46 6d       	movaps 0x6d(%rsi),%xmm8
  405958:	44 0f 28 4e 7d       	movaps 0x7d(%rsi),%xmm9
  40595d:	48 8d b6 80 00 00 00 	lea    0x80(%rsi),%rsi
  405964:	66 45 0f 3a 0f c8 03 	palignr $0x3,%xmm8,%xmm9
  40596b:	44 0f 29 4f 70       	movaps %xmm9,0x70(%rdi)
  405970:	66 44 0f 3a 0f c7 03 	palignr $0x3,%xmm7,%xmm8
  405977:	44 0f 29 47 60       	movaps %xmm8,0x60(%rdi)
  40597c:	66 0f 3a 0f fe 03    	palignr $0x3,%xmm6,%xmm7
  405982:	0f 29 7f 50          	movaps %xmm7,0x50(%rdi)
  405986:	66 0f 3a 0f f5 03    	palignr $0x3,%xmm5,%xmm6
  40598c:	0f 29 77 40          	movaps %xmm6,0x40(%rdi)
  405990:	66 0f 3a 0f ec 03    	palignr $0x3,%xmm4,%xmm5
  405996:	0f 29 6f 30          	movaps %xmm5,0x30(%rdi)
  40599a:	66 0f 3a 0f e3 03    	palignr $0x3,%xmm3,%xmm4
  4059a0:	0f 29 67 20          	movaps %xmm4,0x20(%rdi)
  4059a4:	66 0f 3a 0f da 03    	palignr $0x3,%xmm2,%xmm3
  4059aa:	0f 29 5f 10          	movaps %xmm3,0x10(%rdi)
  4059ae:	66 0f 3a 0f d1 03    	palignr $0x3,%xmm1,%xmm2
  4059b4:	0f 29 17             	movaps %xmm2,(%rdi)
  4059b7:	48 8d bf 80 00 00 00 	lea    0x80(%rdi),%rdi
  4059be:	0f 83 6c ff ff ff    	jae    405930 &amp;lt;__intel_ssse3_rep_memcpy+0x520&amp;gt;
  4059c4:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  4059c9:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  4059d0:	48 01 d7             	add    %rdx,%rdi
  4059d3:	48 01 d6             	add    %rdx,%rsi
  4059d6:	4c 8d 1d 9b 36 00 00 	lea    0x369b(%rip),%r11        # 409078 &amp;lt;.L_2il0floatpacket.29+0x40c&amp;gt;
  4059dd:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  4059e1:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  4059e5:	ff e2                	jmpq   *%rdx
  4059e7:	0f 0b                	ud2    
  4059e9:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  4059f0:	0f 28 4e fd          	movaps -0x3(%rsi),%xmm1
  4059f4:	0f 28 56 ed          	movaps -0x13(%rsi),%xmm2
  4059f8:	66 0f 3a 0f ca 03    	palignr $0x3,%xmm2,%xmm1
  4059fe:	0f 29 4f f0          	movaps %xmm1,-0x10(%rdi)
  405a02:	0f 28 5e dd          	movaps -0x23(%rsi),%xmm3
  405a06:	66 0f 3a 0f d3 03    	palignr $0x3,%xmm3,%xmm2
  405a0c:	0f 29 57 e0          	movaps %xmm2,-0x20(%rdi)
  405a10:	0f 28 66 cd          	movaps -0x33(%rsi),%xmm4
  405a14:	66 0f 3a 0f dc 03    	palignr $0x3,%xmm4,%xmm3
  405a1a:	0f 29 5f d0          	movaps %xmm3,-0x30(%rdi)
  405a1e:	0f 28 6e bd          	movaps -0x43(%rsi),%xmm5
  405a22:	66 0f 3a 0f e5 03    	palignr $0x3,%xmm5,%xmm4
  405a28:	0f 29 67 c0          	movaps %xmm4,-0x40(%rdi)
  405a2c:	0f 28 76 ad          	movaps -0x53(%rsi),%xmm6
  405a30:	66 0f 3a 0f ee 03    	palignr $0x3,%xmm6,%xmm5
  405a36:	0f 29 6f b0          	movaps %xmm5,-0x50(%rdi)
  405a3a:	0f 28 7e 9d          	movaps -0x63(%rsi),%xmm7
  405a3e:	66 0f 3a 0f f7 03    	palignr $0x3,%xmm7,%xmm6
  405a44:	0f 29 77 a0          	movaps %xmm6,-0x60(%rdi)
  405a48:	44 0f 28 46 8d       	movaps -0x73(%rsi),%xmm8
  405a4d:	66 41 0f 3a 0f f8 03 	palignr $0x3,%xmm8,%xmm7
  405a54:	0f 29 7f 90          	movaps %xmm7,-0x70(%rdi)
  405a58:	44 0f 28 8e 7d ff ff 	movaps -0x83(%rsi),%xmm9
  405a5f:	ff 
  405a60:	66 45 0f 3a 0f c1 03 	palignr $0x3,%xmm9,%xmm8
  405a67:	44 0f 29 47 80       	movaps %xmm8,-0x80(%rdi)
  405a6c:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  405a73:	48 8d 7f 80          	lea    -0x80(%rdi),%rdi
  405a77:	48 8d 76 80          	lea    -0x80(%rsi),%rsi
  405a7b:	0f 83 6f ff ff ff    	jae    4059f0 &amp;lt;__intel_ssse3_rep_memcpy+0x5e0&amp;gt;
  405a81:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  405a86:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  405a8d:	48 29 d7             	sub    %rdx,%rdi
  405a90:	48 29 d6             	sub    %rdx,%rsi
  405a93:	4c 8d 1d 9e 33 00 00 	lea    0x339e(%rip),%r11        # 408e38 &amp;lt;.L_2il0floatpacket.29+0x1cc&amp;gt;
  405a9a:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  405a9e:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  405aa2:	ff e2                	jmpq   *%rdx
  405aa4:	0f 0b                	ud2    
  405aa6:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  405aad:	00 00 00 
  405ab0:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  405ab7:	0f 28 4e fc          	movaps -0x4(%rsi),%xmm1
  405abb:	0f 28 56 0c          	movaps 0xc(%rsi),%xmm2
  405abf:	0f 28 5e 1c          	movaps 0x1c(%rsi),%xmm3
  405ac3:	0f 28 66 2c          	movaps 0x2c(%rsi),%xmm4
  405ac7:	0f 28 6e 3c          	movaps 0x3c(%rsi),%xmm5
  405acb:	0f 28 76 4c          	movaps 0x4c(%rsi),%xmm6
  405acf:	0f 28 7e 5c          	movaps 0x5c(%rsi),%xmm7
  405ad3:	44 0f 28 46 6c       	movaps 0x6c(%rsi),%xmm8
  405ad8:	44 0f 28 4e 7c       	movaps 0x7c(%rsi),%xmm9
  405add:	48 8d b6 80 00 00 00 	lea    0x80(%rsi),%rsi
  405ae4:	66 45 0f 3a 0f c8 04 	palignr $0x4,%xmm8,%xmm9
  405aeb:	44 0f 29 4f 70       	movaps %xmm9,0x70(%rdi)
  405af0:	66 44 0f 3a 0f c7 04 	palignr $0x4,%xmm7,%xmm8
  405af7:	44 0f 29 47 60       	movaps %xmm8,0x60(%rdi)
  405afc:	66 0f 3a 0f fe 04    	palignr $0x4,%xmm6,%xmm7
  405b02:	0f 29 7f 50          	movaps %xmm7,0x50(%rdi)
  405b06:	66 0f 3a 0f f5 04    	palignr $0x4,%xmm5,%xmm6
  405b0c:	0f 29 77 40          	movaps %xmm6,0x40(%rdi)
  405b10:	66 0f 3a 0f ec 04    	palignr $0x4,%xmm4,%xmm5
  405b16:	0f 29 6f 30          	movaps %xmm5,0x30(%rdi)
  405b1a:	66 0f 3a 0f e3 04    	palignr $0x4,%xmm3,%xmm4
  405b20:	0f 29 67 20          	movaps %xmm4,0x20(%rdi)
  405b24:	66 0f 3a 0f da 04    	palignr $0x4,%xmm2,%xmm3
  405b2a:	0f 29 5f 10          	movaps %xmm3,0x10(%rdi)
  405b2e:	66 0f 3a 0f d1 04    	palignr $0x4,%xmm1,%xmm2
  405b34:	0f 29 17             	movaps %xmm2,(%rdi)
  405b37:	48 8d bf 80 00 00 00 	lea    0x80(%rdi),%rdi
  405b3e:	0f 83 6c ff ff ff    	jae    405ab0 &amp;lt;__intel_ssse3_rep_memcpy+0x6a0&amp;gt;
  405b44:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  405b49:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  405b50:	48 01 d7             	add    %rdx,%rdi
  405b53:	48 01 d6             	add    %rdx,%rsi
  405b56:	4c 8d 1d 1b 35 00 00 	lea    0x351b(%rip),%r11        # 409078 &amp;lt;.L_2il0floatpacket.29+0x40c&amp;gt;
  405b5d:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  405b61:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  405b65:	ff e2                	jmpq   *%rdx
  405b67:	0f 0b                	ud2    
  405b69:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  405b70:	0f 28 4e fc          	movaps -0x4(%rsi),%xmm1
  405b74:	0f 28 56 ec          	movaps -0x14(%rsi),%xmm2
  405b78:	66 0f 3a 0f ca 04    	palignr $0x4,%xmm2,%xmm1
  405b7e:	0f 29 4f f0          	movaps %xmm1,-0x10(%rdi)
  405b82:	0f 28 5e dc          	movaps -0x24(%rsi),%xmm3
  405b86:	66 0f 3a 0f d3 04    	palignr $0x4,%xmm3,%xmm2
  405b8c:	0f 29 57 e0          	movaps %xmm2,-0x20(%rdi)
  405b90:	0f 28 66 cc          	movaps -0x34(%rsi),%xmm4
  405b94:	66 0f 3a 0f dc 04    	palignr $0x4,%xmm4,%xmm3
  405b9a:	0f 29 5f d0          	movaps %xmm3,-0x30(%rdi)
  405b9e:	0f 28 6e bc          	movaps -0x44(%rsi),%xmm5
  405ba2:	66 0f 3a 0f e5 04    	palignr $0x4,%xmm5,%xmm4
  405ba8:	0f 29 67 c0          	movaps %xmm4,-0x40(%rdi)
  405bac:	0f 28 76 ac          	movaps -0x54(%rsi),%xmm6
  405bb0:	66 0f 3a 0f ee 04    	palignr $0x4,%xmm6,%xmm5
  405bb6:	0f 29 6f b0          	movaps %xmm5,-0x50(%rdi)
  405bba:	0f 28 7e 9c          	movaps -0x64(%rsi),%xmm7
  405bbe:	66 0f 3a 0f f7 04    	palignr $0x4,%xmm7,%xmm6
  405bc4:	0f 29 77 a0          	movaps %xmm6,-0x60(%rdi)
  405bc8:	44 0f 28 46 8c       	movaps -0x74(%rsi),%xmm8
  405bcd:	66 41 0f 3a 0f f8 04 	palignr $0x4,%xmm8,%xmm7
  405bd4:	0f 29 7f 90          	movaps %xmm7,-0x70(%rdi)
  405bd8:	44 0f 28 8e 7c ff ff 	movaps -0x84(%rsi),%xmm9
  405bdf:	ff 
  405be0:	66 45 0f 3a 0f c1 04 	palignr $0x4,%xmm9,%xmm8
  405be7:	44 0f 29 47 80       	movaps %xmm8,-0x80(%rdi)
  405bec:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  405bf3:	48 8d 7f 80          	lea    -0x80(%rdi),%rdi
  405bf7:	48 8d 76 80          	lea    -0x80(%rsi),%rsi
  405bfb:	0f 83 6f ff ff ff    	jae    405b70 &amp;lt;__intel_ssse3_rep_memcpy+0x760&amp;gt;
  405c01:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  405c06:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  405c0d:	48 29 d7             	sub    %rdx,%rdi
  405c10:	48 29 d6             	sub    %rdx,%rsi
  405c13:	4c 8d 1d 1e 32 00 00 	lea    0x321e(%rip),%r11        # 408e38 &amp;lt;.L_2il0floatpacket.29+0x1cc&amp;gt;
  405c1a:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  405c1e:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  405c22:	ff e2                	jmpq   *%rdx
  405c24:	0f 0b                	ud2    
  405c26:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  405c2d:	00 00 00 
  405c30:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  405c37:	0f 28 4e fb          	movaps -0x5(%rsi),%xmm1
  405c3b:	0f 28 56 0b          	movaps 0xb(%rsi),%xmm2
  405c3f:	0f 28 5e 1b          	movaps 0x1b(%rsi),%xmm3
  405c43:	0f 28 66 2b          	movaps 0x2b(%rsi),%xmm4
  405c47:	0f 28 6e 3b          	movaps 0x3b(%rsi),%xmm5
  405c4b:	0f 28 76 4b          	movaps 0x4b(%rsi),%xmm6
  405c4f:	0f 28 7e 5b          	movaps 0x5b(%rsi),%xmm7
  405c53:	44 0f 28 46 6b       	movaps 0x6b(%rsi),%xmm8
  405c58:	44 0f 28 4e 7b       	movaps 0x7b(%rsi),%xmm9
  405c5d:	48 8d b6 80 00 00 00 	lea    0x80(%rsi),%rsi
  405c64:	66 45 0f 3a 0f c8 05 	palignr $0x5,%xmm8,%xmm9
  405c6b:	44 0f 29 4f 70       	movaps %xmm9,0x70(%rdi)
  405c70:	66 44 0f 3a 0f c7 05 	palignr $0x5,%xmm7,%xmm8
  405c77:	44 0f 29 47 60       	movaps %xmm8,0x60(%rdi)
  405c7c:	66 0f 3a 0f fe 05    	palignr $0x5,%xmm6,%xmm7
  405c82:	0f 29 7f 50          	movaps %xmm7,0x50(%rdi)
  405c86:	66 0f 3a 0f f5 05    	palignr $0x5,%xmm5,%xmm6
  405c8c:	0f 29 77 40          	movaps %xmm6,0x40(%rdi)
  405c90:	66 0f 3a 0f ec 05    	palignr $0x5,%xmm4,%xmm5
  405c96:	0f 29 6f 30          	movaps %xmm5,0x30(%rdi)
  405c9a:	66 0f 3a 0f e3 05    	palignr $0x5,%xmm3,%xmm4
  405ca0:	0f 29 67 20          	movaps %xmm4,0x20(%rdi)
  405ca4:	66 0f 3a 0f da 05    	palignr $0x5,%xmm2,%xmm3
  405caa:	0f 29 5f 10          	movaps %xmm3,0x10(%rdi)
  405cae:	66 0f 3a 0f d1 05    	palignr $0x5,%xmm1,%xmm2
  405cb4:	0f 29 17             	movaps %xmm2,(%rdi)
  405cb7:	48 8d bf 80 00 00 00 	lea    0x80(%rdi),%rdi
  405cbe:	0f 83 6c ff ff ff    	jae    405c30 &amp;lt;__intel_ssse3_rep_memcpy+0x820&amp;gt;
  405cc4:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  405cc9:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  405cd0:	48 01 d7             	add    %rdx,%rdi
  405cd3:	48 01 d6             	add    %rdx,%rsi
  405cd6:	4c 8d 1d 9b 33 00 00 	lea    0x339b(%rip),%r11        # 409078 &amp;lt;.L_2il0floatpacket.29+0x40c&amp;gt;
  405cdd:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  405ce1:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  405ce5:	ff e2                	jmpq   *%rdx
  405ce7:	0f 0b                	ud2    
  405ce9:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  405cf0:	0f 28 4e fb          	movaps -0x5(%rsi),%xmm1
  405cf4:	0f 28 56 eb          	movaps -0x15(%rsi),%xmm2
  405cf8:	66 0f 3a 0f ca 05    	palignr $0x5,%xmm2,%xmm1
  405cfe:	0f 29 4f f0          	movaps %xmm1,-0x10(%rdi)
  405d02:	0f 28 5e db          	movaps -0x25(%rsi),%xmm3
  405d06:	66 0f 3a 0f d3 05    	palignr $0x5,%xmm3,%xmm2
  405d0c:	0f 29 57 e0          	movaps %xmm2,-0x20(%rdi)
  405d10:	0f 28 66 cb          	movaps -0x35(%rsi),%xmm4
  405d14:	66 0f 3a 0f dc 05    	palignr $0x5,%xmm4,%xmm3
  405d1a:	0f 29 5f d0          	movaps %xmm3,-0x30(%rdi)
  405d1e:	0f 28 6e bb          	movaps -0x45(%rsi),%xmm5
  405d22:	66 0f 3a 0f e5 05    	palignr $0x5,%xmm5,%xmm4
  405d28:	0f 29 67 c0          	movaps %xmm4,-0x40(%rdi)
  405d2c:	0f 28 76 ab          	movaps -0x55(%rsi),%xmm6
  405d30:	66 0f 3a 0f ee 05    	palignr $0x5,%xmm6,%xmm5
  405d36:	0f 29 6f b0          	movaps %xmm5,-0x50(%rdi)
  405d3a:	0f 28 7e 9b          	movaps -0x65(%rsi),%xmm7
  405d3e:	66 0f 3a 0f f7 05    	palignr $0x5,%xmm7,%xmm6
  405d44:	0f 29 77 a0          	movaps %xmm6,-0x60(%rdi)
  405d48:	44 0f 28 46 8b       	movaps -0x75(%rsi),%xmm8
  405d4d:	66 41 0f 3a 0f f8 05 	palignr $0x5,%xmm8,%xmm7
  405d54:	0f 29 7f 90          	movaps %xmm7,-0x70(%rdi)
  405d58:	44 0f 28 8e 7b ff ff 	movaps -0x85(%rsi),%xmm9
  405d5f:	ff 
  405d60:	66 45 0f 3a 0f c1 05 	palignr $0x5,%xmm9,%xmm8
  405d67:	44 0f 29 47 80       	movaps %xmm8,-0x80(%rdi)
  405d6c:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  405d73:	48 8d 7f 80          	lea    -0x80(%rdi),%rdi
  405d77:	48 8d 76 80          	lea    -0x80(%rsi),%rsi
  405d7b:	0f 83 6f ff ff ff    	jae    405cf0 &amp;lt;__intel_ssse3_rep_memcpy+0x8e0&amp;gt;
  405d81:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  405d86:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  405d8d:	48 29 d7             	sub    %rdx,%rdi
  405d90:	48 29 d6             	sub    %rdx,%rsi
  405d93:	4c 8d 1d 9e 30 00 00 	lea    0x309e(%rip),%r11        # 408e38 &amp;lt;.L_2il0floatpacket.29+0x1cc&amp;gt;
  405d9a:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  405d9e:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  405da2:	ff e2                	jmpq   *%rdx
  405da4:	0f 0b                	ud2    
  405da6:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  405dad:	00 00 00 
  405db0:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  405db7:	0f 28 4e fa          	movaps -0x6(%rsi),%xmm1
  405dbb:	0f 28 56 0a          	movaps 0xa(%rsi),%xmm2
  405dbf:	0f 28 5e 1a          	movaps 0x1a(%rsi),%xmm3
  405dc3:	0f 28 66 2a          	movaps 0x2a(%rsi),%xmm4
  405dc7:	0f 28 6e 3a          	movaps 0x3a(%rsi),%xmm5
  405dcb:	0f 28 76 4a          	movaps 0x4a(%rsi),%xmm6
  405dcf:	0f 28 7e 5a          	movaps 0x5a(%rsi),%xmm7
  405dd3:	44 0f 28 46 6a       	movaps 0x6a(%rsi),%xmm8
  405dd8:	44 0f 28 4e 7a       	movaps 0x7a(%rsi),%xmm9
  405ddd:	48 8d b6 80 00 00 00 	lea    0x80(%rsi),%rsi
  405de4:	66 45 0f 3a 0f c8 06 	palignr $0x6,%xmm8,%xmm9
  405deb:	44 0f 29 4f 70       	movaps %xmm9,0x70(%rdi)
  405df0:	66 44 0f 3a 0f c7 06 	palignr $0x6,%xmm7,%xmm8
  405df7:	44 0f 29 47 60       	movaps %xmm8,0x60(%rdi)
  405dfc:	66 0f 3a 0f fe 06    	palignr $0x6,%xmm6,%xmm7
  405e02:	0f 29 7f 50          	movaps %xmm7,0x50(%rdi)
  405e06:	66 0f 3a 0f f5 06    	palignr $0x6,%xmm5,%xmm6
  405e0c:	0f 29 77 40          	movaps %xmm6,0x40(%rdi)
  405e10:	66 0f 3a 0f ec 06    	palignr $0x6,%xmm4,%xmm5
  405e16:	0f 29 6f 30          	movaps %xmm5,0x30(%rdi)
  405e1a:	66 0f 3a 0f e3 06    	palignr $0x6,%xmm3,%xmm4
  405e20:	0f 29 67 20          	movaps %xmm4,0x20(%rdi)
  405e24:	66 0f 3a 0f da 06    	palignr $0x6,%xmm2,%xmm3
  405e2a:	0f 29 5f 10          	movaps %xmm3,0x10(%rdi)
  405e2e:	66 0f 3a 0f d1 06    	palignr $0x6,%xmm1,%xmm2
  405e34:	0f 29 17             	movaps %xmm2,(%rdi)
  405e37:	48 8d bf 80 00 00 00 	lea    0x80(%rdi),%rdi
  405e3e:	0f 83 6c ff ff ff    	jae    405db0 &amp;lt;__intel_ssse3_rep_memcpy+0x9a0&amp;gt;
  405e44:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  405e49:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  405e50:	48 01 d7             	add    %rdx,%rdi
  405e53:	48 01 d6             	add    %rdx,%rsi
  405e56:	4c 8d 1d 1b 32 00 00 	lea    0x321b(%rip),%r11        # 409078 &amp;lt;.L_2il0floatpacket.29+0x40c&amp;gt;
  405e5d:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  405e61:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  405e65:	ff e2                	jmpq   *%rdx
  405e67:	0f 0b                	ud2    
  405e69:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  405e70:	0f 28 4e fa          	movaps -0x6(%rsi),%xmm1
  405e74:	0f 28 56 ea          	movaps -0x16(%rsi),%xmm2
  405e78:	66 0f 3a 0f ca 06    	palignr $0x6,%xmm2,%xmm1
  405e7e:	0f 29 4f f0          	movaps %xmm1,-0x10(%rdi)
  405e82:	0f 28 5e da          	movaps -0x26(%rsi),%xmm3
  405e86:	66 0f 3a 0f d3 06    	palignr $0x6,%xmm3,%xmm2
  405e8c:	0f 29 57 e0          	movaps %xmm2,-0x20(%rdi)
  405e90:	0f 28 66 ca          	movaps -0x36(%rsi),%xmm4
  405e94:	66 0f 3a 0f dc 06    	palignr $0x6,%xmm4,%xmm3
  405e9a:	0f 29 5f d0          	movaps %xmm3,-0x30(%rdi)
  405e9e:	0f 28 6e ba          	movaps -0x46(%rsi),%xmm5
  405ea2:	66 0f 3a 0f e5 06    	palignr $0x6,%xmm5,%xmm4
  405ea8:	0f 29 67 c0          	movaps %xmm4,-0x40(%rdi)
  405eac:	0f 28 76 aa          	movaps -0x56(%rsi),%xmm6
  405eb0:	66 0f 3a 0f ee 06    	palignr $0x6,%xmm6,%xmm5
  405eb6:	0f 29 6f b0          	movaps %xmm5,-0x50(%rdi)
  405eba:	0f 28 7e 9a          	movaps -0x66(%rsi),%xmm7
  405ebe:	66 0f 3a 0f f7 06    	palignr $0x6,%xmm7,%xmm6
  405ec4:	0f 29 77 a0          	movaps %xmm6,-0x60(%rdi)
  405ec8:	44 0f 28 46 8a       	movaps -0x76(%rsi),%xmm8
  405ecd:	66 41 0f 3a 0f f8 06 	palignr $0x6,%xmm8,%xmm7
  405ed4:	0f 29 7f 90          	movaps %xmm7,-0x70(%rdi)
  405ed8:	44 0f 28 8e 7a ff ff 	movaps -0x86(%rsi),%xmm9
  405edf:	ff 
  405ee0:	66 45 0f 3a 0f c1 06 	palignr $0x6,%xmm9,%xmm8
  405ee7:	44 0f 29 47 80       	movaps %xmm8,-0x80(%rdi)
  405eec:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  405ef3:	48 8d 7f 80          	lea    -0x80(%rdi),%rdi
  405ef7:	48 8d 76 80          	lea    -0x80(%rsi),%rsi
  405efb:	0f 83 6f ff ff ff    	jae    405e70 &amp;lt;__intel_ssse3_rep_memcpy+0xa60&amp;gt;
  405f01:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  405f06:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  405f0d:	48 29 d7             	sub    %rdx,%rdi
  405f10:	48 29 d6             	sub    %rdx,%rsi
  405f13:	4c 8d 1d 1e 2f 00 00 	lea    0x2f1e(%rip),%r11        # 408e38 &amp;lt;.L_2il0floatpacket.29+0x1cc&amp;gt;
  405f1a:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  405f1e:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  405f22:	ff e2                	jmpq   *%rdx
  405f24:	0f 0b                	ud2    
  405f26:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  405f2d:	00 00 00 
  405f30:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  405f37:	0f 28 4e f9          	movaps -0x7(%rsi),%xmm1
  405f3b:	0f 28 56 09          	movaps 0x9(%rsi),%xmm2
  405f3f:	0f 28 5e 19          	movaps 0x19(%rsi),%xmm3
  405f43:	0f 28 66 29          	movaps 0x29(%rsi),%xmm4
  405f47:	0f 28 6e 39          	movaps 0x39(%rsi),%xmm5
  405f4b:	0f 28 76 49          	movaps 0x49(%rsi),%xmm6
  405f4f:	0f 28 7e 59          	movaps 0x59(%rsi),%xmm7
  405f53:	44 0f 28 46 69       	movaps 0x69(%rsi),%xmm8
  405f58:	44 0f 28 4e 79       	movaps 0x79(%rsi),%xmm9
  405f5d:	48 8d b6 80 00 00 00 	lea    0x80(%rsi),%rsi
  405f64:	66 45 0f 3a 0f c8 07 	palignr $0x7,%xmm8,%xmm9
  405f6b:	44 0f 29 4f 70       	movaps %xmm9,0x70(%rdi)
  405f70:	66 44 0f 3a 0f c7 07 	palignr $0x7,%xmm7,%xmm8
  405f77:	44 0f 29 47 60       	movaps %xmm8,0x60(%rdi)
  405f7c:	66 0f 3a 0f fe 07    	palignr $0x7,%xmm6,%xmm7
  405f82:	0f 29 7f 50          	movaps %xmm7,0x50(%rdi)
  405f86:	66 0f 3a 0f f5 07    	palignr $0x7,%xmm5,%xmm6
  405f8c:	0f 29 77 40          	movaps %xmm6,0x40(%rdi)
  405f90:	66 0f 3a 0f ec 07    	palignr $0x7,%xmm4,%xmm5
  405f96:	0f 29 6f 30          	movaps %xmm5,0x30(%rdi)
  405f9a:	66 0f 3a 0f e3 07    	palignr $0x7,%xmm3,%xmm4
  405fa0:	0f 29 67 20          	movaps %xmm4,0x20(%rdi)
  405fa4:	66 0f 3a 0f da 07    	palignr $0x7,%xmm2,%xmm3
  405faa:	0f 29 5f 10          	movaps %xmm3,0x10(%rdi)
  405fae:	66 0f 3a 0f d1 07    	palignr $0x7,%xmm1,%xmm2
  405fb4:	0f 29 17             	movaps %xmm2,(%rdi)
  405fb7:	48 8d bf 80 00 00 00 	lea    0x80(%rdi),%rdi
  405fbe:	0f 83 6c ff ff ff    	jae    405f30 &amp;lt;__intel_ssse3_rep_memcpy+0xb20&amp;gt;
  405fc4:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  405fc9:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  405fd0:	48 01 d7             	add    %rdx,%rdi
  405fd3:	48 01 d6             	add    %rdx,%rsi
  405fd6:	4c 8d 1d 9b 30 00 00 	lea    0x309b(%rip),%r11        # 409078 &amp;lt;.L_2il0floatpacket.29+0x40c&amp;gt;
  405fdd:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  405fe1:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  405fe5:	ff e2                	jmpq   *%rdx
  405fe7:	0f 0b                	ud2    
  405fe9:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  405ff0:	0f 28 4e f9          	movaps -0x7(%rsi),%xmm1
  405ff4:	0f 28 56 e9          	movaps -0x17(%rsi),%xmm2
  405ff8:	66 0f 3a 0f ca 07    	palignr $0x7,%xmm2,%xmm1
  405ffe:	0f 29 4f f0          	movaps %xmm1,-0x10(%rdi)
  406002:	0f 28 5e d9          	movaps -0x27(%rsi),%xmm3
  406006:	66 0f 3a 0f d3 07    	palignr $0x7,%xmm3,%xmm2
  40600c:	0f 29 57 e0          	movaps %xmm2,-0x20(%rdi)
  406010:	0f 28 66 c9          	movaps -0x37(%rsi),%xmm4
  406014:	66 0f 3a 0f dc 07    	palignr $0x7,%xmm4,%xmm3
  40601a:	0f 29 5f d0          	movaps %xmm3,-0x30(%rdi)
  40601e:	0f 28 6e b9          	movaps -0x47(%rsi),%xmm5
  406022:	66 0f 3a 0f e5 07    	palignr $0x7,%xmm5,%xmm4
  406028:	0f 29 67 c0          	movaps %xmm4,-0x40(%rdi)
  40602c:	0f 28 76 a9          	movaps -0x57(%rsi),%xmm6
  406030:	66 0f 3a 0f ee 07    	palignr $0x7,%xmm6,%xmm5
  406036:	0f 29 6f b0          	movaps %xmm5,-0x50(%rdi)
  40603a:	0f 28 7e 99          	movaps -0x67(%rsi),%xmm7
  40603e:	66 0f 3a 0f f7 07    	palignr $0x7,%xmm7,%xmm6
  406044:	0f 29 77 a0          	movaps %xmm6,-0x60(%rdi)
  406048:	44 0f 28 46 89       	movaps -0x77(%rsi),%xmm8
  40604d:	66 41 0f 3a 0f f8 07 	palignr $0x7,%xmm8,%xmm7
  406054:	0f 29 7f 90          	movaps %xmm7,-0x70(%rdi)
  406058:	44 0f 28 8e 79 ff ff 	movaps -0x87(%rsi),%xmm9
  40605f:	ff 
  406060:	66 45 0f 3a 0f c1 07 	palignr $0x7,%xmm9,%xmm8
  406067:	44 0f 29 47 80       	movaps %xmm8,-0x80(%rdi)
  40606c:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  406073:	48 8d 7f 80          	lea    -0x80(%rdi),%rdi
  406077:	48 8d 76 80          	lea    -0x80(%rsi),%rsi
  40607b:	0f 83 6f ff ff ff    	jae    405ff0 &amp;lt;__intel_ssse3_rep_memcpy+0xbe0&amp;gt;
  406081:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  406086:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  40608d:	48 29 d7             	sub    %rdx,%rdi
  406090:	48 29 d6             	sub    %rdx,%rsi
  406093:	4c 8d 1d 9e 2d 00 00 	lea    0x2d9e(%rip),%r11        # 408e38 &amp;lt;.L_2il0floatpacket.29+0x1cc&amp;gt;
  40609a:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  40609e:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  4060a2:	ff e2                	jmpq   *%rdx
  4060a4:	0f 0b                	ud2    
  4060a6:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  4060ad:	00 00 00 
  4060b0:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  4060b7:	0f 28 4e f8          	movaps -0x8(%rsi),%xmm1
  4060bb:	0f 28 56 08          	movaps 0x8(%rsi),%xmm2
  4060bf:	0f 28 5e 18          	movaps 0x18(%rsi),%xmm3
  4060c3:	0f 28 66 28          	movaps 0x28(%rsi),%xmm4
  4060c7:	0f 28 6e 38          	movaps 0x38(%rsi),%xmm5
  4060cb:	0f 28 76 48          	movaps 0x48(%rsi),%xmm6
  4060cf:	0f 28 7e 58          	movaps 0x58(%rsi),%xmm7
  4060d3:	44 0f 28 46 68       	movaps 0x68(%rsi),%xmm8
  4060d8:	44 0f 28 4e 78       	movaps 0x78(%rsi),%xmm9
  4060dd:	48 8d b6 80 00 00 00 	lea    0x80(%rsi),%rsi
  4060e4:	66 45 0f 3a 0f c8 08 	palignr $0x8,%xmm8,%xmm9
  4060eb:	44 0f 29 4f 70       	movaps %xmm9,0x70(%rdi)
  4060f0:	66 44 0f 3a 0f c7 08 	palignr $0x8,%xmm7,%xmm8
  4060f7:	44 0f 29 47 60       	movaps %xmm8,0x60(%rdi)
  4060fc:	66 0f 3a 0f fe 08    	palignr $0x8,%xmm6,%xmm7
  406102:	0f 29 7f 50          	movaps %xmm7,0x50(%rdi)
  406106:	66 0f 3a 0f f5 08    	palignr $0x8,%xmm5,%xmm6
  40610c:	0f 29 77 40          	movaps %xmm6,0x40(%rdi)
  406110:	66 0f 3a 0f ec 08    	palignr $0x8,%xmm4,%xmm5
  406116:	0f 29 6f 30          	movaps %xmm5,0x30(%rdi)
  40611a:	66 0f 3a 0f e3 08    	palignr $0x8,%xmm3,%xmm4
  406120:	0f 29 67 20          	movaps %xmm4,0x20(%rdi)
  406124:	66 0f 3a 0f da 08    	palignr $0x8,%xmm2,%xmm3
  40612a:	0f 29 5f 10          	movaps %xmm3,0x10(%rdi)
  40612e:	66 0f 3a 0f d1 08    	palignr $0x8,%xmm1,%xmm2
  406134:	0f 29 17             	movaps %xmm2,(%rdi)
  406137:	48 8d bf 80 00 00 00 	lea    0x80(%rdi),%rdi
  40613e:	0f 83 6c ff ff ff    	jae    4060b0 &amp;lt;__intel_ssse3_rep_memcpy+0xca0&amp;gt;
  406144:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  406149:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  406150:	48 01 d7             	add    %rdx,%rdi
  406153:	48 01 d6             	add    %rdx,%rsi
  406156:	4c 8d 1d 1b 2f 00 00 	lea    0x2f1b(%rip),%r11        # 409078 &amp;lt;.L_2il0floatpacket.29+0x40c&amp;gt;
  40615d:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  406161:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  406165:	ff e2                	jmpq   *%rdx
  406167:	0f 0b                	ud2    
  406169:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  406170:	0f 28 4e f8          	movaps -0x8(%rsi),%xmm1
  406174:	0f 28 56 e8          	movaps -0x18(%rsi),%xmm2
  406178:	66 0f 3a 0f ca 08    	palignr $0x8,%xmm2,%xmm1
  40617e:	0f 29 4f f0          	movaps %xmm1,-0x10(%rdi)
  406182:	0f 28 5e d8          	movaps -0x28(%rsi),%xmm3
  406186:	66 0f 3a 0f d3 08    	palignr $0x8,%xmm3,%xmm2
  40618c:	0f 29 57 e0          	movaps %xmm2,-0x20(%rdi)
  406190:	0f 28 66 c8          	movaps -0x38(%rsi),%xmm4
  406194:	66 0f 3a 0f dc 08    	palignr $0x8,%xmm4,%xmm3
  40619a:	0f 29 5f d0          	movaps %xmm3,-0x30(%rdi)
  40619e:	0f 28 6e b8          	movaps -0x48(%rsi),%xmm5
  4061a2:	66 0f 3a 0f e5 08    	palignr $0x8,%xmm5,%xmm4
  4061a8:	0f 29 67 c0          	movaps %xmm4,-0x40(%rdi)
  4061ac:	0f 28 76 a8          	movaps -0x58(%rsi),%xmm6
  4061b0:	66 0f 3a 0f ee 08    	palignr $0x8,%xmm6,%xmm5
  4061b6:	0f 29 6f b0          	movaps %xmm5,-0x50(%rdi)
  4061ba:	0f 28 7e 98          	movaps -0x68(%rsi),%xmm7
  4061be:	66 0f 3a 0f f7 08    	palignr $0x8,%xmm7,%xmm6
  4061c4:	0f 29 77 a0          	movaps %xmm6,-0x60(%rdi)
  4061c8:	44 0f 28 46 88       	movaps -0x78(%rsi),%xmm8
  4061cd:	66 41 0f 3a 0f f8 08 	palignr $0x8,%xmm8,%xmm7
  4061d4:	0f 29 7f 90          	movaps %xmm7,-0x70(%rdi)
  4061d8:	44 0f 28 8e 78 ff ff 	movaps -0x88(%rsi),%xmm9
  4061df:	ff 
  4061e0:	66 45 0f 3a 0f c1 08 	palignr $0x8,%xmm9,%xmm8
  4061e7:	44 0f 29 47 80       	movaps %xmm8,-0x80(%rdi)
  4061ec:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  4061f3:	48 8d 7f 80          	lea    -0x80(%rdi),%rdi
  4061f7:	48 8d 76 80          	lea    -0x80(%rsi),%rsi
  4061fb:	0f 83 6f ff ff ff    	jae    406170 &amp;lt;__intel_ssse3_rep_memcpy+0xd60&amp;gt;
  406201:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  406206:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  40620d:	48 29 d7             	sub    %rdx,%rdi
  406210:	48 29 d6             	sub    %rdx,%rsi
  406213:	4c 8d 1d 1e 2c 00 00 	lea    0x2c1e(%rip),%r11        # 408e38 &amp;lt;.L_2il0floatpacket.29+0x1cc&amp;gt;
  40621a:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  40621e:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  406222:	ff e2                	jmpq   *%rdx
  406224:	0f 0b                	ud2    
  406226:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  40622d:	00 00 00 
  406230:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  406237:	0f 28 4e f7          	movaps -0x9(%rsi),%xmm1
  40623b:	0f 28 56 07          	movaps 0x7(%rsi),%xmm2
  40623f:	0f 28 5e 17          	movaps 0x17(%rsi),%xmm3
  406243:	0f 28 66 27          	movaps 0x27(%rsi),%xmm4
  406247:	0f 28 6e 37          	movaps 0x37(%rsi),%xmm5
  40624b:	0f 28 76 47          	movaps 0x47(%rsi),%xmm6
  40624f:	0f 28 7e 57          	movaps 0x57(%rsi),%xmm7
  406253:	44 0f 28 46 67       	movaps 0x67(%rsi),%xmm8
  406258:	44 0f 28 4e 77       	movaps 0x77(%rsi),%xmm9
  40625d:	48 8d b6 80 00 00 00 	lea    0x80(%rsi),%rsi
  406264:	66 45 0f 3a 0f c8 09 	palignr $0x9,%xmm8,%xmm9
  40626b:	44 0f 29 4f 70       	movaps %xmm9,0x70(%rdi)
  406270:	66 44 0f 3a 0f c7 09 	palignr $0x9,%xmm7,%xmm8
  406277:	44 0f 29 47 60       	movaps %xmm8,0x60(%rdi)
  40627c:	66 0f 3a 0f fe 09    	palignr $0x9,%xmm6,%xmm7
  406282:	0f 29 7f 50          	movaps %xmm7,0x50(%rdi)
  406286:	66 0f 3a 0f f5 09    	palignr $0x9,%xmm5,%xmm6
  40628c:	0f 29 77 40          	movaps %xmm6,0x40(%rdi)
  406290:	66 0f 3a 0f ec 09    	palignr $0x9,%xmm4,%xmm5
  406296:	0f 29 6f 30          	movaps %xmm5,0x30(%rdi)
  40629a:	66 0f 3a 0f e3 09    	palignr $0x9,%xmm3,%xmm4
  4062a0:	0f 29 67 20          	movaps %xmm4,0x20(%rdi)
  4062a4:	66 0f 3a 0f da 09    	palignr $0x9,%xmm2,%xmm3
  4062aa:	0f 29 5f 10          	movaps %xmm3,0x10(%rdi)
  4062ae:	66 0f 3a 0f d1 09    	palignr $0x9,%xmm1,%xmm2
  4062b4:	0f 29 17             	movaps %xmm2,(%rdi)
  4062b7:	48 8d bf 80 00 00 00 	lea    0x80(%rdi),%rdi
  4062be:	0f 83 6c ff ff ff    	jae    406230 &amp;lt;__intel_ssse3_rep_memcpy+0xe20&amp;gt;
  4062c4:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  4062c9:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  4062d0:	48 01 d7             	add    %rdx,%rdi
  4062d3:	48 01 d6             	add    %rdx,%rsi
  4062d6:	4c 8d 1d 9b 2d 00 00 	lea    0x2d9b(%rip),%r11        # 409078 &amp;lt;.L_2il0floatpacket.29+0x40c&amp;gt;
  4062dd:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  4062e1:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  4062e5:	ff e2                	jmpq   *%rdx
  4062e7:	0f 0b                	ud2    
  4062e9:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  4062f0:	0f 28 4e f7          	movaps -0x9(%rsi),%xmm1
  4062f4:	0f 28 56 e7          	movaps -0x19(%rsi),%xmm2
  4062f8:	66 0f 3a 0f ca 09    	palignr $0x9,%xmm2,%xmm1
  4062fe:	0f 29 4f f0          	movaps %xmm1,-0x10(%rdi)
  406302:	0f 28 5e d7          	movaps -0x29(%rsi),%xmm3
  406306:	66 0f 3a 0f d3 09    	palignr $0x9,%xmm3,%xmm2
  40630c:	0f 29 57 e0          	movaps %xmm2,-0x20(%rdi)
  406310:	0f 28 66 c7          	movaps -0x39(%rsi),%xmm4
  406314:	66 0f 3a 0f dc 09    	palignr $0x9,%xmm4,%xmm3
  40631a:	0f 29 5f d0          	movaps %xmm3,-0x30(%rdi)
  40631e:	0f 28 6e b7          	movaps -0x49(%rsi),%xmm5
  406322:	66 0f 3a 0f e5 09    	palignr $0x9,%xmm5,%xmm4
  406328:	0f 29 67 c0          	movaps %xmm4,-0x40(%rdi)
  40632c:	0f 28 76 a7          	movaps -0x59(%rsi),%xmm6
  406330:	66 0f 3a 0f ee 09    	palignr $0x9,%xmm6,%xmm5
  406336:	0f 29 6f b0          	movaps %xmm5,-0x50(%rdi)
  40633a:	0f 28 7e 97          	movaps -0x69(%rsi),%xmm7
  40633e:	66 0f 3a 0f f7 09    	palignr $0x9,%xmm7,%xmm6
  406344:	0f 29 77 a0          	movaps %xmm6,-0x60(%rdi)
  406348:	44 0f 28 46 87       	movaps -0x79(%rsi),%xmm8
  40634d:	66 41 0f 3a 0f f8 09 	palignr $0x9,%xmm8,%xmm7
  406354:	0f 29 7f 90          	movaps %xmm7,-0x70(%rdi)
  406358:	44 0f 28 8e 77 ff ff 	movaps -0x89(%rsi),%xmm9
  40635f:	ff 
  406360:	66 45 0f 3a 0f c1 09 	palignr $0x9,%xmm9,%xmm8
  406367:	44 0f 29 47 80       	movaps %xmm8,-0x80(%rdi)
  40636c:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  406373:	48 8d 7f 80          	lea    -0x80(%rdi),%rdi
  406377:	48 8d 76 80          	lea    -0x80(%rsi),%rsi
  40637b:	0f 83 6f ff ff ff    	jae    4062f0 &amp;lt;__intel_ssse3_rep_memcpy+0xee0&amp;gt;
  406381:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  406386:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  40638d:	48 29 d7             	sub    %rdx,%rdi
  406390:	48 29 d6             	sub    %rdx,%rsi
  406393:	4c 8d 1d 9e 2a 00 00 	lea    0x2a9e(%rip),%r11        # 408e38 &amp;lt;.L_2il0floatpacket.29+0x1cc&amp;gt;
  40639a:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  40639e:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  4063a2:	ff e2                	jmpq   *%rdx
  4063a4:	0f 0b                	ud2    
  4063a6:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  4063ad:	00 00 00 
  4063b0:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  4063b7:	0f 28 4e f6          	movaps -0xa(%rsi),%xmm1
  4063bb:	0f 28 56 06          	movaps 0x6(%rsi),%xmm2
  4063bf:	0f 28 5e 16          	movaps 0x16(%rsi),%xmm3
  4063c3:	0f 28 66 26          	movaps 0x26(%rsi),%xmm4
  4063c7:	0f 28 6e 36          	movaps 0x36(%rsi),%xmm5
  4063cb:	0f 28 76 46          	movaps 0x46(%rsi),%xmm6
  4063cf:	0f 28 7e 56          	movaps 0x56(%rsi),%xmm7
  4063d3:	44 0f 28 46 66       	movaps 0x66(%rsi),%xmm8
  4063d8:	44 0f 28 4e 76       	movaps 0x76(%rsi),%xmm9
  4063dd:	48 8d b6 80 00 00 00 	lea    0x80(%rsi),%rsi
  4063e4:	66 45 0f 3a 0f c8 0a 	palignr $0xa,%xmm8,%xmm9
  4063eb:	44 0f 29 4f 70       	movaps %xmm9,0x70(%rdi)
  4063f0:	66 44 0f 3a 0f c7 0a 	palignr $0xa,%xmm7,%xmm8
  4063f7:	44 0f 29 47 60       	movaps %xmm8,0x60(%rdi)
  4063fc:	66 0f 3a 0f fe 0a    	palignr $0xa,%xmm6,%xmm7
  406402:	0f 29 7f 50          	movaps %xmm7,0x50(%rdi)
  406406:	66 0f 3a 0f f5 0a    	palignr $0xa,%xmm5,%xmm6
  40640c:	0f 29 77 40          	movaps %xmm6,0x40(%rdi)
  406410:	66 0f 3a 0f ec 0a    	palignr $0xa,%xmm4,%xmm5
  406416:	0f 29 6f 30          	movaps %xmm5,0x30(%rdi)
  40641a:	66 0f 3a 0f e3 0a    	palignr $0xa,%xmm3,%xmm4
  406420:	0f 29 67 20          	movaps %xmm4,0x20(%rdi)
  406424:	66 0f 3a 0f da 0a    	palignr $0xa,%xmm2,%xmm3
  40642a:	0f 29 5f 10          	movaps %xmm3,0x10(%rdi)
  40642e:	66 0f 3a 0f d1 0a    	palignr $0xa,%xmm1,%xmm2
  406434:	0f 29 17             	movaps %xmm2,(%rdi)
  406437:	48 8d bf 80 00 00 00 	lea    0x80(%rdi),%rdi
  40643e:	0f 83 6c ff ff ff    	jae    4063b0 &amp;lt;__intel_ssse3_rep_memcpy+0xfa0&amp;gt;
  406444:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  406449:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  406450:	48 01 d7             	add    %rdx,%rdi
  406453:	48 01 d6             	add    %rdx,%rsi
  406456:	4c 8d 1d 1b 2c 00 00 	lea    0x2c1b(%rip),%r11        # 409078 &amp;lt;.L_2il0floatpacket.29+0x40c&amp;gt;
  40645d:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  406461:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  406465:	ff e2                	jmpq   *%rdx
  406467:	0f 0b                	ud2    
  406469:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  406470:	0f 28 4e f6          	movaps -0xa(%rsi),%xmm1
  406474:	0f 28 56 e6          	movaps -0x1a(%rsi),%xmm2
  406478:	66 0f 3a 0f ca 0a    	palignr $0xa,%xmm2,%xmm1
  40647e:	0f 29 4f f0          	movaps %xmm1,-0x10(%rdi)
  406482:	0f 28 5e d6          	movaps -0x2a(%rsi),%xmm3
  406486:	66 0f 3a 0f d3 0a    	palignr $0xa,%xmm3,%xmm2
  40648c:	0f 29 57 e0          	movaps %xmm2,-0x20(%rdi)
  406490:	0f 28 66 c6          	movaps -0x3a(%rsi),%xmm4
  406494:	66 0f 3a 0f dc 0a    	palignr $0xa,%xmm4,%xmm3
  40649a:	0f 29 5f d0          	movaps %xmm3,-0x30(%rdi)
  40649e:	0f 28 6e b6          	movaps -0x4a(%rsi),%xmm5
  4064a2:	66 0f 3a 0f e5 0a    	palignr $0xa,%xmm5,%xmm4
  4064a8:	0f 29 67 c0          	movaps %xmm4,-0x40(%rdi)
  4064ac:	0f 28 76 a6          	movaps -0x5a(%rsi),%xmm6
  4064b0:	66 0f 3a 0f ee 0a    	palignr $0xa,%xmm6,%xmm5
  4064b6:	0f 29 6f b0          	movaps %xmm5,-0x50(%rdi)
  4064ba:	0f 28 7e 96          	movaps -0x6a(%rsi),%xmm7
  4064be:	66 0f 3a 0f f7 0a    	palignr $0xa,%xmm7,%xmm6
  4064c4:	0f 29 77 a0          	movaps %xmm6,-0x60(%rdi)
  4064c8:	44 0f 28 46 86       	movaps -0x7a(%rsi),%xmm8
  4064cd:	66 41 0f 3a 0f f8 0a 	palignr $0xa,%xmm8,%xmm7
  4064d4:	0f 29 7f 90          	movaps %xmm7,-0x70(%rdi)
  4064d8:	44 0f 28 8e 76 ff ff 	movaps -0x8a(%rsi),%xmm9
  4064df:	ff 
  4064e0:	66 45 0f 3a 0f c1 0a 	palignr $0xa,%xmm9,%xmm8
  4064e7:	44 0f 29 47 80       	movaps %xmm8,-0x80(%rdi)
  4064ec:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  4064f3:	48 8d 7f 80          	lea    -0x80(%rdi),%rdi
  4064f7:	48 8d 76 80          	lea    -0x80(%rsi),%rsi
  4064fb:	0f 83 6f ff ff ff    	jae    406470 &amp;lt;__intel_ssse3_rep_memcpy+0x1060&amp;gt;
  406501:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  406506:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  40650d:	48 29 d7             	sub    %rdx,%rdi
  406510:	48 29 d6             	sub    %rdx,%rsi
  406513:	4c 8d 1d 1e 29 00 00 	lea    0x291e(%rip),%r11        # 408e38 &amp;lt;.L_2il0floatpacket.29+0x1cc&amp;gt;
  40651a:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  40651e:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  406522:	ff e2                	jmpq   *%rdx
  406524:	0f 0b                	ud2    
  406526:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  40652d:	00 00 00 
  406530:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  406537:	0f 28 4e f5          	movaps -0xb(%rsi),%xmm1
  40653b:	0f 28 56 05          	movaps 0x5(%rsi),%xmm2
  40653f:	0f 28 5e 15          	movaps 0x15(%rsi),%xmm3
  406543:	0f 28 66 25          	movaps 0x25(%rsi),%xmm4
  406547:	0f 28 6e 35          	movaps 0x35(%rsi),%xmm5
  40654b:	0f 28 76 45          	movaps 0x45(%rsi),%xmm6
  40654f:	0f 28 7e 55          	movaps 0x55(%rsi),%xmm7
  406553:	44 0f 28 46 65       	movaps 0x65(%rsi),%xmm8
  406558:	44 0f 28 4e 75       	movaps 0x75(%rsi),%xmm9
  40655d:	48 8d b6 80 00 00 00 	lea    0x80(%rsi),%rsi
  406564:	66 45 0f 3a 0f c8 0b 	palignr $0xb,%xmm8,%xmm9
  40656b:	44 0f 29 4f 70       	movaps %xmm9,0x70(%rdi)
  406570:	66 44 0f 3a 0f c7 0b 	palignr $0xb,%xmm7,%xmm8
  406577:	44 0f 29 47 60       	movaps %xmm8,0x60(%rdi)
  40657c:	66 0f 3a 0f fe 0b    	palignr $0xb,%xmm6,%xmm7
  406582:	0f 29 7f 50          	movaps %xmm7,0x50(%rdi)
  406586:	66 0f 3a 0f f5 0b    	palignr $0xb,%xmm5,%xmm6
  40658c:	0f 29 77 40          	movaps %xmm6,0x40(%rdi)
  406590:	66 0f 3a 0f ec 0b    	palignr $0xb,%xmm4,%xmm5
  406596:	0f 29 6f 30          	movaps %xmm5,0x30(%rdi)
  40659a:	66 0f 3a 0f e3 0b    	palignr $0xb,%xmm3,%xmm4
  4065a0:	0f 29 67 20          	movaps %xmm4,0x20(%rdi)
  4065a4:	66 0f 3a 0f da 0b    	palignr $0xb,%xmm2,%xmm3
  4065aa:	0f 29 5f 10          	movaps %xmm3,0x10(%rdi)
  4065ae:	66 0f 3a 0f d1 0b    	palignr $0xb,%xmm1,%xmm2
  4065b4:	0f 29 17             	movaps %xmm2,(%rdi)
  4065b7:	48 8d bf 80 00 00 00 	lea    0x80(%rdi),%rdi
  4065be:	0f 83 6c ff ff ff    	jae    406530 &amp;lt;__intel_ssse3_rep_memcpy+0x1120&amp;gt;
  4065c4:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  4065c9:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  4065d0:	48 01 d7             	add    %rdx,%rdi
  4065d3:	48 01 d6             	add    %rdx,%rsi
  4065d6:	4c 8d 1d 9b 2a 00 00 	lea    0x2a9b(%rip),%r11        # 409078 &amp;lt;.L_2il0floatpacket.29+0x40c&amp;gt;
  4065dd:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  4065e1:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  4065e5:	ff e2                	jmpq   *%rdx
  4065e7:	0f 0b                	ud2    
  4065e9:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  4065f0:	0f 28 4e f5          	movaps -0xb(%rsi),%xmm1
  4065f4:	0f 28 56 e5          	movaps -0x1b(%rsi),%xmm2
  4065f8:	66 0f 3a 0f ca 0b    	palignr $0xb,%xmm2,%xmm1
  4065fe:	0f 29 4f f0          	movaps %xmm1,-0x10(%rdi)
  406602:	0f 28 5e d5          	movaps -0x2b(%rsi),%xmm3
  406606:	66 0f 3a 0f d3 0b    	palignr $0xb,%xmm3,%xmm2
  40660c:	0f 29 57 e0          	movaps %xmm2,-0x20(%rdi)
  406610:	0f 28 66 c5          	movaps -0x3b(%rsi),%xmm4
  406614:	66 0f 3a 0f dc 0b    	palignr $0xb,%xmm4,%xmm3
  40661a:	0f 29 5f d0          	movaps %xmm3,-0x30(%rdi)
  40661e:	0f 28 6e b5          	movaps -0x4b(%rsi),%xmm5
  406622:	66 0f 3a 0f e5 0b    	palignr $0xb,%xmm5,%xmm4
  406628:	0f 29 67 c0          	movaps %xmm4,-0x40(%rdi)
  40662c:	0f 28 76 a5          	movaps -0x5b(%rsi),%xmm6
  406630:	66 0f 3a 0f ee 0b    	palignr $0xb,%xmm6,%xmm5
  406636:	0f 29 6f b0          	movaps %xmm5,-0x50(%rdi)
  40663a:	0f 28 7e 95          	movaps -0x6b(%rsi),%xmm7
  40663e:	66 0f 3a 0f f7 0b    	palignr $0xb,%xmm7,%xmm6
  406644:	0f 29 77 a0          	movaps %xmm6,-0x60(%rdi)
  406648:	44 0f 28 46 85       	movaps -0x7b(%rsi),%xmm8
  40664d:	66 41 0f 3a 0f f8 0b 	palignr $0xb,%xmm8,%xmm7
  406654:	0f 29 7f 90          	movaps %xmm7,-0x70(%rdi)
  406658:	44 0f 28 8e 75 ff ff 	movaps -0x8b(%rsi),%xmm9
  40665f:	ff 
  406660:	66 45 0f 3a 0f c1 0b 	palignr $0xb,%xmm9,%xmm8
  406667:	44 0f 29 47 80       	movaps %xmm8,-0x80(%rdi)
  40666c:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  406673:	48 8d 7f 80          	lea    -0x80(%rdi),%rdi
  406677:	48 8d 76 80          	lea    -0x80(%rsi),%rsi
  40667b:	0f 83 6f ff ff ff    	jae    4065f0 &amp;lt;__intel_ssse3_rep_memcpy+0x11e0&amp;gt;
  406681:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  406686:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  40668d:	48 29 d7             	sub    %rdx,%rdi
  406690:	48 29 d6             	sub    %rdx,%rsi
  406693:	4c 8d 1d 9e 27 00 00 	lea    0x279e(%rip),%r11        # 408e38 &amp;lt;.L_2il0floatpacket.29+0x1cc&amp;gt;
  40669a:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  40669e:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  4066a2:	ff e2                	jmpq   *%rdx
  4066a4:	0f 0b                	ud2    
  4066a6:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  4066ad:	00 00 00 
  4066b0:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  4066b7:	66 0f 6f 4e f4       	movdqa -0xc(%rsi),%xmm1
  4066bc:	0f 28 56 04          	movaps 0x4(%rsi),%xmm2
  4066c0:	0f 28 5e 14          	movaps 0x14(%rsi),%xmm3
  4066c4:	0f 28 66 24          	movaps 0x24(%rsi),%xmm4
  4066c8:	0f 28 6e 34          	movaps 0x34(%rsi),%xmm5
  4066cc:	0f 28 76 44          	movaps 0x44(%rsi),%xmm6
  4066d0:	0f 28 7e 54          	movaps 0x54(%rsi),%xmm7
  4066d4:	44 0f 28 46 64       	movaps 0x64(%rsi),%xmm8
  4066d9:	44 0f 28 4e 74       	movaps 0x74(%rsi),%xmm9
  4066de:	48 8d b6 80 00 00 00 	lea    0x80(%rsi),%rsi
  4066e5:	66 45 0f 3a 0f c8 0c 	palignr $0xc,%xmm8,%xmm9
  4066ec:	44 0f 29 4f 70       	movaps %xmm9,0x70(%rdi)
  4066f1:	66 44 0f 3a 0f c7 0c 	palignr $0xc,%xmm7,%xmm8
  4066f8:	44 0f 29 47 60       	movaps %xmm8,0x60(%rdi)
  4066fd:	66 0f 3a 0f fe 0c    	palignr $0xc,%xmm6,%xmm7
  406703:	0f 29 7f 50          	movaps %xmm7,0x50(%rdi)
  406707:	66 0f 3a 0f f5 0c    	palignr $0xc,%xmm5,%xmm6
  40670d:	0f 29 77 40          	movaps %xmm6,0x40(%rdi)
  406711:	66 0f 3a 0f ec 0c    	palignr $0xc,%xmm4,%xmm5
  406717:	0f 29 6f 30          	movaps %xmm5,0x30(%rdi)
  40671b:	66 0f 3a 0f e3 0c    	palignr $0xc,%xmm3,%xmm4
  406721:	0f 29 67 20          	movaps %xmm4,0x20(%rdi)
  406725:	66 0f 3a 0f da 0c    	palignr $0xc,%xmm2,%xmm3
  40672b:	0f 29 5f 10          	movaps %xmm3,0x10(%rdi)
  40672f:	66 0f 3a 0f d1 0c    	palignr $0xc,%xmm1,%xmm2
  406735:	0f 29 17             	movaps %xmm2,(%rdi)
  406738:	48 8d bf 80 00 00 00 	lea    0x80(%rdi),%rdi
  40673f:	0f 83 6b ff ff ff    	jae    4066b0 &amp;lt;__intel_ssse3_rep_memcpy+0x12a0&amp;gt;
  406745:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  40674a:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  406751:	48 01 d7             	add    %rdx,%rdi
  406754:	48 01 d6             	add    %rdx,%rsi
  406757:	4c 8d 1d 1a 29 00 00 	lea    0x291a(%rip),%r11        # 409078 &amp;lt;.L_2il0floatpacket.29+0x40c&amp;gt;
  40675e:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  406762:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  406766:	ff e2                	jmpq   *%rdx
  406768:	0f 0b                	ud2    
  40676a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
  406770:	0f 28 4e f4          	movaps -0xc(%rsi),%xmm1
  406774:	0f 28 56 e4          	movaps -0x1c(%rsi),%xmm2
  406778:	66 0f 3a 0f ca 0c    	palignr $0xc,%xmm2,%xmm1
  40677e:	0f 29 4f f0          	movaps %xmm1,-0x10(%rdi)
  406782:	0f 28 5e d4          	movaps -0x2c(%rsi),%xmm3
  406786:	66 0f 3a 0f d3 0c    	palignr $0xc,%xmm3,%xmm2
  40678c:	0f 29 57 e0          	movaps %xmm2,-0x20(%rdi)
  406790:	0f 28 66 c4          	movaps -0x3c(%rsi),%xmm4
  406794:	66 0f 3a 0f dc 0c    	palignr $0xc,%xmm4,%xmm3
  40679a:	0f 29 5f d0          	movaps %xmm3,-0x30(%rdi)
  40679e:	0f 28 6e b4          	movaps -0x4c(%rsi),%xmm5
  4067a2:	66 0f 3a 0f e5 0c    	palignr $0xc,%xmm5,%xmm4
  4067a8:	0f 29 67 c0          	movaps %xmm4,-0x40(%rdi)
  4067ac:	0f 28 76 a4          	movaps -0x5c(%rsi),%xmm6
  4067b0:	66 0f 3a 0f ee 0c    	palignr $0xc,%xmm6,%xmm5
  4067b6:	0f 29 6f b0          	movaps %xmm5,-0x50(%rdi)
  4067ba:	0f 28 7e 94          	movaps -0x6c(%rsi),%xmm7
  4067be:	66 0f 3a 0f f7 0c    	palignr $0xc,%xmm7,%xmm6
  4067c4:	0f 29 77 a0          	movaps %xmm6,-0x60(%rdi)
  4067c8:	44 0f 28 46 84       	movaps -0x7c(%rsi),%xmm8
  4067cd:	66 41 0f 3a 0f f8 0c 	palignr $0xc,%xmm8,%xmm7
  4067d4:	0f 29 7f 90          	movaps %xmm7,-0x70(%rdi)
  4067d8:	44 0f 28 8e 74 ff ff 	movaps -0x8c(%rsi),%xmm9
  4067df:	ff 
  4067e0:	66 45 0f 3a 0f c1 0c 	palignr $0xc,%xmm9,%xmm8
  4067e7:	44 0f 29 47 80       	movaps %xmm8,-0x80(%rdi)
  4067ec:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  4067f3:	48 8d 7f 80          	lea    -0x80(%rdi),%rdi
  4067f7:	48 8d 76 80          	lea    -0x80(%rsi),%rsi
  4067fb:	0f 83 6f ff ff ff    	jae    406770 &amp;lt;__intel_ssse3_rep_memcpy+0x1360&amp;gt;
  406801:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  406806:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  40680d:	48 29 d7             	sub    %rdx,%rdi
  406810:	48 29 d6             	sub    %rdx,%rsi
  406813:	4c 8d 1d 1e 26 00 00 	lea    0x261e(%rip),%r11        # 408e38 &amp;lt;.L_2il0floatpacket.29+0x1cc&amp;gt;
  40681a:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  40681e:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  406822:	ff e2                	jmpq   *%rdx
  406824:	0f 0b                	ud2    
  406826:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  40682d:	00 00 00 
  406830:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  406837:	0f 28 4e f3          	movaps -0xd(%rsi),%xmm1
  40683b:	0f 28 56 03          	movaps 0x3(%rsi),%xmm2
  40683f:	0f 28 5e 13          	movaps 0x13(%rsi),%xmm3
  406843:	0f 28 66 23          	movaps 0x23(%rsi),%xmm4
  406847:	0f 28 6e 33          	movaps 0x33(%rsi),%xmm5
  40684b:	0f 28 76 43          	movaps 0x43(%rsi),%xmm6
  40684f:	0f 28 7e 53          	movaps 0x53(%rsi),%xmm7
  406853:	44 0f 28 46 63       	movaps 0x63(%rsi),%xmm8
  406858:	44 0f 28 4e 73       	movaps 0x73(%rsi),%xmm9
  40685d:	48 8d b6 80 00 00 00 	lea    0x80(%rsi),%rsi
  406864:	66 45 0f 3a 0f c8 0d 	palignr $0xd,%xmm8,%xmm9
  40686b:	44 0f 29 4f 70       	movaps %xmm9,0x70(%rdi)
  406870:	66 44 0f 3a 0f c7 0d 	palignr $0xd,%xmm7,%xmm8
  406877:	44 0f 29 47 60       	movaps %xmm8,0x60(%rdi)
  40687c:	66 0f 3a 0f fe 0d    	palignr $0xd,%xmm6,%xmm7
  406882:	0f 29 7f 50          	movaps %xmm7,0x50(%rdi)
  406886:	66 0f 3a 0f f5 0d    	palignr $0xd,%xmm5,%xmm6
  40688c:	0f 29 77 40          	movaps %xmm6,0x40(%rdi)
  406890:	66 0f 3a 0f ec 0d    	palignr $0xd,%xmm4,%xmm5
  406896:	0f 29 6f 30          	movaps %xmm5,0x30(%rdi)
  40689a:	66 0f 3a 0f e3 0d    	palignr $0xd,%xmm3,%xmm4
  4068a0:	0f 29 67 20          	movaps %xmm4,0x20(%rdi)
  4068a4:	66 0f 3a 0f da 0d    	palignr $0xd,%xmm2,%xmm3
  4068aa:	0f 29 5f 10          	movaps %xmm3,0x10(%rdi)
  4068ae:	66 0f 3a 0f d1 0d    	palignr $0xd,%xmm1,%xmm2
  4068b4:	0f 29 17             	movaps %xmm2,(%rdi)
  4068b7:	48 8d bf 80 00 00 00 	lea    0x80(%rdi),%rdi
  4068be:	0f 83 6c ff ff ff    	jae    406830 &amp;lt;__intel_ssse3_rep_memcpy+0x1420&amp;gt;
  4068c4:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  4068c9:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  4068d0:	48 01 d7             	add    %rdx,%rdi
  4068d3:	48 01 d6             	add    %rdx,%rsi
  4068d6:	4c 8d 1d 9b 27 00 00 	lea    0x279b(%rip),%r11        # 409078 &amp;lt;.L_2il0floatpacket.29+0x40c&amp;gt;
  4068dd:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  4068e1:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  4068e5:	ff e2                	jmpq   *%rdx
  4068e7:	0f 0b                	ud2    
  4068e9:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  4068f0:	0f 28 4e f3          	movaps -0xd(%rsi),%xmm1
  4068f4:	0f 28 56 e3          	movaps -0x1d(%rsi),%xmm2
  4068f8:	66 0f 3a 0f ca 0d    	palignr $0xd,%xmm2,%xmm1
  4068fe:	0f 29 4f f0          	movaps %xmm1,-0x10(%rdi)
  406902:	0f 28 5e d3          	movaps -0x2d(%rsi),%xmm3
  406906:	66 0f 3a 0f d3 0d    	palignr $0xd,%xmm3,%xmm2
  40690c:	0f 29 57 e0          	movaps %xmm2,-0x20(%rdi)
  406910:	0f 28 66 c3          	movaps -0x3d(%rsi),%xmm4
  406914:	66 0f 3a 0f dc 0d    	palignr $0xd,%xmm4,%xmm3
  40691a:	0f 29 5f d0          	movaps %xmm3,-0x30(%rdi)
  40691e:	0f 28 6e b3          	movaps -0x4d(%rsi),%xmm5
  406922:	66 0f 3a 0f e5 0d    	palignr $0xd,%xmm5,%xmm4
  406928:	0f 29 67 c0          	movaps %xmm4,-0x40(%rdi)
  40692c:	0f 28 76 a3          	movaps -0x5d(%rsi),%xmm6
  406930:	66 0f 3a 0f ee 0d    	palignr $0xd,%xmm6,%xmm5
  406936:	0f 29 6f b0          	movaps %xmm5,-0x50(%rdi)
  40693a:	0f 28 7e 93          	movaps -0x6d(%rsi),%xmm7
  40693e:	66 0f 3a 0f f7 0d    	palignr $0xd,%xmm7,%xmm6
  406944:	0f 29 77 a0          	movaps %xmm6,-0x60(%rdi)
  406948:	44 0f 28 46 83       	movaps -0x7d(%rsi),%xmm8
  40694d:	66 41 0f 3a 0f f8 0d 	palignr $0xd,%xmm8,%xmm7
  406954:	0f 29 7f 90          	movaps %xmm7,-0x70(%rdi)
  406958:	44 0f 28 8e 73 ff ff 	movaps -0x8d(%rsi),%xmm9
  40695f:	ff 
  406960:	66 45 0f 3a 0f c1 0d 	palignr $0xd,%xmm9,%xmm8
  406967:	44 0f 29 47 80       	movaps %xmm8,-0x80(%rdi)
  40696c:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  406973:	48 8d 7f 80          	lea    -0x80(%rdi),%rdi
  406977:	48 8d 76 80          	lea    -0x80(%rsi),%rsi
  40697b:	0f 83 6f ff ff ff    	jae    4068f0 &amp;lt;__intel_ssse3_rep_memcpy+0x14e0&amp;gt;
  406981:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  406986:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  40698d:	48 29 d7             	sub    %rdx,%rdi
  406990:	48 29 d6             	sub    %rdx,%rsi
  406993:	4c 8d 1d 9e 24 00 00 	lea    0x249e(%rip),%r11        # 408e38 &amp;lt;.L_2il0floatpacket.29+0x1cc&amp;gt;
  40699a:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  40699e:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  4069a2:	ff e2                	jmpq   *%rdx
  4069a4:	0f 0b                	ud2    
  4069a6:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  4069ad:	00 00 00 
  4069b0:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  4069b7:	0f 28 4e f2          	movaps -0xe(%rsi),%xmm1
  4069bb:	0f 28 56 02          	movaps 0x2(%rsi),%xmm2
  4069bf:	0f 28 5e 12          	movaps 0x12(%rsi),%xmm3
  4069c3:	0f 28 66 22          	movaps 0x22(%rsi),%xmm4
  4069c7:	0f 28 6e 32          	movaps 0x32(%rsi),%xmm5
  4069cb:	0f 28 76 42          	movaps 0x42(%rsi),%xmm6
  4069cf:	0f 28 7e 52          	movaps 0x52(%rsi),%xmm7
  4069d3:	44 0f 28 46 62       	movaps 0x62(%rsi),%xmm8
  4069d8:	44 0f 28 4e 72       	movaps 0x72(%rsi),%xmm9
  4069dd:	48 8d b6 80 00 00 00 	lea    0x80(%rsi),%rsi
  4069e4:	66 45 0f 3a 0f c8 0e 	palignr $0xe,%xmm8,%xmm9
  4069eb:	44 0f 29 4f 70       	movaps %xmm9,0x70(%rdi)
  4069f0:	66 44 0f 3a 0f c7 0e 	palignr $0xe,%xmm7,%xmm8
  4069f7:	44 0f 29 47 60       	movaps %xmm8,0x60(%rdi)
  4069fc:	66 0f 3a 0f fe 0e    	palignr $0xe,%xmm6,%xmm7
  406a02:	0f 29 7f 50          	movaps %xmm7,0x50(%rdi)
  406a06:	66 0f 3a 0f f5 0e    	palignr $0xe,%xmm5,%xmm6
  406a0c:	0f 29 77 40          	movaps %xmm6,0x40(%rdi)
  406a10:	66 0f 3a 0f ec 0e    	palignr $0xe,%xmm4,%xmm5
  406a16:	0f 29 6f 30          	movaps %xmm5,0x30(%rdi)
  406a1a:	66 0f 3a 0f e3 0e    	palignr $0xe,%xmm3,%xmm4
  406a20:	0f 29 67 20          	movaps %xmm4,0x20(%rdi)
  406a24:	66 0f 3a 0f da 0e    	palignr $0xe,%xmm2,%xmm3
  406a2a:	0f 29 5f 10          	movaps %xmm3,0x10(%rdi)
  406a2e:	66 0f 3a 0f d1 0e    	palignr $0xe,%xmm1,%xmm2
  406a34:	0f 29 17             	movaps %xmm2,(%rdi)
  406a37:	48 8d bf 80 00 00 00 	lea    0x80(%rdi),%rdi
  406a3e:	0f 83 6c ff ff ff    	jae    4069b0 &amp;lt;__intel_ssse3_rep_memcpy+0x15a0&amp;gt;
  406a44:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  406a49:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  406a50:	48 01 d7             	add    %rdx,%rdi
  406a53:	48 01 d6             	add    %rdx,%rsi
  406a56:	4c 8d 1d 1b 26 00 00 	lea    0x261b(%rip),%r11        # 409078 &amp;lt;.L_2il0floatpacket.29+0x40c&amp;gt;
  406a5d:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  406a61:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  406a65:	ff e2                	jmpq   *%rdx
  406a67:	0f 0b                	ud2    
  406a69:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  406a70:	0f 28 4e f2          	movaps -0xe(%rsi),%xmm1
  406a74:	0f 28 56 e2          	movaps -0x1e(%rsi),%xmm2
  406a78:	66 0f 3a 0f ca 0e    	palignr $0xe,%xmm2,%xmm1
  406a7e:	0f 29 4f f0          	movaps %xmm1,-0x10(%rdi)
  406a82:	0f 28 5e d2          	movaps -0x2e(%rsi),%xmm3
  406a86:	66 0f 3a 0f d3 0e    	palignr $0xe,%xmm3,%xmm2
  406a8c:	0f 29 57 e0          	movaps %xmm2,-0x20(%rdi)
  406a90:	0f 28 66 c2          	movaps -0x3e(%rsi),%xmm4
  406a94:	66 0f 3a 0f dc 0e    	palignr $0xe,%xmm4,%xmm3
  406a9a:	0f 29 5f d0          	movaps %xmm3,-0x30(%rdi)
  406a9e:	0f 28 6e b2          	movaps -0x4e(%rsi),%xmm5
  406aa2:	66 0f 3a 0f e5 0e    	palignr $0xe,%xmm5,%xmm4
  406aa8:	0f 29 67 c0          	movaps %xmm4,-0x40(%rdi)
  406aac:	0f 28 76 a2          	movaps -0x5e(%rsi),%xmm6
  406ab0:	66 0f 3a 0f ee 0e    	palignr $0xe,%xmm6,%xmm5
  406ab6:	0f 29 6f b0          	movaps %xmm5,-0x50(%rdi)
  406aba:	0f 28 7e 92          	movaps -0x6e(%rsi),%xmm7
  406abe:	66 0f 3a 0f f7 0e    	palignr $0xe,%xmm7,%xmm6
  406ac4:	0f 29 77 a0          	movaps %xmm6,-0x60(%rdi)
  406ac8:	44 0f 28 46 82       	movaps -0x7e(%rsi),%xmm8
  406acd:	66 41 0f 3a 0f f8 0e 	palignr $0xe,%xmm8,%xmm7
  406ad4:	0f 29 7f 90          	movaps %xmm7,-0x70(%rdi)
  406ad8:	44 0f 28 8e 72 ff ff 	movaps -0x8e(%rsi),%xmm9
  406adf:	ff 
  406ae0:	66 45 0f 3a 0f c1 0e 	palignr $0xe,%xmm9,%xmm8
  406ae7:	44 0f 29 47 80       	movaps %xmm8,-0x80(%rdi)
  406aec:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  406af3:	48 8d 7f 80          	lea    -0x80(%rdi),%rdi
  406af7:	48 8d 76 80          	lea    -0x80(%rsi),%rsi
  406afb:	0f 83 6f ff ff ff    	jae    406a70 &amp;lt;__intel_ssse3_rep_memcpy+0x1660&amp;gt;
  406b01:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  406b06:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  406b0d:	48 29 d7             	sub    %rdx,%rdi
  406b10:	48 29 d6             	sub    %rdx,%rsi
  406b13:	4c 8d 1d 1e 23 00 00 	lea    0x231e(%rip),%r11        # 408e38 &amp;lt;.L_2il0floatpacket.29+0x1cc&amp;gt;
  406b1a:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  406b1e:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  406b22:	ff e2                	jmpq   *%rdx
  406b24:	0f 0b                	ud2    
  406b26:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  406b2d:	00 00 00 
  406b30:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  406b37:	0f 28 4e f1          	movaps -0xf(%rsi),%xmm1
  406b3b:	0f 28 56 01          	movaps 0x1(%rsi),%xmm2
  406b3f:	0f 28 5e 11          	movaps 0x11(%rsi),%xmm3
  406b43:	0f 28 66 21          	movaps 0x21(%rsi),%xmm4
  406b47:	0f 28 6e 31          	movaps 0x31(%rsi),%xmm5
  406b4b:	0f 28 76 41          	movaps 0x41(%rsi),%xmm6
  406b4f:	0f 28 7e 51          	movaps 0x51(%rsi),%xmm7
  406b53:	44 0f 28 46 61       	movaps 0x61(%rsi),%xmm8
  406b58:	44 0f 28 4e 71       	movaps 0x71(%rsi),%xmm9
  406b5d:	48 8d b6 80 00 00 00 	lea    0x80(%rsi),%rsi
  406b64:	66 45 0f 3a 0f c8 0f 	palignr $0xf,%xmm8,%xmm9
  406b6b:	44 0f 29 4f 70       	movaps %xmm9,0x70(%rdi)
  406b70:	66 44 0f 3a 0f c7 0f 	palignr $0xf,%xmm7,%xmm8
  406b77:	44 0f 29 47 60       	movaps %xmm8,0x60(%rdi)
  406b7c:	66 0f 3a 0f fe 0f    	palignr $0xf,%xmm6,%xmm7
  406b82:	0f 29 7f 50          	movaps %xmm7,0x50(%rdi)
  406b86:	66 0f 3a 0f f5 0f    	palignr $0xf,%xmm5,%xmm6
  406b8c:	0f 29 77 40          	movaps %xmm6,0x40(%rdi)
  406b90:	66 0f 3a 0f ec 0f    	palignr $0xf,%xmm4,%xmm5
  406b96:	0f 29 6f 30          	movaps %xmm5,0x30(%rdi)
  406b9a:	66 0f 3a 0f e3 0f    	palignr $0xf,%xmm3,%xmm4
  406ba0:	0f 29 67 20          	movaps %xmm4,0x20(%rdi)
  406ba4:	66 0f 3a 0f da 0f    	palignr $0xf,%xmm2,%xmm3
  406baa:	0f 29 5f 10          	movaps %xmm3,0x10(%rdi)
  406bae:	66 0f 3a 0f d1 0f    	palignr $0xf,%xmm1,%xmm2
  406bb4:	0f 29 17             	movaps %xmm2,(%rdi)
  406bb7:	48 8d bf 80 00 00 00 	lea    0x80(%rdi),%rdi
  406bbe:	0f 83 6c ff ff ff    	jae    406b30 &amp;lt;__intel_ssse3_rep_memcpy+0x1720&amp;gt;
  406bc4:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  406bc9:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  406bd0:	48 01 d7             	add    %rdx,%rdi
  406bd3:	48 01 d6             	add    %rdx,%rsi
  406bd6:	4c 8d 1d 9b 24 00 00 	lea    0x249b(%rip),%r11        # 409078 &amp;lt;.L_2il0floatpacket.29+0x40c&amp;gt;
  406bdd:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  406be1:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  406be5:	ff e2                	jmpq   *%rdx
  406be7:	0f 0b                	ud2    
  406be9:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  406bf0:	0f 28 4e f1          	movaps -0xf(%rsi),%xmm1
  406bf4:	0f 28 56 e1          	movaps -0x1f(%rsi),%xmm2
  406bf8:	66 0f 3a 0f ca 0f    	palignr $0xf,%xmm2,%xmm1
  406bfe:	0f 29 4f f0          	movaps %xmm1,-0x10(%rdi)
  406c02:	0f 28 5e d1          	movaps -0x2f(%rsi),%xmm3
  406c06:	66 0f 3a 0f d3 0f    	palignr $0xf,%xmm3,%xmm2
  406c0c:	0f 29 57 e0          	movaps %xmm2,-0x20(%rdi)
  406c10:	0f 28 66 c1          	movaps -0x3f(%rsi),%xmm4
  406c14:	66 0f 3a 0f dc 0f    	palignr $0xf,%xmm4,%xmm3
  406c1a:	0f 29 5f d0          	movaps %xmm3,-0x30(%rdi)
  406c1e:	0f 28 6e b1          	movaps -0x4f(%rsi),%xmm5
  406c22:	66 0f 3a 0f e5 0f    	palignr $0xf,%xmm5,%xmm4
  406c28:	0f 29 67 c0          	movaps %xmm4,-0x40(%rdi)
  406c2c:	0f 28 76 a1          	movaps -0x5f(%rsi),%xmm6
  406c30:	66 0f 3a 0f ee 0f    	palignr $0xf,%xmm6,%xmm5
  406c36:	0f 29 6f b0          	movaps %xmm5,-0x50(%rdi)
  406c3a:	0f 28 7e 91          	movaps -0x6f(%rsi),%xmm7
  406c3e:	66 0f 3a 0f f7 0f    	palignr $0xf,%xmm7,%xmm6
  406c44:	0f 29 77 a0          	movaps %xmm6,-0x60(%rdi)
  406c48:	44 0f 28 46 81       	movaps -0x7f(%rsi),%xmm8
  406c4d:	66 41 0f 3a 0f f8 0f 	palignr $0xf,%xmm8,%xmm7
  406c54:	0f 29 7f 90          	movaps %xmm7,-0x70(%rdi)
  406c58:	44 0f 28 8e 71 ff ff 	movaps -0x8f(%rsi),%xmm9
  406c5f:	ff 
  406c60:	66 45 0f 3a 0f c1 0f 	palignr $0xf,%xmm9,%xmm8
  406c67:	44 0f 29 47 80       	movaps %xmm8,-0x80(%rdi)
  406c6c:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  406c73:	48 8d 7f 80          	lea    -0x80(%rdi),%rdi
  406c77:	48 8d 76 80          	lea    -0x80(%rsi),%rsi
  406c7b:	0f 83 6f ff ff ff    	jae    406bf0 &amp;lt;__intel_ssse3_rep_memcpy+0x17e0&amp;gt;
  406c81:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  406c86:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  406c8d:	48 29 d7             	sub    %rdx,%rdi
  406c90:	48 29 d6             	sub    %rdx,%rsi
  406c93:	4c 8d 1d 9e 21 00 00 	lea    0x219e(%rip),%r11        # 408e38 &amp;lt;.L_2il0floatpacket.29+0x1cc&amp;gt;
  406c9a:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  406c9e:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  406ca2:	ff e2                	jmpq   *%rdx
  406ca4:	0f 0b                	ud2    
  406ca6:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  406cad:	00 00 00 
  406cb0:	f3 0f 6f 0e          	movdqu (%rsi),%xmm1
  406cb4:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  406cb9:	66 0f 7f 0f          	movdqa %xmm1,(%rdi)
  406cbd:	48 83 ea 10          	sub    $0x10,%rdx
  406cc1:	48 83 c6 10          	add    $0x10,%rsi
  406cc5:	48 83 c7 10          	add    $0x10,%rdi
  406cc9:	8b 0d 35 3e 20 00    	mov    0x203e35(%rip),%ecx        # 60ab04 &amp;lt;__libirc_largest_cache_size_half&amp;gt;
  406ccf:	48 39 ca             	cmp    %rcx,%rdx
  406cd2:	77 03                	ja     406cd7 &amp;lt;__intel_ssse3_rep_memcpy+0x18c7&amp;gt;
  406cd4:	48 89 d1             	mov    %rdx,%rcx
  406cd7:	48 29 ca             	sub    %rcx,%rdx
  406cda:	48 81 fa 00 10 00 00 	cmp    $0x1000,%rdx
  406ce1:	0f 86 a6 00 00 00    	jbe    406d8d &amp;lt;__intel_ssse3_rep_memcpy+0x197d&amp;gt;
  406ce7:	49 89 c9             	mov    %rcx,%r9
  406cea:	49 c1 e1 03          	shl    $0x3,%r9
  406cee:	4c 39 ca             	cmp    %r9,%rdx
  406cf1:	76 06                	jbe    406cf9 &amp;lt;__intel_ssse3_rep_memcpy+0x18e9&amp;gt;
  406cf3:	48 01 ca             	add    %rcx,%rdx
  406cf6:	48 31 c9             	xor    %rcx,%rcx
  406cf9:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  406d00:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  406d07:	0f 18 8e 00 02 00 00 	prefetcht0 0x200(%rsi)
  406d0e:	0f 18 8e 00 03 00 00 	prefetcht0 0x300(%rsi)
  406d15:	f3 0f 6f 06          	movdqu (%rsi),%xmm0
  406d19:	f3 0f 6f 4e 10       	movdqu 0x10(%rsi),%xmm1
  406d1e:	f3 0f 6f 56 20       	movdqu 0x20(%rsi),%xmm2
  406d23:	f3 0f 6f 5e 30       	movdqu 0x30(%rsi),%xmm3
  406d28:	f3 0f 6f 66 40       	movdqu 0x40(%rsi),%xmm4
  406d2d:	f3 0f 6f 6e 50       	movdqu 0x50(%rsi),%xmm5
  406d32:	f3 0f 6f 76 60       	movdqu 0x60(%rsi),%xmm6
  406d37:	f3 0f 6f 7e 70       	movdqu 0x70(%rsi),%xmm7
  406d3c:	0f ae e8             	lfence 
  406d3f:	66 0f e7 07          	movntdq %xmm0,(%rdi)
  406d43:	66 0f e7 4f 10       	movntdq %xmm1,0x10(%rdi)
  406d48:	66 0f e7 57 20       	movntdq %xmm2,0x20(%rdi)
  406d4d:	66 0f e7 5f 30       	movntdq %xmm3,0x30(%rdi)
  406d52:	66 0f e7 67 40       	movntdq %xmm4,0x40(%rdi)
  406d57:	66 0f e7 6f 50       	movntdq %xmm5,0x50(%rdi)
  406d5c:	66 0f e7 77 60       	movntdq %xmm6,0x60(%rdi)
  406d61:	66 0f e7 7f 70       	movntdq %xmm7,0x70(%rdi)
  406d66:	48 8d b6 80 00 00 00 	lea    0x80(%rsi),%rsi
  406d6d:	48 8d bf 80 00 00 00 	lea    0x80(%rdi),%rdi
  406d74:	73 8a                	jae    406d00 &amp;lt;__intel_ssse3_rep_memcpy+0x18f0&amp;gt;
  406d76:	0f ae f8             	sfence 
  406d79:	48 81 f9 80 00 00 00 	cmp    $0x80,%rcx
  406d80:	0f 82 96 00 00 00    	jb     406e1c &amp;lt;__intel_ssse3_rep_memcpy+0x1a0c&amp;gt;
  406d86:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  406d8d:	48 01 ca             	add    %rcx,%rdx
  406d90:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  406d97:	0f 18 86 c0 01 00 00 	prefetchnta 0x1c0(%rsi)
  406d9e:	0f 18 86 80 02 00 00 	prefetchnta 0x280(%rsi)
  406da5:	0f 18 87 c0 01 00 00 	prefetchnta 0x1c0(%rdi)
  406dac:	0f 18 87 80 02 00 00 	prefetchnta 0x280(%rdi)
  406db3:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  406dba:	f3 0f 6f 06          	movdqu (%rsi),%xmm0
  406dbe:	f3 0f 6f 4e 10       	movdqu 0x10(%rsi),%xmm1
  406dc3:	f3 0f 6f 56 20       	movdqu 0x20(%rsi),%xmm2
  406dc8:	f3 0f 6f 5e 30       	movdqu 0x30(%rsi),%xmm3
  406dcd:	f3 0f 6f 66 40       	movdqu 0x40(%rsi),%xmm4
  406dd2:	f3 0f 6f 6e 50       	movdqu 0x50(%rsi),%xmm5
  406dd7:	f3 0f 6f 76 60       	movdqu 0x60(%rsi),%xmm6
  406ddc:	f3 0f 6f 7e 70       	movdqu 0x70(%rsi),%xmm7
  406de1:	66 0f 7f 07          	movdqa %xmm0,(%rdi)
  406de5:	66 0f 7f 4f 10       	movdqa %xmm1,0x10(%rdi)
  406dea:	66 0f 7f 57 20       	movdqa %xmm2,0x20(%rdi)
  406def:	66 0f 7f 5f 30       	movdqa %xmm3,0x30(%rdi)
  406df4:	66 0f 7f 67 40       	movdqa %xmm4,0x40(%rdi)
  406df9:	66 0f 7f 6f 50       	movdqa %xmm5,0x50(%rdi)
  406dfe:	66 0f 7f 77 60       	movdqa %xmm6,0x60(%rdi)
  406e03:	66 0f 7f 7f 70       	movdqa %xmm7,0x70(%rdi)
  406e08:	48 8d b6 80 00 00 00 	lea    0x80(%rsi),%rsi
  406e0f:	48 8d bf 80 00 00 00 	lea    0x80(%rdi),%rdi
  406e16:	0f 83 7b ff ff ff    	jae    406d97 &amp;lt;__intel_ssse3_rep_memcpy+0x1987&amp;gt;
  406e1c:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  406e23:	48 01 d6             	add    %rdx,%rsi
  406e26:	48 01 d7             	add    %rdx,%rdi
  406e29:	4c 8d 1d 48 22 00 00 	lea    0x2248(%rip),%r11        # 409078 &amp;lt;.L_2il0floatpacket.29+0x40c&amp;gt;
  406e30:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  406e34:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  406e38:	ff e2                	jmpq   *%rdx
  406e3a:	0f 0b                	ud2    
  406e3c:	0f 1f 40 00          	nopl   0x0(%rax)
  406e40:	48 01 d6             	add    %rdx,%rsi
  406e43:	48 01 d7             	add    %rdx,%rdi
  406e46:	f3 0f 6f 46 f0       	movdqu -0x10(%rsi),%xmm0
  406e4b:	4c 8d 47 f0          	lea    -0x10(%rdi),%r8
  406e4f:	49 89 f9             	mov    %rdi,%r9
  406e52:	48 83 e7 f0          	and    $0xfffffffffffffff0,%rdi
  406e56:	49 29 f9             	sub    %rdi,%r9
  406e59:	4c 29 ce             	sub    %r9,%rsi
  406e5c:	4c 29 ca             	sub    %r9,%rdx
  406e5f:	8b 0d 9f 3c 20 00    	mov    0x203c9f(%rip),%ecx        # 60ab04 &amp;lt;__libirc_largest_cache_size_half&amp;gt;
  406e65:	48 39 ca             	cmp    %rcx,%rdx
  406e68:	77 03                	ja     406e6d &amp;lt;__intel_ssse3_rep_memcpy+0x1a5d&amp;gt;
  406e6a:	48 89 d1             	mov    %rdx,%rcx
  406e6d:	48 29 ca             	sub    %rcx,%rdx
  406e70:	48 81 fa 00 10 00 00 	cmp    $0x1000,%rdx
  406e77:	0f 86 a4 00 00 00    	jbe    406f21 &amp;lt;__intel_ssse3_rep_memcpy+0x1b11&amp;gt;
  406e7d:	49 89 c9             	mov    %rcx,%r9
  406e80:	49 c1 e1 03          	shl    $0x3,%r9
  406e84:	4c 39 ca             	cmp    %r9,%rdx
  406e87:	76 06                	jbe    406e8f &amp;lt;__intel_ssse3_rep_memcpy+0x1a7f&amp;gt;
  406e89:	48 01 ca             	add    %rcx,%rdx
  406e8c:	48 31 c9             	xor    %rcx,%rcx
  406e8f:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  406e96:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  406e9d:	0f 18 8e 00 fe ff ff 	prefetcht0 -0x200(%rsi)
  406ea4:	0f 18 8e 00 fd ff ff 	prefetcht0 -0x300(%rsi)
  406eab:	f3 0f 6f 4e f0       	movdqu -0x10(%rsi),%xmm1
  406eb0:	f3 0f 6f 56 e0       	movdqu -0x20(%rsi),%xmm2
  406eb5:	f3 0f 6f 5e d0       	movdqu -0x30(%rsi),%xmm3
  406eba:	f3 0f 6f 66 c0       	movdqu -0x40(%rsi),%xmm4
  406ebf:	f3 0f 6f 6e b0       	movdqu -0x50(%rsi),%xmm5
  406ec4:	f3 0f 6f 76 a0       	movdqu -0x60(%rsi),%xmm6
  406ec9:	f3 0f 6f 7e 90       	movdqu -0x70(%rsi),%xmm7
  406ece:	f3 44 0f 6f 46 80    	movdqu -0x80(%rsi),%xmm8
  406ed4:	0f ae e8             	lfence 
  406ed7:	66 0f e7 4f f0       	movntdq %xmm1,-0x10(%rdi)
  406edc:	66 0f e7 57 e0       	movntdq %xmm2,-0x20(%rdi)
  406ee1:	66 0f e7 5f d0       	movntdq %xmm3,-0x30(%rdi)
  406ee6:	66 0f e7 67 c0       	movntdq %xmm4,-0x40(%rdi)
  406eeb:	66 0f e7 6f b0       	movntdq %xmm5,-0x50(%rdi)
  406ef0:	66 0f e7 77 a0       	movntdq %xmm6,-0x60(%rdi)
  406ef5:	66 0f e7 7f 90       	movntdq %xmm7,-0x70(%rdi)
  406efa:	66 44 0f e7 47 80    	movntdq %xmm8,-0x80(%rdi)
  406f00:	48 8d 76 80          	lea    -0x80(%rsi),%rsi
  406f04:	48 8d 7f 80          	lea    -0x80(%rdi),%rdi
  406f08:	73 8c                	jae    406e96 &amp;lt;__intel_ssse3_rep_memcpy+0x1a86&amp;gt;
  406f0a:	0f ae f8             	sfence 
  406f0d:	48 81 f9 80 00 00 00 	cmp    $0x80,%rcx
  406f14:	0f 82 90 00 00 00    	jb     406faa &amp;lt;__intel_ssse3_rep_memcpy+0x1b9a&amp;gt;
  406f1a:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  406f21:	48 01 ca             	add    %rcx,%rdx
  406f24:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  406f2b:	0f 18 86 40 fe ff ff 	prefetchnta -0x1c0(%rsi)
  406f32:	0f 18 86 80 fd ff ff 	prefetchnta -0x280(%rsi)
  406f39:	0f 18 87 40 fe ff ff 	prefetchnta -0x1c0(%rdi)
  406f40:	0f 18 87 80 fd ff ff 	prefetchnta -0x280(%rdi)
  406f47:	48 81 ea 80 00 00 00 	sub    $0x80,%rdx
  406f4e:	f3 0f 6f 4e f0       	movdqu -0x10(%rsi),%xmm1
  406f53:	f3 0f 6f 56 e0       	movdqu -0x20(%rsi),%xmm2
  406f58:	f3 0f 6f 5e d0       	movdqu -0x30(%rsi),%xmm3
  406f5d:	f3 0f 6f 66 c0       	movdqu -0x40(%rsi),%xmm4
  406f62:	f3 0f 6f 6e b0       	movdqu -0x50(%rsi),%xmm5
  406f67:	f3 0f 6f 76 a0       	movdqu -0x60(%rsi),%xmm6
  406f6c:	f3 0f 6f 7e 90       	movdqu -0x70(%rsi),%xmm7
  406f71:	f3 44 0f 6f 46 80    	movdqu -0x80(%rsi),%xmm8
  406f77:	66 0f 7f 4f f0       	movdqa %xmm1,-0x10(%rdi)
  406f7c:	66 0f 7f 57 e0       	movdqa %xmm2,-0x20(%rdi)
  406f81:	66 0f 7f 5f d0       	movdqa %xmm3,-0x30(%rdi)
  406f86:	66 0f 7f 67 c0       	movdqa %xmm4,-0x40(%rdi)
  406f8b:	66 0f 7f 6f b0       	movdqa %xmm5,-0x50(%rdi)
  406f90:	66 0f 7f 77 a0       	movdqa %xmm6,-0x60(%rdi)
  406f95:	66 0f 7f 7f 90       	movdqa %xmm7,-0x70(%rdi)
  406f9a:	66 44 0f 7f 47 80    	movdqa %xmm8,-0x80(%rdi)
  406fa0:	48 8d 76 80          	lea    -0x80(%rsi),%rsi
  406fa4:	48 8d 7f 80          	lea    -0x80(%rdi),%rdi
  406fa8:	73 81                	jae    406f2b &amp;lt;__intel_ssse3_rep_memcpy+0x1b1b&amp;gt;
  406faa:	f3 41 0f 7f 00       	movdqu %xmm0,(%r8)
  406faf:	48 81 c2 80 00 00 00 	add    $0x80,%rdx
  406fb6:	48 29 d6             	sub    %rdx,%rsi
  406fb9:	48 29 d7             	sub    %rdx,%rdi
  406fbc:	4c 8d 1d 75 1e 00 00 	lea    0x1e75(%rip),%r11        # 408e38 &amp;lt;.L_2il0floatpacket.29+0x1cc&amp;gt;
  406fc3:	49 63 14 93          	movslq (%r11,%rdx,4),%rdx
  406fc7:	49 8d 14 13          	lea    (%r11,%rdx,1),%rdx
  406fcb:	ff e2                	jmpq   *%rdx
  406fcd:	0f 0b                	ud2    
  406fcf:	90                   	nop
  406fd0:	f2 0f f0 46 80       	lddqu  -0x80(%rsi),%xmm0
  406fd5:	f3 0f 7f 47 80       	movdqu %xmm0,-0x80(%rdi)
  406fda:	f2 0f f0 46 90       	lddqu  -0x70(%rsi),%xmm0
  406fdf:	f3 0f 7f 47 90       	movdqu %xmm0,-0x70(%rdi)
  406fe4:	f2 0f f0 46 a0       	lddqu  -0x60(%rsi),%xmm0
  406fe9:	f3 0f 7f 47 a0       	movdqu %xmm0,-0x60(%rdi)
  406fee:	f2 0f f0 46 b0       	lddqu  -0x50(%rsi),%xmm0
  406ff3:	f3 0f 7f 47 b0       	movdqu %xmm0,-0x50(%rdi)
  406ff8:	f2 0f f0 46 c0       	lddqu  -0x40(%rsi),%xmm0
  406ffd:	f3 0f 7f 47 c0       	movdqu %xmm0,-0x40(%rdi)
  407002:	f2 0f f0 46 d0       	lddqu  -0x30(%rsi),%xmm0
  407007:	f3 0f 7f 47 d0       	movdqu %xmm0,-0x30(%rdi)
  40700c:	f2 0f f0 46 e0       	lddqu  -0x20(%rsi),%xmm0
  407011:	f3 0f 7f 47 e0       	movdqu %xmm0,-0x20(%rdi)
  407016:	f2 0f f0 46 f0       	lddqu  -0x10(%rsi),%xmm0
  40701b:	f3 0f 7f 47 f0       	movdqu %xmm0,-0x10(%rdi)
  407020:	c3                   	retq   
  407021:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  407028:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  40702f:	00 
  407030:	f2 0f f0 86 71 ff ff 	lddqu  -0x8f(%rsi),%xmm0
  407037:	ff 
  407038:	f3 0f 7f 87 71 ff ff 	movdqu %xmm0,-0x8f(%rdi)
  40703f:	ff 
  407040:	f2 0f f0 46 81       	lddqu  -0x7f(%rsi),%xmm0
  407045:	f3 0f 7f 47 81       	movdqu %xmm0,-0x7f(%rdi)
  40704a:	f2 0f f0 46 91       	lddqu  -0x6f(%rsi),%xmm0
  40704f:	f3 0f 7f 47 91       	movdqu %xmm0,-0x6f(%rdi)
  407054:	f2 0f f0 46 a1       	lddqu  -0x5f(%rsi),%xmm0
  407059:	f3 0f 7f 47 a1       	movdqu %xmm0,-0x5f(%rdi)
  40705e:	f2 0f f0 46 b1       	lddqu  -0x4f(%rsi),%xmm0
  407063:	f3 0f 7f 47 b1       	movdqu %xmm0,-0x4f(%rdi)
  407068:	f2 0f f0 46 c1       	lddqu  -0x3f(%rsi),%xmm0
  40706d:	f3 0f 7f 47 c1       	movdqu %xmm0,-0x3f(%rdi)
  407072:	f2 0f f0 46 d1       	lddqu  -0x2f(%rsi),%xmm0
  407077:	f3 0f 7f 47 d1       	movdqu %xmm0,-0x2f(%rdi)
  40707c:	f2 0f f0 46 e1       	lddqu  -0x1f(%rsi),%xmm0
  407081:	f2 0f f0 4e f0       	lddqu  -0x10(%rsi),%xmm1
  407086:	f3 0f 7f 47 e1       	movdqu %xmm0,-0x1f(%rdi)
  40708b:	f3 0f 7f 4f f0       	movdqu %xmm1,-0x10(%rdi)
  407090:	c3                   	retq   
  407091:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  407098:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  40709f:	00 
  4070a0:	48 8b 56 f1          	mov    -0xf(%rsi),%rdx
  4070a4:	48 8b 4e f8          	mov    -0x8(%rsi),%rcx
  4070a8:	48 89 57 f1          	mov    %rdx,-0xf(%rdi)
  4070ac:	48 89 4f f8          	mov    %rcx,-0x8(%rdi)
  4070b0:	c3                   	retq   
  4070b1:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  4070b8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  4070bf:	00 
  4070c0:	f2 0f f0 86 72 ff ff 	lddqu  -0x8e(%rsi),%xmm0
  4070c7:	ff 
  4070c8:	f3 0f 7f 87 72 ff ff 	movdqu %xmm0,-0x8e(%rdi)
  4070cf:	ff 
  4070d0:	f2 0f f0 46 82       	lddqu  -0x7e(%rsi),%xmm0
  4070d5:	f3 0f 7f 47 82       	movdqu %xmm0,-0x7e(%rdi)
  4070da:	f2 0f f0 46 92       	lddqu  -0x6e(%rsi),%xmm0
  4070df:	f3 0f 7f 47 92       	movdqu %xmm0,-0x6e(%rdi)
  4070e4:	f2 0f f0 46 a2       	lddqu  -0x5e(%rsi),%xmm0
  4070e9:	f3 0f 7f 47 a2       	movdqu %xmm0,-0x5e(%rdi)
  4070ee:	f2 0f f0 46 b2       	lddqu  -0x4e(%rsi),%xmm0
  4070f3:	f3 0f 7f 47 b2       	movdqu %xmm0,-0x4e(%rdi)
  4070f8:	f2 0f f0 46 c2       	lddqu  -0x3e(%rsi),%xmm0
  4070fd:	f3 0f 7f 47 c2       	movdqu %xmm0,-0x3e(%rdi)
  407102:	f2 0f f0 46 d2       	lddqu  -0x2e(%rsi),%xmm0
  407107:	f3 0f 7f 47 d2       	movdqu %xmm0,-0x2e(%rdi)
  40710c:	f2 0f f0 46 e2       	lddqu  -0x1e(%rsi),%xmm0
  407111:	f2 0f f0 4e f0       	lddqu  -0x10(%rsi),%xmm1
  407116:	f3 0f 7f 47 e2       	movdqu %xmm0,-0x1e(%rdi)
  40711b:	f3 0f 7f 4f f0       	movdqu %xmm1,-0x10(%rdi)
  407120:	c3                   	retq   
  407121:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  407128:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  40712f:	00 
  407130:	48 8b 56 f2          	mov    -0xe(%rsi),%rdx
  407134:	48 8b 4e f8          	mov    -0x8(%rsi),%rcx
  407138:	48 89 57 f2          	mov    %rdx,-0xe(%rdi)
  40713c:	48 89 4f f8          	mov    %rcx,-0x8(%rdi)
  407140:	c3                   	retq   
  407141:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  407148:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  40714f:	00 
  407150:	f2 0f f0 86 73 ff ff 	lddqu  -0x8d(%rsi),%xmm0
  407157:	ff 
  407158:	f3 0f 7f 87 73 ff ff 	movdqu %xmm0,-0x8d(%rdi)
  40715f:	ff 
  407160:	f2 0f f0 46 83       	lddqu  -0x7d(%rsi),%xmm0
  407165:	f3 0f 7f 47 83       	movdqu %xmm0,-0x7d(%rdi)
  40716a:	f2 0f f0 46 93       	lddqu  -0x6d(%rsi),%xmm0
  40716f:	f3 0f 7f 47 93       	movdqu %xmm0,-0x6d(%rdi)
  407174:	f2 0f f0 46 a3       	lddqu  -0x5d(%rsi),%xmm0
  407179:	f3 0f 7f 47 a3       	movdqu %xmm0,-0x5d(%rdi)
  40717e:	f2 0f f0 46 b3       	lddqu  -0x4d(%rsi),%xmm0
  407183:	f3 0f 7f 47 b3       	movdqu %xmm0,-0x4d(%rdi)
  407188:	f2 0f f0 46 c3       	lddqu  -0x3d(%rsi),%xmm0
  40718d:	f3 0f 7f 47 c3       	movdqu %xmm0,-0x3d(%rdi)
  407192:	f2 0f f0 46 d3       	lddqu  -0x2d(%rsi),%xmm0
  407197:	f3 0f 7f 47 d3       	movdqu %xmm0,-0x2d(%rdi)
  40719c:	f2 0f f0 46 e3       	lddqu  -0x1d(%rsi),%xmm0
  4071a1:	f2 0f f0 4e f0       	lddqu  -0x10(%rsi),%xmm1
  4071a6:	f3 0f 7f 47 e3       	movdqu %xmm0,-0x1d(%rdi)
  4071ab:	f3 0f 7f 4f f0       	movdqu %xmm1,-0x10(%rdi)
  4071b0:	c3                   	retq   
  4071b1:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  4071b8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  4071bf:	00 
  4071c0:	48 8b 56 f3          	mov    -0xd(%rsi),%rdx
  4071c4:	48 8b 4e f8          	mov    -0x8(%rsi),%rcx
  4071c8:	48 89 57 f3          	mov    %rdx,-0xd(%rdi)
  4071cc:	48 89 4f f8          	mov    %rcx,-0x8(%rdi)
  4071d0:	c3                   	retq   
  4071d1:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  4071d8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  4071df:	00 
  4071e0:	f2 0f f0 86 74 ff ff 	lddqu  -0x8c(%rsi),%xmm0
  4071e7:	ff 
  4071e8:	f3 0f 7f 87 74 ff ff 	movdqu %xmm0,-0x8c(%rdi)
  4071ef:	ff 
  4071f0:	f2 0f f0 46 84       	lddqu  -0x7c(%rsi),%xmm0
  4071f5:	f3 0f 7f 47 84       	movdqu %xmm0,-0x7c(%rdi)
  4071fa:	f2 0f f0 46 94       	lddqu  -0x6c(%rsi),%xmm0
  4071ff:	f3 0f 7f 47 94       	movdqu %xmm0,-0x6c(%rdi)
  407204:	f2 0f f0 46 a4       	lddqu  -0x5c(%rsi),%xmm0
  407209:	f3 0f 7f 47 a4       	movdqu %xmm0,-0x5c(%rdi)
  40720e:	f2 0f f0 46 b4       	lddqu  -0x4c(%rsi),%xmm0
  407213:	f3 0f 7f 47 b4       	movdqu %xmm0,-0x4c(%rdi)
  407218:	f2 0f f0 46 c4       	lddqu  -0x3c(%rsi),%xmm0
  40721d:	f3 0f 7f 47 c4       	movdqu %xmm0,-0x3c(%rdi)
  407222:	f2 0f f0 46 d4       	lddqu  -0x2c(%rsi),%xmm0
  407227:	f3 0f 7f 47 d4       	movdqu %xmm0,-0x2c(%rdi)
  40722c:	f2 0f f0 46 e4       	lddqu  -0x1c(%rsi),%xmm0
  407231:	f2 0f f0 4e f0       	lddqu  -0x10(%rsi),%xmm1
  407236:	f3 0f 7f 47 e4       	movdqu %xmm0,-0x1c(%rdi)
  40723b:	f3 0f 7f 4f f0       	movdqu %xmm1,-0x10(%rdi)
  407240:	c3                   	retq   
  407241:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  407248:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  40724f:	00 
  407250:	48 8b 56 f4          	mov    -0xc(%rsi),%rdx
  407254:	8b 4e fc             	mov    -0x4(%rsi),%ecx
  407257:	48 89 57 f4          	mov    %rdx,-0xc(%rdi)
  40725b:	89 4f fc             	mov    %ecx,-0x4(%rdi)
  40725e:	c3                   	retq   
  40725f:	90                   	nop
  407260:	f2 0f f0 86 75 ff ff 	lddqu  -0x8b(%rsi),%xmm0
  407267:	ff 
  407268:	f3 0f 7f 87 75 ff ff 	movdqu %xmm0,-0x8b(%rdi)
  40726f:	ff 
  407270:	f2 0f f0 46 85       	lddqu  -0x7b(%rsi),%xmm0
  407275:	f3 0f 7f 47 85       	movdqu %xmm0,-0x7b(%rdi)
  40727a:	f2 0f f0 46 95       	lddqu  -0x6b(%rsi),%xmm0
  40727f:	f3 0f 7f 47 95       	movdqu %xmm0,-0x6b(%rdi)
  407284:	f2 0f f0 46 a5       	lddqu  -0x5b(%rsi),%xmm0
  407289:	f3 0f 7f 47 a5       	movdqu %xmm0,-0x5b(%rdi)
  40728e:	f2 0f f0 46 b5       	lddqu  -0x4b(%rsi),%xmm0
  407293:	f3 0f 7f 47 b5       	movdqu %xmm0,-0x4b(%rdi)
  407298:	f2 0f f0 46 c5       	lddqu  -0x3b(%rsi),%xmm0
  40729d:	f3 0f 7f 47 c5       	movdqu %xmm0,-0x3b(%rdi)
  4072a2:	f2 0f f0 46 d5       	lddqu  -0x2b(%rsi),%xmm0
  4072a7:	f3 0f 7f 47 d5       	movdqu %xmm0,-0x2b(%rdi)
  4072ac:	f2 0f f0 46 e5       	lddqu  -0x1b(%rsi),%xmm0
  4072b1:	f2 0f f0 4e f0       	lddqu  -0x10(%rsi),%xmm1
  4072b6:	f3 0f 7f 47 e5       	movdqu %xmm0,-0x1b(%rdi)
  4072bb:	f3 0f 7f 4f f0       	movdqu %xmm1,-0x10(%rdi)
  4072c0:	c3                   	retq   
  4072c1:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  4072c8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  4072cf:	00 
  4072d0:	48 8b 56 f5          	mov    -0xb(%rsi),%rdx
  4072d4:	8b 4e fc             	mov    -0x4(%rsi),%ecx
  4072d7:	48 89 57 f5          	mov    %rdx,-0xb(%rdi)
  4072db:	89 4f fc             	mov    %ecx,-0x4(%rdi)
  4072de:	c3                   	retq   
  4072df:	90                   	nop
  4072e0:	f2 0f f0 86 76 ff ff 	lddqu  -0x8a(%rsi),%xmm0
  4072e7:	ff 
  4072e8:	f3 0f 7f 87 76 ff ff 	movdqu %xmm0,-0x8a(%rdi)
  4072ef:	ff 
  4072f0:	f2 0f f0 46 86       	lddqu  -0x7a(%rsi),%xmm0
  4072f5:	f3 0f 7f 47 86       	movdqu %xmm0,-0x7a(%rdi)
  4072fa:	f2 0f f0 46 96       	lddqu  -0x6a(%rsi),%xmm0
  4072ff:	f3 0f 7f 47 96       	movdqu %xmm0,-0x6a(%rdi)
  407304:	f2 0f f0 46 a6       	lddqu  -0x5a(%rsi),%xmm0
  407309:	f3 0f 7f 47 a6       	movdqu %xmm0,-0x5a(%rdi)
  40730e:	f2 0f f0 46 b6       	lddqu  -0x4a(%rsi),%xmm0
  407313:	f3 0f 7f 47 b6       	movdqu %xmm0,-0x4a(%rdi)
  407318:	f2 0f f0 46 c6       	lddqu  -0x3a(%rsi),%xmm0
  40731d:	f3 0f 7f 47 c6       	movdqu %xmm0,-0x3a(%rdi)
  407322:	f2 0f f0 46 d6       	lddqu  -0x2a(%rsi),%xmm0
  407327:	f3 0f 7f 47 d6       	movdqu %xmm0,-0x2a(%rdi)
  40732c:	f2 0f f0 46 e6       	lddqu  -0x1a(%rsi),%xmm0
  407331:	f2 0f f0 4e f0       	lddqu  -0x10(%rsi),%xmm1
  407336:	f3 0f 7f 47 e6       	movdqu %xmm0,-0x1a(%rdi)
  40733b:	f3 0f 7f 4f f0       	movdqu %xmm1,-0x10(%rdi)
  407340:	c3                   	retq   
  407341:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  407348:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  40734f:	00 
  407350:	48 8b 56 f6          	mov    -0xa(%rsi),%rdx
  407354:	8b 4e fc             	mov    -0x4(%rsi),%ecx
  407357:	48 89 57 f6          	mov    %rdx,-0xa(%rdi)
  40735b:	89 4f fc             	mov    %ecx,-0x4(%rdi)
  40735e:	c3                   	retq   
  40735f:	90                   	nop
  407360:	f2 0f f0 86 77 ff ff 	lddqu  -0x89(%rsi),%xmm0
  407367:	ff 
  407368:	f3 0f 7f 87 77 ff ff 	movdqu %xmm0,-0x89(%rdi)
  40736f:	ff 
  407370:	f2 0f f0 46 87       	lddqu  -0x79(%rsi),%xmm0
  407375:	f3 0f 7f 47 87       	movdqu %xmm0,-0x79(%rdi)
  40737a:	f2 0f f0 46 97       	lddqu  -0x69(%rsi),%xmm0
  40737f:	f3 0f 7f 47 97       	movdqu %xmm0,-0x69(%rdi)
  407384:	f2 0f f0 46 a7       	lddqu  -0x59(%rsi),%xmm0
  407389:	f3 0f 7f 47 a7       	movdqu %xmm0,-0x59(%rdi)
  40738e:	f2 0f f0 46 b7       	lddqu  -0x49(%rsi),%xmm0
  407393:	f3 0f 7f 47 b7       	movdqu %xmm0,-0x49(%rdi)
  407398:	f2 0f f0 46 c7       	lddqu  -0x39(%rsi),%xmm0
  40739d:	f3 0f 7f 47 c7       	movdqu %xmm0,-0x39(%rdi)
  4073a2:	f2 0f f0 46 d7       	lddqu  -0x29(%rsi),%xmm0
  4073a7:	f3 0f 7f 47 d7       	movdqu %xmm0,-0x29(%rdi)
  4073ac:	f2 0f f0 46 e7       	lddqu  -0x19(%rsi),%xmm0
  4073b1:	f2 0f f0 4e f0       	lddqu  -0x10(%rsi),%xmm1
  4073b6:	f3 0f 7f 47 e7       	movdqu %xmm0,-0x19(%rdi)
  4073bb:	f3 0f 7f 4f f0       	movdqu %xmm1,-0x10(%rdi)
  4073c0:	c3                   	retq   
  4073c1:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  4073c8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  4073cf:	00 
  4073d0:	48 8b 56 f7          	mov    -0x9(%rsi),%rdx
  4073d4:	8b 4e fc             	mov    -0x4(%rsi),%ecx
  4073d7:	48 89 57 f7          	mov    %rdx,-0x9(%rdi)
  4073db:	89 4f fc             	mov    %ecx,-0x4(%rdi)
  4073de:	c3                   	retq   
  4073df:	90                   	nop
  4073e0:	f2 0f f0 86 78 ff ff 	lddqu  -0x88(%rsi),%xmm0
  4073e7:	ff 
  4073e8:	f3 0f 7f 87 78 ff ff 	movdqu %xmm0,-0x88(%rdi)
  4073ef:	ff 
  4073f0:	f2 0f f0 46 88       	lddqu  -0x78(%rsi),%xmm0
  4073f5:	f3 0f 7f 47 88       	movdqu %xmm0,-0x78(%rdi)
  4073fa:	f2 0f f0 46 98       	lddqu  -0x68(%rsi),%xmm0
  4073ff:	f3 0f 7f 47 98       	movdqu %xmm0,-0x68(%rdi)
  407404:	f2 0f f0 46 a8       	lddqu  -0x58(%rsi),%xmm0
  407409:	f3 0f 7f 47 a8       	movdqu %xmm0,-0x58(%rdi)
  40740e:	f2 0f f0 46 b8       	lddqu  -0x48(%rsi),%xmm0
  407413:	f3 0f 7f 47 b8       	movdqu %xmm0,-0x48(%rdi)
  407418:	f2 0f f0 46 c8       	lddqu  -0x38(%rsi),%xmm0
  40741d:	f3 0f 7f 47 c8       	movdqu %xmm0,-0x38(%rdi)
  407422:	f2 0f f0 46 d8       	lddqu  -0x28(%rsi),%xmm0
  407427:	f3 0f 7f 47 d8       	movdqu %xmm0,-0x28(%rdi)
  40742c:	f2 0f f0 46 e8       	lddqu  -0x18(%rsi),%xmm0
  407431:	f2 0f f0 4e f0       	lddqu  -0x10(%rsi),%xmm1
  407436:	f3 0f 7f 47 e8       	movdqu %xmm0,-0x18(%rdi)
  40743b:	f3 0f 7f 4f f0       	movdqu %xmm1,-0x10(%rdi)
  407440:	c3                   	retq   
  407441:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  407448:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  40744f:	00 
  407450:	48 8b 56 f8          	mov    -0x8(%rsi),%rdx
  407454:	48 89 57 f8          	mov    %rdx,-0x8(%rdi)
  407458:	c3                   	retq   
  407459:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  407460:	f2 0f f0 86 79 ff ff 	lddqu  -0x87(%rsi),%xmm0
  407467:	ff 
  407468:	f3 0f 7f 87 79 ff ff 	movdqu %xmm0,-0x87(%rdi)
  40746f:	ff 
  407470:	f2 0f f0 46 89       	lddqu  -0x77(%rsi),%xmm0
  407475:	f3 0f 7f 47 89       	movdqu %xmm0,-0x77(%rdi)
  40747a:	f2 0f f0 46 99       	lddqu  -0x67(%rsi),%xmm0
  40747f:	f3 0f 7f 47 99       	movdqu %xmm0,-0x67(%rdi)
  407484:	f2 0f f0 46 a9       	lddqu  -0x57(%rsi),%xmm0
  407489:	f3 0f 7f 47 a9       	movdqu %xmm0,-0x57(%rdi)
  40748e:	f2 0f f0 46 b9       	lddqu  -0x47(%rsi),%xmm0
  407493:	f3 0f 7f 47 b9       	movdqu %xmm0,-0x47(%rdi)
  407498:	f2 0f f0 46 c9       	lddqu  -0x37(%rsi),%xmm0
  40749d:	f3 0f 7f 47 c9       	movdqu %xmm0,-0x37(%rdi)
  4074a2:	f2 0f f0 46 d9       	lddqu  -0x27(%rsi),%xmm0
  4074a7:	f3 0f 7f 47 d9       	movdqu %xmm0,-0x27(%rdi)
  4074ac:	f2 0f f0 46 e9       	lddqu  -0x17(%rsi),%xmm0
  4074b1:	f2 0f f0 4e f0       	lddqu  -0x10(%rsi),%xmm1
  4074b6:	f3 0f 7f 47 e9       	movdqu %xmm0,-0x17(%rdi)
  4074bb:	f3 0f 7f 4f f0       	movdqu %xmm1,-0x10(%rdi)
  4074c0:	c3                   	retq   
  4074c1:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  4074c8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  4074cf:	00 
  4074d0:	8b 56 f9             	mov    -0x7(%rsi),%edx
  4074d3:	8b 4e fc             	mov    -0x4(%rsi),%ecx
  4074d6:	89 57 f9             	mov    %edx,-0x7(%rdi)
  4074d9:	89 4f fc             	mov    %ecx,-0x4(%rdi)
  4074dc:	c3                   	retq   
  4074dd:	0f 1f 00             	nopl   (%rax)
  4074e0:	f2 0f f0 86 7a ff ff 	lddqu  -0x86(%rsi),%xmm0
  4074e7:	ff 
  4074e8:	f3 0f 7f 87 7a ff ff 	movdqu %xmm0,-0x86(%rdi)
  4074ef:	ff 
  4074f0:	f2 0f f0 46 8a       	lddqu  -0x76(%rsi),%xmm0
  4074f5:	f3 0f 7f 47 8a       	movdqu %xmm0,-0x76(%rdi)
  4074fa:	f2 0f f0 46 9a       	lddqu  -0x66(%rsi),%xmm0
  4074ff:	f3 0f 7f 47 9a       	movdqu %xmm0,-0x66(%rdi)
  407504:	f2 0f f0 46 aa       	lddqu  -0x56(%rsi),%xmm0
  407509:	f3 0f 7f 47 aa       	movdqu %xmm0,-0x56(%rdi)
  40750e:	f2 0f f0 46 ba       	lddqu  -0x46(%rsi),%xmm0
  407513:	f3 0f 7f 47 ba       	movdqu %xmm0,-0x46(%rdi)
  407518:	f2 0f f0 46 ca       	lddqu  -0x36(%rsi),%xmm0
  40751d:	f3 0f 7f 47 ca       	movdqu %xmm0,-0x36(%rdi)
  407522:	f2 0f f0 46 da       	lddqu  -0x26(%rsi),%xmm0
  407527:	f3 0f 7f 47 da       	movdqu %xmm0,-0x26(%rdi)
  40752c:	f2 0f f0 46 ea       	lddqu  -0x16(%rsi),%xmm0
  407531:	f2 0f f0 4e f0       	lddqu  -0x10(%rsi),%xmm1
  407536:	f3 0f 7f 47 ea       	movdqu %xmm0,-0x16(%rdi)
  40753b:	f3 0f 7f 4f f0       	movdqu %xmm1,-0x10(%rdi)
  407540:	c3                   	retq   
  407541:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  407548:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  40754f:	00 
  407550:	8b 56 fa             	mov    -0x6(%rsi),%edx
  407553:	8b 4e fc             	mov    -0x4(%rsi),%ecx
  407556:	89 57 fa             	mov    %edx,-0x6(%rdi)
  407559:	89 4f fc             	mov    %ecx,-0x4(%rdi)
  40755c:	c3                   	retq   
  40755d:	0f 1f 00             	nopl   (%rax)
  407560:	f2 0f f0 86 7b ff ff 	lddqu  -0x85(%rsi),%xmm0
  407567:	ff 
  407568:	f3 0f 7f 87 7b ff ff 	movdqu %xmm0,-0x85(%rdi)
  40756f:	ff 
  407570:	f2 0f f0 46 8b       	lddqu  -0x75(%rsi),%xmm0
  407575:	f3 0f 7f 47 8b       	movdqu %xmm0,-0x75(%rdi)
  40757a:	f2 0f f0 46 9b       	lddqu  -0x65(%rsi),%xmm0
  40757f:	f3 0f 7f 47 9b       	movdqu %xmm0,-0x65(%rdi)
  407584:	f2 0f f0 46 ab       	lddqu  -0x55(%rsi),%xmm0
  407589:	f3 0f 7f 47 ab       	movdqu %xmm0,-0x55(%rdi)
  40758e:	f2 0f f0 46 bb       	lddqu  -0x45(%rsi),%xmm0
  407593:	f3 0f 7f 47 bb       	movdqu %xmm0,-0x45(%rdi)
  407598:	f2 0f f0 46 cb       	lddqu  -0x35(%rsi),%xmm0
  40759d:	f3 0f 7f 47 cb       	movdqu %xmm0,-0x35(%rdi)
  4075a2:	f2 0f f0 46 db       	lddqu  -0x25(%rsi),%xmm0
  4075a7:	f3 0f 7f 47 db       	movdqu %xmm0,-0x25(%rdi)
  4075ac:	f2 0f f0 46 eb       	lddqu  -0x15(%rsi),%xmm0
  4075b1:	f2 0f f0 4e f0       	lddqu  -0x10(%rsi),%xmm1
  4075b6:	f3 0f 7f 47 eb       	movdqu %xmm0,-0x15(%rdi)
  4075bb:	f3 0f 7f 4f f0       	movdqu %xmm1,-0x10(%rdi)
  4075c0:	c3                   	retq   
  4075c1:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  4075c8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  4075cf:	00 
  4075d0:	8b 56 fb             	mov    -0x5(%rsi),%edx
  4075d3:	8b 4e fc             	mov    -0x4(%rsi),%ecx
  4075d6:	89 57 fb             	mov    %edx,-0x5(%rdi)
  4075d9:	89 4f fc             	mov    %ecx,-0x4(%rdi)
  4075dc:	c3                   	retq   
  4075dd:	0f 1f 00             	nopl   (%rax)
  4075e0:	f2 0f f0 86 7c ff ff 	lddqu  -0x84(%rsi),%xmm0
  4075e7:	ff 
  4075e8:	f3 0f 7f 87 7c ff ff 	movdqu %xmm0,-0x84(%rdi)
  4075ef:	ff 
  4075f0:	f2 0f f0 46 8c       	lddqu  -0x74(%rsi),%xmm0
  4075f5:	f3 0f 7f 47 8c       	movdqu %xmm0,-0x74(%rdi)
  4075fa:	f2 0f f0 46 9c       	lddqu  -0x64(%rsi),%xmm0
  4075ff:	f3 0f 7f 47 9c       	movdqu %xmm0,-0x64(%rdi)
  407604:	f2 0f f0 46 ac       	lddqu  -0x54(%rsi),%xmm0
  407609:	f3 0f 7f 47 ac       	movdqu %xmm0,-0x54(%rdi)
  40760e:	f2 0f f0 46 bc       	lddqu  -0x44(%rsi),%xmm0
  407613:	f3 0f 7f 47 bc       	movdqu %xmm0,-0x44(%rdi)
  407618:	f2 0f f0 46 cc       	lddqu  -0x34(%rsi),%xmm0
  40761d:	f3 0f 7f 47 cc       	movdqu %xmm0,-0x34(%rdi)
  407622:	f2 0f f0 46 dc       	lddqu  -0x24(%rsi),%xmm0
  407627:	f3 0f 7f 47 dc       	movdqu %xmm0,-0x24(%rdi)
  40762c:	f2 0f f0 46 ec       	lddqu  -0x14(%rsi),%xmm0
  407631:	f2 0f f0 4e f0       	lddqu  -0x10(%rsi),%xmm1
  407636:	f3 0f 7f 47 ec       	movdqu %xmm0,-0x14(%rdi)
  40763b:	f3 0f 7f 4f f0       	movdqu %xmm1,-0x10(%rdi)
  407640:	c3                   	retq   
  407641:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  407648:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  40764f:	00 
  407650:	8b 56 fc             	mov    -0x4(%rsi),%edx
  407653:	89 57 fc             	mov    %edx,-0x4(%rdi)
  407656:	c3                   	retq   
  407657:	66 0f 1f 84 00 00 00 	nopw   0x0(%rax,%rax,1)
  40765e:	00 00 
  407660:	f2 0f f0 86 7d ff ff 	lddqu  -0x83(%rsi),%xmm0
  407667:	ff 
  407668:	f3 0f 7f 87 7d ff ff 	movdqu %xmm0,-0x83(%rdi)
  40766f:	ff 
  407670:	f2 0f f0 46 8d       	lddqu  -0x73(%rsi),%xmm0
  407675:	f3 0f 7f 47 8d       	movdqu %xmm0,-0x73(%rdi)
  40767a:	f2 0f f0 46 9d       	lddqu  -0x63(%rsi),%xmm0
  40767f:	f3 0f 7f 47 9d       	movdqu %xmm0,-0x63(%rdi)
  407684:	f2 0f f0 46 ad       	lddqu  -0x53(%rsi),%xmm0
  407689:	f3 0f 7f 47 ad       	movdqu %xmm0,-0x53(%rdi)
  40768e:	f2 0f f0 46 bd       	lddqu  -0x43(%rsi),%xmm0
  407693:	f3 0f 7f 47 bd       	movdqu %xmm0,-0x43(%rdi)
  407698:	f2 0f f0 46 cd       	lddqu  -0x33(%rsi),%xmm0
  40769d:	f3 0f 7f 47 cd       	movdqu %xmm0,-0x33(%rdi)
  4076a2:	f2 0f f0 46 dd       	lddqu  -0x23(%rsi),%xmm0
  4076a7:	f3 0f 7f 47 dd       	movdqu %xmm0,-0x23(%rdi)
  4076ac:	f2 0f f0 46 ed       	lddqu  -0x13(%rsi),%xmm0
  4076b1:	f2 0f f0 4e f0       	lddqu  -0x10(%rsi),%xmm1
  4076b6:	f3 0f 7f 47 ed       	movdqu %xmm0,-0x13(%rdi)
  4076bb:	f3 0f 7f 4f f0       	movdqu %xmm1,-0x10(%rdi)
  4076c0:	c3                   	retq   
  4076c1:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  4076c8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  4076cf:	00 
  4076d0:	66 8b 56 fd          	mov    -0x3(%rsi),%dx
  4076d4:	66 8b 4e fe          	mov    -0x2(%rsi),%cx
  4076d8:	66 89 57 fd          	mov    %dx,-0x3(%rdi)
  4076dc:	66 89 4f fe          	mov    %cx,-0x2(%rdi)
  4076e0:	c3                   	retq   
  4076e1:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  4076e8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  4076ef:	00 
  4076f0:	f2 0f f0 86 7e ff ff 	lddqu  -0x82(%rsi),%xmm0
  4076f7:	ff 
  4076f8:	f3 0f 7f 87 7e ff ff 	movdqu %xmm0,-0x82(%rdi)
  4076ff:	ff 
  407700:	f2 0f f0 46 8e       	lddqu  -0x72(%rsi),%xmm0
  407705:	f3 0f 7f 47 8e       	movdqu %xmm0,-0x72(%rdi)
  40770a:	f2 0f f0 46 9e       	lddqu  -0x62(%rsi),%xmm0
  40770f:	f3 0f 7f 47 9e       	movdqu %xmm0,-0x62(%rdi)
  407714:	f2 0f f0 46 ae       	lddqu  -0x52(%rsi),%xmm0
  407719:	f3 0f 7f 47 ae       	movdqu %xmm0,-0x52(%rdi)
  40771e:	f2 0f f0 46 be       	lddqu  -0x42(%rsi),%xmm0
  407723:	f3 0f 7f 47 be       	movdqu %xmm0,-0x42(%rdi)
  407728:	f2 0f f0 46 ce       	lddqu  -0x32(%rsi),%xmm0
  40772d:	f3 0f 7f 47 ce       	movdqu %xmm0,-0x32(%rdi)
  407732:	f2 0f f0 46 de       	lddqu  -0x22(%rsi),%xmm0
  407737:	f3 0f 7f 47 de       	movdqu %xmm0,-0x22(%rdi)
  40773c:	f2 0f f0 46 ee       	lddqu  -0x12(%rsi),%xmm0
  407741:	f2 0f f0 4e f0       	lddqu  -0x10(%rsi),%xmm1
  407746:	f3 0f 7f 47 ee       	movdqu %xmm0,-0x12(%rdi)
  40774b:	f3 0f 7f 4f f0       	movdqu %xmm1,-0x10(%rdi)
  407750:	c3                   	retq   
  407751:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  407758:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  40775f:	00 
  407760:	0f b7 56 fe          	movzwl -0x2(%rsi),%edx
  407764:	66 89 57 fe          	mov    %dx,-0x2(%rdi)
  407768:	c3                   	retq   
  407769:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  407770:	f2 0f f0 86 7f ff ff 	lddqu  -0x81(%rsi),%xmm0
  407777:	ff 
  407778:	f3 0f 7f 87 7f ff ff 	movdqu %xmm0,-0x81(%rdi)
  40777f:	ff 
  407780:	f2 0f f0 46 8f       	lddqu  -0x71(%rsi),%xmm0
  407785:	f3 0f 7f 47 8f       	movdqu %xmm0,-0x71(%rdi)
  40778a:	f2 0f f0 46 9f       	lddqu  -0x61(%rsi),%xmm0
  40778f:	f3 0f 7f 47 9f       	movdqu %xmm0,-0x61(%rdi)
  407794:	f2 0f f0 46 af       	lddqu  -0x51(%rsi),%xmm0
  407799:	f3 0f 7f 47 af       	movdqu %xmm0,-0x51(%rdi)
  40779e:	f2 0f f0 46 bf       	lddqu  -0x41(%rsi),%xmm0
  4077a3:	f3 0f 7f 47 bf       	movdqu %xmm0,-0x41(%rdi)
  4077a8:	f2 0f f0 46 cf       	lddqu  -0x31(%rsi),%xmm0
  4077ad:	f3 0f 7f 47 cf       	movdqu %xmm0,-0x31(%rdi)
  4077b2:	f2 0f f0 46 df       	lddqu  -0x21(%rsi),%xmm0
  4077b7:	f3 0f 7f 47 df       	movdqu %xmm0,-0x21(%rdi)
  4077bc:	f2 0f f0 46 ef       	lddqu  -0x11(%rsi),%xmm0
  4077c1:	f2 0f f0 4e f0       	lddqu  -0x10(%rsi),%xmm1
  4077c6:	f3 0f 7f 47 ef       	movdqu %xmm0,-0x11(%rdi)
  4077cb:	f3 0f 7f 4f f0       	movdqu %xmm1,-0x10(%rdi)
  4077d0:	c3                   	retq   
  4077d1:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  4077d8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  4077df:	00 
  4077e0:	0f b6 56 ff          	movzbl -0x1(%rsi),%edx
  4077e4:	88 57 ff             	mov    %dl,-0x1(%rdi)
  4077e7:	c3                   	retq   
  4077e8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  4077ef:	00 
  4077f0:	f2 0f f0 46 70       	lddqu  0x70(%rsi),%xmm0
  4077f5:	f3 0f 7f 47 70       	movdqu %xmm0,0x70(%rdi)
  4077fa:	f2 0f f0 46 60       	lddqu  0x60(%rsi),%xmm0
  4077ff:	f3 0f 7f 47 60       	movdqu %xmm0,0x60(%rdi)
  407804:	f2 0f f0 46 50       	lddqu  0x50(%rsi),%xmm0
  407809:	f3 0f 7f 47 50       	movdqu %xmm0,0x50(%rdi)
  40780e:	f2 0f f0 46 40       	lddqu  0x40(%rsi),%xmm0
  407813:	f3 0f 7f 47 40       	movdqu %xmm0,0x40(%rdi)
  407818:	f2 0f f0 46 30       	lddqu  0x30(%rsi),%xmm0
  40781d:	f3 0f 7f 47 30       	movdqu %xmm0,0x30(%rdi)
  407822:	f2 0f f0 46 20       	lddqu  0x20(%rsi),%xmm0
  407827:	f3 0f 7f 47 20       	movdqu %xmm0,0x20(%rdi)
  40782c:	f2 0f f0 46 10       	lddqu  0x10(%rsi),%xmm0
  407831:	f3 0f 7f 47 10       	movdqu %xmm0,0x10(%rdi)
  407836:	f2 0f f0 06          	lddqu  (%rsi),%xmm0
  40783a:	f3 0f 7f 07          	movdqu %xmm0,(%rdi)
  40783e:	c3                   	retq   
  40783f:	90                   	nop
  407840:	f2 0f f0 46 7f       	lddqu  0x7f(%rsi),%xmm0
  407845:	f3 0f 7f 47 7f       	movdqu %xmm0,0x7f(%rdi)
  40784a:	f2 0f f0 46 6f       	lddqu  0x6f(%rsi),%xmm0
  40784f:	f3 0f 7f 47 6f       	movdqu %xmm0,0x6f(%rdi)
  407854:	f2 0f f0 46 5f       	lddqu  0x5f(%rsi),%xmm0
  407859:	f3 0f 7f 47 5f       	movdqu %xmm0,0x5f(%rdi)
  40785e:	f2 0f f0 46 4f       	lddqu  0x4f(%rsi),%xmm0
  407863:	f3 0f 7f 47 4f       	movdqu %xmm0,0x4f(%rdi)
  407868:	f2 0f f0 46 3f       	lddqu  0x3f(%rsi),%xmm0
  40786d:	f3 0f 7f 47 3f       	movdqu %xmm0,0x3f(%rdi)
  407872:	f2 0f f0 46 2f       	lddqu  0x2f(%rsi),%xmm0
  407877:	f3 0f 7f 47 2f       	movdqu %xmm0,0x2f(%rdi)
  40787c:	f2 0f f0 46 1f       	lddqu  0x1f(%rsi),%xmm0
  407881:	f3 0f 7f 47 1f       	movdqu %xmm0,0x1f(%rdi)
  407886:	f2 0f f0 46 0f       	lddqu  0xf(%rsi),%xmm0
  40788b:	f2 0f f0 0e          	lddqu  (%rsi),%xmm1
  40788f:	f3 0f 7f 47 0f       	movdqu %xmm0,0xf(%rdi)
  407894:	f3 0f 7f 0f          	movdqu %xmm1,(%rdi)
  407898:	c3                   	retq   
  407899:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  4078a0:	48 8b 56 07          	mov    0x7(%rsi),%rdx
  4078a4:	48 8b 0e             	mov    (%rsi),%rcx
  4078a7:	48 89 57 07          	mov    %rdx,0x7(%rdi)
  4078ab:	48 89 0f             	mov    %rcx,(%rdi)
  4078ae:	c3                   	retq   
  4078af:	90                   	nop
  4078b0:	f2 0f f0 46 7e       	lddqu  0x7e(%rsi),%xmm0
  4078b5:	f3 0f 7f 47 7e       	movdqu %xmm0,0x7e(%rdi)
  4078ba:	f2 0f f0 46 6e       	lddqu  0x6e(%rsi),%xmm0
  4078bf:	f3 0f 7f 47 6e       	movdqu %xmm0,0x6e(%rdi)
  4078c4:	f2 0f f0 46 5e       	lddqu  0x5e(%rsi),%xmm0
  4078c9:	f3 0f 7f 47 5e       	movdqu %xmm0,0x5e(%rdi)
  4078ce:	f2 0f f0 46 4e       	lddqu  0x4e(%rsi),%xmm0
  4078d3:	f3 0f 7f 47 4e       	movdqu %xmm0,0x4e(%rdi)
  4078d8:	f2 0f f0 46 3e       	lddqu  0x3e(%rsi),%xmm0
  4078dd:	f3 0f 7f 47 3e       	movdqu %xmm0,0x3e(%rdi)
  4078e2:	f2 0f f0 46 2e       	lddqu  0x2e(%rsi),%xmm0
  4078e7:	f3 0f 7f 47 2e       	movdqu %xmm0,0x2e(%rdi)
  4078ec:	f2 0f f0 46 1e       	lddqu  0x1e(%rsi),%xmm0
  4078f1:	f3 0f 7f 47 1e       	movdqu %xmm0,0x1e(%rdi)
  4078f6:	f2 0f f0 46 0e       	lddqu  0xe(%rsi),%xmm0
  4078fb:	f2 0f f0 0e          	lddqu  (%rsi),%xmm1
  4078ff:	f3 0f 7f 47 0e       	movdqu %xmm0,0xe(%rdi)
  407904:	f3 0f 7f 0f          	movdqu %xmm1,(%rdi)
  407908:	c3                   	retq   
  407909:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  407910:	48 8b 56 06          	mov    0x6(%rsi),%rdx
  407914:	48 8b 0e             	mov    (%rsi),%rcx
  407917:	48 89 57 06          	mov    %rdx,0x6(%rdi)
  40791b:	48 89 0f             	mov    %rcx,(%rdi)
  40791e:	c3                   	retq   
  40791f:	90                   	nop
  407920:	f2 0f f0 46 7d       	lddqu  0x7d(%rsi),%xmm0
  407925:	f3 0f 7f 47 7d       	movdqu %xmm0,0x7d(%rdi)
  40792a:	f2 0f f0 46 6d       	lddqu  0x6d(%rsi),%xmm0
  40792f:	f3 0f 7f 47 6d       	movdqu %xmm0,0x6d(%rdi)
  407934:	f2 0f f0 46 5d       	lddqu  0x5d(%rsi),%xmm0
  407939:	f3 0f 7f 47 5d       	movdqu %xmm0,0x5d(%rdi)
  40793e:	f2 0f f0 46 4d       	lddqu  0x4d(%rsi),%xmm0
  407943:	f3 0f 7f 47 4d       	movdqu %xmm0,0x4d(%rdi)
  407948:	f2 0f f0 46 3d       	lddqu  0x3d(%rsi),%xmm0
  40794d:	f3 0f 7f 47 3d       	movdqu %xmm0,0x3d(%rdi)
  407952:	f2 0f f0 46 2d       	lddqu  0x2d(%rsi),%xmm0
  407957:	f3 0f 7f 47 2d       	movdqu %xmm0,0x2d(%rdi)
  40795c:	f2 0f f0 46 1d       	lddqu  0x1d(%rsi),%xmm0
  407961:	f3 0f 7f 47 1d       	movdqu %xmm0,0x1d(%rdi)
  407966:	f2 0f f0 46 0d       	lddqu  0xd(%rsi),%xmm0
  40796b:	f2 0f f0 0e          	lddqu  (%rsi),%xmm1
  40796f:	f3 0f 7f 47 0d       	movdqu %xmm0,0xd(%rdi)
  407974:	f3 0f 7f 0f          	movdqu %xmm1,(%rdi)
  407978:	c3                   	retq   
  407979:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  407980:	48 8b 56 05          	mov    0x5(%rsi),%rdx
  407984:	48 8b 0e             	mov    (%rsi),%rcx
  407987:	48 89 57 05          	mov    %rdx,0x5(%rdi)
  40798b:	48 89 0f             	mov    %rcx,(%rdi)
  40798e:	c3                   	retq   
  40798f:	90                   	nop
  407990:	f2 0f f0 46 7c       	lddqu  0x7c(%rsi),%xmm0
  407995:	f3 0f 7f 47 7c       	movdqu %xmm0,0x7c(%rdi)
  40799a:	f2 0f f0 46 6c       	lddqu  0x6c(%rsi),%xmm0
  40799f:	f3 0f 7f 47 6c       	movdqu %xmm0,0x6c(%rdi)
  4079a4:	f2 0f f0 46 5c       	lddqu  0x5c(%rsi),%xmm0
  4079a9:	f3 0f 7f 47 5c       	movdqu %xmm0,0x5c(%rdi)
  4079ae:	f2 0f f0 46 4c       	lddqu  0x4c(%rsi),%xmm0
  4079b3:	f3 0f 7f 47 4c       	movdqu %xmm0,0x4c(%rdi)
  4079b8:	f2 0f f0 46 3c       	lddqu  0x3c(%rsi),%xmm0
  4079bd:	f3 0f 7f 47 3c       	movdqu %xmm0,0x3c(%rdi)
  4079c2:	f2 0f f0 46 2c       	lddqu  0x2c(%rsi),%xmm0
  4079c7:	f3 0f 7f 47 2c       	movdqu %xmm0,0x2c(%rdi)
  4079cc:	f2 0f f0 46 1c       	lddqu  0x1c(%rsi),%xmm0
  4079d1:	f3 0f 7f 47 1c       	movdqu %xmm0,0x1c(%rdi)
  4079d6:	f2 0f f0 46 0c       	lddqu  0xc(%rsi),%xmm0
  4079db:	f2 0f f0 0e          	lddqu  (%rsi),%xmm1
  4079df:	f3 0f 7f 47 0c       	movdqu %xmm0,0xc(%rdi)
  4079e4:	f3 0f 7f 0f          	movdqu %xmm1,(%rdi)
  4079e8:	c3                   	retq   
  4079e9:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  4079f0:	48 8b 56 04          	mov    0x4(%rsi),%rdx
  4079f4:	48 8b 0e             	mov    (%rsi),%rcx
  4079f7:	48 89 57 04          	mov    %rdx,0x4(%rdi)
  4079fb:	48 89 0f             	mov    %rcx,(%rdi)
  4079fe:	c3                   	retq   
  4079ff:	90                   	nop
  407a00:	f2 0f f0 46 7b       	lddqu  0x7b(%rsi),%xmm0
  407a05:	f3 0f 7f 47 7b       	movdqu %xmm0,0x7b(%rdi)
  407a0a:	f2 0f f0 46 6b       	lddqu  0x6b(%rsi),%xmm0
  407a0f:	f3 0f 7f 47 6b       	movdqu %xmm0,0x6b(%rdi)
  407a14:	f2 0f f0 46 5b       	lddqu  0x5b(%rsi),%xmm0
  407a19:	f3 0f 7f 47 5b       	movdqu %xmm0,0x5b(%rdi)
  407a1e:	f2 0f f0 46 4b       	lddqu  0x4b(%rsi),%xmm0
  407a23:	f3 0f 7f 47 4b       	movdqu %xmm0,0x4b(%rdi)
  407a28:	f2 0f f0 46 3b       	lddqu  0x3b(%rsi),%xmm0
  407a2d:	f3 0f 7f 47 3b       	movdqu %xmm0,0x3b(%rdi)
  407a32:	f2 0f f0 46 2b       	lddqu  0x2b(%rsi),%xmm0
  407a37:	f3 0f 7f 47 2b       	movdqu %xmm0,0x2b(%rdi)
  407a3c:	f2 0f f0 46 1b       	lddqu  0x1b(%rsi),%xmm0
  407a41:	f3 0f 7f 47 1b       	movdqu %xmm0,0x1b(%rdi)
  407a46:	f2 0f f0 46 0b       	lddqu  0xb(%rsi),%xmm0
  407a4b:	f2 0f f0 0e          	lddqu  (%rsi),%xmm1
  407a4f:	f3 0f 7f 47 0b       	movdqu %xmm0,0xb(%rdi)
  407a54:	f3 0f 7f 0f          	movdqu %xmm1,(%rdi)
  407a58:	c3                   	retq   
  407a59:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  407a60:	48 8b 56 03          	mov    0x3(%rsi),%rdx
  407a64:	48 8b 0e             	mov    (%rsi),%rcx
  407a67:	48 89 57 03          	mov    %rdx,0x3(%rdi)
  407a6b:	48 89 0f             	mov    %rcx,(%rdi)
  407a6e:	c3                   	retq   
  407a6f:	90                   	nop
  407a70:	f2 0f f0 46 7a       	lddqu  0x7a(%rsi),%xmm0
  407a75:	f3 0f 7f 47 7a       	movdqu %xmm0,0x7a(%rdi)
  407a7a:	f2 0f f0 46 6a       	lddqu  0x6a(%rsi),%xmm0
  407a7f:	f3 0f 7f 47 6a       	movdqu %xmm0,0x6a(%rdi)
  407a84:	f2 0f f0 46 5a       	lddqu  0x5a(%rsi),%xmm0
  407a89:	f3 0f 7f 47 5a       	movdqu %xmm0,0x5a(%rdi)
  407a8e:	f2 0f f0 46 4a       	lddqu  0x4a(%rsi),%xmm0
  407a93:	f3 0f 7f 47 4a       	movdqu %xmm0,0x4a(%rdi)
  407a98:	f2 0f f0 46 3a       	lddqu  0x3a(%rsi),%xmm0
  407a9d:	f3 0f 7f 47 3a       	movdqu %xmm0,0x3a(%rdi)
  407aa2:	f2 0f f0 46 2a       	lddqu  0x2a(%rsi),%xmm0
  407aa7:	f3 0f 7f 47 2a       	movdqu %xmm0,0x2a(%rdi)
  407aac:	f2 0f f0 46 1a       	lddqu  0x1a(%rsi),%xmm0
  407ab1:	f3 0f 7f 47 1a       	movdqu %xmm0,0x1a(%rdi)
  407ab6:	f2 0f f0 46 0a       	lddqu  0xa(%rsi),%xmm0
  407abb:	f2 0f f0 0e          	lddqu  (%rsi),%xmm1
  407abf:	f3 0f 7f 47 0a       	movdqu %xmm0,0xa(%rdi)
  407ac4:	f3 0f 7f 0f          	movdqu %xmm1,(%rdi)
  407ac8:	c3                   	retq   
  407ac9:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  407ad0:	48 8b 56 02          	mov    0x2(%rsi),%rdx
  407ad4:	48 8b 0e             	mov    (%rsi),%rcx
  407ad7:	48 89 57 02          	mov    %rdx,0x2(%rdi)
  407adb:	48 89 0f             	mov    %rcx,(%rdi)
  407ade:	c3                   	retq   
  407adf:	90                   	nop
  407ae0:	f2 0f f0 46 79       	lddqu  0x79(%rsi),%xmm0
  407ae5:	f3 0f 7f 47 79       	movdqu %xmm0,0x79(%rdi)
  407aea:	f2 0f f0 46 69       	lddqu  0x69(%rsi),%xmm0
  407aef:	f3 0f 7f 47 69       	movdqu %xmm0,0x69(%rdi)
  407af4:	f2 0f f0 46 59       	lddqu  0x59(%rsi),%xmm0
  407af9:	f3 0f 7f 47 59       	movdqu %xmm0,0x59(%rdi)
  407afe:	f2 0f f0 46 49       	lddqu  0x49(%rsi),%xmm0
  407b03:	f3 0f 7f 47 49       	movdqu %xmm0,0x49(%rdi)
  407b08:	f2 0f f0 46 39       	lddqu  0x39(%rsi),%xmm0
  407b0d:	f3 0f 7f 47 39       	movdqu %xmm0,0x39(%rdi)
  407b12:	f2 0f f0 46 29       	lddqu  0x29(%rsi),%xmm0
  407b17:	f3 0f 7f 47 29       	movdqu %xmm0,0x29(%rdi)
  407b1c:	f2 0f f0 46 19       	lddqu  0x19(%rsi),%xmm0
  407b21:	f3 0f 7f 47 19       	movdqu %xmm0,0x19(%rdi)
  407b26:	f2 0f f0 46 09       	lddqu  0x9(%rsi),%xmm0
  407b2b:	f2 0f f0 0e          	lddqu  (%rsi),%xmm1
  407b2f:	f3 0f 7f 47 09       	movdqu %xmm0,0x9(%rdi)
  407b34:	f3 0f 7f 0f          	movdqu %xmm1,(%rdi)
  407b38:	c3                   	retq   
  407b39:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  407b40:	48 8b 56 01          	mov    0x1(%rsi),%rdx
  407b44:	48 8b 0e             	mov    (%rsi),%rcx
  407b47:	48 89 57 01          	mov    %rdx,0x1(%rdi)
  407b4b:	48 89 0f             	mov    %rcx,(%rdi)
  407b4e:	c3                   	retq   
  407b4f:	90                   	nop
  407b50:	f2 0f f0 46 78       	lddqu  0x78(%rsi),%xmm0
  407b55:	f3 0f 7f 47 78       	movdqu %xmm0,0x78(%rdi)
  407b5a:	f2 0f f0 46 68       	lddqu  0x68(%rsi),%xmm0
  407b5f:	f3 0f 7f 47 68       	movdqu %xmm0,0x68(%rdi)
  407b64:	f2 0f f0 46 58       	lddqu  0x58(%rsi),%xmm0
  407b69:	f3 0f 7f 47 58       	movdqu %xmm0,0x58(%rdi)
  407b6e:	f2 0f f0 46 48       	lddqu  0x48(%rsi),%xmm0
  407b73:	f3 0f 7f 47 48       	movdqu %xmm0,0x48(%rdi)
  407b78:	f2 0f f0 46 38       	lddqu  0x38(%rsi),%xmm0
  407b7d:	f3 0f 7f 47 38       	movdqu %xmm0,0x38(%rdi)
  407b82:	f2 0f f0 46 28       	lddqu  0x28(%rsi),%xmm0
  407b87:	f3 0f 7f 47 28       	movdqu %xmm0,0x28(%rdi)
  407b8c:	f2 0f f0 46 18       	lddqu  0x18(%rsi),%xmm0
  407b91:	f3 0f 7f 47 18       	movdqu %xmm0,0x18(%rdi)
  407b96:	f2 0f f0 46 08       	lddqu  0x8(%rsi),%xmm0
  407b9b:	f2 0f f0 0e          	lddqu  (%rsi),%xmm1
  407b9f:	f3 0f 7f 47 08       	movdqu %xmm0,0x8(%rdi)
  407ba4:	f3 0f 7f 0f          	movdqu %xmm1,(%rdi)
  407ba8:	c3                   	retq   
  407ba9:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  407bb0:	48 8b 16             	mov    (%rsi),%rdx
  407bb3:	48 89 17             	mov    %rdx,(%rdi)
  407bb6:	c3                   	retq   
  407bb7:	66 0f 1f 84 00 00 00 	nopw   0x0(%rax,%rax,1)
  407bbe:	00 00 
  407bc0:	f2 0f f0 46 77       	lddqu  0x77(%rsi),%xmm0
  407bc5:	f3 0f 7f 47 77       	movdqu %xmm0,0x77(%rdi)
  407bca:	f2 0f f0 46 67       	lddqu  0x67(%rsi),%xmm0
  407bcf:	f3 0f 7f 47 67       	movdqu %xmm0,0x67(%rdi)
  407bd4:	f2 0f f0 46 57       	lddqu  0x57(%rsi),%xmm0
  407bd9:	f3 0f 7f 47 57       	movdqu %xmm0,0x57(%rdi)
  407bde:	f2 0f f0 46 47       	lddqu  0x47(%rsi),%xmm0
  407be3:	f3 0f 7f 47 47       	movdqu %xmm0,0x47(%rdi)
  407be8:	f2 0f f0 46 37       	lddqu  0x37(%rsi),%xmm0
  407bed:	f3 0f 7f 47 37       	movdqu %xmm0,0x37(%rdi)
  407bf2:	f2 0f f0 46 27       	lddqu  0x27(%rsi),%xmm0
  407bf7:	f3 0f 7f 47 27       	movdqu %xmm0,0x27(%rdi)
  407bfc:	f2 0f f0 46 17       	lddqu  0x17(%rsi),%xmm0
  407c01:	f3 0f 7f 47 17       	movdqu %xmm0,0x17(%rdi)
  407c06:	f2 0f f0 46 07       	lddqu  0x7(%rsi),%xmm0
  407c0b:	f2 0f f0 0e          	lddqu  (%rsi),%xmm1
  407c0f:	f3 0f 7f 47 07       	movdqu %xmm0,0x7(%rdi)
  407c14:	f3 0f 7f 0f          	movdqu %xmm1,(%rdi)
  407c18:	c3                   	retq   
  407c19:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  407c20:	8b 56 03             	mov    0x3(%rsi),%edx
  407c23:	8b 0e                	mov    (%rsi),%ecx
  407c25:	89 57 03             	mov    %edx,0x3(%rdi)
  407c28:	89 0f                	mov    %ecx,(%rdi)
  407c2a:	c3                   	retq   
  407c2b:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  407c30:	f2 0f f0 46 76       	lddqu  0x76(%rsi),%xmm0
  407c35:	f3 0f 7f 47 76       	movdqu %xmm0,0x76(%rdi)
  407c3a:	f2 0f f0 46 66       	lddqu  0x66(%rsi),%xmm0
  407c3f:	f3 0f 7f 47 66       	movdqu %xmm0,0x66(%rdi)
  407c44:	f2 0f f0 46 56       	lddqu  0x56(%rsi),%xmm0
  407c49:	f3 0f 7f 47 56       	movdqu %xmm0,0x56(%rdi)
  407c4e:	f2 0f f0 46 46       	lddqu  0x46(%rsi),%xmm0
  407c53:	f3 0f 7f 47 46       	movdqu %xmm0,0x46(%rdi)
  407c58:	f2 0f f0 46 36       	lddqu  0x36(%rsi),%xmm0
  407c5d:	f3 0f 7f 47 36       	movdqu %xmm0,0x36(%rdi)
  407c62:	f2 0f f0 46 26       	lddqu  0x26(%rsi),%xmm0
  407c67:	f3 0f 7f 47 26       	movdqu %xmm0,0x26(%rdi)
  407c6c:	f2 0f f0 46 16       	lddqu  0x16(%rsi),%xmm0
  407c71:	f3 0f 7f 47 16       	movdqu %xmm0,0x16(%rdi)
  407c76:	f2 0f f0 46 06       	lddqu  0x6(%rsi),%xmm0
  407c7b:	f2 0f f0 0e          	lddqu  (%rsi),%xmm1
  407c7f:	f3 0f 7f 47 06       	movdqu %xmm0,0x6(%rdi)
  407c84:	f3 0f 7f 0f          	movdqu %xmm1,(%rdi)
  407c88:	c3                   	retq   
  407c89:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  407c90:	8b 56 02             	mov    0x2(%rsi),%edx
  407c93:	8b 0e                	mov    (%rsi),%ecx
  407c95:	89 57 02             	mov    %edx,0x2(%rdi)
  407c98:	89 0f                	mov    %ecx,(%rdi)
  407c9a:	c3                   	retq   
  407c9b:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  407ca0:	f2 0f f0 46 75       	lddqu  0x75(%rsi),%xmm0
  407ca5:	f3 0f 7f 47 75       	movdqu %xmm0,0x75(%rdi)
  407caa:	f2 0f f0 46 65       	lddqu  0x65(%rsi),%xmm0
  407caf:	f3 0f 7f 47 65       	movdqu %xmm0,0x65(%rdi)
  407cb4:	f2 0f f0 46 55       	lddqu  0x55(%rsi),%xmm0
  407cb9:	f3 0f 7f 47 55       	movdqu %xmm0,0x55(%rdi)
  407cbe:	f2 0f f0 46 45       	lddqu  0x45(%rsi),%xmm0
  407cc3:	f3 0f 7f 47 45       	movdqu %xmm0,0x45(%rdi)
  407cc8:	f2 0f f0 46 35       	lddqu  0x35(%rsi),%xmm0
  407ccd:	f3 0f 7f 47 35       	movdqu %xmm0,0x35(%rdi)
  407cd2:	f2 0f f0 46 25       	lddqu  0x25(%rsi),%xmm0
  407cd7:	f3 0f 7f 47 25       	movdqu %xmm0,0x25(%rdi)
  407cdc:	f2 0f f0 46 15       	lddqu  0x15(%rsi),%xmm0
  407ce1:	f3 0f 7f 47 15       	movdqu %xmm0,0x15(%rdi)
  407ce6:	f2 0f f0 46 05       	lddqu  0x5(%rsi),%xmm0
  407ceb:	f2 0f f0 0e          	lddqu  (%rsi),%xmm1
  407cef:	f3 0f 7f 47 05       	movdqu %xmm0,0x5(%rdi)
  407cf4:	f3 0f 7f 0f          	movdqu %xmm1,(%rdi)
  407cf8:	c3                   	retq   
  407cf9:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  407d00:	8b 56 01             	mov    0x1(%rsi),%edx
  407d03:	8b 0e                	mov    (%rsi),%ecx
  407d05:	89 57 01             	mov    %edx,0x1(%rdi)
  407d08:	89 0f                	mov    %ecx,(%rdi)
  407d0a:	c3                   	retq   
  407d0b:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  407d10:	f2 0f f0 46 74       	lddqu  0x74(%rsi),%xmm0
  407d15:	f3 0f 7f 47 74       	movdqu %xmm0,0x74(%rdi)
  407d1a:	f2 0f f0 46 64       	lddqu  0x64(%rsi),%xmm0
  407d1f:	f3 0f 7f 47 64       	movdqu %xmm0,0x64(%rdi)
  407d24:	f2 0f f0 46 54       	lddqu  0x54(%rsi),%xmm0
  407d29:	f3 0f 7f 47 54       	movdqu %xmm0,0x54(%rdi)
  407d2e:	f2 0f f0 46 44       	lddqu  0x44(%rsi),%xmm0
  407d33:	f3 0f 7f 47 44       	movdqu %xmm0,0x44(%rdi)
  407d38:	f2 0f f0 46 34       	lddqu  0x34(%rsi),%xmm0
  407d3d:	f3 0f 7f 47 34       	movdqu %xmm0,0x34(%rdi)
  407d42:	f2 0f f0 46 24       	lddqu  0x24(%rsi),%xmm0
  407d47:	f3 0f 7f 47 24       	movdqu %xmm0,0x24(%rdi)
  407d4c:	f2 0f f0 46 14       	lddqu  0x14(%rsi),%xmm0
  407d51:	f3 0f 7f 47 14       	movdqu %xmm0,0x14(%rdi)
  407d56:	f2 0f f0 46 04       	lddqu  0x4(%rsi),%xmm0
  407d5b:	f2 0f f0 0e          	lddqu  (%rsi),%xmm1
  407d5f:	f3 0f 7f 47 04       	movdqu %xmm0,0x4(%rdi)
  407d64:	f3 0f 7f 0f          	movdqu %xmm1,(%rdi)
  407d68:	c3                   	retq   
  407d69:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  407d70:	8b 16                	mov    (%rsi),%edx
  407d72:	89 17                	mov    %edx,(%rdi)
  407d74:	c3                   	retq   
  407d75:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  407d7a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
  407d80:	f2 0f f0 46 73       	lddqu  0x73(%rsi),%xmm0
  407d85:	f3 0f 7f 47 73       	movdqu %xmm0,0x73(%rdi)
  407d8a:	f2 0f f0 46 63       	lddqu  0x63(%rsi),%xmm0
  407d8f:	f3 0f 7f 47 63       	movdqu %xmm0,0x63(%rdi)
  407d94:	f2 0f f0 46 53       	lddqu  0x53(%rsi),%xmm0
  407d99:	f3 0f 7f 47 53       	movdqu %xmm0,0x53(%rdi)
  407d9e:	f2 0f f0 46 43       	lddqu  0x43(%rsi),%xmm0
  407da3:	f3 0f 7f 47 43       	movdqu %xmm0,0x43(%rdi)
  407da8:	f2 0f f0 46 33       	lddqu  0x33(%rsi),%xmm0
  407dad:	f3 0f 7f 47 33       	movdqu %xmm0,0x33(%rdi)
  407db2:	f2 0f f0 46 23       	lddqu  0x23(%rsi),%xmm0
  407db7:	f3 0f 7f 47 23       	movdqu %xmm0,0x23(%rdi)
  407dbc:	f2 0f f0 46 13       	lddqu  0x13(%rsi),%xmm0
  407dc1:	f3 0f 7f 47 13       	movdqu %xmm0,0x13(%rdi)
  407dc6:	f2 0f f0 46 03       	lddqu  0x3(%rsi),%xmm0
  407dcb:	f2 0f f0 0e          	lddqu  (%rsi),%xmm1
  407dcf:	f3 0f 7f 47 03       	movdqu %xmm0,0x3(%rdi)
  407dd4:	f3 0f 7f 0f          	movdqu %xmm1,(%rdi)
  407dd8:	c3                   	retq   
  407dd9:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  407de0:	66 8b 56 01          	mov    0x1(%rsi),%dx
  407de4:	66 8b 0e             	mov    (%rsi),%cx
  407de7:	66 89 57 01          	mov    %dx,0x1(%rdi)
  407deb:	66 89 0f             	mov    %cx,(%rdi)
  407dee:	c3                   	retq   
  407def:	90                   	nop
  407df0:	f2 0f f0 46 72       	lddqu  0x72(%rsi),%xmm0
  407df5:	f3 0f 7f 47 72       	movdqu %xmm0,0x72(%rdi)
  407dfa:	f2 0f f0 46 62       	lddqu  0x62(%rsi),%xmm0
  407dff:	f3 0f 7f 47 62       	movdqu %xmm0,0x62(%rdi)
  407e04:	f2 0f f0 46 52       	lddqu  0x52(%rsi),%xmm0
  407e09:	f3 0f 7f 47 52       	movdqu %xmm0,0x52(%rdi)
  407e0e:	f2 0f f0 46 42       	lddqu  0x42(%rsi),%xmm0
  407e13:	f3 0f 7f 47 42       	movdqu %xmm0,0x42(%rdi)
  407e18:	f2 0f f0 46 32       	lddqu  0x32(%rsi),%xmm0
  407e1d:	f3 0f 7f 47 32       	movdqu %xmm0,0x32(%rdi)
  407e22:	f2 0f f0 46 22       	lddqu  0x22(%rsi),%xmm0
  407e27:	f3 0f 7f 47 22       	movdqu %xmm0,0x22(%rdi)
  407e2c:	f2 0f f0 46 12       	lddqu  0x12(%rsi),%xmm0
  407e31:	f3 0f 7f 47 12       	movdqu %xmm0,0x12(%rdi)
  407e36:	f2 0f f0 46 02       	lddqu  0x2(%rsi),%xmm0
  407e3b:	f2 0f f0 0e          	lddqu  (%rsi),%xmm1
  407e3f:	f3 0f 7f 47 02       	movdqu %xmm0,0x2(%rdi)
  407e44:	f3 0f 7f 0f          	movdqu %xmm1,(%rdi)
  407e48:	c3                   	retq   
  407e49:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  407e50:	0f b7 16             	movzwl (%rsi),%edx
  407e53:	66 89 17             	mov    %dx,(%rdi)
  407e56:	c3                   	retq   
  407e57:	66 0f 1f 84 00 00 00 	nopw   0x0(%rax,%rax,1)
  407e5e:	00 00 
  407e60:	f2 0f f0 46 71       	lddqu  0x71(%rsi),%xmm0
  407e65:	f3 0f 7f 47 71       	movdqu %xmm0,0x71(%rdi)
  407e6a:	f2 0f f0 46 61       	lddqu  0x61(%rsi),%xmm0
  407e6f:	f3 0f 7f 47 61       	movdqu %xmm0,0x61(%rdi)
  407e74:	f2 0f f0 46 51       	lddqu  0x51(%rsi),%xmm0
  407e79:	f3 0f 7f 47 51       	movdqu %xmm0,0x51(%rdi)
  407e7e:	f2 0f f0 46 41       	lddqu  0x41(%rsi),%xmm0
  407e83:	f3 0f 7f 47 41       	movdqu %xmm0,0x41(%rdi)
  407e88:	f2 0f f0 46 31       	lddqu  0x31(%rsi),%xmm0
  407e8d:	f3 0f 7f 47 31       	movdqu %xmm0,0x31(%rdi)
  407e92:	f2 0f f0 46 21       	lddqu  0x21(%rsi),%xmm0
  407e97:	f3 0f 7f 47 21       	movdqu %xmm0,0x21(%rdi)
  407e9c:	f2 0f f0 46 11       	lddqu  0x11(%rsi),%xmm0
  407ea1:	f3 0f 7f 47 11       	movdqu %xmm0,0x11(%rdi)
  407ea6:	f2 0f f0 46 01       	lddqu  0x1(%rsi),%xmm0
  407eab:	f2 0f f0 0e          	lddqu  (%rsi),%xmm1
  407eaf:	f3 0f 7f 47 01       	movdqu %xmm0,0x1(%rdi)
  407eb4:	f3 0f 7f 0f          	movdqu %xmm1,(%rdi)
  407eb8:	c3                   	retq   
  407eb9:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  407ec0:	0f b6 16             	movzbl (%rsi),%edx
  407ec3:	88 17                	mov    %dl,(%rdi)
  407ec5:	c3                   	retq   
  407ec6:	90                   	nop
  407ec7:	90                   	nop
  407ec8:	90                   	nop
  407ec9:	90                   	nop
  407eca:	90                   	nop
  407ecb:	90                   	nop
  407ecc:	90                   	nop
  407ecd:	90                   	nop
  407ece:	90                   	nop
  407ecf:	90                   	nop</description>
    <pubDate>Wed, 19 Dec 2012 19:53:52 GMT</pubDate>
    <dc:creator>zhengda1936</dc:creator>
    <dc:date>2012-12-19T19:53:52Z</dc:date>
    <item>
      <title>The best method for inter-processor data communication</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975772#M5584</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;I am measuring memory-copy performance on a NUMA machine with 4&amp;nbsp;Xeon(R) CPU E5-4620 processors. When I copy data in the local memory, I can get up to almost 10GB/s. However, when I copy data from remote memory, I get much worse performance, only around 1GB/s. I use memcpy() to copy the data, and each copy is one page (4KB).&lt;/P&gt;
&lt;P&gt;I wonder if Intel processors provide special instructions for inter-processor data movement. I know Intel uses QPI for inter-processor communication. Does it expose an interface to programmers? Is the performance above the best I can get?&lt;/P&gt;
&lt;P&gt;Thanks,&lt;BR /&gt;Da&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 14 Dec 2012 21:01:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975772#M5584</guid>
      <dc:creator>zhengda1936</dc:creator>
      <dc:date>2012-12-14T21:01:22Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;...I wonder if Intel</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975773#M5585</link>
      <description>&amp;gt;&amp;gt;...I wonder if Intel processors provide special instructions for inter-processor data movement...

Please take a look at the Instruction Set Reference at: www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html

&amp;gt;&amp;gt;...I get much worse performance, only around 1GB/s...

Access to &lt;STRONG&gt;local&lt;/STRONG&gt; memory is always faster; however, a 10x drop in performance is significant. Is the drop really that big when accessing &lt;STRONG&gt;foreign&lt;/STRONG&gt; memory?</description>
      <pubDate>Sat, 15 Dec 2012 07:21:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975773#M5585</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2012-12-15T07:21:19Z</dc:date>
    </item>
    <item>
      <title>The best option is to</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975774#M5586</link>
      <description>The best option is to disassemble the memcpy() function and look at its machine-code implementation. The rep prefix combined with the movsd instruction is used to copy memory in large quantities.
I think that inter-processor communication at the lowest level is managed by the hardware itself. I do not know whether any programming interface is exposed that lets the programmer manage and control inter-processor communication.
As for documentation, I recommend reading the Intel chipset documentation, which probably contains some information about inter-processor communication.</description>
      <pubDate>Sat, 15 Dec 2012 08:39:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975774#M5586</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2012-12-15T08:39:23Z</dc:date>
    </item>
    <item>
      <title>It may be worthwhile to check</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975775#M5587</link>
      <description>It may be worthwhile to check your memcpy() version and your data alignments. At one time, the __intel_fast_memcpy() substitution made by Intel compilers could be a great help. Any up-to-date memcpy() ought to take advantage of SIMD nontemporal instructions in cases where alignment is compatible. rep movsd should be used only where alignment requires it. The memcpy() supplied by early 64-bit Linux distros was extremely poor. You might want to experiment with 16-, 32-, and 64-byte alignments for both source and destination.
corei7-2 CPUs were supposed to be designed to improve the performance of rep mov loops such as 32-bit gcc might create, but there would still be an advantage in setting alignment so as to use SIMD instructions. Some past CPUs performed poorly with rep mov loops.
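As a rough illustration of the alignment experiment suggested above, here is a minimal C sketch. It is only a sketch under stated assumptions: is_aligned, time_copies and PAGE_BYTES are names made up for this example, the buffers are ordinary local allocations with no NUMA placement, and clock() gives only coarse CPU-time resolution.

```c
/* Hypothetical helper names; quoted includes fall back to the system headers. */
#include "stdint.h"
#include "stdlib.h"
#include "string.h"
#include "time.h"

#define PAGE_BYTES 4096   /* copy size: one page, as in the original test */

/* Nonzero when p is a multiple of align bytes. */
int is_aligned(const void *p, size_t align)
{
    return (uintptr_t) p % align == 0;
}

/* Time reps page-sized copies whose source and destination both start
 * align bytes past a page boundary, so they are aligned to align but not
 * to 2 * align. Assumes align is at most PAGE_BYTES and reps is at least 1.
 * Returns CPU seconds, or -1.0 if allocation or verification fails. */
double time_copies(size_t align, long reps)
{
    char *src_base = aligned_alloc(PAGE_BYTES, 2 * PAGE_BYTES);
    char *dst_base = aligned_alloc(PAGE_BYTES, 2 * PAGE_BYTES);

    if (src_base == NULL || dst_base == NULL) {
        free(src_base);
        free(dst_base);
        return -1.0;
    }

    char *src = src_base + align;
    char *dst = dst_base + align;
    memset(src, 0x5a, PAGE_BYTES);
    memset(dst, 0, PAGE_BYTES);

    clock_t start = clock();
    for (long r = 0; r != reps; r++) {
        src[0] = (char) r;              /* vary the data between copies */
        memcpy(dst, src, PAGE_BYTES);
    }
    double secs = (double) (clock() - start) / CLOCKS_PER_SEC;

    if (memcmp(dst, src, PAGE_BYTES) != 0)
        secs = -1.0;                    /* the copy produced wrong data */

    free(src_base);
    free(dst_base);
    return secs;
}
```

Calling time_copies(16, n), time_copies(32, n) and time_copies(64, n) with the same large n and comparing the results would show whether alignment matters for a given memcpy(). To reproduce the remote-memory case, the buffers would also have to be placed on another NUMA node, which this sketch does not attempt.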
In connection with illyapolak's remark, it would be interesting to use a profiler such as VTune or oprofile to show which instructions are actually used in your slow case.
I'm not sure what might cause a slowdown like the one you report on that platform; more than a 2x penalty for remote memory would be disappointing. You should check whether the RAM is compatible and properly distributed among the slots.</description>
      <pubDate>Sun, 16 Dec 2012 00:41:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975775#M5587</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2012-12-16T00:41:40Z</dc:date>
    </item>
    <item>
      <title>Sorry for a question not</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975776#M5588</link>
      <description>Sorry for a question not related to the subject.

&amp;gt;&amp;gt;...NUMA machine with 4 Xeon(R) CPU E5-4620 processors...

Are these NUMA computers expensive? How much would the cheapest computer that supports the NUMA architecture cost? Thanks in advance.

Note: I'm asking because I couldn't find an answer on the web.</description>
      <pubDate>Sun, 16 Dec 2012 02:18:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975776#M5588</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2012-12-16T02:18:00Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;...NUMA machine with 4 Xeon</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975777#M5589</link>
      <description>&amp;gt;&amp;gt;...NUMA machine with 4 Xeon(R) CPU E5-4620 processors...&amp;gt;&amp;gt;&amp;gt;
This one is not an Intel-based chipset, but you can work out the price of the motherboard and the CPUs.
Please follow this link: http://www.tyan.com/product_SKU_spec.aspx?ProductType=MB&amp;amp;pid=670&amp;amp;SKU=600000180
And this link: http://www.pcsuperstore.com/products/11113480-Tyan-S8812WGM3NR.html

For Intel-based chipset motherboards, please follow these links: http://www.supermicro.com/products/motherboard/Xeon/C600/X9QR7-TF.cfm
&lt;A href="http://www.alvio.com/xABK_PID1237628_supermicro-computer_mbd-x9qr7-tf-o_romley-quad-socket-sas2-ipmi-20-retail_amd-socket-f-1207-motherboards.html" target="_blank"&gt;http://www.alvio.com/xABK_PID1237628_supermicro-computer_mbd-x9qr7-tf-o_romley-quad-socket-sas2-ipmi-20-retail_amd-socket-f-1207-motherboards.html&lt;/A&gt;
A whole system can easily reach $2500-3000.
For the complete solutions follow this link:
&lt;A href="https://community.intel.com/www.supermicro.com/xeon_mp/http://www.alvio.com/xABK_PID1237628_supermicro-computer_mbd-x9qr7-tf-o_romley-quad-socket-sas2-ipmi-20-retail_amd-socket-f-1207-motherboards.html" target="_blank"&gt;www.supermicro.com/xeon_mp/http://www.alvio.com/xABK_PID1237628_supermicro-computer_mbd-x9qr7-tf-o_romley-quad-socket-sas2-ipmi-20-retail_amd-socket-f-1207-motherboards.html&lt;/A&gt;</description>
      <pubDate>Sun, 16 Dec 2012 07:28:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975777#M5589</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2012-12-16T07:28:00Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;&gt;You should check whether</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975778#M5590</link>
      <description>&amp;gt;&amp;gt;&amp;gt;You should check whether the RAM is compatible and properly distributed among the slots.&amp;gt;&amp;gt;&amp;gt;
The lower hardware layer, in the form of the chipset's memory controller or the on-die memory controller, should also be considered as a cause of the poor memory performance.
Regarding the internal implementation of memcpy(), it is highly probable that the compiler wrongly used rep movsb instead of the rep movsd instruction.</description>
      <pubDate>Sun, 16 Dec 2012 08:00:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975778#M5590</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2012-12-16T08:00:30Z</dc:date>
    </item>
    <item>
      <title>Thank you for all your</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975779#M5591</link>
      <description>Thank you for all your suggestions.

I started by checking the assembly generated for a small test program:
#include &amp;lt;string.h&amp;gt;

char src[4096];
char dest[4096];

int main()
{
	memcpy(dest, src, sizeof(dest));
}

gcc compiles the code into the following assembly code:
0000000000000000 &amp;lt;main&amp;gt;:
   0:	bf 00 00 00 00       	mov    $0x0,%edi
   5:	be 00 00 00 00       	mov    $0x0,%esi
   a:	b9 00 02 00 00       	mov    $0x200,%ecx
   f:	f3 48 a5             	rep movsq %ds:(%rsi),%es:(%rdi)
  12:	c3                   	retq 
The code is pretty straightforward and as expected.

The Intel compiler compiles it into:
00000000004005c0 &amp;lt;main&amp;gt;:
  4005c0:       55                      push   %rbp
  4005c1:       48 89 e5                mov    %rsp,%rbp
  4005c4:       48 83 e4 80             and    $0xffffffffffffff80,%rsp
  4005c8:       48 81 ec 80 00 00 00    sub    $0x80,%rsp
  4005cf:       bf 03 00 00 00          mov    $0x3,%edi
  4005d4:       e8 c7 00 00 00          callq  4006a0 &amp;lt;__intel_new_proc_init&amp;gt;
  4005d9:       0f ae 1c 24             stmxcsr (%rsp)
  4005dd:       bf 00 9b 60 00          mov    $0x609b00,%edi
  4005e2:       be 00 ab 60 00          mov    $0x60ab00,%esi
  4005e7:       81 0c 24 40 80 00 00    orl    $0x8040,(%rsp)
  4005ee:       ba 00 10 00 00          mov    $0x1000,%edx
  4005f3:       0f ae 14 24             ldmxcsr (%rsp)
  4005f7:       e8 54 00 00 00          callq  400650 &amp;lt;_intel_fast_memcpy&amp;gt;
  4005fc:       33 c0                   xor    %eax,%eax
  4005fe:       48 89 ec                mov    %rbp,%rsp
  400601:       5d                      pop    %rbp
  400602:       c3                      retq   
  400603:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  400608:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  40060f:       00
So the Intel compiler uses _intel_fast_memcpy.

From a performance perspective, the executable compiled by the Intel compiler isn't any faster than the one compiled by gcc. I tried using VTune to profile the compiled code, and it shows that _intel_fast_memcpy takes most of the time, but it doesn't show which instructions in _intel_fast_memcpy are time-consuming.</description>
      <pubDate>Mon, 17 Dec 2012 22:48:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975779#M5591</guid>
      <dc:creator>zhengda1936</dc:creator>
      <dc:date>2012-12-17T22:48:07Z</dc:date>
    </item>
    <item>
      <title>I know this question is more</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975780#M5592</link>
      <description>I know this question belongs more in another section, but how do I profile code in an external library? As I said, I can't see the instructions in _intel_fast_memcpy. I tried linking the C library into my program statically, but then I got this error:
$ amplxe-cl -collect hotspots ./rand-memcpy 1 8
Error: Binary file of the analysis target does not contain symbols required for profiling. See documentation for more details.
Error: Valid pthread_setcancelstate symbol is not found in the static binary of the analysis target.
So how do I profile _intel_fast_memcpy?

Thanks,
Da</description>
      <pubDate>Mon, 17 Dec 2012 22:58:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975780#M5592</guid>
      <dc:creator>zhengda1936</dc:creator>
      <dc:date>2012-12-17T22:58:45Z</dc:date>
    </item>
    <item>
      <title>I guess you're looking at 32</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975781#M5593</link>
      <description>I guess you're looking at 32-bit gcc, which appears to optimize for short or non-aligned copies.  If you use icc -static-intel, with long enough copies, you should be able to collect data to view __intel_fast_memcpy in assembly view.  It might be interesting to see whether the results change with alignment.</description>
      <pubDate>Tue, 18 Dec 2012 02:32:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975781#M5593</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2012-12-18T02:32:57Z</dc:date>
    </item>
    <item>
      <title>@zhengda1936</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975782#M5594</link>
      <description>@zhengda1936

Please follow this call instruction 4005f7: e8 54 00 00 00 callq 400650 &amp;lt;_intel_fast_memcpy&amp;gt;
It would be interesting to see the exact machine code implementation of that function.</description>
      <pubDate>Tue, 18 Dec 2012 06:13:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975782#M5594</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2012-12-18T06:13:53Z</dc:date>
    </item>
    <item>
      <title>Guys,</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975783#M5595</link>
      <description>Guys,

1. Please read the original post again, because I really don't understand these continued pushes to disassemble / debug the &lt;STRONG&gt;memcpy&lt;/STRONG&gt; function currently used in his tests.

2. The user is &lt;STRONG&gt;on a NUMA system&lt;/STRONG&gt; and this is a "different world" ( I don't have access to any such system at the moment )

The user clearly described that:
...
&lt;STRONG&gt;When I copy data in the local memory&lt;/STRONG&gt;, &lt;STRONG&gt;I can get&lt;/STRONG&gt; up to almost &lt;STRONG&gt;10GB/s&lt;/STRONG&gt;. However, &lt;STRONG&gt;when I copy data !!! from !!! remote memory&lt;/STRONG&gt;, &lt;STRONG&gt;I get&lt;/STRONG&gt; much worse performance, only around &lt;STRONG&gt;1GB/s&lt;/STRONG&gt;
...

He uses the &lt;STRONG&gt;same memcpy&lt;/STRONG&gt; function &lt;STRONG&gt;in both cases&lt;/STRONG&gt; and is possibly experiencing some hardware issue ( I could be wrong here ), and that has to be taken into account. When he reads data from the remote memory, it looks like he simply switches the Source and Destination pointers in the &lt;STRONG&gt;same memcpy&lt;/STRONG&gt; function.

A question to &lt;STRONG&gt;zhengda1936&lt;/STRONG&gt;,

Could you post the C/C++ source code of your test case, please?</description>
      <pubDate>Tue, 18 Dec 2012 15:01:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975783#M5595</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2012-12-18T15:01:10Z</dc:date>
    </item>
    <item>
      <title>Quote:TimP (Intel) wrote:</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975784#M5596</link>
      <description>&lt;BLOCKQUOTE&gt;TimP (Intel) wrote:&lt;BR /&gt;&lt;P&gt;I guess you're looking at 32-bit gcc, which appears to optimize for short or non-aligned copies.  If you use icc -static-intel, with long enough copies, you should be able to collect data to view __intel_fast_memcpy in assembly view.  It might be interesting to see whether the results change with alignment.&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
No, I run my program on 64-bit Linux, and I have switched to the Intel compiler.

I have tried -static-intel, but it seems it doesn't link the Intel library statically. With or without -static-intel, the compiled executable is exactly the same (I ran cmp on the two versions of the executable). If I add -static as a linker option, I get the same error when I run the executable under VTune.

BTW, I copy 1GB of memory, and the memory is page-aligned.

Da</description>
      <pubDate>Tue, 18 Dec 2012 20:16:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975784#M5596</guid>
      <dc:creator>zhengda1936</dc:creator>
      <dc:date>2012-12-18T20:16:16Z</dc:date>
    </item>
    <item>
      <title>If it helps, I post my code</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975785#M5597</link>
      <description>If it helps, here is my code. Basically, it allocates 1GB of page-aligned memory on a NUMA node and tries to copy memory to a specified NUMA node.

#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;
#include &amp;lt;sys/types.h&amp;gt;
#include &amp;lt;sys/time.h&amp;gt;
#include &amp;lt;fcntl.h&amp;gt;
#include &amp;lt;sys/resource.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;
#include &amp;lt;string.h&amp;gt;
#include &amp;lt;pthread.h&amp;gt;
#include &amp;lt;numa.h&amp;gt;
#include &amp;lt;numaif.h&amp;gt;

#define NUM_THREADS 64
#define PAGE_SIZE 4096
#define ENTRY_SIZE PAGE_SIZE
#define ARRAY_SIZE 1073741824

off_t *offset;
unsigned int nentries;
int nthreads;
struct timeval global_start;
char *array;
char *dst_arr;

void permute_offset(off_t *offset, int num)
{
	int i;
	for (i = num - 1; i &amp;gt;= 1; i--) {
		int j = random() % i;
		off_t tmp = offset[j];
		offset[j] = offset[i];
		offset[i] = tmp;
	}
}

float time_diff(struct timeval time1, struct timeval time2)
{
	return time2.tv_sec - time1.tv_sec
			+ ((float)(time2.tv_usec - time1.tv_usec))/1000000;
}

/* thread entry point: pthread_create expects a function returning void *. */
void *rand_read(void *arg)
{
	int fd;
	ssize_t ret;
	int i, j, start_i, end_i;
	ssize_t read_bytes = 0;
	struct timeval start_time, end_time;

	start_i = (long) arg;
	end_i = start_i + nentries / nthreads;
	gettimeofday(&amp;amp;start_time, NULL);
	for (j = 0; j &amp;lt; 8; j++) {
		for (i = start_i; i &amp;lt; end_i; i++) {
			memcpy(dst_arr + offset[i], array + offset[i], ENTRY_SIZE);
			read_bytes += ENTRY_SIZE;
		}
	}
	gettimeofday(&amp;amp;end_time, NULL);
	printf("read %ld bytes, start at %f seconds, takes %f seconds\n",
			read_bytes, time_diff(global_start, start_time),
			time_diff(start_time, end_time));
	
	pthread_exit((void *) read_bytes);
}

int main(int argc, char *argv[])
{
	int ret;
	int i;
	struct timeval start_time, end_time;
	ssize_t read_bytes = 0;
	pthread_t threads[NUM_THREADS];
	/* the number of entries the array can contain. */
	int node;

	if (argc != 3) {
		fprintf(stderr, "read node_id num_threads\n");
		exit(1);
	}

	nentries = ARRAY_SIZE / ENTRY_SIZE;
	node = atoi(argv[1]);
	offset = valloc(sizeof(*offset) * nentries);
	for(i = 0; i &amp;lt; nentries; i++) {
		offset[i] = ((off_t) i) * ENTRY_SIZE;
	}
	permute_offset(offset, nentries);

#if 0
	int ncpus = numa_num_configured_cpus();
	printf("there are %d cores in the machine\n", ncpus);
	for (i = 0; i &amp;lt; ncpus; i++) {
		printf("cpu %d belongs to node %d\n",
			i, numa_node_of_cpu(i));
	}
#endif
	/* bind to node 0. */
	nodemask_t nodemask;
	nodemask_zero(&amp;amp;nodemask);
	nodemask_set_compat(&amp;amp;nodemask, 0);
	unsigned long maxnode = NUMA_NUM_NODES;
	if (set_mempolicy(MPOL_BIND,
				(unsigned long *) &amp;amp;nodemask, maxnode) &amp;lt; 0) {
		perror("set_mempolicy");
		exit(1);
	}
	printf("run on node 0\n");
	if (numa_run_on_node(0) &amp;lt; 0) {
		perror("numa_run_on_node");
		exit(1);
	}

	array = valloc(ARRAY_SIZE);
	/* we need to avoid the cost of page fault. */
	for (i = 0; i &amp;lt; ARRAY_SIZE; i += PAGE_SIZE)
		array[i] = 0;
	dst_arr = valloc(ARRAY_SIZE);
	/* we need to avoid the cost of page fault. */
	for (i = 0; i &amp;lt; ARRAY_SIZE; i += PAGE_SIZE)
		dst_arr[i] = 0;

	printf("run on node %d\n", node);
	if (numa_run_on_node(node) &amp;lt; 0) {
		perror("numa_run_on_node");
		exit(1);
	}

	nthreads = atoi(argv[2]);
	if (nthreads &amp;gt; NUM_THREADS) {
		fprintf(stderr, "too many threads\n");
		exit(1);
	}

	ret = setpriority(PRIO_PROCESS, getpid(), -20);
	if (ret &amp;lt; 0) {
		perror("setpriority");
		exit(1);
	}

	gettimeofday(&amp;amp;start_time, NULL);
	global_start = start_time;
	for (i = 0; i &amp;lt; nthreads; i++) {
		ret = pthread_create(&amp;amp;threads[i], NULL,
				rand_read, (void *) (long) (nentries / nthreads * i));
		if (ret) {
			perror("pthread_create");
			exit(1);
		}
	}

	for (i = 0; i &amp;lt; nthreads; i++) {
		ssize_t size;
		ret = pthread_join(threads[i], (void **) &amp;amp;size);
		if (ret) {
			perror("pthread_join");
			exit(1);
		}
		read_bytes += size;
	}
	gettimeofday(&amp;amp;end_time, NULL);
	printf("read %ld bytes, takes %f seconds\n",
			read_bytes, end_time.tv_sec - start_time.tv_sec
			+ ((float)(end_time.tv_usec - start_time.tv_usec))/1000000);
}</description>
      <pubDate>Tue, 18 Dec 2012 20:25:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975785#M5597</guid>
      <dc:creator>zhengda1936</dc:creator>
      <dc:date>2012-12-18T20:25:08Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;&gt;The user is on a NUMA</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975786#M5598</link>
      <description>&amp;gt;&amp;gt;&amp;gt;The user is on a NUMA system and this is a "different world" ( I don't have access to any such system at the moment )&amp;gt;&amp;gt;&amp;gt;

It would be very helpful if @zhengda1936 could post his hardware configuration. I'm sure that he has a quad-CPU motherboard, probably manufactured by TYAN or SuperMicro.

&amp;gt;&amp;gt;&amp;gt; Please take a look / read the original post again because I really don't understand these continued pushes to disassemble / debug a memcpy function currently used in his tests&amp;gt;&amp;gt;&amp;gt;

I think that disassembling the memcpy() function and revealing its exact machine-code implementation could give us some insight into what is going on under the hood. As I stated earlier in my post, there is a possibility that the 'rep movsb' instruction is used by the compiler.
I do not exclude the possibility of some hardware-related issue.

&amp;gt;&amp;gt;&amp;gt;When I copy data in the local memory, I can get up to almost 10GB/s. However, when I copy data !!! from !!! remote memory, I get much worse performance, only around 1GB/s&amp;gt;&amp;gt;&amp;gt;

It is obvious that in a NUMA architecture one can expect memory transfer speed to degrade when, for example, CPU 0 accesses non-local (remote) memory from its relative "point of view".</description>
      <pubDate>Tue, 18 Dec 2012 20:37:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975786#M5598</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2012-12-18T20:37:55Z</dc:date>
    </item>
    <item>
      <title>Sure, I can provide the</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975787#M5599</link>
      <description>Sure, I can provide the hardware configuration. Here is the CPU info:
          description: CPU
          product: Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz
          vendor: Intel Corp.
          physical id: 400
          bus info: cpu@0
          version: Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz
          slot: CPU1
          size: 2200MHz
          capacity: 3600MHz
          width: 64 bits
          clock: 2905MHz

So each CPU has 8 cores and there are 4 CPUs in the machine.

Memory info:
             description: DIMM DDR3 Synchronous 1333 MHz (0.8 ns)
             product: M393B2G70BH0-YH9
             vendor: 00CE00B300CE
             physical id: 0
             serial: 342F3D9C
             slot: DIMM_A1
             size: 16GiB
             width: 64 bits
             clock: 1333MHz (0.8ns)

What other hardware configuration info should I provide to help you diagnose this?</description>
      <pubDate>Tue, 18 Dec 2012 20:50:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975787#M5599</guid>
      <dc:creator>zhengda1936</dc:creator>
      <dc:date>2012-12-18T20:50:09Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;&gt;So each CPU has 8 cores</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975788#M5600</link>
      <description>&amp;gt;&amp;gt;&amp;gt;So each CPU has 8 cores and there are 4 CPUs in the machine.&amp;gt;&amp;gt;&amp;gt;

Do you mean 8 threads/4 cores per CPU?
Have you experienced earlier memory speed degradation?</description>
      <pubDate>Tue, 18 Dec 2012 21:04:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975788#M5600</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2012-12-18T21:04:00Z</dc:date>
    </item>
    <item>
      <title>As I show above, gcc compiled</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975789#M5601</link>
      <description>As I show above, gcc compiled memcpy into "rep movsq", and Intel compiler invokes __intel_fast_memcpy.
I used perf to profile my program, and it seems it eventually invokes __intel_ssse3_rep_memcpy and the most time-consuming instructions are:
    0.06 :          405657:       movaps -0x10(%rsi),%xmm1
   38.23 :          40565b:       movaps %xmm1,-0x10(%rdi)
    1.00 :          40565f:       movaps -0x20(%rsi),%xmm2
    0.06 :          405663:       movaps %xmm2,-0x20(%rdi)
    0.18 :          405667:       movaps -0x30(%rsi),%xmm3
    0.06 :          40566b:       movaps %xmm3,-0x30(%rdi)
    0.05 :          40566f:       movaps -0x40(%rsi),%xmm4
    0.01 :          405673:       movaps %xmm4,-0x40(%rdi)
    0.10 :          405677:       movaps -0x50(%rsi),%xmm5
   41.82 :          40567b:       movaps %xmm5,-0x50(%rdi)
    0.47 :          40567f:       movaps -0x60(%rsi),%xmm5
    0.03 :          405683:       movaps %xmm5,-0x60(%rdi)
    0.06 :          405687:       movaps -0x70(%rsi),%xmm5
    0.01 :          40568b:       movaps %xmm5,-0x70(%rdi)
    0.04 :          40568f:       movaps -0x80(%rsi),%xmm5
    0.01 :          405693:       movaps %xmm5,-0x80(%rdi)
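
As a rough check on the listing above, here is a small Python sketch of the arithmetic. The sample percentages and store offsets are copied from the perf output; the variable names are only illustrative. It picks the two hottest stores, measures how far apart they are, and counts how many bytes one pass through the unrolled loop covers.

```python
# (perf sample %, store offset below %rdi) for each movaps store in the listing.
stores = [
    (38.23, 0x10), (0.06, 0x20), (0.06, 0x30), (0.01, 0x40),
    (41.82, 0x50), (0.03, 0x60), (0.01, 0x70), (0.01, 0x80),
]

MOVAPS_BYTES = 16  # one movaps moves a 16-byte XMM register

# The two stores with the largest sample share.
hot = sorted(stores, key=lambda s: s[0], reverse=True)[:2]
hot_spacing = abs(hot[0][1] - hot[1][1])

# Bytes covered by one pass through the eight load/store pairs.
bytes_per_pass = len(stores) * MOVAPS_BYTES

print("spacing between hot stores:", hot_spacing, "bytes")
print("bytes copied per pass:", bytes_per_pass)
```

The two hot stores are 64 bytes apart, while one pass covers 128 bytes.
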
It seems that one copy instruction triggers a 64-byte transfer to the remote node, so only the first store in each 64-byte group accounts for most of the CPU time. I had expected one copy to trigger a 128-byte transfer, because I assumed the cache line was 128 bytes; the profile suggests it is 64 bytes.</description>
      <pubDate>Tue, 18 Dec 2012 21:43:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975789#M5601</guid>
      <dc:creator>zhengda1936</dc:creator>
      <dc:date>2012-12-18T21:43:35Z</dc:date>
    </item>
    <item>
      <title>Quote:iliyapolak wrote:</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975790#M5602</link>
      <description>&lt;BLOCKQUOTE&gt;iliyapolak wrote:&lt;BR /&gt;&lt;P&gt;&amp;gt;&amp;gt;&amp;gt;So each CPU has 8 cores and there are 4 CPUs in the machine.&amp;gt;&amp;gt;&amp;gt;&lt;/P&gt;
&lt;P&gt;Do you mean 8 threads/4 cores per CPU?&lt;BR /&gt;
Have you experienced earlier memory speed degradation?&lt;/P&gt;&lt;/BLOCKQUOTE&gt;

No, 16 threads/8 cores per CPU. 
What do you mean by earlier memory speed degradation?
The local memory copy speed is as expected.</description>
      <pubDate>Tue, 18 Dec 2012 21:46:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975790#M5602</guid>
      <dc:creator>zhengda1936</dc:creator>
      <dc:date>2012-12-18T21:46:48Z</dc:date>
    </item>
    <item>
      <title>Hi Da,</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975791#M5603</link>
      <description>Hi Da,

It is always a good idea to follow a &lt;STRONG&gt;top-down approach&lt;/STRONG&gt; when investigating a problem. That is:

- Source code -&amp;gt;
- Analysis -&amp;gt;
- Is there a hardware problem? -&amp;gt;
- Are there any logical errors in the code? -&amp;gt;
- Could I reproduce the problem? -&amp;gt;
- Could I simplify the test-case? -&amp;gt;
- Could I remove some dependencies on 3rd party software components? -&amp;gt;
- Why does my application crash ( if this is the case )? -&amp;gt;
- What else could be wrong with &lt;STRONG&gt;my code&lt;/STRONG&gt;?
- Etc.

This means that if a C/C++ developer investigates in the opposite direction, that is, follows a &lt;STRONG&gt;bottom-up approach&lt;/STRONG&gt; ( disassembling first, everything else later ), a significant amount of project time could be wasted.

From my point of view a &lt;STRONG&gt;Summary&lt;/STRONG&gt; of the problem could look like:

- Possible logical problem with the test-case ( very high possibility )
- Possible oversubscription of the processing threads ( high possibility )
- Possible hardware issue with the NUMA system ( very low possibility )
- Possible problem with CRT memcpy function ( low possibility )

A simplified test-case is needed &lt;STRONG&gt;without changing priorities&lt;/STRONG&gt; of any threads or of the process; ideally it would have just one thread of normal priority. This is needed to verify that the NUMA system doesn't have any hardware issues.

The logic of the simplified test-case could look like:

- single-threaded test application
- allocate a memory block in a 'local' memory
- copy some data ( some number of times to get an average time )
- invalidate cache lines somehow
- read some data ( some number of times to get an average time )
- save performance numbers
- allocate a memory block in a 'remote' memory
- copy some data ( some number of times to get an average time )
- invalidate cache lines somehow
- read some data ( some number of times to get an average time )
- save performance numbers
- compare results
- repeat the test with more threads ( increase by 2 every time ) until it reaches 64
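
The single-threaded part of the steps above could be sketched as follows. This is only a minimal Python sketch, not the original test-case. copy_bandwidth_gbs is a hypothetical helper. Memory placement is assumed to be controlled from outside the script ( for example with the numactl options --cpunodebind and --membind ). Cache invalidation is not done explicitly; the buffers are simply assumed to be much larger than the last-level cache.

```python
import time


def copy_bandwidth_gbs(size_bytes, repeats):
    """Return the average copy bandwidth in GB/s over `repeats` bulk copies."""
    # Fill the source with non-zero data so its pages are really allocated.
    src = bytearray(bytes([1]) * size_bytes)
    dst = bytearray(size_bytes)
    # Warm-up copy: touches every destination page before timing starts.
    dst[:] = src
    start = time.perf_counter()
    for _ in range(repeats):
        dst[:] = src  # same-length slice assignment is one bulk memory copy
    elapsed = time.perf_counter() - start
    return size_bytes * repeats / elapsed / 1e9


if __name__ == "__main__":
    # Assumption: 64 MiB per buffer is well above the last-level cache size.
    size = 64 * 2**20
    print("average copy bandwidth: %.2f GB/s" % copy_bandwidth_gbs(size, 5))
```

Running it once with memory bound to the local node and once with memory bound to a remote node gives the two numbers to compare ( copy_bench.py is a hypothetical file name ):

numactl --cpunodebind=0 --membind=0 python3 copy_bench.py
numactl --cpunodebind=0 --membind=1 python3 copy_bench.py
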

&lt;STRONG&gt;1.&lt;/STRONG&gt; After a very quick code review of the test-case I noticed that the priority of the executing process is changed:
...
setpriority( PRIO_PROCESS, getpid(), -20 );
...
Why do you change the priority of the process?

&lt;STRONG&gt;2.&lt;/STRONG&gt; In order to clear up &lt;STRONG&gt;any uncertainties&lt;/STRONG&gt; about the 'memcpy' function, I recommend replacing it with an external pure C function ( a couple of minutes to implement, right? )

&lt;STRONG&gt;3.&lt;/STRONG&gt; A Virtual Memory Manager ( VMM ) on any OS should have 'Above Normal' or 'High' priority. If the processing thread(s) in a test have higher priorities, then the VMM will be preempted most of the time and any memory operations using 'mem'-like CRT functions will be affected. There will also be a performance degradation of the whole operating system. If the processing thread(s) have lower priorities, like 'Below Normal' or 'Idle', then they will be preempted most of the time and the performance of the test will be affected.

&lt;STRONG&gt;4.&lt;/STRONG&gt; A brief high-level overview of the test-case will also help
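
Regarding points 1 and 3, a quick way to confirm which priority a test actually runs at, without changing it, is to read the nice value back. A minimal Python sketch, assuming a Unix-like OS where os.getpriority is available; current_nice is a hypothetical helper name:

```python
import os


def current_nice():
    """Return the nice value of the calling process without modifying it."""
    # A `who` argument of 0 means "the calling process" for PRIO_PROCESS.
    return os.getpriority(os.PRIO_PROCESS, 0)


if __name__ == "__main__":
    print("nice value of this process:", current_nice())
```

A value of 0 is the usual default; the setpriority call quoted in point 1 requests -20 instead.
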

Best regards,
Sergey</description>
      <pubDate>Tue, 18 Dec 2012 23:08:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/The-best-method-for-inter-processor-data-communication/m-p/975791#M5603</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2012-12-18T23:08:00Z</dc:date>
    </item>
  </channel>
</rss>

