<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic dcopy and MOVAPS (request for comments) in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dcopy-and-MOVAPS-request-for-comments/m-p/899484#M11151</link>
    <description>I've just run into some difficulty compiling and running a fortran code. After &lt;BR /&gt;a while I was able to reproduce the problem using the following toy code:&lt;BR /&gt;&lt;BR /&gt; program test&lt;BR /&gt;&lt;BR /&gt; implicit none&lt;BR /&gt;&lt;BR /&gt; integer :: maxorb&lt;BR /&gt; parameter(maxorb=1024)&lt;BR /&gt;&lt;BR /&gt; type :: t4_type&lt;BR /&gt; integer :: firstint&lt;BR /&gt; integer :: occ(maxorb)&lt;BR /&gt; end type t4_type&lt;BR /&gt;&lt;BR /&gt; type :: t4_type_2&lt;BR /&gt; integer :: firstint, secondint&lt;BR /&gt; integer :: occ(maxorb)&lt;BR /&gt; end type t4_type_2&lt;BR /&gt;&lt;BR /&gt; integer norb, i, occi(maxorb)&lt;BR /&gt;&lt;BR /&gt; type (t4_type) :: t4&lt;BR /&gt; type (t4_type_2) :: t42&lt;BR /&gt;&lt;BR /&gt; real*8 :: pop(maxorb)&lt;BR /&gt;&lt;BR /&gt; do i=1,maxorb&lt;BR /&gt; pop(i) = i&lt;BR /&gt; t4%occ(i) = 0&lt;BR /&gt; enddo&lt;BR /&gt;&lt;BR /&gt; norb = 250&lt;BR /&gt;&lt;BR /&gt; call testa (norb,pop,occi)&lt;BR /&gt; print *, "first"&lt;BR /&gt; call testa (norb,pop,t42%occ)&lt;BR /&gt; print *, "second"&lt;BR /&gt; call testa (norb,pop,t4%occ)&lt;BR /&gt; print *, "third"&lt;BR /&gt;&lt;BR /&gt; end&lt;BR /&gt;&lt;BR /&gt; subroutine testa (norb,pop,occ)&lt;BR /&gt;&lt;BR /&gt; implicit none&lt;BR /&gt;&lt;BR /&gt; integer norb&lt;BR /&gt; real*8 pop(1), occ&lt;BR /&gt;&lt;BR /&gt; call dcopy(norb,pop,1,occ,1)&lt;BR /&gt; &lt;BR /&gt; return&lt;BR /&gt; end&lt;BR /&gt;&lt;BR /&gt;I am copying 250 double into three different arrays of 1024 integer, so &lt;BR /&gt;it should be a "legitimate" operation. 
If I compile the code using &lt;BR /&gt;Intel ifort 9.1 20070109 (I am using mkl_8.1.1):&lt;BR /&gt;&lt;BR /&gt;$ ifort -check all -r8 -g -o test main.F90 -L/opt/intel/mkl/lib/em64t &lt;BR /&gt; -lmkl -lguide -lpthread&lt;BR /&gt;&lt;BR /&gt;I get a segmentation fault when running the "test" program on an Intel &lt;BR /&gt;Xeon CPU 5160 3.00GHz:&lt;BR /&gt;&lt;BR /&gt;$ ./test &lt;BR /&gt; first&lt;BR /&gt; second&lt;BR /&gt;forrtl: severe (174): SIGSEGV, segmentation fault occurred&lt;BR /&gt;Image PC Routine Line Source &lt;BR /&gt;libpthread.so.0 000000307930C430 Unknown Unknown Unknown&lt;BR /&gt;libmkl_mc.so 0000002A959B8CD5 Unknown Unknown Unknown&lt;BR /&gt;&lt;BR /&gt;The first two calls to testa() work perfectly, but the third does not.&lt;BR /&gt;Using gdb, it is quite easy to see that the problem occurs inside a &lt;BR /&gt;function named Steps1_X8_Y16_Loop32, and specifically here:&lt;BR /&gt;&lt;BR /&gt;0x0000002a959b8cd5 &amp;lt;Steps1_X8_Y16_Loop32&amp;gt;: movaps %xmm0,0x0(%rcx)&lt;BR /&gt;&lt;BR /&gt;...&lt;BR /&gt;&lt;BR /&gt;(gdb) print $rcx&lt;BR /&gt;$1 = 5684076&lt;BR /&gt;(gdb) x/2g $rcx&lt;BR /&gt;0x56bb6c &amp;lt;TEST_&amp;gt;: 0x0000000000000000 0x0000000000000000&lt;BR /&gt;&lt;BR /&gt;It seems to be a valid memory address, but it is not aligned on a 16-byte boundary as &lt;BR /&gt;required by the MOVAPS instruction. Is that correct? (The same code works on an &lt;BR /&gt;Intel Pentium D CPU 3.00GHz.) &lt;BR /&gt;&lt;BR /&gt;Thanks in advance for your help,&lt;BR /&gt;loriano&lt;BR /&gt;&lt;BR /&gt;</description>
    <pubDate>Thu, 19 Feb 2009 17:53:47 GMT</pubDate>
    <dc:creator>redo</dc:creator>
    <dc:date>2009-02-19T17:53:47Z</dc:date>
    <item>
      <title>dcopy and MOVAPS (request for comments)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dcopy-and-MOVAPS-request-for-comments/m-p/899484#M11151</link>
      <description>I've just run into some difficulty compiling and running a fortran code. After &lt;BR /&gt;a while I was able to reproduce the problem using the following toy code:&lt;BR /&gt;&lt;BR /&gt; program test&lt;BR /&gt;&lt;BR /&gt; implicit none&lt;BR /&gt;&lt;BR /&gt; integer :: maxorb&lt;BR /&gt; parameter(maxorb=1024)&lt;BR /&gt;&lt;BR /&gt; type :: t4_type&lt;BR /&gt; integer :: firstint&lt;BR /&gt; integer :: occ(maxorb)&lt;BR /&gt; end type t4_type&lt;BR /&gt;&lt;BR /&gt; type :: t4_type_2&lt;BR /&gt; integer :: firstint, secondint&lt;BR /&gt; integer :: occ(maxorb)&lt;BR /&gt; end type t4_type_2&lt;BR /&gt;&lt;BR /&gt; integer norb, i, occi(maxorb)&lt;BR /&gt;&lt;BR /&gt; type (t4_type) :: t4&lt;BR /&gt; type (t4_type_2) :: t42&lt;BR /&gt;&lt;BR /&gt; real*8 :: pop(maxorb)&lt;BR /&gt;&lt;BR /&gt; do i=1,maxorb&lt;BR /&gt; pop(i) = i&lt;BR /&gt; t4%occ(i) = 0&lt;BR /&gt; enddo&lt;BR /&gt;&lt;BR /&gt; norb = 250&lt;BR /&gt;&lt;BR /&gt; call testa (norb,pop,occi)&lt;BR /&gt; print *, "first"&lt;BR /&gt; call testa (norb,pop,t42%occ)&lt;BR /&gt; print *, "second"&lt;BR /&gt; call testa (norb,pop,t4%occ)&lt;BR /&gt; print *, "third"&lt;BR /&gt;&lt;BR /&gt; end&lt;BR /&gt;&lt;BR /&gt; subroutine testa (norb,pop,occ)&lt;BR /&gt;&lt;BR /&gt; implicit none&lt;BR /&gt;&lt;BR /&gt; integer norb&lt;BR /&gt; real*8 pop(1), occ&lt;BR /&gt;&lt;BR /&gt; call dcopy(norb,pop,1,occ,1)&lt;BR /&gt; &lt;BR /&gt; return&lt;BR /&gt; end&lt;BR /&gt;&lt;BR /&gt;I am copying 250 double into three different arrays of 1024 integer, so &lt;BR /&gt;it should be a "legitimate" operation. 
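&lt;BR /&gt;&lt;BR /&gt;As a rough check of the sizes and offsets involved in the toy code above, here is a minimal sketch in shell arithmetic. It assumes 4-byte default integers (the -r8 flag changes the default real kind, not integers), 8-byte real*8 values, and no padding inside the derived types; the variable names are illustrative only and do not come from the program.

```shell
#!/bin/sh
# Assumed sizes for this sketch: 4-byte default INTEGER, 8-byte REAL*8.
int_bytes=4
dbl_bytes=8
maxorb=1024
norb=250

# Bytes that dcopy writes versus bytes available in one occ array.
copied=$(( norb * dbl_bytes ))
available=$(( maxorb * int_bytes ))
echo "copied=$copied available=$available"

# Byte offset of occ inside each derived type, assuming no padding.
occ_offset_t4=$(( 1 * int_bytes ))    # one integer precedes occ in t4_type
occ_offset_t42=$(( 2 * int_bytes ))   # two integers precede occ in t4_type_2
echo "t4 occ offset mod 8: $(( occ_offset_t4 % dbl_bytes ))"
echo "t42 occ offset mod 8: $(( occ_offset_t42 % dbl_bytes ))"
```

Under these assumptions the copy fits (2000 bytes into 4096), but occ in t4_type starts 4 bytes past the start of the type, while occ in t4_type_2 starts 8 bytes past it. If the derived-type variables themselves begin on an 8-byte boundary (also an assumption), only the t4_type_2 member falls on a natural boundary for real*8 data.&lt;BR /&gt;&lt;BR /&gt;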
If I compile the code using &lt;BR /&gt;Intel ifort 9.1 20070109 (I am using mkl_8.1.1):&lt;BR /&gt;&lt;BR /&gt;$ ifort -check all -r8 -g -o test main.F90 -L/opt/intel/mkl/lib/em64t &lt;BR /&gt; -lmkl -lguide -lpthread&lt;BR /&gt;&lt;BR /&gt;I get a segmentation fault when running the "test" program on an Intel &lt;BR /&gt;Xeon CPU 5160 3.00GHz:&lt;BR /&gt;&lt;BR /&gt;$ ./test &lt;BR /&gt; first&lt;BR /&gt; second&lt;BR /&gt;forrtl: severe (174): SIGSEGV, segmentation fault occurred&lt;BR /&gt;Image PC Routine Line Source &lt;BR /&gt;libpthread.so.0 000000307930C430 Unknown Unknown Unknown&lt;BR /&gt;libmkl_mc.so 0000002A959B8CD5 Unknown Unknown Unknown&lt;BR /&gt;&lt;BR /&gt;The first two calls to testa() work perfectly, but the third does not.&lt;BR /&gt;Using gdb, it is quite easy to see that the problem occurs inside a &lt;BR /&gt;function named Steps1_X8_Y16_Loop32, and specifically here:&lt;BR /&gt;&lt;BR /&gt;0x0000002a959b8cd5 &amp;lt;Steps1_X8_Y16_Loop32&amp;gt;: movaps %xmm0,0x0(%rcx)&lt;BR /&gt;&lt;BR /&gt;...&lt;BR /&gt;&lt;BR /&gt;(gdb) print $rcx&lt;BR /&gt;$1 = 5684076&lt;BR /&gt;(gdb) x/2g $rcx&lt;BR /&gt;0x56bb6c &amp;lt;TEST_&amp;gt;: 0x0000000000000000 0x0000000000000000&lt;BR /&gt;&lt;BR /&gt;It seems to be a valid memory address, but it is not aligned on a 16-byte boundary as &lt;BR /&gt;required by the MOVAPS instruction. Is that correct? (The same code works on an &lt;BR /&gt;Intel Pentium D CPU 3.00GHz.) &lt;BR /&gt;&lt;BR /&gt;Thanks in advance for your help,&lt;BR /&gt;loriano&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 19 Feb 2009 17:53:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dcopy-and-MOVAPS-request-for-comments/m-p/899484#M11151</guid>
      <dc:creator>redo</dc:creator>
      <dc:date>2009-02-19T17:53:47Z</dc:date>
    </item>
    <item>
      <title>Re: dcopy and MOVAPS (request for comments)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dcopy-and-MOVAPS-request-for-comments/m-p/899485#M11152</link>
      <description>It's difficult to support such an old version of MKL. I thought that MKL dcopy would not be threaded, and that it should perform its own peeling to reach an aligned boundary before using movaps to store data. Unless a different number of threads explains it, it's hard to account for different behavior between CPUs as similar as yours.&lt;BR /&gt;My impression is that MKL dcopy is supported only for compatibility with existing source code, not for performance tuning, now that current Intel compilers make automatic fast_memcpy substitutions, so this seems to be a seldom-visited subject.&lt;BR /&gt;</description>
      <pubDate>Thu, 19 Feb 2009 18:43:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dcopy-and-MOVAPS-request-for-comments/m-p/899485#M11152</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2009-02-19T18:43:41Z</dc:date>
    </item>
    <item>
      <title>Re: dcopy and MOVAPS (request for comments)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dcopy-and-MOVAPS-request-for-comments/m-p/899486#M11153</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/367365"&gt;tim18&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt; It's difficult to support such an old version of MKL. I thought that MKL dcopy would not be threaded, and it should perform its own peeling to reach an aligned boundary before using movaps to store data. If it is not accounted for by a different number of threads, it's hard to account for a different behavior between CPUs as similar as yours.&lt;BR /&gt;My impression of MKL dcopy is that it's not supported for performance tuning, only for compatibility with existing source code, now that current Intel compilers make automatic fast_memcpy substitutions, so this seems to be a seldom visited subject.&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Dear Tim, thanks for your comments. Following your suggestion, &lt;BR /&gt;I tried a newer version of the Intel compiler (ifort 11.0 20081105),&lt;BR /&gt;as well as a newer version of the Intel Math Kernel Library (mkl 10.1.0.015). &lt;BR /&gt;I tried the same build: &lt;BR /&gt;&lt;BR /&gt;$ ifort -check all -r8 -g -o test main.F90 &lt;BR /&gt; -L/home/redo/intel/mkl/10.1.0.015/lib/em64t -lmkl -lguide&lt;BR /&gt;&lt;BR /&gt;and again the program runs without any problem on the Pentium D,&lt;BR /&gt;whereas when I run it on the Xeon:&lt;BR /&gt;&lt;BR /&gt;$ ./test &lt;BR /&gt; first&lt;BR /&gt; second&lt;BR /&gt;forrtl: severe (174): SIGSEGV, segmentation fault occurred&lt;BR /&gt;Image PC Routine Line Source &lt;BR /&gt;libpthread.so.0 000000307930C430 Unknown Unknown Unknown&lt;BR /&gt;libmkl_mc.so 0000002A96D19664 Unknown Unknown Unknown &lt;BR /&gt;&lt;BR /&gt;0x0000002a96d19664 &amp;lt;Steps1_X8_Y16_Loop32gas_1&amp;gt;: movaps %xmm0,(%rcx)&lt;BR /&gt;&lt;BR /&gt;(gdb) print $rcx&lt;BR /&gt;$1 = 5963436&lt;BR /&gt;(gdb) x/2g $rcx&lt;BR /&gt;0x5afeac &amp;lt;TEST_&amp;gt;: 0x0000000000000000 0x0000000000000000&lt;BR /&gt;&lt;BR /&gt;In any case, this seems to be more an intrinsic "limitation" of the Fortran language&lt;BR /&gt;itself than an ifort or MKL problem. &lt;BR /&gt;&lt;BR /&gt;loriano&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 20 Feb 2009 08:47:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dcopy-and-MOVAPS-request-for-comments/m-p/899486#M11153</guid>
      <dc:creator>redo</dc:creator>
      <dc:date>2009-02-20T08:47:51Z</dc:date>
    </item>
    <item>
      <title>Re: dcopy and MOVAPS (request for comments)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dcopy-and-MOVAPS-request-for-comments/m-p/899487#M11154</link>
      <description>Could you consider submitting the test case on premier.intel.com? I'd still be curious whether it depends on the number of threads (set OMP_NUM_THREADS to the same value on each platform) or on whether the more current libiomp5 is used in place of libguide.&lt;BR /&gt;</description>
      <pubDate>Fri, 20 Feb 2009 14:04:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dcopy-and-MOVAPS-request-for-comments/m-p/899487#M11154</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2009-02-20T14:04:16Z</dc:date>
    </item>
    <item>
      <title>Re: dcopy and MOVAPS (request for comments)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dcopy-and-MOVAPS-request-for-comments/m-p/899488#M11155</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/367365"&gt;tim18&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt; Could you consider submitting the test case on premier.intel.com? I'd still be curious whether it depends on number of threads (set OMP_NUM_THREADS to same value on each platform) or on whether the more current libiomp5 is used in place of libguide.&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;OK, I'll try to submit the test case.&lt;BR /&gt;</description>
      <pubDate>Mon, 23 Feb 2009 11:27:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dcopy-and-MOVAPS-request-for-comments/m-p/899488#M11155</guid>
      <dc:creator>redo</dc:creator>
      <dc:date>2009-02-23T11:27:07Z</dc:date>
    </item>
    <item>
      <title>Re: dcopy and MOVAPS (request for comments)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dcopy-and-MOVAPS-request-for-comments/m-p/899489#M11156</link>
      <description>&lt;DIV style="margin:0px;"&gt;Hi redo, &lt;BR /&gt;please try to link your example with the following command line:&lt;BR /&gt;ifort -check all -r8 -g -I/opt/intel/mkl/10.1.0.015/include test.f90 &lt;BR /&gt;/opt/intel/mkl/10.1.0.015/lib/em64t/libmkl_intel_lp64.a /opt/intel/mkl/10.1.0.015/lib/em64t/libmkl_intel_thread.a &lt;BR /&gt;/opt/intel/mkl/10.1.0.015/lib/em64t/libmkl_core.a &lt;BR /&gt;-L/opt/intel/mkl/10.1.0.015/lib/em64t -liomp5 -lpthread -lm -o test.out&lt;BR /&gt;--Gennady&lt;/DIV&gt;
&lt;BR /&gt;</description>
      <pubDate>Mon, 23 Feb 2009 15:20:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dcopy-and-MOVAPS-request-for-comments/m-p/899489#M11156</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2009-02-23T15:20:46Z</dc:date>
    </item>
    <item>
      <title>Re: dcopy and MOVAPS (request for comments)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dcopy-and-MOVAPS-request-for-comments/m-p/899490#M11157</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/334681"&gt;Gennady Fedorov (Intel)&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;DIV style="margin:0px;"&gt;Hi redo, &lt;BR /&gt;please try to link your example with the following command line:&lt;BR /&gt;ifort -check all -r8 -g -I/opt/intel/mkl/10.1.0.015/include test.f90 &lt;BR /&gt;/opt/intel/mkl/10.1.0.015/lib/em64t/libmkl_intel_lp64.a /opt/intel/mkl/10.1.0.015/lib/em64t/libmkl_intel_thread.a &lt;BR /&gt;/opt/intel/mkl/10.1.0.015/lib/em64t/libmkl_core.a &lt;BR /&gt;-L/opt/intel/mkl/10.1.0.015/lib/em64t -liomp5 -lpthread -lm -o test.out&lt;BR /&gt;--Gennady&lt;/DIV&gt;
&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Hi Gennady, &lt;BR /&gt; I just tried the following:&lt;BR /&gt;&lt;BR /&gt;$ ifort -check all -r8 -g -I/home/redo/intel/mkl/10.1.0.015/include test.f90 /home/redo/intel/mkl/10.1.0.015/lib/em64t/libmkl_intel_lp64.a /home/redo/intel/mkl/10.1.0.015/lib/em64t/libmkl_intel_thread.a /home/redo/intel/mkl/10.1.0.015/lib/em64t/libmkl_core.a -L/home/redo/intel/mkl/10.1.0.015/lib/em64t -liomp5 -lpthread -lm -o test.out&lt;BR /&gt;&lt;BR /&gt;I am working on:&lt;BR /&gt;&lt;BR /&gt;Red Hat Enterprise Linux AS release 4 (Nahant Update 4)&lt;BR /&gt;&lt;BR /&gt;kernel:&lt;BR /&gt;&lt;BR /&gt;Linux 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux&lt;BR /&gt;&lt;BR /&gt;$ ldd ./test.out &lt;BR /&gt; libiomp5.so =&amp;gt; /home/redo/intel/Compiler/11.0/074/lib/intel64/libiomp5.so (0x0000002a95557000)&lt;BR /&gt; libpthread.so.0 =&amp;gt; /lib64/tls/libpthread.so.0 (0x0000003079300000)&lt;BR /&gt; libimf.so =&amp;gt; /home/redo/intel/Compiler/11.0/074/lib/intel64/libimf.so (0x0000002a95718000)&lt;BR /&gt; libm.so.6 =&amp;gt; /lib64/tls/libm.so.6 (0x0000003078b00000)&lt;BR /&gt; libc.so.6 =&amp;gt; /lib64/tls/libc.so.6 (0x0000003078800000)&lt;BR /&gt; libgcc_s.so.1 =&amp;gt; /lib64/libgcc_s.so.1 (0x000000307af00000)&lt;BR /&gt; libdl.so.2 =&amp;gt; /lib64/libdl.so.2 (0x0000003078600000)&lt;BR /&gt; /lib64/ld-linux-x86-64.so.2 (0x0000003078400000)&lt;BR /&gt;$ ./test.out &lt;BR /&gt; first&lt;BR /&gt; second&lt;BR /&gt;forrtl: severe (174): SIGSEGV, segmentation fault occurred&lt;BR /&gt;&lt;BR /&gt;The problem is still there:&lt;BR /&gt;&lt;BR /&gt;(gdb) r&lt;BR /&gt;Starting program: /home/redo/molecule/aug-cc-PVDZ/test/test &lt;BR /&gt;[Thread debugging using libthread_db enabled]&lt;BR /&gt;[New Thread 182915222752 (LWP 22767)]&lt;BR /&gt; first&lt;BR /&gt; second&lt;BR /&gt;&lt;BR /&gt;Program received signal SIGSEGV, Segmentation fault.&lt;BR /&gt;[Switching 
to Thread 182915222752 (LWP 22767)]&lt;BR /&gt;0x0000002a96d19664 in Steps1_X8_Y16_Loop32gas_1 () from /home/redo/intel/Compiler/11.0/074/mkl/lib/em64t/libmkl_mc.so&lt;BR /&gt;(gdb) disassemble &lt;BR /&gt;...&lt;BR /&gt;0x0000002a96d19664 &amp;lt;Steps1_X8_Y16_Loop32gas_1&amp;gt;: movaps %xmm0,(%rcx)&lt;BR /&gt;...&lt;BR /&gt;(gdb) print $rcx&lt;BR /&gt;$1 = 5963436&lt;BR /&gt;(gdb) x/8g $rcx&lt;BR /&gt;0x5afeac &amp;lt;TEST_&amp;gt;: 0x0000000000000000 0x0000000000000000&lt;BR /&gt;0x5afebc &amp;lt;TEST_&amp;gt;: 0x0000000000000000 0x0000000000000000&lt;BR /&gt;0x5afecc &amp;lt;TEST_&amp;gt;: 0x0000000000000000 0x0000000000000000&lt;BR /&gt;0x5afedc &amp;lt;TEST_&amp;gt;: 0x0000000000000000 0x0000000000000000&lt;BR /&gt;&lt;BR /&gt;The cpuinfo:&lt;BR /&gt;&lt;BR /&gt;processor : 0&lt;BR /&gt;vendor_id : GenuineIntel&lt;BR /&gt;cpu family : 6&lt;BR /&gt;model  : 15&lt;BR /&gt;model name : Intel Xeon CPU 5160 @ 3.00GHz&lt;BR /&gt;stepping : 6&lt;BR /&gt;cpu MHz  : 3000.111&lt;BR /&gt;cache size : 4096 KB&lt;BR /&gt;physical id : 0&lt;BR /&gt;siblings : 2&lt;BR /&gt;core id  : 0&lt;BR /&gt;cpu cores : 2&lt;BR /&gt;fpu  : yes&lt;BR /&gt;fpu_exception : yes&lt;BR /&gt;cpuid level : 10&lt;BR /&gt;wp  : yes&lt;BR /&gt;flags  : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm pni monitor ds_cpl est tm2 cx16 xtpr&lt;BR /&gt;bogomips : 6004.34&lt;BR /&gt;clflush size : 64&lt;BR /&gt;cache_alignment : 64&lt;BR /&gt;address sizes : 36 bits physical, 48 bits virtual&lt;BR /&gt;power management:&lt;BR /&gt;&lt;BR /&gt;cut...&lt;BR /&gt;&lt;BR /&gt;Thanks for your answer,&lt;BR /&gt;Loriano&lt;BR /&gt;</description>
      <pubDate>Tue, 24 Feb 2009 07:41:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dcopy-and-MOVAPS-request-for-comments/m-p/899490#M11157</guid>
      <dc:creator>redo</dc:creator>
      <dc:date>2009-02-24T07:41:22Z</dc:date>
    </item>
  </channel>
</rss>

