<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Slow allocatable arrays in Intel® Fortran Compiler</title>
    <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874729#M73446</link>
    <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;P&gt;Hi&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I had actually meant to include the timings with compiler version 10.0.025. Here they are:&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;TABLE style="border-collapse: collapse; width: 144pt;" border="0" cellspacing="0" cellpadding="0" width="192"&gt;
&lt;COL style="width: 48pt;" span="3" width="64" /&gt; 
&lt;TBODY&gt;
&lt;TR style="height: 12.75pt;" height="17"&gt;
&lt;TD style="height: 12.75pt; width: 96pt;" colspan="2" width="128" height="17"&gt;Compiler 10.0.025&lt;/TD&gt;
&lt;TD style="width: 48pt;" width="64"&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 12.75pt;" height="17"&gt;
&lt;TD style="height: 12.75pt;" height="17"&gt;&lt;/TD&gt;
&lt;TD&gt;Win32&lt;/TD&gt;
&lt;TD&gt;x64&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 12.75pt;" height="17"&gt;
&lt;TD style="height: 12.75pt;" height="17"&gt;Loop&lt;/TD&gt;
&lt;TD align="right"&gt;8.613281&lt;/TD&gt;
&lt;TD align="right"&gt;4.4375&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 12.75pt;" height="17"&gt;
&lt;TD style="height: 12.75pt;" height="17"&gt;WithDim&lt;/TD&gt;
&lt;TD align="right"&gt;8.316406&lt;/TD&gt;
&lt;TD align="right"&gt;4.292969&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 12.75pt;" height="17"&gt;
&lt;TD style="height: 12.75pt;" height="17"&gt;NOTDim&lt;/TD&gt;
&lt;TD align="right"&gt;8.425781&lt;/TD&gt;
&lt;TD align="right"&gt;8.382813&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;Below are the Win32 and x64 build logs with 10.0.025.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Abhi&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;================&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;[1] Win32&lt;/P&gt;
&lt;P&gt;Deleting intermediate files and output files for project 'Test_AllocateSpeed', configuration 'Release|Win32'.&lt;BR /&gt;Compiling with Intel Fortran Compiler 10.0.025 [IA-32]...&lt;BR /&gt;ifort /nologo /module:"Release" /object:"Release" /libs:static /threads /c /Qvc8 /Qlocation,link,"C:Program Files (x86)Microsoft Visual Studio 8VCbin" "C:AbhiMySourceTestsTest_AllocateSpeedTest_AllocateSpeed.f90"&lt;BR /&gt;Linking...&lt;BR /&gt;Link /OUT:"ReleaseTest_AllocateSpeed.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"C:AbhiMySourceTestsTest_AllocateSpeedreleasetest_allocatespeed.exe.intermediate.manifest" /SUBSYSTEM:CONSOLE /IMPLIB:"C:AbhiMySourceTestsTest_AllocateSpeedreleasetest_allocatespeed.lib" "ReleaseTest_AllocateSpeed.obj"&lt;BR /&gt;Link: executing 'link'&lt;BR /&gt;&lt;BR /&gt;Embedding manifest...&lt;BR /&gt;mt.exe /nologo /outputresource:"C:AbhiMySourceTestsTest_AllocateSpeedreleasetest_allocatespeed.exe;#1" /manifest "C:AbhiMySourceTestsTest_AllocateSpeedreleasetest_allocatespeed.exe.intermediate.manifest"&lt;BR /&gt;&lt;BR /&gt;Test_AllocateSpeed - 0 error(s), 0 warning(s)&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;[2] x64&lt;/P&gt;
&lt;P&gt;Deleting intermediate files and output files for project 'Test_AllocateSpeed', configuration 'Release|x64'.&lt;BR /&gt;Compiling with Intel Fortran Compiler 10.0.025 [Intel 64]...&lt;BR /&gt;ifort /nologo /module:"x64Release" /object:"x64Release" /libs:static /threads /c /Qvc8 /Qlocation,link,"C:Program Files (x86)Microsoft Visual Studio 8VCbinx86_amd64" "C:AbhiMySourceTestsTest_AllocateSpeedTest_AllocateSpeed.f90"&lt;BR /&gt;C:AbhiMySourceTestsTest_AllocateSpeedTest_AllocateSpeed.f90(34): (col. 13) remark: LOOP WAS VECTORIZED.&lt;BR /&gt;C:AbhiMySourceTestsTest_AllocateSpeedTest_AllocateSpeed.f90(47): (col. 13) remark: LOOP WAS VECTORIZED.&lt;BR /&gt;C:AbhiMySourceTestsTest_AllocateSpeedTest_AllocateSpeed.f90(48): (col. 13) remark: LOOP WAS VECTORIZED.&lt;BR /&gt;C:AbhiMySourceTestsTest_AllocateSpeedTest_AllocateSpeed.f90(58): (col. 13) remark: LOOP WAS VECTORIZED.&lt;BR /&gt;C:AbhiMySourceTestsTest_AllocateSpeedTest_AllocateSpeed.f90(59): (col. 13) remark: LOOP WAS VECTORIZED.&lt;BR /&gt;&lt;BR /&gt;Linking...&lt;BR /&gt;Link /OUT:"x64ReleaseTest_AllocateSpeed.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"C:AbhiMySourceTestsTest_AllocateSpeedx64releasetest_allocatespeed.exe.intermediate.manifest" /SUBSYSTEM:CONSOLE /IMPLIB:"C:AbhiMySourceTestsTest_AllocateSpeedx64releasetest_allocatespeed.lib" "x64ReleaseTest_AllocateSpeed.obj"&lt;BR /&gt;Link: executing 'link'&lt;BR /&gt;&lt;BR /&gt;Embedding manifest...&lt;BR /&gt;mt.exe /nologo /outputresource:"C:AbhiMySourceTestsTest_AllocateSpeedx64releasetest_allocatespeed.exe;#1" /manifest "C:AbhiMySourceTestsTest_AllocateSpeedx64releasetest_allocatespeed.exe.intermediate.manifest"&lt;BR /&gt;&lt;BR /&gt;Test_AllocateSpeed - 0 error(s), 0 warning(s)&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 06 Oct 2008 17:29:22 GMT</pubDate>
    <dc:creator>abhimodak</dc:creator>
    <dc:date>2008-10-06T17:29:22Z</dc:date>
    <item>
      <title>Slow allocatable arrays</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874717#M73434</link>
      <description>&lt;P&gt;Dear all. I have run into a problem: my program runs much slower with large allocatable arrays than with static arrays. Below is simple code that initializes a large array. With static arrays this code runs 10 (!) times faster.&lt;BR /&gt;I'm using Intel Fortran Compiler 10.0 under Windows with 2 GB of RAM.&lt;BR /&gt;&lt;BR /&gt;Does anybody know what the reason is and what to do to make allocatable arrays work faster?&lt;BR /&gt;&lt;BR /&gt;!integer, parameter :: NP=10000000&lt;BR /&gt;integer NP&lt;BR /&gt;real, allocatable :: X(:),Y(:)&lt;BR /&gt;!real X(10000000),Y(10000000)&lt;BR /&gt;integer i,k,ist,iend,icountrate&lt;BR /&gt;&lt;BR /&gt;NP = 10000000;&lt;BR /&gt;allocate(X(NP),Y(NP))&lt;BR /&gt;&lt;BR /&gt;do k = 1, 100&lt;BR /&gt;do i = 1, NP&lt;BR /&gt; X(i) = 0.&lt;BR /&gt; Y(i) = 0.&lt;BR /&gt;enddo&lt;BR /&gt;enddo&lt;/P&gt;</description>
      <pubDate>Fri, 03 Oct 2008 12:43:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874717#M73434</guid>
      <dc:creator>brovchik</dc:creator>
      <dc:date>2008-10-03T12:43:29Z</dc:date>
    </item>
    <item>
      <title>Re: Slow allocatable arrays</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874718#M73435</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/405063"&gt;brovchik&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;P&gt;Dear all. I have run into a problem: my program runs much slower with large allocatable arrays than with static arrays. Below is simple code that initializes a large array. With static arrays this code runs 10 (!) times faster.&lt;BR /&gt;I'm using Intel Fortran Compiler 10.0 under Windows with 2 GB of RAM.&lt;BR /&gt;&lt;BR /&gt;Does anybody know what the reason is and what to do to make allocatable arrays work faster?&lt;BR /&gt;&lt;BR /&gt;!integer, parameter :: NP=10000000&lt;BR /&gt;integer NP&lt;BR /&gt;real, allocatable :: X(:),Y(:)&lt;BR /&gt;!real X(10000000),Y(10000000)&lt;BR /&gt;integer i,k,ist,iend,icountrate&lt;BR /&gt;&lt;BR /&gt;NP = 10000000;&lt;BR /&gt;allocate(X(NP),Y(NP))&lt;BR /&gt;&lt;BR /&gt;do k = 1, 100&lt;BR /&gt;do i = 1, NP&lt;BR /&gt; X(i) = 0.&lt;BR /&gt; Y(i) = 0.&lt;BR /&gt;enddo&lt;BR /&gt;enddo&lt;/P&gt;
&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 03 Oct 2008 13:01:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874718#M73435</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2008-10-03T13:01:16Z</dc:date>
    </item>
    <item>
      <title>Re: Slow allocatable arrays</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874719#M73436</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;P&gt;Did you check whether your outer loop is being executed in both cases? For years, many people have preferred compilers which can shortcut repetitions such as this. A web search will turn up 20-year-old examples of precautions taken in artificial benchmarks to prevent a compiler from optimizing away extra loops.&lt;/P&gt;</description>
      <pubDate>Fri, 03 Oct 2008 13:08:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874719#M73436</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2008-10-03T13:08:52Z</dc:date>
    </item>
    <item>
      <title>Re: Slow allocatable arrays</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874720#M73437</link>
      <description>&lt;DIV style="margin:0px;"&gt;That example is short enough that you should be able to look at the generated assembly and see the differences. If you can't read assembly, try posting both versions here.&lt;/DIV&gt;
&lt;DIV style="margin:0px;"&gt;Set a breakpoint in the loop and debug to it; right-click, go to disassembly. Copy the surrounding several dozen lines to the clipboard and paste them in.&lt;/DIV&gt;
&lt;DIV style="margin:0px;"&gt;You don't say what kind of processor you're using. An unfortunate cache stride or some kind of prefetch disagreement seems unlikely with such a simple example, but it's a remote possibility.&lt;/DIV&gt;
&lt;DIV style="margin:0px;"&gt;Your arrays are large enough that the performance of this code will be determined by how well the cache is managed. Optimal code will be right around 10x faster than bad code on current x64 hardware (using write-combining, no-fetch sequences versus actually updating individual cells one by one).&lt;/DIV&gt;
&lt;DIV style="margin:0px;"&gt;Other remote possibilities include alignment problems, NaNs in the heap, and even more exotic stuff I can't imagine right now.&lt;/DIV&gt;
&lt;DIV style="margin:0px;"&gt;Run both under VTune and see what's going on.&lt;BR /&gt;&lt;/DIV&gt;</description>
      <pubDate>Fri, 03 Oct 2008 15:34:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874720#M73437</guid>
      <dc:creator>Steve_Nuchia</dc:creator>
      <dc:date>2008-10-03T15:34:27Z</dc:date>
    </item>
    <item>
      <title>Re: Slow allocatable arrays</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874721#M73438</link>
      <description>&lt;DIV style="margin:0px;"&gt;If the outer loop is being optimized away in both cases, the most likely explanation is that you are counting the heap allocation overhead in one case and you are not counting the loader's allocation of the static arrays in the other. How are you measuring elapsed time? What are the elapsed times and what is your hardware?&lt;/DIV&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 03 Oct 2008 15:36:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874721#M73438</guid>
      <dc:creator>Steve_Nuchia</dc:creator>
      <dc:date>2008-10-03T15:36:56Z</dc:date>
    </item>
    <item>
      <title>Re: Slow allocatable arrays</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874722#M73439</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/265453"&gt;snuchia@statsoft.com&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;DIV style="margin:0px;"&gt;If the outer loop is being optimized away in both cases, the most likely explanation is that you are counting the heap allocation overhead in one case and you are not counting the loader's allocation of the static arrays in the other. How are you measuring elapsed time? What are the elapsed times and what is your hardware?&lt;/DIV&gt;
&lt;P&gt;&lt;/P&gt;
&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;P&gt;Thanks a lot for the answer! I'm using an Intel Core 2 CPU at 2.4 GHz. I measure time using the system_clock(ist,icountrate) routine, but the time difference is clearly visible by eye. Under the release configuration I get 4.04 s with allocatable arrays and 0.42 s with static arrays. After you mentioned debugging, I tested it again under the debug configuration and got 9.18 s and 5.8 s. Not such an extreme difference, but still significant.&lt;/P&gt;
&lt;P&gt;All compiler options were at their defaults. None of the changes to compiler options that I could think of helped. I also tried various /heap-arrays[:size] settings, without success. I have a feeling that something is wrong with memory access, but I cannot work out what to do. Task Manager shows 80 MB of memory usage with allocatable arrays (which is expected, because I have 2*10^7 real*4 values) and only 1.6 MB with static arrays.&lt;/P&gt;</description>
      <pubDate>Fri, 03 Oct 2008 20:54:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874722#M73439</guid>
      <dc:creator>brovchik</dc:creator>
      <dc:date>2008-10-03T20:54:45Z</dc:date>
    </item>
    <item>
      <title>Re: Slow allocatable arrays</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874723#M73440</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/265453"&gt;snuchia@statsoft.com&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;DIV style="margin:0px;"&gt;That example is short enough that you should be able to look at the generated assembly and see the differences. If you can't read assembly, try posting both versions here.&lt;/DIV&gt;
&lt;DIV style="margin:0px;"&gt;Set a breakpoint in the loop and debug to it; right-click, go to disassembly. Copy the surrounding several dozen lines to the clipboard and paste them in.&lt;/DIV&gt;
&lt;DIV style="margin:0px;"&gt;You don't say what kind of processor you're using. An unfortunate cache stride or some kind of prefetch disagreement seems unlikely with such a simple example, but it's a remote possibility.&lt;/DIV&gt;
&lt;DIV style="margin:0px;"&gt;Your arrays are large enough that the performance of this code will be determined by how well the cache is managed. Optimal code will be right around 10x faster than bad code on current x64 hardware (using write-combining, no-fetch sequences versus actually updating individual cells one by one).&lt;/DIV&gt;
&lt;DIV style="margin:0px;"&gt;Other remote possibilities include alignment problems, NaNs in the heap, and even more exotic stuff I can't imagine right now.&lt;/DIV&gt;
&lt;DIV style="margin:0px;"&gt;Run both under VTune and see what's going on.&lt;BR /&gt;&lt;/DIV&gt;
&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;P&gt;I cannot read disassembly, so I'm posting it here. I have also heard (I do not remember where) that for allocatable arrays the program performs some additional checks on memory storage addresses. Maybe that can be seen in the disassembly:&lt;/P&gt;
&lt;P&gt;So, the allocatable variant:&lt;/P&gt;
&lt;P&gt;implicit none&lt;BR /&gt;&lt;BR /&gt;!integer, parameter :: NP=10000000&lt;BR /&gt;integer NP&lt;BR /&gt;real, allocatable :: X(:),Y(:)&lt;BR /&gt;!real X(10000000),Y(10000000)&lt;BR /&gt;integer i,j,k,ist,iend,icountrate&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;NP = 10000000;&lt;BR /&gt;0040101F mov dword ptr [ebp-10h],989680h &lt;BR /&gt;allocate(X(NP),Y(NP))&lt;BR /&gt;00401026 mov eax,dword ptr [X+0Ch (4D200Ch)] &lt;BR /&gt;0040102B or eax,9000000h &lt;BR /&gt;00401030 mov dword ptr [X+0Ch (4D200Ch)],eax &lt;BR /&gt;00401035 mov dword ptr [X+4 (4D2004h)],4 &lt;BR /&gt;0040103F mov eax,1 &lt;BR /&gt;00401044 mov dword ptr [X+10h (4D2010h)],eax &lt;BR /&gt;00401049 mov dword ptr [X+20h (4D2020h)],eax &lt;BR /&gt;0040104E mov eax,dword ptr [NP] &lt;BR /&gt;00401051 test eax,eax &lt;BR /&gt;00401053 jg TEST+5Eh (40105Eh) &lt;BR /&gt;00401055 mov dword ptr [ebp-34h],0 &lt;BR /&gt;0040105C jmp TEST+64h (401064h) &lt;BR /&gt;0040105E mov eax,dword ptr [NP] &lt;BR /&gt;00401061 mov dword ptr [ebp-34h],eax &lt;BR /&gt;00401064 mov eax,dword ptr [ebp-34h] &lt;BR /&gt;00401067 mov dword ptr [X+18h (4D2018h)],eax &lt;BR /&gt;0040106C mov edx,4 &lt;BR /&gt;00401071 mov dword ptr [X+1Ch (4D201Ch)],edx &lt;BR /&gt;00401077 add esp,0FFFFFFF0h &lt;BR /&gt;0040107A lea ecx,[ebp-48h] &lt;BR /&gt;0040107D mov dword ptr [esp],ecx &lt;BR /&gt;00401080 mov dword ptr [esp+4],2 &lt;BR /&gt;00401088 mov dword ptr [esp+8],eax &lt;BR /&gt;0040108C mov dword ptr [esp+0Ch],edx &lt;BR /&gt;00401090 call _for_check_mult_overflow (40252Ch) &lt;BR /&gt;00401095 add esp,10h &lt;BR /&gt;00401098 mov dword ptr [ebp-30h],eax &lt;BR /&gt;0040109B add esp,0FFFFFFF4h &lt;BR /&gt;0040109E mov eax,dword ptr [ebp-48h] &lt;BR /&gt;004010A1 mov dword ptr [esp],eax &lt;BR /&gt;004010A4 mov dword ptr [esp+4],offset X (4D2000h) &lt;BR /&gt;004010AC mov eax,dword ptr [X+0Ch (4D200Ch)] &lt;BR /&gt;004010B1 and eax,1 &lt;BR /&gt;004010B4 add eax,eax &lt;BR /&gt;004010B6 and eax,0FFFFFFEFh &lt;BR /&gt;004010B9 mov edx,dword ptr [ebp-30h] &lt;BR /&gt;004010BC and edx,1 &lt;BR /&gt;004010BF shl edx,4 &lt;BR /&gt;004010C2 or eax,edx &lt;BR /&gt;004010C4 mov dword ptr [esp+8],eax &lt;BR /&gt;004010C8 call _for_alloc_allocatable (402878h) &lt;BR /&gt;004010CD add esp,0Ch &lt;BR /&gt;004010D0 mov dword ptr [X+0Ch (4D200Ch)],5 &lt;BR /&gt;004010DA mov eax,dword ptr [X+20h (4D2020h)] &lt;BR /&gt;004010DF shl eax,2 &lt;BR /&gt;004010E2 neg eax &lt;BR /&gt;004010E4 mov dword ptr [X+8 (4D2008h)],eax &lt;BR /&gt;004010E9 mov eax,dword ptr [Y+0Ch (4D2034h)] &lt;BR /&gt;004010EE or eax,9000000h &lt;BR /&gt;004010F3 mov dword ptr [Y+0Ch (4D2034h)],eax &lt;BR /&gt;004010F8 mov dword ptr [Y+4 (4D202Ch)],4 &lt;BR /&gt;00401102 mov eax,1 &lt;BR /&gt;00401107 mov dword ptr [Y+10h (4D2038h)],eax &lt;BR /&gt;0040110C mov dword ptr [Y+20h (4D2048h)],eax &lt;BR /&gt;00401111 mov eax,dword ptr [NP] &lt;BR /&gt;00401114 test eax,eax &lt;BR /&gt;00401116 jg TEST+121h (401121h) &lt;BR /&gt;00401118 mov dword ptr [ebp-2Ch],0 &lt;BR /&gt;0040111F jmp TEST+127h (401127h) &lt;BR /&gt;00401121 mov eax,dword ptr [NP] &lt;BR /&gt;00401124 mov dword ptr [ebp-2Ch],eax &lt;BR /&gt;00401127 mov eax,dword ptr [ebp-2Ch] &lt;BR /&gt;0040112A mov dword ptr [Y+18h (4D2040h)],eax &lt;BR /&gt;0040112F mov edx,4 &lt;BR /&gt;00401134 mov dword ptr [Y+1Ch (4D2044h)],edx &lt;BR /&gt;0040113A add esp,0FFFFFFF0h &lt;BR /&gt;0040113D lea ecx,[ebp-44h] &lt;BR /&gt;00401140 mov dword ptr [esp],ecx &lt;BR /&gt;00401143 mov dword ptr [esp+4],2 &lt;BR /&gt;0040114B mov dword ptr [esp+8],eax &lt;BR /&gt;0040114F mov dword ptr [esp+0Ch],edx &lt;BR /&gt;00401153 call _for_check_mult_overflow (40252Ch) &lt;BR /&gt;00401158 add esp,10h &lt;BR /&gt;0040115B mov dword ptr [ebp-28h],eax &lt;BR /&gt;0040115E add esp,0FFFFFFF4h &lt;BR /&gt;00401161 mov eax,dword ptr [ebp-44h] &lt;BR /&gt;00401164 mov dword ptr [esp],eax &lt;BR /&gt;00401167 mov dword ptr [esp+4],offset Y (4D2028h) &lt;BR /&gt;0040116F mov eax,dword ptr [Y+0Ch (4D2034h)] &lt;BR /&gt;00401174 and eax,1 &lt;BR /&gt;00401177 add eax,eax &lt;BR /&gt;00401179 and eax,0FFFFFFEFh &lt;BR /&gt;0040117C mov edx,dword ptr [ebp-28h] &lt;BR /&gt;0040117F and edx,1 &lt;BR /&gt;00401182 shl edx,4 &lt;BR /&gt;00401185 or eax,edx &lt;BR /&gt;00401187 mov dword ptr [esp+8],eax &lt;BR /&gt;0040118B call _for_alloc_allocatable (402878h) &lt;BR /&gt;00401190 add esp,0Ch &lt;BR /&gt;00401193 mov dword ptr [Y+0Ch (4D2034h)],5 &lt;BR /&gt;0040119D mov eax,dword ptr [Y+20h (4D2048h)] &lt;BR /&gt;004011A2 shl eax,2 &lt;BR /&gt;004011A5 neg eax &lt;BR /&gt;004011A7 mov dword ptr [Y+8 (4D2030h)],eax &lt;BR /&gt;&lt;BR /&gt;
call system_clock(ist,icountrate)&lt;BR /&gt;004011AC push edi &lt;BR /&gt;004011AD mov dword ptr [esp],4 &lt;BR /&gt;004011B4 call _for_system_clock_count (402AF0h) &lt;BR /&gt;004011B9 pop ecx &lt;BR /&gt;004011BA mov dword ptr [ebp-24h],eax &lt;BR /&gt;004011BD mov eax,dword ptr [ebp-24h] &lt;BR /&gt;004011C0 mov dword ptr [IST],eax &lt;BR /&gt;004011C3 push edi &lt;BR /&gt;004011C4 mov dword ptr [esp],4 &lt;BR /&gt;004011CB call _for_system_clock_rate (402AB0h) &lt;BR /&gt;004011D0 pop ecx &lt;BR /&gt;004011D1 mov dword ptr [ebp-20h],eax &lt;BR /&gt;004011D4 mov eax,dword ptr [ebp-20h] &lt;BR /&gt;004011D7 mov dword ptr [ICOUNTRATE],eax &lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;
do k = 1, 100&lt;BR /&gt;004011DA mov dword ptr [K],1 &lt;BR /&gt;&lt;BR /&gt;do i = 1, NP&lt;BR /&gt;004011E1 mov eax,dword ptr [NP] &lt;BR /&gt;004011E4 mov dword ptr [ebp-8],eax &lt;BR /&gt;004011E7 mov dword ptr [I],1 &lt;BR /&gt;004011EE mov eax,dword ptr [ebp-8] &lt;BR /&gt;004011F1 test eax,eax &lt;BR /&gt;004011F3 jle TEST+3C4h (4013C4h) &lt;BR /&gt; X(i) = 1.&lt;BR /&gt;004011F9 mov eax,dword ptr [I] &lt;BR /&gt;004011FC mov edx,dword ptr [X+20h (4D2020h)] &lt;BR /&gt;00401202 cmp eax,edx &lt;BR /&gt;00401204 jge TEST+250h (401250h) &lt;BR /&gt;00401206 add esp,0FFFFFFE0h &lt;BR /&gt;00401209 mov dword ptr [esp],10100003h &lt;BR /&gt;00401210 mov dword ptr [esp+4],offset ___xt_z+58h (4B62A0h) &lt;BR /&gt;00401218 mov dword ptr [esp+8],5 &lt;BR /&gt;00401220 mov dword ptr [esp+0Ch],3 &lt;BR /&gt;00401228 mov dword ptr [esp+10h],1 &lt;BR /&gt;00401230 mov dword ptr [esp+14h],offset ___xt_z+144h (4B638Ch) &lt;BR /&gt;00401238 mov eax,dword ptr [I] &lt;BR /&gt;0040123B mov dword ptr [esp+18h],eax &lt;BR /&gt;0040123F mov eax,dword ptr [X+20h (4D2020h)] &lt;BR /&gt;00401244 mov dword ptr [esp+1Ch],eax &lt;BR /&gt;00401248 call _for_emit_diagnostic (403224h) &lt;BR /&gt;0040124D add esp,20h &lt;BR /&gt;00401250 mov eax,dword ptr [X+20h (4D2020h)] &lt;BR /&gt;00401255 mov edx,dword ptr [X+18h (4D2018h)] &lt;BR /&gt;0040125B lea eax,[eax+edx-1] &lt;BR /&gt;0040125F mov edx,dword ptr [I] &lt;BR /&gt;00401262 cmp edx,eax &lt;BR /&gt;00401264 jle TEST+2BAh (4012BAh) &lt;BR /&gt;00401266 add esp,0FFFFFFE0h &lt;BR /&gt;00401269 mov dword ptr [esp],10100002h &lt;BR /&gt;00401270 mov dword ptr [esp+4],offset ___xt_z+0D8h (4B6320h) &lt;BR /&gt;00401278 mov dword ptr [esp+8],5 &lt;BR /&gt;00401280 mov dword ptr [esp+0Ch],2 &lt;BR /&gt;00401288 mov dword ptr [esp+10h],1 &lt;BR /&gt;00401290 mov dword ptr [esp+14h],offset ___xt_z+148h (4B6390h) &lt;BR /&gt;00401298 mov eax,dword ptr [I] &lt;BR /&gt;0040129B mov dword ptr [esp+18h],eax &lt;BR /&gt;0040129F mov eax,dword ptr [X+20h (4D2020h)] &lt;BR /&gt;004012A4 mov edx,dword ptr [X+18h (4D2018h)] &lt;BR /&gt;004012AA lea eax,[eax+edx-1] &lt;BR /&gt;004012AE mov dword ptr [esp+1Ch],eax &lt;BR /&gt;004012B2 call _for_emit_diagnostic (403224h) &lt;BR /&gt;004012B7 add esp,20h &lt;BR /&gt;004012BA fld1 &lt;BR /&gt;004012BC mov eax,dword ptr [I] &lt;BR /&gt;004012BF mov edx,dword ptr [X (4D2000h)] &lt;BR /&gt;004012C5 lea eax,[edx+eax*4] &lt;BR /&gt;004012C8 mov edx,dword ptr [X+20h (4D2020h)] &lt;BR /&gt;004012CE shl edx,2 &lt;BR /&gt;004012D1 neg edx &lt;BR /&gt;004012D3 fstp dword ptr [edx+eax]&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;And the static variant:&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;implicit none&lt;BR /&gt;&lt;BR /&gt;integer, parameter :: NP=10000000&lt;BR /&gt;!integer NP&lt;BR /&gt;!real, allocatable :: X(:),Y(:)&lt;BR /&gt;real X(10000000),Y(10000000)&lt;BR /&gt;integer i,j,k,ist,iend,icountrate&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;!NP = 10000000;&lt;BR /&gt;!allocate(X(NP),Y(NP))&lt;BR /&gt;&lt;BR /&gt;call system_clock(ist,icountrate)&lt;BR /&gt;0040101F push edi &lt;BR /&gt;00401020 mov dword ptr [esp],4 &lt;BR /&gt;00401027 call _for_system_clock_count (402280h) &lt;BR /&gt;0040102C pop ecx &lt;BR /&gt;0040102D mov dword ptr [ebp-1Ch],eax &lt;BR /&gt;00401030 mov eax,dword ptr [ebp-1Ch] &lt;BR /&gt;00401033 mov dword ptr [IST],eax &lt;BR /&gt;00401036 push edi &lt;BR /&gt;00401037 mov dword ptr [esp],4 &lt;BR /&gt;0040103E call _for_system_clock_rate (402240h) &lt;BR /&gt;00401043 pop ecx &lt;BR /&gt;00401044 mov dword ptr [ebp-18h],eax &lt;BR /&gt;00401047 mov eax,dword ptr [ebp-18h] &lt;BR /&gt;0040104A mov dword ptr [ICOUNTRATE],eax &lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;
do k = 1, 100&lt;BR /&gt;0040104D mov dword ptr [K],1 &lt;BR /&gt;&lt;BR /&gt;do i = 1, NP&lt;BR /&gt;00401054 mov dword ptr [I],1 &lt;BR /&gt; X(i) = 1.&lt;BR /&gt;0040105B mov eax,dword ptr [I] &lt;BR /&gt;0040105E test eax,eax &lt;BR /&gt;00401060 jg TEST+0A8h (4010A8h) &lt;BR /&gt;00401062 add esp,0FFFFFFE0h &lt;BR /&gt;00401065 mov dword ptr [esp],10100003h &lt;BR /&gt;0040106C mov dword ptr [esp+4],offset ___xt_z+58h (4B62A0h) &lt;BR /&gt;00401074 mov dword ptr [esp+8],5 &lt;BR /&gt;0040107C mov dword ptr [esp+0Ch],3 &lt;BR /&gt;00401084 mov eax,1 &lt;BR /&gt;00401089 mov dword ptr [esp+10h],eax &lt;BR /&gt;0040108D mov dword ptr [esp+14h],offset ___xt_z+144h (4B638Ch) &lt;BR /&gt;00401095 mov edx,dword ptr [I] &lt;BR /&gt;00401098 mov dword ptr [esp+18h],edx &lt;BR /&gt;0040109C mov dword ptr [esp+1Ch],eax &lt;BR /&gt;004010A0 call _for_emit_diagnostic (4029B4h) &lt;BR /&gt;004010A5 add esp,20h &lt;BR /&gt;004010A8 mov eax,dword ptr [I] &lt;BR /&gt;004010AB cmp eax,989680h &lt;BR /&gt;004010B0 jle TEST+0FBh (4010FBh) &lt;BR /&gt;004010B2 add esp,0FFFFFFE0h &lt;BR /&gt;004010B5 mov dword ptr [esp],10100002h &lt;BR /&gt;004010BC mov dword ptr [esp+4],offset ___xt_z+0D8h (4B6320h) &lt;BR /&gt;004010C4 mov dword ptr [esp+8],5 &lt;BR /&gt;004010CC mov dword ptr [esp+0Ch],2 &lt;BR /&gt;004010D4 mov dword ptr [esp+10h],1 &lt;BR /&gt;004010DC mov dword ptr [esp+14h],offset ___xt_z+148h (4B6390h) &lt;BR /&gt;004010E4 mov eax,dword ptr [I] &lt;BR /&gt;004010E7 mov dword ptr [esp+18h],eax &lt;BR /&gt;004010EB mov dword ptr [esp+1Ch],989680h &lt;BR /&gt;004010F3 call _for_emit_diagnostic (4029B4h) &lt;BR /&gt;004010F8 add esp,20h &lt;BR /&gt;004010FB fld1 &lt;BR /&gt;004010FD mov eax,dword ptr [I] &lt;BR /&gt;00401100 fstp dword ptr TWO_TO_M1536A+8 (4D577Ch)[eax*4] &lt;BR /&gt; Y(i) = 1.&lt;/P&gt;</description>
      <pubDate>Fri, 03 Oct 2008 21:15:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874723#M73440</guid>
      <dc:creator>brovchik</dc:creator>
      <dc:date>2008-10-03T21:15:31Z</dc:date>
    </item>
    <item>
      <title>Re: Slow allocatable arrays</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874724#M73441</link>
      <description>&lt;P&gt;You have array bounds checking on - that's most of the code. Try turning it off. If you don't realize you have it on, then you may be building a Debug configuration - use a Release configuration.&lt;/P&gt;
&lt;P&gt;With an allocatable array, the array checking code has to fetch the bounds from the array descriptor each time (well, it may not HAVE to but it does), but with a static array it knows the bounds at compile time.&lt;/P&gt;</description>
      <pubDate>Fri, 03 Oct 2008 23:39:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874724#M73441</guid>
      <dc:creator>Steven_L_Intel1</dc:creator>
      <dc:date>2008-10-03T23:39:10Z</dc:date>
    </item>
    <item>
      <title>Re: Slow allocatable arrays</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874725#M73442</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;P&gt;Hi Steve&lt;/P&gt;
&lt;P&gt;Attached are the source file and an Excel results file.&lt;/P&gt;
&lt;P&gt;I am using Windows XP Professional x64 and Visual Studio 2005. I have a Xeon 5150 (2.66 GHz) with 6 GB of RAM.&lt;/P&gt;
&lt;P&gt;I ran the test with two compilers, 11.0.039 beta and 10.1.024, with Win32 and x64 builds, in the default "Release" configuration. I did not activate any additional checks.&lt;/P&gt;
&lt;P&gt;What I find is that the computation time starts to differ when NP = 1000000 or higher. When NP is read from user input, this starts to happen with NP one order of magnitude smaller.&lt;/P&gt;
&lt;P&gt;However, what surprises me is the difference between using X(1:NP) and just X.&lt;/P&gt;
&lt;P&gt;I really hope that there is no silly error in my program. But I am wondering what is going on.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Sincerely&lt;/P&gt;
&lt;P&gt;Abhi&lt;/P&gt;</description>
      <pubDate>Sat, 04 Oct 2008 00:35:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874725#M73442</guid>
      <dc:creator>abhimodak</dc:creator>
      <dc:date>2008-10-04T00:35:07Z</dc:date>
    </item>
    <item>
      <title>Re: Slow allocatable arrays</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874726#M73443</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;P&gt;Although I used "Add Files" I don't see them in my post; hence I am pasting my code and the test results here:&lt;/P&gt;
&lt;P&gt;Results:&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;TABLE style="border-collapse: collapse; height: 298px;" border="0" cellspacing="0" cellpadding="0" width="256"&gt;
&lt;COL style="width: 48pt;" span="4" width="64" /&gt; 
&lt;TBODY&gt;
&lt;TR style="height: 12.75pt;" height="17"&gt;
&lt;TD style="height: 12.75pt; width: 48pt;" width="64" height="17"&gt;NP&lt;/TD&gt;
&lt;TD style="width: 48pt;" width="64" align="right"&gt;10000000&lt;/TD&gt;
&lt;TD style="width: 48pt;" width="64"&gt;&lt;/TD&gt;
&lt;TD style="width: 48pt;" width="64"&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 12.75pt;" height="17"&gt;
&lt;TD style="height: 12.75pt;" height="17"&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 12.75pt;" height="17"&gt;
&lt;TD style="height: 12.75pt;" colspan="3" height="17"&gt;Compiler   11.0.039 Beta&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 12.75pt;" height="17"&gt;
&lt;TD style="height: 12.75pt;" height="17"&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 12.75pt;" height="17"&gt;
&lt;TD style="height: 12.75pt;" height="17"&gt;&lt;/TD&gt;
&lt;TD&gt;Win32&lt;/TD&gt;
&lt;TD&gt;x64&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 12.75pt;" height="17"&gt;
&lt;TD style="height: 12.75pt;" height="17"&gt;Loop&lt;/TD&gt;
&lt;TD align="right"&gt;0.125&lt;/TD&gt;
&lt;TD align="right"&gt;0.125&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 12.75pt;" height="17"&gt;
&lt;TD style="height: 12.75pt;" height="17"&gt;WithDim&lt;/TD&gt;
&lt;TD align="right"&gt;0.09375&lt;/TD&gt;
&lt;TD align="right"&gt;0.082031&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 12.75pt;" height="17"&gt;
&lt;TD style="height: 12.75pt;" height="17"&gt;NOTDim&lt;/TD&gt;
&lt;TD align="right"&gt;8.753906&lt;/TD&gt;
&lt;TD align="right"&gt;4.375&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 12.75pt;" height="17"&gt;
&lt;TD style="height: 12.75pt;" height="17"&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 12.75pt;" height="17"&gt;
&lt;TD style="height: 12.75pt;" height="17"&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 12.75pt;" height="17"&gt;
&lt;TD style="height: 12.75pt;" colspan="2" height="17"&gt;Compiler   10.1.024&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 12.75pt;" height="17"&gt;
&lt;TD style="height: 12.75pt;" height="17"&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 12.75pt;" height="17"&gt;
&lt;TD style="height: 12.75pt;" height="17"&gt;&lt;/TD&gt;
&lt;TD&gt;Win32&lt;/TD&gt;
&lt;TD&gt;x64&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 12.75pt;" height="17"&gt;
&lt;TD style="height: 12.75pt;" height="17"&gt;Loop&lt;/TD&gt;
&lt;TD align="right"&gt;0.125&lt;/TD&gt;
&lt;TD align="right"&gt;0.109375&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 12.75pt;" height="17"&gt;
&lt;TD style="height: 12.75pt;" height="17"&gt;WithDim&lt;/TD&gt;
&lt;TD align="right"&gt;0.09375&lt;/TD&gt;
&lt;TD align="right"&gt;0.109375&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 12.75pt;" height="17"&gt;
&lt;TD style="height: 12.75pt;" height="17"&gt;NOTDim&lt;/TD&gt;
&lt;TD align="right"&gt;8.53125&lt;/TD&gt;
&lt;TD align="right"&gt;4.28125&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 12.75pt;" height="17"&gt;
&lt;TD style="height: 12.75pt;" height="17"&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;B&gt; Source&lt;/B&gt;&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Program Test_AllocationSpeed&lt;BR /&gt; !&lt;BR /&gt; ! Purpose: Test Speed difference when using allocatable arrays.&lt;BR /&gt; !&lt;BR /&gt; Implicit None&lt;BR /&gt; !&lt;BR /&gt; Integer :: NP&lt;BR /&gt; Real(8), Allocatable :: X(:), Y(:)&lt;BR /&gt; ! &lt;BR /&gt;! Integer, Parameter :: NP = 10000000 &lt;BR /&gt;! Real(8) :: X(NP), Y(NP) &lt;BR /&gt; !&lt;BR /&gt; Integer :: ial, i, k&lt;BR /&gt; Character(32) :: AllocationError&lt;BR /&gt; &lt;BR /&gt; Real(8) :: ts, te&lt;BR /&gt;!&lt;BR /&gt;!###################&lt;BR /&gt;!&lt;BR /&gt;! Print *, "Give NP"&lt;BR /&gt;! Read *, NP&lt;BR /&gt; ! &lt;BR /&gt; NP = 10000000&lt;BR /&gt;&lt;BR /&gt; Allocate(X(NP), Y(NP), stat=ial)!, ERRMSG = AllocationError)&lt;BR /&gt; if (ial /= 0) then&lt;BR /&gt; Stop&lt;BR /&gt; !Write(*,"(A)") Trim(AllocationError)&lt;BR /&gt; endif&lt;BR /&gt;&lt;BR /&gt; ! With Loop&lt;BR /&gt; Call CPU_Time(ts)&lt;BR /&gt; do k = 1, 100&lt;BR /&gt; do i = 1, NP&lt;BR /&gt; X(i) = 0.0d0&lt;BR /&gt; Y(i) = 0.0d0&lt;BR /&gt; enddo&lt;BR /&gt; end do&lt;BR /&gt; Call CPU_Time(te)&lt;BR /&gt; Write(*,"(A)") "With Loop:"&lt;BR /&gt; Write(*,"(A,ES14.6)") "Computation time with Loop &lt;S&gt;:", (te-ts)&lt;BR /&gt; Write(*,*)&lt;BR /&gt; &lt;BR /&gt; ! With whole array dimensioned&lt;BR /&gt; Call CPU_Time(ts)&lt;BR /&gt; do k = 1, 100&lt;BR /&gt; X(1:NP) = 0.0d0&lt;BR /&gt; Y(1:NP) = 0.0d0&lt;BR /&gt; end do&lt;BR /&gt; Call CPU_Time(te)&lt;BR /&gt; Write(*,"(A)") "With whole array dimensioned:"&lt;BR /&gt; Write(*,"(A,ES14.6)") "Computation time &lt;S&gt;:", (te-ts)&lt;BR /&gt; Write(*,*)&lt;BR /&gt; &lt;BR /&gt; ! 
With whole array NOT dimensioned&lt;BR /&gt; Call CPU_Time(ts)&lt;BR /&gt; do k = 1, 100&lt;BR /&gt; X = 0.0d0&lt;BR /&gt; Y = 0.0d0&lt;BR /&gt; end do&lt;BR /&gt; Call CPU_Time(te)&lt;BR /&gt; Write(*,"(A)") "With whole array NOT dimensioned:"&lt;BR /&gt; Write(*,"(A,ES14.6)") "Computation time &lt;S&gt;:", (te-ts)&lt;BR /&gt; Write(*,*) &lt;BR /&gt; &lt;BR /&gt;!&lt;BR /&gt; End Program Test_AllocationSpeed&lt;BR /&gt;!&lt;BR /&gt;!===============================================================================&lt;BR /&gt;!&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&lt;BR /&gt;!===============================================================================&lt;BR /&gt;!&lt;BR /&gt;&lt;BR /&gt;&lt;/S&gt;&lt;/S&gt;&lt;/S&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 04 Oct 2008 00:53:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874726#M73443</guid>
      <dc:creator>abhimodak</dc:creator>
      <dc:date>2008-10-04T00:53:48Z</dc:date>
    </item>
    <item>
      <title>Re: Slow allocatable arrays</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874727#M73444</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;P&gt;Running your test on an Intel Core2 Duo 2.4GHz, Win32, with the 10.0 compiler, I get the following times:&lt;/P&gt;
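&lt;TABLE style="border-collapse: collapse;" border="0" cellspacing="0" cellpadding="0"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;TD&gt;Allocatable&lt;/TD&gt;
&lt;TD&gt;Static&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;loop&lt;/TD&gt;
&lt;TD align="right"&gt;8.04&lt;/TD&gt;
&lt;TD align="right"&gt;7.875&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;with DIM&lt;/TD&gt;
&lt;TD align="right"&gt;7.546&lt;/TD&gt;
&lt;TD align="right"&gt;7.546875&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;NOT DIM&lt;/TD&gt;
&lt;TD align="right"&gt;7.531&lt;/TD&gt;
&lt;TD align="right"&gt;7.546875&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;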
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I'm very confused, because after I ran your test as a separate project, my own test with allocatable and static arrays began to show almost equal times. This is very strange to me because I did not change anything! The only idea I have is that some external programs can influence the memory access of the tested program. Maybe in your case it is something similar. Your case is very interesting, and your times for "loop" and "dim" look very optimistic. It would be very interesting to know whether such a significant speedup is an error or whether it can actually be achieved somehow.&lt;/P&gt;</description>
      <pubDate>Sat, 04 Oct 2008 07:39:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874727#M73444</guid>
      <dc:creator>brovchik</dc:creator>
      <dc:date>2008-10-04T07:39:38Z</dc:date>
    </item>
    <item>
      <title>Re: Slow allocatable arrays</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874728#M73445</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Show us the ifort command line used - it will be in the build log, a link to which will be displayed after you build the project. My guess is that you used an optimization level that removed one or both loops completely.&lt;/P&gt;</description>
      <pubDate>Sat, 04 Oct 2008 12:11:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874728#M73445</guid>
      <dc:creator>Steven_L_Intel1</dc:creator>
      <dc:date>2008-10-04T12:11:18Z</dc:date>
    </item>
    <item>
      <title>Re: Slow allocatable arrays</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874729#M73446</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;P&gt;Hi&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I had actually meant to include the timings with compiler version 10.0.025. Here they are:&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;TABLE style="border-collapse: collapse; width: 144pt;" border="0" cellspacing="0" cellpadding="0" width="192"&gt;
&lt;COL style="width: 48pt;" span="3" width="64" /&gt; 
&lt;TBODY&gt;
&lt;TR style="height: 12.75pt;" height="17"&gt;
&lt;TD style="height: 12.75pt; width: 96pt;" colspan="2" width="128" height="17"&gt;Compiler 10.0.025&lt;/TD&gt;
&lt;TD style="width: 48pt;" width="64"&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 12.75pt;" height="17"&gt;
&lt;TD style="height: 12.75pt;" height="17"&gt;&lt;/TD&gt;
&lt;TD&gt;Win32&lt;/TD&gt;
&lt;TD&gt;x64&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 12.75pt;" height="17"&gt;
&lt;TD style="height: 12.75pt;" height="17"&gt;Loop&lt;/TD&gt;
&lt;TD align="right"&gt;8.613281&lt;/TD&gt;
&lt;TD align="right"&gt;4.4375&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 12.75pt;" height="17"&gt;
&lt;TD style="height: 12.75pt;" height="17"&gt;WithDim&lt;/TD&gt;
&lt;TD align="right"&gt;8.316406&lt;/TD&gt;
&lt;TD align="right"&gt;4.292969&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 12.75pt;" height="17"&gt;
&lt;TD style="height: 12.75pt;" height="17"&gt;NOTDim&lt;/TD&gt;
&lt;TD align="right"&gt;8.425781&lt;/TD&gt;
&lt;TD align="right"&gt;8.382813&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;Below are the Win32 and x64 build logs with 10.0.025&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Abhi&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;================&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;[1] Win32&lt;/P&gt;
&lt;P&gt;Deleting intermediate files and output files for project 'Test_AllocateSpeed', configuration 'Release|Win32'.&lt;BR /&gt;Compiling with Intel Fortran Compiler 10.0.025 [IA-32]...&lt;BR /&gt;ifort /nologo /module:"Release" /object:"Release" /libs:static /threads /c /Qvc8 /Qlocation,link,"C:Program Files (x86)Microsoft Visual Studio 8VCbin" "C:AbhiMySourceTestsTest_AllocateSpeedTest_AllocateSpeed.f90"&lt;BR /&gt;Linking...&lt;BR /&gt;Link /OUT:"ReleaseTest_AllocateSpeed.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"C:AbhiMySourceTestsTest_AllocateSpeedreleasetest_allocatespeed.exe.intermediate.manifest" /SUBSYSTEM:CONSOLE /IMPLIB:"C:AbhiMySourceTestsTest_AllocateSpeedreleasetest_allocatespeed.lib" "ReleaseTest_AllocateSpeed.obj"&lt;BR /&gt;Link: executing 'link'&lt;BR /&gt;&lt;BR /&gt;Embedding manifest...&lt;BR /&gt;mt.exe /nologo /outputresource:"C:AbhiMySourceTestsTest_AllocateSpeedreleasetest_allocatespeed.exe;#1" /manifest "C:AbhiMySourceTestsTest_AllocateSpeedreleasetest_allocatespeed.exe.intermediate.manifest"&lt;BR /&gt;&lt;BR /&gt;Test_AllocateSpeed - 0 error(s), 0 warning(s)&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;[2] x64&lt;/P&gt;
&lt;P&gt;Deleting intermediate files and output files for project 'Test_AllocateSpeed', configuration 'Release|x64'.&lt;BR /&gt;Compiling with Intel Fortran Compiler 10.0.025 [Intel 64]...&lt;BR /&gt;ifort /nologo /module:"x64Release" /object:"x64Release" /libs:static /threads /c /Qvc8 /Qlocation,link,"C:Program Files (x86)Microsoft Visual Studio 8VCbinx86_amd64" "C:AbhiMySourceTestsTest_AllocateSpeedTest_AllocateSpeed.f90"&lt;BR /&gt;C:AbhiMySourceTestsTest_AllocateSpeedTest_AllocateSpeed.f90(34): (col. 13) remark: LOOP WAS VECTORIZED.&lt;BR /&gt;C:AbhiMySourceTestsTest_AllocateSpeedTest_AllocateSpeed.f90(47): (col. 13) remark: LOOP WAS VECTORIZED.&lt;BR /&gt;C:AbhiMySourceTestsTest_AllocateSpeedTest_AllocateSpeed.f90(48): (col. 13) remark: LOOP WAS VECTORIZED.&lt;BR /&gt;C:AbhiMySourceTestsTest_AllocateSpeedTest_AllocateSpeed.f90(58): (col. 13) remark: LOOP WAS VECTORIZED.&lt;BR /&gt;C:AbhiMySourceTestsTest_AllocateSpeedTest_AllocateSpeed.f90(59): (col. 13) remark: LOOP WAS VECTORIZED.&lt;BR /&gt;&lt;BR /&gt;Linking...&lt;BR /&gt;Link /OUT:"x64ReleaseTest_AllocateSpeed.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"C:AbhiMySourceTestsTest_AllocateSpeedx64releasetest_allocatespeed.exe.intermediate.manifest" /SUBSYSTEM:CONSOLE /IMPLIB:"C:AbhiMySourceTestsTest_AllocateSpeedx64releasetest_allocatespeed.lib" "x64ReleaseTest_AllocateSpeed.obj"&lt;BR /&gt;Link: executing 'link'&lt;BR /&gt;&lt;BR /&gt;Embedding manifest...&lt;BR /&gt;mt.exe /nologo /outputresource:"C:AbhiMySourceTestsTest_AllocateSpeedx64releasetest_allocatespeed.exe;#1" /manifest "C:AbhiMySourceTestsTest_AllocateSpeedx64releasetest_allocatespeed.exe.intermediate.manifest"&lt;BR /&gt;&lt;BR /&gt;Test_AllocateSpeed - 0 error(s), 0 warning(s)&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 06 Oct 2008 17:29:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874729#M73446</guid>
      <dc:creator>abhimodak</dc:creator>
      <dc:date>2008-10-06T17:29:22Z</dc:date>
    </item>
    <item>
      <title>Re: Slow allocatable arrays</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874730#M73447</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;P&gt;You don't say what kind of processor you're using.&lt;BR /&gt;Your arrays are large enough that the performance of this code should be determined by how well the cache is managed. Best code will be 10x better than bad code, on current x64 hardware (using write combining, no-fetch sequences versus actually updating individual cells one by one).&lt;BR /&gt;That example is short enough that you will be able to look right at the generated assembly and see what's going on. Set a breakpoint in the loop and debug to it; right-click, go to disassembly. Copy the surrounding several dozen lines to the clipboard and paste them in. If you can't read assembly try posting both versions here.&lt;BR /&gt;Run both under VTune and see the differences.&lt;BR /&gt;Other remote possibilities include: alignment problems. NaNs in the heap. An unfortunate cache stride or some kind of prefetch disagreement seems unlikely with such a simple example but it's a remote possibility. Even more exotic stuff I can't imagine right now.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Emma Henley&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Oct 2008 07:34:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874730#M73447</guid>
      <dc:creator>emmahenley</dc:creator>
      <dc:date>2008-10-08T07:34:41Z</dc:date>
    </item>
    <item>
      <title>Re: Slow allocatable arrays</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874731#M73448</link>
      <description>&lt;DIV style="margin:0px;"&gt;160 megabytes of data * 100 passes = 16e9 bytes to write. At 8 seconds you're seeing only 2 GB/sec, which is about half of what your hardware might be capable of if it has a really good RAM configuration. Anything less than 4 seconds means some of the code is optimized away.&lt;/DIV&gt;
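&lt;P&gt;For anyone who wants to redo that arithmetic, here is a rough sketch. It assumes the array count, element size, NP and pass count from the posted test program; the program and variable names are made up for illustration, and the sample timings are copied from the 10.1.024 table earlier in the thread.&lt;/P&gt;

```fortran
Program Bandwidth_Estimate
   ! Illustrative sketch only: the program and variable names are hypothetical.
   ! Array count, element size, NP and pass count are taken from the posted test.
   Implicit None
   Integer, Parameter :: dp = Selected_Real_Kind(15, 307)
   Integer, Parameter :: n_timings = 3
   Real(dp), Parameter :: np = 1.0e7_dp             ! elements per array (NP)
   Real(dp), Parameter :: n_arrays = 2.0_dp         ! X and Y
   Real(dp), Parameter :: bytes_per_elem = 8.0_dp   ! Real(8)
   Real(dp), Parameter :: n_passes = 100.0_dp       ! outer k loop
   Real(dp) :: total_bytes, gb_per_sec
   Real(dp) :: seconds(n_timings)
   Integer :: i

   ! 2 arrays * 1e7 elements * 8 bytes * 100 passes = 1.6e10 bytes
   total_bytes = n_arrays * np * bytes_per_elem * n_passes

   ! Sample timings (seconds) copied from the 10.1.024 table earlier in the thread
   seconds = (/ 8.53125_dp, 4.28125_dp, 0.125_dp /)

   Write(*,"(A,ES12.4)") "Total bytes written: ", total_bytes
   do i = 1, n_timings
      gb_per_sec = total_bytes / seconds(i) / 1.0e9_dp
      Write(*,"(A,F9.5,A,F8.2,A)") "Time ", seconds(i), " s implies ", gb_per_sec, " GB/sec"
   end do
End Program Bandwidth_Estimate
```

&lt;P&gt;With these inputs the 0.125 s timing works out to 128 GB/sec, far above the roughly 4 GB/sec ceiling estimated above, which is consistent with the stores having been optimized away.&lt;/P&gt;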
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Oct 2008 14:02:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874731#M73448</guid>
      <dc:creator>Steve_Nuchia</dc:creator>
      <dc:date>2008-10-08T14:02:10Z</dc:date>
    </item>
    <item>
      <title>Re: Slow allocatable arrays</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874732#M73449</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;P&gt;Honestly, I am not sure if I follow the last two posts....&lt;/P&gt;
&lt;P&gt;I have posted the timings with three versions of the compiler for Win32 and x64. I don't understand why they are the way they are... What really bothers me is the performance of whole array operations.&lt;/P&gt;
&lt;P&gt;Earlier Steve suggested that there may be /check:bounds. But it is not so since I use the release configuration for all these runs.&lt;/P&gt;
&lt;P&gt;I am using a Xeon 5150 (2.66 GHz) with 6 GB of RAM. The operating system is WinXP 64 Professional.&lt;/P&gt;
&lt;P&gt;Abhi&lt;/P&gt;</description>
      <pubDate>Wed, 08 Oct 2008 16:40:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874732#M73449</guid>
      <dc:creator>abhimodak</dc:creator>
      <dc:date>2008-10-08T16:40:23Z</dc:date>
    </item>
    <item>
      <title>Re: Slow allocatable arrays</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874733#M73450</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;P&gt;I see what looks like bounds checking code in the assembly listing you posted.&lt;/P&gt;</description>
      <pubDate>Mon, 13 Oct 2008 19:38:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874733#M73450</guid>
      <dc:creator>Steven_L_Intel1</dc:creator>
      <dc:date>2008-10-13T19:38:22Z</dc:date>
    </item>
    <item>
      <title>Re: Slow allocatable arrays</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874734#M73451</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;P&gt;Hi Steve&lt;/P&gt;
&lt;P&gt;So even when I am NOT using /check:bounds it is getting in there? How come it is not affecting the loops? I am very confused.. Should I be using the whole array (without dimension) syntax or not?&lt;/P&gt;
&lt;P&gt;Abhi&lt;/P&gt;</description>
      <pubDate>Mon, 13 Oct 2008 22:00:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874734#M73451</guid>
      <dc:creator>abhimodak</dc:creator>
      <dc:date>2008-10-13T22:00:42Z</dc:date>
    </item>
    <item>
      <title>Re: Slow allocatable arrays</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874735#M73452</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;P&gt;You are showing the assembly output from a debug configuration.&lt;/P&gt;</description>
      <pubDate>Mon, 13 Oct 2008 23:11:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874735#M73452</guid>
      <dc:creator>Steven_L_Intel1</dc:creator>
      <dc:date>2008-10-13T23:11:26Z</dc:date>
    </item>
    <item>
      <title>Re: Slow allocatable arrays</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874736#M73453</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;P&gt;Hi Steve&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I did NOT post any assembly code... I only posted the build logs. The assembly code was posted by brovchik. I am running only the "release" mode and all the computation times I reported are only with the release mode.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Abhi&lt;/P&gt;</description>
      <pubDate>Mon, 13 Oct 2008 23:21:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Slow-allocatable-arrays/m-p/874736#M73453</guid>
      <dc:creator>abhimodak</dc:creator>
      <dc:date>2008-10-13T23:21:40Z</dc:date>
    </item>
  </channel>
</rss>

