<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic /o3 delivers in x64 only bad performance in Intel® Fortran Compiler</title>
    <link>https://community.intel.com/t5/Intel-Fortran-Compiler/o3-delivers-in-x64-only-bad-performance/m-p/1050021#M114951</link>
    <description>&lt;P&gt;Hi all,&lt;/P&gt;

&lt;P&gt;In a simple ASCII-file read subroutine I see a strange performance loss only in x64 release mode, but not in x64 debug, Win32 debug, or Win32 release mode. In every release build I use /o3 (ifort 15.0.0.108); in debug builds optimization is turned off. The read performance in the latter three cases is nearly the same. The ASCII file contains the x, y, z coordinates of a grid (1025 lines). Commenting out the preprocessor directives and the screen output makes no difference.&lt;/P&gt;

&lt;P&gt;However, if I use /o2 for this subroutine, the performance is the same as in the three other cases: reading the files now takes 1 second instead of 12 seconds. Is it possible to find the reason for this behavior with only the information from the code snippet? With /o3 the vectorization report shows that the loops do i.. and do j.. are vectorized. With /o2 they are not. In Win32 with /o3 they are also vectorized, but no performance issue is seen. (r3grid is a double precision real array.)&lt;/P&gt;
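
&lt;P&gt;A minimal sketch of how the read loop could be timed in isolation to compare /o2 and /o3 builds. The file name grid_test.txt, the sizes 25 x 41 x 3 (chosen only to give 1025 lines), and all program and variable names other than r3grid are assumptions for illustration, not taken from the original subroutine. The !DIR$ NOVECTOR line is an Intel-specific directive (other compilers treat it as a comment); building with and without it is one way to check whether vectorization of the inner loop plays a role, but whether it affects this particular loop is untested here.&lt;/P&gt;

&lt;P&gt;[fortran]&lt;/P&gt;

&lt;P&gt;program time_grid_read&lt;BR /&gt;
&amp;nbsp; implicit none&lt;BR /&gt;
&amp;nbsp; ! All names, sizes and the file name are illustrative assumptions.&lt;BR /&gt;
&amp;nbsp; integer, parameter :: dp = kind(1.0d0)&lt;BR /&gt;
&amp;nbsp; integer, parameter :: nsec = 25, npts = 41, ndim = 3 ! 25*41 = 1025 lines&lt;BR /&gt;
&amp;nbsp; character(len=*), parameter :: fname = 'grid_test.txt'&lt;BR /&gt;
&amp;nbsp; real(dp), allocatable :: r3grid(:,:,:)&lt;BR /&gt;
&amp;nbsp; integer :: i, j, iu, t_start, t_end, rate&lt;BR /&gt;
!&lt;BR /&gt;
! *** write a small synthetic grid file so the test is self-contained&lt;BR /&gt;
&amp;nbsp; open(newunit=iu, file=fname, status='replace', action='write')&lt;BR /&gt;
&amp;nbsp; do i = 1, nsec&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; do j = 1, npts&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; write(iu,'(3es24.15)') real(i,dp), real(j,dp), real(i+j,dp)&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; end do&lt;BR /&gt;
&amp;nbsp; end do&lt;BR /&gt;
&amp;nbsp; close(iu)&lt;BR /&gt;
!&lt;BR /&gt;
&amp;nbsp; allocate(r3grid(nsec,npts,ndim))&lt;BR /&gt;
!&lt;BR /&gt;
! *** time only the list-directed read loop&lt;BR /&gt;
&amp;nbsp; open(newunit=iu, file=fname, status='old', action='read')&lt;BR /&gt;
&amp;nbsp; call system_clock(t_start, rate)&lt;BR /&gt;
&amp;nbsp; do i = 1, nsec&lt;BR /&gt;
!DIR$ NOVECTOR&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; do j = 1, npts&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; read(iu,*) r3grid(i,j,1:ndim)&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; end do&lt;BR /&gt;
&amp;nbsp; end do&lt;BR /&gt;
&amp;nbsp; call system_clock(t_end)&lt;BR /&gt;
&amp;nbsp; close(iu)&lt;BR /&gt;
!&lt;BR /&gt;
&amp;nbsp; write(*,'(a,f10.4,a)') 'Read time: ', real(t_end - t_start, dp) / real(rate, dp), ' s'&lt;BR /&gt;
&amp;nbsp; write(*,*) 'Last value read: ', r3grid(nsec,npts,ndim)&lt;BR /&gt;
end program time_grid_read&lt;/P&gt;

&lt;P&gt;[/fortran]&lt;/P&gt;

&lt;P&gt;Building this once with /o2 and once with /o3, for both Win32 and x64, and comparing the reported times would show whether the slowdown is reproducible outside the full application.&lt;/P&gt;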

&lt;P&gt;Any hint is welcome, Johannes&lt;/P&gt;

&lt;P&gt;[fortran]&lt;/P&gt;

&lt;P&gt;&amp;nbsp; ie = len_trim(t0grid)&lt;BR /&gt;
	&amp;nbsp; inquire(file=t0grid(1:ie), exist=l0in)&lt;BR /&gt;
	&amp;nbsp; if (l0in) then&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp; open(i0ugrid,file=t0grid(1:ie),status='old',buffered='YES',action='read')&lt;BR /&gt;
	&amp;nbsp; else&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp; write(*,9900) t0grid(1:ie)&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;pause&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp; stop&lt;BR /&gt;
	&amp;nbsp; end if&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;
	!&lt;BR /&gt;
	! *** read the mesh &amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;BR /&gt;
	&amp;nbsp; read(i0ugrid,'(a)') t0dumy ! skip header&lt;BR /&gt;
	&amp;nbsp; read(i0ugrid,*) t0circ,i0schnitte,t0profil,i0profilpunkte,t0dim,i0dim&lt;BR /&gt;
	! &amp;nbsp;&lt;BR /&gt;
	&amp;nbsp; allocate(r3grid(i0schnitte,i0profilpunkte,i0dim))&lt;BR /&gt;
	!&lt;BR /&gt;
	&amp;nbsp; write(t0s,'(i4)') i0schnitte&lt;BR /&gt;
	&amp;nbsp; write(t0p,'(i4)') i0profilpunkte&lt;BR /&gt;
	&amp;nbsp; write(t0d,'(i4)') i0dim&lt;BR /&gt;
	&amp;nbsp; call write_both( '&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; - Read GRID with '//t0s//&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;amp;&lt;BR /&gt;
	&amp;nbsp;&amp;amp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ' sections and '//t0p//' nodes in ' &amp;amp;&lt;BR /&gt;
	&amp;nbsp;&amp;amp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; //t0d//' dimensions')&lt;BR /&gt;
	!&lt;BR /&gt;
	#if defined(_WIN32) || defined(_WIN64)&lt;BR /&gt;
	&amp;nbsp; open(6,carriagecontrol ='fortran')&lt;BR /&gt;
	#endif &amp;nbsp;&lt;BR /&gt;
	&amp;nbsp; do i = 1, i0schnitte&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp; do j = 1, i0profilpunkte&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; read(i0ugrid,*) r3grid(i,j,1:i0dim)&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp; end do&lt;BR /&gt;
	#if defined(_WIN32) || defined(_WIN64)&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp; write(6,'("+&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Progress: ",i4," von ",i4," Sections read.")') i,i0schnitte&lt;BR /&gt;
	#endif&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;
	&amp;nbsp; end do&lt;BR /&gt;
	#if defined(_WIN32) || defined(_WIN64)&lt;BR /&gt;
	&amp;nbsp; open(6,carriagecontrol ='list') &amp;nbsp;&lt;BR /&gt;
	#endif&amp;nbsp;&lt;/P&gt;

&lt;P&gt;[/fortran]&lt;/P&gt;</description>
    <pubDate>Fri, 07 Nov 2014 16:05:53 GMT</pubDate>
    <dc:creator>Johannes_Rieke</dc:creator>
    <dc:date>2014-11-07T16:05:53Z</dc:date>
    <item>
      <title>/o3 delivers in x64 only bad performance</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/o3-delivers-in-x64-only-bad-performance/m-p/1050021#M114951</link>
      <pubDate>Fri, 07 Nov 2014 16:05:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/o3-delivers-in-x64-only-bad-performance/m-p/1050021#M114951</guid>
      <dc:creator>Johannes_Rieke</dc:creator>
      <dc:date>2014-11-07T16:05:53Z</dc:date>
    </item>
  </channel>
</rss>

