<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Ifort 11.1 bug (?) segfault with nested OpenMP and large privat in Intel® Fortran Compiler</title>
    <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Ifort-11-1-bug-segfault-with-nested-OpenMP-and-large-private/m-p/748916#M5967</link>
    <description>Hi,&lt;BR /&gt;&lt;BR /&gt;Thanks for the input, I agree with the use of the module.&lt;BR /&gt;&lt;BR /&gt;I am not quite sure what your point is with the rest of it, though. I guess I could have been clearer. This was intended as a small example: the first program generates a segmentation violation if n is large enough (1000 on our systems), but does not if n is small (around 10). The segmentation fault also happens if, for example, you allocate the array.&lt;BR /&gt;&lt;BR /&gt;There seems to be a problem with nested OpenMP and large private arrays.&lt;BR /&gt;&lt;BR /&gt;If you take the outer omp parallel region out, there is no segfault. If you leave it in, but select 1 thread (which should be pretty much equivalent), then there is a segfault.&lt;BR /&gt;&lt;BR /&gt;There does seem to be a problem where you have nested parallel regions, where a large array is private in the first parallel region, and then also private in the second. (In our case that would give 12 copies of the array, and take up about 48 MB of memory.) As I understand it this is allowed by the standard, but a segmentation fault occurs.&lt;BR /&gt;&lt;BR /&gt;When you have a shared array in the first parallel region, then a private array in the second (which still gives a 1000,1000,12 array in effect), then you do not get the segmentation violation at runtime.&lt;BR /&gt;&lt;BR /&gt;So what I was more interested in is: is nesting private variables legal, is there anything else in there which could potentially cause a segmentation violation, or is there a compiler bug?&lt;BR /&gt;&lt;BR /&gt;Jon</description>
    <pubDate>Mon, 06 Sep 2010 15:15:35 GMT</pubDate>
    <dc:creator>jonathanvincent</dc:creator>
    <dc:date>2010-09-06T15:15:35Z</dc:date>
    <item>
      <title>Ifort 11.1 bug (?) segfault with nested OpenMP and large private arrays</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Ifort-11-1-bug-segfault-with-nested-OpenMP-and-large-private/m-p/748914#M5965</link>
      <description>Seen this on two different computer systems&lt;BR /&gt;&lt;BR /&gt;Seems to be a compiler issue.&lt;BR /&gt;&lt;BR /&gt;Program that segfaults&lt;BR /&gt;*****************************************************&lt;BR /&gt;program tomp&lt;BR /&gt; implicit none&lt;BR /&gt;&lt;BR /&gt; integer,external:: omp_get_num_threads&lt;BR /&gt; integer,external:: omp_get_thread_num&lt;BR /&gt; logical,external :: omp_get_nested&lt;BR /&gt;&lt;BR /&gt; integer nthread1,m1&lt;BR /&gt;&lt;BR /&gt; call omp_set_nested(.true.)&lt;BR /&gt; &lt;BR /&gt;!$omp parallel private (m1) num_threads(4)&lt;BR /&gt;&lt;BR /&gt; call omp_set_nested(.true.)&lt;BR /&gt; &lt;BR /&gt; nthread1 = omp_get_num_threads()&lt;BR /&gt; m1 = omp_get_thread_num()&lt;BR /&gt;&lt;BR /&gt; write(*,*) 'outer: Running on nthread1=',nthread1,m1&lt;BR /&gt; write(*,*) 'outer: ',omp_get_nested()&lt;BR /&gt;&lt;BR /&gt; call inner(m1)&lt;BR /&gt; &lt;BR /&gt; write(*,*) 'outer: done ',m1&lt;BR /&gt; &lt;BR /&gt;!$omp end parallel&lt;BR /&gt; end program tomp&lt;BR /&gt;&lt;BR /&gt; subroutine inner(m1)&lt;BR /&gt;&lt;BR /&gt; integer,parameter :: n=1000&lt;BR /&gt;c increasing n to 1000 will give segmentation fault with ifort&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt; real a(n,n)&lt;BR /&gt; integer,external:: omp_get_num_threads&lt;BR /&gt; integer,external:: omp_get_thread_num&lt;BR /&gt; logical,external:: omp_get_nested&lt;BR /&gt;&lt;BR /&gt; integer m1,i,j,m2,nthread2,k&lt;BR /&gt;&lt;BR /&gt;!$omp parallel private(m2,i,j,k,a) num_threads(3)&lt;BR /&gt;&lt;BR /&gt; call omp_set_nested(.true.)&lt;BR /&gt;&lt;BR /&gt; nthread2 = omp_get_num_threads()&lt;BR /&gt; m2 = omp_get_thread_num()&lt;BR /&gt;&lt;BR /&gt; write(*,*) 'inner: Running on nthread2=',nthread2,m2,m1&lt;BR /&gt; write(*,*) 'inner: ',omp_get_nested()&lt;BR /&gt;&lt;BR /&gt; a=0.&lt;BR /&gt; do k=1,1000&lt;BR /&gt; do i=1,n&lt;BR /&gt; do j=1,n&lt;BR /&gt; a(i,j) = sin(a(i,j))**2.&lt;BR /&gt; end do&lt;BR /&gt; end do&lt;BR /&gt; end 
do&lt;BR /&gt;&lt;BR /&gt; write(*,*) 'inner: done ',m2&lt;BR /&gt;!$omp end parallel&lt;BR /&gt;&lt;BR /&gt; end subroutine inner&lt;BR /&gt;*********************************************************&lt;BR /&gt;&lt;BR /&gt;Program that seems to work.&lt;BR /&gt;&lt;BR /&gt;**********************************************************&lt;BR /&gt;program tomp&lt;BR /&gt; implicit none&lt;BR /&gt;&lt;BR /&gt; integer, parameter :: othreads=4&lt;BR /&gt; integer, parameter :: n=1000&lt;BR /&gt;&lt;BR /&gt; integer,external:: omp_get_num_threads&lt;BR /&gt; integer,external:: omp_get_thread_num&lt;BR /&gt; logical,external :: omp_get_nested&lt;BR /&gt;&lt;BR /&gt; integer nthread1,m1&lt;BR /&gt;&lt;BR /&gt; real :: a(n,n,othreads)&lt;BR /&gt;&lt;BR /&gt; call omp_set_nested(.true.)&lt;BR /&gt; &lt;BR /&gt;!$omp parallel private (m1) num_threads(othreads)&lt;BR /&gt; &lt;BR /&gt; nthread1 = omp_get_num_threads()&lt;BR /&gt; m1 = omp_get_thread_num()&lt;BR /&gt;&lt;BR /&gt; write(*,*) 'outer: Running on nthread1=',nthread1,m1&lt;BR /&gt; write(*,*) 'outer: ',omp_get_nested()&lt;BR /&gt;&lt;BR /&gt; call inner(m1,n,a(:,:,m1))&lt;BR /&gt; &lt;BR /&gt; write(*,*) 'outer: done ',m1&lt;BR /&gt; &lt;BR /&gt;!$omp end parallel&lt;BR /&gt; end program tomp&lt;BR /&gt;&lt;BR /&gt; subroutine inner(m1,n,a)&lt;BR /&gt;&lt;BR /&gt; real a(n,n)&lt;BR /&gt; integer,external:: omp_get_num_threads&lt;BR /&gt; integer,external:: omp_get_thread_num&lt;BR /&gt; logical,external:: omp_get_nested&lt;BR /&gt;&lt;BR /&gt; integer m1,i,j,m2,nthread2,k&lt;BR /&gt;&lt;BR /&gt;!$omp parallel private(m2,i,j,k,a) num_threads(3)&lt;BR /&gt;&lt;BR /&gt; nthread2 = omp_get_num_threads()&lt;BR /&gt; m2 = omp_get_thread_num()&lt;BR /&gt; nested = omp_get_nested()&lt;BR /&gt;&lt;BR /&gt; write(*,*) 'inner: Running on nthread2=',nthread2,m2,m1&lt;BR /&gt; write(*,*) 'inner: ',omp_get_nested()&lt;BR /&gt; &lt;BR /&gt; a=0.&lt;BR /&gt; do k=1,1000&lt;BR /&gt; do i=1,n&lt;BR /&gt; do j=1,n&lt;BR /&gt; a(i,j) = 
sin(a(i,j))**2.&lt;BR /&gt; end do&lt;BR /&gt; end do&lt;BR /&gt; end do&lt;BR /&gt;&lt;BR /&gt; write(*,*) 'inner: done ',m2&lt;BR /&gt;!$omp end parallel&lt;BR /&gt;&lt;BR /&gt; end subroutine inner&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 06 Sep 2010 09:48:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Ifort-11-1-bug-segfault-with-nested-OpenMP-and-large-private/m-p/748914#M5965</guid>
      <dc:creator>jonathanvincent</dc:creator>
      <dc:date>2010-09-06T09:48:44Z</dc:date>
    </item>
    <item>
      <title>Ifort 11.1 bug (?) segfault with nested OpenMP and large privat</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Ifort-11-1-bug-segfault-with-nested-OpenMP-and-large-private/m-p/748915#M5966</link>
      <description>First, I suggest you use&lt;BR /&gt;&lt;BR /&gt; USE OMP_LIB&lt;BR /&gt;&lt;BR /&gt;to declare the OpenMP library interfaces.&lt;BR /&gt;&lt;BR /&gt;Second, declaring in inner: real a(n,n)&lt;BR /&gt;with n as a parameter (=1000) may do one of the following:&lt;BR /&gt;&lt;BR /&gt; allocate a(n,n) as SAVE&lt;BR /&gt; allocate a(n,n) on the stack&lt;BR /&gt; allocate a(n,n) from the heap&lt;BR /&gt;&lt;BR /&gt;As written, it is ambiguous which of these will happen.&lt;BR /&gt;&lt;BR /&gt;For OpenMP you want inner's a to be local to the thread calling inner, in other words .not. SAVE.&lt;BR /&gt;To ensure .not. SAVE:&lt;BR /&gt;&lt;BR /&gt; recursive subroutine inner(m1)&lt;BR /&gt; ...&lt;BR /&gt; real a(n,n)&lt;BR /&gt;.or.&lt;BR /&gt; real, allocatable :: a(:,:)&lt;BR /&gt; ...&lt;BR /&gt; allocate(a(n,n))&lt;BR /&gt; ...&lt;BR /&gt; deallocate(a)&lt;BR /&gt; end subroutine inner&lt;BR /&gt;&lt;BR /&gt;However, creating a on the stack will consume 4MB or 8MB of stack space.&lt;BR /&gt;To avoid this, consider enabling heap arrays .or. using the real, allocatable :: a(:,:) technique.&lt;BR /&gt;Relying on the heap arrays option is not visible in the code. The next person supporting your code might not be aware of it, neglect to include the compiler option, and therefore inadvertently introduce a bug (the code is correct but does not perform as expected/required).&lt;BR /&gt;&lt;BR /&gt;Keep in mind that inside the parallel region within inner, m2 is the 0-based team member number of the team established by team member m1 of the thread team calling inner (and m2==0 for each calling team member from the caller thread team). In other words, assuming all threads are granted, you will have 12 threads,&lt;BR /&gt;&lt;BR /&gt;and write(*,*) 'inner: done ',m2&lt;BR /&gt;&lt;BR /&gt;will write 4 sets of m2 = 0,1,2 (interleaved arbitrarily).&lt;BR /&gt;&lt;BR /&gt;You might need to insert !$OMP critical around your writes.&lt;BR /&gt;&lt;BR /&gt;Jim Dempsey&lt;BR /&gt;</description>
      <pubDate>Mon, 06 Sep 2010 10:49:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Ifort-11-1-bug-segfault-with-nested-OpenMP-and-large-private/m-p/748915#M5966</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2010-09-06T10:49:03Z</dc:date>
    </item>
    <item>
      <title>Ifort 11.1 bug (?) segfault with nested OpenMP and large privat</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Ifort-11-1-bug-segfault-with-nested-OpenMP-and-large-private/m-p/748916#M5967</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;Thanks for the input, I agree with the use of the module.&lt;BR /&gt;&lt;BR /&gt;I am not quite sure what your point is with the rest of it, though. I guess I could have been clearer. This was intended as a small example: the first program generates a segmentation violation if n is large enough (1000 on our systems), but does not if n is small (around 10). The segmentation fault also happens if, for example, you allocate the array.&lt;BR /&gt;&lt;BR /&gt;There seems to be a problem with nested OpenMP and large private arrays.&lt;BR /&gt;&lt;BR /&gt;If you take the outer omp parallel region out, there is no segfault. If you leave it in, but select 1 thread (which should be pretty much equivalent), then there is a segfault.&lt;BR /&gt;&lt;BR /&gt;There does seem to be a problem where you have nested parallel regions, where a large array is private in the first parallel region, and then also private in the second. (In our case that would give 12 copies of the array, and take up about 48 MB of memory.) As I understand it this is allowed by the standard, but a segmentation fault occurs.&lt;BR /&gt;&lt;BR /&gt;When you have a shared array in the first parallel region, then a private array in the second (which still gives a 1000,1000,12 array in effect), then you do not get the segmentation violation at runtime.&lt;BR /&gt;&lt;BR /&gt;So what I was more interested in is: is nesting private variables legal, is there anything else in there which could potentially cause a segmentation violation, or is there a compiler bug?&lt;BR /&gt;&lt;BR /&gt;Jon</description>
      <pubDate>Mon, 06 Sep 2010 15:15:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Ifort-11-1-bug-segfault-with-nested-OpenMP-and-large-private/m-p/748916#M5967</guid>
      <dc:creator>jonathanvincent</dc:creator>
      <dc:date>2010-09-06T15:15:35Z</dc:date>
    </item>
    <item>
      <title>Ifort 11.1 bug (?) segfault with nested OpenMP and large privat</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Ifort-11-1-bug-segfault-with-nested-OpenMP-and-large-private/m-p/748917#M5968</link>
      <description>OK, I have cut the example down even more.&lt;BR /&gt;&lt;BR /&gt;If othreads=1 or n=10 then it works fine. For large n with othreads &amp;gt;= 2 we get a segmentation violation.&lt;BR /&gt;&lt;BR /&gt;My understanding is that the end result of othreads=1 ithreads=4 and of othreads=2 ithreads=2 should be pretty much the same, except that the second one results in a segfault and the first one does not. Interestingly, othreads=4 ithreads=1 also gives a segfault.&lt;BR /&gt;&lt;BR /&gt;I am happy to be shown to be wrong, but it does look like something is not working as it should with the compiled code.&lt;BR /&gt;&lt;BR /&gt; program tomp&lt;BR /&gt; use OMP_LIB&lt;BR /&gt; implicit none&lt;BR /&gt;&lt;BR /&gt; integer, parameter :: othreads=2&lt;BR /&gt; integer, parameter :: ithreads=2&lt;BR /&gt; integer, parameter :: n=1000&lt;BR /&gt;&lt;BR /&gt; integer i,j&lt;BR /&gt;&lt;BR /&gt; real :: a(n,n)&lt;BR /&gt;&lt;BR /&gt; call omp_set_nested(.true.)&lt;BR /&gt; &lt;BR /&gt;!$omp parallel private(i,j,a) num_threads(othreads)&lt;BR /&gt;!$omp parallel private(i,j,a) num_threads(ithreads)&lt;BR /&gt; &lt;BR /&gt; a=0.&lt;BR /&gt; do i=1,n&lt;BR /&gt; do j=1,n&lt;BR /&gt; a(i,j) = sin(a(i,j))**2.0d0&lt;BR /&gt; end do&lt;BR /&gt; end do&lt;BR /&gt;&lt;BR /&gt;!$omp end parallel &lt;BR /&gt;!$omp end parallel&lt;BR /&gt;&lt;BR /&gt; end program tomp&lt;BR /&gt;</description>
      <pubDate>Mon, 06 Sep 2010 16:06:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Ifort-11-1-bug-segfault-with-nested-OpenMP-and-large-private/m-p/748917#M5968</guid>
      <dc:creator>jonathanvincent</dc:creator>
      <dc:date>2010-09-06T16:06:24Z</dc:date>
    </item>
    <item>
      <title>Ifort 11.1 bug (?) segfault with nested OpenMP and large privat</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Ifort-11-1-bug-segfault-with-nested-OpenMP-and-large-private/m-p/748918#M5969</link>
      <description>program tomp&lt;BR /&gt; use OMP_LIB&lt;BR /&gt; implicit none&lt;BR /&gt;&lt;BR /&gt; integer, parameter :: othreads=2&lt;BR /&gt; integer, parameter :: ithreads=2&lt;BR /&gt; integer, parameter :: n=1000&lt;BR /&gt;&lt;BR /&gt; integer i,j&lt;BR /&gt; ! declare array descriptor only&lt;BR /&gt; real, ALLOCATABLE:: a(:,:)&lt;BR /&gt;&lt;BR /&gt; call omp_set_nested(.true.)&lt;BR /&gt; &lt;BR /&gt;!$omp parallel private(i,j,a) num_threads(othreads)&lt;BR /&gt; ! here, each outer level thread has private (stack located) unallocated array descriptor&lt;BR /&gt; ! (less than 100 bytes of stack consumed per thread)&lt;BR /&gt;!$omp parallel private(i,j,a) num_threads(ithreads)&lt;BR /&gt; ! here, each inner level thread has private (stack located) unallocated array descriptor&lt;BR /&gt; ! (less than 100 bytes of stack consumed per thread)&lt;BR /&gt; ! now allocate separate/private memory blocks per thread&lt;BR /&gt; ALLOCATE(a(n,n))&lt;BR /&gt; a=0.&lt;BR /&gt; do i=1,n&lt;BR /&gt; do j=1,n&lt;BR /&gt; a(i,j) = sin(a(i,j))**2.0d0&lt;BR /&gt; end do&lt;BR /&gt; end do&lt;BR /&gt; ! each thread deallocates separate/private memory block&lt;BR /&gt; DEALLOCATE(a)&lt;BR /&gt;!$omp end parallel &lt;BR /&gt;!$omp end parallel&lt;BR /&gt;&lt;BR /&gt; end program tomp</description>
      <pubDate>Tue, 07 Sep 2010 14:12:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Ifort-11-1-bug-segfault-with-nested-OpenMP-and-large-private/m-p/748918#M5969</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2010-09-07T14:12:14Z</dc:date>
    </item>
    <item>
      <title>Ifort 11.1 bug (?) segfault with nested OpenMP and large privat</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Ifort-11-1-bug-segfault-with-nested-OpenMP-and-large-private/m-p/748919#M5970</link>
      <description>When an application is built with -openmp, local arrays are placed on the stack, so that each thread can have a private copy, should that be necessary. Since the default maximum stack size is quite small on many Linux distributions, the maximum often needs to be increased for OpenMP applications, to avoid a seg fault when the maximum is exceeded. Try ulimit -s unlimited (or limit stacksize unlimited for the C shell).&lt;BR /&gt;&lt;BR /&gt;If the array is actually made private in an OpenMP parallel region, the thread stack size may also need to be increased, either with an environment variable (OMP_STACKSIZE or KMP_STACKSIZE) or with a corresponding RTL call. These result in actual memory allocations and are not upper limits (unlike the shell stack limit), so the values specified should not be arbitrarily large.&lt;BR /&gt;&lt;BR /&gt;See also the following:&lt;BR /&gt;&lt;A href="http://software.intel.com/en-us/articles/threading-fortran-applications-for-parallel-performance-on-multi-core-systems/"&gt;http://software.intel.com/en-us/articles/threading-fortran-applications-for-parallel-performance-on-multi-core-systems/&lt;/A&gt; &lt;BR /&gt;and&lt;BR /&gt;&lt;A href="http://software.intel.com/en-us/articles/openmp-option-no-pragmas-causes-segmentation-fault"&gt;http://software.intel.com/en-us/articles/openmp-option-no-pragmas-causes-segmentation-fault&lt;/A&gt;&lt;BR /&gt;and&lt;BR /&gt;&lt;A href="http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/cpp/lin/compiler_c/optaps/common/optaps_par_var.htm"&gt;http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/cpp/lin/compiler_c/optaps/common/optaps_par_var.htm&lt;/A&gt;</description>
      <pubDate>Fri, 10 Sep 2010 22:43:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Ifort-11-1-bug-segfault-with-nested-OpenMP-and-large-private/m-p/748919#M5970</guid>
      <dc:creator>Martyn_C_Intel</dc:creator>
      <dc:date>2010-09-10T22:43:33Z</dc:date>
    </item>
  </channel>
</rss>

