<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to Debug / Open MP related in Intel® Fortran Compiler</title>
    <link>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-Debug-Open-MP-related/m-p/1140696#M137099</link>
    <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;I am using the Intel Fortran compiler for Windows -- Parallel Studio XE 2019 Update 5 -- with Microsoft Visual Studio.&lt;/P&gt;&lt;P&gt;My code compiles successfully and runs smoothly in Debug mode. However, in Release mode it crashes with this message (from Visual Studio):&lt;/P&gt;&lt;P&gt;Exception thrown at 0x00711AAE in TradeInformality_FineGrid.exe: 0xC0000005: Access violation reading location 0x00000000.&lt;/P&gt;&lt;P&gt;After some investigation, I found that it crashes here (more precisely, as soon as it enters the parallel do):&lt;/P&gt;
&lt;PRE class="brush:fortran; class-name:dark;"&gt;    maxthr = omp_get_max_threads()
    
    ! Set the number of threads
    Call omp_set_num_threads(maxthr)
	
    !$omp parallel do private(j,k)
    do k = 1, nZ
        do j = 1, n_nodes
            BBT(j,k) = maxval( tmp(:,k) - C_subT(:,j) )
        end do
    end do
    !$omp end parallel do&lt;/PRE&gt;

&lt;P&gt;If I remove the OpenMP directives or if I "Generate Sequential Code (/Qopenmp_stubs)", the code runs fine. So, I am unsure what may be wrong here. Any ideas on how to debug this?&lt;/P&gt;
&lt;P&gt;Many thanks,&lt;BR /&gt;Rafael&lt;/P&gt;

</description>
    <pubDate>Thu, 14 Nov 2019 16:27:43 GMT</pubDate>
    <dc:creator>Dix_Carneiro__Rafael</dc:creator>
    <dc:date>2019-11-14T16:27:43Z</dc:date>
    <item>
      <title>How to Debug / Open MP related</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-Debug-Open-MP-related/m-p/1140696#M137099</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;I am using the Intel Fortran compiler for Windows -- Parallel Studio XE 2019 Update 5 -- with Microsoft Visual Studio.&lt;/P&gt;&lt;P&gt;My code compiles successfully and runs smoothly in Debug mode. However, in Release mode it crashes with this message (from Visual Studio):&lt;/P&gt;&lt;P&gt;Exception thrown at 0x00711AAE in TradeInformality_FineGrid.exe: 0xC0000005: Access violation reading location 0x00000000.&lt;/P&gt;&lt;P&gt;After some investigation, I found that it crashes here (more precisely, as soon as it enters the parallel do):&lt;/P&gt;
&lt;PRE class="brush:fortran; class-name:dark;"&gt;    maxthr = omp_get_max_threads()
    
    ! Set the number of threads
    Call omp_set_num_threads(maxthr)
	
    !$omp parallel do private(j,k)
    do k = 1, nZ
        do j = 1, n_nodes
            BBT(j,k) = maxval( tmp(:,k) - C_subT(:,j) )
        end do
    end do
    !$omp end parallel do&lt;/PRE&gt;

&lt;P&gt;If I remove the OpenMP directives or if I "Generate Sequential Code (/Qopenmp_stubs)", the code runs fine. So, I am unsure what may be wrong here. Any ideas on how to debug this?&lt;/P&gt;
&lt;P&gt;Many thanks,&lt;BR /&gt;Rafael&lt;/P&gt;

</description>
      <pubDate>Thu, 14 Nov 2019 16:27:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-Debug-Open-MP-related/m-p/1140696#M137099</guid>
      <dc:creator>Dix_Carneiro__Rafael</dc:creator>
      <dc:date>2019-11-14T16:27:43Z</dc:date>
    </item>
    <item>
      <title>Can you show the declarations</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-Debug-Open-MP-related/m-p/1140697#M137100</link>
      <description>&lt;P&gt;Can you show the declarations of BBT, tmp and C_subT?&lt;/P&gt;&lt;P&gt;Reading location 0x00000000 suggests that one of them may be an uninitialized pointer or an unallocated array.&lt;/P&gt;&lt;P&gt;Apparently you are running a 32-bit application.&lt;/P&gt;&lt;P&gt;In the Release build, see what happens if you enable the runtime check for array bounds. This will inhibit vectorization of the loop, but it should not affect the declarations of BBT, tmp and C_subT.&lt;/P&gt;&lt;P&gt;Also, in the Release build without the array bounds check, what happens with !DIR$ NOVECTOR placed in front of the do j loop?&lt;/P&gt;&lt;P&gt;I seem to recall an old bug, which may have resurfaced, where one of the CPU registers used to hold the base address of an array is erroneously zeroed. If you are adventurous, generate your Release build with debug symbols (both compiler and linker options) and place a breakpoint on the maxval statement. Pause all threads except the current thread (Threads pane in the debugger), open the Registers and Disassembly windows, then single-step with focus in the Disassembly window. Before each step, check whether the base register is zero.&lt;/P&gt;
&lt;PRE class="brush:plain; class-name:dark;"&gt;008012C2  movups      xmmword ptr [edx+eax*8+10h],xmm2  
&lt;/PRE&gt;

&lt;P&gt;In the above, edx is the base register, eax is the index, 8 is the scale factor, and 10h is the offset.&lt;/P&gt;
&lt;P&gt;Because the target address of the exception was 0x00000000, I would expect the two registers and the offset of the faulting instruction to be 0.&lt;/P&gt;
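&lt;P&gt;As a quick first check for the unallocated-array possibility mentioned above, something along these lines could be placed just before the parallel loop. This is only a sketch: the DOUBLE kind, the sizes nE, n_nodes and nZ, and the allocate statement are assumptions made so the example compiles on its own, not taken from the actual program.&lt;/P&gt;
&lt;PRE class="brush:fortran; class-name:dark;"&gt;program check_arrays
    implicit none
    ! Assumed kind and sizes, for illustration only
    integer, parameter :: DOUBLE = kind(1.0d0)
    integer, parameter :: nE = 50, n_nodes = 20, nZ = 10
    real(DOUBLE), allocatable :: tmp(:,:), BBT(:,:), C_subT(:,:)

    allocate(C_subT(nE,n_nodes), BBT(n_nodes,nZ), tmp(nE,nZ))

    ! An unallocated array is one plausible cause of a read at address 0
    if (.not. allocated(tmp))    stop 'tmp is not allocated'
    if (.not. allocated(C_subT)) stop 'C_subT is not allocated'
    if (.not. allocated(BBT))    stop 'BBT is not allocated'

    ! tmp(:,k) - C_subT(:,j) needs matching first extents
    if (size(tmp,1) /= size(C_subT,1)) stop 'tmp and C_subT do not conform'

    ! The loop bounds must stay inside the allocated extents
    if (size(tmp,2) /= nZ)          stop 'tmp has an unexpected second extent'
    if (size(C_subT,2) /= n_nodes)  stop 'C_subT has an unexpected second extent'
    if (size(BBT,1) /= n_nodes)     stop 'BBT has an unexpected first extent'
    if (size(BBT,2) /= nZ)          stop 'BBT has an unexpected second extent'

    print *, 'all arrays are allocated with consistent shapes'
end program check_arrays&lt;/PRE&gt;
&lt;P&gt;If every check passes in the Release build, the arrays themselves are probably not the problem and the generated code is the more likely suspect.&lt;/P&gt;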
&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Thu, 14 Nov 2019 19:58:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-Debug-Open-MP-related/m-p/1140697#M137100</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2019-11-14T19:58:54Z</dc:date>
    </item>
    <item>
      <title>Dear Jim,</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-Debug-Open-MP-related/m-p/1140698#M137101</link>
      <description>&lt;P&gt;Dear Jim,&lt;/P&gt;&lt;P&gt;Many thanks for your thoughtful response!&lt;/P&gt;&lt;P&gt;Yes, I double-checked, and the variables you mention appear to be properly declared and allocated:&lt;/P&gt;
&lt;PRE class="brush:fortran; class-name:dark;"&gt;real(KIND=DOUBLE), dimension(:,:), allocatable :: tmp(:,:), BBT(:,:), C_subT(:,:)

allocate(C_subT(nE,n_nodes))
allocate(BBT(n_nodes,nZ))
allocate(tmp(nE,nZ))&lt;/PRE&gt;

&lt;P&gt;"In Release build, see what happens if you add the runtime check for array bounds checking." So, if I add the runtime check for array bounds checking, the code runs smoothly. No error!&lt;/P&gt;
&lt;P&gt;"In Release build, without the runtime check for array bounds checking, what happens with !DIR$ NOVECTOR placed in front of do j loop?" I get the same error!&lt;/P&gt;

&lt;PRE class="brush:fortran; class-name:dark;"&gt;        !$omp parallel do private(j,k)
        do k = 1, nZ
            !DIR$ NOVECTOR
            do j = 1, n_nodes
                BBT(j,k) = maxval( tmp(:,k) - C_subT(:,j) )
            end do
        end do
        !$omp end parallel do&lt;/PRE&gt;

&lt;P&gt;I have not fully understood the rest of your suggestions. How can I "generate my Release build with Debug symbols (both compiler and linker options)"?&lt;/P&gt;
&lt;P&gt;Many thanks again,&lt;BR /&gt;Rafael&lt;/P&gt;</description>
      <pubDate>Fri, 15 Nov 2019 06:24:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-Debug-Open-MP-related/m-p/1140698#M137101</guid>
      <dc:creator>Dix_Carneiro__Rafael</dc:creator>
      <dc:date>2019-11-15T06:24:28Z</dc:date>
    </item>
    <item>
      <title>In the VS IDE select the</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-Debug-Open-MP-related/m-p/1140699#M137102</link>
      <description>&lt;P&gt;In the VS IDE, select the Release build.&lt;BR /&gt;In the Solution Explorer pane, right-click the project for the application, then choose Properties.&lt;BR /&gt;Verify (and set if necessary) that the Configuration and Platform pull-downs are set to Release (or all) and the platform of choice.&lt;BR /&gt;Expand Configuration Properties.&lt;BR /&gt;Expand Fortran.&lt;BR /&gt;Select General.&lt;BR /&gt;Click in the value field of the Debug Information Format property, pull down and select Full, then click the Apply button.&lt;BR /&gt;Expand Linker.&lt;BR /&gt;Click Debugging.&lt;BR /&gt;Click in the value field of Generate Debug Info, pull down and select Yes.&lt;BR /&gt;Click Apply, then OK.&lt;BR /&gt;Rebuild.&lt;/P&gt;&lt;P&gt;Note that different versions of the MS VS IDE may use different labels and/or property tree layouts. IOW you may have to hunt a little to locate these properties.&lt;/P&gt;&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Fri, 15 Nov 2019 13:09:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-Debug-Open-MP-related/m-p/1140699#M137102</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2019-11-15T13:09:29Z</dc:date>
    </item>
    <item>
      <title>Dear Jim,</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-Debug-Open-MP-related/m-p/1140700#M137103</link>
      <description>&lt;P&gt;Dear Jim,&lt;/P&gt;&lt;P&gt;Many thanks for the details.&lt;/P&gt;&lt;P&gt;I compiled the code in Release mode with the debug options you described. I also added a breakpoint where you suggested. Here is the result:&lt;/P&gt;&lt;P&gt;0062F767 &amp;nbsp;mov &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ecx,dword ptr [ebp+20h]&lt;/P&gt;&lt;P&gt;However, be aware that with the options above the code runs smoothly. I am not able to reproduce the error with these options.&lt;/P&gt;&lt;P&gt;Also, interestingly, if I add the "write" line below, the code also runs smoothly. Do you think this is a bug in the compiler?&lt;/P&gt;
&lt;PRE class="brush:fortran; class-name:dark;"&gt;        !$omp parallel do private(j,k)
        do k = 1, nZ
            do j = 1, n_nodes
                write(*,*) 'k=', k, 'j=', j
                BBT(j,k) = maxval( tmp(:,k) - C_subT(:,j) )
            end do
        end do
        !$omp end parallel do&lt;/PRE&gt;

&lt;P&gt;Many thanks,&lt;BR /&gt;Rafael&lt;/P&gt;</description>
      <pubDate>Fri, 15 Nov 2019 16:08:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-Debug-Open-MP-related/m-p/1140700#M137103</guid>
      <dc:creator>Dix_Carneiro__Rafael</dc:creator>
      <dc:date>2019-11-15T16:08:36Z</dc:date>
    </item>
    <item>
      <title>Without the write statement,</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-Debug-Open-MP-related/m-p/1140701#M137104</link>
      <description>&lt;P&gt;Without the write statement, that loop in Release mode would likely execute using vector instructions. With the write statement, the loop will execute using scalar instructions. IOW, different code is generated (apart from the write itself).&lt;/P&gt;&lt;P&gt;At this point, I do think it appears to be a bug in the compiler.&lt;/P&gt;&lt;P&gt;As a means to coax the compiler into generating different SIMD code, try:&lt;/P&gt;
&lt;PRE class="brush:fortran; class-name:dark;"&gt;!$omp parallel do private(j,k)
do k = 1, nZ
    !dir$ simd
    do j = 1, n_nodes
        BBT(j,k) = maxval( tmp(:,k) - C_subT(:,j) )
    end do
end do
!$omp end parallel do
&lt;/PRE&gt;

&lt;P&gt;While the simd compiler directive shouldn't be required in this case, see if it corrects the problem.&lt;/P&gt;
&lt;P&gt;If that is unproductive, try&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; !dir$ simd vectorlengthfor(double)&lt;/P&gt;
&lt;P&gt;You should submit a bug report, along with your workaround if it is successful.&lt;/P&gt;
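&lt;P&gt;A bug report is easier to act on when it comes with a small self-contained program. A minimal sketch of what that could look like follows. The sizes, the DOUBLE kind and the random fill are assumptions chosen for illustration, and there is no guarantee that this reduced program shows the same failure as the full application. Build it with the same options as the failing Release configuration, with OpenMP enabled so that omp_lib is available.&lt;/P&gt;
&lt;PRE class="brush:fortran; class-name:dark;"&gt;program repro
    use omp_lib
    implicit none
    ! Assumed kind and sizes, for illustration only
    integer, parameter :: DOUBLE = kind(1.0d0)
    integer, parameter :: nE = 200, n_nodes = 100, nZ = 50
    real(DOUBLE), allocatable :: tmp(:,:), BBT(:,:), C_subT(:,:)
    integer :: j, k

    allocate(C_subT(nE,n_nodes), BBT(n_nodes,nZ), tmp(nE,nZ))

    ! Arbitrary input data; the values themselves should not matter
    call random_number(tmp)
    call random_number(C_subT)

    ! Same loop as in the original post
    !$omp parallel do private(j,k)
    do k = 1, nZ
        do j = 1, n_nodes
            BBT(j,k) = maxval( tmp(:,k) - C_subT(:,j) )
        end do
    end do
    !$omp end parallel do

    ! Use the result so the loop cannot be optimized away
    print *, 'threads available:', omp_get_max_threads()
    print *, 'checksum:', sum(BBT)
end program repro&lt;/PRE&gt;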
&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Fri, 15 Nov 2019 17:09:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-Debug-Open-MP-related/m-p/1140701#M137104</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2019-11-15T17:09:18Z</dc:date>
    </item>
    <item>
      <title>*** Side note</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-Debug-Open-MP-related/m-p/1140702#M137105</link>
      <description>&lt;P&gt;*** Side note&lt;/P&gt;&lt;P&gt;maxval( tmp(:,k) - C_subT(:,j) )&lt;/P&gt;&lt;P&gt;will internally generate the equivalent of a DO loop, either scalar or vector.&lt;/P&gt;&lt;P&gt;Therefore, one other quick test is to try:&lt;/P&gt;
&lt;PRE class="brush:fortran; class-name:dark;"&gt;!$omp parallel do private(j,k)
do k = 1, nZ
    do j = 1, n_nodes
        !dir$ simd
        BBT(j,k) = maxval( tmp(:,k) - C_subT(:,j) )
    end do
end do
!$omp end parallel do&lt;/PRE&gt;
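&lt;P&gt;To make the "equivalent of a DO loop" concrete, the maxval call can also be written out by hand. The sketch below is one possible form, not what the compiler actually generates. The scalar best, the index i, the sizes and the DOUBLE kind are all assumed names and values for illustration. It compares the hand-written result against the intrinsic in a serial pass at the end.&lt;/P&gt;
&lt;PRE class="brush:fortran; class-name:dark;"&gt;program explicit_maxval
    implicit none
    ! Assumed kind and sizes, for illustration only
    integer, parameter :: DOUBLE = kind(1.0d0)
    integer, parameter :: nE = 200, n_nodes = 100, nZ = 50
    real(DOUBLE), allocatable :: tmp(:,:), BBT(:,:), C_subT(:,:)
    real(DOUBLE) :: best, worst   ! hypothetical helper variables
    integer :: i, j, k

    allocate(C_subT(nE,n_nodes), BBT(n_nodes,nZ), tmp(nE,nZ))
    call random_number(tmp)
    call random_number(C_subT)

    ! best and i must be private: each thread keeps its own running maximum
    !$omp parallel do private(i,j,k,best)
    do k = 1, nZ
        do j = 1, n_nodes
            best = -huge(best)
            do i = 1, nE
                best = max(best, tmp(i,k) - C_subT(i,j))
            end do
            BBT(j,k) = best
        end do
    end do
    !$omp end parallel do

    ! Serial comparison against the intrinsic
    worst = 0.0_DOUBLE
    do k = 1, nZ
        do j = 1, n_nodes
            worst = max(worst, abs(BBT(j,k) - maxval(tmp(:,k) - C_subT(:,j))))
        end do
    end do
    print *, 'largest difference from maxval:', worst
end program explicit_maxval&lt;/PRE&gt;
&lt;P&gt;Writing the reduction explicitly also avoids the array temporary for tmp(:,k) - C_subT(:,j), which may be enough to change the code the compiler generates.&lt;/P&gt;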

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Fri, 15 Nov 2019 17:13:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-Debug-Open-MP-related/m-p/1140702#M137105</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2019-11-15T17:13:53Z</dc:date>
    </item>
    <item>
      <title>Great, many thanks.</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-Debug-Open-MP-related/m-p/1140703#M137106</link>
      <description>&lt;P&gt;Great, many thanks.&lt;/P&gt;&lt;P&gt;Before I submit a bug report, there is one more piece of information.&lt;/P&gt;&lt;P&gt;I usually turn on the /Qparallel option, together with /Qopenmp:&lt;/P&gt;&lt;P&gt;/nologo /O2 /Qparallel /heap-arrays0 /Qopenmp /module:"Release\\" /object:"Release\\" /Fd"Release\vc150.pdb" /libs:static /threads /Qmkl:sequential /c&lt;/P&gt;&lt;P&gt;Now, if I remove&amp;nbsp;/Qparallel from the command line, I have no error and the code runs smoothly.&lt;/P&gt;&lt;P&gt;Is it wrong to compile with both&amp;nbsp;/Qparallel and&amp;nbsp;/Qopenmp?&lt;/P&gt;</description>
      <pubDate>Fri, 15 Nov 2019 17:22:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-Debug-Open-MP-related/m-p/1140703#M137106</guid>
      <dc:creator>Dix_Carneiro__Rafael</dc:creator>
      <dc:date>2019-11-15T17:22:08Z</dc:date>
    </item>
    <item>
      <title>**** I usually turn on the</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-Debug-Open-MP-related/m-p/1140704#M137107</link>
      <description>&lt;P&gt;**** I usually turn on the /Qparallel option, together with /Qopenmp&lt;/P&gt;&lt;P&gt;NO - Bad idea&lt;/P&gt;&lt;P&gt;Use one or none&lt;/P&gt;&lt;P&gt;The compiler can generate parallel code from OpenMP directives or through implicit (auto) parallelization, but using both together is bad, error-prone practice.&lt;/P&gt;&lt;P&gt;With both options, your loop (without the !dir$ simd) would have generated code that uses OpenMP on the do k loop and auto-generated parallel code on:&lt;BR /&gt;do j&lt;BR /&gt;or&lt;BR /&gt;maxval&lt;BR /&gt;or both do j and maxval&lt;/P&gt;&lt;P&gt;In the process you would be generating nested thread pools.&lt;/P&gt;&lt;P&gt;Assume your system has 8 hardware threads. The OpenMP loop will create a top-level OpenMP thread pool with 8 threads. Then each thread executing the parallel do j loop, on encountering the auto-parallel "region", will create a non-OpenMP thread pool (even though it may do so by borrowing code from the OpenMP runtime system). Now your system has 8 pools, each with 8 threads (64 threads). Should maxval with the array expression itself be auto-parallelized within the auto-parallel do j loop, then each thread at that nested level will create another non-OpenMP thread pool with 8 threads: 8*8*8 = 512 threads.&lt;/P&gt;&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Fri, 15 Nov 2019 18:18:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-Debug-Open-MP-related/m-p/1140704#M137107</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2019-11-15T18:18:27Z</dc:date>
    </item>
    <item>
      <title>Thank you very much, Jim. </title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-Debug-Open-MP-related/m-p/1140705#M137108</link>
      <description>&lt;P&gt;Thank you very much, Jim.&lt;/P&gt;&lt;P&gt;Is there a general preference between /Qparallel and /Qopenmp for parallelizing loops?&lt;/P&gt;</description>
      <pubDate>Fri, 15 Nov 2019 19:49:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-Debug-Open-MP-related/m-p/1140705#M137108</guid>
      <dc:creator>Dix_Carneiro__Rafael</dc:creator>
      <dc:date>2019-11-15T19:49:40Z</dc:date>
    </item>
    <item>
      <title>In my opinion, auto</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-Debug-Open-MP-related/m-p/1140706#M137109</link>
      <description>&lt;P&gt;In my opinion, auto-parallelism is only warranted in rather trivial programs that can benefit from parallelization and where the programmer (or support person) is reluctant to make, or prohibited from making, any source code changes. By trivial I mean programs of low complexity that typically have loops with no nesting. In more complex programs, typically those with nested loops, it is difficult for the compiler to determine where best to place the auto-parallel regions, particularly where nested usage is not evident to the compiler, or where an intrinsic function (maxval on an array expression) may not be aware that it is being executed within a parallel region.&lt;/P&gt;&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Fri, 15 Nov 2019 19:58:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-Debug-Open-MP-related/m-p/1140706#M137109</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2019-11-15T19:58:47Z</dc:date>
    </item>
  </channel>
</rss>

