<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Erroneous results in non-parallel sections when building pa in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Erroneous-results-in-non-parallel-sections-when-building/m-p/890623#M3780</link>
    <pubDate>Tue, 09 Jan 2007 22:48:30 GMT</pubDate>
    <dc:creator>mdobrica</dc:creator>
    <dc:date>2007-01-09T22:48:30Z</dc:date>
    <item>
      <title>Erroneous results in non-parallel sections when building parallel code (OpenMP)</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Erroneous-results-in-non-parallel-sections-when-building/m-p/890621#M3778</link>
      <description>&lt;P&gt;Hi, I have encountered this issue several times with IVF 9.1 on a Core 2 Duo processor, Win32. I get wrong results from calculations performed in loops located outside parallel regions. This only happens when parallel code is generated (I have other parallel OpenMP regions). The loop in question is a kind of matrix-matrix multiplication:&lt;/P&gt;&lt;PRE&gt;DO ja=1,N; DO ia=1,M;
   DO jb=1,N; DO ib=1,M;
     A(ia,ja) = A(ia,ja) + K(M-ia+ib, N-ja+jb)*B(ib,jb);
   ENDDO;ENDDO;
ENDDO;ENDDO;&lt;/PRE&gt;
&lt;P&gt;When this code is executed, the resulting matrix A may differ by several orders of magnitude from the correct results. This happens even if the matrices are small (M,N&amp;lt;100) and with autoparallelization turned off (but OpenMP directives processed). As I said before, this only happens in non-parallel regions of the code.&lt;/P&gt;
&lt;P&gt;I have found several workarounds for this problem, but I am still not sure that all of my code is running as it should. Possible solutions were: 1) changing the loop order from (ja, ia, jb, ib) to (ja, jb, ia, ib); 2) parallelizing the code with !$OMP PARALLEL DO REDUCTION(+:A); 3) setting the "Improve FP consistency" compiler option (/Op).&lt;/P&gt;
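&lt;P&gt;For illustration, workaround 2 might look roughly like the sketch below. The subroutine name, the explicit-shape arguments, the extents of K (chosen so that M-ia+ib and N-ja+jb stay in range) and the choice of jb as the parallel loop are all assumptions, not the original code. With jb outermost, several threads add into the same A(ia,ja), which is why the array reduction is needed.&lt;/P&gt;

```fortran
! Hypothetical sketch of workaround 2: OpenMP array reduction on A.
! Name, argument list and loop order are assumptions for illustration.
subroutine conv_reduce(A, K, B, M, N)
  implicit none
  integer, intent(in) :: M, N
  real(8), intent(inout) :: A(M,N)
  ! M-ia+ib runs over 1..2*M-1 and N-ja+jb over 1..2*N-1
  real(8), intent(in) :: K(2*M-1,2*N-1), B(M,N)
  integer :: ia, ja, ib, jb

  ! Each thread accumulates into a private copy of A; the copies are
  ! summed into the shared A at the end of the parallel loop.
!$OMP PARALLEL DO PRIVATE(ja, ib, ia) REDUCTION(+:A)
  do jb = 1, N
    do ja = 1, N
      do ib = 1, M
        do ia = 1, M
          A(ia,ja) = A(ia,ja) + K(M-ia+ib, N-ja+jb)*B(ib,jb)
        end do
      end do
    end do
  end do
!$OMP END PARALLEL DO
end subroutine conv_reduce
```

&lt;P&gt;If the parallel loop were over ja instead, each thread would own its own columns of A and the reduction clause would not be required.&lt;/P&gt;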
&lt;P&gt;I was wondering if anyone else has encountered this issue and whether there is a known workaround that would ensure such errors do not occur (it seems to me to be quite a particular error). Thnx for having read all this :)&lt;/P&gt;</description>
      <pubDate>Mon, 08 Jan 2007 14:10:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Erroneous-results-in-non-parallel-sections-when-building/m-p/890621#M3778</guid>
      <dc:creator>mdobrica</dc:creator>
      <dc:date>2007-01-08T14:10:48Z</dc:date>
    </item>
    <item>
      <title>Re: Erroneous results in non-parallel sections when building pa</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Erroneous-results-in-non-parallel-sections-when-building/m-p/890622#M3779</link>
      <description>&lt;P&gt;If the above nested loop is experiencing problems when you believe it is not executed in parallel, but you have other OpenMP parallel sections in your program, then it is likely that your assumption is incorrect. For example, a preceding parallel section may have been terminated with a NOWAIT.&lt;/P&gt;
&lt;P&gt;Before the 1st DO insert the diagnostic code&lt;/P&gt;
&lt;P&gt;if(OMP_IN_PARALLEL()) then&lt;BR /&gt;STOP ! place break point here&lt;BR /&gt;endif&lt;/P&gt;
&lt;P&gt;The above does not catch all such problems, as the master thread may have exited a parallel region (via NOWAIT) while a different thread is still processing not only array A but K and B as well, i.e. you get to the summation loop on A prior to processing on K and B being complete.&lt;/P&gt;
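&lt;P&gt;As a rough sketch, the check can be wrapped in a small helper so it is easy to drop in front of several loops. The subroutine name and its label argument are hypothetical; omp_in_parallel and omp_get_thread_num are the standard OpenMP runtime routines from the omp_lib module.&lt;/P&gt;

```fortran
! Hypothetical helper: stop if called from inside an active parallel region.
! Assumes the compiler provides the standard omp_lib module.
subroutine check_serial(label)
  use omp_lib, only: omp_in_parallel, omp_get_thread_num
  implicit none
  character(len=*), intent(in) :: label

  if (omp_in_parallel()) then
    print *, 'Unexpected parallel region before: ', label
    print *, 'thread number: ', omp_get_thread_num()
    stop   ! place break point here
  end if
end subroutine check_serial
```

&lt;P&gt;It would be called as call check_serial('summation loop') just before the first DO. Like the inline test, it cannot detect another thread that is still writing A, K or B after the master thread has left a NOWAIT construct.&lt;/P&gt;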
&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Tue, 09 Jan 2007 00:57:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Erroneous-results-in-non-parallel-sections-when-building/m-p/890622#M3779</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2007-01-09T00:57:12Z</dc:date>
    </item>
    <item>
      <title>Re: Erroneous results in non-parallel sections when building pa</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Erroneous-results-in-non-parallel-sections-when-building/m-p/890623#M3780</link>
      <description>&lt;P align="justify"&gt;Hi Jim, thnx for your answer.&lt;/P&gt;
&lt;P align="justify"&gt;I have used the diagnostic code you suggested and I can confirm that there is no parallel execution in that loop. The loop is in a subroutine which is called after an OMP PARALLEL / OMP Sections / ... / OMP END PARALLEL block. I have played around with it a bit more and found no logical explanation for the behavior described above. I'll post more details; maybe some of you can help (or test this to see if you get the same behavior):&lt;/P&gt;&lt;PRE&gt;DO ja=1,N; DO ia=1,M;
   DO jb=1,N; DO ib=1,M;
     A(ia,ja) = A(ia,ja) + K(M-ia+ib, N-ja+jb)*B(ib,jb);
   ENDDO;ENDDO;
ENDDO;ENDDO;&lt;/PRE&gt;
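&lt;TABLE&gt;
&lt;TR&gt;&lt;TH&gt;Loop order&lt;/TH&gt;&lt;TH&gt;Exec. time&lt;/TH&gt;&lt;TH&gt;Result&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;ja, ia, jb, ib&lt;/TD&gt;&lt;TD&gt;1.27s&lt;/TD&gt;&lt;TD&gt;ERR&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;ia, ja, jb, ib&lt;/TD&gt;&lt;TD&gt;1.27s&lt;/TD&gt;&lt;TD&gt;ERR&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;ja, jb, ia, ib&lt;/TD&gt;&lt;TD&gt;0.52s&lt;/TD&gt;&lt;TD&gt;ERR&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;ja, jb, ib, ia&lt;/TD&gt;&lt;TD&gt;2.23s&lt;/TD&gt;&lt;TD&gt;OK&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;jb, ja, ib, ia&lt;/TD&gt;&lt;TD&gt;2.23s&lt;/TD&gt;&lt;TD&gt;OK&lt;/TD&gt;&lt;/TR&gt;
&lt;/TABLE&gt;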
&lt;P align="justify"&gt;Also, performing additional calculations between loops (like computing k=M-ia; l = N-ja) seems to change the loop order required for correct computations, but it doesn't solve the problem for all possible loop orders (when it does, execution time is 6.4s).&lt;/P&gt;
&lt;P align="justify"&gt;Now, if I introduce a temporary summation variable between loops 2 and 3, I do get correct results for all loop orders I have tested, with an execution time of 1.62s:&lt;/P&gt;&lt;PRE&gt;DO ja=1,N; DO ia=1,M;
   t_sum = 0.d0;
   DO jb=1,N; DO ib=1,M;
     t_sum = t_sum + K(M-ia+ib, N-ja+jb)*B(ib,jb);
   ENDDO;ENDDO;
   A(ia,ja) = A(ia,ja) + t_sum;
ENDDO;ENDDO;&lt;/PRE&gt;
&lt;P align="justify"&gt;However, all these tweaks do not guarantee correct execution for the rest of the code. At this point I'm evaluating the performance loss from using /Op (improving FP consistency), but it seems quite harsh: loop execution time rises to 6.2s single-threaded and 3.2s with OMP. So I'm still in search of a better solution, if anyone can help. Thnx.&lt;/P&gt;
&lt;P align="justify"&gt;P.S. Forgot to say that if loop is parallelized (OMP PARALLEL), i get correct results for all loop orders, best execution time being 0.97s for (ja, ia, jb, ib) order and 0.52s for (ja, jb, ia, ib) order.&lt;/P&gt;</description>
      <pubDate>Tue, 09 Jan 2007 22:48:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Erroneous-results-in-non-parallel-sections-when-building/m-p/890623#M3780</guid>
      <dc:creator>mdobrica</dc:creator>
      <dc:date>2007-01-09T22:48:30Z</dc:date>
    </item>
    <item>
      <title>Re: Erroneous results in non-parallel sections when building pa</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Erroneous-results-in-non-parallel-sections-when-building/m-p/890624#M3781</link>
      <description>&lt;P&gt;Please note that&lt;/P&gt;
&lt;P&gt;OMP PARALLEL / OMP Sections / ... / OMP END PARALLEL &lt;/P&gt;
&lt;P&gt;May be initiated within a parallel section. i.e. when using nested parallel sections. &lt;/P&gt;
&lt;P&gt;The OMP_IN_PARALLEL() should have caught that though&lt;/P&gt;
&lt;P&gt;If the problem is not due to OpenMP threading, then it could potentially be a compiler bug in loop unrolling or in autoparallelization.&lt;/P&gt;
&lt;P&gt;The use of the temporary should have caused the summation to run faster.&lt;/P&gt;
&lt;P&gt;The summation loops are a good candidate for explicit parallelization and vectorization.&lt;/P&gt;
&lt;P&gt;Your nested loop looks like it is a good candidate for OpenMP with vectorization.&lt;/P&gt;
&lt;P&gt;Rework to use&lt;/P&gt;&lt;PRE&gt;!dec$ attributes align : 16 :: t_sum&lt;BR /&gt;real(8), automatic :: t_sum(2)&lt;BR /&gt;&lt;/PRE&gt;
&lt;P&gt;Then change the inner loop to advance ib two steps at a time.&lt;/P&gt;
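&lt;P&gt;One possible reading of that suggestion is sketched below: two partial sums, one for odd ib and one for even ib, combined after the inner loops. The subroutine name, argument list and extents of K are assumptions, and the alignment directive and the automatic attribute shown above are left out so the sketch stays in standard Fortran.&lt;/P&gt;

```fortran
! Hypothetical sketch: inner loop unrolled by two with a pair of partial sums.
subroutine conv_pairs(A, K, B, M, N)
  implicit none
  integer, intent(in) :: M, N
  real(8), intent(inout) :: A(M,N)
  real(8), intent(in) :: K(2*M-1,2*N-1), B(M,N)
  real(8) :: t_sum(2)
  integer :: ia, ja, ib, jb

  do ja = 1, N
    do ia = 1, M
      t_sum = 0.d0
      do jb = 1, N
        ! t_sum(1) takes odd ib, t_sum(2) takes even ib
        do ib = 1, M-1, 2
          t_sum(1) = t_sum(1) + K(M-ia+ib,   N-ja+jb)*B(ib,  jb)
          t_sum(2) = t_sum(2) + K(M-ia+ib+1, N-ja+jb)*B(ib+1,jb)
        end do
        ! leftover element when M is odd
        if (mod(M,2) .eq. 1) then
          t_sum(1) = t_sum(1) + K(M-ia+M, N-ja+jb)*B(M,jb)
        end if
      end do
      A(ia,ja) = A(ia,ja) + t_sum(1) + t_sum(2)
    end do
  end do
end subroutine conv_pairs
```

&lt;P&gt;Because the two partial sums are added in a different order than a single running sum, the result can differ from the scalar version in the last few bits.&lt;/P&gt;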
&lt;P&gt;What is in K? A selector (0/1) or a scale factor?&lt;/P&gt;
&lt;P&gt;Jim&lt;/P&gt;</description>
      <pubDate>Wed, 10 Jan 2007 04:37:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Erroneous-results-in-non-parallel-sections-when-building/m-p/890624#M3781</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2007-01-10T04:37:26Z</dc:date>
    </item>
    <item>
      <title>Re: Erroneous results in non-parallel sections when building pa</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Erroneous-results-in-non-parallel-sections-when-building/m-p/890625#M3782</link>
      <description>&lt;P align="justify"&gt;Hi again and thnx for your answer. I'm not using nested parallel sections, and I guess you're right to suspect an autoparallelization or loop unrolling bug.&lt;/P&gt;
&lt;P align="justify"&gt;The use of the temporary did cause the summation to run faster, and I found it to be even faster (1.02s) and also correct if it is applied only to the innermost loop (thus allowing the use of the ja, jb, ia, ib loop order). It is interesting that the optimal correct single-thread execution takes exactly twice the time of the fastest incorrect single-thread execution (which, in turn, equals the fastest execution in OMP with 2 threads). This suggests that the loop gets autoparallelized, and that the compiler probably doesn't use a reduction clause. This is strange, however, since I turned off the autoparallelization option of the compiler.&lt;/P&gt;
&lt;P align="justify"&gt;Running two steps at a time worsens computing time in both parallel and single-thread execution (3.0s single-threaded). Maybe I wrote something wrong, although I think it's related to misprediction by the CPU. K is a scale factor.&lt;/P&gt;
&lt;P align="justify"&gt;The issue, however, is not making this particular loop run faster (although it's an interesting exercise). I'm concerned because I have lots of loops in my code that don't normally need parallelization (since they are only called once in a while), and now I find myself forced to check the correct execution of each loop to make sure I don't get wrong results.&lt;/P&gt;</description>
      <pubDate>Wed, 10 Jan 2007 06:32:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Erroneous-results-in-non-parallel-sections-when-building/m-p/890625#M3782</guid>
      <dc:creator>mdobrica</dc:creator>
      <dc:date>2007-01-10T06:32:51Z</dc:date>
    </item>
    <item>
      <title>Re: Erroneous results in non-parallel sections when building pa</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Erroneous-results-in-non-parallel-sections-when-building/m-p/890626#M3783</link>
      <description>&lt;P&gt;I would suggest configuring the code where it looks correct but produces incorrect results. Then compile with optimizations off and on. Also experiment with disabling SSE instructions. Assuming a temporal dependency is not at issue, if you can identify a failure mode between options then this would indicate a compiler bug. A simple test app could be created and submitted to the Premier site.&lt;/P&gt;
&lt;P&gt;The double speed can be due to vectorization (use of SSE to compute 2 REAL(8) or 4 REAL(4) operations in one instruction). By turning on/off the SSE instructions you can affect the vectorization code.&lt;/P&gt;
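&lt;P&gt;A simple test app of that kind might look like the sketch below. The program name, the array sizes and the use of random input data are assumptions; the two loop nests are the (ja, ia, jb, ib) and (ja, jb, ib, ia) orders from the earlier post. Both nests add the same terms to each A(ia,ja), so the two results should agree to within rounding, and a large difference under one set of compiler options but not another would point at the compiler.&lt;/P&gt;

```fortran
! Hypothetical stand-alone test: the same sum computed in two loop orders.
program loop_order_test
  implicit none
  integer, parameter :: M = 40, N = 30
  real(8) :: A1(M,N), A2(M,N), K(2*M-1,2*N-1), B(M,N)
  integer :: ia, ja, ib, jb

  call random_number(K)
  call random_number(B)
  A1 = 0.d0
  A2 = 0.d0

  ! loop order (ja, ia, jb, ib)
  do ja = 1, N; do ia = 1, M
    do jb = 1, N; do ib = 1, M
      A1(ia,ja) = A1(ia,ja) + K(M-ia+ib, N-ja+jb)*B(ib,jb)
    end do; end do
  end do; end do

  ! loop order (ja, jb, ib, ia)
  do ja = 1, N; do jb = 1, N
    do ib = 1, M; do ia = 1, M
      A2(ia,ja) = A2(ia,ja) + K(M-ia+ib, N-ja+jb)*B(ib,jb)
    end do; end do
  end do; end do

  print *, 'max abs difference: ', maxval(abs(A1 - A2))
  print *, 'max abs value:      ', maxval(abs(A1))
end program loop_order_test
```

&lt;P&gt;Building the same source with and without optimization, and with and without OpenMP processing, and comparing the printed difference is one way to narrow down which option triggers the failure.&lt;/P&gt;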
&lt;P&gt;Jim&lt;/P&gt;</description>
      <pubDate>Fri, 12 Jan 2007 00:03:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Erroneous-results-in-non-parallel-sections-when-building/m-p/890626#M3783</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2007-01-12T00:03:55Z</dc:date>
    </item>
  </channel>
</rss>

