<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Is VTune telling you where in Intel® Fortran Compiler</title>
    <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Coarray-sync-problems/m-p/999962#M103393</link>
    <description>&lt;P&gt;Is VTune telling you where those synchronization calls are coming from in your code? Trace Analyzer and Collector's timeline display can be helpful in understanding what is happening.&lt;/P&gt;</description>
    <pubDate>Thu, 14 May 2015 12:17:38 GMT</pubDate>
    <dc:creator>Steven_L_Intel1</dc:creator>
    <dc:date>2015-05-14T12:17:38Z</dc:date>
    <item>
      <title>Coarray sync problems</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Coarray-sync-problems/m-p/999957#M103388</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;Is there a way to resolve coarray sync issues with VTune? Is there maybe a tutorial for this?&lt;/P&gt;

&lt;P&gt;Thank you&lt;/P&gt;

&lt;P&gt;Jan&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 13 May 2015 08:35:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Coarray-sync-problems/m-p/999957#M103388</guid>
      <dc:creator>Jan_W_2</dc:creator>
      <dc:date>2015-05-13T08:35:17Z</dc:date>
    </item>
    <item>
      <title>Please describe the problem</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Coarray-sync-problems/m-p/999958#M103389</link>
      <description>&lt;P&gt;Please describe the problem in more detail. VTune Amplifier XE won't help with coarray issues, as coarrays are implemented on top of MPI. Intel Trace Analyzer and Collector can help with this.&lt;/P&gt;</description>
      <pubDate>Wed, 13 May 2015 13:37:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Coarray-sync-problems/m-p/999958#M103389</guid>
      <dc:creator>Steven_L_Intel1</dc:creator>
      <dc:date>2015-05-13T13:37:47Z</dc:date>
    </item>
    <item>
      <title>Dear Steve,</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Coarray-sync-problems/m-p/999959#M103390</link>
      <description>&lt;P&gt;Dear Steve,&lt;/P&gt;

&lt;P&gt;it is a transpose-free QMR solver. I want to use coarrays to parallelize the matrix-vector product.&lt;/P&gt;

&lt;P&gt;Unfortunately, the calculation slows down when I use more than one image.&lt;/P&gt;

&lt;P&gt;This is the matrix-vector product routine:&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;subroutine matvec(a,x,y)
      type(coo_matrix), intent(in)       :: a
      complex(dp), dimension(:), intent(in) :: x
      complex(dp), dimension(size(x,1)), intent(out) :: y
      complex(dp), allocatable,dimension(:)  :: tmp[:]
      integer :: i, me, numi

      me   = this_image()
      numi = num_images()
!allocate 
      allocate(tmp(a%n)[*])
      y   = cmplx(0.0_dp, 0.0_dp,dp)
      tmp = cmplx(0.0_dp, 0.0_dp,dp)
!sync
      sync all
!Add locally all
      do i = 1, a%loc_nnz
        tmp(a%ir(i)) = tmp(a%ir(i)) + (a%val(i) * x(a%jc(i)))
      end do
!sync
      sync all
!Sum all coarrays together
      y = globalSum_serial(tmp, a%n)
!deallocate
      deallocate(tmp)
end subroutine&lt;/PRE&gt;

&lt;P&gt;And the sum function is:&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;function globalSum_serial(vec,n) result(this)
    complex(dp), dimension(:), intent(inout) :: vec[*]
    integer, intent(in) :: n
    complex(dp), dimension(n) :: this
    integer :: i, me, numi

    me   = this_image()
    numi = num_images()

    sync all

    if(me ==  1) then
      do i = 2, numi
        vec(:)[1] = vec(:)[1] + vec(:)[i]
      end do
      this(:) = vec(:)
    end if

    sync all
    if(me /= 1) this(:) = vec(:)[1]
    sync all
end function&lt;/PRE&gt;

&lt;P&gt;I compile this with ifort 15.0.1 using only -coarray.&lt;/P&gt;

&lt;P&gt;With a big matrix, the more images I use, the slower the matvec routine gets.&lt;/P&gt;

&lt;P&gt;I did a basic hotspot analysis with VTune, and it reports that ICAF_BARRIER and ICAF_UNLOC are the code segments that take the most time.&lt;/P&gt;

&lt;P&gt;Thank you, Jan&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 13 May 2015 15:36:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Coarray-sync-problems/m-p/999959#M103390</guid>
      <dc:creator>Jan_W_2</dc:creator>
      <dc:date>2015-05-13T15:36:22Z</dc:date>
    </item>
    <item>
      <title>What happens if you replace:</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Coarray-sync-problems/m-p/999960#M103391</link>
      <description>&lt;P&gt;What happens if you replace:&lt;/P&gt;

&lt;P&gt;vec(:)[1] = vec(:)[1] + vec(:)[i]&lt;/P&gt;

&lt;P&gt;with:&lt;/P&gt;

&lt;P&gt;vec(:) = vec(:) + vec(:)[i]&lt;/P&gt;

&lt;P&gt;?&lt;/P&gt;</description>
      <pubDate>Wed, 13 May 2015 17:47:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Coarray-sync-problems/m-p/999960#M103391</guid>
      <dc:creator>Steven_L_Intel1</dc:creator>
      <dc:date>2015-05-13T17:47:00Z</dc:date>
    </item>
    <item>
      <title>I tried it but it doesn't help</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Coarray-sync-problems/m-p/999961#M103392</link>
      <description>&lt;P&gt;I tried it, but it doesn't help ... the calculation speed is the same as before.&lt;/P&gt;

&lt;P&gt;Thank you&lt;/P&gt;

&lt;P&gt;Jan&lt;/P&gt;</description>
      <pubDate>Thu, 14 May 2015 06:23:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Coarray-sync-problems/m-p/999961#M103392</guid>
      <dc:creator>Jan_W_2</dc:creator>
      <dc:date>2015-05-14T06:23:00Z</dc:date>
    </item>
    <item>
      <title>Is VTune telling you where</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Coarray-sync-problems/m-p/999962#M103393</link>
      <description>&lt;P&gt;Is VTune telling you where those synchronization calls are coming from in your code? Trace Analyzer and Collector's timeline display can be helpful in understanding what is happening.&lt;/P&gt;</description>
      <pubDate>Thu, 14 May 2015 12:17:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Coarray-sync-problems/m-p/999962#M103393</guid>
      <dc:creator>Steven_L_Intel1</dc:creator>
      <dc:date>2015-05-14T12:17:38Z</dc:date>
    </item>
    <item>
      <title>Dear Steve,</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Coarray-sync-problems/m-p/999963#M103394</link>
      <description>&lt;P&gt;Dear Steve,&lt;/P&gt;

&lt;P&gt;I checked the VTune analysis again, and it tells me that the barriers are in the globalSum_serial function.&lt;/P&gt;

&lt;P&gt;I also timed how long each image needs for the summation. They all need more or less the same time.&lt;/P&gt;

&lt;P&gt;Thank you,&lt;/P&gt;

&lt;P&gt;Jan&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 25 May 2015 09:14:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Coarray-sync-problems/m-p/999963#M103394</guid>
      <dc:creator>Jan_W_2</dc:creator>
      <dc:date>2015-05-25T09:14:01Z</dc:date>
    </item>
    <item>
      <title>What you are doing in your</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Coarray-sync-problems/m-p/999964#M103395</link>
      <description>&lt;P&gt;What you are doing in your globalsum procedure is a cross-image reduction; while this is formally correct, it is quite inefficient. The statement that is particularly inefficient is the last communication statement&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;if (me /= 1) this(:) = vec(:)[1]&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;which oversubscribes the network link to image 1. The only "good" solution to this is using a collective call, which presently is not yet defined for coarray Fortran (but hopefully soon will be). For now, I think using MPI_Allreduce in its place should work (some MPI boilerplate may be needed). The alternative would be to implement the reduction manually, using all images (e.g. with a butterfly communication pattern) to reduce the amount of synchronization and avoid oversubscription.&lt;/P&gt;
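
&lt;P&gt;To make the butterfly idea concrete, here is a minimal sketch, assuming the number of images is a power of two. The names butterfly_reduce and global_sum_butterfly and the definition of dp are illustrative assumptions, not part of the original code. In each stage every image exchanges its partial sum with exactly one partner, so no single image becomes a bottleneck and only pairwise synchronization is needed.&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;module butterfly_reduce
  implicit none
  integer, parameter :: dp = kind(1.0d0)  ! assumption: dp is double precision
contains
  ! Sum vec over all images; on return every image holds the global sum in vec.
  subroutine global_sum_butterfly(vec, n)
    integer, intent(in) :: n
    complex(dp), intent(inout) :: vec(n)[*]
    complex(dp) :: recv(n)
    integer :: me, numi, partner, stage

    me   = this_image()
    numi = num_images()
    ! This sketch only handles a power-of-two number of images.
    if (iand(numi, numi - 1) /= 0) error stop 'num_images() must be a power of two'

    ! For a power of two, trailz(numi) equals log2(numi), the number of stages.
    do stage = 0, trailz(numi) - 1
      partner = ieor(me - 1, ishft(1, stage)) + 1  ! flip one bit of the 0-based image index
      sync images (partner)    ! partner has finished updating its vec
      recv = vec(:)[partner]   ! fetch the partner's partial sum
      sync images (partner)    ! both reads are done before either side overwrites vec
      vec = vec + recv
    end do
  end subroutine global_sum_butterfly
end module butterfly_reduce

program demo
  use butterfly_reduce
  implicit none
  complex(dp), allocatable :: v(:)[:]

  allocate(v(4)[*])
  v = cmplx(this_image(), 0, dp)   ! image k contributes the value k
  call global_sum_butterfly(v, 4)
  ! Every image should now hold 1 + 2 + ... + num_images().
  if (any(v /= cmplx(num_images() * (num_images() + 1) / 2, 0, dp))) error stop 'wrong sum'
  if (this_image() == 1) print *, 'sum on every image:', real(v(1))
end program demo&lt;/PRE&gt;

&lt;P&gt;In the matvec routine this would presumably replace the call to globalSum_serial with something like "call global_sum_butterfly(tmp, a%n)" followed by "y = tmp". A general image count would need an extra step to fold the surplus images into the nearest power of two first.&lt;/P&gt;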

&lt;P&gt;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Cheers&lt;/P&gt;

&lt;P&gt;Reinhold&lt;/P&gt;</description>
      <pubDate>Wed, 27 May 2015 09:26:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Coarray-sync-problems/m-p/999964#M103395</guid>
      <dc:creator>reinhold-bader</dc:creator>
      <dc:date>2015-05-27T09:26:51Z</dc:date>
    </item>
  </channel>
</rss>

