Intel® Fortran Compiler

ifort 13.0, 14.0 coarray extremely slow read/write between nodes

AShte
Beginner
1,929 Views

This is my test code:

$ cat ca_check.f90
program z
implicit none
integer :: x(10)[*], img, nimgs, i
real :: time1, time2

img = this_image()
nimgs = num_images()
x = img

if (img .eq. 1) then
  do i=1,nimgs
    call cpu_time(time1)
    x = x(:)[i]
    call cpu_time(time2)
    write (*,"(a,f)") "Remote read took, s : ", time2-time1
    call cpu_time(time1)
    x(:)[i] = x
    call cpu_time(time2)
    write (*,"(a,f)") "Remote write took, s : ", time2-time1
    write (*,"(99999(i0,tr1))") x
  end do
end if

sync all
write (*,"(a,i0,a,i0,a)") "Image: ", img, " out of ", nimgs, "completed ok"
end program z
$

Compiled with:

ifort -o ca_check.xcack ca_check.f90 -coarray=distributed -coarray-config-file=ca.conf -debug full -warn all

$ cat ca.conf
-envall -n 64 ./ca_check.xcack
$

$ cat zpbs
#!/bin/sh
#PBS -l walltime=00:01:00,nodes=4:ppn=16
#PBS -j oe
#PBS -m abe
cd $HOME/nobackup/cgpack/branches/coarray/tests
echo "LD_LIBRARY_PATH: " $LD_LIBRARY_PATH > zzz
echo "which mpirun: " `which mpirun` >> zzz
export I_MPI_DAPL_PROVIDER=ofa-v2-ib0
mpdboot --rsh=ssh --file=$PBS_NODEFILE -n 4
mpdtrace -l >> zzz
cm-launcher ./ca_check.xcack >> zzz
mpdallexit
$

$ cat zzz
LD_LIBRARY_PATH: /cm/shared/apps/torque/4.2.4.1/lib:/cm/shared/apps/moab/7.2.2/lib:/cm/shared/tools/subversion-1.8.4/lib:/cm/shared/apps/intel-cluster-studio/impi/4.1.0.024/intel64/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/compiler/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/compiler/lib/intel64:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/lib/intel64
which mpirun: /cm/shared/apps/intel-cluster-studio/impi/4.1.0.024/intel64/bin/mpirun
node32-035_47536 (10.131.0.179)
node33-002_50475 (10.131.0.98)
node33-003_55287 (10.131.0.99)
node34-006_42324 (10.131.0.54)
Remote read took, s : 0.0010000
Remote write took, s : 0.0000000
1 1 1 1 1 1 1 1 1 1
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
0 0 0 0 0 0 0 0 0 0
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
3 3 3 3 3 3 3 3 3 3
Remote read took, s : 0.0000000
Remote write took, s : 0.0010000
0 0 0 0 0 0 0 0 0 0
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
5 5 5 5 5 5 5 5 5 5
Remote read took, s : 0.0000000
Remote write took, s : 0.0010000
0 0 0 0 0 0 0 0 0 0
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
7 7 7 7 7 7 7 7 7 7
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
0 0 0 0 0 0 0 0 0 0
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
9 9 9 9 9 9 9 9 9 9
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
10 10 10 10 10 10 10 10 10 10
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
11 11 11 11 11 11 11 11 11 11
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
12 12 12 12 12 12 12 12 12 12
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
13 13 13 13 13 13 13 13 13 13
Remote read took, s : 0.0009990
Remote write took, s : 0.0000000
14 14 14 14 14 14 14 14 14 14
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
15 15 15 15 15 15 15 15 15 15
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
16 16 16 16 16 16 16 16 16 16
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
17 17 17 17 17 17 17 17 17 17
Remote read took, s : 13.3259735
Remote write took, s : 12.9360342
18 18 18 18 18 18 18 18 18 18
Remote read took, s : 13.8728924
Remote write took, s : 12.5950813
19 19 19 19 19 19 19 19 19 19
Remote read took, s : 14.5117950
Remote write took, s : 12.9060364
20 20 20 20 20 20 20 20 20 20
$

Note that:

- The values read from images 2, 4, 6 and 8 are wrong. They are all zero, but should equal the image number.
- There are 16 cores in a node. Reads and writes to the first 16 images are very fast, too fast for cpu_time to resolve (it reports 0 to 1 ms). Access to image 17, which is probably the first image on another node, is still fast, but every image beyond that takes over 10 seconds per read or write.

I've checked with both 13.0 and 14.0. I'm happy to provide further details of the MPI setup.

Thanks
Anton
    11 Replies
    AShte
    Beginner
    The format is all wrong. I'll try again.

    The problem: remote read or write operations across node boundaries take over 10 sec!

    The code:

    $ cat z.f90
    program z
    implicit none
    integer :: x(10)[*], img, nimgs, i
    real :: time1, time2

    img = this_image()
    nimgs = num_images()
    x = img

    if (img .eq. 1) then
      do i=1,nimgs
        call cpu_time(time1)
        x = x(:)[i]
        call cpu_time(time2)
        write (*,"(a,f)") "Remote read took, s : ", time2-time1
        call cpu_time(time1)
        x(:)[i] = x
        call cpu_time(time2)
        write (*,"(a,f)") "Remote write took, s : ", time2-time1
        write (*,"(99999(i0,tr1))") x
      end do
    end if

    sync all
    write (*,"(a,i0,a,i0,a)") "Image: ", img, " out of ", nimgs, "completed ok"
    end program z
    $

    The purpose of the code is to time remote reads and writes, and to check the correctness of the answer. I have 16 cores per node. I'm running on 4 nodes, 64 cores.

    Compilation and linking:

    ifort -o z.x z.f90 -coarray=distributed -coarray-config-file=ca.conf -debug full -warn all

    $ cat ca.conf
    -envall -n 64 ./z.x
    $

    PBS job submission script:

    $ cat zpbs
    #!/bin/sh
    #PBS -l walltime=00:01:00,nodes=4:ppn=16
    #PBS -j oe
    #PBS -m abe
    echo "LD_LIBRARY_PATH: " $LD_LIBRARY_PATH > zzz
    echo "which mpirun: " `which mpirun` >> zzz
    export I_MPI_DAPL_PROVIDER=ofa-v2-ib0
    mpdboot --rsh=ssh --file=$PBS_NODEFILE -n 4
    mpdtrace -l >> zzz
    cm-launcher ./z.x >> zzz
    mpdallexit
    $

    And this is the result. Note that the read/write time goes from too small for cpu_time to resolve (0 to 1 ms reported) to over 10 s. Also note that the results from images 2, 4, 6 and 8 are wrong. They are all zeros, but should have been equal to the image number.

    $ cat zzz
    LD_LIBRARY_PATH: /cm/shared/apps/torque/4.2.4.1/lib:/cm/shared/apps/moab/7.2.2/lib:/cm/shared/tools/subversion-1.8.4/lib:/cm/shared/apps/intel-cluster-studio/impi/4.1.0.024/intel64/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/compiler/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/compiler/lib/intel64:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/lib/intel64
    which mpirun: /cm/shared/apps/intel-cluster-studio/impi/4.1.0.024/intel64/bin/mpirun
    node32-035_47536 (10.131.0.179)
    node33-002_50475 (10.131.0.98)
    node33-003_55287 (10.131.0.99)
    node34-006_42324 (10.131.0.54)
    Remote read took, s : 0.0010000
    Remote write took, s : 0.0000000
    1 1 1 1 1 1 1 1 1 1
    Remote read took, s : 0.0000000
    Remote write took, s : 0.0000000
    0 0 0 0 0 0 0 0 0 0
    Remote read took, s : 0.0000000
    Remote write took, s : 0.0000000
    3 3 3 3 3 3 3 3 3 3
    Remote read took, s : 0.0000000
    Remote write took, s : 0.0010000
    0 0 0 0 0 0 0 0 0 0
    Remote read took, s : 0.0000000
    Remote write took, s : 0.0000000
    5 5 5 5 5 5 5 5 5 5
    Remote read took, s : 0.0000000
    Remote write took, s : 0.0010000
    0 0 0 0 0 0 0 0 0 0
    Remote read took, s : 0.0000000
    Remote write took, s : 0.0000000
    7 7 7 7 7 7 7 7 7 7
    Remote read took, s : 0.0000000
    Remote write took, s : 0.0000000
    0 0 0 0 0 0 0 0 0 0
    Remote read took, s : 0.0000000
    Remote write took, s : 0.0000000
    9 9 9 9 9 9 9 9 9 9
    Remote read took, s : 0.0000000
    Remote write took, s : 0.0000000
    10 10 10 10 10 10 10 10 10 10
    Remote read took, s : 0.0000000
    Remote write took, s : 0.0000000
    11 11 11 11 11 11 11 11 11 11
    Remote read took, s : 0.0000000
    Remote write took, s : 0.0000000
    12 12 12 12 12 12 12 12 12 12
    Remote read took, s : 0.0000000
    Remote write took, s : 0.0000000
    13 13 13 13 13 13 13 13 13 13
    Remote read took, s : 0.0009990
    Remote write took, s : 0.0000000
    14 14 14 14 14 14 14 14 14 14
    Remote read took, s : 0.0000000
    Remote write took, s : 0.0000000
    15 15 15 15 15 15 15 15 15 15
    Remote read took, s : 0.0000000
    Remote write took, s : 0.0000000
    16 16 16 16 16 16 16 16 16 16
    Remote read took, s : 0.0000000
    Remote write took, s : 0.0000000
    17 17 17 17 17 17 17 17 17 17
    Remote read took, s : 13.3259735
    Remote write took, s : 12.9360342
    18 18 18 18 18 18 18 18 18 18
    Remote read took, s : 13.8728924
    Remote write took, s : 12.5950813
    19 19 19 19 19 19 19 19 19 19
    Remote read took, s : 14.5117950
    Remote write took, s : 12.9060364
    20 20 20 20 20 20 20 20 20 20

    Here the time allocated to the job ran out. Otherwise the remaining 44 images would presumably have done their reads and writes too, but it takes too long to wait.

    Thanks
    Anton
    jimdempseyatthecove
    Honored Contributor III

    Insert a SYNC ALL after x=img and see what happens.

    Jim Dempsey

    AShte
    Beginner
    No, this makes no difference. The modified code:

    program z
    implicit none
    integer :: x(10)[*], img, nimgs, i
    real :: time1, time2

    img = this_image()
    nimgs = num_images()
    x = img
    sync all

    if (img .eq. 1) then
      do i=1,nimgs
        call cpu_time(time1)
        x = x(:)[i]
        call cpu_time(time2)
        write (*,"(a,f)") "Remote read took, s : ", time2-time1
        call cpu_time(time1)
        x(:)[i] = x
        call cpu_time(time2)
        write (*,"(a,f)") "Remote write took, s : ", time2-time1
        write (*,"(99999(i0,tr1))") x
      end do
    end if

    sync all
    write (*,"(a,i0,a,i0,a)") "Image: ", img, " out of ", nimgs, "completed ok"
    end program z

    At runtime I still get enormous times for remote read and write calls:

    Remote read took, s : 0.0000000
    Remote write took, s : 0.0000000
    12 12 12 12 12 12 12 12 12 12
    Remote read took, s : 0.0000000
    Remote write took, s : 0.0000000
    13 13 13 13 13 13 13 13 13 13
    Remote read took, s : 0.0000000
    Remote write took, s : 0.0000000
    14 14 14 14 14 14 14 14 14 14
    Remote read took, s : 0.0000000
    Remote write took, s : 0.0000000
    15 15 15 15 15 15 15 15 15 15
    Remote read took, s : 0.0000000
    Remote write took, s : 0.0000000
    16 16 16 16 16 16 16 16 16 16
    Remote read took, s : 0.0000000
    Remote write took, s : 0.0010000
    17 17 17 17 17 17 17 17 17 17
    Remote read took, s : 16.4365025
    Remote write took, s : 13.6949177
    18 18 18 18 18 18 18 18 18 18
    Remote read took, s : 15.1436958
    Remote write took, s : 13.7209167
    19 19 19 19 19 19 19 19 19 19
    Remote read took, s : 16.4264984
    Remote write took, s : 13.6939240
    20 20 20 20 20 20 20 20 20 20
    Remote read took, s : 15.7575989
    Remote write took, s : 13.6139297
    21 21 21 21 21 21 21 21 21 21
    Remote read took, s : 13.9138794
    Remote write took, s : 13.7969055
    22 22 22 22 22 22 22 22 22 22

    Perhaps something is wrong with the MPI setup? Is there anything else I could check?

    Thanks
    Ron_Green
    Moderator

    Anton,

    I'll take a look at this.  What compiler version are you using?

    ron

    AShte
    Beginner
    I tried:

    13.0.1 20121010
    14.0.0 20130728

    Thank you
    AShte
    Beginner

    Ron, any progress?

    Thanks

    Anton

    Ron_Green
    Moderator

    Our mutual friend Stephen reminded me to revisit this post.

    First, a little status on Intel's CAF implementation:  Our initial goal was to get a functional CAF implementation that conforms strictly to the Standard.  Performance has not been fully addressed at this time and may take a while to get to acceptable levels for production purposes - particularly for distributed memory systems.

    But next, I see some errors in the test and question what it is you're timing.  In particular, let's visit correctness first.  Removing the timing and writes you have this code on Image 1:

    do i=1,nimgs
    
      x = x(:)[i]
    
      x(:)[i] = x
    
    end do

    The problem here: CAF remote reads and writes are inherently asynchronous, or one-sided. So the read into X may not have completed by the time you use X on the RHS of the assignment in the next statement, and the results are unpredictable. And a minor point: on image 1, do you want to test self-read and self-write? The do loop goes from 1 to nimgs, but do we care about image 1 reading and writing itself in shared memory? What I think you want is something like a neighbor exchange, perhaps like this for the read:

    do i = 2,nimgs
      sync all
      if ( img == 1 ) then
         !...start timer here
         x = x(:)[i]
         sync images(i) !...wait for remote read to complete
         !...finish timer here, print result?
      else if ( img == i ) then
         sync images(1) !...sync point with image 1
      end if
    end do

    Remember that image control statements (like SYNC IMAGES) imply a SYNC MEMORY.  Maybe I should have used SYNC MEMORY instead, but so it goes.

    So the next question is, what do you want to time?  Do you want to find the time for the data transfer, as we're doing above?  Or do you want to time how long the statement takes, to see whether it's truly asynchronous, synchronous, or just darn inefficient?  In the above we're also capturing the time for the one-to-one synchronization, so it's not a good measure of throughput.  Also, note I had a SYNC ALL at the top of the loop to make sure all the images execute the I iterations in lock step.  Thinking about this, I believe it would be OK to remove that.  Then each remote image would quickly drop into its SYNC IMAGES(1) and wait for image 1's matching SYNC IMAGES(i).  That would be faster, obviously.
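To make that last variant concrete, here is a minimal sketch of the loop with the SYNC ALL taken out of it, assuming the same 10-element integer coarray as in the original test. The program name pairwise_sync and the local buffer buf are made up for illustration, and the timer calls are left out; this is not a tested benchmark.

```fortran
program pairwise_sync
  implicit none
  integer :: x(10)[*]        ! coarray from the original test
  integer :: buf(10)         ! hypothetical local buffer for the data read
  integer :: img, nimgs, i

  img   = this_image()
  nimgs = num_images()
  x = img
  sync all                   ! x is defined on every image before any remote read

  if (img == 1) then
    do i = 2, nimgs
      buf = x(:)[i]          ! remote read from image i
      sync images(i)         ! pairs with the single sync images(1) on image i
      if (any(buf /= i)) error stop "unexpected data read from a remote image"
    end do
  else
    sync images(1)           ! wait here until image 1 has read from this image
  end if
end program pairwise_sync
```

Image 1 executes one SYNC IMAGES per remote image and each remote image executes exactly one SYNC IMAGES(1), so the synchronizations match up pairwise and no image has to step through the whole loop.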

    Tricky stuff.  I might suggest rethinking this experiment to see if we can derive a better test.  Also, don't use cpu_time: it gathers the sum of thread times for the process, and with threads running in the background to do the IO it might report too much time.  I use a wall clock instead, like this contained procedure mytime():

    program foo
    use ISO_FORTRAN_ENV
    implicit none
    integer, parameter :: dp = REAL64
    real (kind=dp) :: tstart, tstop, ttime
    
    !... ready to time a block of code
    tstart = mytime()
    !...do something
    tstop = mytime()
    ttime = tstop - tstart
    
    contains
      function mytime()  result (tseconds)
        real (dp)       :: tseconds
        integer (INT64) ::  count, count_rate, count_max
        real (dp)       :: tsec, rate
    
        CALL SYSTEM_CLOCK(count, count_rate, count_max)
    
        tsec = count
        rate = count_rate
        tseconds = tsec / rate
      end function mytime 
    
    end program foo
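As a rough sketch of how a wall-clock timer of this kind might be applied to the test in this thread, the program below repeats each remote read several times and reports the mean, so that the measured interval is well above the clock resolution. The names time_remote_read, nrep, buf and wall_time are assumptions made for this example, and a compiler is free to optimize the repeated reads, so treat the numbers as indicative only.

```fortran
program time_remote_read
  use iso_fortran_env, only: int64, real64
  implicit none
  integer, parameter :: nrep = 100     ! assumed repetition count
  integer :: x(10)[*]                  ! coarray from the original test
  integer :: buf(10)                   ! hypothetical local receive buffer
  integer :: img, nimgs, i, k
  real(real64) :: t0, t1

  img   = this_image()
  nimgs = num_images()
  x = img
  sync all                             ! x is defined everywhere before timing starts

  if (img == 1) then
    do i = 2, nimgs
      t0 = wall_time()
      do k = 1, nrep
        buf = x(:)[i]                  ! remote read from image i
      end do
      t1 = wall_time()
      write (*, "(a,i0,a,es12.4)") "image ", i, ": mean read time, s: ", (t1 - t0) / nrep
      if (any(buf /= i)) write (*, "(a,i0)") "wrong data from image ", i
    end do
  end if
  sync all                             ! other images wait here until image 1 is done

contains

  ! Wall-clock time in seconds, based on SYSTEM_CLOCK with 64-bit counters.
  function wall_time() result(seconds)
    real(real64)   :: seconds
    integer(int64) :: ticks, rate
    call system_clock(ticks, rate)
    seconds = real(ticks, real64) / real(rate, real64)
  end function wall_time

end program time_remote_read
```

The check on buf keeps the correctness test from the original program separate from the timed region, so the write statement does not contribute to the measured time.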

     

     

    AShte
    Beginner
    Ron,

    Sorry for the delay. I bothered several people here in Bristol, including Jim Cownie, but got nowhere, and then gave up on this issue. Hence I missed your reply. So thank you for your help.

    1. Stephen who?

    2. I disagree with your statement that the result of this code is unpredictable:

    do i=1,nimgs
      x = x(:)[i]
      x(:)[i] = x
    end do

    If you bear in mind that this fragment is executing on only one image, and that the fragment is within a single segment, it must be executed in order. I confirmed this with Dan Nagle on comp.lang.fortran. So really the fragment must include:

    sync all
    if ( img .eq. 1 ) then
      do i = 2, nimgs
        x = x(:)[i]
        x(:)[i] = x
      end do
    end if
    sync all

    Do you agree?

    3. I'm happy with your timer, so my complete program is now:

    program z
    use iso_fortran_env
    implicit none
    integer, parameter :: dp = real64
    real( kind=dp) :: time1, time2
    integer :: x(10)[*], img, nimgs, i

    img = this_image()
    nimgs = num_images()
    x = img
    sync all

    if ( img .eq. 1 ) then
      do i = 2, nimgs
        time1 = mytime()
        x = x(:)[i]
        time2 = mytime()
        write (*,"(a,g)") "Remote read took, s : ", time2-time1
        time1 = mytime()
        x(:)[i] = x
        time2 = mytime()
        write (*,"(a,g)") "Remote write took, s : ", time2-time1
        write (*,"(a,i0,a,10(i0))") "img: ", i, "x:", x(:)
      end do
    end if

    sync all
    write (*,"(a,i0,a,i0,a)") "Image: ", img, " out of ", nimgs, "completed ok"

    contains
      function mytime() result (tseconds)
        real( dp ) :: tseconds
        integer( INT64 ) :: tsec, trate
        CALL SYSTEM_CLOCK( count=tsec, count_rate=trate )
        tseconds = real(tsec,kind=dp) / real(trate,kind=dp)
      end function mytime

    end program z

    However, the performance is still the same:

    LD_LIBRARY_PATH: /cm/shared/apps/intel-cluster-studio/impi/4.1.0.024/intel64/lib:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/compiler/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/mpirt/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/ipp/../compiler/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/ipp/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/compiler/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/mkl/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/tbb/lib/intel64/gcc4.4:/cm/shared/apps/ParaView-4.0.1/ParaView-4.0.1-Linux-64bit/lib:/cm/shared/apps/torque/4.2.4.1/lib:/cm/shared/apps/moab/7.2.2/lib:/cm/shared/tools/subversion-1.8.4/lib:/cm/shared/languages/Intel-Compiler-XE-14/compiler/lib:/cm/shared/languages/Intel-Compiler-XE-14/compiler/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/lib:/cm/shared/languages/Intel-Compiler-XE-14/lib/intel64:/cm/shared/apps/intel-cluster-studio/impi/4.1.0.024/intel64/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/compiler/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/compiler/lib/intel64:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/lib/intel64
    which mpirun: /cm/shared/apps/intel-cluster-studio/impi/4.1.0.024/intel64/bin/mpirun
    node43-037_33944 (10.131.1.97)
    node43-038_56186 (10.131.1.98)
    node43-039_35896 (10.131.1.99)
    node43-040_49240 (10.131.1.100)
    Remote read took, s : .2717971801757812E-04
    Remote write took, s : .7009506225585938E-04
    img: 2x:2222222222
    Remote read took, s : .9059906005859375E-05
    Remote write took, s : .1382827758789062E-04
    img: 3x:3333333333
    Remote read took, s : .6914138793945312E-05
    Remote write took, s : .1406669616699219E-04
    img: 4x:4444444444
    Remote read took, s : .5960464477539062E-05
    Remote write took, s : .1192092895507812E-04
    img: 5x:5555555555
    Remote read took, s : .8106231689453125E-05
    Remote write took, s : .1406669616699219E-04
    img: 6x:6666666666
    Remote read took, s : .8106231689453125E-05
    Remote write took, s : .1406669616699219E-04
    img: 7x:7777777777
    Remote read took, s : .8106231689453125E-05
    Remote write took, s : .1287460327148438E-04
    img: 8x:8888888888
    Remote read took, s : .8106231689453125E-05
    Remote write took, s : .2098083496093750E-04
    img: 9x:9999999999
    Remote read took, s : .9059906005859375E-05
    Remote write took, s : .2098083496093750E-04
    img: 10x:10101010101010101010
    Remote read took, s : .9059906005859375E-05
    Remote write took, s : .2217292785644531E-04
    img: 11x:11111111111111111111
    Remote read took, s : .9059906005859375E-05
    Remote write took, s : .2217292785644531E-04
    img: 12x:12121212121212121212
    Remote read took, s : .8821487426757812E-05
    Remote write took, s : .2002716064453125E-04
    img: 13x:13131313131313131313
    Remote read took, s : .9059906005859375E-05
    Remote write took, s : .2193450927734375E-04
    img: 14x:14141414141414141414
    Remote read took, s : .8106231689453125E-05
    Remote write took, s : .2002716064453125E-04
    img: 15x:15151515151515151515
    Remote read took, s : .9059906005859375E-05
    Remote write took, s : .2002716064453125E-04
    img: 16x:16161616161616161616
    Remote read took, s : .2694129943847656E-04
    Remote write took, s : .1330375671386719E-03
    img: 17x:17171717171717171717
    Remote read took, s : 4.086822986602783
    Remote write took, s : 13.64371013641357
    img: 18x:18181818181818181818
    Remote read took, s : 4.057008028030396
    Remote write took, s : 13.62605905532837
    img: 19x:19191919191919191919
    Remote read took, s : 4.033457994461060
    Remote write took, s : 13.60342288017273
    img: 20x:20202020202020202020
    Remote read took, s : 3.867400169372559
    Remote write took, s : 13.55423808097839
    img: 21x:21212121212121212121
    Remote read took, s : 2.599767923355103
    Remote write took, s : 13.53067493438721
    img: 22x:22222222222222222222
    Remote read took, s : 3.370637893676758
    Remote write took, s : 13.62505698204041
    img: 23x:23232323232323232323
    Remote read took, s : 4.130011081695557
    Remote write took, s : 13.79351282119751
    img: 24x:24242424242424242424
    Remote read took, s : 3.336811780929565
    Remote write took, s : 13.72070097923279
    img: 25x:25252525252525252525
    Remote read took, s : 3.968912124633789
    Remote write took, s : 13.58001303672791
    img: 26x:26262626262626262626
    Remote read took, s : 2.945718050003052
    Remote write took, s : 13.59926700592041
    img: 27x:27272727272727272727
    Remote read took, s : 3.360033988952637
    Remote write took, s : 13.64630603790283
    img: 28x:28282828282828282828
    Remote read took, s : 3.888566970825195
    Remote write took, s : 13.63198804855347
    img: 29x:29292929292929292929
    Remote read took, s : 2.508543968200684
    Remote write took, s : 13.61940813064575
    img: 30x:30303030303030303030
    Remote read took, s : 4.009042024612427
    Remote write took, s : 13.61328911781311
    img: 31x:31313131313131313131
    Remote read took, s : 3.974460840225220
    Remote write took, s : 13.60890007019043
    img: 32x:32323232323232323232
    Remote read took, s : .5388259887695312E-04
    Remote write took, s : .1411437988281250E-03
    img: 33x:33333333333333333333
    Remote read took, s : .5396170616149902
    Remote write took, s : 13.62038612365723
    img: 34x:34343434343434343434
    Remote read took, s : 3.770447015762329

    At this point the 5 minutes allocated to the job, which should have been more than enough for such a simple program, ran out. This was on four 16-core nodes, i.e. 64 images in total.

    Let me know what you think. I'll now redo the timing test with your suggested modification, in case remote reads and writes are indeed out of order, even though they are on the same image and within the same segment.

    Many thanks
    Anton
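The ordering argument in point 2 can be checked mechanically rather than by reading the printed values. The sketch below assumes the same 10-element integer coarray; the program name check_order and the variable expected are invented for this example. If the read and the write on image 1 really execute in order within the segment, every image i >= 2 should end up with x equal to i, and image 1 should end up with x equal to the last image number.

```fortran
program check_order
  implicit none
  integer :: x(10)[*]        ! coarray from the original test
  integer :: img, nimgs, i
  integer :: expected        ! hypothetical: value x should hold after the loop

  img   = this_image()
  nimgs = num_images()
  x = img
  sync all

  if (img == 1) then
    do i = 2, nimgs
      x = x(:)[i]            ! remote read: local x on image 1 becomes i
      x(:)[i] = x            ! remote write: image i receives its own value back
    end do
  end if
  sync all

  ! Image 1 last read from image nimgs; every other image should be unchanged.
  expected = merge(nimgs, img, img == 1)
  if (all(x == expected)) then
    write (*, "(a,i0,a)") "Image ", img, ": ok"
  else
    write (*, "(a,i0,a,i0)") "Image ", img, ": mismatch, expected ", expected
  end if
end program check_order
```

A mismatch on any image would point to the read not having completed before the write used its result, which is the behaviour under dispute.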
    AShte
    Beginner

    OK, maybe you are right. With your modification, the times are reasonable:

    LD_LIBRARY_PATH:  /cm/shared/apps/intel-cluster-studio/impi/4.1.0.024/intel64/lib:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/compiler/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/mpirt/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/ipp/../compiler/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/ipp/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/compiler/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/mkl/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/tbb/lib/intel64/gcc4.4:/cm/shared/apps/ParaView-4.0.1/ParaView-4.0.1-Linux-64bit/lib:/cm/shared/apps/torque/4.2.4.1/lib:/cm/shared/apps/moab/7.2.2/lib:/cm/shared/tools/subversion-1.8.4/lib:/cm/shared/languages/Intel-Compiler-XE-14/compiler/lib:/cm/shared/languages/Intel-Compiler-XE-14/compiler/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/lib:/cm/shared/languages/Intel-Compiler-XE-14/lib/intel64:/cm/shared/apps/intel-cluster-studio/impi/4.1.0.024/intel64/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/compiler/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/compiler/lib/intel64:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/lib/intel64
    which mpirun:  /cm/shared/apps/intel-cluster-studio/impi/4.1.0.024/intel64/bin/mpirun
    node46-009_46215 (10.131.1.33)
    node46-012_41759 (10.131.1.36)
    node46-010_52398 (10.131.1.34)
    node46-011_56689 (10.131.1.35)
    Remote read took, s :     .1471042633056641E-03
    img: 2x:2222222222
    Remote write took, s :     .2789497375488281E-04
    img: 2x:2222222222
    Remote read took, s :     .7867813110351562E-05
    img: 3x:3333333333
    Remote write took, s :     .1621246337890625E-04
    img: 3x:3333333333
    Remote read took, s :     .9059906005859375E-05
    img: 4x:4444444444
    Remote write took, s :     .1502037048339844E-04
    img: 4x:4444444444
    Remote read took, s :     .9059906005859375E-05
    img: 5x:5555555555
    Remote write took, s :     .1478195190429688E-04
    img: 5x:5555555555
    Remote read took, s :     .9059906005859375E-05
    img: 6x:6666666666
    Remote write took, s :     .1502037048339844E-04
    img: 6x:6666666666
    Remote read took, s :     .8106231689453125E-05
    img: 7x:7777777777
    Remote write took, s :     .1502037048339844E-04
    img: 7x:7777777777
    Remote read took, s :     .9059906005859375E-05
    img: 8x:8888888888
    Remote write took, s :     .1502037048339844E-04
    img: 8x:8888888888
    Remote read took, s :     .1001358032226562E-04
    img: 9x:9999999999
    Remote write took, s :     .2384185791015625E-04
    img: 9x:9999999999
    Remote read took, s :     .1192092895507812E-04
    img: 10x:10101010101010101010
    Remote write took, s :     .2217292785644531E-04
    img: 10x:10101010101010101010
    Remote read took, s :     .1001358032226562E-04
    img: 11x:11111111111111111111
    Remote write took, s :     .2384185791015625E-04
    img: 11x:11111111111111111111
    Remote read took, s :     .1096725463867188E-04
    img: 12x:12121212121212121212
    Remote write took, s :     .2288818359375000E-04
    img: 12x:12121212121212121212
    Remote read took, s :     .1001358032226562E-04
    img: 13x:13131313131313131313
    Remote write took, s :     .2408027648925781E-04
    img: 13x:13131313131313131313
    Remote read took, s :     .8821487426757812E-05
    img: 14x:14141414141414141414
    Remote write took, s :     .2193450927734375E-04
    img: 14x:14141414141414141414
    Remote read took, s :     .9059906005859375E-05
    img: 15x:15151515151515151515
    Remote write took, s :     .2312660217285156E-04
    img: 15x:15151515151515151515
    Remote read took, s :     .1001358032226562E-04
    img: 16x:16161616161616161616
    Remote write took, s :     .2193450927734375E-04
    img: 16x:16161616161616161616
    Remote read took, s :     .1902410984039307
    img: 17x:17171717171717171717
    Remote write took, s :     .1580715179443359E-03
    img: 17x:17171717171717171717
    Remote read took, s :     .4100799560546875E-04
    img: 18x:18181818181818181818
    Remote write took, s :     .1480579376220703E-03
    img: 18x:18181818181818181818
    Remote read took, s :     .4005432128906250E-04
    img: 19x:19191919191919191919
    Remote write took, s :     .1471042633056641E-03
    img: 19x:19191919191919191919
    Remote read took, s :     .3600120544433594E-04
    img: 20x:20202020202020202020
    Remote write took, s :     .1480579376220703E-03
    img: 20x:20202020202020202020
    Remote read took, s :     .4196166992187500E-04
    img: 21x:21212121212121212121
    Remote write took, s :     .1480579376220703E-03
    img: 21x:21212121212121212121
    Remote read took, s :     .4315376281738281E-04
    img: 22x:22222222222222222222
    Remote write took, s :     .1480579376220703E-03
    img: 22x:22222222222222222222
    Remote read took, s :     .3886222839355469E-04
    img: 23x:23232323232323232323
    Remote write took, s :     .1478195190429688E-03
    img: 23x:23232323232323232323
    Remote read took, s :     .4196166992187500E-04
    img: 24x:24242424242424242424
    Remote write took, s :     .1480579376220703E-03
    img: 24x:24242424242424242424
    Remote read took, s :     .3504753112792969E-04
    img: 25x:25252525252525252525
    Remote write took, s :     .1192092895507812E-03
    img: 25x:25252525252525252525
    Remote read took, s :     .3695487976074219E-04
    img: 26x:26262626262626262626
    Remote write took, s :     .1280307769775391E-03
    img: 26x:26262626262626262626
    Remote read took, s :     .4291534423828125E-04
    img: 27x:27272727272727272727
    Remote write took, s :     .1170635223388672E-03
    img: 27x:27272727272727272727
    Remote read took, s :     .4315376281738281E-04
    img: 28x:28282828282828282828
    Remote write took, s :     .1189708709716797E-03
    img: 28x:28282828282828282828
    Remote read took, s :     .4291534423828125E-04
    img: 29x:29292929292929292929
    Remote write took, s :     .1189708709716797E-03
    img: 29x:29292929292929292929
    Remote read took, s :     .4291534423828125E-04
    img: 30x:30303030303030303030
    Remote write took, s :     .1161098480224609E-03
    img: 30x:30303030303030303030
    Remote read took, s :     .4196166992187500E-04
    img: 31x:31313131313131313131
    Remote write took, s :     .1189708709716797E-03
    img: 31x:31313131313131313131
    Remote read took, s :     .5912780761718750E-04
    img: 32x:32323232323232323232
    Remote write took, s :     .1139640808105469E-03
    img: 32x:32323232323232323232
    Remote read took, s :     .5888938903808594E-04
    img: 33x:33333333333333333333
    Remote write took, s :     .1428127288818359E-03
    img: 33x:33333333333333333333
    Remote read took, s :     .4220008850097656E-04
    img: 34x:34343434343434343434
    Remote write took, s :     .1418590545654297E-03
    img: 34x:34343434343434343434
    Remote read took, s :     .4601478576660156E-04
    img: 35x:35353535353535353535
    Remote write took, s :     .1418590545654297E-03
    img: 35x:35353535353535353535
    Remote read took, s :     .4506111145019531E-04
    img: 36x:36363636363636363636
    Remote write took, s :     .1428127288818359E-03
    img: 36x:36363636363636363636
    Remote read took, s :     .4386901855468750E-04
    img: 37x:37373737373737373737
    Remote write took, s :     .1420974731445312E-03
    img: 37x:37373737373737373737
    Remote read took, s :     .4506111145019531E-04
    img: 38x:38383838383838383838
    Remote write took, s :     .1409053802490234E-03
    img: 38x:38383838383838383838
    Remote read took, s :     .4792213439941406E-04
    img: 39x:39393939393939393939
    Remote write took, s :     .1471042633056641E-03
    img: 39x:39393939393939393939
    Remote read took, s :     .4196166992187500E-04
    img: 40x:40404040404040404040
    Remote write took, s :     .1418590545654297E-03
    img: 40x:40404040404040404040
    Remote read took, s :     .4887580871582031E-04
    img: 41x:41414141414141414141
    Remote write took, s :     .1149177551269531E-03
    img: 41x:41414141414141414141
    Remote read took, s :     .3910064697265625E-04
    img: 42x:42424242424242424242
    Remote write took, s :     .1330375671386719E-03
    img: 42x:42424242424242424242
    Remote read took, s :     .3790855407714844E-04
    img: 43x:43434343434343434343
    Remote write took, s :     .1099109649658203E-03
    img: 43x:43434343434343434343
    Remote read took, s :     .3910064697265625E-04
    img: 44x:44444444444444444444
    Remote write took, s :     .1099109649658203E-03
    img: 44x:44444444444444444444
    Remote read took, s :     .4220008850097656E-04
    img: 45x:45454545454545454545
    Remote write took, s :     .1099109649658203E-03
    img: 45x:45454545454545454545
    Remote read took, s :     .4291534423828125E-04
    img: 46x:46464646464646464646
    Remote write took, s :     .1101493835449219E-03
    img: 46x:46464646464646464646
    Remote read took, s :     .4100799560546875E-04
    img: 47x:47474747474747474747
    Remote write took, s :     .1111030578613281E-03
    img: 47x:47474747474747474747
    Remote read took, s :     .4386901855468750E-04
    img: 48x:48484848484848484848
    Remote write took, s :     .1118183135986328E-03
    img: 48x:48484848484848484848
    Remote read took, s :     .5507469177246094E-04
    img: 49x:49494949494949494949
    Remote write took, s :     .1418590545654297E-03
    img: 49x:49494949494949494949
    Remote read took, s :     .4196166992187500E-04
    img: 50x:50505050505050505050
    Remote write took, s :     .1440048217773438E-03
    img: 50x:50505050505050505050
    Remote read took, s :     .4816055297851562E-04
    img: 51x:51515151515151515151
    Remote write took, s :     .1430511474609375E-03
    img: 51x:51515151515151515151
    Remote read took, s :     .4506111145019531E-04
    img: 52x:52525252525252525252
    Remote write took, s :     .1418590545654297E-03
    img: 52x:52525252525252525252
    Remote read took, s :     .4506111145019531E-04
    img: 53x:53535353535353535353
    Remote write took, s :     .1420974731445312E-03
    img: 53x:53535353535353535353
    Remote read took, s :     .4386901855468750E-04
    img: 54x:54545454545454545454
    Remote write took, s :     .1428127288818359E-03
    img: 54x:54545454545454545454
    Remote read took, s :     .4506111145019531E-04
    img: 55x:55555555555555555555
    Remote write took, s :     .1440048217773438E-03
    img: 55x:55555555555555555555
    Remote read took, s :     .4696846008300781E-04
    img: 56x:56565656565656565656
    Remote write took, s :     .1420974731445312E-03
    img: 56x:56565656565656565656
    Remote read took, s :     .4506111145019531E-04
    img: 57x:57575757575757575757
    Remote write took, s :     .1139640808105469E-03
    img: 57x:57575757575757575757
    Remote read took, s :     .4506111145019531E-04
    img: 58x:58585858585858585858
    Remote write took, s :     .1301765441894531E-03
    img: 58x:58585858585858585858
    Remote read took, s :     .4196166992187500E-04
    img: 59x:59595959595959595959
    Remote write took, s :     .1120567321777344E-03
    img: 59x:59595959595959595959
    Remote read took, s :     .3790855407714844E-04
    img: 60x:60606060606060606060
    Remote write took, s :     .1120567321777344E-03
    img: 60x:60606060606060606060
    Remote read took, s :     .3886222839355469E-04
    img: 61x:61616161616161616161
    Remote write took, s :     .1130104064941406E-03
    img: 61x:61616161616161616161
    Remote read took, s :     .4100799560546875E-04
    img: 62x:62626262626262626262
    Remote write took, s :     .1130104064941406E-03
    img: 62x:62626262626262626262
    Remote read took, s :     .4100799560546875E-04
    img: 63x:63636363636363636363
    Remote write took, s :     .1120567321777344E-03
    img: 63x:63636363636363636363
    Remote read took, s :     .4196166992187500E-04
    img: 64x:64646464646464646464
    Remote write took, s :     .1130104064941406E-03
    img: 64x:64646464646464646464

     

    The fragment in question was this:

     

    sync all

    do i = 2, nimgs
      ! Remote read: image 1 pulls x from image i, then both images synchronise.
      if ( img .eq. 1 ) then
        time1 = mytime()
        x = x(:)[i]
        sync images ( i )
        time2 = mytime()
        write (*,"(a,g)") "Remote read took, s : ", time2-time1
        write (*,"(a,i0,a,10(i0))") "img: ", i, "x:", x(:)
      else if ( img .eq. i ) then
        sync images( 1 )
      end if

      ! Remote write: image 1 pushes x to image i, then both images synchronise.
      if ( img .eq. 1 ) then
        time1 = mytime()
        x(:)[i] = x
        sync images ( i )
        time2 = mytime()
        write (*,"(a,g)") "Remote write took, s : ", time2-time1
        write (*,"(a,i0,a,10(i0))") "img: ", i, "x:", x(:)
      else if ( img .eq. i ) then
        sync images( 1 )
      end if
    end do

    sync all

     

    Thanks

    Anton

    AShte
    Beginner
    Several people, all of them on the Fortran standardisation committee, confirmed on comp.lang.fortran and comp-fortran-90@jiscmail.ac.uk that Ron's interpretation of one-sided read/write is incorrect. I therefore think the problem described in my report is a compiler bug. A bug report should be opened on this issue, but I don't know how to do this.

    Many thanks

    Anton

    Steven_L_Intel1
    Employee

    Ron is out of the office today - I've asked him to revisit this when he returns. I saw Bill Long's explanation.
