This is my test code: $ cat ca_check.f90 program z implicit none integer :: x(10)
Insert a SYNC ALL after x=img and see what happens.
Jim Dempsey
Anton,
I'll take a look at this. What compiler version are you using?
ron
Ron, any progress?
Thanks
Anton
Our mutual friend Stephen reminded me to revisit this post.
First, a little status on Intel's CAF implementation: Our initial goal was to get a functional CAF implementation that conforms strictly to the Standard. Performance has not been fully addressed at this time and may take a while to get to acceptable levels for production purposes - particularly for distributed memory systems.
But next, I see some errors in the test and question what it is you're timing. In particular, let's visit correctness first. Removing the timing and writes you have this code on Image 1:
do i = 1, imgs
  x = x(:)[i]      ! remote read from image i
  x(:)[i] = x      ! remote write to image i
end do
The problem here is that CAF remote reads and writes are inherently asynchronous, or 1-sided. So the read into X may not have completed by the time you use X on the RHS of the assignment in the next statement, and the results are unpredictable. And a minor point for Image 1: do you want to test self-read and write? (The do loop goes from 1 to imgs, but do we care about image 1 reading/writing itself in shared memory?) What I think you want is something like a neighbor exchange, something like this for the read, maybe:
do i = 2, imgs
  sync all
  if ( img == 1 ) then
    !...start timer here
    x = x(:)[i]
    sync images(i)   !...wait for remote read to complete
    !...finish timer here, print result?
  else if ( img == i ) then
    sync images(1)   !...sync point with image 1
  end if
end do
Remember that image control statements (like SYNC IMAGES) imply a SYNC MEMORY. Maybe I should have used SYNC MEMORY instead, but so it goes.
So the next question is, what do you want to time? Do you want to find the time for the data transfer, as we're doing above? Or do you want to time how long the statement takes, to see if it's truly asynchronous, or synchronous, or just darn inefficient? In the above we're also capturing the time for the 1-1 synchronization, so it's not a good measure of throughput. Also, note I had a SYNC ALL at the top of the loop to make sure all the images execute the I iterations in lock step. Thinking about this, I believe it would be OK to remove that. Then each remote image would quickly drop into the SYNC IMAGES(1) and be waiting for image 1's SYNC IMAGES(img). That would be faster, obviously.
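To make the two timing questions concrete, here is a minimal sketch of the read loop without the SYNC ALL at the top, timing the remote-read statement and the following SYNC IMAGES separately on image 1. It is only an illustration under assumptions: the program name `ca_read_sketch`, the helper `wtime`, and the output format are made up for this example, and the coarray declaration `x(10)[*]` and the `[i]` image selectors are assumed from the context of the thread rather than copied from the original test.

```fortran
program ca_read_sketch
  use iso_fortran_env, only: int64, real64
  implicit none
  integer :: x(10)[*]          ! assumed coarray declaration, 10 integers per image
  integer :: i, img, nimgs
  real(real64) :: t0, t1, t2

  img   = this_image()
  nimgs = num_images()
  x     = img
  sync all                     ! every image has defined x before any remote read

  do i = 2, nimgs
    if ( img == 1 ) then
      t0 = wtime()
      x = x(:)[i]              ! remote read from image i
      t1 = wtime()             ! time spent in the statement itself
      sync images(i)           ! 1-1 synchronization with image i
      t2 = wtime()             ! statement plus synchronization
      write (*,"(a,i0,2(a,es12.4))") "image ", i, &
            "  read statement, s: ", t1 - t0, "  incl. sync, s: ", t2 - t0
    else if ( img == i ) then
      sync images(1)           ! image i waits here until image 1 reaches its sync
    end if
  end do

contains

  ! Wall-clock time in seconds from SYSTEM_CLOCK (hypothetical helper name)
  function wtime() result(seconds)
    real(real64) :: seconds
    integer(int64) :: count, count_rate
    call system_clock(count, count_rate)
    seconds = real(count, real64) / real(count_rate, real64)
  end function wtime

end program ca_read_sketch
```

Each remote image executes exactly one SYNC IMAGES(1), matched by image 1's SYNC IMAGES(i) in iteration i, so the loop does not need a SYNC ALL to stay consistent. The second number still includes the synchronization cost, so neither figure should be read as pure transfer throughput.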
Tricky stuff. I might suggest rethinking this experiment to see if we can derive a better test. ALSO, don't use cpu_time as it gathers the sum of thread times for the process, which with threads running in background to do the IO might give too much time. I use a wall-clock instead like this contained procedure mytime() :
program foo
  use ISO_FORTRAN_ENV
  implicit none
  integer, parameter :: dp = REAL64
  real (kind=dp) :: tstart, tstop, ttime

  !... ready to time a block of code
  tstart = mytime()
  !...do something
  tstop = mytime()
  ttime = tstop - tstart

contains

  function mytime() result (tseconds)
    real (dp) :: tseconds
    integer (INT64) :: count, count_rate, count_max
    real (dp) :: tsec, rate

    CALL SYSTEM_CLOCK(count, count_rate, count_max)
    tsec = count
    rate = count_rate
    tseconds = tsec / rate
  end function mytime

end program foo
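If you want to see how far the two timers drift apart on your own system, a small sketch like the following measures the same block with both CPU_TIME and SYSTEM_CLOCK. The program name `timer_compare` and the square-root loop are placeholders invented for this example; in a serial run the two numbers should be close, and any gap in a coarray run would depend on what the runtime's background threads are doing.

```fortran
program timer_compare
  use iso_fortran_env, only: int64, real64
  implicit none
  integer(int64) :: c0, c1, rate
  real(real64)   :: cpu0, cpu1, s
  integer        :: k

  call system_clock(c0, rate)   ! wall-clock ticks and tick rate
  call cpu_time(cpu0)           ! processor time used by this process

  ! Placeholder workload; replace with the block you actually want to time
  s = 0.0_real64
  do k = 1, 10000000
    s = s + sqrt(real(k, real64))
  end do

  call cpu_time(cpu1)
  call system_clock(c1)

  write (*,"(a,es12.4)") "wall-clock, s : ", real(c1 - c0, real64) / real(rate, real64)
  write (*,"(a,es12.4)") "cpu_time,   s : ", cpu1 - cpu0
  write (*,"(a,es12.4)") "checksum      : ", s   ! printed so the loop is not optimized away
end program timer_compare
```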
ok, maybe you are right. With your modification, the times are reasonable:
LD_LIBRARY_PATH: /cm/shared/apps/intel-cluster-studio/impi/4.1.0.024/intel64/lib:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/compiler/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/mpirt/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/ipp/../compiler/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/ipp/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/compiler/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/mkl/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/tbb/lib/intel64/gcc4.4:/cm/shared/apps/ParaView-4.0.1/ParaView-4.0.1-Linux-64bit/lib:/cm/shared/apps/torque/4.2.4.1/lib:/cm/shared/apps/moab/7.2.2/lib:/cm/shared/tools/subversion-1.8.4/lib:/cm/shared/languages/Intel-Compiler-XE-14/compiler/lib:/cm/shared/languages/Intel-Compiler-XE-14/compiler/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/lib:/cm/shared/languages/Intel-Compiler-XE-14/lib/intel64:/cm/shared/apps/intel-cluster-studio/impi/4.1.0.024/intel64/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/compiler/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/compiler/lib/intel64:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/lib/intel64
which mpirun: /cm/shared/apps/intel-cluster-studio/impi/4.1.0.024/intel64/bin/mpirun
node46-009_46215 (10.131.1.33)
node46-012_41759 (10.131.1.36)
node46-010_52398 (10.131.1.34)
node46-011_56689 (10.131.1.35)
Remote read took, s : .1471042633056641E-03
img: 2x:2222222222
Remote write took, s : .2789497375488281E-04
img: 2x:2222222222
Remote read took, s : .7867813110351562E-05
img: 3x:3333333333
Remote write took, s : .1621246337890625E-04
img: 3x:3333333333
Remote read took, s : .9059906005859375E-05
img: 4x:4444444444
Remote write took, s : .1502037048339844E-04
img: 4x:4444444444
Remote read took, s : .9059906005859375E-05
img: 5x:5555555555
Remote write took, s : .1478195190429688E-04
img: 5x:5555555555
Remote read took, s : .9059906005859375E-05
img: 6x:6666666666
Remote write took, s : .1502037048339844E-04
img: 6x:6666666666
Remote read took, s : .8106231689453125E-05
img: 7x:7777777777
Remote write took, s : .1502037048339844E-04
img: 7x:7777777777
Remote read took, s : .9059906005859375E-05
img: 8x:8888888888
Remote write took, s : .1502037048339844E-04
img: 8x:8888888888
Remote read took, s : .1001358032226562E-04
img: 9x:9999999999
Remote write took, s : .2384185791015625E-04
img: 9x:9999999999
Remote read took, s : .1192092895507812E-04
img: 10x:10101010101010101010
Remote write took, s : .2217292785644531E-04
img: 10x:10101010101010101010
Remote read took, s : .1001358032226562E-04
img: 11x:11111111111111111111
Remote write took, s : .2384185791015625E-04
img: 11x:11111111111111111111
Remote read took, s : .1096725463867188E-04
img: 12x:12121212121212121212
Remote write took, s : .2288818359375000E-04
img: 12x:12121212121212121212
Remote read took, s : .1001358032226562E-04
img: 13x:13131313131313131313
Remote write took, s : .2408027648925781E-04
img: 13x:13131313131313131313
Remote read took, s : .8821487426757812E-05
img: 14x:14141414141414141414
Remote write took, s : .2193450927734375E-04
img: 14x:14141414141414141414
Remote read took, s : .9059906005859375E-05
img: 15x:15151515151515151515
Remote write took, s : .2312660217285156E-04
img: 15x:15151515151515151515
Remote read took, s : .1001358032226562E-04
img: 16x:16161616161616161616
Remote write took, s : .2193450927734375E-04
img: 16x:16161616161616161616
Remote read took, s : .1902410984039307
img: 17x:17171717171717171717
Remote write took, s : .1580715179443359E-03
img: 17x:17171717171717171717
Remote read took, s : .4100799560546875E-04
img: 18x:18181818181818181818
Remote write took, s : .1480579376220703E-03
img: 18x:18181818181818181818
Remote read took, s : .4005432128906250E-04
img: 19x:19191919191919191919
Remote write took, s : .1471042633056641E-03
img: 19x:19191919191919191919
Remote read took, s : .3600120544433594E-04
img: 20x:20202020202020202020
Remote write took, s : .1480579376220703E-03
img: 20x:20202020202020202020
Remote read took, s : .4196166992187500E-04
img: 21x:21212121212121212121
Remote write took, s : .1480579376220703E-03
img: 21x:21212121212121212121
Remote read took, s : .4315376281738281E-04
img: 22x:22222222222222222222
Remote write took, s : .1480579376220703E-03
img: 22x:22222222222222222222
Remote read took, s : .3886222839355469E-04
img: 23x:23232323232323232323
Remote write took, s : .1478195190429688E-03
img: 23x:23232323232323232323
Remote read took, s : .4196166992187500E-04
img: 24x:24242424242424242424
Remote write took, s : .1480579376220703E-03
img: 24x:24242424242424242424
Remote read took, s : .3504753112792969E-04
img: 25x:25252525252525252525
Remote write took, s : .1192092895507812E-03
img: 25x:25252525252525252525
Remote read took, s : .3695487976074219E-04
img: 26x:26262626262626262626
Remote write took, s : .1280307769775391E-03
img: 26x:26262626262626262626
Remote read took, s : .4291534423828125E-04
img: 27x:27272727272727272727
Remote write took, s : .1170635223388672E-03
img: 27x:27272727272727272727
Remote read took, s : .4315376281738281E-04
img: 28x:28282828282828282828
Remote write took, s : .1189708709716797E-03
img: 28x:28282828282828282828
Remote read took, s : .4291534423828125E-04
img: 29x:29292929292929292929
Remote write took, s : .1189708709716797E-03
img: 29x:29292929292929292929
Remote read took, s : .4291534423828125E-04
img: 30x:30303030303030303030
Remote write took, s : .1161098480224609E-03
img: 30x:30303030303030303030
Remote read took, s : .4196166992187500E-04
img: 31x:31313131313131313131
Remote write took, s : .1189708709716797E-03
img: 31x:31313131313131313131
Remote read took, s : .5912780761718750E-04
img: 32x:32323232323232323232
Remote write took, s : .1139640808105469E-03
img: 32x:32323232323232323232
Remote read took, s : .5888938903808594E-04
img: 33x:33333333333333333333
Remote write took, s : .1428127288818359E-03
img: 33x:33333333333333333333
Remote read took, s : .4220008850097656E-04
img: 34x:34343434343434343434
Remote write took, s : .1418590545654297E-03
img: 34x:34343434343434343434
Remote read took, s : .4601478576660156E-04
img: 35x:35353535353535353535
Remote write took, s : .1418590545654297E-03
img: 35x:35353535353535353535
Remote read took, s : .4506111145019531E-04
img: 36x:36363636363636363636
Remote write took, s : .1428127288818359E-03
img: 36x:36363636363636363636
Remote read took, s : .4386901855468750E-04
img: 37x:37373737373737373737
Remote write took, s : .1420974731445312E-03
img: 37x:37373737373737373737
Remote read took, s : .4506111145019531E-04
img: 38x:38383838383838383838
Remote write took, s : .1409053802490234E-03
img: 38x:38383838383838383838
Remote read took, s : .4792213439941406E-04
img: 39x:39393939393939393939
Remote write took, s : .1471042633056641E-03
img: 39x:39393939393939393939
Remote read took, s : .4196166992187500E-04
img: 40x:40404040404040404040
Remote write took, s : .1418590545654297E-03
img: 40x:40404040404040404040
Remote read took, s : .4887580871582031E-04
img: 41x:41414141414141414141
Remote write took, s : .1149177551269531E-03
img: 41x:41414141414141414141
Remote read took, s : .3910064697265625E-04
img: 42x:42424242424242424242
Remote write took, s : .1330375671386719E-03
img: 42x:42424242424242424242
Remote read took, s : .3790855407714844E-04
img: 43x:43434343434343434343
Remote write took, s : .1099109649658203E-03
img: 43x:43434343434343434343
Remote read took, s : .3910064697265625E-04
img: 44x:44444444444444444444
Remote write took, s : .1099109649658203E-03
img: 44x:44444444444444444444
Remote read took, s : .4220008850097656E-04
img: 45x:45454545454545454545
Remote write took, s : .1099109649658203E-03
img: 45x:45454545454545454545
Remote read took, s : .4291534423828125E-04
img: 46x:46464646464646464646
Remote write took, s : .1101493835449219E-03
img: 46x:46464646464646464646
Remote read took, s : .4100799560546875E-04
img: 47x:47474747474747474747
Remote write took, s : .1111030578613281E-03
img: 47x:47474747474747474747
Remote read took, s : .4386901855468750E-04
img: 48x:48484848484848484848
Remote write took, s : .1118183135986328E-03
img: 48x:48484848484848484848
Remote read took, s : .5507469177246094E-04
img: 49x:49494949494949494949
Remote write took, s : .1418590545654297E-03
img: 49x:49494949494949494949
Remote read took, s : .4196166992187500E-04
img: 50x:50505050505050505050
Remote write took, s : .1440048217773438E-03
img: 50x:50505050505050505050
Remote read took, s : .4816055297851562E-04
img: 51x:51515151515151515151
Remote write took, s : .1430511474609375E-03
img: 51x:51515151515151515151
Remote read took, s : .4506111145019531E-04
img: 52x:52525252525252525252
Remote write took, s : .1418590545654297E-03
img: 52x:52525252525252525252
Remote read took, s : .4506111145019531E-04
img: 53x:53535353535353535353
Remote write took, s : .1420974731445312E-03
img: 53x:53535353535353535353
Remote read took, s : .4386901855468750E-04
img: 54x:54545454545454545454
Remote write took, s : .1428127288818359E-03
img: 54x:54545454545454545454
Remote read took, s : .4506111145019531E-04
img: 55x:55555555555555555555
Remote write took, s : .1440048217773438E-03
img: 55x:55555555555555555555
Remote read took, s : .4696846008300781E-04
img: 56x:56565656565656565656
Remote write took, s : .1420974731445312E-03
img: 56x:56565656565656565656
Remote read took, s : .4506111145019531E-04
img: 57x:57575757575757575757
Remote write took, s : .1139640808105469E-03
img: 57x:57575757575757575757
Remote read took, s : .4506111145019531E-04
img: 58x:58585858585858585858
Remote write took, s : .1301765441894531E-03
img: 58x:58585858585858585858
Remote read took, s : .4196166992187500E-04
img: 59x:59595959595959595959
Remote write took, s : .1120567321777344E-03
img: 59x:59595959595959595959
Remote read took, s : .3790855407714844E-04
img: 60x:60606060606060606060
Remote write took, s : .1120567321777344E-03
img: 60x:60606060606060606060
Remote read took, s : .3886222839355469E-04
img: 61x:61616161616161616161
Remote write took, s : .1130104064941406E-03
img: 61x:61616161616161616161
Remote read took, s : .4100799560546875E-04
img: 62x:62626262626262626262
Remote write took, s : .1130104064941406E-03
img: 62x:62626262626262626262
Remote read took, s : .4100799560546875E-04
img: 63x:63636363636363636363
Remote write took, s : .1120567321777344E-03
img: 63x:63636363636363636363
Remote read took, s : .4196166992187500E-04
img: 64x:64646464646464646464
Remote write took, s : .1130104064941406E-03
img: 64x:64646464646464646464
The fragment in question was this:
sync all
do i = 2, nimgs
  if ( img .eq. 1 ) then
    time1 = mytime()
    x = x(:)[i]
    sync images ( i )
    time2 = mytime()
    write (*,"(a,g)") "Remote read took, s : ", time2-time1
    write (*,"(a,i0,a,10(i0))") "img: ", i, "x:", x(:)
  else if ( img .eq. i ) then
    sync images( 1 )
  end if
  if ( img .eq. 1 ) then
    time1 = mytime()
    x(:)[i] = x
    sync images ( i )
    time2 = mytime()
    write (*,"(a,g)") "Remote write took, s : ", time2-time1
    write (*,"(a,i0,a,10(i0))") "img: ", i, "x:", x(:)
  else if ( img .eq. i ) then
    sync images( 1 )
  end if
end do
sync all
Thanks
Anton
Ron is out of the office today - I've asked him to revisit this when he returns. I saw Bill Long's explanation.
