In my quest to find a solution for the problem I posted in the thread "Internal compiler error with lock/unlock", I stumbled upon another problem, again using Intel Fortran 12.0.3.
Here is the program:
! checkscalar.f90 --
!     Check some odd (erroneous) behaviour with scalar coarrays
!
program checkscalar
    implicit none
    logical, codimension[*] :: ready
    logical, codimension[*] :: new_results
    ready       = .false.
    new_results = .true.   ! Indicates the image has results available
    write(*,*) 'Image', this_image(), new_results
    sync all
    write(*,*) 'Image2', this_image(), new_results
    !
    ! Collect the found primes in image 1, create new tasks
    ! for all images
    !
    do while ( .not. ready )
        if ( this_image() == 1 ) then
            call collect_results
        endif
        call sleepqq(1)
    enddo
contains
!
! Subroutine to collect the results from all
! images (run by image 1)
!
subroutine collect_results
    integer :: i
    integer :: np
    integer :: maxindex
    ! Print the value of new_results on each image
    do i = 1,num_images()
        write(*,*) 'Examine', i, new_results[i]
    enddo
    ! Set the flag ready on each image
    do i = 1,num_images()
        ready[i] = .true.
    enddo
end subroutine collect_results
end program checkscalar
It does not do much: in image 1 I print the value of new_results and when the
loop is finished I set the flag ready in all images so that they will stop. At least
that is my intention.
The output of one run with this program is:
Image 6 T
Image 1 T
Image 3 T
Image2 3 T
Image 7 T
Image2 7 T
Image2 6 T
Image 5 T
Image 2 T
Image2 2 T
Image 4 T
Image 8 T
Image2 4 T
Image2 1 T
Examine 1 T
Image2 8 T
Image2 5 T
(after a few seconds of no progress at all, I stopped the program)
So, all 8 images start, image 1 is entering the loop, prints the value of new_results on that
image, and then hangs - it does not get beyond this!
Does anyone have any clues? Am I doing something wrong? (Possible, of course, but the
program is so simple that I cannot believe that.)
Regards,
Arjen
[fortran]
do while ( .not. ready )
    if ( this_image() == 1 ) then
        call collect_results
    endif
    call sleepqq(1)
enddo
[/fortran]
If the loop is entered and this_image() returns a value different from 1, the loop will do nothing but consume time endlessly. What event would cause the IF block to be executed in this case?
With a dual-core CPU, I get the output
[bash]
Image 1 T
Image 2 T
Image2 1 T
Examine 1 T
Image2 2 T
[/bash]
and then the program gets stuck in the DO WHILE loop. The last line printed has this_image() equal to 2, so the loop will keep calling SLEEPQQ and do nothing else.
The function this_image() returns the unique number belonging to the image.
The intention of the above fragment is to have image 1 (or thread 1 or ...) examine the
results produced by all images. In the reduced program I posted there are no actual results,
but the variable that indicates there are is set to .true. at the start. The only task to be done
in the routine collect_results is to loop over the images, print the value of new_results on that
image and then to set the variable ready in all images so that the do-while loop stops.
In short:
All images except one should wait for image 1 to set "ready" and then terminate the loop
and stop the program altogether.
Unfortunately, image 1 seems to get stuck reading the value of new_results on image 2
and then the program hangs.
Regards,
Arjen
Jim Dempsey
Regards,
Arjen
Sorry I cannot help further as I do not currently have PS 2011 XE, but I hope to have it shortly (in a few weeks). At that point I should be able to run your example program (although not on Mac). Could you describe your test environment: processor(s), single system or multi-system, interconnect, OS(s), bitness (32/64), and version of IVF (Fortran Composer)? This would help me in trying to replicate your problem.
Jim Dempsey
You should not use things such as SLEEPQQ in coarray applications. Use the language-provided synchronization features instead. For example:
[fortran]
! checkscalar.f90 --
! Check some odd (erroneous) behaviour with scalar coarrays
!
program checkscalar
implicit none
logical, codimension
[/fortran]
I run this program on a 64-bit Linux machine with 8 cores.
I am using the free version of Intel Fortran for Linux, version number 12.0.3.
Regards,
Arjen
Hi Steve,
Thanks for these comments. I am still learning my way around coarrays and will experiment with this.
Regards,
Arjen
>>I run this program on a 64-bits Linux machine with 8 cores
Other than as a learning experience with coarrays, there is little reason to use coarrays on such a system: OpenMP would provide better performance and be easier to program. The only advantage of coarrays on that system is that the static data size could be larger, because each image has its own process space, and the local data (stack and process-local allocatables) are acquired from separate process virtual address spaces. A lack of space for local data would only be a concern if you were running 32-bit applications.
Jim Dempsey
Jim Dempsey
I tried Steve's version of my program and that works fine. However, when I introduced a loop with a "sync all" statement, only the first thread finishes - the ready variable seems not to be updated on any other image.
Here is the program:
! checkscalar.f90 --
!     Check some odd (erroneous) behaviour with scalar coarrays
!
program checkscalar
    implicit none
    logical, codimension[*] :: ready
    logical, codimension[*] :: new_results
    ready       = .false.
    new_results = .true.   ! Indicates the image has results available
    write(*,*) 'Image', this_image(), new_results
    sync all
    write(*,*) 'Image2', this_image(), new_results
    !
    ! Collect the found primes in image 1, create new tasks
    ! for all images
    !
    do while ( .not. ready )
        sync all
        if ( this_image() == 1 ) then
            call collect_results
        endif
        sync all
    enddo
    write(*,*) 'Image ',this_image(), ' done'
contains
!
! Subroutine to collect the results from all
! images (run by image 1)
!
subroutine collect_results
    integer :: i
    integer :: np
    integer :: maxindex
    ! Print the value of new_results on each image
    do i = 1,num_images()
        write(*,*) 'Examine', i, new_results[i]
    enddo
    ! Set the flag ready on each image
    do i = 1,num_images()
        ready[i] = .true.
    enddo
end subroutine collect_results
end program checkscalar
What I see in the output is that image 1 finishes after examining all images and the others
never produce the message "Image n done", despite the "sync all" statements.
Regards,
Arjen
I replaced the sync all statements in the do while loop by sync images statements:
do while ( .not. ready )
    if ( this_image() == 1 ) then
        call collect_results
        sync images( * )
    else
        sync images( 1 )
    endif
enddo
and then the program finishes nicely. Now with this solution I can continue my experiments
with the actual program(s).
Regards,
Arjen
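For reference, here is a minimal self-contained sketch of the sync images approach. It is not the program actually run in this thread: the program name is made up, and the coindexed references new_results[i] and ready[i] are assumed from the description of collect_results. It also differs from the fragment above in one respect: the flag is tested only after the synchronization, so that each image reads its own ready in a segment that is ordered after image 1 has written it.

```fortran
! Sketch only: program name and loop layout are assumptions, not the
! original poster's code.
program checkscalar_sync_images
    implicit none

    logical, codimension[*] :: ready
    logical, codimension[*] :: new_results

    ready       = .false.
    new_results = .true.
    sync all                      ! every image has initialised its flags

    do
        if ( this_image() == 1 ) then
            call collect_results
            sync images( * )      ! image 1 synchronises with all other images
        else
            sync images( 1 )      ! the other images synchronise with image 1 only
        endif
        if ( ready ) exit         ! read the local flag after the synchronisation
    enddo

    write(*,*) 'Image ', this_image(), ' done'

contains

! Run by image 1: examine every image, then tell all of them to stop
subroutine collect_results
    integer :: i

    do i = 1,num_images()
        write(*,*) 'Examine', i, new_results[i]
    enddo
    do i = 1,num_images()
        ready[i] = .true.
    enddo
end subroutine collect_results

end program checkscalar_sync_images
```

With Intel Fortran this would be built with the coarray option (for example -coarray on Linux); the number of images is then chosen at run time.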
A potential problem (not an error on your part) is that the compiler optimization may have stripped the second sync all in your do loop (i.e. it sees one at the top and one at the bottom and assumes one is redundant). If this is the case, then the sync all following image 1's collect_results is potentially never executed.
Two things to try
a) place sync all after do loop
b) remove first sync all in do loop (forcing sync all to follow collect_results)
Wouldn't hurt to test both scenarios as you want to discover what is going on as opposed to simply getting the code to work (i.e. don't stop testing if first test succeeds).
Jim Dempsey
Rather than use a covariable "ready" and testing it in a loop, use the synchronization tools provided by the language, including locks, critical sections and the various forms of SYNC. It would be an unusual coarray application that needed to use a loop for this.
The biggest danger of learning coarrays is assuming you can simply translate concepts from shared-memory programming in the past. If you're an MPI programmer, however, it may be an easier transition as coarrays sort of look like one-way MPI.
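As an illustration of that advice, the reduced example can be written without any "ready" flag or polling loop. The following is only a sketch under that assumption (the program name is made up, and it is not the version posted earlier in this thread, which did not survive intact): image 1 examines the other images between two sync all statements, and the second sync all is what releases the waiting images.

```fortran
! Sketch only: a flag-free variant of the reduced example.
program collect_without_flag
    implicit none

    logical, codimension[*] :: new_results
    integer                 :: i

    new_results = .true.          ! this image has results available
    sync all                      ! all images have defined new_results

    if ( this_image() == 1 ) then
        ! Only image 1 looks at the results of all images
        do i = 1,num_images()
            write(*,*) 'Examine', i, new_results[i]
        enddo
    endif

    sync all                      ! the other images wait here for image 1
    write(*,*) 'Image ', this_image(), ' done'
end program collect_without_flag
```

The barrier itself carries the "image 1 is finished" information, so no image ever has to re-read a variable that another image may be changing.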
the compiler that this statement is required, but the result was still the same.
Adding a write statement here and there does clarify _what_ is happening, but not why.
Image 1 leaves the loop, but all others remain in the loop. It definitely looks as if the "ready"
coarray is not updated. With the "sync images" statement it is. (I actually got my original
program to work that way.)
Regards,
Arjen
-O0 works too.
Well, that is good to know!
Regards,
Arjen
When I introduce workarounds I also insert a comment with a common unique signature:
! **hack**
! using "( .not. ready[this_image()] )" as opposed to "(.not. ready)"
! due to compiler bug
Then later on, as I get new versions of the compiler, I can locate all the hacks and test whether the bug has been fixed.
Also, once it is fixed, I leave the workaround code in but conditionalized out. You never know when a bug will resurface.
This would seem to indicate that sync all is broken, or at least appears broken under this circumstance.
Jim Dempsey
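A sketch of how the tagged workaround might look in the sync all version of the test program follows. It assumes the workaround was applied to the do while test, as the hack comment suggests; the program name is made up, and the original test is kept as a comment so it can be restored when a newer compiler is tried.

```fortran
! Sketch only: where the workaround goes is an assumption based on the
! hack comment, not a copy of the code that was actually tested.
program checkscalar_hack
    implicit none

    logical, codimension[*] :: ready
    logical, codimension[*] :: new_results
    integer                 :: i

    ready       = .false.
    new_results = .true.
    sync all

    ! **hack**
    ! using "( .not. ready[this_image()] )" as opposed to "(.not. ready)"
    ! due to compiler bug
    ! Original test:  do while ( .not. ready )
    do while ( .not. ready[this_image()] )
        sync all
        if ( this_image() == 1 ) then
            do i = 1,num_images()
                write(*,*) 'Examine', i, new_results[i]
                ready[i] = .true.
            enddo
        endif
        sync all
    enddo

    write(*,*) 'Image ', this_image(), ' done'
end program checkscalar_hack
```

Searching the sources for the signature "**hack**" then lists every place that should be retested with a new compiler version.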
