Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Coarray and SYNC MEMORY statement

OP1
New Contributor III
1,685 Views

Beyond the "Coarrays in the next Fortran Standard" document by Reid et al, is there a thorough treatment of the subject/documentation available anywhere? I am trying to graps some of the subtle(r) aspects of coarray.

For instance, I am puzzled by the behavior of the simple code below. When the SYNC MEMORY statement is commented out, the code runs until completion; but I will note that all tasks do no start in parallel - tasks assigned to images other than 1 do not start until image 1 has completed its task. When the SYNC MEMORY statement in not commented out, the code behavior is the desired one (all images start doing their tasks in parallel), but then the code hangs after the last task complete.

I am sure the reason for this must be obvious to those more versed in coarray but it still escapes me :) .

Thanks in advance for your help,
Olivier

 [fortran]

PROGRAM

MAIN

USE

, INTRINSIC :: ISO_FORTRAN_ENV, ONLY: LOCK_TYPE

IMPLICIT NONE

INTEGER

:: I,J,TASK

INTEGER

:: STACK_SIZE

  • INTEGER

    ,ALLOCATABLE:: TASK_STACK(:)[:]

    LOGICAL

    :: KEEP_WORKING

    REAL(8)

    :: R

    TYPE

    (LOCK_TYPE) :: STACK_LOCK

  • ALLOCATE

    (TASK_STACK(10)

  • )
  • IF

    (THIS_IMAGE()==1)

    THEN

    DOI=1,SIZE(TASK_STACK)

    TASK_STACK(I) = I

    END DO

    STACK_SIZE =

    SIZE(TASK_STACK)

    WRITE(*,*) 'There will be', NUM_IMAGES(), ' images executing the tasks.'

    END IF

    SYNC ALL

    KEEP_WORKING = .TRUE.

    DO

    WHILE(KEEP_WORKING)

    LOCK(STACK_LOCK[1])

    IF (STACK_SIZE[1]>0) THEN

    TASK = TASK_STACK(STACK_SIZE[1])[1]

    STACK_SIZE[1] = STACK_SIZE[1]-1

    KEEP_WORKING = .TRUE.

    ELSE

    KEEP_WORKING = .FALSE.

    END IF

    UNLOCK(STACK_LOCK[1])

    !SYNC MEMORY

    IF (KEEP_WORKING) THEN

    WRITE(*,*)'I am image ',THIS_IMAGE(),' and I am doing task ',TASK,'.'

    R = 0.0D+0

    DOI=1,8000

    DOJ=1,8000

    R = R+

    COS(REAL(I,KIND=8))+SIN(REAL(J,KIND=8)+R)

    END DO

    END DO

    WRITE(*,*)'I am image ',THIS_IMAGE(),' and I am done: ',R

    END IF

    END DO

    SYNC ALL

    END

    PROGRAM MAIN

    [/fortran]

    0 Kudos
    8 Replies
    Steven_L_Intel1
    Employee
    1,685 Views

    There is the Fortran 2008 standard at http://j3-fortran.org/doc/standing/links/007.pdf

    We found that our implementation of SYNC MEMORY was not correct and have fixed it for a release later this year. With the SYNC MEMORY enabled, I get the following output which I think is what you want:

    [plain]

     There will be           8  images executing the tasks.
     I am image            1  and I am doing task           10 .
     I am image            1  and I am done:   -21267699.5410634
     I am image            1  and I am doing task            8 .
     I am image            6  and I am doing task            9 .
     I am image            6  and I am done:   -21267699.5410634
     I am image            1  and I am done:   -21267699.5410634
     I am image            1  and I am doing task            6 .
     I am image            4  and I am doing task            7 .
     I am image            4  and I am done:   -21267699.5410634
     I am image            1  and I am done:   -21267699.5410634
     I am image            1  and I am doing task            4 .
     I am image            3  and I am doing task            5 .
     I am image            1  and I am done:   -21267699.5410634
     I am image            3  and I am done:   -21267699.5410634
     I am image            1  and I am doing task            3 .
     I am image            1  and I am done:   -21267699.5410634
     I am image            1  and I am doing task            2 .
     I am image            1  and I am done:   -21267699.5410634
     I am image            5  and I am doing task            1 .
     I am image            5  and I am done:   -21267699.5410634
    [/plain]

    You can sign up for the beta of this version - see the sticky thread at the top of this forum.

    0 Kudos
    OP1
    New Contributor III
    1,685 Views

    Steve,

    Your prompt answer is greatly appreciated - I cannot commend you enough on how nicely and efficiently you and your colleagues support this forum! Thank you so much.

    It seems from the example you posted that you have the same behavior as when I comment out the SYNC MEMORY statement. Actually, I believe the SYNC MEMORY statement is not required in this example. I note that for instance, image 6 (or the second task assigned to image 1) does not start until the first task assigned to image 1 is complete. I do not understand this behavior.

    Ideally, since you have 8 cores, shouldn't you have (since the tasks take some time to complete) an output that looks like:

    I am image 1 and I am doing task...
    I am image 2 and I am doing task ...
    I am image 3 and I am doing task ...

    before any of the task completion messages? In your example just like in mine, the other images do not start until image 1 is done with the first task. Hence my puzzlement :)

    Olivier

    0 Kudos
    Steven_L_Intel1
    Employee
    1,685 Views

    Yes, there does seem to be something odd going on here. I will look at it more closely.

    0 Kudos
    OP1
    New Contributor III
    1,685 Views

    Steve - did you get a chance to take a look at this a bit further?

    Also - the documention on what SYNC MEMORY does is a bit obscure. Does SYNC MEMORY only make sure that the image that encounters it 'exposes' its own coarrays; or when one image encounters it, it requires all images to expose their own coarrays (knowing that they might be busy doing other things, which may slow down the execution of SYNC MEMORY).

    Coarrays seem to have significant potential for code simplification - I wish there were more examples available to get started.

    Thanks,
    Olivier

    0 Kudos
    Steven_L_Intel1
    Employee
    1,685 Views

    I did not have a chance to investigate this further. Here is what the standard says:

    2 1 Execution of a SYNC MEMORY statement ends one segment and begins another; those two segments can be
    3 ordered by a user-defined way with respect to segments on other images.
    4 R862 sync-memory-stmt is SYNC MEMORY [ ( [ sync-stat -list ] ) ]
    5 2 If, by execution of statements on image P,
    6 - a variable X on image Q is defined, referenced, becomes undefined, or has its allocation status, pointer
    7 association status, array bounds, dynamic type, or type parameters changed or inquired about by execution
    8 of a statement,
    9 - that statement precedes a successful execution of a SYNC MEMORY statement, and
    10 - a variable Y on image Q is defined, referenced, becomes undened, or has its allocation status, pointer
    11 association status, array bounds, dynamic type, or type parameters changed or inquired about by execution
    12 of a statement that succeeds execution of that SYNC MEMORY statement,
    13 then the action regarding X on image Q precedes the action regarding Y on image Q.
    14 3 User-defined ordering of segment Pi on image P to precede segment Qj on image Q occurs when
    15 - image P executes an image control statement that ends segment Pi, and then executes statements that
    16 initiate a cooperative synchronization between images P and Q, and
    17 - image Q executes statements that complete the cooperative synchronization between images P and Q and
    18 then executes an image control statement that begins segment Qj .

    19 4 Execution of the cooperative synchronization between images P and Q shall include a dependency that forces
    20 execution on image P of the statements that initiate the synchronization to precede the execution on image Q of
    21 the statements that complete the synchronization. The mechanisms available for creating such a dependency are
    22 processor dependent.

    NOTE 8.40
    SYNC MEMORY usually suppresses compiler optimizations that might reorder memory operations acrossthe segment boundary dened by the SYNC MEMORY statement and ensures that all memory operationsinitiated in the preceding segments in its image complete before any memory operations in the subsequentsegment in its image are initiated. It needs to do this unless it can establish that failure to do so could notalter processing on another image.

    NOTE 8.41
    SYNC MEMORY can be used to implement specialized schemes for segment ordering, such as the spin-waitloop. For example:

    USE,INTRINSIC :: ISO_FORTRAN_ENV
    LOGICAL(ATOMIC_LOGICAL_KIND),SAVE :: LOCKED

  • = .TRUE.
    LOGICAL :: VALINTEGER :: IAM, P, Q
    ...
    IAM = THIS_IMAGE()
    IF (IAM == P) THEN
    ! Segment Pi
    SYNC MEMORY ! A
    CALL ATOMIC_DEFINE (LOCKED, .FALSE.)
    ! Segment Pi+1
    ELSE IF (IAM == Q) THEN
    VAL = .TRUE.
    DO WHILE (VAL)
    ! Segment Qj-1
    CALL ATOMIC_REF (VAL, LOCKED)
    END DO
    SYNC MEMORY
    ! B! Segment Qj
    END IF

    The DO WHILE loop does not complete until VAL is defined with the value false. This is the cooperativesynchronization that provides the dependency that image Q does not complete segment Qj-1 until theCALL statement in segment Pi+1 completes. This ensures that the execution of segment Pi on image Pprecedes execution of segment Qj on image Q.

    The first SYNC MEMORY statement (A) ensures that the compiler does not reorder the following statement(segment Pi+1) with the previous statements, since the lock should be freed only after the work in segmentPi has been completed.

    The second SYNC MEMORY statement (B) marks the beginning of a new segment, informing the compilerthat the values of coarrays referenced in that segment might have been changed by other images in precedingsegments, so need to be loaded from memory.

    NOTE 8.42
    As a second example, the user might have access to an external procedure that performs synchronization between images. That library procedure might not be aware of the mechanisms used by the processor to manage remote data references and denitions, and therefore not, by itself, be able to ensure the correct memory state before and after its reference. The SYNC MEMORY statement provides the needed memoryordering that enables the safe use of the external synchronization routine. For example:

    INTEGER :: IAM
    REAL :: X

  • IAM = THIS_IMAGE()
    IF (IAM == 1) X = 1.0
    SYNC MEMORY
    CALL EXTERNAL_SYNC()
    SYNC MEMORY
    IF (IAM == 2) WRITE(*,*) X[1]

    where executing the subroutine EXTERNAL SYNC has an image synchronization effect similar to executinga SYNC ALL statement.
  • 0 Kudos
    IanH
    Honored Contributor III
    1,685 Views

    Like others, I'm also trying to get my head around coarrays.  I've noticed with examples similar to opmkl's that if the image which owns the particular coindex thingy's (stack_lock and stack_size here on image one) being used for coordination between images is sitting at a SYNC ALL or similar statement, then things tend to exhibit more parallel behaviour.  This implies to me that requests from other images to access coarrays on a particular image do not proced unless that particular image is executing a SYNC... statement of some form.

    Is this a behaviour required by the standard (I don't see why it would be) or just a consequence of the underlying implementation?

    Any further insight into the need or otherwise of the SYNC MEMORY appreciated.  I don't really understand the implications of the text in the standard, possibly due to executing a few too many SHIRAZ MEMORY statements over the years.

    (Do you actually need two - one before the reference of stack_size and one after the modification?  Do they both need to be inside the LOCK/UNLOCK?)

    0 Kudos
    OP1
    New Contributor III
    1,685 Views

    Steve,

    This thread was left in pending state last year - due to improvements to be brought to the Intel Compiler for the coarray implementation. Was the behavior highlighted above investigated and is it understood now? Also, the performance of the compiler (as far as speed is concerned) was rather dismal when coarrays were used - what type of quantitative improvements have occured since then?

    Thanks,

    Olivier

    0 Kudos
    Steven_L_Intel1
    Employee
    1,685 Views

    We have made some significant improvements in the 14.0 compiler, but there is always room for more. We have fixed some SYNC MEMORY issues over the past year or two. But I'm sorry to say that we didn't further investigate the issues you raised here. I'll make a note of this and try to get back to it soon.

    0 Kudos
    Reply