This is a compiler bug (OneAPI Classic compiler, x64) - and a pretty bad one at that (in terms of its consequence on the usability of Intel's implementation of coarrays).
In fact, I believe it has been reported over a year ago in a similar thread: https://community.intel.com/t5/Intel-Fortran-Compiler/Coarrays-of-UDTs-with-Allocatable-components/m...
PROGRAM P IMPLICIT NONE TYPE :: T INTEGER,ALLOCATABLE :: A(:) END TYPE T TYPE :: U TYPE(T),ALLOCATABLE :: P(:) END TYPE U TYPE(U),ALLOCATABLE :: X[:] INTEGER :: TEST ALLOCATE(X[*]) IF (THIS_IMAGE() == 1) THEN ALLOCATE(X%P(1)) X%P(1)%A = [1,2] END IF SYNC ALL IF (THIS_IMAGE() == 2) THEN TEST = 1 ! TEST = 1 will result in an access violation. ! TEST = 2 works. IF (TEST == 1) THEN X = X ELSE ALLOCATE(X%P(1)) X%P(1)%A = X%P(1)%A END IF WRITE(*,*) X%P(1)%A END IF END PROGRAM P
The ifort compiler has implemented coarrays in symmetric memory already. Allocatable components of derived type coarrays are implemented in non-symmetric memory.
In the code example, coarray X of Type (U) (which is itself allocatable, -something I would not do with allocatable components-, but as a coarray still in symmetric memory) does contain an allocatable component (P) of derived type (T) (non-symmetric memory addressing) that contains an allocatable integer component (P) (certainly also non-symmetric memory). That nesting alone could already be a huge problem for the runtime to get the nested non-symmetric memory addressing done. I don’t believe that the Intel compiler team will solve that level of nesting with non-symmetric objects.
The rest of the code does not make too much sense to me either: X = X does mix symmetric with non-symmetric memory assignments, or with other words it does mix highly efficient with non-efficient memory assignments. (Even if the derived type U does not contain a non-allocatable component, coarray X would still be allocated in symmetric memory). That alone could easily corrupt the coarray runtime, I would say.
I believe that this program conforms to the standard and should work. The standard says that intrinsic assignment to a derived type variable operates as if each component were assigned as if by intrinsic assignment from the corresponding expression. In this example, X (without cosubscripts) is the local image's part of the whole coarray, so in image 2, X should be the same as X. I do know that ifort sometimes has trouble with this, but replacing X with X still gets an access violation.
From my point of view, allocatable components of derived type coarrays are difficult enough to handle for both, the implementer as well as the Fortran programmer. Such extended nesting does only worsen things. This small code example appears to be fairly simple, but is it really?
Let’s assume it’s a working code example for the moment:
In this case the assignment
X = X
is assumed by the programmer to include transfer of the (nested) non-symmetric (non-coarray) components as part of the data transfer with the encompassing derived type coarray.
As far as I understand it, for this to work the implementation would be at least required to do the following steps on image 2 to complete successfully (I’d assume something like this in the most simple case):
1. Go to image 1 and get the memory addresses of component P and nested component A (this could be a two-step process, once for each component nesting),
2. now, with the remote memory addresses present, go again to image 1 and get the data of component P and nested component A (this could also be a two-step process, once for each component nesting),
3. now, intermediate save the received data somewhere on image 2 because components P and A are not allocated yet on image 2,
4. get the bounds of components P and A (on image 1) to allocate components P and A on image 2 (this would not require additional data transfer, because in Fortran the bounds are part of the data and thus, should already be received with the data transfer in step 2),
5. now with the bounds from image 1, allocate components P and A on image 2 and save the data from step 2 there.
Here, the programmer has full responsibility to not try to remote access non-allocated components and to make sure remote access is within allocated bounds. Difficult enough without such component nesting, but the nesting will require the programmer to track allocations at each level of nesting.
In my experiences, the use of coarrays is much simpler and safer for the programmer, and coarray teams are already perfectly implemented from the start (that’s my personal experience with OpenCoarrays) to overcome SPMD with Fortran 2018. Personally, I did quit using allocatable components of derived type coarrays with OpenCoarrays: It may work for some simple codes but not at scale, I think.
(I can't use ifort with my codes because it does not support using OOP with atomic coarrays yet. As far as I now, the ifort compiler team is still working on this.)
That’s just my personal opinion and viewpoint.
The reference to X sends a request to image 1 to obtain the value of X on image 1. That some of that value comes from allocatable components is not relevant, unless the components are not allocated, but that is not the case here. The values are then sent to image 2 and assigned to X on image 2 by the rules of intrinsic assignment.
It's different if the components are pointers - you're not allowed to reference pointers on another image.
My guess here is that ifort is having difficulty with nested allocatable components on a coindexed reference. Maybe an Intel support person will pick this up from here, but if you have current support it would be better to report this at the Intel Online Service Center.
Thanks Steve for confirming this. I will open a ticket with Intel - for us this is a high priority fix. One of my concerns is that it seems that Intel will not issue any new version of the classic compiler, and if this also means no new update then this bug fix will not be available with the classic compiler. Since the IFX compiler (the beta version that was released) is, literally, incapable of handling coarrays (for now), we may end up with an unresolved situation for a long time.
Of course, the problem here is certainly the nested allocatable components and nothing else.
I just came across a statement made in Aleksandar Donev’s paper http://caf.rice.edu/documentation/Aleksandar-Donev-Coarray-Rationale-N1702-2007-09.pdf, on page 6, that could apply here as well (-or not, decide yourself-):
“In effect, co-arrays are not hierarchically composable as are derived types (i.e.,arrays of arrays). Co-arrays exist at the top level across all images. One can extend this view in future revisions but the gain is not obvious and the complications are great.“
I wouldn’t be surprised if an implementer would agree with an extended view of this for nested allocatable components of derived type coarrays as well.
This quote has nothing to do with the example above, and it seems you may misinterpret what the coarray capabilities are. If you refer to Metcalf/Reid/Cohen's excellent reference book "Modern Fortran Explained, incorporating Fortran 2018", you will find these two passages that explain what a coarray can and cannot be:
- Section 17.7: "A coarray is permitted to be of a derived type with allocatable or pointer components."
- Section 17.8: "A (derived type) component may be a coarray, and if so must be allocatable. A variable or component of a type that has an ultimate coarray component cannot itself be a coarray and must be a non-pointer non-allocatable scalar."
Now, in the case of intrinsic assignment as shown in my example, I agree with your statement that there may be a performance cost associated with it. I would note however that the benefit of such assignment is significant for the ease of transferring (maybe only occasionally, in an initialization phase) data structures from one image to another (for further analysis).
Our data structures are becoming extremely complex and distributed. Parallelism is not just about 'large matrices or large algebra problems' anymore; it's also about large heterogeneous calculations performed on hierarchically complex data. Think about analyses which are so large that their definition itself (not just the calculations) requires a distributed approach.
Coarrays offer a very elegant solution to this problem, and the addition of teams and events is really bringing them to a next level of usefulness. But... today there are still some substantial compiler limitations.
Just to clarify: Only in the code example and within the derived type coarray X of Type U, I consider only the deep nesting of allocatable types as the problem here, specifically component P of nested type T which contains another nested allocatable integer component. As far as I understand it, at each level of allocation, the implementation will be required to create memory addresses that must be accessible for remote access from other coarray images. But at the same time, if allocated on several images (which is not a requirement, the non-coarray components could also be allocated on a single image only) there will be different memory addresses for remote access at each nested component level on each image. This is hidden from the Fortran programmer but is what the implementer must solve.
The Fortran programmer on the other hand, if desires to handle the construct safely, should track allocation status not only on a single level but through all levels of nested allocatables and make it accessible to other images (because allocation status can’t be accessed remotly by the Fortran language). Therefore, specifically the deep nesting of allocatables inside a derived type coarray could easily become a major source of errors through trying to access such non-allocated components from a remote coarray image. Thus the question: Is there any gain from such deep nested allocatables that’s worth the implementer’s effort? Or is it enough for the programmer to leave the allocatables non-deep nested at a single level instead where multiple allocatables could be used instead?
I don’t now.
(because allocation status can’t be accessed remotly by the Fortran language).
This is not correct, see example below. An image has access to all the information (addresses and content of descriptors of allocatable components, in particular) required for performing an intrinsic assignment with lhs allocation with coarray data from another image.
There is no reason such intrinsic assignments should not work the same way they do for non-coarray variables.
PROGRAM P IMPLICIT NONE TYPE :: T INTEGER,ALLOCATABLE :: A(:) END TYPE T TYPE(T) :: X[*] IF (THIS_IMAGE() == 2) THEN ALLOCATE(X%A(10)) END IF SYNC ALL IF (THIS_IMAGE() == 1) THEN IF (ALLOCATED(X%A)) THEN WRITE(*,*) ALLOCATED(X%A) WRITE(*,*) SIZE(X%A) WRITE(*,*) LBOUND(X%A) WRITE(*,*) UBOUND(X%A) END IF END IF END PROGRAM P
Steve, I think there was a comment by either @Ron_Green or @Barbara_P_Intel to that effect - but I am not sure. It could have been that the comment was that no 'new versions' of IFORT will be released but that 'new updates' of the existing (and last version) are still possible.
Now that I think about it, the comment may also have been that no new non-OneAPI versions of the compiler will be released (which implies that new OneAPI ifort versions are still possible).
Some clarification may be useful.
To clarify, there will be no new releases of Parallel Studio. That package morphed into oneAPI where ifort is still your go-to Fortran compiler. Ron Green did another webinar on the transition on Wednesday this week. It was recorded. If you missed it, the recording will be available on demand shortly from this site.
ifort and ifx share the Fortran Front End and the Fortran Runtime Libraries. Any issues you find with either of those compiler components will be addressed. Coarrays fall in that category. ifort issues around code generation and optimization will probably not.
BTW... coarrays are not yet implemented in ifx.