Invalid sequence association when passing substring

Dave_Allured · 02-21-2012

In the following test, the main program passes substrings from a character array, via the explicit interface of sub1. Sub1 then passes on its input array to sub2 via an implicit interface. I test two variations of the actual argument in the call statements for sub2:

[fortran]
! Version string14.f90, 2012-feb-21

subroutine sub2 (s2)
   character(len=*) s2(*)
   print "(' ""', a, '""')", s2(1:4)
end subroutine sub2

module mod1
contains
   subroutine sub1 (s1)
      character(len=*), intent(in) :: s1(:)
      print *, 'len (s1) = ', len (s1)
      print *, 'shape (s1) = ', shape (s1)
      print *, 'Pass by reference to whole array:'
      call sub2 (s1)
      print *, 'Pass by reference to first array element:'
      call sub2 (s1(1))
   end subroutine sub1
end module mod1

program str_test
   use mod1
   character(len=3) s(4)
   s = (/ '123', '456', '789', 'ABC' /)
   call sub1 (s(:)(1:2))
end program str_test
[/fortran]
I would expect to get the same results either way, by sequence association, but there is a surprise:

[bash]
mac56:~/bugs/string-array 83> ifort string14.f90
mac56:~/bugs/string-array 84> ./a.out
len (s1) = 2
shape (s1) = 4
Pass by reference to whole array:
"12"
"45"
"78"
"AB"
Pass by reference to first array element:
"12"
"34"
"56"
"78"
[/bash]
Is this valid Fortran? Is this a compiler bug?

I get the same results with two versions of Intel Fortran. On Mac OS 10.6.8:
Intel Fortran Intel 64 Compiler Professional for applications running on Intel 64, Version 11.1 Build 20100806 Package ID: m_cprof_p_11.1.089

On Mac OS 10.7.3 (Lion):
Intel Fortran Intel 64 Compiler XE for applications running on Intel 64, Version 12.1.1.246 Build 20111011

Also note that the following compilers get the "right" answer:
gfortran version 4.5.2 on Mac
pgf90 version 8.0-2 on Linux

And Sun Fortran 95 version 8.3 on Linux gets the "wrong" answer, same as Intel Fortran.

Thanks for any insights about this.

--Dave A.

JVanB · 02-21-2012

Your code is invalid because sequence association doesn't work for assumed-shape arrays (except maybe for those with the CONTIGUOUS attribute in f2008, but I haven't checked this). For more consistent results, give both the actual argument s and the dummy argument s1 the TARGET attribute.

Steven_L_Intel1 · 02-22-2012

Repeat Offender is mistaken - sequence association is valid for assumed-shape arrays. My reading of the Fortran standard has Intel and Sun giving correct results for this program.

In the case where you pass s1(1), you get sequence association. But the standard calls out character arguments for special treatment:

"If the actual argument is default character or of type character with the C character kind, and is an array expression, array element, or array element substring designator, the element sequence consists of the storage units beginning with the first storage unit of the actual argument and continuing to the end of the array. The storage units of an array element substring designator are viewed as array elements consisting of consecutive groups of storage units having the character length of the dummy array."

Then there's a note:

NOTE 12.32
Some of the elements in the element sequence may consist of storage units from different elements of the original array.

So in this case, you pass the characters "12". Sequence association then simply takes the following characters in storage order, even though some of those characters are not in the elements!
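For comparison, the storage-order rule quoted above can be seen in a case where the assumed-shape question does not arise, because the array is explicit-shape and therefore contiguous. This is a minimal sketch; the names `collect_pairs`, `seq_demo` and `pairs_from_substring` are hypothetical. The scalar substring `s(1)(1:2)` is passed through an implicit interface to an assumed-size `character(len=2)` dummy, so the element sequence is the 12 storage units "123456789ABC" regrouped into six 2-character elements.

```fortran
! Hypothetical helper: assumed-size dummy reached through an implicit
! interface, so its elements are sequence associated with the actual.
subroutine collect_pairs (s2, n, out)
   implicit none
   integer, intent(in) :: n
   character(len=2), intent(in) :: s2(*)
   character(len=2), intent(out) :: out(n)
   out = s2(1:n)
end subroutine collect_pairs

module seq_demo
   implicit none
contains
   subroutine pairs_from_substring (pairs)
      character(len=2), intent(out) :: pairs(6)
      ! Explicit-shape array: its storage units are "123456789ABC", contiguous.
      character(len=3) :: s(4)
      s = (/ '123', '456', '789', 'ABC' /)
      ! The substring s(1)(1:2) starts the element sequence at the first
      ! storage unit of s(1); the sequence runs to the end of s and is viewed
      ! as 2-character elements, ignoring the original element boundaries.
      call collect_pairs (s(1)(1:2), 6, pairs)
   end subroutine pairs_from_substring
end module seq_demo
```

Calling `pairs_from_substring` from a main program should yield "12", "34", "56", "78", "9A", "BC"; the first four are the values ifort and Sun printed for the second call in the original post.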

JVanB · 02-22-2012

Sorry, but you're looking in the wrong section, 12.5.2.11 instead of 12.5.2.4, which reads:

If the actual argument is a coindexed scalar, the corresponding dummy argument shall be scalar. If the actual argument is a noncoindexed scalar, the corresponding dummy argument shall be scalar unless the actual argument is default character, of type character with the C character kind (15.2.2), or is an element or substring of an element of an array that is not an assumed-shape, pointer, or polymorphic array. If the procedure is nonelemental and is referenced by a generic name or as a defined operator or defined assignment, the ranks of the actual arguments and corresponding dummy arguments shall agree.

The fact that the actual argument s1(1) is an element of an assumed-shape array makes a critical difference because s1 is not guaranteed to be contiguous. The standard allows the Fortran processor to make a copy of a non-contiguous actual argument and associate that copy with the dummy argument, unless the dummy argument and the actual argument both have the TARGET attribute and the dummy argument satisfies some other condition such as being scalar or an assumed-shape array. Thus s1 could be an array descriptor for the non-contiguous data of array s, or it could be an array descriptor for a contiguous copy of the data of array s that constitutes the actual argument to subroutine sub1.

Since this ambiguity exists, the standard must forbid it somehow. If both s and s1 had the TARGET attribute, there would be no ambiguity and all compilers should give the same results even though the code is still invalid.

IanH · 02-22-2012

There's an "or" in that list of "unless" requirements. The actual argument is of default character, so the unless requirements are satisfied (otherwise C interoperability would be rather tricky for character stuff). The bit Steve's quoted then (maybe) explains what happens in that case.

Your argument about contiguity makes me think that ifort's response is wrong. If the compiler is allowed to make a copy of non-contiguous actual arguments (it is processor dependent whether s(:)(1:2) is contiguous as it doesn't refer to all the characters in s), it could have chosen to do so when sub1 was called. In that case sub2 would have seen a different sequence of characters. That implies that the visible behaviour of this program is processor dependent and I don't see that being called out in the list of processor dependent behaviours.

There's also some discussion about ultimate and effective arguments in F2008 that makes me wonder whether ifort's response is right. If ifort's response is wrong, then it is a shame in many ways, as the compiler would have to make a copy of the entire character array when only a single element is apparently referenced in the calling code, regardless of whether that single element or all elements are referenced by the called code. Ouch.

(I don't believe ifort or sun claim full F2003 conformance, so a simple answer could be "Sorry, the code is invalid under F95".)

JVanB · 02-22-2012

I don't agree. Think about code like:

[fortran]
subroutine sub(x)
   character x(4)*(2)
   write(*,'(a)') x
end subroutine sub

program main
   character*10 a
   a = '0123456789'
   call sub(a(2:9))
end program main
[/fortran]

Here the actual argument is of type default character, so it is covered by the part before the conjunction 'or'. Why then is the part after 'or' needed, the part about a substring of an element of an array that is not, among other things, an assumed-shape array? After all, such a substring should have been covered just as it was for the code above. OK, I have to say that sequence association can work for an element of an assumed-shape array in the sense that the sequence of elements doesn't go past that element. If array s1 had len=8 or greater, then s1(1) could be sequence associated with character s2(4)*(2), but not as in the original program. The standard seems to say that sequence association doesn't allow you to peek at holes in data the way transfer potentially can. If you keep that in mind, the interactions between sequence association and potentially non-contiguous objects in the standard make sense.

IanH · 02-22-2012

"or is an element or substring of an element of an array that is not an assumed-shape, pointer, or polymorphic array" applies to things that are not default character and not CHARACTER(C_CHAR) - namely all other types (integer, real, user defined, ...) and other character kinds.

Those specific character kinds get special mention so that you can easily pass Fortran strings to C code (or to a Fortran procedure that has been written with a C-compatible interface): the interoperable interface for a Fortran string is a CHARACTER(1) array. Without that special mention, every time you wanted to pass a scalar string to C you would need to copy the string to a character(1) array, which would be an unnecessary nuisance given that the layout in memory is likely the same.
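As an illustration of that C interoperability case, here is a minimal sketch that hands a scalar Fortran string to the C library function `puts` through a `character(kind=c_char)` array dummy. Only `puts` and the ISO_C_BINDING names are real; the module and procedure names (`c_string_demo`, `c_puts`, `say`) are hypothetical.

```fortran
module c_string_demo
   use iso_c_binding, only: c_char, c_int, c_null_char
   implicit none
   interface
      ! Binding to the C standard library function: int puts(const char *s);
      function c_puts (s) bind(C, name='puts') result (rc)
         import :: c_char, c_int
         character(kind=c_char), intent(in) :: s(*)   ! LEN=1 array, seen by C as char*
         integer(c_int) :: rc
      end function c_puts
   end interface
contains
   ! Hypothetical wrapper: the scalar expression msg // c_null_char is
   ! sequence associated with the LEN=1 array dummy s, so no manual copy
   ! into a character(1) array is needed.
   function say (msg) result (rc)
      character(len=*, kind=c_char), intent(in) :: msg
      integer(c_int) :: rc
      rc = c_puts (msg // c_null_char)
   end function say
end module c_string_demo
```

A main program can then call, for example, `say(c_char_'hello from Fortran')`; `puts` returns a nonnegative value on success.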

JVanB · 02-23-2012

The part about a substring above cannot apply to things that are not CHARACTER(KIND=KIND('A')) nor CHARACTER(KIND=C_CHAR) (correcting your error above: CHARACTER(C_CHAR) means CHARACTER(LEN=C_CHAR) which is an often seen mistake) because substrings are only possible for CHARACTER variables or constants and sequence association doesn't work for other CHARACTER kinds.

Thus I don't see what point you are trying to make in your post because not allowing sequence association to scan through holes in data structures doesn't affect your ability to use sequence association as modified in f2003 as part of C interoperability to pass a scalar CHARACTER string as an actual argument to a CHARACTER(LEN=1,KIND=C_CHAR) dummy array.
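The CHARACTER(C_CHAR) slip mentioned above is easy to check. In this sketch (the module name `char_decl_demo` is hypothetical), the three declarations differ only in which type parameter c_char lands on, because the first positional parameter of CHARACTER is LEN.

```fortran
module char_decl_demo
   use iso_c_binding, only: c_char
   implicit none
   ! The first positional type parameter of CHARACTER is LEN, not KIND.
   character(c_char)      :: a   ! LEN = c_char, default KIND: the common slip
   character(kind=c_char) :: b   ! KIND = c_char, LEN defaults to 1
   character(1, c_char)   :: c   ! LEN = 1, KIND = c_char (positional order LEN, KIND)
end module char_decl_demo
```

Because c_char is typically 1, and the default character kind is typically the same as c_char, `a` usually ends up indistinguishable from `b`, which is why the mistake tends to go unnoticed.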

Steven_L_Intel1 · 02-23-2012

The exception carved out for character array element references permits sequence association to occur in this example. You are passing only one element, a scalar. There is no copying going on. The address of that one element is passed, and the called routine uses sequence association to get at subsequent storage units. I see no ambiguity, but I sympathize with those who tear their hair out trying to sort out this complex issue.

JVanB · 02-23-2012

The standard allows copying:

If the dummy argument has the TARGET attribute, does not have the VALUE attribute, and either the effective argument is simply contiguous or the dummy argument is a scalar or an assumed-shape array that does not have the CONTIGUOUS attribute, and the effective argument has the TARGET attribute but is not a coindexed object or an array section with a vector subscript then

any pointers associated with the effective argument become associated with the corresponding dummy argument on invocation of the procedure, and

when execution of the procedure completes, any pointers that do not become undefined (16.5.2.5) and are associated with the dummy argument remain associated with the effective argument.

So when the main program invokes subroutine sub1 in the original example, the standard permits the Fortran processor to make a contiguous copy of the substring array section of s (the actual argument) and associate it with dummy argument s1 in subroutine sub1. This is in fact what gfortran and pgf90 seem to be doing. Had the original poster given both s in the main program and s1 in subroutine sub1 the TARGET attribute, the above-quoted passage of the standard would not have permitted this, and gfortran and pgf90 should have given the same results as ifort.

To see where the copying happens, give only s1 the TARGET attribute and compute transfer(c_loc(s1(2)(1:1)),0_c_intptr_t)-transfer(c_loc(s1(1)(1:1)),0_c_intptr_t) in subroutine sub1. You should get 3 for ifort and sun but 2 for gfortran and pgf90. Now give array s in the main program the TARGET attribute as well, and all compilers should give a pointer difference of 3 and the same outputs, because copying is now forbidden by the passage I quoted above.

The exception carved out for character array element references has itself an exception for assumed-shape arrays; see reply #3 above. The exception is there because unless both the effective argument with which the assumed-shape array is associated and the assumed-shape array itself have the TARGET attribute, the compiler is free to make a contiguous copy of the effective argument and associate that with the assumed-shape dummy, so the results of sequence association with the assumed-shape dummy are processor dependent, as already seen. Actually, the compiler could lay out the assumed-shape dummy in memory in more creative ways than as-is or contiguous; for example, each element could lie at the end of a page in memory where the next page is not readable, so that any overrun as in the original program would cause a crash.

Steven_L_Intel1 · 02-23-2012

RO, I don't understand your argument. We are discussing the actual argument s1(1). That is a scalar. I agree that when the whole array s1 is passed that a copy will be needed.

JVanB · 02-23-2012

Yes, this is tricky to communicate when one is talking about assumed-shape arrays. The copy I am talking about is the one made by gfortran and pgf90 when subroutine sub1 is invoked, not any copy that might be made when subroutine sub2 is invoked. If we are talking about s1(1) as a scalar, then the rules for sequence association with scalars apply, which say that the dummy argument associated with that scalar can't overrun the storage of that scalar; that would invalidate the code in the original post because s2 takes up 8 bytes of storage. If we are talking about s1(1) as an element of an array, then sequence association with array s2 isn't possible because the array s1 is an assumed-shape array.

I'm not saying that ifort is doing anything wrong because in general the compiler doesn't know that sequence association is going on and even if it did know then in the case of sequence association with an element of a character array it wouldn't know whether the individual array element was overrun so that the restriction about assumed-shape arrays comes into play. There is no requirement for the compiler to detect this kind of error.

Steven_L_Intel1 · 02-23-2012

I very much doubt any of the compilers make a copy when sub1 is invoked. They can see an explicit interface for sub1 with the dummy being assumed-shape, so all would pass a "descriptor" (or "dope vector") with bounds and stride information for the non-contiguous slice.

For the sub2 call, you are correct that the compilers cannot see the interface for sub2 and can't know if sub2 will access beyond the bounds of the s1 array. But my interpretation of the standard is that the sequence association, which is legal on its own, is ok here because the storage accessed is not beyond the last element of s1. It's a fine point, I'll grant you.

However, you bring up an important point - there are many ways that a program can violate the standard that a compiler is not required to detect.

Dave_Allured · 02-23-2012

All,

To reduce confusion, I need to point out the importance of distinguishing between the two subroutines and their two different interfaces. Some of the early discussion has blurred this distinction. I think each interface must be examined separately to reach a full understanding.

Sub1 has an *assumed-shape* dummy argument, s1. It uses normal "array element association", per 12.5.2.4 paragraphs 14 and 15 (Fortran 2008 standard). Note that this dummy argument is disqualified from sequence association by this key sentence in 12.5.2.11 paragraph 4:

"An actual argument that represents an element sequence and corresponds to a dummy argument that is an array is sequence associated with the dummy argument if the dummy argument is an explicit-shape or assumed-size array."

Sub2 has an *assumed-size* dummy argument, s2. Therefore, *both* calls to sub2 use sequence association, also per 12.5.2.11 paragraph 4.
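The two kinds of association can be contrasted with a non-character example. In this sketch (all names, `first_n`, `assoc_demo`, `shape_of` and `demo`, are hypothetical), the assumed-shape dummy requires a matching rank, while the assumed-size dummy reached through an implicit interface accepts either a rank-2 array or a single array element and sees the element sequence starting there. The array is explicit-shape, so none of the assumed-shape restrictions debated in this thread apply to the sequence-associated calls.

```fortran
! Hypothetical helper with an implicit interface and an assumed-size dummy:
! whatever the caller passes is sequence associated with v.
subroutine first_n (v, n, out)
   implicit none
   integer, intent(in)  :: n
   integer, intent(in)  :: v(*)
   integer, intent(out) :: out(n)
   out = v(1:n)
end subroutine first_n

module assoc_demo
   implicit none
contains
   ! Assumed-shape dummy: shape and bounds travel in a descriptor, so the
   ! actual argument must have the same rank (no sequence association).
   function shape_of (a) result (sh)
      integer, intent(in) :: a(:,:)
      integer :: sh(2)
      sh = shape (a)
   end function shape_of

   subroutine demo (sh, whole, tail)
      integer, intent(out) :: sh(2), whole(6), tail(4)
      integer :: m(2,3)
      m = reshape ((/ 1, 2, 3, 4, 5, 6 /), (/ 2, 3 /))
      sh = shape_of (m)               ! rank-2 actual, rank-2 dummy
      call first_n (m, 6, whole)      ! whole array: rank 2 -> rank 1 by sequence association
      call first_n (m(1,2), 4, tail)  ! element designator: sequence starts at m(1,2)
   end subroutine demo
end module assoc_demo
```

Calling `demo` should give `sh = [2, 3]`, `whole = [1, 2, 3, 4, 5, 6]` and `tail = [3, 4, 5, 6]`, because array element order is column-major.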

Now, what happens when the dummy argument s1 is used as the actual argument to sub2? Consider 12.5.4 lurking in the background:

"When a subroutine is invoked, all actual argument expressions are evaluated, then the arguments are associated, and then the subroutine is executed."

I think this is an explicit specification for behavior. I think this statement *requires* the original association (array element, not sequence) of dummy argument s1 for *all purposes* within the execution of sub1. This includes usage in actual arguments for call sub2, no matter how they are subscripted.

Therefore, the sequence association back to the array "s" in the main program, as Steve initially described, is not valid. I am avoiding discussion of copy-in and other under-the-hood behavior, so far, because that should be hidden.

Now if there was something in the standard that prohibited the use of an assumed-shape array as an actual argument to another procedure lacking an explicit interface, I would have to agree to a different conclusion. But I could find no such statement.

I now am fairly convinced that the original example is valid Fortran with unambiguous results. Further comments welcome.

--Dave

JVanB · 02-23-2012

Now we see why it's so difficult to communicate about this problem. Consider the following code:
[fortran]
! Version seq_assoc2.f90, 2012-feb-23
module mod1
contains
   subroutine sub1 (s1)
      character(len=*), intent(in) :: s1(:)
      integer i, j
      do i = 1, size(s1)
         do j = 1, len(s1)
            write(*,'(4(a,i0))',advance='no') &
               'Address of s1(',i,')(',j,':',j,') = '
            call print_address(s1(i)(j:j))
         end do
      end do
   end subroutine sub1

   subroutine print_address(x) bind(C)
      use iso_c_binding, only: c_intptr_t, c_loc, c_char
      character(1,C_CHAR),TARGET :: x
      character(40) fmt
      integer width
      width = bit_size(0_c_intptr_t)/4
      write(fmt,'(3(a,i0))') '(Z',width,'.',width,')'
      write(*,fmt) transfer(c_loc(x),0_c_intptr_t)
   end subroutine print_address
end module mod1

program str_test
   use mod1
   character(len=3) s(4)
   s = (/ '123', '456', '789', 'ABC' /)
   call sub1 (s(:)(1:2))
end program str_test
[/fortran]
Output with gfortran:

[bash]
Address of s1(1)(1:1) = 000000000022FE00
Address of s1(1)(2:2) = 000000000022FE01
Address of s1(2)(1:1) = 000000000022FE02
Address of s1(2)(2:2) = 000000000022FE03
Address of s1(3)(1:1) = 000000000022FE04
Address of s1(3)(2:2) = 000000000022FE05
Address of s1(4)(1:1) = 000000000022FE06
Address of s1(4)(2:2) = 000000000022FE07
[/bash]
As can be seen, gfortran does in fact make the copy when subroutine sub1 is invoked, as I expect is also the case with pgf90. Surely they all pass an array descriptor, but gfortran and pgf90 pass a descriptor of a contiguous copy of the subscripted array section. This behavior is required if array s1 has the CONTIGUOUS attribute, but I don't know whether any compilers implement that f2008 feature yet.

I anticipate that when you post the results of the above program with ifort or sun, the addresses will not be contiguous because, as you say, the compiler will create a descriptor for the subscripted array section as it sits in memory, without making a copy. This behavior is required if both array s in program str_test and array s1 in subroutine sub1 have the TARGET attribute.

The array s1 may or may not have holes and if the programmer cared, he would specify which behavior he wanted as noted above. Since he didn't, the compiler is free to choose and a program that exposes this kind of choice is normally considered nonconforming. If it really is conforming, it's a bug in the standard.

JVanB · 02-23-2012

Copy-in has to occur at some point when sub2 is invoked via call sub2(s1). gfortran does the copy when sub1 is invoked, creating a contiguous s1 array. ifort does the copy when sub2 is invoked, making a contiguous copy of s1 which then gets associated with s2. Explicit shape or assumed size dummy arguments can't deal with non-contiguous actual arguments so the compiler always makes a contiguous copy when a discontiguous array is passed to them and the standard doesn't permit an array element of an assumed-shape or pointer array to be passed to them except in one special case as noted earlier in the thread which isn't applicable here.

There is a difference between passing an assumed shape array as an actual argument and passing an element of an assumed shape array as an actual argument while expecting the dummy argument to be sequence associated with the whole assumed shape array, as I pointed out in reply #3. That's why your code is nonconforming. When you pass the whole array as an actual argument, the compiler knows to make a contiguous copy and everything works as you originally expected; but when you pass an array element, the compiler only passes its address, and for an assumed shape or pointer array that is not sufficient information to access all of the data in the array correctly.

Dave_Allured · 02-23-2012

Thank you, Repeat Offender. Now we are getting somewhere!

I agree with your earlier point in reply #14 that the descriptors and physical storage passed for subroutine sub1 can vary by compiler. My recent experiences with these four compilers show circumstantial evidence for this, though I have not peeked directly like you did.

So far, these storage details for the sub1 interface are all hidden under the hood, as they should be. Inside sub1, I care only that I can access the elements of dummy argument s1 via normal Fortran references, and that they contain the values that are supposed to be there by the association rules for an assumed-shape array. At this point I do not care whether the physical storage is contiguous or has holes. That is only the compiler's business -- so far.

RO: "The array s1 may or may not have holes and if the programmer cared, he would specify which behavior he wanted as noted above. Since he didn't, the compiler is free to choose and a program that exposes this kind of choice is normally considered nonconforming."

You have leapt from "the compiler is free to choose [the method of physical storage of s1]" to "a program that exposes this" without considering what happens in between. You have not yet made your case.

RO: "Copy-in has to occur at some point when sub2 is invoked via call sub2(s1). ifort does the copy when sub2 is invoked, making a contiguous copy of s1 which then gets associated with s2. Explicit shape or assumed size dummy arguments can't deal with non-contiguous actual arguments so the compiler always makes a contiguous copy when a discontiguous array is passed to them "

Agree completely, right up to this point!

RO: "There is a difference between passing an assumed shape array as an actual argument and passing an element of an assumed shape array as an actual argument and expecting the dummy argument to be sequence associated with the whole assumed shape array as I pointed out in reply #3."

12.5.2.11 paragraph 1 addresses this explicitly. The third sentence is definitive:

"An actual argument represents an element sequence if it is an array expression, an array element designator, a default character scalar, or a scalar of type character with the C character kind (15.2.2). If the actual argument is an array expression, the element sequence consists of the elements in array element order. If the actual argument is an array element designator, the element sequence consists of that array element and each element that follows it in array element order."

s1(1) is an array element designator. In the second call above to sub2, the actual argument s1(1) must be interpreted according to the third sentence (among others). This means exactly what it says; the element sequence includes the following elements through to the last element of s1 (as per paragraph 2).

An element sequence of s1 from the first element to the last element is the same thing as the sequence of all of the elements of s1. Therefore, in this particular case, there is NO difference between passing an assumed shape array as an actual argument, and passing the first element of an assumed shape array as an actual argument.

Now, for anyone who has a hard time believing this, I remind you that this method of passing an assumed-size array, by specifying the initial element, is legacy Fortran support, going back to Fortran 77 or earlier. It is not intuitive today, and I would not normally code this. However, this is a maintenance issue in some legacy code that brought this to my attention.

The reference to s1(1) in the second call to sub2 is therefore equivalent to the whole array s1 as the actual argument. This triggers the requirement for the compiler to provide a contiguous array to be passed in both of the above calls to sub2.

--Dave

JVanB · 02-23-2012

Sorry, but the paragraph I quoted in reply #3 trumps 12.5.2.11 paragraph 1 because it doesn't permit an array element designator of an assumed-shape array in this context. If you think about it, otherwise the whole assumed-shape array would have to be copied in and then copied out each time an array element of a non-contiguous array was passed. At the copy-out stage, some of these copies could overwrite changes made through the other copies, even though none of the dummy arguments were arrays. There would be no way to guarantee sensible results even for quite tame standard-conforming code, so the standard somehow has to disallow your program.

Dave_Allured · 02-23-2012

I will consider this. Would you please identify which of the three quoted sentences in reply #3 you think applies, and in precisely which context? We are discussing two different subroutine calls here.

IanH · 02-23-2012

For a different twist, this might be of interest: http://j3-fortran.org/pipermail/j3/2011-February/004193.html

That message says that passing a scalar to an array, as is being done in the second call to sub2, is permitted, because the restrictions in 12.5.2.4 (as quoted in #3) are independent. The actual argument, being of default character, meets the first clause of that "or".

However, it then says that the size of the assumed size array inside sub2 (s2) is only one. This would make the reference "s2(1:4)" inside sub2 invalid.

This "behaviour" is, to me, what would be ideal: you get the C interop niceness, and the compiler isn't required to make potentially expensive copies of a non-contiguous array when an element is passed to a procedure with no explicit interface, or to a procedure with an explicit interface that has an assumed-size dummy argument (if the programmer wants the association to apply to the "whole" array, regardless of what that means, they can set it up themselves). But I can't see where in the standard (actual text, as opposed to a presumption of the author's intent) this second "size is one" aspect of the behaviour is specified; if anything, I can see text that points the other way.

(This isn't really a legacy issue: you could not legally have been passing elements of an assumed-shape array to an array dummy argument prior to F2003. The standard could have explicitly specified the behaviour described in the linked mail message without breaking old code.)
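Whatever the final reading of the standard, one way to make the legacy call portable is to set up the association explicitly, as suggested above: copy s1 into a contiguous, explicit-shape local array and pass an element of that copy. This is a minimal sketch, assuming the legacy routine only reads its argument and that s1 has at least four elements; `grab4` and `workaround` are hypothetical names, and `grab4` stands in for sub2 from the original post.

```fortran
! Hypothetical stand-in for the legacy routine: implicit interface and an
! assumed-size, assumed-length dummy, like sub2 in the original example.
subroutine grab4 (s2, out)
   implicit none
   character(len=*), intent(in)  :: s2(*)
   character(len=*), intent(out) :: out(4)
   out = s2(1:4)
end subroutine grab4

module workaround
   implicit none
contains
   ! Assumes size(s1) >= 4, matching the four elements grab4 reads.
   subroutine sub1 (s1, via_array, via_element)
      character(len=*), intent(in) :: s1(:)
      character(len=len(s1)), intent(out) :: via_array(4), via_element(4)
      ! Automatic explicit-shape copy: contiguous and not assumed-shape, so
      ! an element of it may start an element sequence without ambiguity.
      character(len=len(s1)) :: tmp(size(s1))
      tmp = s1
      call grab4 (tmp, via_array)       ! whole contiguous copy
      call grab4 (tmp(1), via_element)  ! first element of the same copy
   end subroutine sub1
end module workaround
```

With the original main program's actual argument `s(:)(1:2)`, both calls should then see "12", "45", "78", "AB" (the result gfortran and pgf90 gave for both calls), because the copy is made explicitly rather than left to the compiler.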

Dave_Allured · 02-23-2012

IanH,

I need to look at your comment in the context of my original example. Requoting the second sentence of F2008 12.5.2.4 paragraph 13:

"... If the actual argument is a noncoindexed scalar, the corresponding dummy argument shall be scalar unless the actual argument is default character, of type character with the C character kind (15.2.2), or is an element or substring of an element of an array that is not an assumed-shape, pointer, or polymorphic array. ..."

I put on my lawyer glasses, and parse this as follows. There is multiple negation here, ugh, so please follow carefully.

First I see "unless the actual argument is default character". Okay, all of the subroutine calls in my example are default character, so this is true for any of them. The three parts of the "unless" clause are connected by "OR"; therefore the "unless" condition is collectively true.

The meaning of "unless" is to disqualify what precedes it. What precedes it is the restriction "shall be scalar". Therefore this restriction is disqualified. There is no other action clause in this sentence. Therefore, nothing in this sentence is relevant to my original example!

Now, in the thread that you linked to, their example is a pointer associated with a character entity. I claim that this is not analogous to my example because (a) I am not using any pointers; and (b) the standard does not guarantee contiguous storage for this kind of pointer usage. (b) is discussed in the thread.