- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have a string type using the obvious approach of encapsulating an "character(len=:), allocatable" variable. I came upon a bug in my code, where I did something like (look at the code below):
str = str%cs(1:5)
which lead to a result with some control characters in it. Depending on the code, valgrind either show invalid memory reads, or overlapping addresses in call to memcpy. The obvious solution is to make a copy of the right hand side character sequence, deallocate the derived-type component of the left hand side and use the copy to initialise the string. Is there a way to tell the compiler that self%cs and cs might overlap in memory, so that the compiler can generate appropriate code on its own? In particular, the copying is not necessary in general, only in a few cases.
BTW, there is also a compiler bug (already reported), with the overloading of the len intrinsic. Adding an "intrinsic :: len" solves the issue, at least for ifort19.
module mod implicit none private type, public :: string character(len=:), allocatable :: cs contains procedure, public :: len => string_len procedure, public :: assign, assign_notworking ! generic, public :: assignment(=) => assign generic, public :: assignment(=) => assign_notworking end type string interface len module procedure string_len end interface len contains elemental function string_len(self) result(l) integer :: l class(string), intent(in) :: self l = len(self%cs) end function string_len elemental subroutine assign(self, cs) class(string), intent(inout) :: self character(len=*), intent(in) :: cs ! this declaration is required due to a bug in ifort intrinsic :: len character(len=len(cs)) :: cs_copy ! first make a copy as cs and self%cs might overlap cs_copy = cs deallocate(self%cs) allocate(self%cs, source = cs_copy) end subroutine assign elemental subroutine assign_notworking(self, cs) class(string), intent(inout) :: self character(len=*), intent(in) :: cs ! this is not working if cs and self%cs overlap in memory self%cs = cs end subroutine assign_notworking end module mod program assign_overlap use mod implicit none type(string) :: str str%cs = '0123456789' print *, '"'//str%cs//'"' str = str%cs(1:5) print *, '"'//str%cs//'"' end program assign_overlap
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The last feather broke the camel's back, as an old saying goes(?). I have been informed awhile back that the bug I reported has been removed. I just tested with the beta prereview release and indeed, valgrind does not show any invalid reads and everything looks fine. Thanks for fixing.
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Martin, I think this is a violation of the aliasing rules, however, something which the compiler does not have to detect.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Yeah, I guessed as much. Even though it is very easy to miss in this circumstance (blessed be valgrind!)
But is there a way to tell the compiler? Performance-wise it would be nice to avoid the copying. But this is nothing serious, because super-fast string operations are not a main concern in the typical fortran code.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
It depends a little on your software design I would say. Assignment to me means that the string type you are assigning to is empty or cleared (your operation of deallocating and new allocation). However, if you want to be able to insert (sub)strings into an existing string, things are more complicated. If you really want to assign and not insert, then I would clear the memory. Note that the assign operators in the iso_varying_string module by Rich Townsend have intent(out), not intent(inout). So, no aliasing rules could be violated.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
There are two problems:
First gfortran compaints that intent(out) is not allowed in this context. It gives the error message
Error: INTENT(OUT) argument ‘self’ of pure procedure ‘assign_notworking’ at (1) may not be polymorphic.
Originally I had a intent(out) and got this error. Interestingly, ifort compiles without complaints, which might actually be a bug in ifort as the standard seems to indeed prohibit this combination.
But it also does not solve the aliasing issue. Ifort compiles, but still valgrind compaints. Internally I expect that the assign routine deallocates the self%cs (due to intent(out)), then allocates a new pointer and only then access argument cs, which already has become invalid. The intent(out) does not help the compiler to foresee the aliasing issue.
Regarding design: From a string type, I expect that update operations like str = str // something or str = something // str just work (much like other type like reals or integers). So aliasing issues should be handled internally, and not be visible to any user of the class.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I took a look at iso_varying_string, and changed my small test code accordingly. In order to work, I had to change the class(string) to type(string) for argument self of routine assign. But now assign cannot be a type-bound procedure anymore. Not a big deal. I added a constructor fromChars. So now the code looks a lot like the iso_varying_string routines. And voila, the compiler has no aliasing issue! I am a bit dumbfounded. It looks like that the constructor fromChars, which copies cs, is called before str is deallocated. But it the compiler required to do that? Or might it also be allowed to clear the intent(out) argument first and then access cs, which would break the code? Even this solution looks fragile? At least I can reason the intent(inout) plus explicit copying has no issues, whereas the iso_varying_string approach might break if the compiler optimises differently.
Very confusing!
Here is the updated code
module mod implicit none private public assignment(=) type, public :: string character(len=:), allocatable :: cs end type string interface assignment(=) module procedure assign end interface assignment(=) contains pure function fromChars(cs) result(str) type(string) :: str character(len=*), intent(in) :: cs str%cs = cs end function fromChars subroutine assign(self, cs) type(string), intent(out) :: self character(len=*), intent(in) :: cs self = fromChars(cs) end subroutine assign end module mod program assign_string use mod implicit none type(string) :: str str%cs = '0123456789' print *, '"'//str%cs//'"' str = str%cs(1:5) print *, '"'//str%cs//'"' end program assign_string
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
There should not be an aliasing problem - Fortran's semantics are that the right side of an assignment are completely evaluated before the left side is modified. This also goes for defined assignment, where the standard specifies that the right side is treated as if it were enclosed in parentheses.
However, Intel Fortran has for a very long time not honored this rule (for defined assignment), and my guess is that is what is happening here. Try enclosing the str%cs(1:5) in parentheses to see what happens. I know I reported this issue to the developers many times over the years, and am not aware that it was ever addressed.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Steve Lionel (Ret.) wrote:Thanks Steve for the clarification. In view of this example, it sounds like a sensible requirement by the standard. However, I assumed that it was a mistake on my part because gfortran shows the same problem. Parenthesis around str%cs(1:5) do not help (with -O0). However, str = str%cs(1:5) // '' solves the issue with both compilers. On the other hand if I append the empty string inside assign_notworking, then ifort (no matter which optimisation level) fails, but gfortran works. If this has been reported many times, I guess there is no need to do it once more.
There should not be an aliasing problem - Fortran's semantics are that the right side of an assignment are completely evaluated before the left side is modified. This also goes for defined assignment, where the standard specifies that the right side is treated as if it were enclosed in parentheses.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Please DO report it once more. The more reports, the more likely it is to get fixed. I am rather astonished that gfortran has the same bug.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have reported the issue and got the following answer (I reported a reduced testcode, where "assign_notworking" has the name "assign"):
For your case there is an alias between the two arguments of subroutine "assign". By default Intel compiler will assume there is no memory shared among dummy arguments of a subroutine call. To change this behavior please add compiler option "-assume dummy_alias" to the command line. More information about this compiler option please refer to <https://software.intel.com/en-us/fortran-compiler-18.0-developer-guide-and-reference-assume>.
But this is overshooting the mark by a large margin. So I guess ifort will continue to violate the fortran standard on purpose. What is the motivation for doing so? Or is this behaviour covered by the standard (though I very much trust that Steve is right here)?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
There are coding errors that are the programmer's responsibility, and there are errors that the compiler is required (by the Standard) to detect and handle. The number of errors in the first group is considerably larger. "Professional" compilers such as Ifort tend to favor high performance, and to skip error checking that carries more than negligible cost.
The compiler vendor has to make a compromise between no-checks => high performance and checking for as many errors as can be done, even those errors that are the programmer's responsibility; that level of checking usually causes the compilation and/or the compiled code to be markedly slower and consume more memory. I have seen codes slowed down by a factor of 100 by having full checking turned on. Often, the user has to specify (through compiler switches, configuration files, etc.) that such extra checking is desired.
On the other hand, if an error message (or even a warning) is issued when none was warranted, or is worded incorrectly, it may take some persistence to have the issue corrected.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Martin, the response you received is incorrect, but understandable if someone doesn't understand enough of the language.
F2008, 12.4.3.4.3 (Defined assignments) says: "A defined assignment is treated as a reference to the subroutine, with the left-hand side as the first argument and the right-hand side enclosed in parentheses as the second argument."
Note the "enclosed in parentheses". This makes the second argument an expression which cannot alias the left side. In essence, it follows the same rules as for intrinsic assignment. As I wrote above, ifort's treatment of defined assignment has had this issue for many years, and there are existing reports filed on it. Please update your report to include the above information and let me know the response. If the person responding to your issue fails to recognize the error, let me know (include the case number) as I have contacts....
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
mecej4 wrote:
There are coding errors that are the programmer's responsibility, and there are errors that the compiler is required (by the Standard) to detect and handle. The number of errors in the first group is considerably larger. "Professional" compilers such as Ifort tend to favor high performance, and to skip error checking that carries more than negligible cost.
As you can see from my first post, I was originally thinking along these lines and just asking for a way to tell the compiler for this one case to assume that arguments might alias (so it can check it and generate an extra code path...).
In general having compiler options to add checks for debug runs is very valuable. Unfortunately, ifort does not optimise anymore with runtime checks enabled. With the old ifort compilers, it was pretty nice to optimise as much as possible with runtime options, so computation times were acceptable and large simulations could still be checked occasionally. And most interesting bugs (out of bounds array checks, initialised local variables, pointers) were usually not optimised out. Any reason why these runtime checks disable optimisation nowadays?
Steve Lionel (Ret.) wrote:
F2008, 12.4.3.4.3 (Defined assignments) says: "A defined assignment is treated as a reference to the subroutine, with the left-hand side as the first argument and the right-hand side enclosed in parentheses as the second argument."
Thanks, especially for the reference. Searching for it, there are few meaningful hits, but sufficient to do some reading myself.
(It is request 03558650, and I have answered citing the standard 12.4.3.4.3. But I will be offline for the next two weeks.)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
While the compiler team investigates this issue and decides on its response (which can take a while), Fortranners seeking to "roll" their own "string classes" might consider sticking to the basics of object-oriented design which is to encapsulate "class" data (i.e., not make the data public) and also what other languages have long done that is to provide getter method(s) for the data e..g, substr in C++ string class. Such an option is shown below. This should simplify the "class" design for coders, consumers of the "class", as well as the compiler. An issue with possible leak in Intel Fortran compiler with reallocation of LHS in the defined assignment persists but otherwise the issue with aliasing can be avoided this way.
module m implicit none private type, public :: string_t private character(len=:), allocatable :: cs contains private procedure, pass(self) :: get_substr_pos procedure, pass(self) :: get_substr_pos_len procedure, pass(self) :: assign_t procedure, pass(self), public :: length => string_len generic, public :: assignment(=) => assign_t generic, public :: substr => get_substr_pos, get_substr_pos_len end type string_t contains elemental function string_len(self) result(l) class(string_t), intent(in) :: self ! Function result integer :: l l = len(self%cs) end function string_len elemental subroutine assign_t(self, cs) class(string_t), intent(inout) :: self character(len=*), intent(in) :: cs self%cs = cs end subroutine assign_t elemental function get_substr_pos( self, pos ) result( substr ) ! Argument list class(string_t), intent(in) :: self integer, intent(in) :: pos ! Function result character(len=len(self%cs)-pos+1) :: substr ! checks elided substr = self%cs(pos:) return end function get_substr_pos elemental function get_substr_pos_len( self, pos, slen ) result( substr ) ! Argument list class(string_t), intent(in) :: self integer, intent(in) :: pos integer, intent(in) :: slen ! Function result character(len=slen) :: substr ! checks elided substr = self%cs(pos:(pos+slen-1)) return end function get_substr_pos_len end module m program assign_overlap use m, only : string_t implicit none type(string_t) :: str str = '0123456789' print *, '"'//str%substr( pos=1 )//'"' str = str%substr( pos=1, slen=5 ) print *, '"'//str%substr( pos=1 )//'"' stop end program assign_overlap
I believe the code above in get_substr_pos* functions uses features from Fortran 2003 with the use of dummy arguments in specification expressions - this may not work with other compilers that do not yet support such Fortran 2003 features.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The last feather broke the camel's back, as an old saying goes(?). I have been informed awhile back that the bug I reported has been removed. I just tested with the beta prereview release and indeed, valgrind does not show any invalid reads and everything looks fine. Thanks for fixing.
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page