Upgrading a string-numeric conversion in a very old program

dboggs · ‎07-27-2013

In a very old (F77 or maybe even earlier) program I am attempting to upgrade, a large REAL*4 buffer array was used to contain a variety of dynamically changing quantities, including lots of CHARACTER*4 strings. This was done (believe it or not) using a simple assignment statement of the form BUFFER(I) = STRING, i.e. a real variable was used to store a string. No compiler complaints, and it worked! (There may be hundreds of these strings, and the idea was to "dynamically" store them along with thousands of other numeric variables in a single buffer array dimensioned to the largest possible value.)

The only way I can think of to perform this today is with the combination

[fortran]

REAL(4) BUFFER(1000)

CHARACTER(4) STRING

REAL(4) RSTRING

EQUIVALENCE (STRING, RSTRING)

! Repeat following as needed to load all values of STRING into buffer

BUFFER(I) = RSTRING

[/fortran]

But I am leery of this because EQUIVALENCE is frowned upon nowadays, and in fact it is threatened with extinction. Can anybody recommend how to salvage this old code without massive rearranging of the storage scheme?

mecej4 · ‎07-27-2013

Fortran 66 did not have character variables, and what you described was a fairly routine way of handling character strings in that language. You can replace all real variables that were used only to hold character strings (such as BUFFER) by CHARACTER(len=4) variables, following which there will be no need for any EQUIVALENCE statements.

If the old program contains character variables in common blocks, and the same block appears with different types of variables in different subprograms, however, you are in for "interesting times".

andrew_4619 · ‎07-27-2013

I guess the real problem is not storing the strings; it is how the program accesses the large real array to use them. This is what you need to explain/understand, I think. Is there, for example, another integer array that is used as an index?

mecej4 · ‎07-27-2013

I don't think that I want to forget that this is the year 2013 and run the risk of being burned at the stake for writing strings into real arrays. Besides, since

No compiler complaints, and it worked!

why do you need to change anything if you are just interested in running the code?

It is probably enough to change all those REAL(4) variables in your old program that are used to hold strings to CHARACTER(len=4) variables and remove all the equivalence statements. Here are a couple of lines of code similar in intent to what you showed:

[fortran]

CHARACTER*4 BUFFER(1000)
! Repeat following as needed to load all values of STRING into buffer
DO i=1,1000
write(BUFFER(i),'(I4)')i+3
END DO[/fortran]

dboggs · ‎07-27-2013

mecej4: You're missing the point that the strings need to be stored in a real buffer array.

app4619: Yes, there are integer pointer variables (needn't be an array) that give the address within BUFFER where a certain group of other variables start. The pointer values are set at run time. It's not an integer array in my case, but I realize that as little as I explained the problem you might think it would be. I thought that part irrelevant.

An example may help. The array BUFFER stores first all of the character strings (4 bytes, * of them, maybe in a character array), followed by four different real arrays NORTH(*), SOUTH(*), EAST(*), WEST(*). The value of all the different * is determined at run time. For a particular run they happen to all be 200. As long as BUFFER is dimensioned to at least 1000, it can hold all of this data. (In another run the sizes will be different, say 500 characters, 500 NORTH variables, and 0 others.) The pointer variables will be calculated as ISTRPTR = 1, INORTHPTR = 201, ISOUTHPTR = 401, IEASTPTR = 601, and IWESTPTR = 801. Typically, calculations with each type of variable will be performed in a subroutine, for example

[fortran]

CALL CHARCALCS (BUFFER(1))

CALL NORTHCALCS (BUFFER(INORTHPTR))

...

SUBROUTINE CHARCALCS (CHARDATA)

REAL*4 CHARDATA(*)

CHARDATA(1) = 'abc1'

CHARDATA(2) = 'abc2'

...

END

SUBROUTINE NORTHCALCS (NORTHDATA)

REAL*4 NORTHDATA(*)

NORTHDATA(1) = 1.234

...

END

[/fortran]

In this way, a single large storage array could "dynamically allocate" storage for all kinds of data, of mixed type: character, real, integer, and even complex. (Actually this was a fairly standard technique in the F77 days before "real" dynamic allocation came along.) It was done originally, without the need of EQUIVALENCE, because the statement CHARDATA(1) = 'abc1' worked without error. Today, I only know how to accomplish it using EQUIVALENCE.

I hope this helps.

mecej4 · ‎07-27-2013

dboggs wrote:
mecej4: You're missing the point that the strings need to be stored in a real buffer array

In Fortran 66, they did, because that Fortran had no CHARACTER type variables. As of today, that reason is invalid and, for the kind of string shuffling that you describe, you can simply declare the variable as CHARACTER*4 and there is no need for the EQUIVALENCE statements. In fact, using EQUIVALENCE statements makes your code invalid under a strict interpretation of the standard:

5.7.1.3 Equivalence of default character objects
1 A default character data object shall not be equivalenced to an object that is not default character and not of a character sequence type.

The following two versions of a subroutine result in identical object code when compiled with IFort.

Version-1:

[fortran]

SUBROUTINE CHARCALCS (CHARDATA)
REAL*4 CHARDATA(*)
CHARDATA(1) = 'abc1'
CHARDATA(2) = 'abc2'
RETURN
END[/fortran]

Version-2:

[fortran]

SUBROUTINE CHARCALCS (CHARDATA)
CHARACTER*4 CHARDATA(*)
CHARDATA(1) = 'abc1'
CHARDATA(2) = 'abc2'
RETURN
END[/fortran]

Steven_L_Intel1 · ‎07-27-2013

I strongly recommend against using REAL variables to store non-REAL data, especially if you're doing REAL assignments. The reason is that some bit patterns that look like "Signaling NaNs", or in some cases denormalized values, will get changed on assignment by the processor. It is much safer to use INTEGER for this if you don't want to rewrite the code to avoid the mixed types entirely.
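A minimal sketch of what an INTEGER-based pool might look like, assuming the pool is redeclared as a 32-bit integer array and that values are moved in and out with the TRANSFER intrinsic, so no REAL assignment ever touches the stored bit patterns. The names IBUF, LABEL and NORTH are hypothetical, the pointer values are taken from the example earlier in the thread, and the sketch assumes a default character occupies one byte.

```fortran
program integer_pool_sketch
  use, intrinsic :: iso_fortran_env, only: int32, real32
  implicit none
  ! Hypothetical pool: INTEGER storage instead of REAL.
  integer(int32) :: ibuf(1000)
  ! Start positions as in the thread's example layout.
  integer, parameter :: istrptr = 1, inorthptr = 201
  character(4) :: label
  real(real32) :: north

  ! Store: TRANSFER copies the bit pattern, with no numeric conversion.
  ibuf(istrptr)   = transfer('abc1', 0_int32)
  ibuf(inorthptr) = transfer(1.234_real32, 0_int32)

  ! Retrieve: the mold (second argument) only supplies the result type.
  label = transfer(ibuf(istrptr), label)
  north = transfer(ibuf(inorthptr), north)

  ! Both round trips should reproduce the original values exactly.
  if (label /= 'abc1') error stop 'label not preserved'
  if (north /= 1.234_real32) error stop 'real not preserved'
  print '(A,1X,F6.3)', label, north
end program integer_pool_sketch
```

In existing code, the numeric subroutines would presumably still receive sections of the pool as REAL dummy arguments, so this only illustrates the storage side of the suggestion.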

John_Campbell · ‎07-27-2013

You should look at the TRANSFER function, as this should be able to do what you want; copy character*4 to and from a real*4 variable.
Defining a derived type structure would be a more elegant solution, but that is a lot of work.
Hopefully the code is not old CDC fortran, which did not have 8-bit characters.

John
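For comparison, here is a rough sketch of the derived-type alternative, assuming the layout from the earlier example (a set of 4-character labels followed by NORTH, SOUTH, EAST and WEST values). The type name SITE_DATA and the routine ALLOCATE_SITE are hypothetical and not taken from the original program.

```fortran
module site_data_sketch
  implicit none
  ! Hypothetical container: each kind of data gets its own typed component,
  ! sized at run time, instead of sharing one REAL pool.
  type :: site_data
     character(4), allocatable :: label(:)
     real, allocatable :: north(:), south(:), east(:), west(:)
  end type site_data
contains
  subroutine allocate_site(d, nlabel, nnorth, nsouth, neast, nwest)
    type(site_data), intent(out) :: d
    integer, intent(in) :: nlabel, nnorth, nsouth, neast, nwest
    allocate(d%label(nlabel), d%north(nnorth), d%south(nsouth), &
             d%east(neast), d%west(nwest))
  end subroutine allocate_site
end module site_data_sketch

program derived_type_sketch
  use site_data_sketch
  implicit none
  type(site_data) :: d

  ! Sizes would normally be read or computed at run time.
  call allocate_site(d, 200, 200, 200, 200, 200)
  d%label(1) = 'abc1'
  d%north(1) = 1.234

  if (d%label(1) /= 'abc1') error stop 'label not stored'
  if (size(d%west) /= 200) error stop 'unexpected size'
  print '(A,1X,F6.3)', d%label(1), d%north(1)
end program derived_type_sketch
```

The cost, as noted, is that every subroutine that currently receives a slice of the pool would need its argument list revisited.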

andrew_4619 · ‎07-28-2013

So if I am reading that correctly, you have INORTHPTR-1 (or INORTHPTR - ISTRPTR) sets of 4-character strings. That is how the data is stored in the real array, but how is it accessed? It must be accessed by other integer variables that take values between ISTRPTR and INORTHPTR - 1, in which case an array of character*4 could be accessed in the same way? Depending on how the 'real' characters are then processed, you may need to mod the code or create a function for accessing the strings that does some manipulation to make the data compatible with the rest of the code.
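One possible shape for such access routines, sketched under the assumption that the pool stays a 32-bit REAL array and that labels are moved in and out with the TRANSFER intrinsic. The names PUT_LABEL and GET_LABEL are hypothetical. Note that the store is still a REAL assignment, so it would not be immune to the bit-pattern concern raised above.

```fortran
module label_access_sketch
  use, intrinsic :: iso_fortran_env, only: real32
  implicit none
contains
  ! Store a 4-character label in element i of the REAL pool.
  subroutine put_label(buffer, i, label)
    real(real32), intent(inout) :: buffer(*)
    integer, intent(in) :: i
    character(4), intent(in) :: label
    buffer(i) = transfer(label, buffer(i))
  end subroutine put_label

  ! Read element i of the REAL pool back as a 4-character label.
  function get_label(buffer, i) result(label)
    real(real32), intent(in) :: buffer(*)
    integer, intent(in) :: i
    character(4) :: label
    label = transfer(buffer(i), label)
  end function get_label
end module label_access_sketch

program label_access_demo
  use label_access_sketch
  implicit none
  real(real32) :: buffer(1000)
  integer, parameter :: istrptr = 1   ! start of the label area, as in the thread
  integer :: k

  buffer = 0.0
  call put_label(buffer, istrptr, 'abc1')
  call put_label(buffer, istrptr + 1, 'abc2')

  do k = istrptr, istrptr + 1
     print '(A)', get_label(buffer, k)
  end do
  if (get_label(buffer, istrptr + 1) /= 'abc2') error stop 'label not preserved'
end program label_access_demo
```

With wrappers like these, the rest of the code could keep indexing by ISTRPTR + offset and would never see the REAL representation of a label directly.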

dboggs · ‎07-29-2013

I thank everyone for this discussion. I think I have absorbed it all now. A confusing issue is that character constants are treated differently from character variables. I can summarize all observations concisely with the following program:

[fortran]

PROGRAM STORE_STRING_AS_NUMBER

IMPLICIT NONE

CHARACTER(4) :: $CHAR

REAL(4) :: RCHAR

INTEGER(4) :: ICHAR

$CHAR = 'abcd' ! OK of course

RCHAR = 'xyza' ! OK surprizing!

!RCHAR = $CHAR ! error #6303 "Assignment invalid for data types"

ICHAR = 'xyza' ! OK surprizing!

!ICHAR = $CHAR ! error #6303

! Conclusion: string constant can be assigned to a real or integer variable,

! but a string variable cannot.

! But use of a real is unreliable as explained by S.Lionel.

RCHAR = TRANSFER ($CHAR, 0.0) ! OK

PRINT*,'RCHAR W/ * = ', RCHAR ! prints 1.67...E+22

PRINT'(A, A4)','RCHAR w/ A4 = ', RCHAR ! prints abcd

ICHAR = TRANSFER ($CHAR, 0_4) ! OK

PRINT*,'ICHAR w/ * = ', ICHAR ! prints 1684234849

PRINT'(A, A4)','ICHAR w/ A4 = ', ICHAR ! prints abcd

! Conclusion: a string variable can be "transferred" to a real or integer variable.

! Is use of a real unreliable?

END

[/fortran]

So, as mecej4 explained, I cannot store the string data by using EQUIVALENCE.

It can be done using TRANSFER (thanks John, I NEVER would have found that myself).

And thanks Steve, for recommending that only INTEGER variables should be used for this.

Finally, I am surprised that a real or integer variable can be printed using A format. Is this standard? Probably better to TRANSFER the data from integer back to character before using.

Steven_L_Intel1 · ‎07-29-2013

In Fortran 66, using an INTEGER or REAL with A format was your only option. It is not standard in Fortran 2008.

mecej4 · ‎07-29-2013

I think that what bothers the O.P. is that the code is not correct in either standard Fortran-66 (it had no character type) or in standard Fortran-200X (strings may not be assigned to REAL variables). That Intel Fortran makes the code work correctly by using extensions to the standard may be a mixed blessing to him.

John_Campbell · ‎07-29-2013

mecej4,

I do not agree with your description of the problem as requiring extensions to the standard. The code can work correctly if you use the TRANSFER function, rather than the EQUIVALENCE. The preferable solution would be to convert the data structure to a derived type. The reason for this approach was for memory management and to provide a simplified argument list to subroutines in FTN and F77 codes. I have worked on pipe analysis software which stored node coordinates, restraint and label in a single 2D array. This software works with many compilers. The stored character label for the node is stored and used for graphical or report labelling. In some cases, it can be used for a search for the selected node. I can not remember a fail on a .eq. test. It would be interesting to see what the A4 equivalent of "Signaling NaNs" values would look like, as I suspect they are unlikely label values.

John

dboggs · ‎07-29-2013

mecej4, please keep this in mind. I never implied that the technique I am seeking would be a good way of solving the problem, IF I were starting from scratch today. What I have is an existing code that I wish to salvage in the simplest possible way in order to avoid a substantial rewrite and all of the testing/debugging that goes with it. To this end, it appears that TRANSFER is a good way to do it. What I am still not clear on is whether transferring character data to a real variable is safe.

Thus the "need to store character data as numeric," which you claim does not exist, does exist due to the need for longevity in established code. Fortran has been a champion in this need, as it places a high priority in backwards compatibility. I'm just a little surprised that the backwards compatibility in this case is available only via a fairly obscure TRANSFER function.

To older FORTRAN programmers, there were many tricks (or at least they would be viewed so today) for accomplishing things using a "minimalist" language. The technique of memory allocation and mixed storage as John describes is an excellent example, and I am very familiar with it. Yes modern languages, which are bloated by comparison, would resort to OOP or at least derived structures to accomplish many of these tricks. But I suspect the old way would execute much faster, and--believe it or not--the code would be just as clear to those who used it often as today's use of derived structures and "objects everywhere."

mecej4 · ‎07-30-2013

Dboggs: I am an older Fortran programmer (started in 1968), so you are preaching to the choir.

If you have any concern about putting in much work into "upgrading" the old code and making the upgraded code dependent on non-standard extensions that would tether you to Intel Fortran, compile some portions with the /stand:f95 or /stand:f03 compiler option. For the "STORE_STRING_AS_NUMBER" program that you posted above, with /stand:f03 Intel Fortran says for Line-7 (and, similarly, for Line-9):[bash]

boggs.f90(7): warning #6931: Fortran 2003 does not allow this assignment statement. ['xyza']
RCHAR = 'xyza' ! OK surprizing!
--------^

[/bash] This warning message brings to the foreground the dependence on an extension to the Fortran standard.

mecej4 · ‎07-30-2013

John Campbell wrote:

I can not remember a fail on a .eq. test. It would be interesting to see what the A4 equivalent of "Signaling NaNs" values would look like, as I suspect they are unlikely label values.

Here is an example. Compile and run with (i) default options, assuming those do not include/imply /fpe:0, and (ii) /fpe:0. Any 4-byte IEEE float between 0x7f800001 and 0x7fbfffff is an sNaN, and such a value is entered as a string in Line-4/5 and assigned to RCHAR. Note that an editor may not show Line-5 correctly. With /fpe:0, a floating point exception occurs. With either option, the real-real comparison produces "UNEQUAL".

[fortran]

PROGRAM STORE_STRING_AS_NUMBER
IMPLICIT NONE
REAL(4) :: RCHAR,X

RCHAR = ''
PRINT*,'RCHAR W/ * = ', RCHAR
PRINT'(A, A4)','RCHAR w/ A4 = ', RCHAR
!
X=RCHAR
IF(X.EQ.RCHAR)THEN
write(*,*)'EQUAL'
ELSE
write(*,*)'UNEQUAL'
ENDIF
!
END[/fortran]
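To check whether a given 4-character label would land in that range, one could inspect its bits through an integer rather than a real. The following is only a sketch: the function name IS_SNAN_PATTERN is hypothetical, and it assumes IEEE single precision with the usual convention that a clear top mantissa bit marks a signaling NaN. It also flags the corresponding patterns with the sign bit set.

```fortran
program snan_pattern_sketch
  use, intrinsic :: iso_fortran_env, only: int32
  implicit none
  character(4) :: risky

  ! Build a label whose bits are 0x7F800001, the low end of the range above.
  risky = transfer(int(z'7F800001', int32), risky)

  print *, 'abcd looks like an sNaN:  ', is_snan_pattern('abcd')
  print *, 'built label looks like one:', is_snan_pattern(risky)
  if (is_snan_pattern('abcd')) error stop 'abcd misclassified'
  if (.not. is_snan_pattern(risky)) error stop 'sNaN pattern missed'
contains
  logical function is_snan_pattern(label)
    character(4), intent(in) :: label
    integer(int32) :: bits, expo, mant
    bits = transfer(label, bits)              ! raw bits, no REAL involved
    expo = iand(ishft(bits, -23), 255_int32)  ! 8 exponent bits
    mant = iand(bits, 8388607_int32)          ! low 23 bits (0x7FFFFF)
    ! NaN: exponent all ones and nonzero mantissa.
    ! Signaling: most significant mantissa bit (bit 22) is clear.
    is_snan_pattern = expo == 255 .and. mant /= 0 .and. .not. btest(bits, 22)
  end function is_snan_pattern
end program snan_pattern_sketch
```

A check like this could be run over existing label data before deciding whether the REAL storage can be left alone.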

John_Campbell · ‎07-30-2013

mecej4,

Thanks for identifying an A4 character string that will fail both RCHAR = and TRANSFER. I modified your program, by first supplying a normal string, where parity is not set and <127> is not used. I am not sure of what attributes are required to fail, but these are not in character labels I would typically use.

I am attaching the program I adapted, as I do not know what would happen if I pasted this one !!

john

dboggs · ‎07-30-2013

And just for the record: If you use TRANSFER to store character data in a numeric variable, and you need to access this using standard Fortran (i.e. not using an internal WRITE to a character variable), you can transfer the numeric variable back to another character variable as in the following example:

[fortran]

CHARACTER(4) :: $CHARIN, $CHAROUT

REAL(4) :: RCHAR

$CHARIN = 'abcd'

RCHAR = TRANSFER ($CHARIN, 0.0)

$CHAROUT = TRANSFER (RCHAR, 'xxxx') ! 'xxxx' is the mold for a general 4-character string

PRINT '(A, A)', 'The retransferred value is ', $CHAROUT ! prints 'abcd'

[/fortran]

I'm not sure what to make of the exception example from mecej4, and whether transferring to/from an integer variable is any safer than transferring to/from a real variable.

John_Campbell · ‎07-31-2013

@dboggs,

The use of TRANSFER makes the code more standard. I think most of the original code would work.

My understanding of what is required is:
for storing : I'd use TRANSFER, if the compiler complains.
for printing : You can use A4 as a valid format for a real.
for comparing, either use TRANSFER to extract the string, or TRANSFER to store text into another REAL*4 variable. Both variables will probably be real*4 anyway.
for computation : this should never happen, so don't worry about it ( check possible initialisation with ' ' or 0 )
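As an illustration of the "comparing" item, here is a sketch of a label search that extracts each stored value with TRANSFER and compares it as a character string, so no real-to-real comparison is involved. The function name FIND_LABEL and the sample labels are hypothetical.

```fortran
program label_search_sketch
  use, intrinsic :: iso_fortran_env, only: real32
  implicit none
  real(real32) :: buffer(5)
  character(4), parameter :: names(3) = ['abc1', 'abc2', 'abc3']
  integer :: k

  ! Store the labels in the REAL array, as the legacy code would.
  do k = 1, 3
     buffer(k) = transfer(names(k), buffer(k))
  end do

  print *, 'abc2 found at position', find_label(buffer, 3, 'abc2')
  if (find_label(buffer, 3, 'abc2') /= 2) error stop 'search failed'
  if (find_label(buffer, 3, 'zzzz') /= 0) error stop 'false match'
contains
  ! Returns the position of WANTED among the first N entries, or 0 if absent.
  integer function find_label(buf, n, wanted)
    real(real32), intent(in) :: buf(*)
    integer, intent(in) :: n
    character(4), intent(in) :: wanted
    character(4) :: stored
    integer :: k
    find_label = 0
    do k = 1, n
       stored = transfer(buf(k), stored)   ! compare as text, not as REAL
       if (stored == wanted) then
          find_label = k
          return
       end if
    end do
  end function find_label
end program label_search_sketch
```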

Converting from REAL*4 storage to INTEGER*4 or CHARACTER*4 storage depends on the way the rest of the storing array is used. If it is only used for character, then the changes to the subroutine argument declarations might be easy. As the likely use is only for storage and printing I would suspect, if it works, the more changes you make, the greater the chance to introduce a bug. Don't change it !

There are other things that you could change or check with this old code, such as use of generic intrinsic function names and checking for 1-pass DO loops. Don't listen to the purists who insist on changing to standard-conforming F08, as you are likely to end up with a new conforming program that does not get the correct results.
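On the 1-pass DO loop check: from Fortran 77 onward, a DO loop whose upper bound is below its lower bound executes zero times, whereas many older compilers ran the body once regardless. A small sketch (the program name is hypothetical) that reports which behavior the current compiler and options give:

```fortran
program zero_trip_sketch
  implicit none
  integer :: i, n, count

  n = 0        ! upper bound below the lower bound of 1
  count = 0
  do i = 1, n
     count = count + 1
  end do

  ! Under Fortran 77 and later semantics the body never runs.
  print *, 'loop body executed', count, 'time(s)'
  if (count /= 0) error stop 'one-trip DO semantics appear to be in effect'
end program zero_trip_sketch
```

Old code that silently relied on the body running once would need those loops found and reviewed by hand.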

You might be interested in a document (attached) I wrote in about 1980, when converting old CDC FTN code to 32-bit Pr1me/Vax F77.
The best advice is every key stroke is a potential new bug.

John

dboggs · ‎07-31-2013

Thank you John. I agree with everything.

For this particular program, in the end it was simple enough to remove the character strings from the real allocation array and simply store them in a "just big enough" character array. But we will face other programs where things may be different. And there is always the principle of the thing...