Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.
Announcements
FPGA community forums and blogs have moved to the Altera Community. Existing Intel Community members can sign in with their current credentials.

MARTIN 1966 Fortran Code

JohnNichols
Valued Contributor III
13,595 Views

In the twitter account of the guru who owns the degenerateconic.com website, he refers to a paper published in 1966 by Martin, I assume the precursor to Marriet Martin. It includes a complete computer program in Fortran for determining space trajectories for vehicles, this is the oldest Fortran I have ever seen, I have worked on a lot of 1968 code from USC Powell's famous group, but not this old.  

It is interesting to read the problems they encountered. 

I did use punch cards in 1976 at Newcastle University and the long slog to the computing center. 

I enclose the paper for anyone who wants a historical read. 

This is 20 years before the PC. 

John

0 Kudos
55 Replies
jimdempseyatthecove
Honored Contributor III
3,157 Views

Additional historical note that is pertinent today.

Recall that the 60's vintage computers such as the IBM 709x, Digital Equipment Corporation PDP-10, and other processors used 36-bit single precision floating point and 72-bit double precision floating point... though the bit field size may have differed. For example the IBM 7090/94 had 1 bit sign, 8-bits exponent (excess 127), and 27-bit mantissa. The Digital Equipment Corporation PDP-8 with 12-bit word used 12-bit exponent, 1-bit sign, 23-bit mantissa.

These systems pre-date any concept of "byte", which came in after the advent of the 8-bit processors.

Therefore there was no contextual meaning for REAL(4) or REAL(8)... Four what? Eight what?

Anyone that has been on the Fortran forum will note that Dr. Fortran (Steve Lionel), who is the Fortran cum laude in this neighborhood, has been stressing us programmers .NOT. to us REAL(4) and REAL(8) or 123.456_8, but rather use a size suitable for a desired precision using SELECTED_REAL_KIND or IEEE_SELECTED_REAL_KIND.

Yes, this is a bit awkward to someone who's accustomed to using REAL(4), (8) or (16), but we programmers should learn from the past. At a point in the past, floating point precision was not expressed in terms of bytes. In the future, who is to say that floating point formats will keep an identity related to the byte. While I do not anticipate a resurgence of 36-bit word size processors, consider what might happen to floating point formats with the introduction of quantum computers.... IMHO there will be no dimensional equivalent to an 8-bit BYTE, 16-bit WORD, 32-bit DWORD, 64-bit QWORD, 128-bit DQWORD.

Jim Dempsey

0 Kudos
jimdempseyatthecove
Honored Contributor III
3,157 Views

BTW these systems typically used the term character, but were of differing formats:

36-bit word representations:

Six 6-bit characters
Five 7-bit characters (plus spare bit)
Four 8-bit characters (plus four spare bits)
Four 9-bit characters (8-bit + parity)
Three 12-bit Hollerith characters

Jim Dempsey

 

0 Kudos
Steve_Lionel
Honored Contributor III
3,157 Views

Pedantic note: The 4 and 8 in REAL(4) and REAL(8) are kind numbers, not byte sizes. Most but not all developers (NAG being a prominent exception) chose kind numbers (mostly) matching byte sizes with their Fortran 90 and later compilers, but one could argue that this just confused users more. (I say mostly because COMPLEX(4) is like COMPLEX*8).

I don't think there are any processors that support Fortran 90 or later which don't have 8-bit bytes. But the Fortran standard is mostly agnostic to the size of a numeric or character storage unit. It's mainly in C interop where it becomes more evident.

0 Kudos
GVautier
New Contributor III
3,157 Views

jimdempseyatthecove (Blackbelt) wrote:

BTW these systems typically used the term character, but were of differing formats:

36-bit word representations:

Six 6-bit characters

Considering the use of only 6 characters "strings" in the program it confirms that the target system was a 36-bit word system (integer and single precision floating point numbers) and 72 bits for double precision.

I had to change some of the variables to character*6 to allow the program to be compiled successfully witfh CVF. Watcom Fortran is more permissive.

 

0 Kudos
JohnNichols
Valued Contributor III
3,157 Views

Dear God:

I apologize for creating a nightmare.

She may forgive me. 

PS: it is easy to create a base module that has all the definitions you need as mecej4 showed me in Magni -- it is on the forum somewhere. Works a treat. and removes real 4 and real 8 etc.  

 

Re JUMPS -- yes it can be a lot fun removing those --

Regards

John

0 Kudos
JohnNichols
Valued Contributor III
3,157 Views

Steve Lionel (Ret.) (Blackbelt) wrote:

Pedantic note: The 4 and 8 in REAL(4) and REAL(8) are kind numbers, not byte sizes. Most but not all developers (NAG being a prominent exception) chose kind numbers (mostly) matching byte sizes with their Fortran 90 and later compilers, but one could argue that this just confused users more. (I say mostly because COMPLEX(4) is like COMPLEX*8).

I don't think there are any processors that support Fortran 90 or later which don't have 8-bit bytes. But the Fortran standard is mostly agnostic to the size of a numeric or character storage unit. It's mainly in C interop where it becomes more evident.

Steve: You are not pedantic - merely brilliant 

0 Kudos
GVautier
New Contributor III
3,157 Views

Nichols, John wrote:

Dear God:

I apologize for creating a nightmare.

She may forgive me. 

PS: it is easy to create a base module that has all the definitions you need as mecej4 showed me in Magni -- it is on the forum somewhere. Works a treat. and removes real 4 and real 8 etc.  

 

Re JUMPS -- yes it can be a lot fun removing those --

Regards

John

It"s not a nightmare for me. It's the kind of job I prefer.

I have already replaced all labeled do loops and if goto's. The next steps are removing goto's of main and Vector subroutine which is not the most simple, replace arrays with structures and finally create appropriate modules.

0 Kudos
JohnNichols
Valued Contributor III
3,157 Views

Dear Gilles:

I was referring more to the complete length of this forum thread, I did not expect the code to take off -- it is fun to watch 

I understand your joy in doing this sort of work. 

Can you replicate the drawing?

John

0 Kudos
GVautier
New Contributor III
3,157 Views

I think about making the graphics even 3d .

But I will not use a HP98xx plotter

0 Kudos
GVautier
New Contributor III
3,157 Views

I found that comment in the conclusion of the paper :

"The sample circumlunar trajectory included in this report gives the spacecraft position accurate to within 0.02 naut mi at t = 70 hr (approximately 0.33 h before pericynthion) in 160 sec on an IBM 7094 digital computer."

0 Kudos
jimdempseyatthecove
Honored Contributor III
3,157 Views

>>replace arrays with structures

Performance may suffer....

Jim Dempsey

0 Kudos
GVautier
New Contributor III
3,157 Views

Performance is no more an issue. The execution time is now around 0.10 second.

But readability is.

0 Kudos
andrew_4619
Honored Contributor III
3,157 Views

In the mid 60s a 7094 would have cost about $3.5 million so maybe around $30 million in todays money. I am guessing Gilles is using something a bit cheaper to get his x1600 speed increase!

0 Kudos
JohnNichols
Valued Contributor III
3,157 Views

I think I remember using a DMP Plotter - like a 50 with four pens, but even the internet has trouble with this one. 

A Raspberry PI would be able to run it faster 

0 Kudos
GVautier
New Contributor III
3,157 Views

andrew_4619 wrote:

In the mid 60s a 7094 would have cost about $3.5 million so maybe around $30 million in todays money. I am guessing Gilles is using something a bit cheaper to get his x1600 speed increase!

It's just an average pro x64 desktop.

I don't understand why structures would be slower than arrays. If I have :

array V(10)

structure S={V1,V2,...,v10}

For me addressing the fifth element has the same cost for both :

array loc(V)+5*sizeof(V) may be optimized at compilation time

structure loc(S)+offset(V5)

 

 

0 Kudos
jimdempseyatthecove
Honored Contributor III
3,157 Views

Modern (last 15 years or so) have introduced a class of instructions called Single Instruction Multiple Data (SIMD). In general terms a SIMD instruction (add, subtract, multiply, divide, sine, cosine, square root, etc...) can operate on multiple operands, provided they are adjacent in memory. For example, a CPU with AVX512 can perform a single operator on (up to) 64 bytes, 32 words, 16 dwords (float), 8 qwords (double), 4 dqwords, ...


Consider a particle experiment. The particles have multiple properties: mass, position X, position Y, position Z, velocity X, velocity Y, velocity Z, acceleration X, acceleration Y, acceleration Z, etc...


You can (OOP programming aka Array of Structures/AoS) define an object type, say Particle_t that contains those properties, then construct an array of Particle_t's.


.OR.


You can (Structure of Arrays/SoA) declare property arrays of size of number of particles, one for each property.


Assume the particles are under the influence of a central force, gravity. Part of the calculation for the motion of the particles is determining the distance from the central force.


AoS:

type Particle_t
  real :: X
  real :: Y
  real :: Z
  ...
end type Particle_t

type (Particle_t), allocatable :: Patricles(:)
type (Particle_t) :: Planet
...
R = SQRT( (Particles(I)%X-Planet%X)**2 + (Particles(I)%Y-Planet%Y)**2 + (Particles(I)%Z-Planet%Z)**2)


To produce R


memory fetch Particles(I)%X
memory fetch Planet%X
subtract
square
memory fetch Particles(I)%Y
memory fetch Planet%Y
subtract
square
memory fetch Particles(I)%Z
memory fetch Planet%Z
subtract
square
add the X to Y
add to z
sqrt
store R


The above is performed for each particle


SoA (SIMD)

type Particles_t
  real, allocatable :: X(:)
  real, allocatable :: Y(:)
  real, allocatable :: Z(:)
  ...
end type Particles_t

type(Particles_t) :: Particles
...

To produce sixteen R's (pseudo Fortran)


R(1:16) = SQRT(Particles%X(I:I+15) - Planet%X)**2 + Particles%Y(I:I+15) - Planet%Y)**2 + Particles%Z(I:I+15) - Planet%Z)**2)


memory fetch Particles%X(I:I+15)
memory fetch Planet%X
broadcast propigate scalar across simd register
subtract
square
memory fetch Particles%Y(I:I+15)
memory fetch Planet%Y
broadcast propigate scalar across simd register
subtract
square
memory fetch Particles%Z(I:I+15)
memory fetch Planet%Z
broadcast propigate scalar across simd register
subtract
square
add the X to Y
add to z
sqrt
store R(1:16)


Note, the loop that performs this will have compiler optimizations such that the Planet position (or simd vectors of position) will be maintained in a register. Thus producing a loop statement of something like:


AoS:

memory fetch Particles(I)%X
subtract registered Planet%X
square
memory fetch Particles(I)%Y
subtract registered Planet%Y
square
memory fetch Particles(I)%Z
subtract registered Planet%Z
square
add the X to Y
add to z
sqrt
store R

SoA:


memory fetch Particles%X(I:I+15)
subtract simd registered and broadcast Planet%X
square
memory fetch Particles%Y(I:I+15)
subtract simd registered and broadcast Planet%Y
square
memory fetch Particles%Z(I:I+15)
subtract simd registered and broadcast Planet%Z
square
add the X to Y
add to z
sqrt
store R(1:16)


In essence the AoS and SoA methods have the same number of instructions...
... however AoS is producing one R value, while SoA is producing sixteen R values

** The execution time is not necessarily 16x different due to memory and cache latency issues.

Jim Dempsey

0 Kudos
jimdempseyatthecove
Honored Contributor III
3,157 Views

I should comment that the pseudo Fortran code

R(1:16) = SQRT(Particles%X(I:I+15) - Planet%X)**2 + Particles%Y(I:I+15) - Planet%Y)**2 + Particles%Z(I:I+15) - Planet%Z)**2)

Is not written that way in your source code, rather the compiler optimizations will perform the above within a loop with statement of

R = SQRT(Particles%X(I) - Planet%X)**2 + Particles%Y(I) - Planet%Y)**2 + Particles%Z(I) - Planet%Z)**2)

IOW when R has local scope and the compiler can determine the loop is incremental, and properties are contiguous, then it will auto vectorize the code (to the extent of the vectorization options given at compile time).

Note, other than the type declarations (and allocations), the stylization difference in source code is... well style

DO I=1,nParticles
  R = SQRT( (Particles(I)%X-Planet%X)**2 + (Particles(I)%Y-Planet%Y)**2 + (Particles(I)%Z-Planet%Z)**2)

verses

DO I=1,nParticles
  R = SQRT(Particles%X(I) - Planet%X)**2 + Particles%Y(I) - Planet%Y)**2 + Particles%Z(I) - Planet%Z)**2)

Performance impact is quite different.

Jim Dempsey

0 Kudos
GVautier
New Contributor III
3,157 Views

Thank you for the explanations.

But there is a compromise to find between performance and comprehension of the code.

Originally, I think that grouping all variables in an array simplified the definition of the common block but the code is almost impossible to understand without the flow charts and the data schemes.

 

0 Kudos
JohnNichols
Valued Contributor III
3,157 Views

Vautier, Gilles wrote:

Thank you for the explanations.

But there is a compromise to find between performance and comprehension of the code.

Originally, I think that grouping all variables in an array simplified the definition of the common block but the code is almost impossible to understand without the flow charts and the data schemes.

 

True:  But I like the technique mecej4 taught me that you create objects and manipulate the objects, he showed me this in Magni - code on this site - and it really simplifies coding a lot. 

of course one learns the same thing in LISP -- but mecej4 method is beautiful in the simplicity 

There is always a faster computer next year -- 

0 Kudos
jimdempseyatthecove
Honored Contributor III
3,170 Views

>>There is always a faster computer next year

That places you in the position of always waiting until next year. (being behind one year)

IOW the complexity (value) of a program grows to meet the computational capability today.

Jim Dempsey

0 Kudos
JohnNichols
Valued Contributor III
3,170 Views

Alan Perlis said it best -- 3 is so true -- we use a lot of Syntactic sugar - aka Python is a classic example - writing in Python is like petting a rattlesnake - a bad idea from every viewpoint  - Fortran is not too bad - but a few advancements would be nice  - LISP is best 

C# is just pure spun sugar 

117 is great

 

One man's constant is another man's variable.

3: Syntactic sugar causes cancer of the semi-colons.

8: A programming language is low level when its programs require attention to the irrelevant.

11: If you have a procedure with 10 parameters, you probably missed some.

16: Every program has (at least) two purposes: the one for which it was written and another for which it wasn't.

19: A language that doesn't affect the way you think about programming, is not worth knowing.

31: Simplicity does not precede complexity, but follows it.

39: A picture is worth 10K words - but only those to describe the picture. Hardly any sets of 10K words can be adequately described with pictures.

41: Some programming languages manage to absorb change, but withstand progress.

42: You can measure a programmer's perspective by noting his attitude on the continuing vitality of FORTRAN.

55: LISP programmers know the value of everything and the cost of nothing.

57: It is easier to change the specification to fit the program than vice versa.

58: Fools ignore complexity. Pragmatists suffer it. Some can avoid it. Geniuses remove it.

59: In English every word can be verbed. Would that it were so in our programming languages.

64: Often it is means that justify ends: Goals advance technique and technique survives even when goal structures crumble.

75: The computing field is always in need of new cliches: Banality sooths our nerves.

79: A year spent in artificial intelligence is enough to make one believe in God.

80: Prolonged contact with the computer turns mathematicians into clerks and vice versa.

95: Don't have good ideas if you aren't willing to be responsible for them.

101 Dealing with failure is easy: Work hard to improve. Success is also easy to handle: You've solved the wrong problem. Work hard to improve.

116: You think you know when you learn, are more sure when you can write, even more when you can teach, but certain when you can program.

117: It goes against the grain of modern education to teach children to program. What fun is there to making plans, acquiring discipline in organizing thoughts, devoting attention to detail and, learning to be self-critical?

0 Kudos
Reply