CALL for attention and discussion: recent paper comparing an equation solver implementation in Fortran vs C++ vs Python

FortranFan · 04-08-2014

Fortran enthusiasts may have seen, or may be interested in, the findings and observations in a recent open-literature publication by Arabas et al. of the Institute of Geophysics at the University of Warsaw. Here’s the link to the paper: http://arxiv.org/abs/1301.1334

It is very interesting to read that Arabas et al. find “that C++11/Blitz++, Python/NumPy and Fortran 2008 provide comparable functionalities in terms of matching the blackboard abstractions within the program code. Taking into account solely the part of code representing particular formulæ (e.g. listings C.21, P.17, F.20 and equation 4) all three languages allow to match (or surpass) LaTeX in its brevity of formula translation syntax. All three languages were shown to be capable of providing mechanisms to compactly represent such abstractions as:
• loop-free array arithmetics;
• definitions of functions returning array-valued expressions;
• permutations of array indices allowing dimension-independent definitions of functions (see e.g. listings C.12 and C.13, P.10 and P.11, F.11 and F.12);
• fractional indexing of arrays corresponding to employment of a staggered grid.”

The authors make pointed comments about some of the limitations they faced with Fortran: “Three issues specific to Fortran that resulted in employment of a more repetitive or cumbersome syntax than in C++ or Python were observed:
• Fortran does not feature a mechanism allowing to reuse a single piece of code (algorithm) with different data types (compare e.g. listings C.6, P.5 and F.4) such as templates in C++ and the so-called duck typing in Python;
• Fortran does not allow function calls to appear on the left hand side of assignment (see e.g. how the ptr pointers were used as a workaround in the cyclic_fill_halos method in listing F.8);
• Fortran lacks support for arrays of arrays (cf. Sect. 2.2).”

In addition, Fortran users won’t be surprised to read the following, but would do well to make a note of it: “Fortran is a domain-specific language while Python and C++ are general-purpose languages with disproportionately larger users’ communities. The OOP features of Fortran have not gained wide popularity among users. Fortran is no longer routinely taught at the universities [28], in contrast to C++ and Python. An example of decreasing popularity of Fortran in academia is the discontinuation of Fortran printed editions of the ‘Numerical Recipes’ series of Press et al.”

The paper points out a few other shortcomings of working with Fortran: “The built-in standard libraries of Python and C++ are richer than those of Fortran and offer versatile data types, collections of algorithms and facilities for interaction with host operating system. In the authors’ experience, the small popularity of OOP techniques among Fortran users is reflected in the library designs (including the Fortran’s built-in library routines). What makes correct use of external libraries more difficult with Fortran is the lack of standard exception handling mechanism, a feature long and much requested by the numerical community.”

But the least surprising finding for die-hard Fortran coders will be, “The performance evaluation revealed that: • the Fortran set-up offered shortest execution times,”

What are your take-aways from this study? I request you all to post your comments.

jimdempseyatthecove · 04-09-2014

Let's look at the complaints:

1) ... such as templates ...

I do miss having templates. You can implement template-like behavior with a preprocessor such as the Blockit classes in PyF95 (http://sourceforge.net/apps/mediawiki/blockit/index.php?title=PyF95%2B%2B).

I think a better way would be to extend the generic interface to include a new "type" (cover your ears while the Fortran committee hollers "Warning Will Rogers - new type, type"). Something like:

INTERFACE RADIUS
  FUNCTION RADIUSauto(X, Y, Z) RESULT(R)
    AUTO<T> :: X,Y,Z,R
    R = SQRT(X**2+Y**2+Z**2) ! Statement in interface block like statement in header
  END FUNCTION RADIUSauto
END INTERFACE

This will have to be thought out a bit more than the 2 minutes I spent on it. In AUTO<T>, T is the programmer's choice of an arbitrary type moniker. The above states that the type of the result R matches the types of the inputs X, Y, Z.

2) Fortran does not allow function calls to appear on the left hand side of assignment.

Since Fortran is a "pass by reference" type of language, you'd think they would accept a syntax that includes "return by reference".

3) Fortran lacks support for arrays of arrays

Am I missing something? Doesn't adding a subscript do that?

Jim Dempsey

Izaak_Beekman · 04-09-2014

This article has been sitting on my desk for the past two weeks and I have yet to read it, though I've been meaning to. However, comp.lang.fortran might be a more appropriate forum for this discussion, as it is not specific to Intel's implementation of the language.

FortranFan · 04-09-2014

jimdempseyatthecove wrote:

Let's look at the complaints:

[snip]

Jim Dempsey

Jim,

Interesting idea re: templates. I hope sooner or later the Fortran standards body will come around to accepting the concept, but even if that happens, compiler implementations won't arrive during my working career! Look at how long it is taking for parameterized derived types to be implemented.

Back to the paper, I agree with you: of the 3 main complaints, only the first one on templates feels like a genuine gap.
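For reference, the closest thing standard Fortran 2003 offers to a template today is a generic interface backed by one specific procedure per kind, which is precisely the repetition the paper complains about. The sketch below reuses the RADIUS example from Jim's post; the names radius_mod, radius_r32 and radius_r64 are illustrative only and are not taken from the paper's code.

[fortran]
module radius_mod
  use, intrinsic :: iso_fortran_env, only: real32, real64
  implicit none
  private
  public :: radius

  ! One generic name, one specific procedure per real kind.
  interface radius
    module procedure radius_r32, radius_r64
  end interface radius

contains

  ! Single-precision specific. The body is duplicated verbatim below.
  elemental function radius_r32(x, y, z) result(r)
    real(real32), intent(in) :: x, y, z
    real(real32) :: r
    r = sqrt(x**2 + y**2 + z**2)
  end function radius_r32

  ! Double-precision specific: identical algorithm, different type.
  elemental function radius_r64(x, y, z) result(r)
    real(real64), intent(in) :: x, y, z
    real(real64) :: r
    r = sqrt(x**2 + y**2 + z**2)
  end function radius_r64

end module radius_mod
[/fortran]

A call with real64 arguments resolves to radius_r64 and one with real32 arguments to radius_r32; because both specifics are ELEMENTAL, they also accept conforming arrays. Supporting another kind means copying the body once more, which is the gap a template mechanism, or Jim's AUTO<T> idea, would close.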

Re: the second complaint, that "Fortran does not allow function calls to appear on the left hand side of assignment", I'm not sure I understand their use case completely, but you may know that the Fortran 2008 standard includes the feature explained by John Reid in his paper "The new features of Fortran 2008":

6.5 Pointer Functions

A reference to a pointer function is treated as a variable and is permitted in any variable definition context. For example, this function might calculate where to store values depending on a key:

[fortran]
   function storage(key) result(loc)
      integer, intent(in) :: key
      real, pointer :: loc
      loc=>...
   end function
[/fortran]

which would allow a value to be set thus:

[fortran]
   storage(5)=0.5
[/fortran]

Such a feature has not been implemented in any compiler other than Cray's. I wonder if this is what the authors are referring to; if so, it is only a matter of time.
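For anyone who wants to try the feature on a compiler that supports it, here is a complete, compilable version of Reid's fragment. The backing array TABLE and the key-to-element mapping (the key is used directly as an index) are hypothetical stand-ins for the "loc=>..." that Reid leaves open; this is a sketch, not code from the paper.

[fortran]
module storage_mod
  implicit none
  private
  public :: table, storage

  ! Hypothetical backing store. TARGET lets a pointer associate with its elements.
  real, target, save :: table(10) = 0.0

contains

  ! Fortran 2008: a reference to a pointer-valued function may be used as a
  ! variable, for example on the left-hand side of an assignment.
  function storage(key) result(loc)
    integer, intent(in) :: key
    real, pointer :: loc
    ! Hypothetical mapping: the key is used directly as an index into TABLE
    ! (no bounds check in this sketch).
    loc => table(key)
  end function storage

end module storage_mod
[/fortran]

With this module in scope, the statement storage(5) = 0.5 stores 0.5 into table(5) and leaves every other element untouched.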

Re: the last major complaint about arrays of arrays, I think it shows the C++ style of thinking of the lead author, whose PhD dissertation work led mostly to this paper and who indicates therein his strong preference for C++. Unless I'm missing something too, the native array features in Fortran obviate such a need.
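To make the "native array features" point concrete: a rectangular array of arrays is just an extra subscript, and for ragged data, where each inner array has its own length, the usual standard Fortran 2003 idiom is an array of a derived type with an allocatable component. A minimal sketch; ROW_T and MAKE_RAGGED are hypothetical names, and the caller is assumed to pass exactly one length per row.

[fortran]
module ragged_mod
  implicit none
  private
  public :: row_t, make_ragged

  ! Each element of an array of ROW_T owns an independently sized vector,
  ! giving an "array of arrays" (ragged array) in standard Fortran 2003.
  type :: row_t
    real, allocatable :: v(:)
  end type row_t

contains

  ! Allocate rows(i)%v with lengths(i) elements and fill it with the value i.
  ! Assumes size(lengths) == size(rows).
  subroutine make_ragged(rows, lengths)
    type(row_t), intent(out) :: rows(:)   ! INTENT(OUT) deallocates old components
    integer, intent(in) :: lengths(:)
    integer :: i
    do i = 1, size(rows)
      allocate(rows(i)%v(lengths(i)))
      rows(i)%v = real(i)
    end do
  end subroutine make_ragged

end module ragged_mod
[/fortran]

After type(row_t) :: rows(3) and call make_ragged(rows, [2, 5, 1]), rows(2)%v has five elements and rows(3)%v has one. The extra %v in each reference is the syntactic cost compared with nested containers in C++.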

Another valid complaint in this paper and a few others is the lack of structured exception handling in Fortran, similar to try/catch in C++. I don't know whether this will be incorporated into any future Fortran standard.
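In the meantime, the standard mechanism is explicit status reporting: STAT= and ERRMSG= on ALLOCATE and DEALLOCATE, and IOSTAT= and IOMSG= on I/O statements, report an error instead of terminating the program, and the caller has to check the result. A minimal sketch with a hypothetical wrapper, CHECKED_ALLOCATE; allocating an array that is already allocated is used here only as an easy, deterministic way to trigger an error condition.

[fortran]
module status_demo_mod
  implicit none
  private
  public :: checked_allocate

contains

  ! Hypothetical wrapper. There is no try/catch in standard Fortran: with
  ! STAT= present, a failed ALLOCATE sets a nonzero code (and ERRMSG= text)
  ! instead of stopping the program.
  subroutine checked_allocate(a, n, stat, msg)
    real, allocatable, intent(inout) :: a(:)
    integer, intent(in) :: n
    integer, intent(out) :: stat
    character(len=*), intent(inout) :: msg   ! only assigned when an error occurs
    allocate(a(n), stat=stat, errmsg=msg)
  end subroutine checked_allocate

end module status_demo_mod
[/fortran]

Calling checked_allocate twice on the same array returns stat == 0 the first time and a nonzero stat, plus a processor-dependent message in msg, the second time, rather than aborting. Nothing propagates automatically, though: every caller up the chain has to test and forward the code, which is the gap compared with try/catch.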

Thanks,

FortranFan · 04-09-2014

Izaak Beekman wrote:

... However, comp.lang.fortran might be a more appropriate forum for discussion, as it is not specific to Intel's implementation of the language.

Thanks, I've been thinking about posting the same on the c.l.f forum. But I'm generally very disappointed by the quality of discussions there: other than some "how to" or "what is wrong with this code" type of questions, the comments tend to quickly degenerate to a "fish market" level, and it appears most of the frequent posters are no longer actively coding, which I feel limits the perspective. I'll wait and see how this forum responds to a general Fortran discussion topic.

Also, I've been seeing cases here and there of how well gfortran performs relative to Intel Fortran on many of the Fortran 2003 and 2008 features, particularly OOP implementations. So I was wondering whether someone at Intel might be stirred into doing some performance analysis, use the code in this paper as one of their test cases, and report some results here or somewhere in the open (white paper?) perhaps even leading to further improvements in Intel compiler optimizations. It'll be great to see more of their competitive juices flowing.

Steven_L_Intel1 · 04-09-2014

Fortran DOES allow functions on the left of an assignment, if the function returns a data pointer. This is a Fortran 2008 feature not yet widely implemented, including in ifort, so I'm not astonished that the author was unaware of it. I did get the feeling that the author didn't really know Fortran well and allowed personal bias to insert itself into the conclusions. To say that Fortran is "domain specific" is a bogus argument.

FortranFan · 04-09-2014

Steve Lionel (Intel) wrote:

... To say that Fortran is "domain specific" is a bogus argument.

Steve,

I largely agree with your comments, but to give the authors some leeway, hasn't the use of Fortran become essentially domain-specific? If someone wants to create a new math library, a new OS, a new word processor, a new web browser, a new social media app, or whatever software system they can think of, they might seriously consider C++. But Fortran would only get considered now for a few specific applications, if at all. Isn't that the ground reality, no matter how much we might dislike or bemoan it?

Steven_L_Intel1 · 04-09-2014

My perspective is that choice of implementation language is primarily driven by programmer preferences and organizational politics, not a rational decision as to which language is "best". The article does have a valid point that there are often multiple plausible choices, and which you pick depends on external factors (what do the programmers and maintainers know, is there some existing code being reused, etc.) I am not a "language bigot" - no one language is best for everything. But you CAN do a lot in Fortran if you approach it right. It is also important to understand that you are not restricted to a single language (at least not with Fortran in the mix.)

FortranFan · 04-10-2014

Steve, Jim:

Thanks again for your comments.

By the way, did you get a chance to look at the figures in the paper, specifically figures 3, 4, and 5? Do the performance results make sense to you, especially how C++ and Python show significant improvement as the problem size increases whereas Fortran goes the other way, albeit with a smaller absolute rate of change? This would indicate some sort of scalability benefit in the other two implementations that Fortran doesn't have, but why would that be? I'd greatly appreciate any insight you can offer.

TimP · 04-10-2014

The graphs, and comments in the paper, indicate that even C++ incurs more overhead than Fortran in calling libraries and starting up functions and loops. g++ vectorization may be as good as gfortran, giving similar asymptotic performance for some of the quoted cases.

The authors make a comment about preferring OpenMP compilers over Python for multi-core scaling; that relies on the OpenMP implementation which may be the same one for C++ and Fortran.

Once one has learned the ropes with some system such as blitz++, the additional effort which these authors apparently were prepared to take may not influence the choices. As the previous discussions indicate, the authors seem less familiar with Fortran analogues to implement their programming preferences.

gfortran 4.8 is a newer and better version than the older ones usually cited in comparisons with ifort. Both gfortran and ifort are making big forward strides even since this publication.

Izaak_Beekman · 04-10-2014

When comparing inter-language performance and parallel performance, it's very important to consider how optimized each of the codes is, how optimized the serial version is versus the parallel version, and whether the codes exhibit good strong and weak scaling. I still haven't read it, but I always look at performance comparisons very critically.

FortranFan · 04-10-2014

Thanks, Tim and Izaak. You're right about looking critically at any performance results and considering how well a particular language implementation is optimized.

This is the figure that had me puzzled.

Izaak mentions "strong and weak scaling": is the above indicating good scaling with C++ etc. and weak scaling in terms of how the authors implemented their gfortran solver, or is there something inherent in Fortran that may not scale?

mecej4 · 04-11-2014

I cannot comment on why the plots for the other languages look the way they do, but for Fortran my suspicion is that what you see are the effects of cache memory. After the grid size exceeds a certain threshold, most of the arrays are too large to stay in the cache, which accounts for the slowdown.

Steven_L_Intel1 · 04-11-2014

Also, the use of gfortran for performance comparisons is limiting - gfortran is well behind other implementations in performance.

FortranFan · 04-11-2014

Steve Lionel (Intel) wrote:

Also, the use of gfortran for performance comparisons is limiting - gfortran is well behind other implementations in performance.

Steve,

Not sure if that's true with gfortran 4.7 and later versions. When I stated in Quote #5,

FortranFan wrote:

Also, I've been seeing cases here and there of how well gfortran performs relative to Intel Fortran on many of the Fortran 2003 and 2008 features, particularly OOP implementations. So I was wondering whether someone at Intel might be stirred into doing some performance analysis, use the code in this paper as one of their test cases, and report some results here or somewhere in the open (white paper?) perhaps even leading to further improvements in Intel compiler optimizations. It'll be great to see more of their competitive juices flowing.

I was referring to studies such as the one in this report by Noah Trebesch (School of Physics and Astronomy, Department of Computer Science and Engineering at the University of Minnesota) on Hy3S, a program developed to simulate the stochastic nature of networks of chemical or biochemical reactions. The author of this report shows and states:

Noah Trebesch wrote:

It is fairly surprising that the new version of the code executed in less time when compiled using gfortran rather than ifort. Care was taken to ensure that equivalent compiler flags were used during compilation with both gfortran and ifort, so I do not believe that more optimizations were applied when compiling with gfortran. I think it is more likely that the optimizations applied produced more efficient code when compiling using gfortran.

The different GCC compilers (C, C++, Fortran, etc.) use different front ends but share the same back end. This means that, once the Fortran front end had been updated to support object-oriented structures, all the object-oriented optimizations available to C++ code were also available to object-oriented Fortran code [8]. It appears that the Intel C and C++ compilers share the same back end but that the Intel Fortran compiler's back end is separate. This is not definitively true, but it could explain why the new code compiled by ifort takes more time to execute if it is. All optimizations that already exist for object-oriented structures in the C/C++ back end would have to be rewritten for the Fortran backend, which could potentially take a long time. Regardless of the reason the ifort compiled code takes more time to execute, it is likely that this result will eventually be reversed due to the fact that the Intel compilers are commercial, and, thus, there are more resources available to develop them.

Steven_L_Intel1 · 04-11-2014

FortranFan wrote:

Steve Lionel (Intel) wrote:
Also, the use of gfortran for performance comparisons is limiting - gfortran is well behind other implementations in performance.

Steve,

Not sure if that's true with gfortran 4.7 and later versions. When I stated in Quote #5.

See http://polyhedron.com/pb05-lin64-f90bench_SBhtml

FortranFan · 04-11-2014

Steve Lionel (Intel) wrote:

See http://polyhedron.com/pb05-lin64-f90bench_SBhtml

Thanks Steve. Note my quote that follows soon states, ".. seeing cases here and there of how well gfortran performs relative to Intel Fortran on many of the Fortran 2003 and 2008 features, particularly OOP implementations .."

I believe the benchmark cases at Polyhedron mainly go up to Fortran 95. No doubt, Intel Fortran rocks with these cases! I'm rooting for similar superiority in the OOP features of Fortran 2003 and 2008.

Regards,

Izaak_Beekman · 04-12-2014

FortranFan wrote:

Thanks, Tim and Izaak. You're right about looking critically at any performance results and considering how well a particular language implementation is optimized.

[snip]

Izaak mentions "strong and weak scaling": is the above indicating good scaling with C++ etc. and weak scaling in terms of how the authors implemented their gfortran solver, or is there something inherent in Fortran that may not scale?

Strong and weak scaling concepts apply to parallel codes, and I still haven't read the paper discussed here, so I'm not sure these concepts are 100% applicable, but I will press on nonetheless. Judging by the picture and caption, it looks like they are increasing the number of grid points but not the number of cores, so this isn't weak scaling in the formal sense, though it could be thought of as a type of weak scaling. I suspect mecej4 is correct and that this is a data motion issue. If they gave information on cache hits and misses, this suspicion could be confirmed. Depending on the nature of their solver, there are optimization techniques that could help alleviate cache misses, like cache tiling (AKA cache blocking, loop tiling, loop blocking, etc.), but in large codes the compiler might not always be able to tell that it is safe to do this.
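As an illustration of the loop tiling mentioned here (this is not the MPDATA stencil from the paper, just the simplest kernel where tiling matters), here is a cache-blocked transpose. The tile edge BS is an assumed tuning parameter, to be chosen so that a BS x BS tile of each array fits in cache.

[fortran]
module tiling_mod
  implicit none
  private
  public :: transpose_blocked

contains

  ! Cache-blocked (tiled) transpose: the i and j loops are split into tiles
  ! of edge bs, so the strided writes into B stay within a cache-sized tile.
  subroutine transpose_blocked(a, b, bs)
    real, intent(in) :: a(:, :)
    real, intent(out) :: b(:, :)   ! must be shaped (size(a,2), size(a,1))
    integer, intent(in) :: bs      ! tile edge length (tuning parameter)
    integer :: ii, jj, i, j
    do jj = 1, size(a, 2), bs
      do ii = 1, size(a, 1), bs
        ! One bs x bs tile; MIN handles partial tiles at the array edges.
        do j = jj, min(jj + bs - 1, size(a, 2))
          do i = ii, min(ii + bs - 1, size(a, 1))
            b(j, i) = a(i, j)   ! A read contiguously (column-major)
          end do
        end do
      end do
    end do
  end subroutine transpose_blocked

end module tiling_mod
[/fortran]

For example, call transpose_blocked(a, b, 32) with b shaped (size(a,2), size(a,1)) gives exactly the same result as the intrinsic TRANSPOSE; only the traversal order changes. Whether a compiler will apply this transformation automatically to a large solver is exactly the safety question raised above.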

Since the Fortran execution time is considerably lower for smaller grids, and remains the fastest even for the largest case, it is likely that the other implementations will see a similar upturn in execution time per grid point as the problem size grows. Since the Fortran code has such low overhead relative to the other implementations, they must move to much larger problem sizes to decrease the percentage of execution time the overhead consumes.

It makes sense that as the problem size grows, data must be fetched from progressively more distant memory. If you were to exhaust the RAM on the host computer, you would have to send data to and from the swap/page file (if you have it enabled), which resides on the hard drive, and this would result in even further slowdown. So, in my mind, the Fortran behaviour seems to be the expected behaviour and not at all anomalous. It is the overhead of the other languages that is anomalous; it only appears to be amortized when moving to a larger problem because the timing is reported as time per step per grid point. I would bet that if the problem size were increased further, a similar rate of increase in execution time with problem size would be observed for the other languages too.

Greynolds__Alan · 04-12-2014

Steve Lionel (Intel) wrote:

Also, the use of gfortran for performance comparisons is limiting - gfortran is well behind other implementations in performance.

On my 40,000-line pure Fortran 95/OpenMP optical engineering application, gfortran jumped ahead of the others (including Intel) in performance when 4.7 came out and has stayed ahead since (see attached plot).

Al Greynolds


TimP · 04-12-2014

I already mentioned that marketing comparisons tend to compare the most popular past version of gfortran against new ifort releases. I suppose it's partly a question of which versions of gfortran Intel marketing is most hopeful of displacing; the market which is willing to test and install current releases of gfortran (involving a higher degree of self-support) is not so directly targeted.

One of the few areas where gfortran has outperformed ifort is the treatment of remainders for vectorized loops. Intel apparently has significant improvements on the way in vectorized remainders for AVX, as well as in the introduction of shuffles for stride -1, which are already in the gfortran release candidate.

Gfortran also performs relatively well for non-vectorizable code, or code which ifort stretches to vectorize in lieu of good non-vector optimization. gfortran doesn't often do well on vectorization of loops with multiple assignments, particularly those with multiple conditionals, nor does it perform any loop fusion that I have found. Implementation of OpenMP 4 in gfortran hasn't appeared yet.

As the relative importance of these factors varies, and the effort developers are willing to expend on the special optimization facilities of one compiler or the other may be limited, it doesn't make much sense to try to give an overall comparison figure.

FortranFan · 04-12-2014

As suggested by Tim, it's better if this thread doesn't reduce to a debate around some transient differences between Intel Fortran and gfortran. I only posted those couple of comments to suggest that gfortran of late has made some marked performance improvements in certain areas and that the authors' use of it for their Fortran implementation was reasonable, especially given that they wanted to apply the terms of the GNU General Public License to their code and that their C++ implementation was based on gcc.

Let us not get distracted by any Intel Fortran v gfortran aspects any further.

Instead, let us get back to the topic in question i.e., OOP implementations of equation solvers for tough problems such as weather, climate, and ocean simulation systems:

On ease of use for code developers, I think there is agreement that a templating capability plus vastly improved editors and IDEs for OOP features would serve Fortran well. Of these two, the latter is at least feasible: Code::Blocks with the Fortran plug-in on both Linux and Windows, Intel Fortran with Visual Studio on Windows, etc. are making good strides forward, and hopefully the trend will continue. But templates may never make it into the standard; is that a major cause for concern for Fortran's long-term health?
On the performance side, Fortran has conceptually leapt ahead with a standardized SPMD implementation in the form of coarray Fortran. For the MPDATA equation solver described in the paper, how much performance improvement could coarray Fortran bring to the table if one were to stick to the platforms used by the authors, i.e., an Intel® Core™ i5-2467M CPU at 1.60 GHz and an AMD Phenom™ II X6 1055T processor?