Long Execution Time

I edited subroutines and functions that ran very well in CVF so that they are suitable for Intel Fortran, the later version of the software. I altered continuation lines to have the ampersand, &, at the end of the current line as opposed to the start of the subsequent line. I am working with REAL*16. I left "IF" blocks with ".LE." in lieu of "<=", for example. Debugging is enabled. An execution yields almost exactly the same result in the Intel version as was achieved in the CVF version. The problem is that it takes an ENORMOUS amount of time to run. Any ideas? Thanks in advance.

12 Replies

Valued Contributor II

REAL(16) carries a huge overhead compared with REAL(8). Do you actually need REAL(16)? That is very unusual. It is hard to comment further given how little information has been provided.

Black Belt

One aspect of editing a working program that merits suspicion is that a mistaken change may have caused the main algorithm to change in complexity. The impaired algorithm still converges and gets home, but in a limping way.

For instance, suppose you have a quicksort subroutine. An intended cosmetic change (moving comments, changing variable names, source format, etc.) may cause an unintended change to the algorithm, so that instead of O(N lg N) the complexity becomes O(N^2). For N = 10^6, the sorting part of the program will then run about 50,000 times slower.
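The 50,000 figure can be checked with a rough calculation, reading the quoted complexities as N^2 versus N lg N and the problem size as N = 10^6. This is only a sketch: it assumes the constant factors of the two variants are equal, which real code will not satisfy exactly.

```latex
\[
\frac{N^2}{N \log_2 N} = \frac{N}{\log_2 N}
  = \frac{10^6}{\log_2 10^6}
  \approx \frac{10^6}{19.93}
  \approx 5 \times 10^4
\]
```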

If you want a more specific answer, you will need to show the code or a simplified version of it.

Black Belt

Or run the program under a profiler such as Intel VTune Amplifier XE and see where it is spending its time.

Steve (aka "Doctor Fortran") - https://stevelionel.com/drfortran

Black Belt

The newer compiler does more runtime debugging checks.   If you want performance, turn off the runtime checks.

The editing of the source that you describe should not have been required.


 

Yesterday, manipulating switches through the properties, I decided to go back to REAL*8, as the program was set up in CVF. Sure enough, the execution time was reduced dramatically, from 24 minutes to 26 seconds. Optimization reduced the execution time to 15 seconds. Apparently, holding the increased accuracy requires further calculations due to hardware characteristics.

Black Belt

Guaglardi, Paul wrote:
 Apparently, holding the increased accuracy requires further calculations due to hardware characteristics.

"Hardware characteristics" as in "It ain't there!". REAL*16 and COMPLEX*32 arithmetic has to be simulated in software using 64-bit floating point arithmetic, since there are almost no mainstream processors today that have 128-bit floating point arithmetic. Intel's software version of 128-bit floating point prioritizes precision over speed.

Don't use REAL*16 without a thorough assessment of whether it is needed and the performance hit that it entails.
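One way to make that assessment is to time the same loop in both kinds. The program below is a minimal sketch, not taken from the original poster's code: the program name, the loop, and the iteration count are illustrative assumptions, and the timings will vary with compiler, options, and machine. It assumes the compiler provides a kind with at least 33 decimal digits (REAL(16) in Intel Fortran); where it does not, `selected_real_kind(33)` returns a negative value and the program will not compile.

```fortran
program kind_timing
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  ! Kinds requested by precision; on Intel Fortran these map to REAL(8) and REAL(16).
  integer, parameter :: dp = selected_real_kind(15)
  integer, parameter :: qp = selected_real_kind(33)
  integer, parameter :: n = 5000000          ! illustrative iteration count
  integer(int64) :: t0, t1, t2, rate
  integer :: i
  real(dp) :: sd
  real(qp) :: sq

  ! Double-precision version of the loop.
  call system_clock(t0, rate)
  sd = 0.0_dp
  do i = 1, n
     sd = sd + 1.0_dp / real(i, dp)**2
  end do
  call system_clock(t1)

  ! Identical loop in quad precision.
  sq = 0.0_qp
  do i = 1, n
     sq = sq + 1.0_qp / real(i, qp)**2
  end do
  call system_clock(t2)

  ! Printing the sums keeps the optimizer from discarding the loops.
  print *, 'double sum:', sd, ' seconds:', real(t1 - t0, dp) / real(rate, dp)
  print *, 'quad   sum:', sq, ' seconds:', real(t2 - t1, dp) / real(rate, dp)
end program kind_timing
```

Running it once with debugging and runtime checks enabled and once with optimization shows how much of the slowdown comes from the kind and how much from the build settings.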

Generally, one does not need REAL*16 throughout an entire program (if at all). Try restricting the use to where you absolutely require that precision.

Jim Dempsey
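As a sketch of what restricting REAL*16 can look like, the program below keeps all data in double precision and uses quad precision only for the running total of a long summation. The names `mixed_precision` and `sum_quad` are hypothetical, and whether a summation is the precision-critical part of any given program is an assumption that has to be checked against the real code.

```fortran
program mixed_precision
  implicit none
  integer, parameter :: dp = selected_real_kind(15)   ! REAL(8) on Intel Fortran
  integer, parameter :: qp = selected_real_kind(33)   ! REAL(16) on Intel Fortran
  integer, parameter :: n = 1000000                   ! illustrative array size
  real(dp), allocatable :: x(:)
  integer :: i

  allocate(x(n))
  do i = 1, n
     x(i) = 1.0_dp / real(i, dp)
  end do

  print *, 'double accumulator:', sum(x)
  print *, 'quad accumulator  :', sum_quad(x)

contains

  ! Only the running total is quad precision; the input array and the
  ! returned value stay in double precision.
  function sum_quad(a) result(s)
    real(dp), intent(in) :: a(:)
    real(dp) :: s
    real(qp) :: acc
    integer :: k
    acc = 0.0_qp
    do k = 1, size(a)
       acc = acc + real(a(k), qp)
    end do
    s = real(acc, dp)
  end function sum_quad

end program mixed_precision
```

The software-emulated arithmetic is then confined to one addition per element, while the rest of the program runs at hardware double-precision speed.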

Black Belt

>>>Optimization reduced the execution time to 15 seconds.  Apparently, holding the increased accuracy requires further calculations due to hardware characteristics>>>

There is no hardware acceleration (CPU register based) for the REAL*16 primitive data type. When you choose this data type, all the relevant calculations are emulated in software, probably operating on stack-located variables, and that means a lot of load/store operations throughout the program's execution.

Black Belt

Steve Lionel (Ret.) wrote:

Or run the program under a profiler such as Intel VTune Amplifier XE and see where it is spending its time.

If REAL*16 is emulated in software, VTune will probably show a lot of time spent on load, store and floating-point arithmetic operations.

I'm not sure how ifort stores 128-bit numbers. Will it use the lower part(s) of a YMM/ZMM register to load only a single quad-precision number?

Black Belt

REAL(16) is not loaded into registers as such. The software library almost certainly does integer loads of the various parts.

Steve (aka "Doctor Fortran") - https://stevelionel.com/drfortran

Black Belt

So REAL*16 types are treated as double-double values and their decomposition is loaded into vector registers. One variable will occupy the lower parts of two registers.

Black Belt

No, they are not treated as “double double values”. I don’t know what the internals of the Intel quad-precision library look like, but I have extensive experience with DEC’s and I very much doubt vector registers are used at all, nor floating point instructions.

Steve (aka "Doctor Fortran") - https://stevelionel.com/drfortran