Intel® Fortran Compiler

Long Execution Time

Guaglardi__Paul
Beginner
673 Views

Edited subroutines and functions that ran very well in CVF to be suitable for Intel Fortran, a later version of the software. Altered lines to have the ampersand, &, at the end of the current line as opposed to the start of the subsequent line. Working with REAL*16. Left "IF" blocks with ".LE." in lieu of "<=", for example. Debugging is enabled. An execution yields almost exactly the same result in the Intel version as was achieved in the CVF version. The problem is that it takes an ENORMOUS amount of time to run. Any ideas? Thanks in advance.

12 Replies
andrew_4619
Honored Contributor II

REAL(16) carries a huge overhead compared to REAL(8). Do you actually need REAL(16)? That is very unusual. It is hard to comment further given how little information was provided.
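To make the REAL(8) versus REAL(16) comparison concrete, here is a minimal sketch that asks the compiler what each kind provides. It assumes the compiler supports the `real64` and `real128` kind constants from `iso_fortran_env` and that they correspond to REAL(8) and REAL(16); the program name is made up for illustration.

```fortran
program kind_properties
  use, intrinsic :: iso_fortran_env, only: real64, real128
  implicit none
  real(real64)  :: d   ! assumed equivalent to REAL(8) / REAL*8
  real(real128) :: q   ! assumed equivalent to REAL(16) / REAL*16

  ! Inquiry functions only look at the kind, so d and q need no values.
  ! Columns: decimal digits of precision, decimal exponent range, bits of storage
  print '(a, i3, i6, i5)', 'real64 : ', precision(d), range(d), storage_size(d)
  print '(a, i3, i6, i5)', 'real128: ', precision(q), range(q), storage_size(q)
end program kind_properties
```

With IEEE binary64 and binary128 representations this should report roughly 15 versus 33 decimal digits. Whether the extra digits are worth the cost depends on the calculation.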

mecej4
Honored Contributor III

One aspect of editing a working program that merits suspicion is that a mistaken change may have caused the main algorithm to change in complexity. The impaired algorithm still converges and gets home, but in a limping way.

For instance, if you have a quick-sort subroutine, an intended cosmetic change (moving comments, changing variable names, source format, etc.) may cause an unintended change to the algorithm. Instead of O(N lg N), the complexity becomes O(N^2). For N = 10^6, the sorting part of the program will run about 50,000 times slower.
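The slowdown figure follows from dividing the two operation counts: N^2 / (N lg N) = N / lg N. A small sketch of that arithmetic, ignoring constant factors (the program name is made up for illustration):

```fortran
program complexity_ratio
  implicit none
  real :: n, ratio

  n = 1.0e6
  ! N**2 / (N * log2(N)) simplifies to N / log2(N);
  ! log2(N) is computed as log(N) / log(2)
  ratio = n / (log(n) / log(2.0))
  print '(a, f10.1)', 'approximate slowdown factor: ', ratio
end program complexity_ratio
```

Since lg(10^6) is just under 20, the ratio comes out at roughly 50,000.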

If you want a more specific answer, you will need to show the code or a simplified version of it.

Steve_Lionel
Honored Contributor III

Or run the program under a profiler such as Intel VTune Amplifier XE and see where it is spending its time.

IanH
Honored Contributor II

The newer compiler does more runtime debugging checks.   If you want performance, turn off the runtime checks.

The editing of the source that you describe should not have been required.
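For reference, a minimal sketch of the two free-form continuation styles. In free-form source the trailing ampersand on the continued line is what marks the continuation, and a leading ampersand on the next line is optional. In fixed-form source, a continuation is marked instead by a character in column 6 of the continuation line, and Intel Fortran compiles fixed-form files as well. Which form the original CVF sources used is not stated in this thread, so this is an illustration only.

```fortran
program continuation_demo
  implicit none
  real :: a, b, c, total

  a = 1.0; b = 2.0; c = 3.0

  ! Style 1: ampersand only at the end of the continued line
  total = a + b + &
          c
  print *, total

  ! Style 2: trailing ampersand plus an optional leading ampersand
  ! on the continuation line
  total = a + b &
        & + c
  print *, total
end program continuation_demo
```

Both statements compute the same sum in free-form source.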

Guaglardi__Paul
Beginner

Yesterday, manipulating switches through the properties, I decided to go back to REAL*8, as the program was set in CVF. Sure enough, the execution time was reduced dramatically, from 24 minutes to 26 seconds. Optimization reduced the execution time to 15 seconds.  Apparently, holding the increased accuracy requires further calculations due to hardware characteristics.

mecej4
Honored Contributor III

Guaglardi, Paul wrote:
 Apparently, holding the increased accuracy requires further calculations due to hardware characteristics.

"Hardware characteristics" as in "It ain't there!". REAL*16 and COMPLEX*32 arithmetic has to be simulated in software using 64-bit floating point arithmetic, since there are almost no mainstream processors today that have 128-bit floating point arithmetic. Intel's software version of 128-bit floating point prioritizes precision over speed.

Don't use REAL*16 without a thorough assessment of whether it is needed and the performance hit that it entails.
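One way to start such an assessment is to time the same loop in both precisions. The following is a rough sketch, not a rigorous benchmark: the loop body and the iteration count `n` are arbitrary choices made for illustration, and the measured ratio will vary with compiler, options and hardware.

```fortran
program quad_vs_double_timing
  use, intrinsic :: iso_fortran_env, only: real64, real128, int64
  implicit none
  integer, parameter :: n = 5000000   ! hypothetical iteration count
  integer(int64) :: t0, t1, rate
  real(real64)   :: sd, seconds
  real(real128)  :: sq
  integer :: i

  call system_clock(count_rate=rate)

  ! Partial harmonic sum in double precision
  call system_clock(t0)
  sd = 0.0_real64
  do i = 1, n
    sd = sd + 1.0_real64 / real(i, real64)
  end do
  call system_clock(t1)
  seconds = real(t1 - t0, real64) / real(rate, real64)
  print '(a, f9.3, a, es24.16)', 'real64 : ', seconds, ' s, sum = ', sd

  ! Same sum in quad precision
  call system_clock(t0)
  sq = 0.0_real128
  do i = 1, n
    sq = sq + 1.0_real128 / real(i, real128)
  end do
  call system_clock(t1)
  seconds = real(t1 - t0, real64) / real(rate, real64)
  print '(a, f9.3, a, es42.33)', 'real128: ', seconds, ' s, sum = ', sq
end program quad_vs_double_timing
```

Printing the sums keeps the compiler from discarding the loops and also shows how far apart the two results are.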

jimdempseyatthecove
Honored Contributor III

Generally, one does not need REAL*16 throughout an entire program (if at all). Try restricting the use to where you absolutely require that precision.
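A minimal sketch of that idea, assuming (purely for illustration) that the precision-sensitive step is a long summation: the data stay in double precision and only the accumulator is quad precision. The array `x` and its contents are made up for the example.

```fortran
program mixed_precision_sum
  use, intrinsic :: iso_fortran_env, only: real64, real128
  implicit none
  integer, parameter :: n = 1000000   ! hypothetical problem size
  real(real64), allocatable :: x(:)
  real(real64)  :: sum_d
  real(real128) :: sum_q
  integer :: i

  ! Hypothetical input data, stored in double precision
  allocate(x(n))
  do i = 1, n
    x(i) = 1.0_real64 / real(i, real64)
  end do

  ! Everything in double precision
  sum_d = 0.0_real64
  do i = 1, n
    sum_d = sum_d + x(i)
  end do

  ! Data remain double precision; only the accumulator is quad precision
  sum_q = 0.0_real128
  do i = 1, n
    sum_q = sum_q + real(x(i), real128)
  end do

  print '(a, es24.16)', 'double accumulator: ', sum_d
  print '(a, es24.16)', 'quad accumulator  : ', real(sum_q, real64)
end program mixed_precision_sum
```

Only the additions in the second loop pay the quad-precision cost; the rest of the program runs at double-precision speed.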

Jim Dempsey

Bernard
Valued Contributor I

>>>Optimization reduced the execution time to 15 seconds.  Apparently, holding the increased accuracy requires further calculations due to hardware characteristics>>>

There is no hardware (CPU-register-based) acceleration for the REAL*16 primitive data type. When you choose this data type, all the relevant calculations are emulated in software, probably operating on stack-located variables, and that means a lot of load/store operations throughout the program's execution.

Bernard
Valued Contributor I

Steve Lionel (Ret.) wrote:

Or run the program under a profiler such as Intel VTune Amplifier XE and see where it is spending its time.

If REAL*16 is emulated in software, VTune will probably show a lot of time spent on load, store and floating-point arithmetic operations.

I'm not sure how ifort stores 128-bit numbers. Will it use the lower part(s) of a YMM/ZMM register to load just a single quad-precision number?

Steve_Lionel
Honored Contributor III

REAL(16) is not loaded into registers as such. The software library almost certainly does integer loads of the various parts.
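As an illustration of what "integer loads of the various parts" can mean at the storage level, the sketch below reinterprets the 16 bytes of a quad-precision value as two 64-bit integers. It only shows the memory layout; it says nothing about how the Intel library is actually written. It assumes `real128` is a 16-byte kind.

```fortran
program quad_as_integers
  use, intrinsic :: iso_fortran_env, only: real128, int64
  implicit none
  real(real128)  :: x
  integer(int64) :: parts(2)

  x = 1.0_real128
  ! Copy the bit pattern of x into two 64-bit integers (no numeric conversion)
  parts = transfer(x, parts)
  print '(a, z16.16)', 'word 1: ', parts(1)
  print '(a, z16.16)', 'word 2: ', parts(2)
end program quad_as_integers
```

On a little-endian x86-64 system with an IEEE binary128 representation, word 2 should hold the sign, the exponent and the top of the significand, and word 1 the low significand bits.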

Bernard
Valued Contributor I

So REAL*16 types are treated as double-double values and their decomposition is loaded into vector registers. One variable will occupy the lower parts of two registers.

Steve_Lionel
Honored Contributor III

No, they are not treated as “double double values”. I don’t know what the internals of the Intel quad-precision library look like, but I have extensive experience with DEC’s and I very much doubt vector registers are used at all, nor floating point instructions.
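One observable difference between a true quad-precision kind and a double-double pair is the exponent range. A minimal sketch, assuming `real128` maps to REAL(16) with an IEEE binary128 layout; the double-double line is computed from the `real64` properties for comparison and does not correspond to any type in the compiler.

```fortran
program quad_vs_double_double
  use, intrinsic :: iso_fortran_env, only: real64, real128
  implicit none

  ! Columns: significand bits, maximum binary exponent
  print '(a, i4, i7)', 'real64           : ', &
        digits(1.0_real64), maxexponent(1.0_real64)
  print '(a, i4, i7)', 'real128          : ', &
        digits(1.0_real128), maxexponent(1.0_real128)
  ! A double-double pairs two real64 values: roughly twice the significand
  ! bits of real64, but still the real64 exponent range
  print '(a, i4, i7)', 'double-double (~): ', &
        2 * digits(1.0_real64), maxexponent(1.0_real64)
end program quad_vs_double_double
```

If the `real128` line reports a larger maximum exponent than `real64`, the kind cannot be a plain pair of doubles.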
