Intel® Fortran Compiler

Subroutines run 3x slower when compiled with 12.1

John_B_10
Beginner

After upgrading from 11.1 to 12.1, our subroutines take 3x longer to execute. We have tried many different optimization settings but are unable to get our performance back.

What are we doing wrong?

TimP
Honored Contributor III

You may need to dig into optimization reports. From what you have given us, we don't know whether vectorization or parallelization plays a part, or whether you have shifted from a default IA32 x87 mode to a default SSE2 mode. In the latter case, mixing single, double and long double precision will pose a performance problem.

jimdempseyatthecove
Honored Contributor III

Have you by chance enabled index out of bounds runtime checks? And/or other runtime checks?

Jim Dempsey

John_B_10
Beginner

jimdempseyatthecove wrote:

Have you by chance enabled index out of bounds runtime checks? And/or other runtime checks?

Jim Dempsey

Thank you for this suggestion, but these checks are off; I also tried turning all checks off and it did not help.

Parallelization is set to NO, and the Par/Vec thresholds are 100.

I have compiled with /arch:IA32 and /QxHost; nothing seems to make any significant impact.

One big difference in our compiler settings in going from VS2003 + 11.1 to VS2010 + 12.1 is that we no longer have the single-threaded run-time library option. Could that matter?

Intel, any more suggestions?

andrew_4619
Honored Contributor II

What optimisation level is used? Posting a build log will show the options being used.

John_B_10
Beginner

Build log deleted, as it is no longer relevant in this topic... John B.

jimdempseyatthecove
Honored Contributor III

Add to options:

      /Qdiag-disable:8290

This should suppress the remarks about the edit descriptor width. As long as you don't have numbers with a "-" sign, you may be OK.

Jim Dempsey

Kevin_D_Intel
Employee

I’m sorry, I don’t have any immediate ideas. I don’t recall enough specifics going back to 12.1.5.344 to know what issues we may have seen with that particular package, with other 12.1 releases, or in moving to VS2010. I will ask some colleagues and look back at the period around that release.

Perhaps your question about loss of the single threaded option relates to a change Microsoft made in VS2008 mentioned here, https://software.intel.com/en-us/forums/intel-visual-fortran-compiler-for-windows/topic/293095#comment-1527437.

jimdempseyatthecove
Honored Contributor III

RE: 3x performance drop.

Can you use VTune, or otherwise identify a bottom-level subroutine that is 3x slower, and then paste at least its entry point and data declarations? This may help us identify something that is not obvious. Around V12 some changes were made relating to heap arrays (local arrays allocated on the heap).

Jim Dempsey

John_B_10
Beginner

I will look into VTune...

I did test VS2003 builds linked with the multi-threaded DLL vs. the single-threaded DLL run-time library, and the results match the original testing. So I think the compiler is not the issue here, but rather the single- vs. multi-threaded run-time library option.

Are there any settings or strategies we can use to compensate for the single-threaded option being deprecated by MS?

Martyn_C_Intel
Employee

Sorry, tried to post Friday, but it didn’t seem to ‘take’. However, that gave me time to check my old records in more detail.

I don’t know of any change between 11.1 and 12.1 likely to cause such a large change in compiled code. The compiler would not have been released if there was a big impact on performance of applications in general.

I also don’t believe there is a substantial performance difference between the single- and multi-threaded versions of the Microsoft C run-time library (or they wouldn’t have stopped shipping the single-threaded one). Note that here, “multi-threaded” does not mean that the library itself is threaded, only that it is safe to call it from a multi-threaded program; “single-threaded” means that it is not safe to call it from a multi-threaded program.

However, I do know of one area in which the multi-threaded versions of the Intel Fortran run-time library (libifcoremt or libifcoremd) are significantly slower than the single-threaded version (libifcore), even when called from unthreaded code, due to the protections for multithreading. This involves Fortran I/O, in particular formatted internal reads (reads from one variable into another of a different type). Intel Fortran continues to ship the single-threaded library libifcore, but if you build inside the Visual Studio 2010 IDE, it is the multithreaded version that gets linked by default. If you build from the command line, I believe you get the single-threaded version by default.
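To make "formatted internal read" concrete and to time it in isolation, here is a minimal Fortran sketch. The module internal_read_timing, the routines parse_record and time_internal_reads, and the sample record are hypothetical illustrations, not code from this thread. The assumption is that you build one small driver against libifcore.lib and again against libifcoremt.lib or libifcoremd.lib, then compare the two timings.

```fortran
! Hypothetical sketch (not from this thread): times a loop of formatted
! internal reads, the kind of I/O statement described above as slower in
! the thread-safe run-time libraries.
module internal_read_timing
  implicit none
  private
  public :: parse_record, time_internal_reads

contains

  ! Formatted internal read: the "file" is the character variable RECORD,
  ! converted into an INTEGER and a REAL via explicit edit descriptors.
  ! RECORD must be at least 12 characters long for this format.
  subroutine parse_record(record, n, x)
    character(len=*), intent(in) :: record
    integer, intent(out) :: n
    real, intent(out) :: x
    read (record, '(I4, F8.3)') n, x
  end subroutine parse_record

  ! Returns the wall-clock time in seconds for N_ITER calls to PARSE_RECORD.
  function time_internal_reads(n_iter) result(elapsed)
    integer, intent(in) :: n_iter
    real :: elapsed
    integer(8) :: t0, t1, rate, checksum
    integer :: i, n
    real :: x

    checksum = 0
    call system_clock(t0, rate)
    do i = 1, n_iter
      call parse_record('  42   3.500', n, x)
      checksum = checksum + n     ! consume the result of each read
    end do
    call system_clock(t1)

    ! Never true for realistic N_ITER; keeps CHECKSUM live.
    if (checksum < 0) print *, checksum

    if (rate > 0) then
      elapsed = real(t1 - t0) / real(rate)
    else
      elapsed = 0.0               ! no clock available on this system
    end if
  end function time_internal_reads

end module internal_read_timing
```

A driver only needs USE internal_read_timing and, for example, PRINT *, time_internal_reads(1000000). Absolute timings will vary by machine; the ratio between the two builds is what matters.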

 

To test whether this could be impacting your application, I suggest adding libifcore.lib to your linker command line explicitly inside Visual Studio 2010, and seeing whether this makes a difference compared to libifcoremt or libifcoremd. I don’t know how to do this with switches inside the IDE instead, but you could alternatively do something in source code in the main program:

USE IFCORE
INTEGER(4):: MODE
MODE = FOR_SET_REENTRANCY(FOR_K_REENTRANCY_NONE)

This initializes even the multithreaded Fortran library in single-threaded mode. To reset to multi-threaded (threadsafe mode), use

MODE = FOR_SET_REENTRANCY(FOR_K_REENTRANCY_THREADED)

If I read your last post right, you also saw a slowdown when linking the multithreaded libraries using VS2003. You could test whether setting FOR_K_REENTRANCY_NONE restores performance here, also. FOR_SET_REENTRANCY is documented in the Fortran compiler user guide.

Please let us know what you find, and especially if the slowdown seems to result from something other than an internal read, e.g. other forms of Fortran I/O.

John_B_10
Beginner

Martyn, thank you very much for your response; this was extremely helpful!

To clarify, adding libifcorertd.lib to the Linker additional dependency field in the project configuration has caused the program to run with the performance we had expected.

 

Thanks Again!

Martyn_C_Intel
Employee

That's great to hear.

If you know which parts of your program slowed down and then sped up, please let us know. That might help us to figure out where to look for improvements to the multithreaded versions of the library.

John_B_10
Beginner

I asked some of the gurus around here about this program's functions, and they said that the file I/O, which reads .dat files in as bytes and casts them to structures defined in include files, was affected. This was also true for reading in the parameters (integers, reals) and for checking and setting values in those structures.

He also reported these execution times (single-threaded vs. multi-threaded library):

Data loading and initialization went from 16 ms to 93 ms, calculation from 16 ms to 28 ms, and file save from 15 ms to 51 ms, so the total went from 47 ms to 172 ms.
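Working the reported phase timings through (derived only from the numbers in this post, not separately measured) shows that the two I/O-heavy phases, loading and saving, account for most of the increase:

```latex
\begin{aligned}
T_{\text{single}} &= 16 + 16 + 15 = 47\ \text{ms} \\
T_{\text{multi}}  &= 93 + 28 + 51 = 172\ \text{ms} \\
\frac{T_{\text{multi}}}{T_{\text{single}}} &= \frac{172}{47} \approx 3.66 \\
\Delta T &= 172 - 47 = 125\ \text{ms}
  = \underbrace{(93 - 16)}_{77\ \text{load}}
  + \underbrace{(28 - 16)}_{12\ \text{calc}}
  + \underbrace{(51 - 15)}_{36\ \text{save}}
\end{aligned}
```

Per phase, loading slowed by about 5.8x (93/16), saving by 3.4x (51/15) and calculation by 1.75x (28/16), consistent with the I/O-related explanation given earlier in the thread.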

Martyn_C_Intel
Employee

Thanks.
