Intel® Fortran Compiler

Slow performance using /fpe-all:0

Andrew_Smith
Valued Contributor I
I tried this on a large DLL and it now runs at least 10 times slower, but crikey, we are now getting our first ever floating-point error trapping in 10 years of using DVF, CVF and IVF!

Do I understand correctly that all subroutine and function calls are wrapped in set/unset calls for CPU error trapping? If I just set /fpe-all:0 on entry point routines would that help performance and still trap errors in the whole code?

Andy Smith
Steven_L_Intel1
Employee

You understand correctly. And yes, the setting of the CPU flags for error trapping is slow.
Andrew_Smith
Valued Contributor I

So if I only set the flag on the DLL entry point routines, then calls down into other functions in other source files would not require the /fpe-all:0 compile switch, since the trapping would still be in effect?

If so, then it would have been better for /fpe-all to wrap only the DLL's exported routines; the DLL would then run almost as fast as without /fpe-all.

Andy Smith
Steven_L_Intel1
Employee

It's not just for DLLs - could be static library calls too. I think that what you propose will work.
Ilie__Daniel
Beginner

This is a very interesting topic.
Is using /fpe-all:0 on the main program unit only (assuming there are no other procedures defined in that file) the same as setting /fpe:0 globally?

Does setting /fpe:3 also set /Qftz- (provided there is no specification for /Qftz[-])? This is a bit confusing as the default for IA-32 is /Qftz.

Daniel.
TimP
Honored Contributor III
Quoting - Daniel I.
the default for IA-32 is /Qftz.

I'm confused about what you might mean here. /arch:ia32 uses x87 instructions, where ftz settings have no effect. Among options which include the effect of /Qftz- are /fpe-all:3 (without setting of -O2 or -O3!) and /fp:source. /Qftz may be set after options which affect it (it takes effect only when set for the main program), or you may set the mode in code:
USE ieee_arithmetic
call ieee_set_underflow_mode(.false.)
So it seems useful to use one of these explicit methods for specifying underflow mode along with any flags which affect it.
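
As a minimal sketch of the explicit method mentioned above, the following standard Fortran 2003 program queries and then changes the underflow mode. The program name and variable names are made up for illustration. It assumes the processor supports underflow control for default real, which the code checks first.

```fortran
program underflow_sketch
  use, intrinsic :: ieee_arithmetic
  implicit none
  logical :: gradual
  real    :: x   ! used only as a kind selector for the inquiry below

  if (ieee_support_underflow_control(x)) then
     ! Report the mode in effect on entry (set by compiler options or defaults).
     call ieee_get_underflow_mode(gradual)
     print *, 'gradual underflow on entry: ', gradual

     ! .false. requests abrupt underflow (flush to zero);
     ! .true. would request gradual underflow (denormals kept).
     call ieee_set_underflow_mode(.false.)

     call ieee_get_underflow_mode(gradual)
     print *, 'gradual underflow now:      ', gradual
  else
     print *, 'underflow mode cannot be changed for default real here'
  end if
end program underflow_sketch
```

Setting the mode this way makes the intent visible in the source rather than leaving it to whichever combination of compiler options happens to be in effect.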
Ilie__Daniel
Beginner

I was referring to the IA-32 platform, rather than the IA32 instruction set. I was trying to understand how the various options get set. I did not know that FTZ has no effect with /arch:ia32.

Thank you for answering my second question. Any thoughts on the first one?
Lorri_M_Intel
Employee

I don't think it's safe to assume that setting fpe-all:0 on a file that contains only a main program will give you the same results as setting fpe:0.

If you want fpe:0, please set it explicitly. That would just be safer.

BTW, setting fpe:0 affects only the main program, not any subroutines.

- Lorri


Ilie__Daniel
Beginner

Lorri,

Thank you for your reply.

From what you are saying, setting /fpe:0 affects only the main program, not any subroutines.
From what Steve replied (Reply #1), setting /fpe-all:0 on entry routines in a DLL also traps errors in the child code in the DLL.

It would seem, then, that by setting /fpe-all:0 just for the file that contains the main program, I could trap errors in the whole program.

Am I right?

Daniel.

Lorri_M_Intel
Employee

It's mid-afternoon here, and perhaps I need another cup of coffee.

I'm sorry Daniel, I'm not sure how to answer your question with a Yes or No, so let me give the longer explanation.

If you want fpe-all:xx to apply to a particular routine - such as the entry point to a DLL - you need to use the switch when you compile that routine. This is because we generate code within that routine that (basically) does a "push the current FPE state; change it to the new value" at the top of the routine, and a "pop the saved FPE value" at the end of the routine.

This means the new Floating Point Exception state is valid for any calls made by that routine in between the push/pop calls. That's why Steve recommended it at the global entry points to your DLL rather than for all subroutines in your DLL. And that's why setting fpe-all when you built your whole application had such an awful effect on performance: inside each subroutine called was the extra overhead of a push/set/pop.

If you have a main routine that only calls the global entry points in your DLL, compiling it with fpe-all will have no effect. Setting fpe:0 will have an effect, because that applies to the main routine.

Did that make sense?

- Lorri
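
The push/set/pop pattern described above can be imitated by hand in standard Fortran 2003, which may help in picturing what the switch adds to each routine. This is only a sketch and not the code the compiler actually generates. The module and routine names (fpe_sketch, entry_point, worker) are hypothetical, and it assumes the processor supports halting control for the ieee_usual exceptions (invalid, divide-by-zero, overflow).

```fortran
! Hypothetical illustration only: not the code the compiler actually generates.
module fpe_sketch
  use, intrinsic :: ieee_exceptions
  implicit none
contains

  ! Stands in for a DLL entry point compiled with /fpe-all:0.
  subroutine entry_point(x, y)
    real, intent(in)  :: x
    real, intent(out) :: y
    logical :: saved(size(ieee_usual))

    ! "push": remember the caller's halting modes
    call ieee_get_halting_mode(ieee_usual, saved)
    ! "set": enable trapping for this routine and everything it calls
    call ieee_set_halting_mode(ieee_usual, .true.)

    call worker(x, y)

    ! "pop": restore the caller's halting modes
    call ieee_set_halting_mode(ieee_usual, saved)
  end subroutine entry_point

  ! Stands in for a lower-level routine compiled without /fpe-all.
  ! It has no push/set/pop of its own, so it adds no overhead, yet it
  ! runs with trapping enabled because its caller set the mode.
  subroutine worker(x, y)
    real, intent(in)  :: x
    real, intent(out) :: y
    y = sqrt(x)
  end subroutine worker

end module fpe_sketch

program demo
  use fpe_sketch
  implicit none
  real :: y
  call entry_point(4.0, y)
  print *, y
end program demo
```

With halting enabled, passing a negative x would be expected to raise the invalid exception inside worker, even though worker itself contains no trapping code. Repeating the three calls in every routine is the per-call overhead discussed above; confining them to the entry points pays that cost once per external call.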


Ilie__Daniel
Beginner
Lorri,

I'm switching to black coffee myself.

Yes, that makes sense. Thank you for your time with this. Great software support as always!

Daniel.