Re: Quad Precision vs. Double Precision Problem

wsec303 · ‎01-14-2009

Hi, everyone,

I have a double-precision code, which I am sure it is right. while I upgrade it to quad-precision, the output are wild, What is wrong? Any idea?

Steven_L_Intel1 · ‎01-14-2009

Without seeing an example, my first guess is that you have an argument type mismatch when calling a procedure. Try building with "-gen-interface -warn interface".

wsec303 · ‎01-14-2009

Hi, Steve,

I tried, but the output is still bad.

My compile options is

ifort -O0 -g -fp-model strict -real-size 64 -integer-size 16 -mcmodel=medium -shared-intel -gen-interface -warn interface a.f90

Include the interface option, as you advised.

And in terminal, I set

ulimit -s 1024000
export KMP_STACKSIZE='1024000K'

By the way, all the math functions in the quad-precion code, use quad-precision, such as qsqrt, instead of dsqrt.

Quoting - Steve Lionel (Intel)

Without seeing an example, my first guess is that you have an argument type mismatch when calling a procedure. Try building with "-gen-interface -warn interface".

Steven_L_Intel1 · ‎01-14-2009

You should be using the generic sqrt instead. Specific intrinsic names are for those stuck in Fortran 66.

Care to show an example?

TimP · ‎01-14-2009

I'd be a little worried about integer-size 16 with real-size 64, although it ought to be OK.

wsec303 · ‎01-14-2009

All the functions specified for quad-precision, were changed to the general ones, but still keep quad-precision specifications for real numbers, such as 1.0Q0

Unfortunately, I am lazy, use

implicit real(kind=16) (a-h,o-z), integer(kind=8) (i-n)

at the beginning of every subroutine. Will that be a cause?

Now compiler with

ifort -O0 -real-size 128 -intger-size 64 -zero -mcmodel=large -shared-intel a.f90 as suggested by Tim.

But Still Not works.

God, it has been two weeks for this bug.

Quoting - Steve Lionel (Intel)

You should be using the generic sqrt instead. Specific intrinsic names are for those stuck in Fortran 66.

Care to show an example?

Steven_L_Intel1 · ‎01-15-2009

Please provide us a test case so that we can see what the problem is.

wsec303 · ‎01-15-2009

How to provide a test case?

What kind of information needed?

Quoting - Steve Lionel (Intel)

Please provide us a test case so that we can see what the problem is.

Ron_Green · ‎01-15-2009

Quoting - wsec303

Steve has the instructions at the top of of the Forum, it is here:

http://software.intel.com/en-us/forums/showannouncement.php?a=79

and upload a.f90 and tell us how to run it, and what you expect the output to be.

OR you can open a problem report at http://premier.intel.com and upload the file there, if you don't want the whole world to see it.

ron

wsec303 · ‎01-15-2009

Thanks, but it is required to keep the code confidential. No offend.

One issue, that just jumps out of my head, is the double precision code made some modifications to avoid overflow.
The modifications are something like when the real number reaches the limit, then give the real number a specific value that below the limit. Will that be the cause?

Or the problem is caused by the accumulation of round-off errors?

How to debug efficiently using idb, to compare the double-precision code running results with the quad-precision code running results, to digg out where the problem is?

The code consists of a lot of subroutines, so if you can tell me how to debug efficiently and speculate the running outputs, maybe help a lot.

Quoting - Ronald Green (Intel)

Steve has the instructions at the top of of the Forum, it is here:

http://software.intel.com/en-us/forums/showannouncement.php?a=79

and upload a.f90 and tell us how to run it, and what you expect the output to be.

OR you can open a problem report at http://premier.intel.com and upload the file there, if you don't want the whole world to see it.

ron

Steven_L_Intel1 · ‎01-15-2009

What I usually do is add WRITE statements in various places that write out intermediate values using the same format. I'll also add some sort of tag in the line so I can tell where it's coming from. I then run the program the "old" way and then the "new" way and compare the output files to see where the results start to diverge.

I usually need to refine this - once I've identified a section of code that creates the difference, I start writing out more detail until I find the specific operatiion and values that is the culprit. Then and only then can I determine what the problem is. I find this method easier than using the debugger.

If you can't provide us the code, then you'll have to do this on your own. There is no substitute for analysis.

Note that if you have a license which entitles you to support, your submissions to Intel Premier Support are kept confidential. Whatever happens, please let us know how it turns out.

TimP · ‎01-17-2009

Yes, it would be perplexing as to how your code could ever have worked. You imply taking log() of a negative number. If you meant abs() or some more careful treatment, you needed to write that in your source code.

Steven_L_Intel1 · ‎01-17-2009

I agree with Tim - it is a mystery how this code worked before. Did you have it working in quad precision with the other compilers? Does the double-precision version also try to take logs of negative numbers? It could be that you're ending up with NaNs that are propagating or being dropped later.

wsec303 · ‎01-17-2009

Sorry. I provided wrong information.

The bug should be in the code below.

q=cmplx(-1.1,0.0)
q2=q**(0.25)

For quad-precison, the output of q2 is NaN, but db-precision, it works.

I really don't know why.

My solution to this bug is rewritting the code as

q=dcmplx(-1.1,0.0)
q2=qcmplx(q**0.25d0)

Hirchert__Kurt_W · ‎01-18-2009

Quoting - wsec303

Sorry. I provided wrong information.

The bug should be in the code below.

q=cmplx(-1.1,0.0)
q2=q**(0.25)

For quad-precison, the output of q2 is NaN, but db-precision, it works.

I really don't know why.

My solution to this bug is rewritting the code as

q=dcmplx(-1.1,0.0)
q2=qcmplx(q**0.25d0)

This explains why your program has worked in the past -- you weren't really taking a negative number to a real power, just a complex number that looks something like a negative number.

1. If the statements above do what you say in a small (<10 line) program, you can submit such a program in a bug report to Intel without compromising the confidentiality of the original code.

2. Another alternative you could try, either as a workaround or to further study the bug, is

q=cmplx(-1.1,0.0)
q2=exp(0.25*log(q))

This latter expression is what q**(0.25) should translate into, but by writing it explicitly, you may either get the right answer, or be able to determine whether it is the log or the exp that is producing the NaN.

3. If your actual code really looks like these snippets, I should point out that although you are computing in quad or double precision, you are only producing results correct to single precision levels, for a couple of reasons:

a. 1.1 is a single precision constant. Converting it to quad precision later doesn't eliminate the error inherent in its value. If you are using a constant, you need to say something like -1.1q0 or -1.1_quad (provided you define quad to be a constant with the KIND value for quad procision).

b. Unless you specify the optional KIND argument, the cmplx intrinsic only produces a single precision result. Instead of what you have written, you might want to do something like

q=cmplx(-1.1q0,0.0q0,kind=kind(q))

c. I note in passing that nominally you have the same problem with 0.25 as you have with 1.1. In practice, however, 0.25 is exact in binary and thus can be converted between single precision and quad precision without loss of accuracy, unlike 1.1 which can only be approximated in binary and thus has less accuracy in single precision than quad precision.

4. If q really is defined by a constant, I would use a literal complex constant such as (-1.1q0,0.0q0) rather than using the cmplx intrinsic. [I would do this both for better readability and because some compilers might actually be stupid enough to execute the cmplx at run-time instead of at compile-time.]

If it is not constant, but the imaginary part really always is zero, I would omit the second argument rather trying to provide an explicit zero of the appropriate kind. [This would primarily be for readability, but also to avoid the opportunity for errors by me or the compiler in the interpretation of an explicit imaginary part.]

-Kurt

TimP · ‎01-18-2009

Quoting - hirchert

q=cmplx(-1.1,0.0)
q2=exp(0.25*log(q))

-Kurt

Kurt's concept makes sense, but in reality the ** operator ought to take more precautions to preserve accuracy and range than this replacement. If the suggestion substitution goes wrong, it might lead one to suspect a problem there.
As Kurt mentions, it is somewhat curious that you would use single precision constants in quad arithmetic. By invoking several direct conversions from single to quad precision, you may have found something which hasn't been tested adequately.
If the argument were not complex quad, sqrt(sqrt(q)) could be expected to be more accurate and/or faster than q**.25. If it's important, it should be tested for this case.

wsec303 · ‎01-19-2009

Quoting - hirchert

This explains why your program has worked in the past -- you weren't really taking a negative number to a real power, just a complex number that looks something like a negative number.

1. If the statements above do what you say in a small (<10 line) program, you can submit such a program in a bug report to Intel without compromising the confidentiality of the original code.

2. Another alternative you could try, either as a workaround or to further study the bug, is

q=cmplx(-1.1,0.0)
q2=exp(0.25*log(q))

This latter expression is what q**(0.25) should translate into, but by writing it explicitly, you may either get the right answer, or be able to determine whether it is the log or the exp that is producing the NaN.

3. If your actual code really looks like these snippets, I should point out that although you are computing in quad or double precision, you are only producing results correct to single precision levels, for a couple of reasons:

a. 1.1 is a single precision constant. Converting it to quad precision later doesn't eliminate the error inherent in its value. If you are using a constant, you need to say something like -1.1q0 or -1.1_quad (provided you define quad to be a constant with the KIND value for quad procision).

b. Unless you specify the optional KIND argument, the cmplx intrinsic only produces a single precision result. Instead of what you have written, you might want to do something like

q=cmplx(-1.1q0,0.0q0,kind=kind(q))

c. I note in passing that nominally you have the same problem with 0.25 as you have with 1.1. In practice, however, 0.25 is exact in binary and thus can be converted between single precision and quad precision without loss of accuracy, unlike 1.1 which can only be approximated in binary and thus has less accuracy in single precision than quad precision.

4. If q really is defined by a constant, I would use a literal complex constant such as (-1.1q0,0.0q0) rather than using the cmplx intrinsic. [I would do this both for better readability and because some compilers might actually be stupid enough to execute the cmplx at run-time instead of at compile-time.]

If it is not constant, but the imaginary part really always is zero, I would omit the second argument rather trying to provide an explicit zero of the appropriate kind. [This would primarily be for readability, but also to avoid the opportunity for errors by me or the compiler in the interpretation of an explicit imaginary part.]

-Kurt

Thanks. You advise really helps.
One quick question: where you learn that A**b, is translated into exp(b*log(A)) in Ifort.
I am not questioning that, just want to know where you find this?

Hirchert__Kurt_W · ‎01-20-2009

Quoting - wsec303

Quoting - hirchert

2. Another alternative you could try, either as a workaround or to further study the bug, is

q=cmplx(-1.1,0.0)
q2=exp(0.25*log(q))

This latter expression is what q**(0.25) should translate into, but by writing it explicitly, you may either get the right answer, or be able to determine whether it is the log or the exp that is producing the NaN.

-Kurt

Thanks. You advise really helps.
One quick question: where you learn that A**b, is translated into exp(b*log(A)) in Ifort.
I am not questioning that, just want to know where you find this?

I was making an assertion about mathematics, not ifort specifically, so we're talking about something I learned in high school. 40+ years later, I don't remember exactly where, but I suspect it may have been in my calculus class.

It is my experience that for non-integer b, every Fortran compiler I've ever used computes A**b effectively this way. Some have literally used the log and exp intrinsics; others used slightly tweaked equivalents to improve speed, accuracy, and/or range. [E.g., because the internal arithmetic of the computer is done base 2, the actual computation may be a bit of a cross between exp(b*log(A)) and 2**(b*log2(A)).] I have no specific knowledge of what ifort uses, so my suggestion allowed that this workaround might either give you the right answer (if the bug was entirely in a tweaked implementation of A**b) or it might provide insight into which half of the computation had the bug (if ifort actually uses the exp and log intrinsics or if the code in its A**b implementation is copied the buggy code from one of them).