Intel® Fortran Compiler

How to convert real(8) data to real(16)?

Zhanghong_T_
Novice

Hi all,

I tried to assign a variable defined as real(8) to another one defined as real(16) just by

b=a

However, it is not what I expected. For example,

a=2.48740685923698

then

b=2.48740685923698290338279548450373

However, I want b to be equal to

2.48740685923698000000000000000000

exactly.

Could anyone tell me how to assign the value of a to b?

Thanks,

Zhanghong Tang

22 Replies
TimP
Honored Contributor III
What you appear to be asking is impossible. You could come close by converting a to a character string and setting b from the character string, e.g. by using internal WRITE and READ.
The simple assignment of a REAL(8) to a REAL(16) makes the binary values exactly equal.
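
A minimal sketch of the internal WRITE and READ route described above. It assumes the compiler offers a 128-bit real through selected_real_kind(33) and that the 15 significant decimal digits of a are the ones wanted; the names dp, qp, buf, b_binary and b_decimal are illustrative only.

```fortran
program dp_to_qp_via_string
  implicit none
  integer, parameter :: dp = selected_real_kind(15)   ! real(8) on most compilers
  integer, parameter :: qp = selected_real_kind(33)   ! real(16) where available (assumption)
  real(dp) :: a
  real(qp) :: b_binary, b_decimal
  character(len=24) :: buf

  a = 2.48740685923698_dp

  ! Plain assignment: b_binary holds exactly the same binary value as a.
  b_binary = a

  ! Internal WRITE: render a with 15 significant decimal digits.
  write (buf, '(ES24.14)') a
  ! Internal READ: parse that decimal string directly as a quad precision value.
  read (buf, *) b_decimal

  print '(A,F36.32)', 'binary copy : ', b_binary
  print '(A,F36.32)', 'decimal copy: ', b_decimal
end program dp_to_qp_via_string
```

The decimal copy is still only the nearest real(16) value to the decimal string, so it is a closer approximation rather than an exact one.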
Steven_L_Intel1
Employee
As Tim says, you got the exact REAL(16) equivalent to the REAL(8) value, which is an approximation to the decimal value. Read the three-part article The Perils of Real Numbers in the Newsletter thread for further background.
Zhanghong_T_
Novice

Thank you both very much!

The story comes from the following calculation:

I want to solve the matrix equation A*x=b.

I get two approximate solutions, x and x', which satisfy the following equations:

b-A*x=r, ||r||<eps1*||b||

r-A*x'=r', ||r'||<eps2*||r||

Then I think the solution

xnew=x+x'

is the higher precision solution since

b-A*(x+x')=r' and

||r'||<eps1*eps2*||b||

However, when I calculate the value ||b-A*xnew||, it is still about eps1*||b||, not the eps1*eps2*||b|| I hoped for. Furthermore, I verified two things:

||b-A*x-A*x'|| is about eps1*eps2*||b||

||b-A*x'-A*x|| is about eps1*||b||

So I suspect there is some error when I calculate ||b-A*x|| with the real(8) data type. I changed the data to real(16) to calculate ||b-A*x|| after I got the solutions x and x' (they are still calculated with the real(8) data type), but the result is still not improved. Then I found that when I assign real(8) data to real(16) data, they are 'not equal'...

Do you think it is possible to get a higher precision with the current data type?

Thanks,

Zhanghong Tang

John4
Valued Contributor I
Trying to solve your problem by getting a higher precision might not work ---i.e., I bet it won't work, unless you have a special problem that cannot be scaled... and if that's the case, the higher precision should have been a must in the problem, and not just a decision making tool.

Having some noise in your data is very common, and you should try to solve the problem in the way everybody else does: Numerical Analysis. Get an estimate of the condition number, try least squares, perform singular value decomposition, etc... And if all else fails, try some iterative methods (GMRES comes to my mind as a last resort)... But you've already tried all that, haven't you?

John.
Zhanghong_T_
Novice

Hi John,

Thank you very much for such a detailed reply.

On the contrary, for such a problem, some iterative methods can reach a higher precision solution, i.e., ||b-A*x|| can be reduced to eps1*eps2*||b||, except that they need thousands of iterations, which is too slow to be acceptable.

Then I tried the AMG method. However, I found that after the residual has reached a small value, such as eps1*||b||, the convergence becomes very slow or even diverges. So I solved the residual equation again and got another approximate solution, just as I described previously. I think the solution x+x' has the higher precision, but the result is not what I wanted.

Do you have any thoughts on this method? Do you think there is something wrong in my previous formula?

Thanks,

Zhanghong Tang

John4
Valued Contributor I
The multigrid method should have worked in your case (i.e., slow convergence rate), but if you mentioned that it may diverge, maybe your problem is in the statement itself ---initial and/or boundary conditions.

You said that you obtained two solutions for your problem... Are those solutions being obtained for the same region (e.g., with similar initial guesses)?

Also, combining AMG with another iterative method could help: For example, a few hundred iterations with a preconditioned CG method (e.g., Jacobi) to obtain an initial guess for your MG method.

And, don't try x+x' as your solution only because you think so, unless you have a good explanation for that. The solution might indeed be the linear combination c*x+d*x', but here you're just picking c=d=1 automatically.

John.
TimP
Honored Contributor III
If you want to use "iterative improvement" to get a more accurate solution than you achieved with double precision, you require (according to my 30 year old textbook) use of a real(16) copy of your matrix A to calculate the residual errors using all real(16) arithmetic. If your need for additional precision is due to ill conditioning, rounding your matrix to real(8) values could be the source of the problem.
It's likely to be more practical to work on improving the accuracy of your original solution. Steps in that direction include:
1) LU factorization using vectorized dot products, depending on improved ordering and batching of sums to increase effective precision as well as optimizing speed
2) LU factorization using x87 extended precision dot products (you must set precision mode to 64 bits, if using a compiler or OS which sets 53-bit precision mode).
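
For illustration, the "iterative improvement" idea can be sketched on a tiny dense system. This is only a sketch under assumptions: the 3x3 test matrix, the right-hand side, the helper routine solve (plain Gaussian elimination with partial pivoting) and the kind names dp and qp are all hypothetical, and the matrix here is taken to be exactly its real(8) values. The point is that the residual b-A*x is formed entirely in real(16) before the correction is solved for in real(8).

```fortran
program refine_demo
  implicit none
  integer, parameter :: dp = selected_real_kind(15)   ! working precision
  integer, parameter :: qp = selected_real_kind(33)   ! residual precision (assumption)
  integer, parameter :: n = 3
  real(dp) :: a(n,n), b(n), x(n), dx(n)
  real(qp) :: r(n)
  integer :: i, j, it

  ! Hypothetical test data: a small, mildly ill-conditioned matrix.
  do j = 1, n
    do i = 1, n
      a(i,j) = 1.0_dp / real(i + j - 1, dp)
    end do
  end do
  b = 1.0_dp

  call solve(a, b, x)                       ! initial real(8) solution
  do it = 1, 3
    ! Residual computed with all operands promoted to real(16).
    r = real(b, qp) - matmul(real(a, qp), real(x, qp))
    print '(A,I2,A,ES10.2)', 'pass ', it, '  residual norm: ', sqrt(sum(r*r))
    call solve(a, real(r, dp), dx)          ! correction from A*dx = r
    x = x + dx
  end do

contains

  ! Gaussian elimination with partial pivoting on an augmented matrix.
  subroutine solve(amat, rhs, sol)
    real(dp), intent(in)  :: amat(:,:), rhs(:)
    real(dp), intent(out) :: sol(:)
    real(dp) :: m(size(rhs), size(rhs)+1), tmp(size(rhs)+1)
    integer :: k, p, row, nn
    nn = size(rhs)
    m(:, 1:nn) = amat
    m(:, nn+1) = rhs
    do k = 1, nn
      p = k - 1 + maxloc(abs(m(k:nn, k)), dim=1)   ! pivot row
      tmp = m(k,:)
      m(k,:) = m(p,:)
      m(p,:) = tmp
      do row = k + 1, nn
        m(row,:) = m(row,:) - (m(row,k) / m(k,k)) * m(k,:)
      end do
    end do
    do k = nn, 1, -1                                ! back substitution
      sol(k) = (m(k,nn+1) - dot_product(m(k,k+1:nn), sol(k+1:nn))) / m(k,k)
    end do
  end subroutine solve

end program refine_demo
```

Note that x itself is still stored in real(8), so the attainable residual is limited by how accurately x can be represented, not only by how accurately the residual is computed.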
Zhanghong_T_
Novice

Hi John,

Thank you again for your answer.

1. Both solutions are obtained from the same matrix A, starting from a ZERO initial guess; only the right-hand sides differ.

2. I also tried combining AMG with PCG, CGS and so on; the results are almost the same. The higher precision result is obtained from PCG with the ILU(0) preconditioner.

3. Mathematically, c and d are both 1. Do you think it is possible to get a higher precision solution from x and x', including by a linear combination? If so, how would the linear combination be explained?

Thanks,

Zhanghong Tang

Zhanghong_T_
Novice

Hi Tim,

Thank you very much for your reply.

Firstly, I think it's hard to further improve the original solution x, so I tried to find another solution based on the residual equation.

I do not fully understand your suggestion about using vectorized dot products. Currently, most of the computation is matrix-vector products and vector-vector dot products, and I use MKL's sparse matrix functions for them. The only part I coded myself is the matrix-matrix-matrix multiplication, since I have not found any function for it.

In your second suggestion, what do you mean by setting the precision mode? Can I set it directly in the project and run without changing any code? My environment is IVF10+MKL9.1+VS.net2005 on Windows XP.

Thanks,

Zhanghong Tang

TimP
Honored Contributor III
MKL would be coded for efficiency with SSE2, so the x87 precision mask setting would have no effect. If you compile your own code without SSE2, you could set 64-bit precision mode either by making a C function to reset the precision mode, using fldcw instruction or the Microsoft macro, by use of the CVF defined system call, or, according to current ifort documentation, by compiling with /pc80 set in your project. /pc80 would change the precision initialization in the main program compilation by the 32-bit compiler, so it would not be affected by the option used for compiling called functions. 64-bit Intel Fortran has no option to generate x87 code.
64-bit precision mode would enable more accurate double precision dot products, as a less expensive step than real(16). Changing precision mode presumably violates the calling convention for Microsoft library functions. Given the trend away from use of 32-bit compilers and x87 floating point, these possibilities may not be worth consideration.
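
As a rough illustration of what a wider accumulator buys in a dot product, here is a sketch that sums the same products once in real(8) and once in a wider kind. It is a software stand-in, not the x87 precision-mode mechanism itself: the kind wp (here real(16) via selected_real_kind(33)), the vector length and the random test data are all assumptions made for the example.

```fortran
program dot_accum
  implicit none
  integer, parameter :: dp = selected_real_kind(15)
  integer, parameter :: wp = selected_real_kind(33)   ! wider accumulator (assumption)
  integer, parameter :: n = 100000
  real(dp) :: x(n), y(n), s_dp
  real(wp) :: s_wp
  integer :: i

  ! Hypothetical test vectors with mixed signs, so that cancellation occurs.
  call random_number(x)
  call random_number(y)
  x = x - 0.5_dp
  y = y - 0.5_dp

  s_dp = 0.0_dp
  s_wp = 0.0_wp
  do i = 1, n
    s_dp = s_dp + x(i) * y(i)                        ! products and sum in real(8)
    s_wp = s_wp + real(x(i), wp) * real(y(i), wp)    ! products and sum in wider kind
  end do

  print '(A,ES24.16)', 'real(8) accumulation: ', s_dp
  print '(A,ES24.16)', 'wide accumulation   : ', real(s_wp, dp)
  print '(A,ES10.2)',  'difference          : ', abs(real(s_dp, wp) - s_wp)
end program dot_accum
```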
Zhanghong_T_
Novice

Hi Tim,

Thank you very much for your kind reply. But could you please tell me where to set the option /pc80? I set the option and the following messages appear:

ifort: command line warning #10006: ignoring unknown option '/pc80'

LINK : warning LNK4044: unrecognized option '/pc80'; ignored

In addition, do you mean that if this option is set, the program will run in 64-bit precision mode without any change to the code? Does the 64-bit precision mode only improve the accuracy of the dot products, or of all operations?

Thanks,

Zhanghong Tang

TimP
Honored Contributor III
Sorry; for Windows, as you will see in the ifort help menus, it's /Qpc80 .
John4
Valued Contributor I
If your b vectors are different, then you're obtaining two different solutions for two different problems ---both problems share the same model, which is represented by A, but that's it. You shouldn't try to mix the two solutions obtained... It's like having the equation x^2 = y, and trying to combine the solutions obtained for y=4 and y=9; using xnew=2+3 requires y=25 (yet another totally different problem).

In this case x=2 and x=3 are prime numbers (sort of "linearly independent"). Can you guarantee that your x and x' are linearly independent?

Maybe you should just forget about the idea of combining x and x', and stick to Tim's suggestions.

One more thing: Have you tried with b, instead of ZERO, as your initial guess?

John.
Zhanghong_T_
Novice

Hi John,

Thanks again for your kind reply.

In my case the problem to be solved is a linear system, so I can say that xnew=x+x'. Do you mean that if the data are near machine precision, the problem could become nonlinear? That would be terrible; I hope it doesn't.

I have tried with b, of course the solution is the same, because everything is the same, the solver, the input A and b, and the initial guess.

Thanks,

Zhanghong Tang

Zhanghong_T_
Novice

Hi Tim,

Thanks for your help! I can build the program with the option /Qpc80 (set in both the .cfg file and the project settings), but it didn't improve the result.

Are there any other special settings? How can I see the difference with and without the option?

Thanks,

Zhanghong Tang

John4
Valued Contributor I
No, linear dependency and "non-linear model" are different things. Your x and x' vectors are linearly independent if the relation x = k * x' is NEVER true (for any scalar k).

But again, it would be better for you to abandon the idea of trying to combine solutions to different data... And maybe rechecking the statement of your problem (since precision and faster solvers didn't seem to help).

John.
g_f_thomas
Beginner

That's not surprising. The processor has a limited number of registers for doing arithmetic in 80 bits, so variables are sent to memory, rounded to double (64 bits), and rounded again when they are refetched. You have little control over this activity. The 2 extra bytes extend the precision and range so as to safeguard against under/overflow in intermediate calculations. As this is done in hardware, it is quite fast relative to multiple precision arithmetic.

I agree with John's suggestion that you revisit the formulation of the problem which is presumably physically based and look at conditioning considerations. If it's badly conditioned then this isn't about to change by applying more and more resources at it.

Have you considered using interval or stochastic arithmetical techniques? Also, try posting to the sci.math.numerical-analysis forum for further advice.

Gerry

davidspurr
Beginner
Back to the OP.

re: "However, I wish b be equal to 2.48740685923698000000000000000000 exactly."
& "What you appear to be asking is impossible"


At least for IVF10, would something along the lines of the following not achieve the original intent (albeit "non-portable")?

b = 1.Q-14 * QEXT ( INT( a * 1.d14 ) )

Edit: Or perhaps that needs to be

b = 1.Q-14 * QEXT ( INT( a * 1.d14, 8 ) )
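
A standard-conforming variant of the same scaling idea, as a sketch only. It assumes that a carries at most 14 decimal places worth keeping and that the scaled value fits in a 64-bit integer; it rounds with NINT instead of truncating with INT, and divides by 1e14 (exactly representable in quad precision) instead of multiplying by 1.Q-14 (which is not). The kind names dp, qp and i8 are illustrative.

```fortran
program scale_round
  implicit none
  integer, parameter :: dp = selected_real_kind(15)
  integer, parameter :: qp = selected_real_kind(33)   ! real(16) where available (assumption)
  integer, parameter :: i8 = selected_int_kind(18)    ! 64-bit integer
  real(dp) :: a
  real(qp) :: b

  a = 2.48740685923698_dp

  ! Scale to an integer, rounding to nearest so a value such as
  ! ...697.99 does not get truncated to ...697.
  ! Then convert the integer exactly to quad precision and undo the scaling there.
  b = real(nint(a * 1.0e14_dp, kind=i8), qp) / 1.0e14_qp

  print '(F36.32)', b
end program scale_round
```

The result is the real(16) value nearest to the 15-digit decimal, which is as close to "exactly" as a binary format allows.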

David
Zhanghong_T_
Novice

Thank you all very much!

I will study my problem again carefully and then report the results to you. My problem is: for a given large sparse matrix (more than 1 million unknowns), none of the AMG preconditioned iterative methods can reach a given precision (for example, 1.e-10), but some ILU(0) preconditioned iterative methods can still reach that precision after thousands of iterations. I don't know what leads to this problem.

Gerry, you mentioned 'interval or stochastic arithmetical techniques'. Where can I find an introduction to them? Would they help solve my problem? By the way, I can't open the forum you mentioned.

Thanks,

Zhanghong Tang

John4
Valued Contributor I
You have a system with millions of unknowns, and you expect it to be solved with less than a thousand iterations? Keep in mind that MG methods only guarantee that the number of iterations required for the *worst case* is equivalent to the number of unknowns.

If you're lucky and Gauss-Seidel requires only one million iterations, then you shouldn't complain if the MG-ILU method requires something between 10,000-100,000 iterations (that's 1-10% of the iterations required by GS).

MG is one of the fastest methods available... But don't expect miracles.

John.