1. How is 128-bit floating-point arithmetic, i.e. REAL*16 (KIND=16), implemented in version 9.1 of FORTRAN, and what are the performance issues?
2. Which Intel processor gives the best performance for REAL*16 calculations?
3. Can REAL*16 calculations be vectorized for enhanced performance when solving linear and non-linear systems, e.g. with Newton-Raphson solvers?
Operations on REAL*16 are done through library calls. These operations are not vectorized: a REAL*16 value is 128 bits wide, the same width as the data types of the Streaming SIMD Extensions, so a vector could hold only one element, and vectorizing with a vector length of one is really just code generation, not vectorization. Furthermore, the Streaming SIMD Extensions mainly deal with packed operands of 8 to 64 bits and do not provide sufficiently general support to implement REAL*16 this way.
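To give a feel for what "done through library calls" implies for cost, here is a minimal Python sketch. It does not show the compiler's actual REAL*16 runtime routines, which are not described in this thread. It swaps in a different and simpler technique, double-double arithmetic, purely to show how one extended-precision addition done in software turns into a function call containing several ordinary 64-bit operations. The names `two_sum` and `dd_add` are made up for this illustration.

```python
# Illustration only: double-double addition, NOT the REAL*16 runtime itself.
# A value is a pair (hi, lo) of ordinary 64-bit floats whose sum is the number.

def two_sum(a, b):
    """Knuth's error-free addition.

    Returns (s, e) where s is the rounded float sum of a and b,
    and e is the rounding error, so that s + e equals a + b exactly.
    """
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def dd_add(x, y):
    """Add two double-double numbers x = (hi, lo) and y = (hi, lo).

    This is the simple ("sloppy") variant: one logical addition costs
    two two_sum calls plus two extra additions on 64-bit floats.
    """
    s, e = two_sum(x[0], y[0])   # add the high parts, keep the rounding error
    e += x[1] + y[1]             # fold in both low parts
    return two_sum(s, e)         # renormalize into a (hi, lo) pair


# Plain 64-bit arithmetic loses the small term entirely.
plain = 1.0 + 1e-20

# The software routine keeps it in the low word.
hi, lo = dd_add((1.0, 0.0), (1e-20, 0.0))

print(plain)    # 1.0
print(hi, lo)   # 1.0 1e-20
```

Each `dd_add` call performs over a dozen 64-bit floating-point operations where a hardware type needs a single add instruction. That kind of overhead, plus the call itself, is the general reason software-implemented precision runs slower than REAL*8.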
OK - Now to answer the question:
What is the performance difference between REAL*8 and REAL*16?
What do you mean by library calls?