Hello. I wonder if there is a good reason to not include 'long double' data types in BLAS. I mean, in addition to:
real, single precision
complex, single precision
real, double precision
complex, double precision
Why not include "real, long double precision" and "complex, long double precision" (as defined in ISO C99) equivalent functions?
TimP (Intel) wrote: "You could read earlier posts on this subject."
I could and I did. The most promising one was the following, but it was not very helpful: http://software.intel.com/en-us/forums/topic/285712 Did I miss another relevant post about this issue?
TimP (Intel) wrote: "You're welcome to build BLAS yourself with the compiler and data types of your choice. Hand optimization such as MKL provides would have little to offer for long double."
Thanks, this is very interesting. Then, could I modify and build MKL/BLAS to add extended and quad precision support?
TimP (Intel) wrote: "C99 says only that long double may be the same as double, as it is in Visual Studio. There are multiple versions of long double implemented in widely used Linux compilers, so this seems to encompass a wider variety of cases than you imply."
Of course, C99 says only that long double may be the same as double. But in my humble opinion, a developer who takes the trouble to add the specifier 'long' is, most of the time, asking for higher precision than double, even though a C99-compliant compiler is not required to provide it. In the case of quadruple precision, I can hardly believe that a developer would use this data type and accept the compiler 'downgrading' that request to double precision. Thanks again. Hector
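Whether long double actually differs from double on a given compiler can be checked with the standard macros in <float.h>. The following is only a minimal sketch: the function names long_double_extra_bits and report_long_double are made up for illustration, and the typical values mentioned in the comments are common cases rather than guarantees.

```c
#include <float.h>
#include <stdio.h>

/* Extra significand digits (in base FLT_RADIX, normally 2) that
 * long double has over double on this compiler. C only guarantees
 * that long double is at least as precise as double, so the result
 * is 0 or more. */
int long_double_extra_bits(void)
{
    return LDBL_MANT_DIG - DBL_MANT_DIG;
}

/* Print what this particular compiler provides. Commonly seen values
 * for LDBL_MANT_DIG are 53 (same as double), 64 (x87 extended) and
 * 113 (IEEE quad), but other formats exist. */
void report_long_double(void)
{
    printf("double:      %d significand digits\n", DBL_MANT_DIG);
    printf("long double: %d significand digits\n", LDBL_MANT_DIG);
    if (long_double_extra_bits() == 0)
        printf("long double adds no precision over double here\n");
}
```

Calling report_long_double() from main on each target compiler shows at a glance which case applies.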
One thing that you could try is to port the Netlib Fortran BLAS implementation using a compiler that can automatically map doubles (or real*8) to real*16, and likewise for the complex types. That way you could see the performance implications for yourself. I don’t know if there is a corresponding implementation of the BLAS in C that would allow you to do a similar experiment with the C99 types.
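For the C side of that experiment, one possible approach is to generate the same routine for several element types from a single definition. The sketch below is an assumption for illustration only: DEFINE_AXPY, d_axpy and ld_axpy are hypothetical names, not Netlib or MKL interfaces, and the routine is a simplified unit-stride y := alpha*x + y without the incx/incy arguments of the real BLAS.

```c
#include <stddef.h>

/* Hypothetical helper macro: expands to a unit-stride AXPY
 * (y := alpha * x + y) for element type T under the name NAME. */
#define DEFINE_AXPY(NAME, T)                          \
    void NAME(size_t n, T alpha, const T *x, T *y)    \
    {                                                 \
        for (size_t i = 0; i < n; ++i)                \
            y[i] += alpha * x[i];                     \
    }

/* Same source, two precisions. */
DEFINE_AXPY(d_axpy, double)
DEFINE_AXPY(ld_axpy, long double)
```

Timing d_axpy against ld_axpy on large arrays would give a first impression of the cost of the wider type on a given machine, although a plain loop like this says nothing about what a hand-tuned library achieves for double.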
I needed a more precise implementation of polynomial root calculation that uses the DGEEVX LAPACK function. I recompiled CLAPACK, the C port of LAPACK, with the type definitions in f2c.h changed to __float128, and replaced the prototypes of the math functions (sin, sqrt, etc.). The resulting implementation significantly improved the precision of the roots, but the calculation was ~50 times slower than the original double implementation. I used the GNU GCC compiler (64-bit) with the libquadmath library. (The code is published at https://sourceforge.net/p/tmc-m-files-to-c-compiler , see qdlapack_tmcruntime.) Can the Intel C++ compiler compile it (__float128) and produce faster code?
- Shmuel S.
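The kind of type swap described in the post above can be sketched roughly as follows. This is not the code from the linked project: the USE_QUAD macro and the horner function are hypothetical, and only the doublereal typedef name is taken from f2c.h. It assumes a compiler that provides __float128 (such as GCC on x86-64) when USE_QUAD is defined.

```c
#include <stddef.h>

/* f2c.h defines doublereal as double; redefining it changes the
 * precision of every routine written in terms of that typedef.
 * USE_QUAD is a hypothetical switch used only in this sketch. */
#ifdef USE_QUAD
typedef __float128 doublereal;   /* GCC extension, quad precision */
#else
typedef double doublereal;
#endif

/* Evaluate a polynomial by Horner's rule. c holds n coefficients,
 * highest degree first. The arithmetic runs in whatever type
 * doublereal currently names. */
doublereal horner(const doublereal *c, size_t n, doublereal x)
{
    doublereal acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc = acc * x + c[i];
    return acc;
}
```

Building once with and once without -DUSE_QUAD gives two versions of the same routine. Math library calls such as sin or sqrt are not covered by the typedef and need their quad-precision counterparts, which is why the post mentions replacing those prototypes as well.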