long double data types in BLAS

Hector_L_ · ‎10-08-2012

Hello. I wonder if there is a good reason to not include 'long double' data types in BLAS. I mean, in addition to:

real, single precision
complex, single precision
real, double precision
complex, double precision

Why not include "real, long double precision" and "complex, long double precision" (as defined in ISO C99) equivalent functions?

Many thanks.

Hector.

TimP · ‎10-08-2012

You could read earlier posts on this subject. You're welcome to build BLAS yourself with the compiler and data types of your choice. Hand optimization such as MKL provides would have little to offer for long double. C99 says only that long double may be the same as double, as it is in Visual Studio. There are multiple versions of long double implemented in widely used Linux compilers, so this seems to encompass a wider variety of cases than you imply.
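To see which variant of long double a given compiler and platform actually provide, a small check against `<float.h>` can help. This is only a sketch: the function names are made up for illustration, and the classification assumes a radix-2 floating point format with an IEEE 754 64-bit double.

```c
#include <float.h>
#include <stdio.h>

/* Significand width of long double, in base-FLT_RADIX digits
   (bits on the usual radix-2 platforms). */
int long_double_mantissa_bits(void)
{
    return LDBL_MANT_DIG;
}

/* Rough classification; assumes radix 2 and an IEEE 754 binary64 double.
   The labels are illustrative, not an exhaustive list of formats. */
const char *describe_long_double(void)
{
    switch (LDBL_MANT_DIG) {
    case 53:  return "same precision as double";
    case 64:  return "x87 80-bit extended";
    case 106: return "double-double";
    case 113: return "IEEE binary128 (quad)";
    default:  return "other";
    }
}

/* Print the result next to the width of double for comparison. */
void report_long_double(void)
{
    printf("double: %d bits, long double: %d bits (%s)\n",
           DBL_MANT_DIG, long_double_mantissa_bits(), describe_long_double());
}
```

Calling `report_long_double()` from `main` on each target toolchain shows whether long double buys any precision there at all.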

Hector_L_ · ‎10-09-2012

Dear Tim, Thank you very much for your useful and polite reply.

TimP (Intel) wrote:
You could read earlier posts on this subject.

I could and I did. The most promising one was the following, but it was not very helpful: http://software.intel.com/en-us/forums/topic/285712 Did I miss another relevant post about this issue?

TimP (Intel) wrote:
You're welcome to build BLAS yourself with the compiler and data types of your choice. Hand optimization such as MKL provides would have little to offer for long double.

Thanks, this is very interesting. Then, could I modify and build MKL/BLAS to add extra and quad precision support?

TimP (Intel) wrote:
C99 says only that long double may be the same as double, as it is in Visual Studio. There are multiple versions of long double implemented in widely used Linux compilers, so this seems to encompass a wider variety of cases than you imply.

Of course, C99 says only that long double may be the same as double. In my humble opinion, though, a developer who takes the trouble to add the specifier 'long' is, most of the time, asking for higher precision than double, even though a C99-compliant compiler is not required to provide it. In the case of quadruple precision, I can hardly believe that a developer would use this data type and accept the compiler 'downgrading' the request to double precision. Thanks again. Hector

Shane_S_Intel · ‎10-10-2012

Hi Hector, long double can mean different things to different people: simply "double" on some systems, the 80-bit x87 floating point type (which the Intel compilers on Linux/Windows map it to when certain switches are applied), or even the IEEE Quad 128-bit floating point type. Presumably you're most interested in Quad BLAS support.

Quad support for BLAS is on our longer term to-do list, but quite low in priority. It is generally understood that as problem sizes grow and computational speeds increase, the need for additional accuracy follows the same trend. The main challenge associated with high performance (or MKL suitable) Quad BLAS support is essentially the speed of the underlying Quad basic floating point operations, which at this time are implemented in software because there is no direct hardware support for them. So the performance impact of moving from double to Quad would likely be significant.

One thing that you could try is to port the Netlib Fortran BLAS implementation using a compiler that can automatically map doubles (or real*8) to real*16, and likewise for the complex types. That way you could see the performance implications for yourself. I don't know if there is a corresponding implementation of the BLAS in C that would allow you to do a similar experiment with the C99 types.

Our current focus areas are optimizations for the latest/upcoming Xeon processors, optimizations for the new Intel® Xeon Phi™ coprocessor, and conditional numerical reproducibility. -Shane
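For the C side of that experiment, a minimal sketch might look like the following. It is not Netlib or MKL code: `real_t` and `gemm_ref` are hypothetical names, the routine handles only the untransposed case, and it loosely follows the loop order of the reference GEMM. The idea is that swapping the single typedef changes the working precision of the whole routine.

```c
#include <stddef.h>

/* Hypothetical working type: change this typedef to change precision
   (double, long double, or a compiler-specific quad type). */
typedef long double real_t;

/* C := alpha*A*B + beta*C, column-major storage, no transposes.
   A is m x k (leading dimension lda), B is k x n (ldb), C is m x n (ldc). */
void gemm_ref(size_t m, size_t n, size_t k,
              real_t alpha, const real_t *a, size_t lda,
              const real_t *b, size_t ldb,
              real_t beta, real_t *c, size_t ldc)
{
    for (size_t j = 0; j < n; ++j) {
        /* Scale column j of C; beta == 0 overwrites instead of multiplying,
           so stale NaN or Inf values in C do not propagate. */
        for (size_t i = 0; i < m; ++i)
            c[i + j * ldc] = (beta == 0) ? 0 : beta * c[i + j * ldc];

        /* Accumulate alpha * A * B(:, j) one column of A at a time. */
        for (size_t l = 0; l < k; ++l) {
            real_t t = alpha * b[l + j * ldb];
            for (size_t i = 0; i < m; ++i)
                c[i + j * ldc] += t * a[i + l * lda];
        }
    }
}
```

Timing the same call with `real_t` set to double and then to a wider type gives a first impression of the slowdown on a given machine, though an unoptimized loop like this says little about what a tuned library could achieve.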

Hector_L_ · ‎10-14-2012

Hi Shane, Thank you very much indeed for your comments, suggestions and even the roadmap of the optimizations. After some preliminary testing with zgemm and zgemm3m, I am afraid the precision achieved in the results is not enough for my specific application. As performance is another goal (the final hardware target is a supercomputer), I cannot afford a generalized use of quad precision because of the lack of hardware support that you pointed out. Anyway, I will try your suggestion and, in the worst case, develop a tailored solution. Thanks, Hector

shmuel_s_ · ‎11-23-2016

You wrote:

One thing that you could try is to port the Netlib Fortran BLAS implementation using a compiler that can automatically map doubles (or real*8) to real*16, and likewise for the complex types. That way you could see the performance implications for yourself. I don’t know if there is a corresponding implementation of the BLAS in C that would allow you to do a similar experiment with the C99 types.

I needed a more precise implementation of a polynomial roots calculation that uses the DGEEVX LAPACK function. I recompiled CLAPACK, the C port of LAPACK, with the type definitions in f2c.h changed to __float128, and replaced the prototypes for the math functions (sin, sqrt, etc.). The resulting implementation significantly improved the precision of the roots, but the calculation was ~50 times slower than the original double implementation. The GNU GCC compiler (64-bit) and the libquadmath library were used. (The code is published at https://sourceforge.net/p/tmc-m-files-to-c-compiler , see qdlapack_tmcruntime ). Can the Intel C++ compiler compile it ( __float128 ) and produce faster code?
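The kind of type switch described above can be sketched roughly as follows. This is not the published code: `USE_FLOAT128` is a hypothetical macro, `horner` is an illustrative helper for checking the residual of a computed root, and the default branch uses long double as a stand-in so the sketch compiles without GCC extensions. In a real port the math function prototypes would also have to be remapped, for example to the libquadmath variants such as `sqrtq`.

```c
/* f2c.h normally has "typedef double doublereal;". Here the typedef is
   switched by a hypothetical macro, USE_FLOAT128, for illustration. */
#ifdef USE_FLOAT128
typedef __float128 doublereal;   /* GCC extension, 128-bit quad type */
#else
typedef long double doublereal;  /* stand-in; width depends on the platform */
#endif

/* Evaluate p(x) = coef[0]*x^degree + ... + coef[degree] by Horner's rule
   in the selected working precision. Evaluating p at a computed root
   gives a residual that indicates how accurate the root is. */
doublereal horner(const doublereal *coef, int degree, doublereal x)
{
    doublereal acc = coef[0];
    for (int i = 1; i <= degree; ++i)
        acc = acc * x + coef[i];
    return acc;
}
```

Comparing the residuals obtained with double and with the wider type is one simple way to quantify the precision gain against the reported slowdown.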

- Shmuel S.