Beginner

long double operators and mathimf overloaded functions

Hey,

I'm having trouble converting my code to long double.

  • Some compiler flag in Visual Studio seems to reduce the precision whenever I use an operator (e.g. "*=" or "*").
    It seems that I can fix this with the compiler argument "/Qpc80".
    How does "/Qpc80" interact with the Floating Point Model "Precise (/fp:precise)"?
  • The overloaded long double version of log() doesn't seem to be available, so the double version is used. I have to call logl() directly.
    Isn't the overloaded log(long double) supposed to be provided by mathimf.h?
  • In a larger project, Microsoft's long double version of log() is used although I didn't include "math.h".
    It seems that some other standard headers include math.h if I don't include mathimf.h first in each c/cpp file. Either I get linker errors, or I can see in debug mode that the long double version from Microsoft's math.h is used, which calls the double version.
    Should I include mathimf.h in each file before including any other headers?

 

 

I wrote a program to narrow down the problems. Basically, I'm using quad precision (which seems to work) to check the other data types:
 

	long double b = 0.333333333333333333333333333333333333333333333q;
	printf("\n%20s = ", "long double(1/3)");
	myDebugPrintDigits(b, 45);
	b *= 2;
	printf("\n%20s = ", "long double 2*(1/3)");
	myDebugPrintDigits(b, 45);
...
	b = log(2.0L);
	printf("\n%20s = ", "long double(log(2))");
	myDebugPrintDigits(b, 45);
	b = logl(2.0L);
	printf("\n%20s = ", "long double(logl(2))");
	myDebugPrintDigits(b, 45);

 

The result looks like this in Visual Studio 2015 (added compiler flags "/Qoption,cpp,--extended_float_type /Qlong-double"):

         double(1/3) = 3.33333333333333314829616256247390992939472198e-1
      double 2*(1/3) = 6.66666666666666629659232512494781985878944396e-1
    long double(1/3) = 3.33333333333333333342368351437379203616728773e-1
 long double 2*(1/3) = 6.66666666666666629659232512494781985878944396e-1
          _Quad(1/3) = 3.33333333333333333333333333333333307654267408e-1
       _Quad 2*(1/3) = 6.66666666666666666666666666666666615308534816e-1
      double(log(2)) = 6.93147180559945286226763982995180413126945495e-1
 long double(log(2)) = 6.93147180559945286226763982995180413126945495e-1
long double(logl(2)) = 6.93147180559945309428690474184975300886435434e-1
    _Quad(__logq(2)) = 6.93147180559945309417232121458176613602238496e-1
      correct log(2) = 6.93147180559945309417232121458176568075500134e-1

sizeof(double) = 8
sizeof(long double) = 16
sizeof(_Quad) = 16
__IMFLONGDOUBLE = 80

The list of compiler flags used by Visual Studio is "/GS /W3 /Zc:wchar_t /ZI /Od /Fd"x64\Debug\vc140.pdb" /D "_MBCS" /Zc:forScope /RTC1 /MDd /Fa"x64\Debug\" /EHsc /nologo /Fo"x64\Debug\" /Qprof-dir "x64\Debug\" /Fp"x64\Debug\Projekt1.pch" + "/Qoption,cpp,--extended_float_type /Qlong-double".
The long double precision is reduced to double precision when I multiply it with 2 (see "long double 2*(1/3)" compared to "long double(1/3)")!
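
One way to confirm this kind of silent precision loss, independently of the digit-printing helper, is to measure how many significand bits a long double addition actually delivers at run time. The following is a minimal sketch in standard C++. The name effective_mantissa_bits is made up for illustration, and the expectation that it reports 53 under a reduced x87 precision-control setting and 64 with full extended precision is an assumption that has not been checked against ICL here.

```cpp
// Hypothetical helper (not part of mathimf.h): counts how many significand
// bits survive a long double addition at run time. volatile forces every
// intermediate to be stored as a long double rather than folded at compile
// time.
int effective_mantissa_bits() {
    volatile long double one = 1.0L;
    volatile long double eps = 1.0L;
    int bits = 0;
    for (;;) {
        volatile long double sum = one + eps;
        if (sum == one) {
            break;              // eps no longer changes 1.0: precision exhausted
        }
        eps = eps / 2.0L;       // try the next smaller power of two
        ++bits;
    }
    return bits;
}
```

Printing the return value once in the Visual Studio build and once in the command-line icl build should show whether the two builds differ in effective precision.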

 

 

If I compile the source directly with icl ("icl main.cpp /Qoption,cpp,--extended_float_type /Qlong-double"), then I get:

         double(1/3) = 3.33333333333333314829616256247390992939472198e-1
      double 2*(1/3) = 6.66666666666666629659232512494781985878944396e-1
    long double(1/3) = 3.33333333333333333342368351437379203616728773e-1
 long double 2*(1/3) = 6.66666666666666666684736702874758407233457546e-1
          _Quad(1/3) = 3.33333333333333333333333333333333307654267408e-1
       _Quad 2*(1/3) = 6.66666666666666666666666666666666615308534816e-1
      double(log(2)) = 6.93147180559945286226763982995180413126945495e-1
 long double(log(2)) = 6.93147180559945286226763982995180413126945495e-1
long double(logl(2)) = 6.93147180559945309428690474184975300886435434e-1
    _Quad(__logq(2)) = 6.93147180559945309417232121458176613602238496e-1
      correct log(2) = 6.93147180559945309417232121458176568075500134e-1

sizeof(double) = 8
sizeof(long double) = 16
sizeof(_Quad) = 16
__IMFLONGDOUBLE = 80


Compiling manually with icl or adding "/Qpc80" in Visual Studio seems to solve the multiply precision issue, but log(long double) still does not call logl(). Is this intended behavior?
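
Until the overload question is settled, a defensive option is to stop relying on overload resolution and route every long double logarithm through one wrapper that names logl explicitly. This is only a sketch: log_ld is a hypothetical helper name, and it is written against the standard math.h declaration of logl so that it compiles anywhere. In the setup described above the header would be mathimf.h instead, which is an assumption.

```cpp
#include <math.h>
#include <type_traits>

// Hypothetical wrapper: always calls the long double entry point by name,
// so the result does not depend on which log() overloads happen to be visible.
inline long double log_ld(long double x) {
    return ::logl(x);
}

// Compile-time guard: the wrapper must hand back a long double.
static_assert(std::is_same<decltype(log_ld(2.0L)), long double>::value,
              "log_ld must return long double");
```

With this in place, "b = log_ld(2.0L);" replaces "b = log(2.0L);" in the test program, and the precision of the result then depends only on the logl implementation that gets linked.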

Thanks,
Christian

 

The full code is:

#include <mathimf.h>
#include <stdio.h>

typedef _Quad float128_type;

extern "C" {
	_Quad __ldexpq(_Quad, int);
	_Quad __frexpq(_Quad, int*);
	_Quad __fabsq(_Quad);
	_Quad __floorq(_Quad);
	_Quad __ceilq(_Quad);
	_Quad __sqrtq(_Quad);
	_Quad __truncq(_Quad);
	_Quad __expq(_Quad);
	_Quad __powq(_Quad, _Quad);
	_Quad __logq(_Quad);
	_Quad __log10q(_Quad);
	_Quad __sinq(_Quad);
	_Quad __cosq(_Quad);
	_Quad __tanq(_Quad);
	_Quad __asinq(_Quad);
	_Quad __acosq(_Quad);
	_Quad __atanq(_Quad);
	_Quad __sinhq(_Quad);
	_Quad __coshq(_Quad);
	_Quad __tanhq(_Quad);
	_Quad __fmodq(_Quad, _Quad);
	_Quad __atan2q(_Quad, _Quad);
}


void myDebugPrintDigits(_Quad q, int noOfDigits) {
	int i,j,k;
	j = 0;
	while (q < 1) {
		q *= 10;
		j--;
	}
	while (q >= 10) {
		q /= 10;
		j++;
	}
	i = floor((double)q);
	k = 0;
	while (q > 0 && k<noOfDigits) {
		q -= i;
		printf("%d", i);
		q *= 10;
		i = __floorq(q);
		if (k == 0)
			printf(".");
		k++;
	}
	printf("e%d",j);
}


int main() {

	double a = 0.333333333333333333333333333333333333333333333q;
	printf("\n%20s = ","double(1/3)");
	myDebugPrintDigits(a, 45);
	a *= 2;
	printf("\n%20s = ", "double 2*(1/3)");
	myDebugPrintDigits(a, 45);

	long double b = 0.333333333333333333333333333333333333333333333q;
	printf("\n%20s = ", "long double(1/3)");
	myDebugPrintDigits(b, 45);
	b *= 2;
	printf("\n%20s = ", "long double 2*(1/3)");
	myDebugPrintDigits(b, 45);

	_Quad c = 0.333333333333333333333333333333333333333333333q;
	printf("\n%20s = ", "_Quad(1/3)");
	myDebugPrintDigits(c, 45);
	c *= 2;
	printf("\n%20s = ", "_Quad 2*(1/3)");
	myDebugPrintDigits(c, 45);

	a = log(2.0f);
	printf("\n%20s = ", "double(log(2))");
	myDebugPrintDigits(a, 45);
	b = log(2.0L);
	printf("\n%20s = ", "long double(log(2))");
	myDebugPrintDigits(b, 45);
	b = logl(2.0L);
	printf("\n%20s = ", "long double(logl(2))");
	myDebugPrintDigits(b, 45);
	c = __logq(2.0q);
	printf("\n%20s = ", "_Quad(__logq(2))");
	myDebugPrintDigits(c, 45);

	printf("\n%20s = %s", "correct log(2)","6.93147180559945309417232121458176568075500134e-1");

	printf("\n");

	printf("\nsizeof(double) = %d", (int)sizeof(double));
	printf("\nsizeof(long double) = %d", (int)sizeof(long double));
	printf("\nsizeof(_Quad) = %d", (int)sizeof(_Quad));
	printf("\n__IMFLONGDOUBLE = %d", __IMFLONGDOUBLE);
	return 0;
}

 

5 Replies

The underlying hardware instruction sets of IA-32, Intel 64, and AMD64 (SSE/AVX/AVX-512) do not support quad precision. The x87 FPU instruction set does not support quad precision (16-byte floating point) either; it does support a 10-byte floating-point format, which can be stored (zero-padded) in a 16-byte location. Many of the intrinsic functions do not accept inputs in the 10-byte (FPU) format. The operators +, -, * and / do.

If you truly want quad precision, use Google to search for

C++ extended precision math

There are several implementation techniques.

Jim Dempsey
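
One widely used building block for such software techniques is an error-free transformation such as Knuth's TwoSum, which recovers the rounding error of a double addition exactly; pairs of doubles built this way give roughly twice the precision of a single double. The sketch below is only an illustration and assumes IEEE 754 doubles, round-to-nearest, and no value-changing optimizations (for example a fast floating-point model). The names DoubleDouble and two_sum are hypothetical.

```cpp
// Unevaluated sum hi + lo, where lo holds what did not fit into hi.
struct DoubleDouble {
    double hi;
    double lo;
};

// Knuth's TwoSum: hi is the rounded sum a + b, lo is its exact rounding error,
// so that a + b == hi + lo holds exactly in real arithmetic.
inline DoubleDouble two_sum(double a, double b) {
    double s  = a + b;
    double bb = s - a;                   // the part of b that made it into s
    double aa = s - bb;                  // the part of a that made it into s
    double err = (a - aa) + (b - bb);    // what was rounded away
    return DoubleDouble{s, err};
}
```

For example, two_sum(1.0, 1e-20) keeps the small addend in lo instead of discarding it, which a plain double addition cannot do.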

Black Belt

When compiling in 32-bit mode with ICL, /Qpc80 sets the x87 precision back up to a 64-bit significand. It may do the same in a 64-bit mode compilation, but I don't know that this is documented. 64-bit Windows sets the precision to 53 bits before handing control to the .exe, so your application needs to override that in some such way if you want to see x87 80-bit long double.

ICL has documented support for 80-bit long double only for 32-bit mode with /arch:IA32 /Qlong-double.  AFAIK, Visual Studio 64-bit has only support for long double as an alias for double, and the default modes of ICL match that.

Basically, Windows treats long double as an obsoleted facility.
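
Whether a given build actually treats long double as an alias for double can be read from the type's reported properties rather than from printed digits. A minimal sketch in standard C++ follows. The function names are hypothetical, and what they report for ICL with /Qlong-double on 64-bit Windows is not something this thread establishes.

```cpp
#include <cfloat>
#include <cstdio>

// Hypothetical helper: true when long double carries more significand bits
// than double in this build (e.g. 64 vs 53 for the x87 extended format).
constexpr bool long_double_is_wider_than_double() {
    return LDBL_MANT_DIG > DBL_MANT_DIG;
}

// Prints the storage size and nominal significand width of long double.
void print_long_double_layout() {
    std::printf("sizeof(long double) = %d\n",
                static_cast<int>(sizeof(long double)));
    std::printf("LDBL_MANT_DIG       = %d\n", LDBL_MANT_DIG);
    std::printf("wider than double   = %s\n",
                long_double_is_wider_than_double() ? "yes" : "no");
}
```

Note that this only reports the nominal format. A build can advertise a 64-bit significand and still compute with 53 bits if the x87 precision control is left at its reduced setting.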

Valued Contributor II

>>...Should I include mathimf.h in each file before including other files?

For projects compiled on Windows with any version of Visual Studio, it can be added to stdafx.h only.

Regarding the long double data type: this is the most controversial floating-point data type, and questions about how it should be used never stop. So it is better to stay with float and double instead.

PS: The size of the long double data type differs between C++ compilers, and if you look at math.h from Visual Studio you will see a comment:

... ...long double is synonymous with double in this implementation... ...
Beginner

I had hoped that one or two more digits of precision (long double) would be enough, but currently only quad solves my numerical problems. The long double result is about as inaccurate as the double version. Quad is slower by a factor of 10, but it works.

The plot shows the results of an explicit equation (IA32, calls to logl etc. from mathimf). Maybe the structure of this equation is bad, but I have to use it. I couldn't find any more mathematical simplifications than the ones I have already applied.

PrecisionComparison.png

Valued Contributor II

>>...The plot shows the results of an explicit equation...

Could you clarify the meaning of the Y and X axes?