Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.
7943 Discussions

Optimize pow(x, n) + pow(y, n) == pow(z, n)

DLake1
New Contributor I
2,716 Views

I'm having a go at the xn + yn = zn problem and want to see how fast I can make it go and if I can get a result but I cant figure out how to make my code vectorize or parallelize:

#include <iostream>
using namespace std;

int main(){
    unsigned long n = 0;
    cout << "n = ";
    cin >> n;
    for(unsigned long x = 1; x < 0xffffffff; ++x){
        cout << "x = " << x << "\n";
        for(unsigned long y = 2; y < 0xffffffff; ++y){
            const unsigned long xy = pow(x, n) + pow(y, n);
            cout << "y = " << y << "\n";
            for(unsigned long z = 1; z < 0xffffffff; ++z){
                if(xy == pow(z, n)) cout << x << "n + " << y << "n = " << z << "n\n";
            }
        }
    }
}

the optimization diagnostic gives 15523

I'm using Intel C++ compiler 18.0

0 Kudos
30 Replies
DLake1
New Contributor I
745 Views

I am very much aware of how floating point numbers work, I forgot to mention I went back to "unsigned long long" as double showed no benefit in this case.

Its obvious now the pow function converts the 64 bit integer to a double which is where the precision is lost so it is the pow function that is at fault as it does not use a multiply loop but some other floating point trickery at the cost of precision where 64 bit integers are used.

0 Kudos
jimdempseyatthecove
Honored Contributor III
745 Views

The pow function is not at fault.... the programmer is at fault. Take a look at the following:

#include "stdafx.h"
#include <iostream>
#include <math.h>

double powers[1048576];
int _tmain(int argc, _TCHAR* argv[])
{
	unsigned long long n = 3;
	for (unsigned long long i = 1; i < 1048576; ++i){
		auto powerrr = i;
		for (unsigned long long j = 1; j < n; ++j){
			powerrr *= i;
		}
		powers = powerrr;	// convert from unsigned long long to double
		// Assure that the test does not convert the unsigned long long powerrr
		// to double and thus drop lsb.
		// instead promote the precision of what is in powers from 53 bits to 64 bits
		// thus retrieving what it can from the saved bits
		if (powerrr != (unsigned long long)powers){
			std::cout << "\n" << powers << "\n" << powerrr;
		}
	}
}

*** The above is compiled as x64 (to assure FPU not used, but SSE, AVX or AVX512 used)

Jim

0 Kudos
DLake1
New Contributor I
745 Views

I said I went back to "unsigned long long".

0 Kudos
jimdempseyatthecove
Honored Contributor III
745 Views

OK let's examine this:

#include "stdafx.h"
#include <iostream>
#include <math.h>

unsigned long long powers[1048576];
int _tmain(int argc, _TCHAR* argv[])
{
	unsigned long long n = 3;
	for (unsigned long long i = 1; i < 1048576; ++i){
		auto powerrr = i;
		for (unsigned long long j = 1; j < n; ++j){
			powerrr *= i;
		}
		powers = powerrr;	// convert from unsigned long long to double
		// Assure that the test does not convert the unsigned long long powerrr
		// to double and thus drop lsb.
		// instead promote the precision of what is in powers from 53 bits to 64 bits
		// thus retrieving what it can from the saved bits
		if (powerrr != (unsigned long long)powers){
			std::cout << "\n" << powers << "\n" << powerrr;
		}
		powers = pow(i, n);
		if (powerrr != (unsigned long long)powers){
			std::cout << "\n" << powers << "\n" << powerrr;
		}
	}
}

The above does exhibit the mismatch on the second test with the same error as you originally reported. Therefore, it would seem that the math.h library is converting the unsigned long long into double, then running the pow(double, double);

In searching the VC include directories:

Find all "pow", Match case, Whole word, Subfolders, Find Results 2, "Visual C++ Include Directories", "*.c;*.cpp;*.cxx;*.cc;*.tli;*.tlh;*.h;*.hh;*.hpp;*.hxx;*.hh;*.inl;*.rc;*.resx;*.idl;*.asm;*.inc"
  C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019\windows\compiler\include\dvec.h(1038):	friend F64vec2 pow(const F64vec2 &a, const F64vec2 &b) { return _mm_pow_pd(a, b); }
  C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019\windows\compiler\include\dvec.h(1368):    friend F32vec8 pow(const F32vec8 &a, const F32vec8 &b) { return _mm256_pow_ps(a, b); }
  C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019\windows\compiler\include\dvec.h(1689):    friend F64vec4 pow(const F64vec4 &a, const F64vec4 &b) { return _mm256_pow_pd(a, b); }
  C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019\windows\compiler\include\dvec.h(2167):inline F32vec16 pow(const F32vec16 &a, const F32vec16 &b) {
  C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019\windows\compiler\include\dvec.h(2577):inline F64vec8 pow(const F64vec8 &a, const F64vec8 &b) {
  C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019\windows\compiler\include\fvec.h(216):	friend F32vec4 pow(const F32vec4 &a, const F32vec4 &b) { return _mm_pow_ps(a, b); }
  C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019\windows\compiler\include\math.h(788): _LIBIMF_EXTERN_C double   _LIBIMF_PUBAPI pow( double __x, double __y );
  C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019\windows\compiler\include\math.h(982):  {return (long double) pow((double)__x, (double) __y );}
  C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019\windows\compiler\include\tgmath.h(47):# undef pow(x,y)
  C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019\windows\compiler\include\tgmath.h(161):#define pow(x, y)  _REAL_CMPLX_2_ARG(x, y, pow)
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\amp_math.h(898):	inline float pow(float _X, float _Y) __GPU_ONLY
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\amp_math.h(1230):    using std::pow;
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\amp_math.h(3361):	inline float pow(float _X, float _Y) __GPU_ONLY
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\amp_math.h(3378):	inline double pow(double _X, double _Y) __GPU_ONLY
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\amp_math.h(4292):    using std::pow;
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\complex.h(226):    inline _Dcomplex __CRTDECL pow(_In_ _Dcomplex _X, _In_ _Dcomplex _Y) throw()
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\complex.h(352):    inline _Fcomplex __CRTDECL pow(_In_ _Fcomplex _X, _In_ _Fcomplex _Y) throw()
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\complex.h(493):    inline _Lcomplex __CRTDECL pow(_In_ _Lcomplex _X, _In_ _Lcomplex _Y) throw()
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\math.h(512):double  __cdecl pow(_In_ double _X, _In_ double _Y);
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\math.h(760):    return (float)pow(_X, _Y);
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\math.h(933):    return pow((double)_X, (double)_Y);
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\math.h(1027):inline double __CRTDECL pow(_In_ double _X, _In_ int _Y) throw()
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\math.h(1115):inline float __CRTDECL pow(_In_ float _X, _In_ float _Y) throw()
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\math.h(1117):inline float __CRTDECL pow(_In_ float _X, _In_ int _Y) throw()
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\math.h(1231):inline long double __CRTDECL pow(_In_ long double _X, _In_ long double _Y) throw()
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\math.h(1233):inline long double __CRTDECL pow(_In_ long double _X, _In_ int _Y) throw()
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\xtgmath.h(75):extern "C" double __cdecl pow(_In_ double, _In_ double);
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\xtgmath.h(76):float __CRTDECL  pow(_In_ float, _In_ float) _NOEXCEPT;
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\xtgmath.h(77):long double __CRTDECL  pow(_In_ long double, _In_ long double) _NOEXCEPT;
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\xtgmath.h(84):	pow(const _Ty1 _Left, const _Ty2 _Right)
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\xtgmath.h(87):	return (_CSTD pow(type(_Left), type(_Right)));
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\xtgmath.h(107)://_GENERIC_MATH2(pow, _CRTDEFAULT)	// hand crafted
  C:\Program Files (x86)\Windows Kits\8.1\Include\um\DirectXMathMisc.inl(1890):    // clr = (1+a)*pow(lclr, 1/2.4) - a for lclr > 0.0031308 (where a = 0.055)
  C:\Program Files (x86)\Windows Kits\8.1\Include\um\DirectXMathMisc.inl(1913):    // lclr = pow( (clr + a) / (1+a), 2.4 )
  Matching lines: 34    Matching files: 9    Total files searched: 3449

None of these take, nor return, anything but float, double or long double. IOW the unsigned long long was implicitly converted into double.

Jim Dempsey

0 Kudos
jimdempseyatthecove
Honored Contributor III
745 Views
0 Kudos
DLake1
New Contributor I
745 Views

Going back to my question in post #15, if a = bn then b = n root a, there is no n root function as far as I can see but an alternative is a1/n which will require floating point maths obviously reducing the precision unless I use a multi precision library but then it might end up being slower than the z loop.

0 Kudos
jimdempseyatthecove
Honored Contributor III
745 Views

>>a = bn If I have the value of a and n, can the value of b be calculated without a loop?

I do not think so. However, you can improve upon the improved version in #6

For example, if a is odd, then b and n must be odd. Meaning you can eliminate searches for cases b is even or n is even.
There may be similar rules where a is a multiple of some other number.
Also, you could reduce the number of iterations by performing something like a binary search on the permutations (convert incremental search to binary search) thus greatly reducing the number of pow's and/or 's.

Jim Dempsey

0 Kudos
DLake1
New Contributor I
745 Views

As I said, b = n root a OR a1/n.

Another way to get b is 10log(a)/n.

0 Kudos
jimdempseyatthecove
Honored Contributor III
745 Views

That also involves using a power function (in addition to the log function).

I am not a mathematician, so I cannot attest to any proof, ... if you were to make graphs of pow(a,n), varying a and n between 3 and infinity (some satisfying number), and then examine the charts you could conclude (but I do not know how to prove) that the sum of any two curves would never intersect another. This is not any different than when using strait lines and showing the slopes diverge, and then using that as a proof that they will not converge.

Jim Dempsey

0 Kudos
DLake1
New Contributor I
745 Views

That graph idea is fantastic, I noticed our prime minister Boris Johnson referenced Fermat's last theorem in his manifesto speech today.

0 Kudos
Reply