Integer vs unsigned integer performance

JJK · ‎11-14-2017

Hi,

Long post, please bear with me ;)

Last weekend at a Hackathon we were discussing the performance of signed vs unsigned integers; at an HPC workshop someone mentioned to me that you should use "int" instead of "unsigned" or "size_t", as 'int' would be much faster. Of course, nobody believed this, so I ended up cobbling together a piece of code to test it. The code is based on a (very dumb) example to determine whether a number is prime or not:

int isPrime(MyIntegerType n)
{
  if (n <= 1) return 0;
  for (MyIntegerType i = 2; i < n; i ++)
  {
     if (n % i == 0) return 0;
  }
  return 1;
}

I realize that this algorithm is very bad (and not even 100% correct, as one can aruge whether '2' is prime or not), and I realize that a compiler needs to do bound checking and that bounds checks are different for signed and unsigned integers. However, this is about the performance of using signed vs unsigned, nothing else.

With a wrapper around the above code I ended up with a (64 bit) test program that tests this using
MyIntegerType = {'int', 'unsigned', 'long', 'unsigned long', 'size_t' }
The results are quite surprising:

It does not really matter which compiler you use for this; gcc 4.8, gcc 7 and icc generate almost identical assembly code, but the assembly code for 'int' *IS* different from the code generated for 'unsigned int'
A good old Harpertown CPU can keep up in this test with CPUs that have a hi\gher clock speed and are 4+ years newer
Performance differs per hardware platform using the same binary
On Ivy Bridge, Haswell and Broadwell CPUs it makes sense to use 'int' instead of 'unsigned int'
On KNL (+KNC) and Atom based CPUs it's exactly the other way round: it makes sense to use 'unsigned' instead of 'int'
On all others there is no significant difference between using or the other
There is a *huge* penalty for using "long" on Intel CPUs
Performance of the KNL box is surprisingly low, even given the fact that it runs at 1.3 or 1.5 GHz - compared to e.g. a 3.3 GHz Pentium G4400

Attached is a spreadsheet with all the platforms it was tested on so far.

Now for my questions:

can someone explain the difference in execution time between different platforms , given the fact that the same executable was used on all platforms (hence no compiler differences) ?
can someone explain why there is such a huge performance penalty for 64bit ints vs 32bit ints (and note that CPUs made by a certain rival company do not display this behaviour)
how can a programmer know up-front what will be the best type for a given algorithm+CPU?
why is performance on the KNL so bad compared to the rest ?

Source code and/or binaries for the above test results is available upon request.

Thx, JJK

jimdempseyatthecove · ‎11-14-2017

This: http://www.agner.org/optimize/instruction_tables.pdf

may shed some light on the issue.

Jim Dempsey