Theoretical SP and DP Peak Performance of an Intel Core i7 950 3.06 GHz LGA 1366

Fausto_A_ · ‎11-09-2014

We are running some benchmarks, and to determine the efficiency of some codes we need to know the theoretical peak performance of your Core i7 950 3.06 GHz LGA 1366.

I am not able to locate such data in any of your data sheets. I need to know the theoretical peak performance of the Intel Core i7 950 3.06 GHz LGA 1366 in single and double precision floating point operations per second.

Let me know if you know where I can find these data or, better, if you can, send me the peak performance figures and please explain to me how you calculated them.

Thank you for your help!

Have a good day.

Fausto Artico

Patrick_F_Intel1 · ‎11-10-2014

Hello Fausto,

I believe that Bloomfield uses the Nehalem CPU architecture, and I think that Nehalem can reach a maximum of 8 SP FLOPs/cycle (per core, I assume) and 4 DP FLOPs/cycle/core.

See http://www.realworldtech.com/sandy-bridge/6/ (which discusses Nehalem, Sandy Bridge and AMD's Bulldozer).

But 'theoretical peak flops' is probably not so useful in real life. I would try to find some published benchmark that does something similar to what you intend to use with your system.

Pat

Bernard · ‎11-10-2014

>>>I believe that bloomfield uses the nehalem cpu architecture and I think that nehalem could get a max 8 SP FLOPS/cycle (per core I assume) and 4 DP flops/cycle/core.>>>

Yes, IIRC that CPU can achieve 8 SP FLOPs/cycle and 4 DP FLOPs/cycle. The highest supported ISA extension is SSE4.2.

http://www.anandtech.com/show/6355/intels-haswell-architecture/8

Bernard · ‎11-10-2014

>>>and please explain me how you calculated them.>>>

You can code your own benchmark using, for example, a SAXPY-like computation such as A = a * A + B.

In pseudocode:

// Allocate memory for float* A and float* B using malloc().

// Initialize both arrays.

// Use a double for-loop in which the outer loop runs 1.0e+6 iterations of a shorter inner loop.

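Calculate FP operations/sec by timing the loops and using the following formula:

GFLOPS = (outer loop count * inner loop count * FLOPs per inner iteration) / (elapsed time in seconds * 1.0e+9).

For A = a * A + B each inner iteration performs 2 FLOPs (one multiply and one add). If the loop runs on a single core, multiply the GFLOPS result by the number of CPU cores to estimate the whole-chip figure.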

As a side note, peak theoretical FLOP/s is CPU max frequency * number of cores * number of SP FLOPs/cycle.

Peak theoretical SP FLOP/s for the Intel Bloomfield CPU will be:

3.06e+9 Hz * 4 cores * 8 SP FLOPs/cycle ~ 97.9 SP GFLOPS at the nominal 3.06 GHz clock (Turbo Boost would raise this slightly).

Bernard · ‎11-11-2014

@Fausto

Here is an interesting discussion about FP benchmarks.

http://stackoverflow.com/questions/8389648/how-do-i-achieve-the-theoretical-maximum-of-4-flops-per-cycle

Also check the following link: https://software.intel.com/en-us/articles/estimating-flops-using-event-based-sampling-ebs

TimP · ‎11-11-2014

The post here doesn't state ground rules for the choice of compiler or options, while that Stack Overflow post chose scalar SSE2. If the rule is to use default options, then inner_product built with Intel C++ would be a relevant baseline.

Vladimir_S_2 · ‎01-05-2016

Does Skylake reach 64 SP FLOPs per cycle?

TimP · ‎01-05-2016

The future Skylake server CPU may support a peak SIMD performance per core double that of Haswell. The current Skylake client CPU is apparently similar to Haswell in that respect.

Vladimir_S_2 · ‎01-05-2016

Apple A6: 8 SP per cycle or 1 DP per cycle

Apple A7/A8: 16 SP or 8 DP per cycle = Intel Ivy Bridge

http://dench.flatlib.jp/opengl/devices?s[]=iphone&s[]=6s#ios_soc section - ios soc

and http://dench.flatlib.jp/opengl/cpufop

Is that true?

Bernard · ‎01-25-2016

@Tim

Do you mean the future Skylake Xeon (Purley platform), which incorporates 512-bit vector registers?

Bernard · ‎01-25-2016

Vladimir S. wrote:

apple a6 8sp per cycle or 1 dp per cycle

apple a7/8 16 sp or 8 dp per cycle = intel ivy bridge

http://dench.flatlib.jp/opengl/devices?s[]=iphone&s[]=6s#ios_soc section - ios soc

and http://dench.flatlib.jp/opengl/cpufop

Its true?

Why are you trying to compare Apple's SoC to an Intel Haswell CPU?

AFAIK Apple's solution is based on the ARM architecture and probably incorporates NEON SIMD with its 128-bit vector registers.