Software Tuning, Performance Optimization & Platform Monitoring

Theoretical SP and DP Peak Performance of an Intel Core i7 950 3.06 GHz LGA 1366

Fausto_A_
Beginner
1,015 Views
We are running some benchmarks, and to determine the efficiency of some codes we need to know the theoretical peak performance of your Core i7 950 3.06 GHz LGA 1366.

I am not able to locate this data in any of your data sheets. I need to know the theoretical peak performance of the Intel Core i7 950 3.06 GHz LGA 1366 in single- and double-precision floating-point operations per second.

Let me know if you know where I can find this data or, better, if you can, send me the peak figures and please explain to me how you calculated them.
 
Thank you for your help!
Have a good day.

Fausto Artico
10 Replies
Patrick_F_Intel1
Employee

Hello Fausto,

I believe that Bloomfield uses the Nehalem CPU architecture, and I think that Nehalem can reach a maximum of 8 SP FLOPs/cycle (per core, I assume) and 4 DP FLOPs/cycle/core.

See http://www.realworldtech.com/sandy-bridge/6/ (which discusses Nehalem, Sandy Bridge and AMD's Bulldozer).

But 'theoretical peak flops' is probably not so useful in real life. I would try to find some published benchmark that does something similar to what you intend to use with your system.

Pat

Bernard
Valued Contributor I

>>>I believe that bloomfield uses the nehalem cpu architecture and I think that nehalem could get a max 8 SP FLOPS/cycle (per core I assume) and 4 DP flops/cycle/core.>>>

Yes, IIRC that CPU can achieve 8 SP FLOPs/cycle and 4 DP FLOPs/cycle per core. The highest supported SIMD ISA extension is SSE4.2.

http://www.anandtech.com/show/6355/intels-haswell-architecture/8
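
A minimal sketch of where the 8 SP and 4 DP figures can come from. It assumes (this is not stated in the thread) that each core can issue one packed 128-bit SSE add and one packed 128-bit SSE multiply per cycle. The function name `sse_flops_per_cycle` is made up for illustration.

```python
# Sketch: per-core FLOPs/cycle for a core with 128-bit SSE registers.
# Assumption (not from the thread): one packed add and one packed multiply
# can be issued every cycle, each on its own execution unit.

SSE_REGISTER_BITS = 128


def sse_flops_per_cycle(element_bits, pipes=2):
    """Elements per register times the number of FP pipes (add + multiply)."""
    lanes = SSE_REGISTER_BITS // element_bits
    return lanes * pipes


print("SP FLOPs/cycle/core:", sse_flops_per_cycle(32))  # 4 lanes * 2 pipes
print("DP FLOPs/cycle/core:", sse_flops_per_cycle(64))  # 2 lanes * 2 pipes
```

Reaching these figures in practice requires a perfectly balanced mix of independent packed adds and multiplies, which real code rarely has.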

 

Bernard
Valued Contributor I

>>>and please explain me how you calculated them.>>>

You can code your own benchmark using, for example, a SAXPY-like computation, A = a * A + B.

In pseudocode:

// Allocate memory for float* A and float* B using malloc().

// Initialize both arrays.

// Use a double for-loop in which the outer loop runs 1.0e+6 iterations of a shorter inner loop.

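Calculate the achieved FP operations/sec using the following formula:

GFLOPS = (outer loop count * inner loop count * FLOPs per inner iteration) / (elapsed time in seconds * 1.0e+9).

For SAXPY, each inner iteration performs one multiply and one add, i.e. 2 FLOPs. If the benchmark runs on a single core, multiply the GFLOPS result by the number of CPU cores to estimate the whole chip.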
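
The measurement idea can be sketched as follows: count the floating-point operations performed, time the loops, and divide. Python is used here only to show the bookkeeping. Its floats are double precision, and an interpreted loop will land orders of magnitude below hardware peak, so a real run would use compiled, vectorized code as in the pseudocode. The function name `saxpy_gflops` and its default sizes are illustrative assumptions.

```python
import time


def saxpy_gflops(n=1000, outer=200, a=1.0001):
    """Time `outer` passes of A[i] = a * A[i] + B[i] over n elements.

    Returns (FLOP count, elapsed seconds, achieved GFLOPS).
    """
    A = [1.0] * n
    B = [0.5] * n

    start = time.perf_counter()
    for _ in range(outer):          # outer loop: repeat to get a measurable time
        for i in range(n):          # inner loop: one SAXPY pass over the arrays
            A[i] = a * A[i] + B[i]  # 1 multiply + 1 add = 2 FLOPs
    elapsed = time.perf_counter() - start

    flops = 2 * n * outer
    return flops, elapsed, flops / elapsed / 1e9


flops, elapsed, gflops = saxpy_gflops()
print(f"{flops} FLOPs in {elapsed:.4f} s -> {gflops:.4f} GFLOPS (single thread)")
```

Keeping the arrays small enough to stay in cache matters in a compiled version; otherwise the loop measures memory bandwidth rather than floating-point throughput.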

As a side note, the peak theoretical FLOP/s is CPU max frequency * number of cores * number of SP FLOPs/cycle per core.

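Peak theoretical SP FLOP/s for the Intel Bloomfield CPU in question (Core i7 950, 4 cores, 3.06 GHz nominal clock) will be:

   3.06e+9 Hz * 4 * 8 SP ~ 97.9 SP GFLOPS.

With 4 DP FLOPs/cycle the DP peak is half of that, ~ 49.0 DP GFLOPS.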
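
The peak formula is easy to check in a few lines. This sketch uses the nominal 3.06 GHz clock from the thread title, 4 cores, and the 8 SP / 4 DP FLOPs/cycle figures quoted earlier. The function names `peak_gflops` and `efficiency` are made up for illustration, and Turbo Boost clocks are ignored.

```python
def peak_gflops(freq_hz, cores, flops_per_cycle):
    """Theoretical peak = clock * cores * FLOPs per cycle per core."""
    return freq_hz * cores * flops_per_cycle / 1e9


def efficiency(measured_gflops, peak):
    """Fraction of the theoretical peak that a benchmark actually reached."""
    return measured_gflops / peak


FREQ_HZ = 3.06e9  # nominal clock from the thread title
CORES = 4

sp_peak = peak_gflops(FREQ_HZ, CORES, 8)  # 97.92 GFLOPS
dp_peak = peak_gflops(FREQ_HZ, CORES, 4)  # 48.96 GFLOPS

print(f"SP peak: {sp_peak:.2f} GFLOPS")
print(f"DP peak: {dp_peak:.2f} GFLOPS")
```

Dividing a measured result by the matching peak with `efficiency` gives the efficiency figure the original question was after.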

Bernard
Valued Contributor I

@Fausto

Here is an interesting discussion about FP benchmarking.

http://stackoverflow.com/questions/8389648/how-do-i-achieve-the-theoretical-maximum-of-4-flops-per-cycle

Also check the following link: https://software.intel.com/en-us/articles/estimating-flops-using-event-based-sampling-ebs

TimP
Honored Contributor III
The post here doesn't state ground rules for the choice of compiler or options, while that Stack Overflow post chose scalar SSE2. If the rule is to use default options, then Intel C++ inner_product would be a relevant baseline.
Vladimir_S_2
Beginner

Skylake: 64 SP FLOPs per cycle?

 

TimP
Honored Contributor III

A future Skylake server CPU may support peak SIMD performance per core double that of Haswell. The current Skylake client CPU is apparently similar to Haswell in that respect.
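
To put rough numbers on "doubling", here is a small sketch. It assumes (not stated in the thread) two fused multiply-add (FMA) units per core, 256-bit vectors for Haswell, and 512-bit vectors for the server part. Whether a given server model really has two 512-bit FMA units is an assumption here, and `fma_flops_per_cycle` is a made-up name.

```python
def fma_flops_per_cycle(vector_bits, element_bits, fma_units=2):
    """Per-core FLOPs/cycle when every FMA unit retires one vector FMA per cycle."""
    lanes = vector_bits // element_bits
    return lanes * fma_units * 2  # an FMA counts as one multiply plus one add


for label, bits in (("256-bit vectors (Haswell-style)", 256),
                    ("512-bit vectors (assumed server part)", 512)):
    sp = fma_flops_per_cycle(bits, 32)
    dp = fma_flops_per_cycle(bits, 64)
    print(f"{label}: {sp} SP, {dp} DP FLOPs/cycle/core")
```

Under these assumptions the 512-bit case comes out at 64 SP FLOPs per cycle per core, which matches the figure asked about above.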

Vladimir_S_2
Beginner

Apple A6: 8 SP or 1 DP per cycle

Apple A7/A8: 16 SP or 8 DP per cycle = Intel Ivy Bridge

http://dench.flatlib.jp/opengl/devices?s[]=iphone&s[]=6s#ios_soc (section "ios soc")

and http://dench.flatlib.jp/opengl/cpufop

Is that true?

 

Bernard
Valued Contributor I

@Tim

Do you mean the future Skylake Xeon (Purley platform), which incorporates 512-bit vector registers?

Bernard
Valued Contributor I

Vladimir S. wrote:

Apple A6: 8 SP or 1 DP per cycle

Apple A7/A8: 16 SP or 8 DP per cycle = Intel Ivy Bridge

http://dench.flatlib.jp/opengl/devices?s[]=iphone&s[]=6s#ios_soc (section "ios soc")

and http://dench.flatlib.jp/opengl/cpufop

Is that true?

 

Why are you trying to compare Apple's SoC to an Intel Haswell CPU?

AFAIK, Apple's solution is based on the ARM architecture and probably incorporates NEON SIMD with its 128-bit vector registers.
