Fausto_A_
Beginner
154 Views

Theoretical SP and DP Peak Performance of an Intel Core i7 950 3.06 GHz LGA 1366

We are running some benchmarks, and to determine the efficiency of some codes we need to know the theoretical peak performance of your Core i7 950 3.06 GHz LGA 1366.
 
I am not able to locate this data in any of your data sheets. I need to know the theoretical peak performance of the Intel Core i7 950 3.06 GHz LGA 1366 in single- and double-precision floating-point operations per second.
 
Let me know if you know where I can find this data or, better, if you can, send me the peak figures and please explain to me how you calculated them.
 
Thank you for your help!
Have a good day.

Fausto Artico
0 Kudos
10 Replies
Patrick_F_Intel1
Employee
154 Views

Hello Fausto,

I believe that Bloomfield uses the Nehalem CPU architecture, and I think that Nehalem could reach a maximum of 8 SP FLOPs/cycle (per core, I assume) and 4 DP FLOPs/cycle/core.

See http://www.realworldtech.com/sandy-bridge/6/ (which discusses Nehalem, Sandy Bridge, and AMD's Bulldozer).

But 'theoretical peak flops' is probably not so useful in real life. I would try to find some published benchmark that does something similar to what you intend to use with your system.

Pat

Bernard
Black Belt
154 Views

>>>I believe that bloomfield uses the nehalem cpu architecture and I think that nehalem could get a max 8 SP FLOPS/cycle (per core I assume) and 4 DP flops/cycle/core.>>>

Yes, IIRC that CPU can achieve 8 SP FLOPs/cycle and 4 DP FLOPs/cycle. The highest supported ISA extension is SSE4.2.
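
To show where the 8 and 4 come from, here is a minimal sketch. It assumes 128-bit SSE registers and two floating-point pipes per core (one vector add and one vector multiply) that can each start one instruction per cycle. The function and parameter names are only illustrative.

```c
#include <assert.h>

/* Peak FLOPs per cycle per core, assuming each of `pipes` SIMD pipes
   (e.g. one add pipe and one multiply pipe) can start one vector
   instruction per cycle. Names are illustrative, not from any Intel API. */
unsigned simd_flops_per_cycle(unsigned vector_bits, unsigned element_bits,
                              unsigned pipes)
{
    assert(element_bits > 0 && vector_bits % element_bits == 0);
    unsigned lanes = vector_bits / element_bits; /* elements per register */
    return lanes * pipes;                        /* one FLOP per lane per pipe */
}
```

With these assumptions, simd_flops_per_cycle(128, 32, 2) gives 8 for single precision and simd_flops_per_cycle(128, 64, 2) gives 4 for double precision.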

http://www.anandtech.com/show/6355/intels-haswell-architecture/8

 

Bernard
Black Belt
154 Views

>>>and please explain me how you calculated them.>>>

You can code your own benchmark using, for example, a SAXPY-like computation, A = a * A + B.

In pseudocode:

// Allocate memory for float* A and float* B using malloc().

// Initialize both arrays.

// Use a double for-loop where the outer loop runs 1.0e+6 iterations of a shorter inner loop.

Calculate the achieved FP operations/sec using the following formula:

GFLOPS = (outer loop count * inner loop count * FLOPs per inner iteration) / (elapsed time in seconds * 1.0e+9).

Each A = a * A + B update counts as 2 FLOPs (one multiply and one add). If the benchmark runs on a single core, multiply the GFLOPS result by the number of CPU cores to estimate the whole-chip figure.
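
The pseudocode above could be sketched in C roughly as follows. This is only a single-threaded illustration: the function name, the constants, and the use of clock() for timing are my assumptions, and the result depends heavily on compiler options and on whether the arrays fit in cache.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Runs A[i] = a * A[i] + B[i] over n floats, `repeats` times, and returns
   the measured rate in GFLOP/s (2 FLOPs per element update).
   Returns -1.0 if allocation fails and 0.0 if the run was too short to time.
   The checksum output keeps the compiler from discarding the loop. */
double saxpy_gflops(size_t n, size_t repeats, float *checksum)
{
    assert(n > 0 && repeats > 0);

    float *A = malloc(n * sizeof *A);
    float *B = malloc(n * sizeof *B);
    if (A == NULL || B == NULL) {
        free(A);
        free(B);
        return -1.0;
    }

    for (size_t i = 0; i < n; ++i) {
        A[i] = 0.5f;
        B[i] = 1.0f;
    }

    const float a = 0.999f; /* |a| < 1 keeps the values bounded */

    clock_t start = clock();
    for (size_t r = 0; r < repeats; ++r)      /* outer loop */
        for (size_t i = 0; i < n; ++i)        /* shorter inner loop */
            A[i] = a * A[i] + B[i];
    clock_t stop = clock();

    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += A[i];
    if (checksum != NULL)
        *checksum = sum;

    free(A);
    free(B);

    double seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return 0.0;
    return 2.0 * (double)n * (double)repeats / seconds / 1.0e9;
}
```

For example, saxpy_gflops(4096, 20000, &sum) keeps both arrays small enough to stay in cache, so the number reflects the arithmetic units more than memory bandwidth.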

As a side note, peak theoretical FLOP/s is CPU max frequency * number of cores * number of SP FLOPs/cycle.

Peak theoretical SP FLOP/s for the Intel Bloomfield CPU will be:

   3.06e+9 Hz * 4 * 8 SP = 97.9 SP GFLOPS.
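
As a small sketch of the same arithmetic, the helper below multiplies clock, core count, and FLOPs per cycle, and a second helper turns a measured result into an efficiency figure. It assumes the 3.06 GHz nominal clock from the thread title (not a turbo clock), and both function names are made up for illustration.

```c
#include <assert.h>

/* Theoretical peak in GFLOP/s: clock (Hz) * cores * FLOPs per cycle per core. */
double peak_gflops(double clock_hz, unsigned cores, unsigned flops_per_cycle)
{
    assert(clock_hz > 0.0 && cores > 0 && flops_per_cycle > 0);
    return clock_hz * cores * flops_per_cycle / 1.0e9;
}

/* Efficiency of a measured result relative to the theoretical peak, in percent. */
double efficiency_percent(double measured_gflops, double peak)
{
    assert(peak > 0.0);
    return 100.0 * measured_gflops / peak;
}
```

With 4 cores this gives peak_gflops(3.06e9, 4, 8) = 97.92 for SP and peak_gflops(3.06e9, 4, 4) = 48.96 for DP. A code that measures 48.96 SP GFLOPS would then be running at 50% of peak.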

Bernard
Black Belt
154 Views

@Fausto

Here is an interesting discussion about the FP benchmark.

http://stackoverflow.com/questions/8389648/how-do-i-achieve-the-theoretical-maximum-of-4-flops-per-c...

Also check the following link: https://software.intel.com/en-us/articles/estimating-flops-using-event-based-sampling-ebs

TimP
Black Belt
154 Views

The post here doesn't state ground rules for the choice of compiler or options, while that Stack Overflow post chose scalar SSE2. If the rule is to use default options, then Intel C++ inner_product would be a relevant baseline.

Vladimir_S_2
Beginner
154 Views

Skylake: 64 SP FLOPs per cycle?

 

TimP
Black Belt
154 Views

A future Skylake server CPU may support peak SIMD performance per core double that of Haswell. The current Skylake client CPU is apparently similar to Haswell in that respect.

Vladimir_S_2
Beginner
154 Views

Apple A6: 8 SP per cycle or 1 DP per cycle

Apple A7/A8: 16 SP or 8 DP per cycle = Intel Ivy Bridge

http://dench.flatlib.jp/opengl/devices?s[]=iphone&s[]=6s#ios_soc      (section: ios soc)

and http://dench.flatlib.jp/opengl/cpufop

Is it true?

 

Bernard
Black Belt
154 Views

@Tim

Do you mean the future Xeon Skylake (Purley platform), which incorporates 512-bit vector registers?
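
For what it's worth, here is a rough sketch of how register width feeds into the per-cycle figure on FMA-capable cores. It assumes two FMA units per core and counts each fused multiply-add as 2 FLOPs; whether a given Skylake part really has two 512-bit FMA units is an assumption here, not something confirmed in this thread.

```c
#include <assert.h>

/* Peak FLOPs per cycle per core for a core that can start `fma_units`
   fused multiply-add vector instructions per cycle. Each FMA counts as
   2 FLOPs per lane. The function name is illustrative only. */
unsigned fma_flops_per_cycle(unsigned vector_bits, unsigned element_bits,
                             unsigned fma_units)
{
    assert(element_bits > 0 && vector_bits % element_bits == 0);
    unsigned lanes = vector_bits / element_bits; /* elements per register */
    return lanes * 2u * fma_units;               /* multiply + add per lane */
}
```

Under those assumptions, 256-bit registers give fma_flops_per_cycle(256, 32, 2) = 32 SP FLOPs/cycle, and 512-bit registers give fma_flops_per_cycle(512, 32, 2) = 64, which is the doubling mentioned above.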

Bernard
Black Belt
154 Views

Vladimir S. wrote:

Apple A6: 8 SP per cycle or 1 DP per cycle

Apple A7/A8: 16 SP or 8 DP per cycle = Intel Ivy Bridge

http://dench.flatlib.jp/opengl/devices?s[]=iphone&s[]=6s#ios_soc      (section: ios soc)

and http://dench.flatlib.jp/opengl/cpufop

Is it true?

 

Why are you trying to compare Apple's SoC to an Intel Haswell CPU?

AFAIK Apple's solution is based on the ARM architecture and probably incorporates NEON SIMD with its 128-bit vector registers.
