SergeyKostrov
Valued Contributor II
119 Views

Performance Evaluation of a Matrix Multiply: 2048 x 2048 \\ Data type 'float' \\ All matrix elements 1.0f

NOTE: I'm sorry, but I decided to post again because my previous post deviated from the subject.

Guys, if you have some time and could provide some performance numbers, obtained with any
version of MKL, I would really appreciate it! If you can't... sorry that my post took a couple of seconds
of your valuable time.


THIS IS WHAT I NEED:

I wonder if somebody who has MKL could do a Performance Evaluation of a Matrix Multiplication function?

Test-Case:

- Both matrices 2048 x 2048
- Data type 'float'
- All Elements Initialized to 1.0f

Please report the Time ( in secs ) to Calculate a Product of two matrices and some details about your CPU,
frequency, memory in GBs, etc.

I'm not interested in the result of the multiplication. I'm interested to know how long it takes to calculate it on
different computers with different CPUs using Intel's MKL.

Thank you in advance.

Best regards,
Sergey
9 Replies
Todd_R_Intel
Employee
119 Views

Sergey,

We post a number of benchmarks on our website, but we don't expect that they will ever cover all customer questions. There are simply too many permutations.

Even your question above leads to some other questions: What OS? Are the matrices transposed or not? You say both matrices, so is the third matrix in SGEMM, "C", zeroed, with beta equal to 0?

And then, naturally, full documentation and disclaimers are required whenever Intel posts a benchmark number.

So you see, what seems like a simple request can become a slightly bigger request. We do our best here to provide some representative performance numbers that give an indication of the kinds of results you can get with Intel MKL. For the other cases we provide a free evaluation copy of the fully functional version of Intel MKL, so that you can try it on the case that is important to you.

Todd


SergeyKostrov
Valued Contributor II
119 Views

>>We post a number of benchmarks on our website but we don't expect...

These benchmarks are in 'Gflops', not in 'Seconds'.

>>Even your question above, leads to some other question... What OS?

Any OS. No special requirements, and whatever is best for you. A computer with the latest or
an older ( 1 - 2 year old ) Intel CPU would be OK.

>>Are matrices transposed or not?

No. All matrix elements are initialized to 1.0. Both matrices are square, 2048 by 2048, which means that
it doesn't matter whether you transpose a matrix or not. The result will be the same.

>>You say both matrices, so is the third matrix in SGEMM, "C" zeroed with beta equal to 0?

Here is a C-pseudo code:

...
float fA[2048][2048]; // Matrix A
float fB[2048][2048]; // Matrix B
float fC[2048][2048]; // Matrix C ( result )

// Initialize A and B to all ones and C to zeros
for( int i = 0; i < 2048; i++ )
{
    for( int j = 0; j < 2048; j++ )
    {
        fA[i][j] = 1.0f;
        fB[i][j] = 1.0f;
        fC[i][j] = 0.0f;
    }
}

t1 = GetTime();
fC = < MKLMatrixMultiply >( fA, fB ); // Any MKL version
t2 = GetTime();

Delta = t2 - t1; // Time to multiply ( in seconds, for example )
...

As you can see, I don't need anything really special.
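
A minimal C sketch of the timing harness outlined in the pseudo-code above. The names `reference_sgemm` and `time_multiply` are hypothetical, and the plain triple loop is only a stand-in so that the sketch compiles without MKL; it says nothing about MKL's speed. The matrices are allocated on the heap, on the assumption that three 2048 x 2048 float arrays (16 MB each) will not fit on a typical stack.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Signature of the routine being timed: C = A * B, row-major, n x n. */
typedef void (*multiply_fn)(int n, const float *a, const float *b, float *c);

/* Stand-in kernel (NOT MKL): classic triple loop. */
void reference_sgemm(int n, const float *a, const float *b, float *c)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            float acc = 0.0f;
            for (int k = 0; k < n; k++)
                acc += a[(size_t)i * n + k] * b[(size_t)k * n + j];
            c[(size_t)i * n + j] = acc;
        }
    }
}

/* Fills A and B with 1.0f and C with 0.0f, times one call of `multiply`,
 * and returns the elapsed CPU time in seconds (-1.0 if allocation fails).
 * The top-left element of the product is stored in *c00 when c00 != NULL. */
double time_multiply(multiply_fn multiply, int n, float *c00)
{
    assert(multiply != NULL && n > 0);

    size_t count = (size_t)n * (size_t)n;
    float *a = malloc(count * sizeof *a);
    float *b = malloc(count * sizeof *b);
    float *c = malloc(count * sizeof *c);
    if (a == NULL || b == NULL || c == NULL) {
        free(a); free(b); free(c);
        return -1.0;
    }

    for (size_t i = 0; i < count; i++) {
        a[i] = 1.0f;
        b[i] = 1.0f;
        c[i] = 0.0f;
    }

    clock_t t1 = clock();
    multiply(n, a, b, c);
    clock_t t2 = clock();

    if (c00 != NULL)
        *c00 = c[0];

    free(a); free(b); free(c);
    return (double)(t2 - t1) / CLOCKS_PER_SEC;
}
```

Calling `time_multiply(reference_sgemm, 2048, NULL)` reproduces the test case with the stand-in kernel. To time MKL instead, one would pass a small wrapper around `cblas_sgemm` with alpha = 1.0f and beta = 0.0f. Note that `clock()` reports CPU time, so a wall-clock timer would be the better choice for a multithreaded library.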

Best regards,
Sergey
Murat_G_Intel
Employee
119 Views

Hi Sergey,

We report the performance numbers in flops (flop/sec), which is the number of floating point operations (flop) per second (sec). You can find the time required for a routine if you know flop and flop/sec.

For example, the number of floating point operations to compute SGEMM with M=N=K=2048, beta=0.0, alpha=1.0 is given as:

2*M*N*K= 2*2048*2048*2048 = 17179869184 flop ~= 17.180 Giga-Flop (GFlop)

Now, if SGEMM runs at 200 GFlop/sec (or GFlops), then the time for SGEMM will be:

17.180 / 200 = 0.0859 secs

Double-precision GEMM (DGEMM) is shown on the performance charts, and as a rule of thumb, single-precision performance is about twice the double-precision performance. Therefore, you can multiply the DGEMM GFlops by two to get an estimate of SGEMM GFlops.
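
The arithmetic above can be sketched in C as follows. The function names are hypothetical helpers, and the 200 GFlop/sec rate is just the illustrative figure from this post, not a measured number.

```c
#include <assert.h>

/* Flop count of a GEMM with an M x K matrix times a K x N matrix,
 * using the 2*M*N*K convention from the post. */
double gemm_flop_count(double m, double n, double k)
{
    assert(m > 0.0 && n > 0.0 && k > 0.0);
    return 2.0 * m * n * k;
}

/* Estimated run time in seconds at a sustained rate given in GFlop/sec. */
double gemm_seconds(double flop, double gflops)
{
    assert(gflops > 0.0);
    return flop / (gflops * 1e9);
}

/* Rule-of-thumb SGEMM rate derived from a published DGEMM rate. */
double sgemm_gflops_from_dgemm(double dgemm_gflops)
{
    return 2.0 * dgemm_gflops;
}
```

For M=N=K=2048, `gemm_flop_count` gives 17179869184 flop, and `gemm_seconds` at 200 GFlop/sec gives roughly 0.0859 secs, matching the numbers above.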

Best wishes,

Efe

SergeyKostrov
Valued Contributor II
119 Views

Hi Efe,

Even if it is some kind of "calculated performance", not measured, it gives me a better idea about the performance of MKL.

I have a question. What is the number '2' in:

2*M*N*K= 2*2048*2048*2048 = 17179869184 flop ~= 17.180 Giga-Flop (GFlop)
^

Thank you for your time!

Best regards,
Sergey
Gennady_F_Intel
Moderator
119 Views

This is the number of multiplications and additions: each of the M*N elements of the result needs K multiplications and K additions.
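
A minimal sketch that makes the factor of 2 visible by counting operations in a plain triple loop. `counted_matmul` is a hypothetical helper, not an MKL routine, and it counts the first addition into a zero accumulator as an operation, which is the convention behind 2*M*N*K.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major C = A * B with A of size m x k and B of size k x n.
 * Returns the number of floating point operations performed:
 * one multiplication and one addition per inner-loop step. */
unsigned long long counted_matmul(int m, int n, int k,
                                  const float *a, const float *b, float *c)
{
    assert(m > 0 && n > 0 && k > 0);
    assert(a != NULL && b != NULL && c != NULL);

    unsigned long long flop = 0;
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++) {
                acc += a[(size_t)i * k + p] * b[(size_t)p * n + j];
                flop += 2; /* 1 multiplication + 1 addition */
            }
            c[(size_t)i * n + j] = acc;
        }
    }
    return flop;
}
```

For M=N=K=4 the function returns 2*4*4*4 = 128, and with all inputs set to 1.0f every element of the result is 4.0f.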
SergeyKostrov
Valued Contributor II
119 Views

>>...Now, if SGEMM runs at 200 GFlop/sec (or GFlops )

Question 1:
Which modern Intel CPUs provide such performance?

Question 2:
I also would like to compare performance gains relative to some older Intel CPUs, for example
Pentium 4 or Atom N270. So, how fast are they in terms of number of floating point operations in a second?

Best regards,
Sergey
TimP
Black Belt
119 Views

>>...Now, if SGEMM runs at 200 GFlop/sec (or GFlops )

>>Question 1:
>>Which modern Intel CPUs provide such performance?
>>
>>Question 2:
>>I also would like to compare performance gains relative to some older Intel CPUs, for example
>>Pentium 4 or Atom N270. So, how fast are they in terms of number of floating point operations in a second?


An AVX CPU, even without FMA, would have a peak rating of 16 single-precision flop per core per clock cycle. So you are talking about, e.g., an 8-core CPU at 2 GHz (8 x 2 x 16 = 256 GFlops peak).
Most of the recent new entries on Top500 are exceeding 200 Gflops DGEMM per node (2 CPUs) and 80% "efficiency" (actual vs. peak rated performance), and that is sustained for over 10000 cores.
The question about P4 and Atom has been covered many times over in public internet posts.
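
The peak rating and "efficiency" figures can be sketched as below. The function names are hypothetical, and the 8-core, 2 GHz, 16 flop per cycle values are only the example from this reply.

```c
#include <assert.h>

/* Peak rate in GFlop/sec: cores x clock (GHz) x flop per core per cycle. */
double peak_gflops(int cores, double ghz, int flop_per_cycle)
{
    assert(cores > 0 && ghz > 0.0 && flop_per_cycle > 0);
    return cores * ghz * flop_per_cycle;
}

/* Efficiency: actual rate divided by peak rated performance. */
double gemm_efficiency(double actual_gflops, double peak)
{
    assert(peak > 0.0);
    return actual_gflops / peak;
}
```

With these inputs `peak_gflops(8, 2.0, 16)` is 256 GFlop/sec, so an SGEMM running at 200 GFlop/sec would be at about 78% of peak.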
SergeyKostrov
Valued Contributor II
119 Views

>>...2*M*N*K= 2*2048*2048*2048

It looks like the famous T = O * (n^3), where O equals '2'.

I'm not convinced that a classic (single-thread) algorithm for matrix multiplication is at the core of MKL's
SGEMM or DGEMM functions. I think Strassen or Strassen-Winograd algorithms have to be used to boost the
speed of calculations.
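
For a sense of scale, here is a small sketch that compares the number of scalar multiplications in the classic algorithm with that of Strassen's recursion. It makes no claim about which algorithm MKL uses. The function names are hypothetical, n is assumed to be a power of two, and the recursion is assumed to run all the way down to 1 x 1 blocks; the extra additions Strassen needs are not counted.

```c
#include <assert.h>

/* Scalar multiplications in the classic algorithm: n^3. */
unsigned long long classic_mults(unsigned n)
{
    return (unsigned long long)n * n * n;
}

/* Scalar multiplications in Strassen's recursion: 7^log2(n).
 * Each halving of the matrix size costs 7 sub-products instead of 8. */
unsigned long long strassen_mults(unsigned n)
{
    assert(n > 0 && (n & (n - 1)) == 0); /* power of two only */
    unsigned long long mults = 1;
    while (n > 1) {
        mults *= 7;
        n /= 2;
    }
    return mults;
}
```

For n = 2048 this gives 8589934592 multiplications for the classic algorithm against 1977326743 for the full Strassen recursion.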
SergeyKostrov
Valued Contributor II
119 Views

Merry Christmas and a Happy New Year!

Thanks to everybody who responded to my posts.

Best regards,
Sergey