Guys, if you have some time and could provide some performance numbers, obtained with any
version of MKL, I would really appreciate it! If you can't... sorry that my post took a couple of seconds
of your valuable time.
THIS IS WHAT I NEED:
I wonder if somebody who has MKL could do a performance evaluation of a matrix multiplication function?
Test-Case:
- Both matrices 2048 x 2048
- Data type 'float'
- All Elements Initialized to 1.0f
Please report the time (in seconds) to calculate the product of the two matrices and some details about your CPU:
frequency, memory in GB, etc.
I'm not interested in the result of the multiplication. I'm interested to know how long it takes to calculate it on
different computers with different CPUs using Intel's MKL.
Thank you in advance.
Best regards,
Sergey
We post a number of benchmarks on our website, but we don't expect that they will ever cover all customer questions. There are simply too many permutations.
Even your question above leads to some other questions... What OS? Are matrices transposed or not? You say both matrices, so is the third matrix in SGEMM, "C" zeroed with beta equal to 0?
And then, naturally, full documentation and disclaimers are required whenever Intel posts a benchmark number.
So you see, what seems like a simple request can become a slightly bigger one. We do our best here to provide some representative performance numbers that give an indication of the kinds of results you can get with Intel MKL. For the other cases, we provide a free, fully functional evaluation copy of Intel MKL so that you can try it on the case that is important to you.
Todd
These benchmarks are in 'Gflops', not in 'Seconds'.
>>Even your question above leads to some other questions... What OS?
Any OS. No special requirements, whatever is best for you. A computer with the latest or
an older ( 1 - 2 year old ) Intel CPU would be OK.
>>Are matrices transposed or not?
No. All matrix elements are initialized to 1.0. Both matrices are square, 2048 by 2048, which means that
it doesn't matter if you transpose a matrix or not. It will be the same.
>>You say both matrices, so is the third matrix in SGEMM, "C" zeroed with beta equal to 0?
Here is C pseudo-code:
...
float fA[2048][2048]; // Matrix A
float fB[2048][2048]; // Matrix B
float fC[2048][2048]; // Matrix C
for( int i=0; i<2048; i++ )
{
    for( int j=0; j<2048; j++ )
    {
        fA[i][j] = 1.0f;
        fB[i][j] = 1.0f;
        fC[i][j] = 0.0f; // initial value of C assumed here; C is overwritten below
    }
}
t1 = GetTime();
fC = < MKLMatrixMultiply >( fA, fB ); // Any MKL version
t2 = GetTime();
Delta = t2 - t1; // Time to multiply ( in seconds, for example )
...
As you can see, I don't need anything really special.
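A slightly more concrete version of the same measurement might look like the sketch below. It is only an illustration under stated assumptions: the names `matmul_fn`, `naive_matmul`, `wall_seconds` and `time_matmul` are made up for this example, the matrices are heap-allocated (three 2048 x 2048 float arrays would need 48 MB, too much for a typical stack), and a naive triple loop stands in for the MKL call so that the code runs without MKL installed.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical signature: any C = A * B routine on n x n row-major floats. */
typedef void (*matmul_fn)(const float *a, const float *b, float *c, int n);

/* Naive reference multiply; a stand-in so the sketch runs without MKL. */
void naive_matmul(const float *a, const float *b, float *c, int n)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;
            for (int k = 0; k < n; k++)
                sum += a[(size_t)i * n + k] * b[(size_t)k * n + j];
            c[(size_t)i * n + j] = sum;
        }
    }
}

/* Wall-clock time in seconds (C11 timespec_get), so that a multithreaded
   multiply is not charged the summed CPU time of all its threads. */
static double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Fills A and B with 1.0f, zeroes C, times one call to `multiply`, and
   returns the elapsed seconds (-1.0 if allocation fails). If c00 is not
   NULL it receives C[0][0], which should equal n for all-ones inputs. */
double time_matmul(matmul_fn multiply, int n, float *c00)
{
    assert(multiply != NULL && n > 0);
    size_t count = (size_t)n * (size_t)n;
    float *a = malloc(count * sizeof *a);
    float *b = malloc(count * sizeof *b);
    float *c = calloc(count, sizeof *c);
    if (!a || !b || !c) {
        free(a); free(b); free(c);
        return -1.0;
    }
    for (size_t i = 0; i < count; i++) {
        a[i] = 1.0f;
        b[i] = 1.0f;
    }

    double t1 = wall_seconds();
    multiply(a, b, c, n);
    double t2 = wall_seconds();

    if (c00)
        *c00 = c[0];
    free(a); free(b); free(c);
    return t2 - t1;
}
```

Calling `time_matmul(naive_matmul, 2048, NULL)` gives the time for the test case above with the stand-in routine. With MKL available, a wrapper of the same signature that calls `cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, n, n, n, 1.0f, a, n, b, n, 0.0f, c, n)` could presumably be passed instead; that wrapper is an assumption about the setup and is not compiled here.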
Best regards,
Sergey
Hi Sergey,
We report the performance numbers in flops (flop/sec), which is the number of floating point operations (flop) per second (sec). You can find the time required for a routine if you know its flop count and its flop/sec rate.
For example, the number of floating point operations to compute SGEMM with M=N=K=2048, beta=0.0, alpha=1.0 is given as:
2*M*N*K = 2*2048*2048*2048 = 17179869184 flop ~= 17.180 Giga-Flop (GFlop)
Now, if SGEMM runs at 200 GFlop/sec (or GFlops), then the time for SGEMM will be:
17.180 / 200 = 0.0859 secs
Double-precision GEMM (DGEMM) is shown on the performance charts, and as a rule of thumb, the single-precision performance is twice the double-precision performance. Therefore, you can multiply the DGEMM GFlops by two to get an estimate of SGEMM GFlops.
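The arithmetic above can be written as a few small C helpers. This is only a sketch of the estimate, not a measurement; the function names are made up for this example, and the factor of two for single precision is the rule of thumb mentioned above, not a guarantee for any particular CPU.

```c
#include <assert.h>

/* Operation count for GEMM with an M x K matrix A and a K x N matrix B,
   using the 2*M*N*K convention. */
double gemm_flop_count(double m, double n, double k)
{
    return 2.0 * m * n * k;
}

/* Estimated run time in seconds for `flop` operations at a sustained
   rate given in GFlop/sec. */
double gemm_seconds(double flop, double gflops)
{
    assert(gflops > 0.0);
    return flop / (gflops * 1e9);
}

/* Rule-of-thumb SGEMM rate: twice a DGEMM rate read off a chart. */
double sgemm_gflops_from_dgemm(double dgemm_gflops)
{
    return 2.0 * dgemm_gflops;
}
```

For M=N=K=2048, `gemm_flop_count(2048, 2048, 2048)` gives 17179869184, and `gemm_seconds` of that count at 200 GFlop/sec comes out at about 0.0859 seconds, matching the numbers above.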
Best wishes,
Efe
Even if it is some kind of "calculated performance", not measured, it gives me a better idea about the performance of MKL.
I have a question. What is the number '2' in:
2*M*N*K= 2*2048*2048*2048 = 17179869184 flop ~= 17.180 Giga-Flop (GFlop)
^
Thank you for your time!
Best regards,
Sergey
Question 1:
What modern Intel CPUs provide such performance?
Question 2:
I would also like to compare performance gains relative to some older Intel CPUs, for example
Pentium 4 or Atom N270. So, how fast are they in terms of floating point operations per second?
Best regards,
Sergey
>>Question 1:
>>What modern Intel CPUs provide such performance?
Most of the recent new entries on Top500 are exceeding 200 Gflops DGEMM per node (2 CPUs) and 80% "efficiency" (actual vs. peak rated performance), and that is sustained for over 10000 cores.
>>Question 2:
>>I would also like to compare performance gains relative to some older Intel CPUs, for example
>>Pentium 4 or Atom N270. So, how fast are they in terms of floating point operations per second?
This (for P4, Atom) has been covered many times over in public internet posts.
It looks like the famous T=O*(n^3), where O equals '2'.
I'm not convinced that a classic (single-thread) algorithm for matrix multiplication is at the core of MKL's
SGEMM or DGEMM functions. I think Strassen or Strassen-Winograd algorithms have to be used to boost the
speed of calculations.
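For what it's worth, the factor of 2 can be seen by counting operations in the classic triple loop: each inner iteration does one multiply and one add. The sketch below (the function name `classic_matmul_count` is made up for this example) does exactly that count. It only illustrates the counting convention behind 2*M*N*K and says nothing about which algorithm MKL uses internally.

```c
#include <assert.h>
#include <stddef.h>

/* Multiplies n x n row-major matrices the classic way and returns the
   number of floating point operations (multiplies plus adds) executed. */
unsigned long long classic_matmul_count(const float *a, const float *b,
                                        float *c, int n)
{
    assert(n > 0);
    unsigned long long flop = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;
            for (int k = 0; k < n; k++) {
                /* one multiply and one add per inner iteration */
                sum += a[(size_t)i * n + k] * b[(size_t)k * n + j];
                flop += 2;
            }
            c[(size_t)i * n + j] = sum;
        }
    }
    return flop;
}
```

For square matrices the returned count is 2*n*n*n, which is where the '2' in front of n^3 comes from.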
Thanks to everybody who responded to my posts.
Best regards,
Sergey