- Intel Community
- Software Development SDKs and Libraries
- Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library
- Cost of covariance matrix estimation


Jean-françois_D_

Beginner


11-29-2012
05:09 AM


Cost of covariance matrix estimation

Hello,

I'm having a bit of trouble finding information about some BLAS functions. I want to estimate a covariance matrix from a set of K vectors (each of length N). There are two ways to do that:

- Put all the vectors x_k into a matrix X (of size N*K) and use zgemm to compute X*X^{H}; the computational cost is then 8*K*N² flops.

- Call zher K times, updating the matrix with x_k*x_k^{H} each time --> what is the cost of that?
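A quick sketch of the two options in NumPy (as a stand-in for the actual BLAS calls; the sizes below are made up for illustration). One hedged observation: since the result is Hermitian, zher writes only one triangle, so K zher calls cost roughly 4*K*N² real flops, about half the full zgemm count; a single zherk call (Hermitian rank-K update) does the same reduced arithmetic with much better cache reuse than K rank-1 updates.

```python
import numpy as np

# Made-up sizes, for illustration only.
N, K = 64, 256
rng = np.random.default_rng(0)
X = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))

# Option 1: one gemm-style product C = X * X^H (~8*K*N^2 real flops,
# counting a complex multiply-add as 8 real operations).
C_gemm = X @ X.conj().T

# Option 2: K rank-1 updates C += x_k * x_k^H, as K calls to zher would do.
# zher itself writes only one triangle (~4*N^2 real flops per call, so
# ~4*K*N^2 in total); the dense NumPy version below fills the full matrix.
C_rank1 = np.zeros((N, N), dtype=complex)
for k in range(K):
    x = X[:, k]
    C_rank1 += np.outer(x, x.conj())

# Both routes produce the same covariance matrix (up to rounding).
assert np.allclose(C_gemm, C_rank1)
```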

Also, I'm a bit lost when it comes to computing-power calculations. Suppose that, for a given matrix-matrix multiplication, I need 200 GFlops (calculated as 8*K*N² divided by the time I have to do it). Can I compare these 200 GFlops to the theoretical peak of my CPU? I ask because the power of CPUs/GPUs is always given in MAD GFlops. Does this mean I can divide the 200 GFlops by 2, because one multiplication+addition is done per cycle?
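As a rough sanity check of that comparison (made-up sizes and time budget; the peak figure is the X5570 number discussed later in the thread): the 8*K*N² count treats every real multiply and every real add as one flop, and vendor peak-GFLOPS figures conventionally use the same counting (an FMA counts as 2 flops), so to my understanding the two numbers can be compared directly, without dividing by 2.

```python
# Made-up problem size and time budget, for illustration only.
N, K = 1024, 4096
time_budget_s = 0.5

# zgemm cost of X * X^H, counting every real multiply and add as one flop.
flops = 8 * K * N**2
required_gflops = flops / time_budget_s / 1e9

# Vendor peak figures use the same flop-counting convention, so the two
# numbers are directly comparable -- no division by 2.
peak_gflops = 93.8  # e.g. a dual-socket X5570 in double precision
print(f"required: {required_gflops:.1f} GFLOPS, peak: {peak_gflops} GFLOPS")
print("within peak (ignoring memory bandwidth):",
      required_gflops < peak_gflops)
```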

Thank you.

Jean-François


7 Replies

Ilya_B_Intel

Employee


11-29-2012
05:37 AM


Jean-françois_D_

Beginner


11-29-2012
05:46 AM


Zhang_Z_Intel

Employee


12-03-2012
10:13 AM


Ilya_B_Intel

Employee


12-04-2012
12:11 AM


Jean-françois_D_

Beginner


12-04-2012
12:52 AM


Quote: Zhang Z (Intel) wrote:

Jean-François,

Would you please share how you get the theoretical peak performance (GFLOPS) of your processor? It seems you assume one multiplication and one addition can be done in one cycle. This is true ONLY if the processor is capable of FMA (fused multiply-add) instructions. For Intel processors, FMA will be introduced in the upcoming Haswell microarchitecture in 2013. What processor do you have?

Ilya, please comment on the computational cost (in terms of the number of floating-point operations) of covariance matrix estimation; I think it should be on par with the cost of matrix multiplication.

No, I don't. I found approximations of the theoretical peak here, for example: http://download.intel.com/pressroom/kits/xeon/5600series/pdf/Xeon_5600_PressBriefing.pdf. That is around 80 GFlops for my X5570.

For the estimation of the covariance matrix, it is pretty simple: if I use zgemm, it is 8*N²*K floating-point operations. Can I compare this number (divided by the time I have to do the calculation) to the theoretical peak, to get an idea of whether it will run fast enough?

When programming on GPUs, which I also use for bigger matrices, FMA is supported as far as I know, so I guess I can divide the 8*N²*K floating-point operations by 2, since one multiply-add is done per cycle.

Zhang_Z_Intel

Employee


12-05-2012
11:22 AM


Quote: Jean-françois D. wrote:

No, I don't. I found approximations of the theoretical peak here, for example: http://download.intel.com/pressroom/kits/xeon/5600series/pdf/Xeon_5600_P... That is around 80 GFlops for my X5570. For the estimation of the covariance matrix, it is pretty simple: if I use zgemm, it is 8*N²*K floating-point operations. Can I compare this number (divided by the time I have to do the calculation) to the theoretical peak, to get an idea of whether it will run fast enough? When programming on GPUs, which I also use for bigger matrices, FMA is supported as far as I know, so I guess I can divide the 8*N²*K floating-point operations by 2, since one multiply-add is done per cycle.

I cannot comment on GPU peak performance. But for the Intel Xeon X5570, my calculation gives a theoretical peak of 94 GFlops for double-precision floating-point operations and 188 GFlops for single precision. This is based on a 2.93 GHz CPU frequency, 2 sockets, and 4 cores per socket, and assumes all operations are vectorized. You can use this information to compute an upper limit on the speed of your operations.
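The 94 GFlops figure can be reproduced with a small calculation. The 4 flops/cycle/core number below is my assumption for a Nehalem-era core (one 2-wide SSE double-precision multiply plus one 2-wide add issued per cycle), not something stated explicitly in the thread:

```python
sockets = 2
cores_per_socket = 4
freq_ghz = 2.93
dp_flops_per_cycle = 4  # assumed: 2-wide SSE mul + 2-wide SSE add per cycle

# Theoretical peak = sockets * cores * frequency * flops per cycle.
dp_peak_gflops = sockets * cores_per_socket * freq_ghz * dp_flops_per_cycle
sp_peak_gflops = 2 * dp_peak_gflops  # single precision: 4-wide SSE

print(f"DP peak: {dp_peak_gflops:.2f} GFLOPS")  # ~93.76, i.e. the 94 above
print(f"SP peak: {sp_peak_gflops:.2f} GFLOPS")  # ~187.52, i.e. the 188
```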

Jean-françois_D_

Beginner


12-06-2012
02:39 AM


Quote: Zhang Z (Intel) wrote:

I cannot comment on GPU peak performance. But for the Intel Xeon X5570, my calculation gives a theoretical peak of 94 GFlops for double-precision floating-point operations and 188 GFlops for single precision. This is based on a 2.93 GHz CPU frequency, 2 sockets, and 4 cores per socket, and assumes all operations are vectorized. You can use this information to compute an upper limit on the speed of your operations.

Thank you, it really helps! Indeed, I have two X5570s clocked at 2.93 GHz!
