Processor peak vs. effective performance on parallel platforms
Knowing the peak floating-point operation rate of a processor, is there any rule of thumb for predicting the 'effective' FLOPS? In other words, what percentage of the peak FLOPS is typically achieved in practice?
In the Top500 supercomputer ratings, an "efficiency" is quoted as the ratio of actual performance to the peak FLOPS rating. That figure can be achieved only when three conditions hold: the vendor's optimized BLAS can be used (MKL dgemm, in the case of Intel clusters), the problem size is adjusted to maximize efficiency, and memory bandwidth is not a significant limitation. For many real applications memory bandwidth is a significant limitation, so they should not be expected to reach that efficiency.
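To make the arithmetic concrete, here is a minimal Python sketch of the efficiency ratio and of a simple bandwidth cap on attainable performance. The cap is a roofline-style estimate that I am bringing in for illustration; it is not something the Top500 list itself uses. All machine numbers (16 cores, 2.5 GHz, 16 FLOPs per cycle per core, 100 GB/s) and the function names are hypothetical, so substitute the figures for your own hardware.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak: cores x clock (GHz) x FLOPs per cycle per core."""
    return cores * clock_ghz * flops_per_cycle


def efficiency(achieved_gflops, peak):
    """Efficiency in the Top500 sense: achieved rate divided by peak rate."""
    return achieved_gflops / peak


def attainable_gflops(peak, mem_bw_gbs, flops_per_byte):
    """Roofline-style upper bound: the lower of the compute peak and
    (memory bandwidth x FLOPs performed per byte moved)."""
    return min(peak, mem_bw_gbs * flops_per_byte)


# Hypothetical node, for illustration only.
peak = peak_gflops(cores=16, clock_ghz=2.5, flops_per_cycle=16)

# A kernel doing many FLOPs per byte (as a blocked dgemm can) is capped by the peak.
compute_bound = attainable_gflops(peak, mem_bw_gbs=100.0, flops_per_byte=32.0)

# A streaming kernel doing 1 FLOP per 8 bytes is capped by memory bandwidth.
bandwidth_bound = attainable_gflops(peak, mem_bw_gbs=100.0, flops_per_byte=0.125)

print(f"peak:            {peak:.1f} GFLOPS")
print(f"compute-bound:   {compute_bound:.1f} GFLOPS "
      f"({efficiency(compute_bound, peak):.0%} of peak at best)")
print(f"bandwidth-bound: {bandwidth_bound:.1f} GFLOPS "
      f"({efficiency(bandwidth_bound, peak):.1%} of peak at best)")
```

These are upper bounds, not predictions: a real code will land somewhere below them. The point is only that, under these assumed numbers, the bandwidth-limited kernel cannot get anywhere near peak no matter how well it is tuned, which is why no single percentage works as a rule of thumb.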