woshiwuxin
Beginner

FLOPS function in MKL

Hi,
When benchmarking, it's crucial to know the theoretical peak performance of the machine. We can use the "mkl_get_max_cpu_frequency" function to determine the CPU frequency, but the number of floating-point operations per clock cycle varies from one system to another. Would it be possible for the Intel MKL developers to add a new function that reports this number?
Best regards,
Xin
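
For reference, the peak figure discussed here is usually taken as cores x clock frequency x floating-point operations per cycle per core. A minimal sketch of that arithmetic follows; peak_gflops is a hypothetical helper, not an MKL function, and the flops-per-cycle value is an input the caller must supply, which is exactly the number the question asks MKL to report.

```c
#include <stdio.h>

/* Theoretical peak in GFLOP/s:
 *   cores * clock (GHz) * floating-point operations per cycle per core.
 * Hypothetical helper; flops_per_cycle is NOT detected here, the caller
 * has to know it for the target processor and precision. */
double peak_gflops(int cores, double clock_ghz, int flops_per_cycle)
{
    return (double)cores * clock_ghz * (double)flops_per_cycle;
}

/* Print the estimate for one machine description. */
void print_peak(int cores, double clock_ghz, int flops_per_cycle)
{
    printf("%d cores at %.2f GHz, %d flops/cycle: %.1f GFLOP/s peak\n",
           cores, clock_ghz, flops_per_cycle,
           peak_gflops(cores, clock_ghz, flops_per_cycle));
}
```

The clock_ghz argument could come from mkl_get_max_cpu_frequency, which reports GHz; the sketch takes it as a plain parameter so it builds without MKL.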
1 Solution
TimP
Black Belt
There are so many ways already to use /proc/cpuinfo and other /proc/ facilities to check number of cores, including the use of diagnostics provided with Intel libraries and open source ones (e.g. hwloc), that it's not clear what you want in addition.
In an OpenMP or auto-parallel program, you use omp_get_max_threads and omp_get_num_threads for related purposes, and the MKL library has its own versions.
In your source code, if you need conditional compilation according to which architecture is enabled for the compiler, you have
#if defined __SSE2__
#if defined __AVX__
#if defined __SSE4_1__
#if defined __SSE3__
etc.
Most brands of C and C++ compilers for these architectures include these pre-defined macros, as does Intel Fortran.
So it seems that a large part of what you ask for is already covered, and cost vs. benefit must be considered for any new functions.
My own use of __SSE4_1__ is to determine whether there is efficient support for unaligned loads (on Intel platforms); I've never heard of anyone proposing a better alternative.
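
As a rough illustration of the predefined macros listed above, the sketch below maps them to the number of double-precision values per vector register. The function name simd_doubles_per_vector is hypothetical, and __AVX512F__ is an addition not mentioned in the post. The result reflects the architecture the compiler was told to target, not the CPU the binary runs on, and vector width alone is not the flops-per-cycle figure, which also depends on how many floating-point units the core has.

```c
/* Double-precision values per SIMD register for the instruction set
 * enabled at compile time (hypothetical helper).
 * This says nothing about the CPU actually present at run time. */
int simd_doubles_per_vector(void)
{
#if defined(__AVX512F__)
    return 8;   /* 512-bit registers */
#elif defined(__AVX__)
    return 4;   /* 256-bit registers */
#elif defined(__SSE2__)
    return 2;   /* 128-bit registers */
#else
    return 1;   /* scalar fallback */
#endif
}
```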

5 Replies
Hi Xin,

I am also very interested in this, though I must say I was misled by the title of your post. From the title, I understood that you wanted a "wall flop count" of how many flops MKL actually computed during a period of time.

Would there be such a function too? Basically, I am interested in producing Gflops plots for my algorithm, and it would be a lot easier if MKL offered this, since the library knows what operations it actually performs, as opposed to the known theoretical complexities and flop counts.

Thanks in advance,
Best regards,
Giovanni
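
Without such a counter, the usual way to produce Gflops plots is to divide a theoretical operation count by the measured wall time. A minimal sketch, assuming the conventional 2*m*n*k count for a matrix multiply; both function names are hypothetical, and the elapsed time is expected to come from whatever wall-clock timer the benchmark already uses.

```c
/* Conventional operation count for C = A*B with A m-by-k and B k-by-n:
 * one multiply and one add per inner-product term (hypothetical helper). */
double gemm_flops(double m, double n, double k)
{
    return 2.0 * m * n * k;
}

/* Achieved rate in GFLOP/s from an operation count and elapsed seconds.
 * Returns 0.0 for a non-positive time instead of dividing by zero. */
double achieved_gflops(double flops, double seconds)
{
    if (seconds <= 0.0)
        return 0.0;
    return flops / seconds / 1.0e9;
}
```

This reports the rate implied by the theoretical count, not the number of operations the library really executed, which is the distinction raised in the post.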
woshiwuxin
Beginner
Right now, before running any benchmarks, I need to check whether the processor supports AVX and whether the benchmark should use single or double precision, and then change the source code accordingly. It's a bit tedious. In the future, Intel will release new processors with new instructions and wider vectors, so it would be handy for application developers to be able to query CPU information (number of cores, floating-point operations per cycle, etc.) through API functions.
woshiwuxin
Beginner
I have posted many questions in the Intel Forum, and Tim always gives a professional answer. Thank you very much!
Personally, I avoid parsing cpuinfo. Some systems, e.g. Mac OS X, do not have /proc. Moreover, some systems enable CPU features such as SpeedStep and Turbo Boost, so the clock frequency reported in cpuinfo might be lower than the maximum.
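
One way to check for AVX at run time without reading /proc is the __builtin_cpu_supports builtin provided by GCC and Clang on x86. The sketch below is guarded so that it still compiles elsewhere; runtime_has_avx is a hypothetical name, and whether a given compiler other than GCC or Clang accepts the builtin is an assumption to verify.

```c
/* Returns 1 if the CPU reports AVX at run time, 0 if it does not,
 * and -1 if this build cannot tell (non-x86 target, or a compiler
 * that does not define __GNUC__). Hypothetical helper. */
int runtime_has_avx(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                          /* populate CPU feature data */
    return __builtin_cpu_supports("avx") ? 1 : 0;
#else
    return -1;
#endif
}
```

Unlike the predefined macros, this reflects the machine the program is running on, so one binary can choose a code path after it starts.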
SergeyKostrov
Valued Contributor II
Quoting woshiwuxin
...But how many floating point operations per clock cycle may vary from one system to another. Is it possible for the Intel MKL developers to add a new function that can calculate this number?


There is already the Intel Linpack 10.3.7 package, with binaries and sources, and you could try to use it.

Best regards,
Sergey