Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

FLOPS function in MKL

woshiwuxin
Novice
Hi,
It's crucial to know the theoretical peak performance of a machine when running benchmarks. We can use the "mkl_get_max_cpu_frequency" function to determine the CPU frequency, but the number of floating-point operations per clock cycle varies from one system to another. Would it be possible for the Intel MKL developers to add a new function that reports this number?
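For reference, the peak figure in question is usually taken as cores × clock frequency × floating-point operations per cycle per core. A minimal sketch of that arithmetic follows; the function names `peak_gflops` and `print_peak` are hypothetical, and all three inputs are assumed to be supplied by the caller (the frequency could come from "mkl_get_max_cpu_frequency", which reports GHz, while the per-cycle count is exactly the value no API provides here).

```c
#include <stdio.h>

/* Theoretical peak in GFLOP/s = cores * clock (GHz) * FLOPs per cycle per core.
   All three inputs are supplied by the caller; nothing is detected here. */
double peak_gflops(int cores, double ghz, int flops_per_cycle)
{
    return (double)cores * ghz * (double)flops_per_cycle;
}

/* Print a one-line summary of the calculation (hypothetical helper). */
void print_peak(int cores, double ghz, int flops_per_cycle)
{
    printf("%d cores x %.2f GHz x %d FLOPs/cycle = %.1f GFLOP/s\n",
           cores, ghz, flops_per_cycle,
           peak_gflops(cores, ghz, flops_per_cycle));
}
```

For example, `peak_gflops(4, 2.5, 8)` gives 80 GFLOP/s; the 8 double-precision operations per cycle is an assumed value for illustration and has to be looked up for the actual processor.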
Best regards,
Xin
1 Solution
TimP
Black Belt
There are already so many ways to use /proc/cpuinfo and other /proc/ facilities to check the number of cores, including diagnostics provided with Intel libraries and open source ones (e.g. hwloc), that it's not clear what you want in addition.
In an OpenMP or auto-parallel program, you use omp_get_max_threads and omp_get_num_threads for related purposes, and MKL has its own versions.
In your source code, if you need conditional compilation according to which architecture is enabled for the compiler, you have
#if defined __SSE2__
#if defined __AVX__
#if defined __SSE4_1__
#if defined __SSE3__
etc.
Most brands of C and C++ compilers for these architectures include these pre-defined macros, as does Intel Fortran.
So it seems that a large part of what you ask for is already covered, and cost vs. benefit must be considered for new ones.
My own use of __SSE4_1__ is to determine whether there is efficient support for unaligned loads (on Intel platforms); I've never heard of anyone proposing a better alternative.
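As a rough illustration of the conditional-compilation approach above, the sketch below maps those pre-defined macros to the number of doubles that fit in one vector register. The function name `simd_doubles_per_vector` is hypothetical, and `__AVX512F__` is an addition not mentioned in this thread. Note that the result reflects what the compiler was told to target, not the CPU the binary ends up running on.

```c
/* Doubles per SIMD register for the instruction set the COMPILER was told
   to target (e.g. via -mavx). The widest set is tested first, because a
   build that enables AVX also defines __SSE2__. */
int simd_doubles_per_vector(void)
{
#if defined(__AVX512F__)
    return 8;   /* 512-bit registers */
#elif defined(__AVX__)
    return 4;   /* 256-bit registers */
#elif defined(__SSE2__)
    return 2;   /* 128-bit registers */
#else
    return 1;   /* none of the macros above is defined */
#endif
}
```

This gives only the vector width; how many such vector operations a core can issue per cycle still depends on the microarchitecture and is not encoded in any of these macros.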


5 Replies
Azua_Garcia__Giovann
Hi Xin,

I am also very interested in this, though I must say I was misled by the title of your OP. I understood from the title that you wanted a "wall flop count" of how many flops were computed by MKL during a period of time.

Would there be such a function too? Basically, I am interested in producing GFLOPS plots of my algorithm, and it would be a lot easier if MKL offered this, since the library knows what it is actually doing, as distinct from the known theoretical complexities and flop counts.
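In the absence of such a function, the usual workaround is to divide the theoretical flop count of the routine by the measured wall time. A minimal sketch, assuming the conventional 2·m·n·k count for a matrix-matrix multiply; the names `gemm_flops` and `achieved_gflops` are hypothetical and the timing itself is left to whatever wall-clock timer is at hand.

```c
/* Conventional operation count for C = A*B with A m-by-k and B k-by-n:
   one multiply and one add per inner-loop step. This is the textbook
   count, not what a library actually executes internally. */
double gemm_flops(double m, double n, double k)
{
    return 2.0 * m * n * k;
}

/* Achieved rate in GFLOP/s for a given operation count and elapsed
   wall time in seconds (seconds must be > 0). */
double achieved_gflops(double flops, double seconds)
{
    return flops / seconds / 1e9;
}
```

A call such as `achieved_gflops(gemm_flops(n, n, n), t)` then yields one point for a GFLOPS plot, where `t` is the measured time of a single multiply of order `n`.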

Thanks in advance,
Best regards,
Giovanni
woshiwuxin
Novice
Right now, before running any benchmark, I need to check whether the processor supports AVX and whether I need single or double precision, and then make changes to the source code. It's a bit tedious. In the future, Intel will release new processors with new instructions and wider vectors, so it would be handy for application developers to be able to query the CPU info (number of cores, floating-point operations per cycle, etc.) through API functions.
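One way to make the AVX check at run time, without editing the source for each machine, is a compiler builtin. The sketch below assumes GCC or Clang on an x86 target, where `__builtin_cpu_supports` is available; the wrapper name `cpu_has_avx` is hypothetical, and on other compilers or targets it simply reports that it cannot tell.

```c
/* Returns 1 if the CPU running this code reports AVX, 0 if it does not,
   and -1 if the builtin is not available for this compiler/target
   (assumption: only GCC and Clang on x86 are covered here). */
int cpu_has_avx(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx") ? 1 : 0;
#else
    return -1;
#endif
}
```

This answers only whether AVX is present; it still does not tell you how many floating-point operations per cycle the core can sustain.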
woshiwuxin
Novice
I have posted many questions in the Intel Forum, and Tim always gives a professional answer. Thank you very much!
Personally, I avoid parsing cpuinfo. Some systems do not have /proc, e.g. Mac OS X. Moreover, some systems enable CPU features such as SpeedStep and Turbo Boost, so the clock frequency reported in cpuinfo might be lower than the actual maximum.
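For the core count at least, there is a way that does not depend on /proc. A minimal sketch, assuming a Unix-like system where `sysconf` accepts `_SC_NPROCESSORS_ONLN` (this is the case on Linux and Mac OS X, although the name is not part of strict POSIX); the wrapper name `online_cpus` is hypothetical.

```c
#include <unistd.h>

/* Number of logical processors currently online, as reported by the OS.
   This counts hardware threads, so with Hyper-Threading enabled it can be
   larger than the number of physical cores. Falls back to 1 on error. */
long online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}
```

The clock frequency problem remains: this call says nothing about the frequency the cores actually run at during the benchmark.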
SergeyKostrov
Valued Contributor II
Quoting woshiwuxin
...But how many floating-point operations per clock cycle may vary from one system to another. Is it possible for the Intel MKL developers to add a new function that can calculate this number?


There is already the Intel Linpack 10.3.7 package, with binaries and sources, and you could try to use it.

Best regards,
Sergey
