How many floating-point operations (flops) are required to invert a 16x16 MKL_Complex8 matrix using cpotrf and then cpotri?
How many CPU clock cycles should it take on an Atom E3826 CPU and on an i5-3470 CPU?
Is there any performance difference between a 32-bit and a 64-bit Linux operating system (for those specific CPUs)?
Thanks, Nimrod
The approximate flop count for (S/D)POTRF is 1/3*N^3 and for (S/D)POTRI is 2/3*N^3; for the complex case these are multiplied by four.
More precise formulas for the complex case, which make sense for such a small size, are:
CPOTRF_FLOPS = 6 * N * (N * (N * 1./6. + .5) + 1./3.) + 2 * N * 1./6. * (N * N - 1.);
CPOTRI_FLOPS = 6 * N * (N * (N * 1./3. + 1.) + 2./3.) + 2 * N * (N * (N * 1./3. - .5) + 1./6.);
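As a quick sanity check, the two formulas can be evaluated directly in C. This is only a sketch: the function names cpotrf_flops, cpotri_flops and inversion_flops are made up for illustration and are not part of MKL or LAPACK. Working it through by hand for N = 16 gives 6256 flops for CPOTRF and 12272 for CPOTRI, 18528 in total, compared with 4*N^3 = 16384 from the approximate formulas.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not an MKL function): flop count for CPOTRF,
   using the precise complex-case formula quoted above. */
double cpotrf_flops(double n)
{
    assert(n >= 1.0);
    return 6. * n * (n * (n * 1./6. + .5) + 1./3.)
         + 2. * n * 1./6. * (n * n - 1.);
}

/* Hypothetical helper (not an MKL function): flop count for CPOTRI,
   using the precise complex-case formula quoted above. */
double cpotri_flops(double n)
{
    assert(n >= 1.0);
    return 6. * n * (n * (n * 1./3. + 1.) + 2./3.)
         + 2. * n * (n * (n * 1./3. - .5) + 1./6.);
}

/* Total for the inversion: factorization followed by the inverse. */
double inversion_flops(double n)
{
    return cpotrf_flops(n) + cpotri_flops(n);
}

/* Prints the three counts for a given matrix order. */
void print_inversion_flops(int n)
{
    printf("N = %d: cpotrf %.0f, cpotri %.0f, total %.0f flops\n",
           n, cpotrf_flops(n), cpotri_flops(n), inversion_flops(n));
}
```

Calling print_inversion_flops(16) from main prints the counts for the 16x16 case. Note that these are operation counts only; they say nothing about how many clock cycles a particular CPU needs.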
There is usually a difference between 32-bit and 64-bit code, which comes from the richer register set of the Intel 64 architecture and other improvements in the x86-64 Application Binary Interface (ABI).
Unfortunately, I don't have clock counts for these functions.