Using FMA in MKL routines

Kat__Swat · 02-13-2019

Hey everyone,

I couldn't find any old topics that dealt with this question in detail, so here I am asking it again: is there a way to enable FMA math when using the MKL routines? Here is a sample program that, when built with MSVC 2017 against the latest MKL version (details in the output below) and run on an AVX2 processor, DOES NOT use FMA:

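#include <stdio.h>  /* printf */
#include <math.h>   /* fmaf */
#include <mkl.h>    /* MKLVersion, mkl_get_version, cblas_sdot */

void print_mkl_info() {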
    MKLVersion Version;
    mkl_get_version(&Version);
    printf("Major version:           %d\n",Version.MajorVersion);
    printf("Minor version:           %d\n",Version.MinorVersion);
    printf("Update version:          %d\n",Version.UpdateVersion);
    printf("Product status:          %s\n",Version.ProductStatus);
    printf("Build:                   %s\n",Version.Build);
    printf("Platform:                %s\n",Version.Platform);
    printf("Processor optimization:  %s\n",Version.Processor);
    printf("================================================================\n");
    printf("\n");
}

float standard_dot_product(float* a, float* b) {
    float c = 0.0f;
    for (int i = 0; i < 4; i++) {
        c = c + (a[i] * b[i]);
    }
    return c;
}

float standard_fma_dot_product(float* a, float* b) {
    float c = 0.0f;
    for (int i = 0; i < 4; i++) {
        c = fmaf(a[i], b[i], c);
    }
    return c;
}

float mkl_dot_product(float* a, float* b) {
    return cblas_sdot(4, a, 1, b, 1);
}

int main() {
    print_mkl_info();
    float a[4] = { 1.907607, -.7862027, 1.148311, .9604002 };
    float b[4] = { -.9355000, -.6915108, 1.724470, -.7097529 };
    printf("Standard dot product is:     %.23f\n", standard_dot_product(a, b));
    printf("Standard FMA dot product is: %.23f\n", standard_fma_dot_product(a, b));
    printf("MKL dot product is:          %.23f\n", mkl_dot_product(a, b));
    return 0;
}

The above program outputs the following (compiled with /fp:fast and /O2; note that changing /O2 to /O1 changes the result of the standard_dot_product function, but not of the CBLAS routine):

Major version:           2019
Minor version:           0
Update version:          2
Product status:          Product
Build:                   20190118
Platform:                32-bit
Processor optimization:  Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors
================================================================

Standard dot product is:     0.05768233537673950195313
Standard FMA dot product is: 0.05768235772848129272461
MKL dot product is:          0.05768233537673950195313

So is there any way to generate results with FMA in such cases? Or am I being a knobhead and missing something?

THANKS!

Swat

Gennady_F_Intel · 02-13-2019

Your CPU already supports FMA instructions, since the AVX2 code branch was called.

You can also try mkl_enable_instructions(int) to dispatch to a different instruction set.

Kat__Swat · 03-13-2019

Gennady F. (Intel) wrote:
Your CPU already supports FMA instructions because of AVX2 code branch has been called.
You may also try to play mkl_enable_instructions(int) to dispatch for another instruction sets.

You were right; my processor does have FMA support, but it looks like that branch is used only when compiling in 64-bit mode. There is a slight difference in the answers though (float bit patterns printed as integers for easy comparison, compiled with MSVC 19.11.25507.1 for x64, with /arch:AVX2 and /O2):

Standard dot product is:          1030505552
Standard FMA dot product is:      1030505558
MKL dot product is:               1030505568
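
The integers above read naturally as the raw IEEE-754 bit patterns of the float results. For positive floats, subtracting two bit patterns gives the distance in units in the last place (ULPs). A minimal sketch of that conversion follows; the helper names float_bits, float_from_bits and ulp_distance are hypothetical, and ulp_distance assumes both inputs are finite and share the same sign.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the bit pattern of a float as an unsigned 32-bit integer. */
uint32_t float_bits(float x) {
    uint32_t u;
    assert(sizeof u == sizeof x);  /* assumes a 32-bit float */
    memcpy(&u, &x, sizeof u);      /* memcpy avoids strict-aliasing problems */
    return u;
}

/* Inverse of float_bits: rebuild a float from its bit pattern. */
float float_from_bits(uint32_t u) {
    float x;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* ULP distance between two finite floats of the same sign. */
int64_t ulp_distance(float a, float b) {
    int64_t d = (int64_t)float_bits(a) - (int64_t)float_bits(b);
    return d < 0 ? -d : d;
}
```

Read this way, the FMA result is 6 ULPs from the standard result, and the MKL result is 16 ULPs from the standard result and 10 ULPs from the FMA result.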

Would you happen to know why this difference occurs?
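
One possible contributor, offered as an assumption since nothing in this thread states how MKL orders its additions internally: float addition is not associative, and vectorized code commonly keeps several partial sums and combines them at the end, which can round differently from a strict left-to-right loop even when both use FMA. The sketch below compares the two orders. The function names dot_sequential and dot_two_lanes are hypothetical, and the two-lane scheme is only a stand-in for whatever a real SIMD kernel does.

```c
/* Strict left-to-right accumulation, as in the loop from the first post. */
float dot_sequential(const float *a, const float *b, int n) {
    float c = 0.0f;
    for (int i = 0; i < n; i++) {
        c = c + a[i] * b[i];
    }
    return c;
}

/* Two independent partial sums (even and odd indices), combined at the end.
   This mimics the shape of a SIMD reduction; it is not MKL's actual code. */
float dot_two_lanes(const float *a, const float *b, int n) {
    float lane[2] = { 0.0f, 0.0f };
    for (int i = 0; i < n; i++) {
        lane[i & 1] += a[i] * b[i];
    }
    return lane[0] + lane[1];
}
```

For a = { 1e8f, 1.0f, -1e8f, 1.0f } and b all ones, the sequential version absorbs the first 1.0f into 1e8f and returns 1.0f, while the two-lane version cancels the large terms first and returns 2.0f. This assumes float arithmetic is evaluated in single precision, as on x64 with SSE.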