topic Hi Gustafsson, bengt in Intel® oneAPI Math Kernel Library

DNN convolution does not preserve XMM registers as it should.

gustafsson__bengt — Wed, 14 Mar 2018 13:36:50 GMT

When running an MKL DNN operation previously created with dnnConvolutionCreateForwardBias_F32(), the XMM7-8 registers (and possibly more) are not preserved.

I have code similar to this (but of course much more elaborate) which does not work in release mode:

void RunConvolution() {
    dnnPrimitive_t handle;

    dnnConvolutionCreateForwardBias_F32(&handle, --- other pars ---);

    void* resources[dnnResourceNumber];
    --- fill in pointers to buffers, kernel and bias ---

    dnnExecute_F32(handle, resources);
}


int main()
{
    Timer timer;
    for (int o = 0; o < 10; o++) {
        for (int i = 0; i < 10; i++)
            RunConvolution();

        cout << timer.Elapsed() / 10.0 << endl;
    }
}

I am using the Visual Studio 2017 compiler, and I noted that it keeps the constant 10.0 in XMM7 across the entire loop nest that calls RunConvolution(). I tracked the contents of the register and found that it is destroyed by dnnExecute_F32(). That function is just a thin wrapper, so the actual problem is probably in some assembly code in the function it dispatches to when the handle refers to a structure created by dnnConvolutionCreateForwardBias_F32().

According to Microsoft's documentation of the x64 calling convention, XMM6 and up (XMM6 through XMM15) must be preserved across function calls. Because XMM7 no longer holds 10.0 after the call, the numbers printed are bogus. I can't see any other explanation than that the MKL DNN code breaches the calling convention.

I am using MKL version 2018.1, 64-bit, statically linked, with Intel threading (not TBB).
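
To make the rule concrete, here is a minimal sketch of how the Windows x64 convention classifies the XMM registers. It is only an illustration of the rule, not MKL code, and the helper name xmm_is_callee_saved is made up for this example.

```cpp
// Illustration of the Windows x64 register rule (not part of MKL).
// xmm_is_callee_saved is a hypothetical helper name.
constexpr bool xmm_is_callee_saved(int reg) {
    // XMM0-XMM5 are volatile: a callee may overwrite them freely.
    // XMM6-XMM15 are nonvolatile: a callee must restore them before returning.
    return reg >= 6 && reg <= 15;
}

static_assert(!xmm_is_callee_saved(5), "XMM5 may be clobbered by a callee");
static_assert(xmm_is_callee_saved(7), "XMM7 must survive a call");
```

This is why the compiler is entitled to leave 10.0 in XMM7 for the whole loop: under the convention, a call may not change that register.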

I have not tested other operations except dnnConvolutionCreateForwardBias_F32.
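
As a possible stop-gap while the library misbehaves, a value that has to survive the call can be kept in memory rather than in a register, for example by reading it through a volatile object. The sketch below assumes this approach; run_convolution_stub, g_runs_per_batch and average_after_batch are hypothetical names, and the stub merely stands in for the RunConvolution() shown above.

```cpp
// Divisor kept in memory; volatile forces a fresh load at every read,
// so the value is not carried in a register across the calls below.
static volatile double g_runs_per_batch = 10.0;

// Hypothetical stand-in for RunConvolution() from the post.
inline void run_convolution_stub() {}

// Runs one batch of calls, then divides the elapsed time by the batch size.
inline double average_after_batch(double elapsed) {
    for (int i = 0; i < 10; ++i)
        run_convolution_stub();
    return elapsed / g_runs_per_batch;  // loaded from memory after the calls
}
```

This only protects the one value that is read through the volatile object. The compiler remains free to keep other values in XMM6 through XMM15, so it is no substitute for a library that honors the calling convention.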

gustafsson__bengt — Mon, 19 Mar 2018 14:26:56 GMT

dnnConversionExecute_F32() also does not preserve registers properly.

Ying_H_Intel — Tue, 20 Mar 2018 02:44:03 GMT

Hi Gustafsson, bengt

Thank you for raising the question. As I understand it, the original problem is that the code does not work in release mode, and the likely reason is that the XMM7-8 registers (and possibly more) are not preserved. For background, see https://stackoverflow.com/questions/262162/why-did-windows-64-choose-to-require-xmm6-and-xmm7-to-be-saved-restored

Could you please tell us what kind of CPU you are using? Does the code work on other machines?

Best Regards,

Ying

gustafsson__bengt — Tue, 20 Mar 2018 14:35:36 GMT

The issue is, as you presumed, that you violate the register saving convention that XMM6-7 have to be preserved, and that Microsoft compilers (both 2013 and 2017) rely on these registers actually being preserved.

I'm running an Intel i7-8700.

Ying_H_Intel — Wed, 21 Mar 2018 01:05:23 GMT

Hi Gustafsson,

Yes, there is such a problem. We fixed it in Intel MKL 2018 U2 and above, which should be available soon. You may check the forum announcement.

Thank you,
Ying

gustafsson__bengt — Thu, 29 Mar 2018 07:25:11 GMT

I tested this with MKL2018.2 and it fixed the problem.

Thanks for the good work!

Ying_H_Intel — Thu, 29 Mar 2018 07:52:33 GMT

Hi Gustafsson,

Thank you a lot for the confirmation!

Thanks,
Ying