<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi Lennart,  in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-DNN-convolution-has-the-wrong-output-order-on-Intel-R-Xeon-R/m-p/1139279#M26196</link>
    <description></description>
    <pubDate>Fri, 15 Jun 2018 01:31:32 GMT</pubDate>
    <dc:creator>Ying_H_Intel</dc:creator>
    <dc:date>2018-06-15T01:31:32Z</dc:date>
    <item>
      <title>MKL_DNN convolution has the wrong output order on Intel(R) Xeon(R) CPU E5-2650 v3 (Possible bug)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-DNN-convolution-has-the-wrong-output-order-on-Intel-R-Xeon-R/m-p/1139278#M26195</link>
      <description>&lt;P&gt;Hello all,&lt;/P&gt;

&lt;P&gt;I recently implemented a convolution with the Intel MKL library, following the example included with the library. Everything works fine on my laptop with an i5-3210M. However, when I ran the code on the big machine, with an Intel(R) Xeon(R) CPU E5-2650 v3, I ran into some problems.&lt;/P&gt;

&lt;P&gt;For outputs whose channel count is a multiple of 8, the order of the output is wrong. This is either a mistake on my side (probably with the compile options) or, in the worst case, a bug in MKL. I wrote a short test program, similar to the example file, that implements a standard forward convolution.&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;
#include &amp;lt;iostream&amp;gt;
#include "mkl_dnn.h"
#include &amp;lt;vector&amp;gt;
using namespace std;


#define dimension (4)
int main() {

	dnnPrimitiveAttributes_t attributes;
	dnnPrimitive_t conv_prim = NULL;


	float* resConv1[dnnResourceNumber] = {0};

	size_t batch_num = 1;


	bool use_bias = false;

	size_t xinp = 4,
		yinp = 4,
		xout = 4,
		yout = 4,
		inpchannels = 1,
		outchannels = 8,
		xfilt = 3,
		yfilt = 3;


    size_t outputSize[dimension] = { xout, yout, outchannels, batch_num };
    size_t outputStrides[dimension] = { 1, xout, xout * yout, xout * yout * outchannels };

    size_t inputSize[dimension] = { xinp, yinp, inpchannels, batch_num };
    size_t inputStrides[dimension] = { 1, xinp, xinp * yinp, xinp * yinp * inpchannels };

    size_t filterSize[dimension] = { xfilt, yfilt, inpchannels, outchannels };
    size_t filterStrides[dimension] = { 1, xfilt, xfilt * yfilt, xfilt * yfilt * inpchannels };

    size_t biasSize[1] = { outputSize[2] };
    size_t biasStrides[1] = { outputStrides[2] };

    size_t convolutionStride[dimension - 2] = { 1, 1 };
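    // Offsets are negative, so compute them in signed arithmetic; the second entry uses the y sizes (index 1).
    int inputOffset[dimension - 2 ] = { (int)(inputSize[0]/2) - (int)(outputSize[0]/2) - (int)(filterSize[0]/2), (int)(inputSize[1]/2) - (int)(outputSize[1]/2) - (int)(filterSize[1]/2) };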

    dnnLayout_t lt_conv1_input = NULL,
                lt_conv1_filt = NULL,
                lt_conv1_bias = NULL,
                lt_conv1_output = NULL;




	if( dnnPrimitiveAttributesCreate_F32(&amp;amp;attributes)!= E_SUCCESS){
		std::cout &amp;lt;&amp;lt; "error" &amp;lt;&amp;lt; std::endl;
	}
	dnnError_t err;
	if( use_bias ){
		err= dnnConvolutionCreateForwardBias_F32(&amp;amp;conv_prim, attributes,
	                    dnnAlgorithmConvolutionDirect, dimension, inputSize,
	                    outputSize, filterSize, convolutionStride, inputOffset,
	                    dnnBorderZeros);
	}else{
		err = dnnConvolutionCreateForward_F32(&amp;amp;conv_prim, attributes,
						dnnAlgorithmConvolutionDirect, dimension, inputSize,
						outputSize, filterSize, convolutionStride, inputOffset,
						 dnnBorderZeros);
	}

	if( err != E_SUCCESS){
		switch (err){
		case E_INCORRECT_INPUT_PARAMETER:
				std::cout &amp;lt;&amp;lt; "incorrect input parameter while creating the convolution" &amp;lt;&amp;lt; std::endl;break;
		default:
			std::cout &amp;lt;&amp;lt; "error while creating convolution" &amp;lt;&amp;lt; std::endl;
		}

	}

    dnnLayoutCreateFromPrimitive_F32(&amp;amp;lt_conv1_input, conv_prim, dnnResourceSrc);
    dnnLayoutCreateFromPrimitive_F32(&amp;amp;lt_conv1_filt, conv_prim, dnnResourceFilter);
    if( use_bias){
    	dnnLayoutCreateFromPrimitive_F32(&amp;amp;lt_conv1_bias, conv_prim, dnnResourceBias);
    }
    dnnLayoutCreateFromPrimitive_F32(&amp;amp;lt_conv1_output,conv_prim, dnnResourceDst);


    std::vector&amp;lt;float&amp;gt; input(xinp*yinp*inpchannels,1.0);
    std::vector&amp;lt;float&amp;gt; output(xout*yout*outchannels,1.0);
    std::vector&amp;lt;float&amp;gt; filter(xfilt*yfilt*inpchannels*outchannels,1.0);
    std::vector&amp;lt;float&amp;gt; bias(outchannels,1.0);

    resConv1[dnnResourceSrc] = &amp;amp;(input[0]);
    resConv1[dnnResourceFilter] = &amp;amp;filter[0];
    if( use_bias)  resConv1[dnnResourceBias] = &amp;amp;bias[0];
    resConv1[dnnResourceDst]= &amp;amp;output[0];

    dnnError_t err_exe = dnnExecute_F32(conv_prim, (void**) resConv1);
    if( err_exe != E_SUCCESS){
    	std::cout &amp;lt;&amp;lt; "Error while forward propagation in convolutional layer" &amp;lt;&amp;lt; std::endl;
    	if( err_exe== E_MEMORY_ERROR){
    		std::cout &amp;lt;&amp;lt; "Memory Error" &amp;lt;&amp;lt; std::endl;
    	}
    	if( err_exe == E_UNIMPLEMENTED){
    		std::cout &amp;lt;&amp;lt; "Unimplemented" &amp;lt;&amp;lt; std::endl;
    	}
    	if( err_exe == E_UNSUPPORTED_DIMENSION){
    		std::cout &amp;lt;&amp;lt; "Unsupported dimension" &amp;lt;&amp;lt; std::endl;
    	}
    	if( err_exe == E_INCORRECT_INPUT_PARAMETER){
    		std::cout &amp;lt;&amp;lt; "Incorrect input parameter" &amp;lt;&amp;lt; std::endl;
    	}
    }

    std::cout &amp;lt;&amp;lt; "output" &amp;lt;&amp;lt;std::endl;
    for( int i=0; i &amp;lt; output.size(); i++){
    	std::cout &amp;lt;&amp;lt; output[i] &amp;lt;&amp;lt; " ";
    }
    std::cout &amp;lt;&amp;lt; std::endl;
	return 0;
}&lt;/PRE&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;The desired output for a 4x4 image with 8 output channels, an input of 1s, and 3x3 filters of 1s is:&lt;/P&gt;

&lt;P&gt;4 6 6 4 6 9 9 6 6 9 9 6 4 6 6 4 4 6 6 4 6 9 9 6 6 9 9 6 4 6 6 4 4 6 6 4 6 9 9 6 6 9 9 6 4 6 6 4 4 6 6 4 6 9 9 6 6 9 9 6 4 6 6 4 4 6 6 4 6 9 9 6 6 9 9 6 4 6 6 4 4 6 6 4 6 9 9 6 6 9 9 6 4 6 6 4 4 6 6 4 6 9 9 6 6 9 9 6 4 6 6 4 4 6 6 4 6 9 9 6 6 9 9 6 4 6 6 4&lt;/P&gt;

&lt;P&gt;This is also what my mobile CPU gives me when I run the code. However, on the big PC I get&lt;/P&gt;

&lt;P&gt;4 4 4 4 4 4 4 4 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 4 4 4 4 4 4 4 6 6 6 6 6 6 6 6 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 6 6 6 6 6 6 6 6 4 4 4 4 4 4 4 4 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 4 4 4 4 4 4 4&lt;/P&gt;

&lt;P&gt;which is obviously somewhat right, but not in the right order. However, when I change the number of output channels so that it is not a multiple of 8, the code runs fine even on the Xeon CPU. This might be due to MKL switching to a different, slower algorithm, as explained in this post:&lt;/P&gt;

&lt;P&gt;&lt;A href="https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/761063" target="_blank"&gt;https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/761063&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Does anybody have an explanation or even a fix for this issue? Is this known behaviour on Xeon CPUs, or a bug in the software? I don't necessarily want to switch to the open source implementation, since it would mean a week of reimplementing and testing.&lt;/P&gt;

&lt;P&gt;For compilation I used the following link line on both systems:&lt;/P&gt;

&lt;P&gt;&amp;nbsp;-L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl&lt;/P&gt;

&lt;P&gt;&amp;nbsp;-I${MKLROOT}/include -I${MKLROOT}/../lib/intel64_lin&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Any help would be appreciated.&lt;/P&gt;

</description>
      <pubDate>Tue, 12 Jun 2018 14:36:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-DNN-convolution-has-the-wrong-output-order-on-Intel-R-Xeon-R/m-p/1139278#M26195</guid>
      <dc:creator>Lennart_S_</dc:creator>
      <dc:date>2018-06-12T14:36:43Z</dc:date>
    </item>
    <item>
      <title>Hi Lennart,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-DNN-convolution-has-the-wrong-output-order-on-Intel-R-Xeon-R/m-p/1139279#M26196</link>
      <description>&lt;P&gt;&lt;SPAN style="margin: 0px; color: rgb(31, 73, 125); font-family: &amp;quot;Calibri&amp;quot;,sans-serif; font-size: 11pt;"&gt;Hi Lennart, &lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="margin: 0px;"&gt;&lt;SPAN style="margin: 0px; color: rgb(31, 73, 125); font-family: &amp;quot;Calibri&amp;quot;,sans-serif; font-size: 11pt;"&gt;Thank you for reporting this. The result is by design.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="margin: 0px;"&gt;&lt;SPAN style="margin: 0px; color: rgb(31, 73, 125); font-family: &amp;quot;Calibri&amp;quot;,sans-serif; font-size: 11pt;"&gt;The implementation that is chosen depends on the machine type, the convolution shape, and other factors, so the output layout can differ between machines (or even on the same machine). When the number of input channels (ic) is 1 and the number of output channels (oc) is divisible by the SIMD width (8 for AVX2), the function calls optimized code that produces output in a SIMD-friendly format blocked by channels, nChw8c, instead of the plain NCHW format. Here n is the batch size, C is the number of channel blocks, h is the spatial height, w is the spatial width, and 8c is the 8 channels inside each block.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="margin: 0px;"&gt;&lt;SPAN style="margin: 0px; color: rgb(31, 73, 125); font-family: &amp;quot;Calibri&amp;quot;,sans-serif; font-size: 11pt;"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="margin: 0px;"&gt;&lt;SPAN style="margin: 0px; color: rgb(31, 73, 125); font-family: &amp;quot;Calibri&amp;quot;,sans-serif; font-size: 11pt;"&gt;The data layouts and the common programming model are explained in this article: &lt;A href="https://software.intel.com/en-us/articles/introducing-dnn-primitives-in-intelr-mkl" target="_blank"&gt;https://software.intel.com/en-us/articles/introducing-dnn-primitives-in-intelr-mkl&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="margin: 0px;"&gt;&lt;SPAN style="margin: 0px; color: rgb(31, 73, 125); font-family: &amp;quot;Calibri&amp;quot;,sans-serif; font-size: 11pt;"&gt;If you need the plain format regardless of which code path is taken, you need to call a conversion (reorder) on the output after the convolution.&lt;/SPAN&gt;&lt;/P&gt;
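
&lt;P style="margin: 0px;"&gt;To illustrate the blocked layout, here is a minimal sketch (not MKL code) of a manual nChw8c to NCHW reorder in plain C++. The function name nChw8cToNchw and its parameters are hypothetical, and the sketch assumes that the channel count is a multiple of 8 with no padding. Since the layout chosen by the primitive can vary, the library's own layout comparison and conversion routines are likely the safer choice in real code.&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;
#include &amp;lt;cassert&amp;gt;
#include &amp;lt;cstddef&amp;gt;
#include &amp;lt;vector&amp;gt;

// Hypothetical helper (not part of MKL): copy a buffer stored in the
// channel-blocked nChw8c layout into the plain NCHW layout.
// Assumption: C is a multiple of 8, so there is no padded tail block.
std::vector&amp;lt;float&amp;gt; nChw8cToNchw(const std::vector&amp;lt;float&amp;gt;&amp;amp; blocked,
                                std::size_t N, std::size_t C,
                                std::size_t H, std::size_t W) {
    const std::size_t block = 8;  // channels per block (AVX2 float width)
    assert(C % block == 0);
    assert(blocked.size() == N * C * H * W);
    std::vector&amp;lt;float&amp;gt; plain(blocked.size());
    for (std::size_t n = 0; n &amp;lt; N; ++n)
        for (std::size_t c = 0; c &amp;lt; C; ++c)
            for (std::size_t h = 0; h &amp;lt; H; ++h)
                for (std::size_t w = 0; w &amp;lt; W; ++w) {
                    // Blocked index: [n][c / 8][h][w][c % 8]
                    const std::size_t src =
                        (((n * (C / block) + c / block) * H + h) * W + w) * block + c % block;
                    // Plain index: [n][c][h][w]
                    const std::size_t dst = ((n * C + c) * H + h) * W + w;
                    plain[dst] = blocked[src];
                }
    return plain;
}
&lt;/PRE&gt;

&lt;P style="margin: 0px;"&gt;Applied to the 128 values printed on the Xeon machine with N=1, C=8, H=4, W=4, this index mapping gives back the 4 6 6 4 6 9 9 6 ... ordering seen on the laptop.&lt;/P&gt;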

&lt;P style="margin: 0px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin: 0px;"&gt;&lt;SPAN style="margin: 0px; color: rgb(31, 73, 125); font-family: &amp;quot;Calibri&amp;quot;,sans-serif; font-size: 11pt;"&gt;In any case, we recommend that you try &lt;/SPAN&gt;&lt;A href="https://github.com/intel/mkl-dnn/"&gt;&lt;U&gt;&lt;FONT color="#0563c1" face="Calibri" size="3"&gt;MKL-DNN&lt;/FONT&gt;&lt;/U&gt;&lt;/A&gt;&lt;SPAN style="margin: 0px; color: rgb(31, 73, 125); font-family: &amp;quot;Calibri&amp;quot;,sans-serif; font-size: 11pt;"&gt; instead of the DNN primitives in MKL, as it offers better functionality and performance.&lt;BR /&gt;
	&lt;BR /&gt;
	Best Regards,&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="margin: 0px;"&gt;&lt;SPAN style="margin: 0px; color: rgb(31, 73, 125); font-family: &amp;quot;Calibri&amp;quot;,sans-serif; font-size: 11pt;"&gt;Ying&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 15 Jun 2018 01:31:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-DNN-convolution-has-the-wrong-output-order-on-Intel-R-Xeon-R/m-p/1139279#M26196</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2018-06-15T01:31:32Z</dc:date>
    </item>
  </channel>
</rss>

