<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Dear Shubha, in Intel® Distribution of OpenVINO™ Toolkit</title>
    <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148850#M11973</link>
    <description>&lt;P&gt;Dear Shubha,&lt;/P&gt;&lt;P&gt;Finally my custom operation&amp;nbsp;is working at 100%!&lt;/P&gt;&lt;P&gt;I think dot product can't be done that way (in this context)&amp;nbsp;because the weights and inputs are stored in a long one-dimensional vector, so&amp;nbsp; what I did was implementing&amp;nbsp;the mask to obtain&amp;nbsp;the weight indexes. Here is the complete code for the previously mentioned function:&lt;/P&gt;
&lt;PRE class="brush:cpp; class-name:dark;"&gt;StatusCode execute(std::vector&amp;lt;Blob::Ptr&amp;gt;&amp;amp; inputs, std::vector&amp;lt;Blob::Ptr&amp;gt;&amp;amp; outputs,
                       ResponseDesc *resp) noexcept override {
        
        if (inputs.size() != 1 || outputs.empty()) {
            if (resp) {
                std::string errorMsg = "Incorrect number of input or output edges!";
                errorMsg.copy(resp-&amp;gt;msg, sizeof(resp-&amp;gt;msg) - 1);
            }
            return GENERAL_ERROR;
        }

        const float* src = inputs[0]-&amp;gt;buffer();
        const float* scl = weights-&amp;gt;buffer();
        float* dst = outputs[0]-&amp;gt;buffer();

        SizeVector in_dims = inputs[0]-&amp;gt;getTensorDesc().getDims();
        SizeVector out_dims = outputs[0]-&amp;gt;getTensorDesc().getDims();

        const int in_neurons = static_cast&amp;lt;int&amp;gt;(in_dims[1]);
        const int out_neurons = static_cast&amp;lt;int&amp;gt;(out_dims[1]);

        for (int n = 0; n &amp;lt; out_neurons; n++) {
            float accum = 0.0f;
            for (int i = 0; i &amp;lt; in_neurons; i++) {
                accum += src[i] * scl[i*out_neurons + n];
            }
            dst[n] = accum;
        }
        return OK;
    }&lt;/PRE&gt;

&lt;P&gt;However, now I have another problem. After running the benchmark script, I saw that my custom function (dot) is the bottleneck of the application:&lt;/P&gt;

&lt;PRE class="brush:bash; class-name:dark;"&gt;CONV3-32: [28x28x32]   memory: 28*28*32 = 25K   weights: (3*3*32)*32 = 9K   nr_operations: 7M   time [s]: 0.049000

FC1: [1x1x512]   memory: 576   weights: 3*3*64*512 = 295K   nr_operations: 295K   time [s]: 0.592000&lt;/PRE&gt;

&lt;P&gt;You can see from the results above that a convolutional layer with 7M MACs (Multiply-Accumulate operations) runs in 49 ms, while a fully connected layer (implemented with my custom dot operation) runs in 592 ms (12x slower). This should not occur, because the number of calculations (and therefore the processing time) for the convolutional layer is much higher than for the fully connected one.&lt;/P&gt;
&lt;P&gt;I think my next step will be to take a look at the Advanced Vector Extensions (in this case AVX2).&lt;/P&gt;</description>
    <pubDate>Mon, 09 Sep 2019 14:30:51 GMT</pubDate>
    <dc:creator>Gouveia__César</dc:creator>
    <dc:date>2019-09-09T14:30:51Z</dc:date>
    <item>
      <title>Not able to generate openvino IR for simple model (mnist) using mxnet</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148841#M11964</link>
      <description>&lt;P&gt;Hi!&lt;/P&gt;&lt;P&gt;I tried to convert a simple MXNet model (for MNIST) to&amp;nbsp;an optimized&amp;nbsp;Intermediate Representation (IR) using the openVINO toolkit. I use&amp;nbsp;the following command to convert:&lt;/P&gt;&lt;P&gt;python mo_mxnet.py --input_model test_model\mnist_cnn-0000.params --input_shape (1,1,28,28) --reverse_input_channels&lt;/P&gt;&lt;P&gt;But when I try to run it it shows the following error:&lt;/P&gt;&lt;P&gt;[ ERROR ] &amp;nbsp;Unexpected exception happened during extracting attributes for node dense_1/kernel1.&lt;BR /&gt;Original exception message: Operation 'dot' not supported. Please register it as custom op.&lt;/P&gt;&lt;P&gt;The toolkit fully supports the dense layers for MXNet?&amp;nbsp;or is there something I'm doing wrong? Attached is the network definition file.&lt;/P&gt;</description>
      <pubDate>Wed, 21 Aug 2019 14:41:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148841#M11964</guid>
      <dc:creator>Gouveia__César</dc:creator>
      <dc:date>2019-08-21T14:41:05Z</dc:date>
    </item>
    <item>
      <title>Dear Gouveia, César,</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148842#M11965</link>
      <description>&lt;P&gt;Dear&amp;nbsp;Gouveia, César,&lt;/P&gt;&lt;P&gt;While I don't have your full debug log (obtained by --log_level DEBUG) If this is indeed coming from Model Optimizer:&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;P&gt;Original exception message: Operation 'dot' not supported. Please register it as custom op.&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;It means that dot.py is not found under here:&lt;/P&gt;&lt;P&gt;C:\Program Files (x86)\IntelSWTools\openvino_2019.2.242\deployment_tools\model_optimizer\mo\ops&lt;/P&gt;&lt;P&gt;or even under here:&lt;/P&gt;&lt;P&gt;C:\Program Files (x86)\IntelSWTools\openvino_2019.2.242\deployment_tools\model_optimizer\extensions\ops&lt;/P&gt;&lt;P&gt;You can add one though by creating something like "dot.py" in one of those locations. Is dot a "dot product" operation ? There maybe a way to modify the model to avoid the dot operation.&lt;/P&gt;&lt;P&gt;My guess is that the complaint is about not finding&amp;nbsp;&lt;A href="https://mxnet.incubator.apache.org/api/scala/ndarray.html#dot-product"&gt;NDArray Dot&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Please attach your debug log to this forum ticket so that I can investigate further.&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Shubha&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 22 Aug 2019 20:59:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148842#M11965</guid>
      <dc:creator>Shubha_R_Intel</dc:creator>
      <dc:date>2019-08-22T20:59:17Z</dc:date>
    </item>
    <item>
      <title>Hi Shubha,</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148843#M11966</link>
      <description>&lt;P&gt;Hi Shubha,&lt;/P&gt;&lt;P&gt;First of all I apologize for the delayed response. I also want to thank&amp;nbsp;you&amp;nbsp;for your&amp;nbsp;answer&amp;nbsp;and for the attention&amp;nbsp;that&amp;nbsp;is being&amp;nbsp;given to this&amp;nbsp;matter.&lt;/P&gt;&lt;P&gt;I did some research and I found that the two&amp;nbsp;mxnet&amp;nbsp;core packages are &lt;STRONG&gt;NDArray&lt;/STRONG&gt; and &lt;STRONG&gt;Symbol&lt;/STRONG&gt;.&amp;nbsp;The symbol package uses the &lt;A href="https://mxnet.incubator.apache.org/api/python/symbol/symbol.html#mxnet.symbol.FullyConnected"&gt;FullyConnected&lt;/A&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;operation, and the&amp;nbsp;NDArray package uses the &lt;A href="https://beta.mxnet.io/api/ndarray/_autogen/mxnet.ndarray.dot.html"&gt;dot&lt;/A&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;operation. This dot operation is not listed under the &lt;A href="https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_Supported_Frameworks_Layers.html#mxnet_supported_symbols_and_the_mapping_to_the_intermediate_representation_layers"&gt;supported operations&lt;/A&gt; for mxnet&amp;nbsp;by openVINO&amp;nbsp;and as you said there is no&amp;nbsp;dot.py&amp;nbsp;under&amp;nbsp;&lt;STRONG&gt;\model_optimizer\mo\ops&lt;/STRONG&gt; and&amp;nbsp;&lt;STRONG&gt;\model_optimizer\extensions\ops&lt;/STRONG&gt;. Yes, the documentation says that the dot operation is a dot product of two arrays.&lt;/P&gt;&lt;P&gt;Yes I have two options either try modifying the network definition&amp;nbsp;and replacing dot operations with fully connected operations, or create a dot.py file as you said and as described in &lt;A href="https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_MXNet_Model_Optimizer_with_New_Primitives.html"&gt;here&lt;/A&gt;&amp;nbsp;and &lt;A href="https://software.intel.com/en-us/forums/computer-vision/topic/805980"&gt;here&lt;/A&gt;. 
I don´t know how to create this dot.py file&amp;nbsp;and this is what I am researching now.&amp;nbsp;Any tips&amp;nbsp;you could provide&amp;nbsp;would be helpful!&lt;/P&gt;&lt;P&gt;I attached my full log (--log_level DEBUG as you mentioned).&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;César.&lt;/P&gt;&lt;P&gt;EDIT: Apparently there is a &lt;A href="https://mxnet.incubator.apache.org/api/python/symbol/symbol.html#mxnet.symbol.dot"&gt;dot&lt;/A&gt; operation in the symbol mxnet API too.&lt;/P&gt;</description>
      <pubDate>Fri, 23 Aug 2019 09:20:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148843#M11966</guid>
      <dc:creator>Gouveia__César</dc:creator>
      <dc:date>2019-08-23T09:20:00Z</dc:date>
    </item>
    <item>
      <title>Dear Gouveia, César,</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148844#M11967</link>
      <description>&lt;P&gt;Dear&amp;nbsp;Gouveia, César,&lt;/P&gt;&lt;P&gt;I certainly understand your situation. Please read the following&amp;nbsp;&lt;A href="https://software.intel.com/en-us/forums/computer-vision/topic/805980"&gt;IDZ Custom Layers Post&lt;/A&gt;&amp;nbsp;where I give detailed information on how to build custom layers for OpenVino. The &lt;A href="https://github.com/opencv/dldt"&gt;dldt github&lt;/A&gt;&amp;nbsp;links it points to are 2018 but you can find the same links in the 2019 R2 repo.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Please also take a look at&amp;nbsp;&lt;A href="https://github.com/david-drew/OpenVINO-Custom-Layers"&gt;The OpenVino Custom layer Tutorial&lt;/A&gt;&lt;/P&gt;&lt;P&gt;As for how to build an "dot.py", the best advice I can give you is to study existing ops in those directories I pointed you to earlier. They are all Python code. There is no easy answer. We unfortunately don't have documentation on these.&lt;/P&gt;&lt;P&gt;Hope it helps,&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Shubha&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 27 Aug 2019 22:04:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148844#M11967</guid>
      <dc:creator>Shubha_R_Intel</dc:creator>
      <dc:date>2019-08-27T22:04:34Z</dc:date>
    </item>
    <item>
      <title>Hi Shubha,</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148845#M11968</link>
      <description>&lt;P&gt;Hi&amp;nbsp;Shubha,&lt;/P&gt;&lt;P&gt;In the meanwhile, I made some progresses. I'm going to divide my post&amp;nbsp;in two phases:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;1º phase:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;I can now generate the IR model, using a dot_ext.py and a dot.py. However an error message appears when I try to&amp;nbsp;generate&amp;nbsp;the model with&amp;nbsp;reverse&amp;nbsp;input channels (RGB to BGR) using&amp;nbsp;the&amp;nbsp;--reverse_input_channels flag (without this flag works fine):&lt;/P&gt;&lt;BLOCKQUOTE&gt;
&lt;PRE class="brush:bash; class-name:dark;"&gt;[ ERROR ]  Reverse input channels are not applied -- appropriate convolutions were not found

[ SUCCESS ] Generated IR model.
[ SUCCESS ] XML file: C:\Program Files (x86)\IntelSWTools\openvino_2019.2.242\.\mnist_cnn-0000.xml
[ SUCCESS ] BIN file: C:\Program Files (x86)\IntelSWTools\openvino_2019.2.242\.\mnist_cnn-0000.bin
[ SUCCESS ] Total execution time: 1.65 seconds.&lt;/PRE&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;My question is: does this error appear because mxnet already uses the BGR channel order, or because I'm doing something wrong while generating the model? I need to clarify this point because of this note:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;P&gt;**NOTE!**: By default, Inference Engine samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified.&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
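For reference, here is a minimal sketch (not from the sample code; the function name is hypothetical) of what "manually rearrange the default channels order" means for an interleaved 3-channel image buffer. One hedged observation: an MNIST input is grayscale (a single channel), so there is nothing to reverse, and that may be exactly why Model Optimizer reports that no appropriate convolutions were found.

```cpp
#include <cstddef>
#include <utility>

// Reversing input channels means swapping the R and B values of an
// interleaved (HWC) 3-channel image. Model Optimizer normally folds
// this swap into the first convolution's weights; a grayscale MNIST
// tensor has one channel and no such convolution, so there is
// nothing to reverse.
void reverse_channels(float* img, std::size_t pixels) {
    for (std::size_t p = 0; p < pixels; ++p) {
        std::swap(img[p * 3 + 0], img[p * 3 + 2]);  // R <-> B
    }
}
```

If that reading is right, the warning is expected for this model and the generated IR itself is fine.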
&lt;P&gt;Attached to this post are my dot.py, dot_ext.py, and my full log (--log_level DEBUG) for this particular error. Can you please check whether the Python files (dot.py and dot_ext.py) are correct, and the reason for this error?&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Phase 2:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;In this phase I tried to execute the model with the custom layer (using the C++ sample). build_samples_msvc.bat builds correctly using my ext_dot.cpp; however, the inference stops and doesn't show any ERROR message, which is strange:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;PRE class="brush:bash; class-name:dark;"&gt;[ INFO ] InferenceEngine:
        API version ............ 2.0
        Build .................. 27579
        Description ....... API
[ INFO ] Parsing input parameters
[ INFO ] Parsing input parameters
[ INFO ] Files were added: 1
[ INFO ]     deployment_tools\pics\digit_8.bmp
[ INFO ] Creating Inference Engine
[ INFO ] CPU Extension loaded: C:\Users\cesar.gouveia\Documents\Intel\OpenVINO\inference_engine_samples_build\intel64\Release\cpu_extension.dll
        CPU
        MKLDNNPlugin version ......... 2.0
        Build ........... 27579

[ INFO ] Loading network files
[ INFO ] Preparing input blobs
[ WARNING ] Image is resized from (280, 280) to (28, 28)
[ INFO ] Batch size is 1
[ INFO ] Loading model to the device
[ INFO ] Create infer request
[ INFO ] Start inference (10 asynchronous executions)&lt;/PRE&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I tried to look for a debug flag to provide more information but without success:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;PRE class="brush:bash; class-name:dark;"&gt;%USERPROFILE%\Documents\Intel\OpenVINO\inference_engine_samples_build\intel64\Release\classification_sample_async.exe -h
[ INFO ] InferenceEngine:
        API version ............ 2.0
        Build .................. 27579
        Description ....... API
[ INFO ] Parsing input parameters

classification_sample_async [OPTION]
Options:

    -h                      Print a usage message.
    -i "&amp;lt;path&amp;gt;"            Required. Path to a folder with images or path to an image files: a .ubyte file for LeNet and a .bmp file for the other networks.
    -m "&amp;lt;path&amp;gt;"            Required. Path to an .xml file with a trained model.
      -l "&amp;lt;absolute_path&amp;gt;"  Required for CPU custom layers. Absolute path to a shared library with the kernels implementation
          Or
      -c "&amp;lt;absolute_path&amp;gt;"  Required for GPU custom kernels. Absolute path to the .xml file with kernels description
    -d "&amp;lt;device&amp;gt;"          Optional. Specify the target device to infer on (the list of available devices is shown below). Default value is CPU. Sample will look for a suitable plugin for device specified.
    -nt "&amp;lt;integer&amp;gt;"        Optional. Number of top results. Default value is 10.
    -p_msg                  Optional. Enables messages from a plugin

Available target devices:  CPU  GNA  GPU  HDDL
&lt;/PRE&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Attached are my ext_dot.cpp and the log of my build (build.log). Can you please verify these files too?&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;César.&lt;/P&gt;</description>
      <pubDate>Wed, 28 Aug 2019 13:38:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148845#M11968</guid>
      <dc:creator>Gouveia__César</dc:creator>
      <dc:date>2019-08-28T13:38:46Z</dc:date>
    </item>
    <item>
      <title>Dearest Gouveia, César,</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148846#M11969</link>
      <description>&lt;P&gt;Dearest&amp;nbsp;Gouveia, César,&lt;/P&gt;&lt;P&gt;I am not ignoring you ! I promise to take a look.&lt;/P&gt;&lt;P&gt;Shubha&lt;/P&gt;</description>
      <pubDate>Wed, 28 Aug 2019 22:14:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148846#M11969</guid>
      <dc:creator>Shubha_R_Intel</dc:creator>
      <dc:date>2019-08-28T22:14:36Z</dc:date>
    </item>
    <item>
      <title>Dear Shubha,</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148847#M11970</link>
      <description>&lt;P&gt;Dear Shubha,&lt;/P&gt;&lt;P&gt;Thank&amp;nbsp;you&amp;nbsp;very&amp;nbsp;much,&amp;nbsp;for&amp;nbsp;your&amp;nbsp;willingness to&amp;nbsp;answer my&amp;nbsp;questions.&lt;/P&gt;&lt;P&gt;I&amp;nbsp;look&amp;nbsp;forward to your&amp;nbsp;answer,&lt;/P&gt;&lt;P&gt;César.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 29 Aug 2019 09:03:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148847#M11970</guid>
      <dc:creator>Gouveia__César</dc:creator>
      <dc:date>2019-08-29T09:03:12Z</dc:date>
    </item>
    <item>
      <title>Hi Shubha,</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148848#M11971</link>
      <description>&lt;P&gt;Hi Shubha,&lt;/P&gt;&lt;P&gt;I have made some&amp;nbsp;progresses and I think I am very close to the solution!&amp;nbsp;Inference now runs&amp;nbsp;without crashing and I'm able to output a prediction value, however this value is not correct and the model does not predict correctly, which makes me&amp;nbsp;think that there is still something wrong implemented in the dot operation.&lt;/P&gt;&lt;P&gt;I checked if both weights and outputs relative to the dot1 operation/layer&amp;nbsp;were equal/similar to the weights and output values&amp;nbsp;produced by the dense1 keras, by importing slog.hpp which enables info message printing. The weights are being read correctly, however the dot1&amp;nbsp;output values are different from the ones being produced by keras, is this a normal? Should the values be similar? Or the inference engine does some "out of the box" optimizations?&amp;nbsp;Below are&amp;nbsp;the output values ​​calculated by keras (dense_1)&amp;nbsp;and those calculated by the inference engine on openVINO using the IR model (dot1):&lt;/P&gt;&lt;BLOCKQUOTE&gt;
&lt;PRE class="brush:bash; class-name:dark;"&gt;Keras 10 highest output values for the Layer dense_1 array:

[4.927633  4.5581803 4.235109  4.0994024 4.0133104 3.9622984 3.7825406
 3.7398129 3.7224526 3.5236738]
&lt;/PRE&gt;
&lt;/BLOCKQUOTE&gt;
&lt;BLOCKQUOTE&gt;
&lt;PRE class="brush:bash; class-name:dark;"&gt;OpenVINO 10 highest output values for the dot1 operation/layer:

Top 10 results:

Image deployment_tools\data\digit_8.png

classid probability
------- -----------
38      1.4662519
62      1.3287330
51      1.3258586
49      1.3234890
4       1.0815108
76      0.9692022
100     0.9581612
61      0.8927912
83      0.8498729
78      0.8213191
&lt;/PRE&gt;
&lt;/BLOCKQUOTE&gt;
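One way to make this comparison systematic (a hypothetical helper, not part of the sample) is to print the largest absolute difference between the custom layer's output and the reference framework's output. Differences at the level of float rounding are normal; differences this large usually point at an indexing bug rather than engine optimizations:

```cpp
#include <cmath>
#include <cstddef>

// Largest absolute difference between a custom-layer output and a
// reference (e.g. Keras) output over n elements. Values above ~1e-4
// for a single dense layer suggest wrong weight indexing, not
// accumulated rounding error.
float max_abs_diff(const float* a, const float* b, std::size_t n) {
    float worst = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        float d = std::fabs(a[i] - b[i]);
        if (d > worst) worst = d;
    }
    return worst;
}
```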
&lt;P&gt;Attached&amp;nbsp;is my current&amp;nbsp;version of ext_dot.cpp. Can you please check it?&lt;/P&gt;
&lt;P&gt;I look forward&amp;nbsp;to your reply.&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;César.&lt;/P&gt;</description>
      <pubDate>Wed, 04 Sep 2019 11:06:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148848#M11971</guid>
      <dc:creator>Gouveia__César</dc:creator>
      <dc:date>2019-09-04T11:06:00Z</dc:date>
    </item>
    <item>
      <title>Dear Gouveia, César,</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148849#M11972</link>
      <description>&lt;P&gt;Dear&amp;nbsp;Gouveia, César,&lt;/P&gt;&lt;P&gt;I am glad that you got so far on your own. But indeed, your results look way off Keras's numbers. I apologize that I haven't gotten back to you sooner but I'm sure you can understand, I'm super busy with other customers.&lt;/P&gt;&lt;P&gt;I'm looking at your code in ext_dot.cpp and it doesn't look correct for dot product.&lt;/P&gt;
&lt;PRE class="brush:cpp; class-name:dark;"&gt;for(size_t n = 0; n &amp;lt; out_neurons; n++){
            float accum = 0.0;
            for(size_t i = 0; i &amp;lt; in_neurons; i++){
                accum += src[i] * scl[n*in_neurons + i];
            }
            dst[n] = accum;
        }&lt;/PRE&gt;

&lt;P&gt;Dot product is each row element multiplied by the corresponding column element, added together. Here are a couple of ways to do that in C++:&lt;/P&gt;
&lt;P&gt;&lt;A href="https://rosettacode.org/wiki/Dot_product#C.2B.2B"&gt;https://rosettacode.org/wiki/Dot_product#C.2B.2B&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://www.sanfoundry.com/cpp-program-calculate-dot-product-two-matrices/"&gt;https://www.sanfoundry.com/cpp-program-calculate-dot-product-two-matrices/&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Your loops look different from this one (which is more readable and looks correct to me):&lt;/P&gt;

&lt;PRE class="brush:cpp; class-name:dark;"&gt;for (i = 0; i &amp;lt; m; i++)
    {
        C[i] = 0;
        for (j = 0; j &amp;lt; n; j++)
            C[i] += A[i][j] * B[i][j];

    }&lt;/PRE&gt;

&lt;P&gt;Something you can certainly do though is debug and step through your code (maybe you are already doing that).&lt;/P&gt;
&lt;P&gt;Hope it helps.&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;Shubha&lt;/P&gt;</description>
      <pubDate>Thu, 05 Sep 2019 20:32:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148849#M11972</guid>
      <dc:creator>Shubha_R_Intel</dc:creator>
      <dc:date>2019-09-05T20:32:10Z</dc:date>
    </item>
    <item>
      <title>Dear Shubha,</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148850#M11973</link>
      <description>&lt;P&gt;Dear Shubha,&lt;/P&gt;&lt;P&gt;Finally my custom operation&amp;nbsp;is working at 100%!&lt;/P&gt;&lt;P&gt;I think dot product can't be done that way (in this context)&amp;nbsp;because the weights and inputs are stored in a long one-dimensional vector, so&amp;nbsp; what I did was implementing&amp;nbsp;the mask to obtain&amp;nbsp;the weight indexes. Here is the complete code for the previously mentioned function:&lt;/P&gt;
&lt;PRE class="brush:cpp; class-name:dark;"&gt;StatusCode execute(std::vector&amp;lt;Blob::Ptr&amp;gt;&amp;amp; inputs, std::vector&amp;lt;Blob::Ptr&amp;gt;&amp;amp; outputs,
                       ResponseDesc *resp) noexcept override {
        
        if (inputs.size() != 1 || outputs.empty()) {
            if (resp) {
                std::string errorMsg = "Incorrect number of input or output edges!";
                errorMsg.copy(resp-&amp;gt;msg, sizeof(resp-&amp;gt;msg) - 1);
            }
            return GENERAL_ERROR;
        }

        const float* src = inputs[0]-&amp;gt;buffer();
        const float* scl = weights-&amp;gt;buffer();
        float* dst = outputs[0]-&amp;gt;buffer();

        SizeVector in_dims = inputs[0]-&amp;gt;getTensorDesc().getDims();
        SizeVector out_dims = outputs[0]-&amp;gt;getTensorDesc().getDims();

        const int in_neurons = static_cast&amp;lt;int&amp;gt;(in_dims[1]);
        const int out_neurons = static_cast&amp;lt;int&amp;gt;(out_dims[1]);

        for (int n = 0; n &amp;lt; out_neurons; n++) {
            float accum = 0.0f;
            for (int i = 0; i &amp;lt; in_neurons; i++) {
                accum += src[i] * scl[i*out_neurons + n];
            }
            dst[n] = accum;
        }
        return OK;
    }&lt;/PRE&gt;

&lt;P&gt;However, now I have another problem. After running the benchmark script, I saw that my custom function (dot) is the bottleneck of the application:&lt;/P&gt;

&lt;PRE class="brush:bash; class-name:dark;"&gt;CONV3-32: [28x28x32]   memory: 28*28*32 = 25K   weights: (3*3*32)*32 = 9K   nr_operations: 7M   time [s]: 0.049000

FC1: [1x1x512]   memory: 576   weights: 3*3*64*512 = 295K   nr_operations: 295K   time [s]: 0.592000&lt;/PRE&gt;

&lt;P&gt;You can see from the results above that a convolutional layer with 7M MACs (Multiply-Accumulate operations) runs in 49 ms, while a fully connected layer (implemented with my custom dot operation) runs in 592 ms (12x slower). This should not occur, because the number of calculations (and therefore the processing time) for the convolutional layer is much higher than for the fully connected one.&lt;/P&gt;
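One likely cause, sketched below as an assumption (function names are hypothetical): the masked read scl[i*out_neurons + n] strides through memory by out_neurons floats per step, touching a new cache line on almost every iteration. Transposing the weights once at load time makes the hot inner loop read both arrays sequentially:

```cpp
#include <cstddef>
#include <vector>

// scl[i * out_neurons + n] is a strided (cache-hostile) access.
// Transposing once to t[n * in_neurons + i] makes each output
// neuron's weights contiguous.
std::vector<float> transpose_weights(const float* scl,
                                     int in_neurons, int out_neurons) {
    std::vector<float> t(static_cast<std::size_t>(in_neurons) * out_neurons);
    for (int i = 0; i < in_neurons; ++i)
        for (int n = 0; n < out_neurons; ++n)
            t[static_cast<std::size_t>(n) * in_neurons + i] =
                scl[static_cast<std::size_t>(i) * out_neurons + n];
    return t;
}

// Same dot product as before, now unit-stride on both arrays.
void dense_forward(const float* src, const float* t, float* dst,
                   int in_neurons, int out_neurons) {
    for (int n = 0; n < out_neurons; ++n) {
        float accum = 0.0f;
        const float* row = &t[static_cast<std::size_t>(n) * in_neurons];
        for (int i = 0; i < in_neurons; ++i)
            accum += src[i] * row[i];
        dst[n] = accum;
    }
}
```

The transpose is paid once per network load, while the strided reads in the original loop are paid on every inference.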
&lt;P&gt;I think my next step will be to take a look at the Advanced Vector Extensions (in this case AVX2).&lt;/P&gt;</description>
      <pubDate>Mon, 09 Sep 2019 14:30:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148850#M11973</guid>
      <dc:creator>Gouveia__César</dc:creator>
      <dc:date>2019-09-09T14:30:51Z</dc:date>
    </item>
    <item>
      <title>Dear Gouveia, César,</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148851#M11974</link>
      <description>&lt;P&gt;Dear&amp;nbsp;Gouveia, César,&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;P&gt;I think dot product can't be done that way (in this context)&amp;nbsp;because the weights and inputs are stored in a long one-dimensional vector, so&amp;nbsp; what I did was implementing&amp;nbsp;the mask to obtain&amp;nbsp;the weight indexes.&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;OK I missed that part. Good on you that you finally got it working ! Congrats ! Take a look at the implementation of&amp;nbsp;&lt;A href="https://github.com/opencv/dldt/blob/2019/inference-engine/src/extension/ext_argmax.cpp"&gt;ext_argmax.cpp&lt;/A&gt;&amp;nbsp;.&amp;nbsp; You will see this in the header file section :&lt;/P&gt;
&lt;PRE class="brush:cpp; class-name:dark;"&gt;#include &amp;lt;ie_parallel.hpp&amp;gt;
#if defined(HAVE_SSE) || defined(HAVE_AVX2) || defined(HAVE_AVX512F)
#include &amp;lt;immintrin.h&amp;gt;
#endif&lt;/PRE&gt;

&lt;P&gt;You will see extensive use of AVX as well as stuff like parallel_for.&lt;/P&gt;
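For illustration only (this plain std::thread sketch is not the Inference Engine API; the helper name is hypothetical), a parallel_for-style helper splits an index range into contiguous chunks and runs each chunk on its own thread:

```cpp
#include <algorithm>
#include <functional>
#include <thread>
#include <vector>

// Splits [0, n) into contiguous chunks, one per hardware thread,
// and runs body(i) for every index -- the same shape as the
// parallel_for helpers used by the bundled CPU extension kernels.
void parallel_for_sketch(int n, const std::function<void(int)>& body) {
    unsigned workers = std::max(1u, std::thread::hardware_concurrency());
    int chunk = (n + static_cast<int>(workers) - 1) / static_cast<int>(workers);
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        int lo = static_cast<int>(w) * chunk;
        int hi = std::min(n, lo + chunk);
        if (lo >= hi) break;
        pool.emplace_back([lo, hi, &body] {
            for (int i = lo; i < hi; ++i) body(i);
        });
    }
    for (std::thread& t : pool) t.join();
}
```

For a dense layer, the body would compute one output neuron per index, so neurons are processed concurrently and independently.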
&lt;P&gt;Hope it helps,&lt;/P&gt;
&lt;P&gt;Shubha&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 09 Sep 2019 17:09:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148851#M11974</guid>
      <dc:creator>Shubha_R_Intel</dc:creator>
      <dc:date>2019-09-09T17:09:15Z</dc:date>
    </item>
    <item>
      <title>Hi again Shubha,</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148852#M11975</link>
      <description>&lt;P&gt;Hi again&amp;nbsp;Shubha,&lt;/P&gt;&lt;P&gt;I have made the&amp;nbsp;following&amp;nbsp;code to perform dot product using the&amp;nbsp;AVX vectors to speed up the operation:&lt;/P&gt;
&lt;PRE class="brush:cpp; class-name:dark;"&gt;const float* src = inputs[0]-&amp;gt;buffer();
const float* scl = weights-&amp;gt;buffer();
float* dst = outputs[0]-&amp;gt;buffer();

SizeVector in_dims = inputs[0]-&amp;gt;getTensorDesc().getDims();
SizeVector out_dims = outputs[0]-&amp;gt;getTensorDesc().getDims();

const int in_neurons = static_cast&amp;lt;int&amp;gt;(in_dims[1]);
const int out_neurons = static_cast&amp;lt;int&amp;gt;(out_dims[1]);    

for(int n = 0; n &amp;lt; out_neurons; n++){
    float accum = 0.0f;
    float temp[4] = {0,0,0,0};
    float *p = temp;

    __m128 in, ws, dp;

    for(int i = 0; i &amp;lt; in_neurons; i+=4){

        // read and save the weights correctly by applying the mask
        temp[0] = scl[(i+0)*out_neurons + n];
        temp[1] = scl[(i+1)*out_neurons + n];
        temp[2] = scl[(i+2)*out_neurons + n];
        temp[3] = scl[(i+3)*out_neurons + n];

        // load input neurons sequentially (loadu: these buffers are
        // not guaranteed to be 16-byte aligned)
        in = _mm_loadu_ps(&amp;amp;src[i]);

        // load weights
        ws = _mm_loadu_ps(p);

        // dot product of the four lanes
        dp = _mm_dp_ps(in, ws, 0xff);

        // accumulate the scalar lane
        accum += dp.m128_f32[0];
    }
    // save the final result
    dst[n] = accum;
}&lt;/PRE&gt;

&lt;P&gt;It works, but the speedup is far from what I expected. As you can see below, a convolutional layer with 24x more operations than my custom dot product layer still takes less time. This makes no sense, so there should be much more room for improvement. What are my major mistakes when trying to use AVX?&lt;/P&gt;
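Two candidate mistakes, offered as a hedged sketch rather than a profiled diagnosis: the four strided scalar reads into temp[] each iteration dominate the loop, and _mm_dp_ps performs a full horizontal reduction every four elements. The usual restructuring keeps four running partial sums and reduces once per output neuron; it is shown below with plain arrays (so it compiles anywhere) and assumes the weights for one output neuron are already stored contiguously. This is exactly the pattern _mm_mul_ps/_mm_add_ps would implement:

```cpp
// Four independent partial sums stand in for one SIMD register's
// lanes; the horizontal reduction happens once per output neuron
// instead of once per 4 inputs. Assumes `row` holds this output
// neuron's weights contiguously, so there is no per-iteration gather.
float dot_lanes(const float* src, const float* row, int n) {
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        lane[0] += src[i + 0] * row[i + 0];
        lane[1] += src[i + 1] * row[i + 1];
        lane[2] += src[i + 2] * row[i + 2];
        lane[3] += src[i + 3] * row[i + 3];
    }
    float accum = lane[0] + lane[1] + lane[2] + lane[3];
    for (; i < n; ++i) accum += src[i] * row[i];  // scalar tail
    return accum;
}
```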

&lt;PRE class="brush:bash; class-name:dark;"&gt;**Convolutional Convolutional Layer Fully Optimized (AVX)**
Layer: CONV3-32 
Input: 28x28x32 = 25K   
Weights: (3*3*32)*32 = 9K   
Number of MACs: 3*3*27*27*32*32 = 7M    
Execution Time on OpenVINO framework: 0.049 ms

**My Custom Dot Product Layer Far From Optimized (AVX)**
Layer: FC
Inputs: 1x1x512
Outputs: 576    
Weights: 3*3*64*512 = 295K  
Number of MACs: 295K    
Execution Time on OpenVINO framework: 0.197 ms
&lt;/PRE&gt;</description>
      <pubDate>Tue, 24 Sep 2019 16:07:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148852#M11975</guid>
      <dc:creator>Gouveia__César</dc:creator>
      <dc:date>2019-09-24T16:07:26Z</dc:date>
    </item>
    <item>
      <title>Dear Gouveia, César,</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148853#M11976</link>
      <description>&lt;P&gt;Dear Gouveia, César,&lt;/P&gt;&lt;P&gt;Please have a look at &lt;A href="https://github.com/opencv/dldt/blob/2019/inference-engine/src/extension/ext_topk.cpp"&gt;ext_topk.cpp&lt;/A&gt;. While I didn't study your code deeply, I don't see a SIMD (Single Instruction, Multiple Data) approach. How do I know this? I just see a regular for loop; I'd expect to see parallel_for2d. For instance, if you study &lt;A href="https://github.com/opencv/dldt/blob/2019/inference-engine/src/extension/ext_topk.cpp"&gt;ext_topk.cpp&lt;/A&gt;, you will see something like this:&lt;/P&gt;
&lt;PRE class="brush:cpp; class-name:dark;"&gt;#if defined(HAVE_SSE) || defined(HAVE_AVX2) || defined(HAVE_AVX512F)
        parallel_for2d(before_num, after_num / block_size, [&amp;amp;](int i0, int ib1) {&lt;/PRE&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hope it helps,&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;Shubha&lt;/P&gt;</description>
      <pubDate>Mon, 30 Sep 2019 20:47:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148853#M11976</guid>
      <dc:creator>Shubha_R_Intel</dc:creator>
      <dc:date>2019-09-30T20:47:09Z</dc:date>
    </item>
    <item>
      <title>Dear Shubha,</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148854#M11977</link>
      <description>&lt;P&gt;Dear Shubha,&lt;/P&gt;&lt;P&gt;After transposing the weights matrix and applying AVX intrinsics I was able to optimize my code to a decent execution time! Here is the final code, which uses the transposed weights matrix but is shown without the AVX intrinsics and the parallel_for to keep it simple.&lt;/P&gt;
&lt;PRE class="brush:cpp; class-name:dark;"&gt;StatusCode execute(std::vector&amp;lt;Blob::Ptr&amp;gt;&amp;amp; inputs, std::vector&amp;lt;Blob::Ptr&amp;gt;&amp;amp; outputs,
                       ResponseDesc *resp) noexcept override {
        
    if (inputs.size() != 1 || outputs.empty()) {
        if (resp) {
            std::string errorMsg = "Incorrect number of input or output edges!";
            errorMsg.copy(resp-&amp;gt;msg, sizeof(resp-&amp;gt;msg) - 1);
        }
        return GENERAL_ERROR;
    }

    const float* src = inputs[0]-&amp;gt;buffer();
    const float* scl = weights-&amp;gt;buffer();
    float* dst = outputs[0]-&amp;gt;buffer();

    SizeVector in_dims = inputs[0]-&amp;gt;getTensorDesc().getDims();
    SizeVector out_dims = outputs[0]-&amp;gt;getTensorDesc().getDims();

    const int in_neurons = static_cast&amp;lt;int&amp;gt;(in_dims[1]);
    const int out_neurons = static_cast&amp;lt;int&amp;gt;(out_dims[1]);

    for (int n = 0; n &amp;lt; out_neurons; n++) {
        float accum = 0.0f;
        for (int i = 0; i &amp;lt; in_neurons; i++) {
            accum += src[i] * scl[n*in_neurons + i];
        }
        dst[n] = accum;
    }
    return OK;
}&lt;/PRE&gt;

&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;César.&lt;/P&gt;</description>
      <pubDate>Mon, 14 Oct 2019 10:46:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Not-able-to-generate-openvino-IR-for-simple-model-mnist-using/m-p/1148854#M11977</guid>
      <dc:creator>Gouveia__César</dc:creator>
      <dc:date>2019-10-14T10:46:00Z</dc:date>
    </item>
  </channel>
</rss>

