<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: softmax layer, 60 ms on 0.01 MFLOPS in Intel® Distribution of OpenVINO™ Toolkit</title>
    <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/softmax-layer-60-ms-on-0-01-MFLOPS/m-p/665910#M3535</link>
    <description>&lt;P&gt;Solution: I skip the Softmax layer and calculate it in the program.&lt;/P&gt;&lt;P&gt;It is NOT a solution but a hack.&lt;/P&gt;&lt;P&gt;My hack needs some magic scaling to be identical to the 'true' solution.&lt;/P&gt;&lt;P&gt;Hack:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Return the layer before the Softmax (just edit the .prototxt and remove the 'prob' layer).&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Calculate the softmax on the host (THIS IS A WRONG VERSION, it does not scale right!).&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Save approx. 50 ms.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;CODE&gt;for (Mat &amp;amp;M : probability_maps)
    {
        // max over the whole map
        double minval, maxval;
        cv::minMaxIdx(M, &amp;amp;minval, &amp;amp;maxval);
        // Ei = exp of each Pi - maxPi
        M -= maxval;
        cv::exp(M, M);
        // sum of exp(Pi - maxPi) over the map
        const Scalar s = cv::sum(M);
        // divide each element Ei by the sum of Ei
        const double scale = 1.0 / (s.val[0] + 1.e-6);
        M *= scale;
    }
&lt;/CODE&gt;&lt;P&gt;Have a Good Day!&lt;/P&gt;&lt;P&gt;/B&lt;/P&gt;&lt;P&gt;NB: It would be nice if anyone could confirm that my use of the softmax in NCSDK is correct. I know that for two classes (i.e. binary) I could use a Sigmoid layer, but my network _will_ have more than two classes when it's up and running.&lt;/P&gt;</description>
    <pubDate>Thu, 25 Oct 2018 16:50:40 GMT</pubDate>
    <dc:creator>idata</dc:creator>
    <dc:date>2018-10-25T16:50:40Z</dc:date>
    <item>
      <title>softmax layer, 60 ms on 0.01 MFLOPS</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/softmax-layer-60-ms-on-0-01-MFLOPS/m-p/665909#M3534</link>
      <description>&lt;P&gt;Hello Movis&lt;/P&gt;&lt;P&gt;I have a very simple Fully Convolutional network (converting an input of shape [1, 2, 256, 256] to a pmap of [1, 2, 42, 42]).&lt;/P&gt;&lt;P&gt;It uses 3x3 and 5x5 layers and finishes off with a Softmax layer. The softmax layer receives a map of shape [1, 2, 42, 42].&lt;/P&gt;&lt;P&gt;The network, confirmed and working on the Movidius NC, has &amp;lt; 1 GFLOPs, and I profile it with&lt;/P&gt;&lt;CODE&gt;mvNCProfile model.prototxt -w model.caffemodel -s 12
&lt;/CODE&gt;&lt;P&gt;The total inference time is reported to be 132 ms, with 58 ms spent in the last layer doing a softmax of complexity 0.010584 MFLOPs!&lt;/P&gt;&lt;P&gt;This can't be true... I guess I'm missing some parameter?&lt;/P&gt;&lt;P&gt;Thanks! And have a Nice one&lt;/P&gt;&lt;P&gt;/B&lt;/P&gt;&lt;P&gt;A minimal prototxt example which (on my computer) confirms this is:&lt;/P&gt;&lt;P&gt;(I installed the full Movidius NCSDK on a clean Ubuntu system; mvNCProfile --version reports v02.00.)&lt;/P&gt;&lt;CODE&gt;    name: "CNN_test_movidius_softmax"
    input: "data"
    input_shape {
      dim: 1
      dim: 2
      dim: 42
      dim: 42
    }        
    layer {
      name: "prob"
      type: "Softmax"
      bottom: "data"
      top: "prob"
    }
&lt;/CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;A call to mvNCProfile test_softmax.prototxt -s 12 results in:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;CODE&gt;Detailed Per Layer Profile
                                                               Bandwidth   time
#   Name                                                 MFLOPs  (MB/s)    (ms)
===============================================================================
0   input                                                   0.0     0.0   0.002
1   prob                                                    0.0     0.1  58.205
-------------------------------------------------------------------------------
                                   Total inference time                   58.21
-------------------------------------------------------------------------------
&lt;/CODE&gt;</description>
      <pubDate>Tue, 23 Oct 2018 22:49:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/softmax-layer-60-ms-on-0-01-MFLOPS/m-p/665909#M3534</guid>
      <dc:creator>idata</dc:creator>
      <dc:date>2018-10-23T22:49:27Z</dc:date>
    </item>
    <item>
      <title>Re: softmax layer, 60 ms on 0.01 MFLOPS</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/softmax-layer-60-ms-on-0-01-MFLOPS/m-p/665910#M3535</link>
      <description>&lt;P&gt;Solution: I skip the Softmax layer and calculate it in the program.&lt;/P&gt;&lt;P&gt;It is NOT a solution but a hack.&lt;/P&gt;&lt;P&gt;My hack needs some magic scaling to be identical to the 'true' solution.&lt;/P&gt;&lt;P&gt;Hack:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Return the layer before the Softmax (just edit the .prototxt and remove the 'prob' layer).&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Calculate the softmax on the host (THIS IS A WRONG VERSION, it does not scale right!).&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Save approx. 50 ms.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;CODE&gt;for (Mat &amp;amp;M : probability_maps)
    {
        // max over the whole map
        double minval, maxval;
        cv::minMaxIdx(M, &amp;amp;minval, &amp;amp;maxval);
        // Ei = exp of each Pi - maxPi
        M -= maxval;
        cv::exp(M, M);
        // sum of exp(Pi - maxPi) over the map
        const Scalar s = cv::sum(M);
        // divide each element Ei by the sum of Ei
        const double scale = 1.0 / (s.val[0] + 1.e-6);
        M *= scale;
    }
&lt;/CODE&gt;&lt;P&gt;Have a Good Day!&lt;/P&gt;&lt;P&gt;/B&lt;/P&gt;&lt;P&gt;NB: It would be nice if anyone could confirm that my use of the softmax in NCSDK is correct. I know that for two classes (i.e. binary) I could use a Sigmoid layer, but my network _will_ have more than two classes when it's up and running.&lt;/P&gt;</description>
      <pubDate>Thu, 25 Oct 2018 16:50:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/softmax-layer-60-ms-on-0-01-MFLOPS/m-p/665910#M3535</guid>
      <dc:creator>idata</dc:creator>
      <dc:date>2018-10-25T16:50:40Z</dc:date>
    </item>
    <item>
      <title>Re: softmax layer, 60 ms on 0.01 MFLOPS</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/softmax-layer-60-ms-on-0-01-MFLOPS/m-p/665911#M3536</link>
      <description>&lt;P&gt;// The previous post contains some code. That code is... NOT correct. (I do not know on what data I verified it... but SORRY if you copy-pasted it and ran it, bewildered.)&lt;/P&gt;&lt;P&gt;An even more correct version of a softmax layer:&lt;/P&gt;&lt;CODE&gt;// exponentiate each channel map
  for (cv::Mat &amp;amp;M : probability_maps)
    {
      // for numerical stability, the per-pixel max across channels
      // should be subtracted first: M -= max;
      cv::exp(M, M);
    }

  // per-pixel sum of the exponentials over all channels
  cv::Mat Sum = probability_maps[0].clone();
  for (unsigned int i = 1; i &amp;lt; probability_maps.size(); i++)
    {
      Sum += probability_maps[i];
    }

  // divide each pixel by the sum, i.e. scale the probabilities to [0, 1]
  for (cv::Mat &amp;amp;M : probability_maps)
    {
      cv::divide(M, Sum, M);
    }
&lt;/CODE&gt;&lt;P&gt;... I still haven't found out WHY Movidius can't handle a softmax on e.g. a 50x50, 2-channel layer.&lt;/P&gt;</description>
      <pubDate>Mon, 22 Apr 2019 17:19:37 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/softmax-layer-60-ms-on-0-01-MFLOPS/m-p/665911#M3536</guid>
      <dc:creator>idata</dc:creator>
      <dc:date>2019-04-22T17:19:37Z</dc:date>
    </item>
  </channel>
</rss>

