Community Manager

Softmax layer takes 60 ms for 0.01 MFLOPs

Hello Movis


I have a very simple fully convolutional network (converting an input of shape [1, 2, 256, 256] to a probability map (pmap) of shape [1, 2, 42, 42]).


It uses 3x3 and 5x5 convolution layers and finishes with a Softmax layer, which receives a map of shape [1, 2, 42, 42].


The network, confirmed to be working on the Movidius NCS, needs < 1 GFLOPs, and I profile it with:


mvNCProfile model.prototxt -w model.caffemodel -s 12


The total inference time is reported as 132 ms, of which 58 ms is spent in the last layer: a softmax of only 0.010584 MFLOPs!


This can't be right. I guess I'm missing some parameter?


Thanks, and have a nice one!





A minimal prototxt that reproduces this (on my computer):


(I installed the Movidius NCSDK on a clean Ubuntu system; mvNCProfile --version reports v02.00.)


    name: "CNN_test_movidius_softmax"
    input: "data"
    input_shape { dim: 1 dim: 2 dim: 42 dim: 42 }
    layer {
      name: "prob"
      type: "Softmax"
      bottom: "data"
      top: "prob"
    }


A call to mvNCProfile test_softmax.prototxt -s 12 results in:


    Detailed Per Layer Profile
                                             Bandwidth    time
    #   Name                  MFLOPs     (MB/s)       (ms)
    ===============================================================================
    0   input                 0.0        0.0          0.002
    1   prob                  0.0        0.1          58.205
    -------------------------------------------------------------------------------
    Total inference time                              58.21
    -------------------------------------------------------------------------------
Community Manager

Solution: I skip the Softmax layer and calculate it in my own program.


It is NOT a solution but a hack.


My hack needs some magic scaling to be identical to the 'true' solution.





  • Return the output of the layer before the Softmax (just edit the .prototxt and remove the 'prob' layer)
  • Calculate the softmax in the program (THIS IS A WRONG VERSION, it does not scale correctly!)
  • Save approx. 50 ms


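    // NOTE: this normalizes each channel map over all of its own pixels,
    // not across the channels at each pixel, which is why it does not
    // match the output of the Softmax layer.
    for (cv::Mat &M : probability_maps)
    {
        // max over the whole map
        double minval, maxval;
        cv::minMaxIdx(M, &minval, &maxval);

        // Ei = exp(Pi - maxP)
        M -= maxval;
        cv::exp(M, M);

        // sum of all exp(Pi - maxP) in the map
        const cv::Scalar s = cv::sum(M);

        // divide each Ei by the sum of the Ei
        const double scale = 1.0 / (s.val[0] + 1.e-6);
        M *= scale;
    }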





Have a Good Day!




NB: It would be nice if someone could confirm that my use of the softmax in the NCSDK is correct. I know that for two classes (i.e. binary) I could use a Sigmoid layer, but my network _will_ have more than two classes when it's up and running.

Community Manager

// The code in my previous post is .. like .. NOT .. correct. (I don't know what data I verified it on, but SORRY if you copy-pasted it and ended up bewildered.)


An (even more) correct version of a softmax layer:


    // exp of every channel map
    for (cv::Mat &M : probability_maps) {
        // M -= max;   (max subtraction skipped: very large logits can overflow exp())
        cv::exp(M, M);
    }
    // per-pixel sum over all channel maps
    cv::Mat Sum = probability_maps[0].clone();
    for (unsigned int i = 1; i < probability_maps.size(); i++) {
        Sum += probability_maps[i];
    }
    // divide each pixel by the sum, i.e. scale the probabilities to 0..1
    for (cv::Mat &M : probability_maps) {
        cv::divide(M, Sum, M);
    }


I still haven't found out WHY the Movidius can't run a softmax on, e.g., a 50x50, 2-channel layer in reasonable time.
