Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Problems at MTCNN conversion to OpenVino

hopeai
Beginner
1,361 Views

Hi,

I used the following commands to convert the MTCNN Caffe models into IR:

python3 mo.py --input_model path/to/PNet/det1.caffemodel --model_name det1 --output_dir path/to/output_dir
python3 mo.py --input_model path/to/RNet/det2.caffemodel --model_name det2 --output_dir path/to/output_dir
python3 mo.py --input_model path/to/ONet/det3.caffemodel --model_name det3 --output_dir path/to/output_dir

P-Net takes the image at several scales and proposes bounding boxes with probabilities. These proposals go to R-Net for refinement, and the output of R-Net is then fed into O-Net. As far as I know, OpenVINO does not support a variable input size, so I use a for loop that reshapes the network input for each scale.

First, I would like to know whether there is a better way to do this. Second, I am not sure why detection performance degrades significantly compared to the original repo, mtcnn-pytorch. I double-checked the weights of the converted model against the original ones and there is no difference. However, I keep getting different results in the first stage, which affects the second and third stages and, ultimately, the detection performance.
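The scales that the P-Net loop iterates over are normally an image pyramid. The sketch below assumes the defaults used by mtcnn-pytorch (min_face_size=20, a 12-pixel P-Net window and a factor of 0.707); `pyramid_scales`, `pnet_input_shapes` and `PerShapeCache` are hypothetical helper names. Since every scale maps to one fixed NCHW shape, one possible improvement (a suggestion, not an OpenVINO feature) is to keep one loaded network per shape, so the reshape/load cost is paid only the first time a given size is seen.

```python
import math
from typing import Callable, Dict, List, Tuple

Shape = Tuple[int, int, int, int]  # N, C, H, W


def pyramid_scales(width: int, height: int, min_face_size: float = 20.0,
                   min_detection_size: int = 12, factor: float = 0.707) -> List[float]:
    """Scales at which P-Net is run (image-pyramid logic as in mtcnn-pytorch)."""
    m = min_detection_size / min_face_size  # map the smallest face onto a 12 px window
    min_length = min(width, height) * m
    scales, count = [], 0
    while min_length > min_detection_size:
        scales.append(m * factor ** count)
        min_length *= factor
        count += 1
    return scales


def pnet_input_shapes(width: int, height: int, scales: List[float]) -> List[Shape]:
    """NCHW input shape needed for each scale (same ceil rounding as the P-Net loop)."""
    return [(1, 3, math.ceil(height * s), math.ceil(width * s)) for s in scales]


class PerShapeCache:
    """Keeps one loaded network per input shape so each shape is loaded only once.

    `load_fn` is a hypothetical callable: it should reshape the network to the
    given shape and return the loaded executable network.
    """

    def __init__(self, load_fn: Callable[[Shape], object]) -> None:
        self._load_fn = load_fn
        self._nets: Dict[Shape, object] = {}

    def get(self, shape: Shape) -> object:
        if shape not in self._nets:
            self._nets[shape] = self._load_fn(shape)
        return self._nets[shape]
```

With the IEPlugin-based code in the question, `load_fn` would make the same two calls the loop already makes, `pnet.reshape({'data': shape})` followed by `plugin.load(network=pnet, num_requests=1)`, assuming each loaded network stays valid after `pnet` is reshaped again (worth verifying). The trade-off is memory: one executable network is kept per pyramid level.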


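import logging as log
import math

import numpy as np
from PIL import Image

# image, scales, thresholds, pnet (IENetwork) and plugin (IEPlugin) are created
# earlier; _preprocess, _generate_bboxes and nms are the mtcnn-pytorch helpers.
width, height = image.size
bounding_boxes = []

# run P-Net on different scales
for s in scales:
    sw, sh = math.ceil(width*s), math.ceil(height*s)
    img = image.resize((sw, sh), Image.BILINEAR)
    img = np.asarray(img, 'float32')
    img = _preprocess(img)  # HWC -> 1xCxHxW, then (x - 127.5) * 0.0078125
    # the network has to be reshaped and reloaded for every new input size
    pnet.reshape({'data': (1, 3, sh, sw)})
    log.info("Loading model to the plugin")
    pnet_exec_net = plugin.load(network=pnet, num_requests=1)
    output = pnet_exec_net.infer({'data': img})
    probs = output['prob1'][0, 1, :, :]  # face probability map
    offsets = output['conv4-2']          # bounding-box regression offsets

    boxes = _generate_bboxes(probs, offsets, s, thresholds[0])
    if len(boxes) == 0:
        bounding_boxes.append(None)
    else:
        keep = nms(boxes[:, 0:5], overlap_threshold=0.5)
        bounding_boxes.append(boxes[keep])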
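To rule out the post-processing as the source of the first-stage differences, the NMS step can be checked in isolation. Below is a NumPy sketch of greedy non-maximum suppression in IoU ("union") mode; `greedy_nms` is a hypothetical name and this is not a copy of mtcnn-pytorch's `nms`, although it uses the same inclusive-pixel (+1) area convention and keeps boxes whose overlap is at or below the threshold.

```python
import numpy as np


def greedy_nms(boxes: np.ndarray, overlap_threshold: float = 0.5) -> list:
    """Greedy non-maximum suppression (IoU / "union" mode).

    boxes: float array of shape [n, 5] holding x1, y1, x2, y2, score.
    Returns the indices of the kept boxes, highest score first.
    """
    if len(boxes) == 0:
        return []
    x1, y1, x2, y2, score = (boxes[:, i] for i in range(5))
    # +1 because coordinates are treated as inclusive pixel indices
    area = (x2 - x1 + 1.0) * (y2 - y1 + 1.0)
    order = np.argsort(score)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # intersection of the best box with every remaining box
        ix1 = np.maximum(x1[best], x1[rest])
        iy1 = np.maximum(y1[best], y1[rest])
        ix2 = np.minimum(x2[best], x2[rest])
        iy2 = np.minimum(y2[best], y2[rest])
        inter = np.maximum(0.0, ix2 - ix1 + 1.0) * np.maximum(0.0, iy2 - iy1 + 1.0)
        iou = inter / (area[best] + area[rest] - inter)
        # discard boxes that overlap the kept one by more than the threshold
        order = rest[iou <= overlap_threshold]
    return keep
```

Running this and the original helper on the same `boxes[:, 0:5]` array should give the same kept set (the order of equal-score boxes may differ); if it does, the discrepancy is more likely in the network outputs themselves.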

I have spent several hours trying to get the same results (or at least acceptable performance from the converted MTCNN), with no luck. I would really appreciate any help with this.

13 Replies
Shubha_R_Intel
Employee

Dear Abdollahi Aghdam, Omid,

If you are scaling your input images before training, then you must also add the scale to your Model Optimizer command. Please see my detailed response to this forum post.

Thanks,

Shubha

hopeai
Beginner

Dear Shubha R.,

Just to mention: I am already pre-processing the images before feeding them to the network. Should I still add the scale to the Model Optimizer command?

Kind regards,

Omid

Shubha_R_Intel
Employee

Dear Abdollahi Aghdam, Omid,

Yes, absolutely. If you are pre-processing your images, you should pay close attention not only to --input_shape (the size of the image used during training) but also to the following command-line parameters when you run the Model Optimizer. Model Optimizer cannot know about any pre-processing applied to the images you trained on unless you tell it.

  --input_shape INPUT_SHAPE
                        Input shape(s) that should be fed to an input node(s)
                        of the model. Shape is defined as a comma-separated
                        list of integer numbers enclosed in parentheses or
                        square brackets, for example [1,3,227,227] or
                        (1,227,227,3), where the order of dimensions depends
                        on the framework input layout of the model. For
                        example, [N,C,H,W] is used for Caffe* models and
                        [N,H,W,C] for TensorFlow* models. Model Optimizer
                        performs necessary transformations to convert the
                        shape to the layout required by Inference Engine
                        (N,C,H,W). The shape should not contain undefined
                        dimensions (? or -1) and should fit the dimensions
                        defined in the input operation of the graph. If there
                        are multiple inputs in the model, --input_shape should
                        contain definition of shape for each input separated
                        by a comma, for example: [1,3,227,227],[2,4] for a
                        model with two inputs with 4D and 2D shapes.
                        Alternatively, you can specify shapes with the --input
                        option.
  --scale SCALE, -s SCALE
                        All input values coming from original network inputs
                        will be divided by this value. When a list of inputs
                        is overridden by the --input parameter, this scale is
                        not applied for any input that does not match with the
                        original input of the model.
  --reverse_input_channels
                        Switch the input channels order from RGB to BGR (or
                        vice versa). Applied to original inputs of the model
                        if and only if a number of channels equals 3. Applied
                        after application of --mean_values and --scale_values
                        options, so numbers in --mean_values and
                        --scale_values go in the order of channels used in the
                        original model.
  --mean_values MEAN_VALUES, -ms MEAN_VALUES
                        Mean values to be used for the input image per
                        channel. Values to be provided in the (R,G,B) or
                        [R,G,B] format. Can be defined for desired input of
                        the model, for example: "--mean_values
                        data[255,255,255],info[255,255,255]". The exact
                        meaning and order of channels depend on how the
                        original model was trained.
  --scale_values SCALE_VALUES
                        Scale values to be used for the input image per
                        channel. Values are provided in the (R,G,B) or [R,G,B]
                        format. Can be defined for desired input of the model,
                        for example: "--scale_values
                        data[255,255,255],info[255,255,255]". The exact
                        meaning and order of channels depend on how the
                        original model was trained.
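To make the effect of --mean_values and --scale_values concrete: Model Optimizer normalizes each input value as (x - mean) / scale. mtcnn-pytorch's `_preprocess` computes (x - 127.5) * 0.0078125, and since 1/128 = 0.0078125, that corresponds to a mean of 127.5 and a scale of 128 per channel. The plain NumPy sketch below (no OpenVINO involved; `app_preprocess` and `ir_normalize` are hypothetical names) checks that equivalence and shows what happens if the same normalization is applied both in the application and inside the IR.

```python
import numpy as np


def app_preprocess(img_hwc: np.ndarray) -> np.ndarray:
    """Normalization done in the application, mirroring mtcnn-pytorch's _preprocess."""
    x = img_hwc.transpose((2, 0, 1))[np.newaxis]  # HWC -> 1xCxHxW
    return (x - 127.5) * 0.0078125


def ir_normalize(x_nchw: np.ndarray, mean: float = 127.5, scale: float = 128.0) -> np.ndarray:
    """Emulates the step that --mean_values 127.5 and --scale_values 128 prepend to the IR."""
    return (x_nchw - mean) / scale


rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(12, 12, 3)).astype(np.float32)
raw = img.transpose((2, 0, 1))[np.newaxis]  # un-normalized NCHW input

# Normalizing in the application OR inside the IR yields the same tensor...
assert np.allclose(app_preprocess(img), ir_normalize(raw))

# ...but applying both squeezes the input into a band within about [-1.004, -0.988].
double = ir_normalize(app_preprocess(img))
print(double.min(), double.max())
```

In other words, the normalization should be applied exactly once: either keep it in the application and convert without mean/scale flags, or bake it into the IR and feed raw pixel values.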

Thanks,

Shubha

hopeai
Beginner

Dear Shubha R.,

I went through all the possible options and did the conversion using the parameters listed in list_topologies.yml; however, I did not get the same results as the original models. Finally, I managed to get the same results using the MXNet models (https://github.com/YYuanAnyVision/mxnet_mtcnn_face_detection) by adding --reverse_input_channels. I think there might be a problem in the Model Optimizer's Caffe conversion.
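One way to narrow down a mismatch like this is to compare output blobs numerically, with and without swapping the input channels. The two helpers below are a hypothetical NumPy sketch: `reverse_channels_nchw` performs the RGB/BGR swap that --reverse_input_channels bakes into the IR, and `max_abs_diff` reports the largest element-wise difference between a reference output (for example from the original framework) and the IR output.

```python
import numpy as np


def reverse_channels_nchw(x: np.ndarray) -> np.ndarray:
    """RGB <-> BGR swap for an N x 3 x H x W tensor."""
    if x.ndim != 4 or x.shape[1] != 3:
        raise ValueError(f"expected an N x 3 x H x W tensor, got {x.shape}")
    return x[:, ::-1, :, :]


def max_abs_diff(reference: np.ndarray, candidate: np.ndarray) -> float:
    """Largest element-wise difference between two output blobs of the same shape."""
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    diff = reference.astype(np.float64) - candidate.astype(np.float64)
    return float(np.max(np.abs(diff)))
```

For instance, feed the same preprocessed tensor to the original model and to the IR and compare `prob1`; then feed `reverse_channels_nchw(tensor)` to the IR only. If just the second comparison is close to zero, the mismatch comes from channel order rather than from the conversion itself.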

Kind regards,

Omid

Shubha_R_Intel
Employee

Dearest Abdollahi Aghdam, Omid,

I totally believe you. Can you attach the *.xml files for both the Caffe and the MXNet conversions as a zip file? Also, please give me the exact MO command you used for each. Finally, if you can also put the models in that zip file, that would be great. You may have to attach two different zip files. If you are uncomfortable attaching the zip files on a public forum, let me know here and I can PM you so that you can send them privately. If the Caffe MO produces wrong results while the MXNet MO converts a similar model correctly, it could definitely be a bug.

Thanks for your patience !

Shubha

hopeai
Beginner

Dear Shubha R.,

You will find the Caffe and MXNet models I used for conversion at the following links:

Caffe models: https://github.com/TropComplique/mtcnn-pytorch/tree/master/caffe_models

Mxnet models: https://github.com/YYuanAnyVision/mxnet_mtcnn_face_detection/tree/master/model

For the Caffe models I used the following:

python3 mo.py --input_model path/to/PNet/det1.caffemodel --model_name det1 --output_dir path/to/output_dir

python3 mo.py --input_model path/to/RNet/det2.caffemodel --model_name det2 --output_dir path/to/output_dir

python3 mo.py --input_model path/to/ONet/det3.caffemodel --model_name det3 --output_dir path/to/output_dir

I also tried adding --reverse_input_channels.

Finally, I converted the MXNet models using the following:

python3 mo.py \
    --input_model path/to/det1-0001.params \
    --input_symbol path/to/det1-symbol.json \
    --reverse_input_channels

python3 mo.py \
    --input_model path/to/det2-0001.params \
    --input_symbol path/to/det2-symbol.json \
    --reverse_input_channels

python3 mo.py \
    --input_model path/to/det3-0001.params \
    --input_symbol path/to/det3-symbol.json \
    --reverse_input_channels

The *.xml files are available at the following links:

https://www.dropbox.com/s/7ybc4tlfsw4j616/converted_from_caffe.zip

https://www.dropbox.com/s/pnsatt98r7o57zz/converted_from_mxnet.zip


Best regards,

Omid

Shubha_R_Intel
Employee

Dearest Abdollahi Aghdam, Omid,

Wonderful. Thanks! I promise to take a look.

Shubha

Shubha_R_Intel
Employee

Dearest Abdollahi Aghdam, Omid,

Unfortunately, I don't use Dropbox. Can you add them as *.zip attachments? Or, if you'd prefer, I can PM you and you can send them to me privately.

Thanks,

Shubha

hopeai
Beginner

Dearest Shubha R.,

Sure, find the zip files attached.

Best regards,

Omid

Shubha_R_Intel
Employee

Dear Abdollahi Aghdam, Omid,

Please look at your deployment_tools\tools\model_downloader\list_topologies.yml file and search it for mtcnn. You need to pass specific --input_shape parameters to your Model Optimizer command, as this forum poster did; scroll about halfway down the page and you'll understand. Your Caffe conversion commands look different from his: he adds --input_shape to match list_topologies.yml.
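As a sketch of what "passing --input_shape" looks like for the three Caffe models, the helper below assembles mo.py argument lists. R-Net and O-Net take fixed 24x24 and 48x48 inputs; the P-Net shape is only an illustrative placeholder (the values in list_topologies.yml are not reproduced here), and since P-Net is reshaped per scale at runtime anyway, the exact value matters less. `mo_command`, `MTCNN_SHAPES` and the paths are hypothetical; whether to append --mean_values/--scale_values or --reverse_input_channels depends on your pre-processing, so they are left to `extra_args`.

```python
import shlex
from typing import Dict, List, Optional, Tuple

Shape = Tuple[int, int, int, int]  # N, C, H, W as --input_shape expects for Caffe

# R-Net and O-Net take fixed 24x24 and 48x48 crops; the P-Net entry is a placeholder.
MTCNN_SHAPES: Dict[str, Shape] = {
    "det1": (1, 3, 12, 12),
    "det2": (1, 3, 24, 24),
    "det3": (1, 3, 48, 48),
}


def mo_command(model: str, shape: Shape, model_dir: str, output_dir: str,
               extra_args: Optional[List[str]] = None) -> List[str]:
    """Build an mo.py argument list for one MTCNN Caffe model (paths are placeholders)."""
    shape_arg = "[" + ",".join(str(d) for d in shape) + "]"
    cmd = [
        "python3", "mo.py",
        "--input_model", f"{model_dir}/{model}.caffemodel",
        "--input_shape", shape_arg,
        "--model_name", model,
        "--output_dir", output_dir,
    ]
    return cmd + list(extra_args or [])


for name, shape in MTCNN_SHAPES.items():
    # shlex.join (Python 3.8+) quotes the brackets so the shell does not glob them
    print(shlex.join(mo_command(name, shape, "path/to/caffe_models", "path/to/output_dir")))
```

The argument lists can be passed directly to `subprocess.run` instead of being printed.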

Please try a similar approach and let me know how it works out.

Thanks,

Shubha

hopeai
Beginner

Dear Shubha R.,

I had tried all the possible combinations of options (--input_shape, --scale, etc.) that might have caused the problem; however, I was only able to achieve the same performance with the MXNet pretrained models, not with the Caffe models.

Best regards,

Omid

hua__wei
Beginner

Do you know how to convert a model with more than 3 channels? I used LNet from MTCNN and the command below to convert the model, but the result differs from the original MXNet model (the input of LNet is [1,15,24,24]):

python3 mo_mxnet.py --input_model det4-0001.params  --mean_values [127.5,127.5,127.5,127.5,127.5,127.5,127.5,127.5,127.5,127.5,127.5,127.5,127.5,127.5,127.5]    --scale_values [128,128,128,128,128,128,128,128,128,128,128,128,128,128,128]  --input_shape [1,15,24,24] --input_symbol det4-symbol.json
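Model Optimizer documents --reverse_input_channels as applying only to inputs with exactly 3 channels, so it cannot handle LNet's 15-channel input. Assuming (this is not confirmed in the thread) that the 15 channels are five 24x24 landmark patches of 3 channels each, stacked along C, any per-patch RGB/BGR swap would have to be done in the application. The helpers below are a hypothetical NumPy sketch of that layout and swap; `stack_patches` and `reverse_channels_per_patch` are illustrative names.

```python
import numpy as np


def stack_patches(patches: list) -> np.ndarray:
    """Stack five H x W x 3 patches into a 1 x 15 x H x W tensor (patch-major)."""
    chw = [p.transpose((2, 0, 1)) for p in patches]  # each HWC -> CHW
    return np.concatenate(chw, axis=0)[np.newaxis]


def reverse_channels_per_patch(x: np.ndarray, group: int = 3) -> np.ndarray:
    """Swap RGB <-> BGR inside every 3-channel group of an N x (k*3) x H x W tensor."""
    n, c, h, w = x.shape
    if c % group:
        raise ValueError(f"{c} channels are not a multiple of {group}")
    grouped = x.reshape(n, c // group, group, h, w)
    return grouped[:, :, ::-1].reshape(n, c, h, w)
```

Because the same 127.5 mean and 128 scale are used for all 15 channels in the command above, that normalization is equivalent to a single (x - 127.5) / 128 over the whole tensor; only the channel order needs per-patch handling.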

Shubha_R_Intel
Employee

Dear Abdollahi Aghdam, Omid,

"I had tried all the possible combinations of options (--input_shape, --scale, etc.) that might have caused the problem; however, I was only able to achieve the same performance with the MXNet pretrained models, not with the Caffe models."

I understand. I will look into your files. Thanks for providing them. 

Dear hua, wei,

What do you mean by more than 3 channels? The 3 channels are R, G, B (color channels). Do you mean more than one input? The layout that the OpenVINO Inference Engine expects is NCHW, so the size of C is generally expected to be 3 for BGR or RGB input. Can you clarify what you mean?

Thanks kindly,

Shubha
