Problems converting MTCNN to OpenVINO

hopeai · ‎06-18-2019

Hi,

I used the following commands to convert the MTCNN Caffe models into IR:

python3 mo.py --input_model path/to/PNet/det1.caffemodel --model_name det1 --output_dir path/to/output_dir
python3 mo.py --input_model path/to/RNet/det2.caffemodel --model_name det2 --output_dir path/to/output_dir
python3 mo.py --input_model path/to/ONet/det3.caffemodel --model_name det3 --output_dir path/to/output_dir

P-Net takes the image at several scales and proposes bounding boxes with probabilities. Those proposals are passed to R-Net for refinement, and the output of R-Net is passed to O-Net. As far as I know, OpenVINO does not support variable input sizes, so I reshape the network in a for loop, once per scale.

First, I would like to know whether there is a better way to do this. Second, I am not sure why detection performance degrades significantly compared with the original repo, mtcnn-pytorch. I double-checked the weights of the converted model against the original ones and there is no difference. However, I keep getting different results in the first stage, which affects the second and third stages and, in the end, the detection performance.
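
For context, the `scales` list used in the loop below is not defined in the snippet. A minimal sketch of how such an image pyramid is commonly built for MTCNN follows. The function names and the defaults (minimum face size of 20 px, scale factor 0.707, 12 px P-Net window) are assumptions for illustration and are not taken from this post.

```python
import math


def build_scales(width, height, min_face_size=20.0,
                 min_detection_size=12, factor=0.707):
    """Return scale factors for the P-Net image pyramid (assumed defaults)."""
    scales = []
    # Scale the image so that the smallest face of interest maps to 12 px.
    m = min_detection_size / min_face_size
    min_length = min(width, height) * m
    count = 0
    # Keep shrinking until the shorter side no longer exceeds the P-Net window.
    while min_length > min_detection_size:
        scales.append(m * factor ** count)
        min_length *= factor
        count += 1
    return scales


def scaled_size(width, height, s):
    """Width and height of the resized image for scale s, rounded up."""
    return math.ceil(width * s), math.ceil(height * s)
```

Each entry of `build_scales(width, height)` yields a different `(sw, sh)`, which is why the network has to be reshaped once per scale.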

# run P-Net on different scales
for i, s in enumerate(scales):
    width, height = image.size
    sw, sh = math.ceil(width*s), math.ceil(height*s)
    img = image.resize((sw, sh), Image.BILINEAR)
    img = np.asarray(img, 'float32')
    pnet.reshape({'data': (1, 3, sh, sw)})
    img = _preprocess(img)
    log.info("Loading model to the plugin")
    pnet_exec_net = plugin.load(network=pnet, num_requests=1)
    output = pnet_exec_net.infer({'data': img})
    probs = output['prob1'][0, 1, :, :]
    offsets = (output['conv4-2'])

    boxes = _generate_bboxes(probs, offsets, s, thresholds[0])
    if len(boxes) == 0:
        bounding_boxes.append(None)
    else:
        keep = nms(boxes[:, 0:5], overlap_threshold=0.5)
        bounding_boxes.append(boxes[keep])
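
One possible improvement, if the frame size stays the same between images (for example with video), is to reshape and load each scale only once and reuse it afterwards. A minimal sketch, assuming a hypothetical `load_fn` that wraps the `pnet.reshape(...)` and `plugin.load(...)` calls from the loop above; I have not verified how this behaves on every device plugin.

```python
class ShapeCache:
    """Keep one loaded network per input shape so each shape is loaded once."""

    def __init__(self, load_fn):
        # load_fn is a hypothetical callable: (height, width) -> loaded network.
        self._load_fn = load_fn
        self._cache = {}

    def get(self, height, width):
        """Return the loaded network for this shape, loading it on first use."""
        key = (height, width)
        if key not in self._cache:
            self._cache[key] = self._load_fn(height, width)
        return self._cache[key]
```

With this, the loop would call `cache.get(sh, sw)` instead of reshaping and loading on every iteration. It does not help when every image has a different size.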

I spent several hours trying to get the same results (or at least acceptable performance) from the converted MTCNN, but no luck! I would really appreciate any help with this.

Shubha_R_Intel · ‎06-18-2019

Dear Abdollahi Aghdam, Omid,

If you are scaling your input images before training, then you must also add the scale to your Model Optimizer command. Please see my detailed response to this forum post.

Thanks,

Shubha

hopeai · ‎06-18-2019

Dear Shubha R.,

Just to mention, I am pre-processing the images before feeding them to the network. Should I add the scale to the Model Optimizer command despite this?

Kind regards,

Omid

Shubha_R_Intel · ‎06-20-2019

Dear Abdollahi Aghdam, Omid,

Yes, absolutely. If you are pre-processing your images, you should pay close attention not only to --input_shape (the size of the images used during training) but also to the following command-line parameters when you run your Model Optimizer command. Model Optimizer cannot know about any pre-processing you apply to your images before training unless you tell it.

--input_shape INPUT_SHAPE
    Input shape(s) that should be fed to an input node(s)
    of the model. Shape is defined as a comma-separated
    list of integer numbers enclosed in parentheses or
    square brackets, for example [1,3,227,227] or
    (1,227,227,3), where the order of dimensions depends
    on the framework input layout of the model. For
    example, [N,C,H,W] is used for Caffe* models and
    [N,H,W,C] for TensorFlow* models. Model Optimizer
    performs necessary transformations to convert the
    shape to the layout required by Inference Engine
    (N,C,H,W). The shape should not contain undefined
    dimensions (? or -1) and should fit the dimensions
    defined in the input operation of the graph. If there
    are multiple inputs in the model, --input_shape should
    contain definition of shape for each input separated
    by a comma, for example: [1,3,227,227],[2,4] for a
    model with two inputs with 4D and 2D shapes.
    Alternatively, you can specify shapes with the --input
    option.
--scale SCALE, -s SCALE
    All input values coming from original network inputs
    will be divided by this value. When a list of inputs
    is overridden by the --input parameter, this scale is
    not applied for any input that does not match with the
    original input of the model.
--reverse_input_channels
    Switch the input channels order from RGB to BGR (or
    vice versa). Applied to original inputs of the model
    if and only if a number of channels equals 3. Applied
    after application of --mean_values and --scale_values
    options, so numbers in --mean_values and
    --scale_values go in the order of channels used in the
    original model.
--mean_values MEAN_VALUES, -ms MEAN_VALUES
    Mean values to be used for the input image per
    channel. Values to be provided in the (R,G,B) or
    [R,G,B] format. Can be defined for desired input of
    the model, for example: "--mean_values
    data[255,255,255],info[255,255,255]". The exact
    meaning and order of channels depend on how the
    original model was trained.
--scale_values SCALE_VALUES
    Scale values to be used for the input image per
    channel. Values are provided in the (R,G,B) or [R,G,B]
    format. Can be defined for desired input of the model,
    for example: "--scale_values
    data[255,255,255],info[255,255,255]". The exact
    meaning and order of channels depend on how the
    original model was trained.
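
To illustrate why these options interact with pre-processing done in your own code, here is a minimal sketch. It assumes the Python-side pre-processing is (x - 127.5) * 0.0078125, which is common for MTCNN ports but is not confirmed in this thread; both function names are hypothetical. The mean is subtracted first and the result is then divided by the scale.

```python
def ir_normalize(pixel, mean_values, scale_values):
    """Per-channel (x - mean) / scale, as baked into the IR by
    --mean_values and --scale_values."""
    return [(x - m) / s for x, m, s in zip(pixel, mean_values, scale_values)]


def python_normalize(pixel):
    """Assumed application-side pre-processing: (x - 127.5) * 0.0078125."""
    return [(x - 127.5) * 0.0078125 for x in pixel]


pixel = [255.0, 127.5, 0.0]
mean = [127.5, 127.5, 127.5]
scale = [128.0, 128.0, 128.0]

# Normalizing only inside the IR matches normalizing only in Python.
baked_in = ir_normalize(pixel, mean, scale)

# Doing both normalizes twice and feeds the network different values.
normalized_twice = ir_normalize(python_normalize(pixel), mean, scale)
```

Under these assumptions the normalization should live in exactly one place: either in the application code or in the Model Optimizer command, not both.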

Thanks,

Shubha

hopeai · ‎06-20-2019

Dear Shubha R.,

I went through all the possible options and did the conversion using the parameters listed in list_topologies.yml; however, I did not get the same results as the original models. Finally, I managed to get the same results using the MXNet models (https://github.com/YYuanAnyVision/mxnet_mtcnn_face_detection) with --reverse_input_channels added. I think there might be a problem in the Caffe Model Optimizer.
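
For anyone reading along, this is conceptually what --reverse_input_channels does to the input. It is only a sketch on a nested-list image in height x width x channel order with a hypothetical helper name; the real transformation is inserted into the IR by Model Optimizer.

```python
def reverse_channels(image_hwc):
    """Reverse the channel order of every pixel (RGB -> BGR or BGR -> RGB)."""
    return [[list(reversed(pixel)) for pixel in row] for row in image_hwc]


# A 1 x 2 image with three channels per pixel.
image = [[[1, 2, 3], [4, 5, 6]]]
swapped = reverse_channels(image)
```

If the application already swaps channels itself, adding the flag would swap them a second time, so only one of the two should be active.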

Kind regards,

Omid

Shubha_R_Intel · ‎06-21-2019

Dearest Abdollahi Aghdam, Omid,

I totally believe you. Can you attach the *.xml files for both the Caffe and the MXNet conversions as a zip file? Also, please give me the exact mo command you used for each. Finally, if you can also put the models in that zip file, that would be great. You may have to attach two different zip files. If you feel uncomfortable attaching the zip files on a public forum, please let me know here and I can PM you so that you can send them privately. If the Caffe MO is messing up while the MXNet MO handles a similar model correctly, this definitely could be a bug.

Thanks for your patience!

Shubha

hopeai · ‎06-21-2019

Dear Shubha R.,

You will find the Caffe and MXNet models I used for conversion at the following links:

Caffe models: https://github.com/TropComplique/mtcnn-pytorch/tree/master/caffe_models

MXNet models: https://github.com/YYuanAnyVision/mxnet_mtcnn_face_detection/tree/master/model

For the Caffe models I used the following commands:

python3 mo.py --input_model path/to/PNet/det1.caffemodel --model_name det1 --output_dir path/to/output_dir

python3 mo.py --input_model path/to/RNet/det2.caffemodel --model_name det2 --output_dir path/to/output_dir

python3 mo.py --input_model path/to/ONet/det3.caffemodel --model_name det3 --output_dir path/to/output_dir

I also tried the Caffe conversions with --reverse_input_channels.

Finally, I converted the MXNet models using the following commands:

python3 mo.py \
    --input_model path/to/det1-0001.params \
    --input_symbol path/to/det1-symbol.json \
    --reverse_input_channels

python3 mo.py \
    --input_model path/to/det2-0001.params \
    --input_symbol path/to/det2-symbol.json \
    --reverse_input_channels

python3 mo.py \
    --input_model path/to/det3-0001.params \
    --input_symbol path/to/det3-symbol.json \
    --reverse_input_channels
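
The three MXNet commands differ only in the stage name, so they can also be generated from a small script. A minimal sketch with a hypothetical helper; it uses only the flags shown above and keeps the placeholder path `path/to` as it is.

```python
def build_mo_command(stage, model_dir="path/to", reverse_input_channels=True):
    """Build the mo.py argument list for one MTCNN stage (det1, det2, det3)."""
    cmd = [
        "python3", "mo.py",
        "--input_model", f"{model_dir}/{stage}-0001.params",
        "--input_symbol", f"{model_dir}/{stage}-symbol.json",
    ]
    if reverse_input_channels:
        cmd.append("--reverse_input_channels")
    return cmd


# One command per stage: P-Net, R-Net, O-Net.
commands = [build_mo_command(stage) for stage in ("det1", "det2", "det3")]
```

Each entry can be passed to `subprocess.run(cmd, check=True)` from the Model Optimizer directory.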

The *.xml files are available at the following links:

https://www.dropbox.com/s/7ybc4tlfsw4j616/converted_from_caffe.zip

https://www.dropbox.com/s/pnsatt98r7o57zz/converted_from_mxnet.zip

Best regards,

Omid

Shubha_R_Intel · ‎06-21-2019

Dearest Abdollahi Aghdam, Omid,

Wonderful. Thanks! I promise to take a look.

Shubha

Shubha_R_Intel · ‎06-27-2019

Dearest Abdollahi Aghdam, Omid,

Unfortunately, I don't use Dropbox. Can you add them as a *.zip attachment? Or, if you'd prefer, I can PM you and you can send them to me privately.

Thanks,

Shubha

hopeai · ‎06-27-2019

Dearest Shubha R.,

Sure, find the zip files attached.

Best regards,

Omid

Shubha_R_Intel · ‎06-27-2019

Dear Abdollahi Aghdam, Omid

Please look at your deployment_tools\tools\model_downloader\list_topologies.yml file and search for mtcnn. You need to pass specific --input_shape parameters to your Model Optimizer command, as this forum poster did. Scroll about halfway down the page and you'll see what I mean. Your mo_caffe.py commands look different from his: he adds --input_shape to match list_topologies.yml.

Please try a similar approach and let me know how it works out.

Thanks,

Shubha

hopeai · ‎06-27-2019

Dear Shubha R.,

I had tried all the possible combinations of parameters (--input_shape, --scale, etc.) that might have caused the problem. However, I was only able to achieve the same performance using the MXNet pretrained models, not the Caffe models.

Best regards,

Omid

hua__wei · ‎06-28-2019

Do you know how to convert a model with more than 3 channels? I am using L-Net from MTCNN and converted it with the command below, but the result is different from the original MXNet model (the input of L-Net is [1,15,24,24]):

python3 mo_mxnet.py --input_model det4-0001.params --mean_values [127.5,127.5,127.5,127.5,127.5,127.5,127.5,127.5,127.5,127.5,127.5,127.5,127.5,127.5,127.5] --scale_values [128,128,128,128,128,128,128,128,128,128,128,128,128,128,128] --input_shape [1,15,24,24] --input_symbol det4-symbol.json
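
The long per-channel lists in this command are easy to mistype, so here is a small sketch that generates them for any channel count. The helper name is hypothetical, and it only reproduces the strings used above; whether Model Optimizer handles 15-channel mean and scale lists as expected is exactly what I am unsure about.

```python
def per_channel_arg(value, channels):
    """Format one value repeated per channel, e.g. [128,128,128]."""
    return "[" + ",".join([str(value)] * channels) + "]"


# Strings for the 15-channel L-Net input.
mean_arg = per_channel_arg(127.5, 15)
scale_arg = per_channel_arg(128, 15)
```

These strings correspond to the values passed to --mean_values and --scale_values in the command above.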

Shubha_R_Intel · ‎06-28-2019

Dear Abdollahi Aghdam, Omid,

I understand. I will look into your files. Thanks for providing them.

Dear hua, wei,

What do you mean by more than 3 channels? The 3 channels are R, G, B (color channels). Do you mean more than one input? The layout which the OpenVino Inference Engine expects is NCHW, so the size of C is generally expected to be 3 for BGR or RGB input. Can you clarify what you mean?

Thanks kindly,

Shubha