I used the following commands to convert the mtcnn caffe models into IR:
python3 mo.py --input_model path/to/PNet/det1.caffemodel --model_name det1 --output_dir path/to/output_dir
python3 mo.py --input_model path/to/RNet/det2.caffemodel --model_name det2 --output_dir path/to/output_dir
python3 mo.py --input_model path/to/ONet/det3.caffemodel --model_name det3 --output_dir path/to/output_dir
P-Net takes the image at several scales and proposes bounding boxes with probabilities; these are refined and passed to R-Net, and R-Net's output is passed to O-Net. As far as I know, OpenVINO does not support variable input sizes, so I use a for loop that reshapes the input for each scale. First, I would like to know whether there is a better way to do this. Second, I am not sure why detection performance degrades significantly compared to the original repo, mtcnn-pytorch. I double-checked the weights of the converted model against the original ones and there is no difference, yet I keep getting different results in the first stage, which affects the second and third stages and, in the end, detection performance.
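Since the IR has a fixed input shape, the shapes P-Net needs are fully determined by the image size. Below is a sketch that computes the pyramid scales and the NCHW shape for each one, assuming mtcnn-pytorch-style defaults (min_face_size=20, factor=0.707, 12x12 minimum detection size); the function names are hypothetical.

```python
import math


def pyramid_scales(width, height, min_face_size=20.0, factor=0.707, min_detection_size=12):
    """Scales at which P-Net is run (assumed mtcnn-pytorch-style defaults)."""
    # Rescale so the smallest face we want to find maps onto P-Net's 12x12 window.
    m = min_detection_size / min_face_size
    min_length = min(width, height) * m
    scales = []
    factor_count = 0
    # Keep shrinking by `factor` until the short side is 12 px or less.
    while min_length > min_detection_size:
        scales.append(m * factor ** factor_count)
        min_length *= factor
        factor_count += 1
    return scales


def pnet_input_shapes(width, height, scales):
    """NCHW input shape P-Net needs at each scale (the value given to reshape())."""
    return [(1, 3, math.ceil(height * s), math.ceil(width * s)) for s in scales]
```

For a fixed camera resolution this list never changes, so each shape only needs to be reshaped and loaded once; keeping the loaded networks in a dict keyed by shape may be worth trying (untested) instead of calling plugin.load for every frame.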
import math

import numpy as np
from PIL import Image

# run P-Net on different scales
for i, s in enumerate(scales):
    width, height = image.size
    sw, sh = math.ceil(width * s), math.ceil(height * s)
    img = image.resize((sw, sh), Image.BILINEAR)
    img = np.asarray(img, 'float32')

    # the IR has a fixed input shape: reshape the network for this scale
    # and load it onto the device again before running inference
    pnet.reshape({'data': (1, 3, sh, sw)})
    img = _preprocess(img)
    log.info("Loading model to the plugin")
    pnet_exec_net = plugin.load(network=pnet, num_requests=1)

    output = pnet_exec_net.infer({'data': img})
    probs = output['prob1'][0, 1, :, :]   # face probability map
    offsets = output['conv4-2']           # bounding-box regression offsets
    boxes = _generate_bboxes(probs, offsets, s, thresholds[0])
    if len(boxes) == 0:
        bounding_boxes.append(None)
    else:
        keep = nms(boxes[:, 0:5], overlap_threshold=0.5)
        bounding_boxes.append(boxes[keep])
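_preprocess is not shown above. Assuming it is the one from mtcnn-pytorch, it does roughly the following (preprocess here is a hypothetical stand-in). Any mismatch in layout or normalization between this and what the IR expects shows up first in P-Net's output, so it is worth checking.

```python
import numpy as np


def preprocess(img_hwc):
    """Turn an HxWx3 image into a normalized 1x3xHxW float32 array.

    Assumed to match mtcnn-pytorch's _preprocess: (x - 127.5) * 0.0078125.
    """
    img = np.asarray(img_hwc, dtype=np.float32)
    img = img.transpose((2, 0, 1))       # HWC -> CHW
    img = np.expand_dims(img, 0)         # CHW -> NCHW with a batch of 1
    return (img - 127.5) * 0.0078125     # maps [0, 255] to about [-1, 1]
```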
I have spent several hours trying to get the same results (or at least acceptable performance) from the converted MTCNN, with no luck. I would really appreciate any help with this.
Dear Abdollahi Aghdam, Omid,
If you are scaling your input images before training, then you must also add the scale to your Model Optimizer command. Please see my detailed response to this forum post.
Dear Shubha R.,
Just to mention, I am pre-processing the images before feeding them to the network. Should I still add the scale to the Model Optimizer command?
Kind regards,
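Assuming the pre-processing is the mtcnn-pytorch one, (x - 127.5) * 0.0078125, note that 0.0078125 is exactly 1/128. Model Optimizer's --mean_values/--scale_values compute (x - mean) / scale per channel, so the same arithmetic could live either in Python or in the IR (e.g. mean 127.5 and scale 128 per channel), but not in both. The sketch checks both cases; the function names are hypothetical.

```python
import numpy as np


def python_side(x):
    # normalization done in Python before inference (mtcnn-pytorch style, assumed)
    return (x - 127.5) * 0.0078125


def mo_side(x, mean=(127.5, 127.5, 127.5), scale=(128.0, 128.0, 128.0)):
    # what --mean_values / --scale_values put into the IR: (x - mean) / scale per channel
    mean = np.asarray(mean, dtype=np.float32).reshape(1, -1, 1, 1)
    scale = np.asarray(scale, dtype=np.float32).reshape(1, -1, 1, 1)
    return (x - mean) / scale


x = np.random.default_rng(0).uniform(0, 255, size=(1, 3, 8, 8)).astype(np.float32)
same = np.allclose(python_side(x), mo_side(x))                 # one place only: identical
double = np.allclose(python_side(x), mo_side(python_side(x)))  # both places: normalized twice
print(same, double)
```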
Dear Abdollahi Aghdam, Omid,
Yes, absolutely. If you are pre-processing your images, pay close attention not only to --input_shape (the size of the image used during training) but also to the following command-line parameters when you run Model Optimizer. Model Optimizer cannot know about any pre-processing you applied to the images during training unless you tell it.
--input_shape INPUT_SHAPE
Input shape(s) that should be fed to an input node(s)
of the model. Shape is defined as a comma-separated
list of integer numbers enclosed in parentheses or
square brackets, for example [1,3,227,227] or
(1,227,227,3), where the order of dimensions depends
on the framework input layout of the model. For
example, [N,C,H,W] is used for Caffe* models and
[N,H,W,C] for TensorFlow* models. Model Optimizer
performs necessary transformations to convert the
shape to the layout required by Inference Engine
(N,C,H,W). The shape should not contain undefined
dimensions (? or -1) and should fit the dimensions
defined in the input operation of the graph. If there
are multiple inputs in the model, --input_shape should
contain definition of shape for each input separated
by a comma, for example: [1,3,227,227],[2,4] for a
model with two inputs with 4D and 2D shapes.
Alternatively, you can specify shapes with the --input
--scale SCALE, -s SCALE
All input values coming from original network inputs
will be divided by this value. When a list of inputs
is overridden by the --input parameter, this scale is
not applied for any input that does not match with the
original input of the model.
--reverse_input_channels
Switch the input channels order from RGB to BGR (or
vice versa). Applied to original inputs of the model
if and only if a number of channels equals 3. Applied
after application of --mean_values and --scale_values
options, so numbers in --mean_values and
--scale_values go in the order of channels used in the
original model.
--mean_values MEAN_VALUES, -ms MEAN_VALUES
Mean values to be used for the input image per
channel. Values to be provided in the (R,G,B) or
[R,G,B] format. Can be defined for desired input of
the model, for example: "--mean_values
data[255,255,255],info[255,255,255]". The exact
meaning and order of channels depend on how the
original model was trained.
--scale_values SCALE_VALUES
Scale values to be used for the input image per
channel. Values are provided in the (R,G,B) or [R,G,B]
format. Can be defined for desired input of the model,
for example: "--scale_values
data[255,255,255],info[255,255,255]". The exact
meaning and order of channels depend on how the
original model was trained.
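My reading of the help text above, as a NumPy sketch: with --reverse_input_channels the IR accepts the opposite channel order and flips it back before the mean/scale values are applied, and those values stay in the original model's channel order; the flip only happens when there are exactly 3 channels. ir_input_transform is a hypothetical name, and this illustrates the documented behavior rather than Model Optimizer's implementation.

```python
import numpy as np


def ir_input_transform(x_nchw, mean, scale, reverse_input_channels=False):
    """Simulate the input normalization an IR applies (illustrative sketch).

    Assumption: with reverse_input_channels the caller feeds the opposite channel
    order, which is flipped back to the original model's order first; mean and
    scale are given in the original model's channel order.
    """
    x = np.asarray(x_nchw, dtype=np.float32)
    if reverse_input_channels and x.shape[1] == 3:  # only applied to 3-channel inputs
        x = x[:, ::-1, :, :]
    mean = np.asarray(mean, dtype=np.float32).reshape(1, -1, 1, 1)
    scale = np.asarray(scale, dtype=np.float32).reshape(1, -1, 1, 1)
    return (x - mean) / scale
```

For example, a model trained on BGR input keeps its BGR mean values, and after conversion with --reverse_input_channels it can be fed RGB data directly.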
Dear Shubha R.,
I went through all the possible options and did the conversion using the parameters listed in list_topologies.yml, but I did not get the same results as the original models. Finally, I managed to get the same results using the MXNet models (https://github.com/YYuanAnyVision/mxnet_mtcnn_face_detection) converted with --reverse_input_channels. I think there might be a problem in the Caffe Model Optimizer.
Kind regards,
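To pin down where the Caffe and MXNet conversions diverge, one option is to feed the same pre-processed array to the reference framework and to the IR, then compare the raw outputs of each stage before any box generation or NMS. A helper for that, assuming both sides return dicts of NumPy arrays keyed by output name (e.g. prob1 and conv4-2 for P-Net, as in the code in the first post); compare_outputs and the tolerance are hypothetical choices.

```python
import numpy as np


def compare_outputs(reference, converted, atol=1e-4):
    """Compare output blobs (name -> array) from two runs on the same input.

    Returns {name: (max_abs_diff, within_tolerance)}; a shape mismatch is
    reported as (inf, False).
    """
    report = {}
    for name, ref in reference.items():
        ref = np.asarray(ref, dtype=np.float64)
        out = np.asarray(converted[name], dtype=np.float64)
        if out.shape != ref.shape:
            report[name] = (float('inf'), False)
            continue
        diff = float(np.max(np.abs(out - ref))) if ref.size else 0.0
        report[name] = (diff, bool(np.allclose(out, ref, atol=atol)))
    return report
```

If P-Net's raw outputs already differ on identical input, the cause lies in the network or its input handling (layout, channel order, normalization), not in the later box generation or NMS.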
Dearest Abdollahi Aghdam, Omid,
I believe you. Can you attach the *.xml files for both the Caffe and the MXNet conversions as a zip file? Also, please give me the exact mo command you used for each. Finally, if you can also put the models in that zip file, that would be great; you may have to attach two separate zip files. If you are uncomfortable attaching them on a public forum, let me know here and I can PM you so that you can send them privately. If the Caffe MO gets this wrong while MXNet handles a similar model correctly, it could definitely be a bug.
Thanks for your patience!
Dear Shubha R.,
You will find the Caffe and MXNet models I used for conversion at the following links:
Caffe models: https://github.com/TropComplique/mtcnn-pytorch/tree/master/caffe_models
Mxnet models: https://github.com/YYuanAnyVision/mxnet_mtcnn_face_detection/tree/master/model
For the Caffe models I used the following commands:
python3 mo.py --input_model path/to/PNet/det1.caffemodel --model_name det1 --output_dir path/to/output_dir
python3 mo.py --input_model path/to/RNet/det2.caffemodel --model_name det2 --output_dir path/to/output_dir
python3 mo.py --input_model path/to/ONet/det3.caffemodel --model_name det3 --output_dir path/to/output_dir
I also tried adding --reverse_input_channels.
Finally, I converted the MXNet models using the following commands:
python3 mo.py --input_model path/to/det1-0001.params \
    --input_symbol path/to/det1-symbol.json \
    --reverse_input_channels
python3 mo.py --input_model path/to/det2-0001.params \
    --input_symbol path/to/det2-symbol.json \
    --reverse_input_channels
python3 mo.py --input_model path/to/det3-0001.params \
    --input_symbol path/to/det3-symbol.json \
    --reverse_input_channels
The *.xml files are available at the following link:
Best regards,
Dearest Abdollahi Aghdam, Omid,
Wonderful, thanks! I promise to take a look.
Dearest Abdollahi Aghdam, Omid,
Unfortunately, I don't use Dropbox. Can you add them as a *.zip attachment? Or, if you prefer, I can PM you and you can send them to me privately.
Dear Abdollahi Aghdam, Omid,
Please look at your deployment_tools\tools\model_downloader\list_topologies.yml file and search it for mtcnn. You need to pass specific --input_shape parameters to your Model Optimizer command, as this forum poster did; scroll about halfway down the page and you'll see. Your Model Optimizer commands look different from his: he adds --input_shape to match list_topologies.yml.
Please try a similar approach and let me know how it works out.
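A sketch of building the three Caffe conversion commands with an explicit --input_shape. The paths are the placeholders used in this thread; 24x24 and 48x48 are the standard MTCNN R-Net/O-Net input sizes, while the P-Net shape is only an example for one pyramid level (the values from list_topologies.yml are not reproduced here). MODELS, mo_command and all_commands are hypothetical names, and only flags already shown in this thread are used.

```python
import shlex

# Hypothetical layout: paths follow the placeholders used in this thread.
MODELS = {
    "det1": ("path/to/PNet/det1.caffemodel", None),            # P-Net: shape depends on the scale
    "det2": ("path/to/RNet/det2.caffemodel", (1, 3, 24, 24)),  # R-Net takes 24x24 crops
    "det3": ("path/to/ONet/det3.caffemodel", (1, 3, 48, 48)),  # O-Net takes 48x48 crops
}


def mo_command(name, model_path, input_shape, output_dir="path/to/output_dir"):
    """Build an mo.py argument list with an explicit --input_shape (NCHW)."""
    shape = "[" + ",".join(str(d) for d in input_shape) + "]"
    return ["python3", "mo.py",
            "--input_model", model_path,
            "--input_shape", shape,
            "--model_name", name,
            "--output_dir", output_dir]


def all_commands(pnet_shape):
    """One command per stage; P-Net uses the caller-supplied shape."""
    return [mo_command(name, path, shape if shape is not None else pnet_shape)
            for name, (path, shape) in MODELS.items()]


for cmd in all_commands(pnet_shape=(1, 3, 60, 80)):
    print(shlex.join(cmd))  # shell-quoted, ready to paste into a terminal
```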
Dear Shubha R.,
I have tried all the combinations of options (--input_shape, --scale, etc.) that might have caused the problem; however, I was only able to reproduce the original performance with the MXNet pretrained models, not with the Caffe models.
Best regards,
Do you know how to convert a model with more than 3 input channels? I use LNet from MTCNN (its input is [1,15,24,24]) and converted it with the command below, but the results differ from the original MXNet model:
python3 mo_mxnet.py --input_model det4-0001.params --mean_values [127.5,127.5,127.5,127.5,127.5,127.5,127.5,127.5,127.5,127.5,127.5,127.5,127.5,127.5,127.5] --scale_values [128,128,128,128,128,128,128,128,128,128,128,128,128,128,128] --input_shape [1,15,24,24] --input_symbol det4-symbol.json
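Assuming LNet's input is built like the extended (landmark) stage in the mxnet_mtcnn_face_detection repo, i.e. five 24x24 crops around the landmarks concatenated along C, a sketch of assembling it is below. The help text quoted earlier says --reverse_input_channels only applies when there are exactly 3 channels, so for a 15-channel input any RGB/BGR swap has to be done by hand; the second helper does it per 3-channel patch. Both function names are hypothetical. Also, the command above already embeds mean 127.5 and scale 128, so if that IR is used, the normalization line in lnet_input should be dropped to avoid normalizing twice.

```python
import numpy as np


def lnet_input(patches):
    """Stack five 24x24x3 patches (one per landmark) into a 1x15x24x24 array."""
    if len(patches) != 5:
        raise ValueError("expected 5 patches, one per landmark")
    chw = [np.asarray(p, dtype=np.float32).transpose((2, 0, 1)) for p in patches]  # HWC -> CHW
    x = np.concatenate(chw, axis=0)[np.newaxis]  # (1, 15, 24, 24)
    return (x - 127.5) / 128.0                   # drop if the IR already normalizes


def swap_rgb_bgr_per_patch(x):
    """Reverse channel order inside each 3-channel group of an NCHW array."""
    n, c, h, w = x.shape
    return x.reshape(n, c // 3, 3, h, w)[:, :, ::-1].reshape(n, c, h, w)
```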
(Quoting the earlier reply by Shubha R. of Intel on --input_shape, --scale, --reverse_input_channels, --mean_values and --scale_values.)
Dear Abdollahi Aghdam, Omid,
"I have tried all the combinations of options (--input_shape, --scale, etc.) that might have caused the problem; however, I was only able to reproduce the original performance with the MXNet pretrained models, not with the Caffe models."
I understand. I will look into your files. Thanks for providing them.
Dear hua, wei,
What do you mean by more than 3 channels? The 3 channels are R, G and B (the color channels). Do you mean more than one input? The OpenVINO Inference Engine expects the NCHW layout, so the size of C is generally expected to be 3 for BGR or RGB input. Can you clarify what you mean?
Thanks kindly,