Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Huggingface Transformer Conversion Instructions

sbsky
Beginner

In the model zoo I see that there are BERT transformer models successfully converted from the Huggingface transformer library to OpenVINO:

https://docs.openvinotoolkit.org/2021.1/omz_models_intel_bert_small_uncased_whole_word_masking_squad_0001_description_bert_small_uncased_whole_word_masking_squad_0001.html

Unfortunately I can't find any instructions or documentation on how this conversion from the Huggingface transformer library to OpenVINO was performed. Where can I find the instructions?

IntelSupport
Moderator

Hi Pieter,

 

Thank you for reaching out to us. The 'bert-small-uncased-whole-word-masking-squad-0001' model is an Intel pre-trained model that is available for download. To download the model in IR format, please run the following command:

 

python downloader.py --name bert-small-uncased-whole-word-masking-squad-0001

 

 

You can also download the model IR files at the following link:

https://download.01.org/opencv/2021/openvinotoolkit/2021.1/open_model_zoo/models_bin/2/bert-small-uncased-whole-word-masking-squad-0001/

 

For more information regarding the model downloader, please refer to the following link:

https://docs.openvinotoolkit.org/latest/omz_tools_downloader_README.html#model_downloader_usage

 

Regards,

Adli

 

sbsky
Beginner

Hi Adli, thanks for your response. My question isn't about using the example model, but rather how it was created, as the example model isn't suitable for my task. I need to train it on my own data.

Was TensorFlow or PyTorch used to create the model, and which version? What commands were used to perform the conversion? Was ONNX used?

IntelSupport
Moderator

Hi Pieter,

 

Since this is an Intel pre-trained model, all the information we have made publicly available for it is presented on the page you have already pointed out: https://docs.openvinotoolkit.org/latest/omz_models_intel_bert_small_uncased_whole_word_masking_squad_0001_description_bert_small_uncased_whole_word_masking_squad_0001.html

 

The original 'bert-large-uncased-whole-word-masking-finetuned-squad' model is taken from the Transformers library: https://github.com/huggingface/transformers

 

The source framework is PyTorch. The model is trained on the 'SQuAD v1.1' dataset, which you can replace with your own dataset. Since there is no direct PyTorch conversion in the OpenVINO toolkit, an intermediate conversion to ONNX is used.


For an example IR conversion command, please refer to the following:

python3 mo.py -m bert_squad_fp32.onnx --input_shape "[1,384],[1,384],[1,384]" --input "0,1,2"

 

Regards,

Adli


sbsky
Beginner

Hi Adli, thanks, I was able to convert the model using your instructions. I had a look at the converted OpenVINO XML graph and saw that Gelu and LayerNorm fusion wasn't performed. It is my understanding that the Model Optimizer should perform these graph fusions automatically.

How do I make use of these fused operators? Are there any special commands I need to give to the Model Optimizer?

IntelSupport
Moderator

Hi Pieter,

 

Thank you for reaching out to us. Optimization offers methods to accelerate inference with convolutional neural networks (CNNs) that do not require model retraining. In the Model Optimizer, this optimization is turned on by default. This optimization method consists of three stages:

  1. BatchNormalization and ScaleShift decomposition.
  2. Linear operations merge.
  3. Linear operations fusion.
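To make the "linear operations" stages concrete, the following pure-Python sketch shows the underlying arithmetic: a BatchNorm-style per-channel scale and shift is folded into the preceding linear layer's weights and bias, so a single fused layer reproduces the two-layer result. This is my own numeric illustration of the idea, not OpenVINO code; all names and values are made up.

```python
import math

def linear(x, w, b):
    # y_i = sum_j w[i][j] * x[j] + b[i]
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    # per-channel scale/shift: gamma * (y - mean) / sqrt(var + eps) + beta
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]

def fold(w, b, gamma, beta, mean, var, eps=1e-5):
    # fold the scale/shift into the weights: w' = s*w, b' = (b - mean)*s + beta
    s = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    w_f = [[wi * si for wi in row] for row, si in zip(w, s)]
    b_f = [(bi - m) * si + bt for bi, m, si, bt in zip(b, mean, s, beta)]
    return w_f, b_f

x = [1.0, -2.0, 0.5]
w = [[0.2, -0.1, 0.4], [0.3, 0.0, -0.5]]
b = [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.05, -0.1]
mean, var = [0.3, -0.4], [0.9, 1.2]

ref = batchnorm(linear(x, w, b), gamma, beta, mean, var)   # two ops
w_f, b_f = fold(w, b, gamma, beta, mean, var)
fused = linear(x, w_f, b_f)                                # one fused op
print(all(abs(a - c) < 1e-9 for a, c in zip(ref, fused)))  # True
```

The fused network computes identical outputs with one fewer runtime operation, which is why no retraining is needed.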

 

For more information regarding optimization description, please refer to the following link:

https://docs.openvinotoolkit.org/2021.1/openvino_docs_MO_DG_prepare_model_Model_Optimization_Techniques.html#optimization_description

 

There are cases where these optimizations might not be applied.

In addition, you can instruct the Model Optimizer to disable fusing for specified nodes via the '--finegrain_fusing' option. For more information, please refer to the following link:

https://docs.openvinotoolkit.org/2021.1/openvino_docs_MO_DG_prepare_model_Model_Optimization_Techniques.html#disable_fusing

 

Regards,

Adli


sbsky
Beginner

Hi Adli, how was Gelu and LayerNorm fusion performed in the reference BERT model? The reference model features these fusions.

IntelSupport
Moderator

Hi Pieter,

 

If possible, could you run the following command on your model:

benchmark_app -m <your_model>.xml -report_type detailed_counters

The 'benchmark_app.exe' binary is located in the 'inference_engine_samples_build\intel64\Release' directory. Please share and post the outcome here. For more information regarding the Benchmark C++ Tool, please refer to the following link: https://docs.openvinotoolkit.org/latest/openvino_inference_engine_samples_benchmark_app_README.html#run_the_tool

 

Regards,

Adli

 

sbsky
Beginner

Hi Adli, I can do you one better. Please find attached an ONNX file that contains Dense, Gelu, Dense, LayerNorm layers.

sbsky
Beginner

Hi Adli, just wondering if there's any update on this?

IntelSupport
Moderator

Hi Pieter,

 

I was able to convert the ONNX model to IR, and there are no Gelu or LayerNorm operations left in the IR. This has been checked through Netron as well as with 'benchmark_app -m <your_model>.xml -report_type detailed_counters'.

 

Please check the attached model and verify whether it resolves the issue.

 

Regards,

Adli

 

sbsky
Beginner

Hi Adli, my whole point, and the reason I am asking for assistance, is that there should be Gelu and LayerNorm in the converted IR. ONNX doesn't have Gelu or LayerNorm operations, so it expresses them as a number of other operations, such as Erf. This is what you are seeing in the converted IR.

OpenVINO should recognise these patterns and substitute in the Gelu and LayerNorm operations, but it doesn't. My question is: how do I make this happen?
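For reference, the decompositions in question can be checked numerically in pure Python: gelu(x) = 0.5·x·(1 + erf(x/√2)) is exactly what the Div→Erf→Add→Mul subgraph in the exported ONNX computes, and LayerNorm is likewise spelled out with ReduceMean/Sub/Pow-style primitives. This sketch only illustrates the patterns involved; it is not OpenVINO's pattern-matching code, and the function names are mine.

```python
import math

def gelu_via_erf(x):
    # the Div -> Erf -> Add -> Mul -> Mul chain seen in the exported graph
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def layernorm_via_primitives(xs, gamma, beta, eps=1e-5):
    mean = sum(xs) / len(xs)                          # ReduceMean
    var = sum((x - mean) ** 2 for x in xs) / len(xs)  # Sub + Pow + ReduceMean
    inv = 1.0 / math.sqrt(var + eps)                  # Add + Sqrt + Div
    return [g * (x - mean) * inv + b for x, g, b in zip(xs, gamma, beta)]

print(round(gelu_via_erf(1.0), 6))  # 0.841345
out = layernorm_via_primitives([1.0, 2.0, 3.0], [1.0] * 3, [0.0] * 3)
print([round(v, 4) for v in out])   # [-1.2247, 0.0, 1.2247]
```

A fusion pass has to spot this multi-node subgraph and replace it with a single Gelu or LayerNorm (here, MVN-style) operation; if the exported pattern differs even slightly, the match fails and the primitives are left in the IR.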

I know that this is possible because the BERT examples in the OpenVINO zoo have successfully replaced these operators (you can check in Netron).

Can you please escalate this with the team that made the BERT example in the OpenVINO model zoo, or advise how I get in contact with them?

Without the fused operators, the model's performance is severely impacted and is no faster than PyTorch.

Adli
Moderator

Hi Pieter,

 

This case has been escalated to the developer team, and their recommendation is to take a look at the set of scripts from training_extensions used to fine-tune this BERT model. Please refer to the following link:

https://github.com/openvinotoolkit/training_extensions/tree/develop/pytorch_toolkit/question_answering

 

These scripts produce the ONNX model automatically. After that, you only need to run the Model Optimizer to get the IR model.

 

Regards,

Adli

 

Adli
Moderator

Hi Pieter,


This thread will no longer be monitored since we have provided a solution. If you need any additional information from Intel, please submit a new question.


Regards,

Adli


sbsky
Beginner

Hi Adli, after a bit of tinkering based on what's in those scripts, I got it to work. Thanks for your assistance!
