I have a BERT model, and inference on CPU is slow. Does OpenVINO support BERT inference and accelerate it? My machine reports 40 logical CPUs of Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz. Thanks a lot!
Hi Hap Zhang,
Thanks for reaching out to us.
Intel® Xeon® Silver 4114 Processor (13.75M Cache, 2.20 GHz) is supported by OpenVINO, since it’s a 1st generation Intel® Xeon® Scalable processor. The system requirements for Intel® Distribution of OpenVINO™ toolkit are available at System Requirements.
Performance-wise, the bert-large-uncased-whole-word-masking-squad-int8-0001 model (quantized to INT8 precision via quantization-aware fine-tuning with NNCF) achieved 16.71 FPS on a comparable Intel® Xeon® Silver 4216R CPU @ 2.20GHz, as shown on the benchmark page.
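To measure throughput on your own machine, you can use the benchmark_app tool that ships with the OpenVINO development tools. A minimal sketch, assuming the INT8 BERT IR mentioned above has already been downloaded (for example with omz_downloader) into the current directory:

```shell
# Measure BERT throughput on CPU with OpenVINO's benchmark_app.
# The model path is a placeholder -- point it at your downloaded IR files.
benchmark_app \
    -m bert-large-uncased-whole-word-masking-squad-int8-0001.xml \
    -d CPU \
    -hint throughput
```

The reported FPS on your Xeon Silver 4114 may differ from the published 4216R number, since core count and memory bandwidth differ between the two parts.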
For your information, OpenVINO supports the following Bidirectional Encoder Representations from Transformers (BERT) models, as mentioned here:
· BERT-Base, Cased
· BERT-Base, Uncased
· BERT-Base, Multilingual Cased
· BERT-Base, Multilingual Uncased
· BERT-Base, Chinese
· BERT-Large, Cased
· BERT-Large, Uncased
OpenVINO also provides seven Intel pre-trained BERT models and one public pre-trained model.
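Once you have an IR (.xml/.bin) for one of these models, running it on CPU from Python is straightforward. A hedged sketch, assuming the OpenVINO Runtime Python package is installed (`pip install openvino`) and that your model exposes the usual BERT inputs (`input_ids`, `attention_mask`, `token_type_ids`); the model filename and input names are placeholders, so check them against your own IR:

```python
# Sketch: CPU inference of a BERT IR with the OpenVINO Runtime.
# The model path and input names below are assumptions -- verify them
# against your converted model (e.g. via model.inputs).
import numpy as np


def make_dummy_inputs(seq_len=384, batch=1):
    """Build zero-filled BERT inputs just to exercise the network."""
    shape = (batch, seq_len)
    return {
        "input_ids": np.zeros(shape, dtype=np.int64),
        "attention_mask": np.zeros(shape, dtype=np.int64),
        "token_type_ids": np.zeros(shape, dtype=np.int64),
    }


def run_bert(model_xml="bert-large-uncased-whole-word-masking-squad-int8-0001.xml"):
    """Load the IR, compile it for CPU, and run one inference."""
    from openvino.runtime import Core  # requires the openvino package

    core = Core()
    compiled = core.compile_model(core.read_model(model_xml), device_name="CPU")
    # CompiledModel is callable and returns a dict-like result per output.
    return compiled(make_dummy_inputs())
```

In a real question-answering pipeline you would replace the dummy tensors with token IDs produced by the matching BERT tokenizer.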
Hi Hap Zhang,
This thread will no longer be monitored since we have provided a solution. If you need any additional information from Intel, please submit a new question.