We successfully optimized a BERT model for neural machine translation, but there is no sample or reference showing how to run inference on the model with OpenVINO.
Can anyone point me to documentation or a reference on how to write an inference script for the model?
Try running benchmark_app on the IR files, and use its source code as a reference for writing your own inference script. Kindly refer to this thread.
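For anyone looking for a starting point, here is a minimal sketch of what such a script might look like in Python. It assumes the legacy `openvino.inference_engine` Python API (`IECore`, available in 2020/2021-era releases), that the `.bin` weights file sits next to the `.xml` file, and that the model accepts `int32` token inputs; the helper names `ir_pair` and `run_once` are made up for illustration, and the zero-filled inputs are placeholders for real tokenized data.

```python
from pathlib import Path


def ir_pair(xml_path):
    """Return (xml, bin) paths, assuming the weights file sits next to the .xml."""
    xml = Path(xml_path)
    return str(xml), str(xml.with_suffix(".bin"))


def run_once(xml_path, device="CPU"):
    """Load the IR and run one synchronous inference on dummy inputs."""
    # Imported here so the path helper above works without OpenVINO installed.
    import numpy as np
    from openvino.inference_engine import IECore

    model_xml, model_bin = ir_pair(xml_path)
    ie = IECore()
    net = ie.read_network(model=model_xml, weights=model_bin)
    exec_net = ie.load_network(network=net, device_name=device)

    # Assumption: every input takes int32 token data. Replace the zeros with
    # real tokenized inputs (ids, mask, segment ids) for your model.
    feed = {
        name: np.zeros(info.input_data.shape, dtype=np.int32)
        for name, info in net.input_info.items()
    }
    # infer() returns a dict mapping output names to numpy arrays.
    return exec_net.infer(inputs=feed)


# Example call (needs OpenVINO and the IR files on disk):
# outputs = run_once("bert_model.ckpt.xml")
# for name, value in outputs.items():
#     print(name, value.shape)
```

Check the input names, shapes, and precisions reported by `net.input_info` against what your tokenizer produces before trusting the outputs.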
Hemanth Kumar G. (Intel) wrote:
Try using benchmark_app to experiment with core utilization. I ran the following command on a machine with 2 sockets and 18 cores per socket, which gives 72 logical processors with hyper-threading; all of them were utilized at 100% at the peak of model loading.
numactl -l ~/inference_engine_samples_build/intel64/Release/benchmark_app -i bert_input.bin -m bert_model.ckpt.xml -niter 100 -nthreads 72 -nstreams 72 -nireq 72
Count: 144 iterations
Duration: 1921.27 ms
Latency: 833.773 ms
Throughput: 74.9504 FPS
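As a sanity check on these numbers, the reported throughput is just the iteration count divided by the total duration. The sketch below reproduces it. It also shows one likely reason the count is 144 rather than the requested 100: as far as I know, benchmark_app in asynchronous mode rounds the iteration count up to a multiple of the number of infer requests (72 here), but treat that as an assumption. The function names are made up for illustration.

```python
import math


def throughput_fps(count, duration_ms):
    """Iterations completed per second over the whole run."""
    return count / (duration_ms / 1000.0)


def aligned_iterations(niter, nireq):
    """Round niter up to the next multiple of nireq (assumed benchmark_app behavior)."""
    return math.ceil(niter / nireq) * nireq


iterations = aligned_iterations(100, 72)
fps = throughput_fps(iterations, 1921.27)
print(iterations, f"{fps:.2f} FPS")
```

Note that the 833.773 ms latency is per request with 72 requests in flight, so it cannot be compared directly with 1/throughput.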