Try using benchmark_app to experiment with core utilization. I ran the following command on a machine with 2 sockets and 18 cores per socket, which gives 72 logical processors; all of them were utilized at 100% at the peak of model loading.
numactl -l ~/inference_engine_samples_build/intel64/Release/benchmark_app -i bert_input.bin -m bert_model.ckpt.xml -niter 100 -nthreads 72 -nstreams 72 -nireq 72
Count: 144 iterations
Duration: 1921.27 ms
Latency: 833.773 ms
Throughput: 74.9504 FPS
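The numbers above can be sanity-checked with a little arithmetic. This is a rough sketch, not benchmark_app's actual code: it assumes a batch size of 1 and assumes that in asynchronous mode the requested iteration count is rounded up to a multiple of the number of infer requests, which would explain why -niter 100 with -nireq 72 reports 144 iterations. The variable names are illustrative only.

```python
import math

# Values taken from the command line and the report above.
niter = 100           # -niter
nireq = 72            # -nireq
duration_ms = 1921.27
batch_size = 1        # assumption: the model was run with batch size 1

# Assumption: the iteration count is aligned up to a multiple of nireq.
count = math.ceil(niter / nireq) * nireq

# Throughput as frames per second over the whole run.
throughput_fps = batch_size * count * 1000 / duration_ms

print(f"Count: {count} iterations")
print(f"Throughput: {throughput_fps:.4f} FPS")
```

Under these assumptions the computed count and throughput agree with the reported values. Note that the latency figure is per request and is much larger than 1/throughput because many requests are in flight at the same time.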
Is there somewhere I can download the input data file bert_input.bin? I tried running benchmark_app without the -i flag (since it is marked as optional) but got this error:
[ ERROR ] Input Placeholder cannot be filled: please provide input binary files!
Is there a tutorial for converting a dataset (perhaps SQuAD) into the .bin format that benchmark_app will accept?
That file is not real data. For quick testing, and to be able to answer the query here, I just supplied a text file of the required number of bytes, saved with a .bin extension. To understand how the input is actually structured, I recommend exploring Google's BERT repositories.
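A placeholder file like that can be generated with a few lines of Python. This is only a sketch: the shape (1, 128) and the 4 bytes per element are assumptions for illustration, so replace them with the input shape and precision your converted model actually reports. The helper name write_dummy_input is hypothetical, and the all-zero content is only good for measuring performance, not for checking accuracy.

```python
import math
import os
import tempfile

def write_dummy_input(path, shape, bytes_per_element):
    """Write a zero-filled binary file sized for one input tensor."""
    n_bytes = math.prod(shape) * bytes_per_element
    with open(path, "wb") as f:
        f.write(bytes(n_bytes))  # bytes(n) yields n zero bytes
    return n_bytes

# Assumed values: one sequence of 128 tokens stored as 32-bit integers.
out_path = os.path.join(tempfile.gettempdir(), "bert_input.bin")
size = write_dummy_input(out_path, shape=(1, 128), bytes_per_element=4)
print(f"Wrote {size} bytes to {out_path}")
```

The resulting file can then be passed to benchmark_app with -i. If the model has several inputs, each one may need its own file of the matching size.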