Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

About Kaldi model inference speed

ArtemisZGL
Beginner
943 Views

My question has already been posted as a GitHub issue, but I couldn't get useful feedback there (kaldi model inference speed · Issue #6402 · openvinotoolkit/openvino (github.com)). When I compared the inference speed of a Kaldi TDNN model between OpenVINO and Kaldi, OpenVINO was faster when the Kaldi script used a chunk size of 1, but with a larger chunk size OpenVINO was much slower. I also found that OpenVINO's speed did not change with different CPU utilization. Details are in the issue linked above. Do I need to tune the plugin parameters to get a performance improvement?
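For context, a minimal sketch of the kind of plugin tuning I am asking about, using the Inference Engine Python API (the model file names and config values below are placeholders, not my actual setup):

```python
from openvino.inference_engine import IECore
import numpy as np

ie = IECore()
# Placeholder file names for the IR converted from the Kaldi TDNN model.
net = ie.read_network(model="tdnn.xml", weights="tdnn.bin")
input_name = next(iter(net.input_info))

# Example CPU plugin parameters controlling parallelism; values are illustrative only.
config = {
    "CPU_THREADS_NUM": "4",          # limit the number of inference threads
    "CPU_THROUGHPUT_STREAMS": "1",   # >1 mainly helps when many requests run in parallel
}
exec_net = ie.load_network(network=net, device_name="CPU", config=config)

# One dummy input of whatever shape the converted model expects.
dummy = np.zeros(net.input_info[input_name].input_data.shape, dtype=np.float32)
result = exec_net.infer({input_name: dummy})
```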

0 Kudos
1 Solution
6 Replies
ArtemisZGL
Beginner
930 Views

Based on the code in the speech sample and my experiments, can I assume that the converted Kaldi model does not support chunk input? Maybe we should be able to specify the chunk size when converting the model; right now it seems to be fixed at 1 during conversion. That means we can only run inference frame by frame (if the context is not 0), which makes inference too slow to use compared with chunk input in Kaldi.
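To make this concrete, here is a rough sketch of the frame-by-frame loop I mean (the model path, feature dimension and frame count are made-up placeholders):

```python
from openvino.inference_engine import IECore
import numpy as np

ie = IECore()
net = ie.read_network(model="tdnn.xml", weights="tdnn.bin")   # placeholder IR names
exec_net = ie.load_network(network=net, device_name="CPU")
input_name = next(iter(net.input_info))

feats = np.random.rand(500, 40).astype(np.float32)  # 500 frames of 40-dim features (made up)

outputs = []
for t in range(feats.shape[0]):
    # One infer() call per frame: every frame pays the full per-call overhead,
    # which is what makes this so much slower than feeding a whole chunk at once.
    res = exec_net.infer({input_name: feats[t:t + 1]})
    outputs.append(next(iter(res.values())))
```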

0 Kudos
Iffa_Intel
Moderator
913 Views

Greetings,


If you are aiming for better performance (inference speed, etc.), it is recommended to apply model optimization.

You could try the Post-Training Optimization Tool (POT), which is designed to accelerate the inference of deep learning models by applying special methods that do not require retraining or fine-tuning, such as post-training quantization.


This is the official documentation: https://docs.openvinotoolkit.org/latest/pot_README.html


Another thing you could utilize is batching: multiple inputs (e.g. 3 RGB images) are packed together so that inference can run on them simultaneously. The response time for a single request may be longer, but the overall throughput (fps) may be higher; a minimal sketch follows the link below.


You may see the concept here: https://www.youtube.com/watch?v=Ga8j0lgi-OQ
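
A minimal sketch of the batching idea, assuming a generic image model (the model path, batch size and shapes below are only examples):

```python
from openvino.inference_engine import IECore
import numpy as np

ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")  # placeholder IR names
net.batch_size = 3                                             # e.g. three RGB images per request

exec_net = ie.load_network(network=net, device_name="CPU")
input_name = next(iter(net.input_info))

# Pack three preprocessed images into one blob of shape [3, C, H, W] and infer them together.
batch = np.zeros(net.input_info[input_name].input_data.shape, dtype=np.float32)
results = exec_net.infer({input_name: batch})
```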




0 Kudos
ArtemisZGL
Beginner
901 Views

Thanks for your reply. I know batching can improve performance, but the converted model is time-dependent: the result at time t depends on the state from t-1 and earlier frames. So batching can only be applied across different samples, whereas the chunk size I mentioned above refers to a set of frames within the same sample. Using a chunk as input can speed up inference for a single sample; batching only improves the speed for multiple samples.
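To illustrate the difference (all sizes here are made up): a chunk stacks consecutive frames of one utterance, while a batch stacks inputs from different utterances.

```python
import numpy as np

feat_dim, chunk_size = 40, 20
utt_a = np.random.rand(500, feat_dim).astype(np.float32)   # one utterance, 500 frames
utt_b = np.random.rand(500, feat_dim).astype(np.float32)   # a different utterance
utt_c = np.random.rand(500, feat_dim).astype(np.float32)   # another utterance

# Chunk input: 20 consecutive frames of the SAME utterance in one call,
# so utt_a needs 500 / 20 = 25 calls instead of 500 frame-by-frame calls.
chunk = utt_a[0:chunk_size]                                 # shape (20, 40)

# Batch input: one frame from each of three DIFFERENT utterances,
# which helps multi-sample throughput but not a single utterance.
batch = np.stack([utt_a[0], utt_b[0], utt_c[0]])            # shape (3, 40)
```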

0 Kudos
JesusE_Intel
Moderator
875 Views

Hi ArtemisZGL,


In order to compare OpenVINO inference results to nnet3-compute, the speech_sample will need to be updated to support frames_per_chunk. This is something the development team is currently exploring; I don't have an ETA for when this will be available.


I've updated your GitHub issue with my benchmark_app results. Let me know if we can close this discussion and continue on your original post.


Regards,

Jesus


0 Kudos
ArtemisZGL
Beginner
858 Views

Thanks for your reply. Support for frames_per_chunk would be really helpful; looking forward to the good news!

0 Kudos
Iffa_Intel
Moderator
851 Views

Hi,


Intel will no longer monitor this thread since this issue has been resolved. If you need any additional information from Intel, please submit a new question.



Sincerely,

Iffa


0 Kudos