ArtemisZGL
Beginner

About Kaldi model inference speed


My question was posted to the GitHub issues ( kaldi model inference speed · Issue #6402 · openvinotoolkit/openvino (github.com) ), but I couldn't get useful feedback there. When I compared inference speed of a Kaldi TDNN model between OpenVINO and Kaldi, OpenVINO was faster when the chunk size in the Kaldi script was set to 1. But with a larger chunk size, OpenVINO is much slower, and I also found that OpenVINO's speed did not change with different CPU utilization. Details can be seen in the issue linked above. Do I need to tune the plugin parameters to get a performance improvement?


6 Replies
ArtemisZGL
Beginner

Based on the code in the speech sample and my experiment, can I assume that the converted Kaldi model does not support chunk input? Maybe we should specify the chunk size when converting the model; right now it seems to be fixed at 1 during conversion. This means we can only run inference frame by frame (if the context is not 0), which makes inference too slow to use compared with chunk input in Kaldi.
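A back-of-the-envelope sketch of why frame-by-frame inference with nonzero context is so much slower than chunked input (the context and frame counts below are made-up illustrative numbers, not the actual TDNN configuration):

```python
import math

# Hypothetical numbers, just to illustrate per-call overhead:
LEFT, RIGHT = 13, 9   # assumed TDNN left/right context, in frames
T = 1000              # frames in one utterance

def cost(chunk):
    # One inference request per chunk of output frames.
    calls = math.ceil(T / chunk)
    # Each request must also be fed the context frames around its chunk,
    # so small chunks re-send the same context over and over.
    frames_fed = calls * (chunk + LEFT + RIGHT)
    return calls, frames_fed

print(cost(1))    # (1000, 23000): full context re-sent on every call
print(cost(50))   # (20, 1440): context overhead amortized over the chunk
```

With chunk size 1, every output frame pays the full per-request overhead plus the whole context window; with chunk size 50 the same utterance needs 50x fewer requests and far less redundant input.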

Iffa_Intel
Moderator

Greetings,


If you are aiming for better performance (inference speed, etc.), it is recommended to apply model optimization.

You could try the Post-training Optimization Tool (POT), which is designed to accelerate the inference of deep learning models by applying special methods that do not require model retraining or fine-tuning, such as post-training quantization.


This is the official documentation: https://docs.openvinotoolkit.org/latest/pot_README.html


Another thing you could utilize is batching. You can pack multiple inputs together (e.g., 3 RGB images) so that inference is done on them simultaneously. Response time for an individual input would be slower, but the overall efficiency (FPS) may be higher.


You may see the concept here: https://www.youtube.com/watch?v=Ga8j0lgi-OQ
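The batching concept can be sketched with a toy one-layer "network" in NumPy (this is only an illustration of the idea, not the OpenVINO API; the layer and shapes are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "model": a single linear layer mapping 40 features to 10 outputs.
W = rng.standard_normal((40, 10)).astype(np.float32)

def infer_one(x):
    # One request per input.
    return x @ W

def infer_batched(batch):
    # One request for the whole batch: the inputs share the leading axis
    # and are processed simultaneously.
    return batch @ W

xs = rng.standard_normal((3, 40)).astype(np.float32)  # e.g. 3 inputs packed together
single = np.stack([infer_one(x) for x in xs])
batched = infer_batched(xs)
assert np.allclose(single, batched, atol=1e-5)  # same results, fewer requests
```

The batched path produces the same outputs as three separate calls while paying the per-request overhead only once.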




ArtemisZGL
Beginner

Thanks for your reply. I know using a batch size can improve performance, but the converted model is time-dependent: the inference result at time t depends on the state at t-1 and earlier. So batching can only be applied across different samples, whereas the chunk size I mentioned above is a set of frames within the same sample. Using chunks as input can therefore improve inference speed for a single sample, while batch size only improves speed across multiple samples.
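The distinction can be sketched with a toy recurrent update (illustrative only, not the actual TDNN): frames within one sample must be processed in order because each step consumes the previous state, but independent samples can share a batch axis and be stepped together.

```python
import numpy as np

def step(state, frame):
    # Toy recurrent update: output at time t depends on state from t-1.
    return np.tanh(state * 0.5 + frame)

def run_sample(frames):
    # Within one sample, frames must be processed sequentially.
    state = np.zeros_like(frames[0])
    out = []
    for f in frames:
        state = step(state, f)
        out.append(state)
    return np.stack(out)

rng = np.random.default_rng(1)
samples = rng.standard_normal((3, 20, 8)).astype(np.float32)  # 3 samples, 20 frames each

# Batching across samples: stack them on a leading axis and take one
# step per time index for all samples at once.
state = np.zeros((3, 8), dtype=np.float32)
steps = []
for t in range(20):
    state = step(state, samples[:, t])
    steps.append(state)
batched = np.stack(steps, axis=1)

sequential = np.stack([run_sample(s) for s in samples])
assert np.allclose(sequential, batched, atol=1e-5)
```

Batching reduces the number of time steps per sample not at all; it only lets several independent samples share each step, which is why it helps multi-sample throughput but not single-sample latency.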

JesusE_Intel
Moderator

Hi ArtemisZGL,


In order to compare OpenVINO inference results against nnet3-compute, the speech_sample will need to be updated to support frames_per_chunk. The development team is currently exploring this; I don't have an ETA for when it will be available.


I've updated your GitHub issue with my benchmark_app results. Let me know if we can close this discussion and continue on your original post.


Regards,

Jesus



ArtemisZGL
Beginner

Thanks for your reply. Support for frames_per_chunk would be really helpful; looking forward to the good news!

Iffa_Intel
Moderator

Hi,


Intel will no longer monitor this thread since this issue has been resolved. If you need any additional information from Intel, please submit a new question.



Sincerely,

Iffa

