We are developing a Kaldi model for Chinese and trying to verify it with the Live Speech Recognition Demo application.
However, we found that Live Speech Recognition performs worse than Offline Speech Recognition.
We checked the code and found that getResult() always queries results from the decoder module inside the Speech Library, whose source is not revealed.
We are curious how the scoring and decoder modules judge a result to be stable, and would like to know whether any of this is language dependent, especially the correction mechanism.
We would appreciate it if the source code could be shared. If not, please let us know whether this is language dependent and whether the behavior can be changed according to the language.
The Speech library source code is available in the <OPENVINO_DIR>\data_processing\audio\speech_recognition\src folder.
Please also check out the Speech Library documentation.
Hope this information helps.
Thanks for the response.
Could you let us know where we can find the source code of the methods declared in <OPENVINO_DIR>\data_processing\audio\speech_recognition\include\speech_decoder.h?
The methods that you're asking about are defined in the core binary libraries (decoder_library.dll) in the lib folder, and their source code is not available to be shared.
Based on the documentation, the Intel Speech Decoder turns the phonemes into a text hypothesis. The decoding graph takes into account the grammar of the data, as well as the distribution and probabilities of specific contiguous words (n-grams).
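As a rough illustration of the n-gram idea only (this is not the Intel Speech Decoder, whose source is not available), here is a toy bigram language model in Python that ranks two competing text hypotheses. The corpus, the function names `train_bigram` and `log_prob`, and the add-one smoothing are all assumptions made for this sketch; a real decoding graph also combines acoustic scores and a pronunciation lexicon.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count context words and word pairs from tokenized sentences."""
    unigrams = Counter()
    bigrams = Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        unigrams.update(tokens[:-1])                  # counts of each context word
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # counts of each adjacent pair
    vocab = {w for s in sentences for w in s} | {"</s>"}
    return unigrams, bigrams, len(vocab)

def log_prob(words, model):
    """Log-probability of a word sequence under the bigram model (add-one smoothing)."""
    unigrams, bigrams, vocab_size = model
    tokens = ["<s>"] + words + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        # Counter returns 0 for unseen words or pairs, so smoothing keeps this finite.
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total

# Hypothetical toy training text; a real model is trained on a large corpus.
corpus = [
    ["recognize", "speech"],
    ["recognize", "speech", "now"],
    ["a", "nice", "beach"],
]
model = train_bigram(corpus)

# Two acoustically similar hypotheses; the language model prefers the likelier word sequence.
hypotheses = [["recognize", "speech"], ["wreck", "a", "nice", "beach"]]
best = max(hypotheses, key=lambda h: log_prob(h, model))
print(" ".join(best))
```

Because such statistics are learned from text in the target language, a language model trained on US English text would not be expected to score Chinese word sequences meaningfully.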
On a separate note, it would be greatly appreciated if you could share your Chinese model with us, as the demo models are trained for US English.