In our understanding, it is not intuitive to use Model Optimizer (mo_kaldi.py) to convert a Kaldi Nnet3 model and feed the converted model to the application demo (Live Speech Recognition Demo). Besides using iVectors as a common feature input, our nnet3 model also uses the following layers, but we understand that support for the nnet3 format is limited: BatchNorm, Dropout, GeneralDropout, NoOp, SpecAugmentTimeMask, FixedAffine, Linear, LogSoftmax, NaturalGradientAffine, RectifiedLinear, Tdnn.
Current Kaldi toolkit development is based on the Nnet3 architecture; training with the Nnet1 architecture is quite time-consuming and the results are not as good as with Nnet3 models. If the Kaldi Nnet3 architecture received more comprehensive support in the future, it would give R&D teams a friendlier development environment. It would also make edge deployment easier to popularize, because a traditional speech recognition model (a Kaldi model) is more stable than an E2E model.
Thanks for reaching out to us.
We are checking on this with our development team and will update you with more information soon.
Meanwhile, for the iVector feature input, you might want to try the
Convert ASpIRE Chain TDNN Model to IR workflow and see whether it works for your model.
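For reference, the ASpIRE conversion workflow can be sketched roughly as below. This is a hedged sketch, not the exact documented procedure: the experiment paths (exp/chain/tdnn_7b/...) and the Model Optimizer flags are assumptions based on typical Kaldi layouts and common mo options, so please check them against the OpenVINO documentation for your release.

```shell
# Sketch: converting the ASpIRE chain TDNN model to OpenVINO IR.
# Paths below assume a standard Kaldi ASpIRE experiment directory;
# adjust them to where you extracted the pretrained model.

# 1. (If needed) copy the nnet3 acoustic model into a raw binary form
#    that Model Optimizer can read. nnet3-copy ships with Kaldi.
nnet3-copy --binary=true \
    exp/chain/tdnn_7b/final.mdl \
    exp/chain/tdnn_7b/final.nnet

# 2. Run Model Optimizer on the model to produce the IR (.xml + .bin).
#    --output_dir is where the IR files are written.
mo --input_model exp/chain/tdnn_7b/final.nnet \
   --output_dir aspire_ir
```

If the conversion succeeds, the resulting .xml/.bin pair can then be loaded by the speech recognition demo in place of an Nnet1 model.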
Updated the subject as follows:
Is it possible to fully support the Nnet3 model in the future and provide a simple inference pipeline in the online demo, like the one available for the Nnet1 model, given that Nnet1 is a much older architecture?
This thread will no longer be monitored since we have provided the information. If you need any additional information from Intel, please submit a new question.