OpenVINO support for speech synthesis models (e.g. Tacotron)

Menon__Sujeendran · 02-28-2019

Hi,

I am currently trying to get Tacotron or a similar speech synthesis network to run with OpenVINO, both for my Master's thesis and for my workplace. I have run into a lot of issues trying to generate an IR from the TensorFlow model obtained from https://github.com/keithito/tacotron to see whether this is feasible. Does the OpenVINO R5 release support such a model? If not, could you please let me know what other options are available for speech synthesis using OpenVINO?

PS: It is really not easy to contact support on this site.

Menon__Sujeendran · 03-12-2019

Hi, can anyone provide a solution for this issue? I managed to convert the DeepSpeech model in an Ubuntu VM (it failed on Windows 10), and since Tacotron has somewhat similar components, I tried a similar approach:

python3 ./mo_tf.py --input_model /home/user/Desktop/Sharedfolder/tacofrozen.pb --freeze_placeholder_with_value "input_lengths->[16]" --input inputs --input_shape [16,256] --output model/griffinlim/Squeeze
Model Optimizer arguments:
Common parameters:
    - Path to the Input Model:    tacofrozen.pb
    - Path for generated IR:     /home/user/MO/.
    - IR output name:     tacofrozen
    - Log level:     ERROR
    - Batch:     Not specified, inherited from the model
    - Input layers:     inputs
    - Output layers:     model/griffinlim/Squeeze
    - Input shapes:     [16,256]
    - Mean values:     Not specified
    - Scale values:     Not specified
    - Scale factor:     Not specified
    - Precision of IR:     FP32
    - Enable fusing:     True
    - Enable grouped convolutions fusing:     True
    - Move mean values to preprocess section:     False
    - Reverse input channels:     False
TensorFlow specific parameters:
    - Input model in text protobuf format:     False
    - Offload unsupported operations:     False
    - Path to model dump for TensorBoard:     None
    - List of shared libraries with TensorFlow custom layers implementation:     None
    - Update the configuration file with input/output node names:     None
    - Use configuration file used to generate the model with Object Detection API:     None
    - Operations to offload:     None
    - Patterns to offload:     None
    - Use the config file:     None
Model Optimizer version:     1.5.12.49d067a0
[ ERROR ]  Cannot infer shapes or values for node "model/griffinlim/stft_30/hann_window/Cast_1".
[ ERROR ]  Input 0 of node model/griffinlim/stft_30/hann_window/Cast_1 was passed int64 from model/griffinlim/stft_30/hann_window/sub_1_port_0_ie_placeholder:0 incompatible with expected int32.
[ ERROR ]
[ ERROR ]  It can happen due to bug in custom shape infer function <function tf_native_tf_node_infer at 0x7fa7ca6820d0>.
[ ERROR ]  Or because the node inputs have incorrect values/shapes.
[ ERROR ]  Or because input shapes are incorrect (embedded to the model or passed via --input_shape).
[ ERROR ]  Run Model Optimizer with --log_level=DEBUG for more information.
[ ERROR ]  Stopped shape/value propagation at "model/griffinlim/stft_30/hann_window/Cast_1" node.
For more information please refer to Model Optimizer FAQ (<INSTALL_DIR>/deployment_tools/documentation/docs/MO_FAQ.html), question #38.
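
The failing node sits under model/griffinlim/, that is, in the Griffin-Lim/STFT subgraph rather than in the synthesis network itself. One thing that might be worth trying is to pass --output a node that comes before that subgraph, so the Model Optimizer never has to infer shapes for it. The sketch below only rebuilds the command line shown above with a different output node. The node name in it is a made-up placeholder (the real name would have to be looked up in the frozen graph, for example with TensorBoard), and there is no guarantee that the rest of the graph converts once the cut is made.

```python
import shlex


def build_mo_command(input_model, output_node,
                     input_node="inputs",
                     input_shape="[16,256]",
                     frozen_lengths="input_lengths->[16]"):
    """Return the mo_tf.py argument list from the post, with a caller-chosen --output node."""
    return [
        "python3", "./mo_tf.py",
        "--input_model", input_model,
        "--freeze_placeholder_with_value", frozen_lengths,
        "--input", input_node,
        "--input_shape", input_shape,
        "--output", output_node,
    ]


# HYPOTHETICAL node name: replace it with the actual node that produces the
# spectrogram in your graph, i.e. the last node before model/griffinlim/.
SPECTROGRAM_NODE = "model/REPLACE_ME/spectrogram_output"

cmd = build_mo_command("/home/user/Desktop/Sharedfolder/tacofrozen.pb", SPECTROGRAM_NODE)

# Print a shell-safe version of the command (brackets and "->" get quoted).
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed line can be pasted into a shell, or the list can be handed to subprocess.run(cmd) from the Model Optimizer directory.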

Shubha_R_Intel · 03-18-2019

Dear Sujeendran:

It looks like Tacotron is a GRU-based model (as opposed to LSTM-based). We should have GRU support in a near-term upcoming release, but this particular Tacotron model also has a complicated decoder part which is currently not supported, so Tacotron support may only come in a later release. All I can say for now is: stay tuned, and sorry for the inconvenience!

Sincerely,

Shubha

Menon__Sujeendran · 03-18-2019

Thanks for the response, Shubha! Although it is disappointing to learn that GRU is not currently supported, are there any other speech synthesis models that OpenVINO might support at present and that I could look into? For example the following, or any other options:
1. DeepVoice3: https://github.com/candlewill/AiVoice
2. VoiceLoop: https://github.com/facebookresearch/loop (After possible conversion from PyTorch to ONNX)

Regards,

Sujeendran

Menon__Sujeendran · 04-03-2019

Dear Shubha,

I managed to replace the GRU cells with LSTM cells in the model I mentioned, and it still worked well in TensorFlow tests after training. However, I am still not able to generate the IR with the Model Optimizer. As far as I can tell from what I have read about the R5 build of OpenVINO, the model no longer contains any unsupported layers or operations. It would also be fine to split off the Griffin-Lim decoder part of the model, as I will try to implement it outside the network.
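
For reference, running Griffin-Lim outside the network could look roughly like the NumPy sketch below. It is a generic textbook version, not the code from the Tacotron repository: the function names, the FFT size, the hop length and the iteration count are all assumptions chosen for the toy example, and it expects a linear-amplitude magnitude spectrogram of shape (n_fft // 2 + 1, frames). Whatever normalisation or dB scaling the network applies to its output would have to be undone first.

```python
import numpy as np


def stft(signal, n_fft, hop):
    """Hann-windowed STFT; returns an array of shape (n_fft // 2 + 1, frames)."""
    window = np.hanning(n_fft)
    frames = 1 + (len(signal) - n_fft) // hop
    return np.stack(
        [np.fft.rfft(window * signal[i * hop:i * hop + n_fft]) for i in range(frames)],
        axis=1,
    )


def istft(spec, n_fft, hop):
    """Inverse STFT by windowed overlap-add, normalised by the summed squared window."""
    window = np.hanning(n_fft)
    frames = spec.shape[1]
    length = n_fft + hop * (frames - 1)
    signal = np.zeros(length)
    norm = np.zeros(length)
    for i in range(frames):
        start = i * hop
        signal[start:start + n_fft] += window * np.fft.irfft(spec[:, i], n=n_fft)
        norm[start:start + n_fft] += window ** 2
    return signal / np.maximum(norm, 1e-8)


def griffin_lim(magnitude, n_fft, hop, n_iter=30, seed=0):
    """Estimate a waveform from magnitudes only by iteratively refining the phase."""
    rng = np.random.default_rng(seed)
    angles = np.exp(2j * np.pi * rng.random(magnitude.shape))  # random initial phase
    for _ in range(n_iter):
        signal = istft(magnitude * angles, n_fft, hop)
        # Keep the given magnitudes, adopt the phase of the re-analysed signal.
        angles = np.exp(1j * np.angle(stft(signal, n_fft, hop)))
    return istft(magnitude * angles, n_fft, hop)


# Toy check: discard the phase of a sine wave, then rebuild a waveform from magnitudes.
tone = np.sin(2 * np.pi * 0.05 * np.arange(400))
magnitude = np.abs(stft(tone, n_fft=64, hop=16))
waveform = griffin_lim(magnitude, n_fft=64, hop=16, n_iter=20)
print(waveform.shape)
```

In a real pipeline the magnitude array would come from the inference output instead of from stft, and n_fft and hop would have to match the values the model was trained with.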

Would it be possible for me to email you the model file and the details so that you can try the conversion process yourself?

Shubha_R_Intel · 04-03-2019

Dear Sujeendran,

Sure. I have PM'd you so that you can respond and send me the zipped up model file and other details.

Thanks!

Shubha