Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Runtime error after exporting Whisper large-v3 and inconsistent Chinese output

chiao
Beginner

Hi Intel team,

Our customer is working with the Whisper large-v3 model for speech recognition. They tried exporting the original Hugging Face model with the following command:
optimum-cli export openvino --model openai/whisper-large-v3 --task automatic-speech-recognition openvino/whisper-large-v3-ov

The export process completed successfully, but during runtime they encountered the following error:

RuntimeError: Exception from src/inference/src/cpp/infer_request.cpp:188:
Check '::getPort(port, name, {_impl->get_inputs(), _impl->get_outputs()})' failed at src/inference/src/cpp/infer_request.cpp:190:
Port for tensor name cache_position was not found.

 

The customer also tested the pre-converted model provided on Hugging Face:
OpenVINO/whisper-large-v3-fp16-ov

This model can transcribe Chinese successfully in some cases, but when the input is in Chinese, Spanish, or Japanese, the output is frequently in English rather than in the original language (though the meaning is still correct).

We would like to ask:

1. How can we resolve the cache_position runtime error when exporting openai/whisper-large-v3 with Optimum-Intel?

2. Are there recommended workarounds or fixes to ensure proper multilingual support (Chinese, Japanese, Spanish, etc.)?

3. Could you recommend other Whisper or ASR models that can reliably support Chinese transcription?

Thank you for your support.
Peh_Intel
Moderator

Hi chiao,

 

1) Export openai/whisper-large-v3 to IR without specifying the task. Fixing the task to automatic-speech-recognition exports the decoder without the with-past (KV-cache) inputs, which is likely why the cache_position tensor cannot be found at runtime. The IR generated by the command below can be run successfully:

optimum-cli export openvino --model openai/whisper-large-v3 whisper-large-v3

Attachments: speech.png, whisper_speech_recognition.py, how_are_you_doing_today.wav
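For reference, a minimal sketch of how the exported IR could be loaded and run with Optimum-Intel (this is not the attached whisper_speech_recognition.py; the model directory name, audio file name, and use of librosa are assumptions):

# Minimal sketch: load the exported OpenVINO IR and transcribe a WAV file.
# Assumes the IR was exported to ./whisper-large-v3 as in the command above
# and that librosa is installed for audio loading.
import librosa
from transformers import AutoProcessor
from optimum.intel import OVModelForSpeechSeq2Seq

model_dir = "whisper-large-v3"   # output directory of optimum-cli export
processor = AutoProcessor.from_pretrained(model_dir)
model = OVModelForSpeechSeq2Seq.from_pretrained(model_dir)

# Whisper expects 16 kHz mono audio.
audio, _ = librosa.load("how_are_you_doing_today.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

generated_ids = model.generate(inputs.input_features)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])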

 

2) Try specifying the language as <|zh|> in the generation config for Chinese, for example as in the sketch below.
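A hedged sketch of forcing Chinese transcription instead of letting Whisper auto-detect (and possibly translate to English). It reuses model, processor, and inputs from the previous snippet; exact keyword support may vary with the installed transformers/optimum versions:

# Option A: pass the language/task directly to generate()
generated_ids = model.generate(inputs.input_features,
                               language="zh", task="transcribe")

# Option B: build forced decoder IDs from the processor
forced_ids = processor.get_decoder_prompt_ids(language="zh", task="transcribe")
generated_ids = model.generate(inputs.input_features,
                               forced_decoder_ids=forced_ids)

print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])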

 

3) I don't have a specific recommendation to share, but I came across the FireRedASR model in my search, which might be helpful.

 

 

Regards,

Peh