Intel® Distribution of OpenVINO™ Toolkit

LSTM CudnnRNNV3 Translation, Workaround Produces Biased Predictions

Xanph
Novice

With thanks to @Peh_Intel for his help in debugging so far. I'm continuing this issue from 'Deprecation of Tensorflow CudnnRNNV3 on conversion to IR Format'.

 

My Development System Info (training and converting model to IR):
OS: Ubuntu 24.04.3 LTS
Kernel: 6.14.0-32-generic


Dependencies
OpenVINO Version: 2025.3.0-19807-44526285f24-releases/2025/3
Keras: 3.8.0
Tensorflow: 2.16.1
NNCF: 2.18.0
Numpy: 1.26.4

NVIDIA driver: 550.163.01
CUDA: 12.3.107
cuDNN: 8.9.7

System
CPU: AMD Ryzen 9 5950X 16-Core Processor
GPUs: 1x NVIDIA RTX3070, 1x NVIDIA RTX 3060Ti

 

Production System Info (inference use):
OS: Ubuntu 24.04.3 LTS
Kernel: 6.8.0-84-generic

(Running inside of a docker container, base image Ubuntu 24.04 latest)

The Docker image contains GStreamer, OpenCV built from source, PyGObject, Cython, pycairo and the Intel OpenCL ICD (to note just a few).


Dependencies
OpenVINO Version: 2025.3.0-19807-44526285f24-releases/2025/3

Intel driver: i915

System
CPU: Intel(R) Xeon(R) E-2124 CPU @ 3.30GHz
GPUs: 1x Intel ARC A580 rev08

---

 

Summary

Root Problem
My .keras binary classification model uses a bidirectional LSTM layer

Bidirectional(LSTM(32, return_sequences=True))

which, when converted, produces a "no translator found for operation(s): CudnnRNNV3" internal error. When I created the issue 'Deprecation of Tensorflow CudnnRNNV3 on conversion to IR Format', the solution was to downgrade to TF 2.16.1. As of September 2025, models exported with TF 2.16.1 again contain CudnnRNNV3 operations and fail conversion, whereas previously conversion worked.

To test, I added recurrent_dropout=0.1 to the LSTM layer to avoid the use of cuDNN:

Bidirectional(LSTM(32, return_sequences=True, recurrent_dropout=0.1))

Model conversion to IR then worked, but the converted model produces predictions on the production system that are heavily biased toward 0.0001.

Model conversion code used:

import keras
import openvino as ov

# `version` is the model version string, defined elsewhere (not shown)
model_name = f"{version}_best_model"
model_path = f"models/best_keras/{model_name}.keras"
model = keras.models.load_model(model_path)

# Save the model in the SavedModel format
saved_model_dir = f"models/tensorflow/{model_name}_tf"
model.export(saved_model_dir, format="tf_saved_model")

# Convert the SavedModel to OpenVINO IR format
ir_model = ov.convert_model(saved_model_dir)

# Save the converted IR model
output_dir = "models/intermediate_representation"
ov.save_model(ir_model, f"{output_dir}/ir_{version}.xml")

 

One observation: I wonder if TensorFlow's inclusion of CUDA in the pip package is causing this problem, as that is the only thing I think has changed. I have re-installed all dependencies and tried different versions.

 

Secondary Problem
After I shared code and model files with Peh, he converted the Keras model without recurrent dropout on a CPU-only setup. When I test his converted model on the production system, the predictions are also heavily one-sided toward one class, with all values at 0.9+. However, when I tested the model on CPU only, the predictions were accurate.

I ran the same test on the model with recurrent dropout and got the same result.
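A CPU-only conversion like the one described above can be approximated on a machine that has NVIDIA GPUs by hiding the CUDA devices before TensorFlow is imported. The following is a minimal sketch under that assumption; the function names are hypothetical, and whether this keeps CudnnRNNV3 out of the exported SavedModel in this particular setup has not been verified here.

```python
import os


def hide_cuda_devices() -> None:
    """Hide all CUDA GPUs from TensorFlow.

    This only has an effect if it runs before TensorFlow is first
    imported in the current process.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


def export_saved_model_on_cpu(keras_path: str, saved_model_dir: str) -> None:
    """Load a .keras model and export it as a SavedModel with no GPU visible."""
    hide_cuda_devices()
    # Imported here, after the environment variable is set, so that
    # TensorFlow initialises without any CUDA device.
    import keras

    model = keras.models.load_model(keras_path)
    model.export(saved_model_dir, format="tf_saved_model")
```

The resulting SavedModel directory can then be passed to ov.convert_model as in the conversion code earlier in this post. The export should run in a fresh Python process, since the environment variable is ignored once TensorFlow has already been loaded.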

Peh suggested adding the following to the production system's code:

# core is an existing ov.Core() instance
ov_config = {"GPU_DISABLE_WINOGRAD_CONVOLUTION": "YES"}
compile_model = core.compile_model("model.xml", "GPU", ov_config)
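One way to narrow down whether the bias comes from the GPU plugin rather than from the conversion is to run the same batch through the same IR on CPU and on GPU and compare the outputs. The sketch below does that. The helper names, "model.xml" and `batch` are placeholders, the model is assumed to have a single input and a single output, and requesting f32 via INFERENCE_PRECISION_HINT is only an assumption worth testing (the GPU plugin may otherwise run in reduced precision); it is not something confirmed in this thread.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two flat sequences."""
    if len(a) != len(b):
        raise ValueError("prediction lists must have the same length")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def predict_on(device, model_xml, batch, config=None):
    """Run one batch through the IR on `device` and return the first output as flat floats."""
    # Imported lazily so that max_abs_diff can be used without OpenVINO installed.
    import openvino as ov

    core = ov.Core()
    compiled = core.compile_model(model_xml, device, config or {})
    result = compiled(batch)[compiled.output(0)]
    return [float(v) for v in result.ravel()]


# Hypothetical usage, with `batch` being a NumPy array of model inputs:
# cpu = predict_on("CPU", "model.xml", batch)
# gpu = predict_on("GPU", "model.xml", batch, {"INFERENCE_PRECISION_HINT": "f32"})
# print(max_abs_diff(cpu, gpu))
```

If the CPU and GPU outputs agree once f32 is requested, that would point toward a precision issue on the GPU; if they still diverge, the cause lies elsewhere.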

With this, the converted model Peh provided did produce better predictions, but without the accuracy I'm expecting: predictions range from 0.1964 to 0.5981, with a mean of 0.4245 and a standard deviation of 0.1119.

Expected predictions would range from 0.0004 to 0.998, with a mean of around 0.2279 and a standard deviation of 0.2778.

I tested this code with the recurrent dropout model and still experienced the same prediction bias (toward 0.0001).
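Figures like the range, mean and standard deviation quoted above can be produced with a small helper, which makes it easier to compare runs across devices and model variants. This is a minimal sketch using only the standard library. The function names and the 0.5 ratio threshold are hypothetical, and it assumes the population standard deviation (NumPy's default); the post does not say which variant was used.

```python
import statistics


def summarise(preds):
    """Return min, max, mean and population standard deviation of predictions."""
    return {
        "min": min(preds),
        "max": max(preds),
        "mean": statistics.fmean(preds),
        "std": statistics.pstdev(preds),
    }


def looks_collapsed(preds, expected_std=0.2778, ratio=0.5):
    """Flag a run whose spread is well below the expected spread.

    `expected_std` defaults to the expected deviation quoted in this post;
    `ratio` is an arbitrary illustrative threshold.
    """
    return summarise(preds)["std"] < ratio * expected_std
```

A run whose predictions all sit near a single value (for example 0.0001 or 0.9+) would be flagged by looks_collapsed, while a healthy spread would not.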

 

---

 

So, to summarise those two problems:

  1. I still get CudnnRNNV3 errors when not using recurrent dropout.
  2. I currently have to rely on Peh to convert my model (or use recurrent_dropout), and the converted model has weak accuracy or no convergence.

I do have a working IR model, with successful optimisation, from when I created the first issue a few months back (26th March 2025). This model was created from a .h5 version, and would have used OpenVINO v2024.3.0.

I'm happy to provide model and code directly.

 

With thanks.

 

Best regards,
Xanph

Peh_Intel
Moderator

Hi Xanph,


Thanks for your detailed description of the issue. Yes, please do share the models and code for further investigation. You can send the files to me privately if you prefer not to share them publicly.



Regards,

Peh


Xanph
Novice

Hello Peh,

 

Thanks, all files remain the same as the ones I sent privately in our existing conversation.

 

You mentioned that some further investigation was needed from the dev team?

 

Best regards,

Xanph

Peh_Intel
Moderator

Hi Xanph,


I only have your models. It would be great if you could also provide your inference script so that we can verify the predictions.



Regards,

Peh


Xanph
Novice

No problem Peh,

 

Inference files sent.

 

Best regards,

Flynn.

Peh_Intel
Moderator

Hi Xanph,


I have received your shared files. We will investigate this matter further and get back to you at the earliest.



Regards,

Peh


Xanph
Novice

With thanks to you and the team,

 

Xanph
