Re: LSTM CudnnRNNV3 Translation, Workaround Produces Biased Predictions

Xanph · ‎10-01-2025

With thanks to @Peh_Intel for his help in debugging so far, I'm continuing this issue from 'Deprecation of Tensorflow CudnnRNNV3 on conversion to IR Format'.

My Development System Info (training and converting model to IR):
OS: Ubuntu 24.04.3 LTS
Kernel: 6.14.0-32-generic

Dependencies
OpenVINO Version: 2025.3.0-19807-44526285f24-releases/2025/3
Keras: 3.8.0
Tensorflow: 2.16.1
NNCF: 2.18.0
Numpy: 1.26.4

NVIDIA driver: 550.163.01
CUDA: 12.3.107
cuDNN: 8.9.7

System
CPU: AMD Ryzen 9 5950X 16-Core Processor
GPUs: 1x NVIDIA RTX3070, 1x NVIDIA RTX 3060Ti

Production System Info (inference use):
OS: Ubuntu 24.04.3 LTS
Kernel: 6.8.0-84-generic

(Running inside of a docker container, base image Ubuntu 24.04 latest)

The Docker image contains GStreamer, OpenCV built from source, PyGObject, Cython, pycairo, and the Intel OpenCL ICD (to name a few).

Dependencies
OpenVINO Version: 2025.3.0-19807-44526285f24-releases/2025/3

Intel driver: i915

System
CPU: Intel(R) Xeon(R) E-2124 CPU @ 3.30GHz
GPUs: 1x Intel ARC A580 rev08

---

Summary

Root Problem
My .keras binary classification model uses a bidirectional LSTM layer:

Bidirectional(LSTM(32, return_sequences=True))

which, when converted, produces a "no translator found for operation(s): CudnnRNNV3" internal error. When I created the issue 'Deprecation of Tensorflow CudnnRNNV3 on conversion to IR Format', the solution was to downgrade to TF 2.16.1. Since then (as of September 2025), TF 2.16.1 also uses CudnnRNNV3 operations that fail conversion, whereas conversion previously worked.

To test, I added recurrent_dropout=0.1 to the LSTM layer to avoid the use of cuDNN:

Bidirectional(LSTM(32, return_sequences=True, recurrent_dropout=0.1))

Model conversion to IR then worked, but the converted model produces predictions on the production system that are heavily biased toward 0.0001.

Model conversion code used:

import keras
import openvino as ov

# `version` is set elsewhere in the script
model_name = f"{version}_best_model"
model_path = f"models/best_keras/{model_name}.keras"
model = keras.models.load_model(model_path)

# Export the Keras model in the TensorFlow SavedModel format
saved_model_dir = f"models/tensorflow/{model_name}_tf"
model.export(saved_model_dir, format="tf_saved_model")

# Convert the SavedModel to OpenVINO IR format
ir_model = ov.convert_model(saved_model_dir)

# Save the converted IR model
output_dir = "models/intermediate_representation"
ov.save_model(ir_model, f"{output_dir}/ir_{version}.xml")

Just sharing an observation: I wonder whether TensorFlow's inclusion of CUDA in the pip package is causing this problem, as that is the only thing I think has changed. I have reinstalled all dependencies and tried different versions.

Secondary Problem
After I shared code and model files with Peh, he converted the Keras model without recurrent dropout using CPU only. When I tested this on the production system, the predictions were also heavily skewed toward one class, with all values at 0.9+. However, when I tested the model on CPU only, the predictions were accurate.

I ran the same test on the model with recurrent dropout and got the same result.
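A minimal sketch of how the CPU and GPU outputs could be compared on the same input batch. The helper names (`max_abs_diff`, `run_on_device`, `compare_devices`) are hypothetical and not part of my production code, and the sketch assumes the IR has a single output and that `batch` is a NumPy array with the model's input shape:

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two score lists."""
    if len(a) != len(b):
        raise ValueError("score lists must have the same length")
    return max((abs(float(x) - float(y)) for x, y in zip(a, b)), default=0.0)


def run_on_device(model_xml, device, batch, config=None):
    """Compile the IR for one device and return the flattened scores."""
    # Imported lazily so max_abs_diff can be used without OpenVINO installed.
    import openvino as ov

    core = ov.Core()
    compiled = core.compile_model(model_xml, device, config or {})
    result = compiled(batch)
    # Assumption: the model exposes exactly one output.
    return result[compiled.output(0)].ravel().tolist()


def compare_devices(model_xml, batch, gpu_config=None):
    """Run the same batch on CPU and GPU and report the worst disagreement."""
    cpu_scores = run_on_device(model_xml, "CPU", batch)
    gpu_scores = run_on_device(model_xml, "GPU", batch, gpu_config)
    return max_abs_diff(cpu_scores, gpu_scores)
```

A large value from `compare_devices` on a batch where the CPU scores are known to be good would point at the GPU execution rather than at the conversion step.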

Peh suggested adding:

ov_config = {"GPU_DISABLE_WINOGRAD_CONVOLUTION": "YES"}
compile_model = core.compile_model("model.xml", "GPU", ov_config)

to the production system's code.

With this, the converted model Peh provided did produce better predictions, but without the accuracy I'm expecting: predictions range from 0.1964 to 0.5981, with a mean of 0.4245 and a standard deviation of 0.1119.

I would expect predictions ranging from 0.0004 to 0.998, with a mean around 0.2279 and a standard deviation of 0.2778.

I tested the same ov_config with the recurrent dropout model and still saw the same prediction bias (toward 0.0001).
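For reference, the ranges, means and standard deviations quoted above can be reproduced from a list of prediction scores with something like the sketch below. The function names and the `min_spread` threshold are hypothetical, and the population standard deviation is an assumption (a sample standard deviation would differ slightly):

```python
import statistics


def summarise(preds):
    """Return min, max, mean and population standard deviation of the scores."""
    values = [float(p) for p in preds]
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
    }


def looks_collapsed(summary, min_spread=0.5):
    """Flag a score distribution whose range is narrower than min_spread.

    min_spread is an arbitrary illustrative threshold, not a tuned value.
    """
    return (summary["max"] - summary["min"]) < min_spread
```

With the figures above, the GPU range (0.1964 to 0.5981) would be flagged by `looks_collapsed`, while the expected range (0.0004 to 0.998) would not.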

---

So, to summarise those two problems:

I still get CudnnRNNV3 errors when not using recurrent dropout.
I currently have to rely on Peh to convert my model (or use recurrent_dropout), and the converted model gives either weak accuracy or heavily biased predictions.

I do have a working IR model, with successful optimisation, from when I created the first issue a few months back (26th March 2025). That model was created from a .h5 version and would have used OpenVINO v2024.3.0.

I'm happy to provide the model and code directly.

With thanks.

Best regards,
Xanph

Peh_Intel · ‎10-01-2025

Hi Xanph,

Thanks for your detailed description of the issue. Yes, please do share the models and code as well so that we can investigate further. You can send the files to me privately if you would prefer not to share them publicly.

Regards,

Peh

Xanph · ‎10-02-2025

Hello Peh,

Thanks, all files remain the same as the ones I sent privately in our existing conversation.

You mentioned that some further investigation was needed from the dev team?

Best regards,

Xanph

Peh_Intel · ‎10-02-2025

Hi Xanph,

I only have your models. It would be great if you could also provide the inference script you used to produce those predictions.

Regards,

Peh

Xanph · ‎10-03-2025

No problem Peh,

Inference files sent.

Best regards,

Flynn.

Peh_Intel · ‎10-14-2025

Hi Xanph,

I have received your shared files. We will investigate this matter further and get back to you as soon as possible.

Regards,

Peh

Xanph · ‎10-15-2025

With thanks to you and the team,

Xanph