I have a small neural network converted into OpenVINO format (158 KB bin + 16 KB xml file). When I load it onto the CPU using Python on Windows and run a single inference, it consumes over 7 GB of memory. The FP16-compressed model gives the same result.
What can I do to reduce memory consumption? I went through the manual hoping there would be something about batch size or the number of threads, but I couldn't find anything useful. I want to run inference on AWS Lambda, so I need to lower the memory consumption.
Hi Wsla1,
Thank you for reaching out to us.
For memory usage optimization, you can refer to the OpenVINO™ Toolkit Optimizing Memory Usage page. You might also want to check out Advanced Throughput Options: Streams and Batching for details on OpenVINO™ batching and streams.
In addition, please refer to the OpenVINO™ Python Tutorials on configuring inference threads here.
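As a starting point, the streams, thread count, and performance hint can all be set as properties when the model is compiled. The sketch below assumes a model file named "model.xml" and illustrates the standard OpenVINO™ configuration keys ("PERFORMANCE_HINT", "NUM_STREAMS", "INFERENCE_NUM_THREADS"); the specific values shown are example choices for a memory-constrained target like AWS Lambda, not tuned recommendations.

```python
# Sketch: capping OpenVINO CPU resource usage via compile-time properties.
# "model.xml" is a placeholder path; adjust the values for your workload.
config = {
    "PERFORMANCE_HINT": "LATENCY",   # favor single-request latency over throughput
    "NUM_STREAMS": "1",              # a single execution stream avoids duplicated buffers
    "INFERENCE_NUM_THREADS": "2",    # cap the CPU threads used for inference
}

try:
    import openvino as ov

    core = ov.Core()
    compiled_model = core.compile_model("model.xml", "CPU", config)
    # compiled_model.create_infer_request() would then run inference as usual.
except ImportError:
    # OpenVINO is not installed in this environment; the config dict above
    # is the part being illustrated.
    pass
```

With "PERFORMANCE_HINT" set to "LATENCY", the plugin already biases toward one stream, so the explicit "NUM_STREAMS" entry is mainly there to make the intent visible.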
On another note, I ran the Python benchmark app on the FP16 face-detection-retail-0005 model (1,994 KB bin + 220 KB xml) and it only uses 130.4 MB of memory. Could you please provide us with more details (OpenVINO™ version and CPU name), along with the model that you used, so that we can investigate further?
Regards,
Megat
Hi Wsla1,
Thank you for your question. This thread will no longer be monitored since we have provided a suggestion. If you need any additional information from Intel, please submit a new question.
Regards,
Megat