Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Memory consumption - running on CPU with OpenVINO 2023.3

Yael1
Beginner
503 Views

I use Python 3.8 on an Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz (2 processors).

I have a model with dynamic input shapes, saved in ONNX format.

I run OpenVINO 2023.3 to accelerate performance on the CPU.

The inference time is improved, but memory consumption is very high and keeps climbing with each inference, until it reaches a limit of about 25 GB.

Is there a way to improve this, clear the memory after each inference, or control the limit?

It seems that when it reaches this limit, the memory is cleared.
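
Roughly, my flow looks like this (a simplified sketch; the model path and input shapes are placeholders):

import numpy as np
from openvino.runtime import Core  # OpenVINO 2023.3 Python API

core = Core()
model = core.read_model("model.onnx")        # model with dynamic input shapes
compiled = core.compile_model(model, "CPU")

# Each call uses a different input shape because the model input is dynamic;
# the shapes below are placeholders for my real data.
for seq_len in (32, 64, 128):
    data = np.random.rand(1, seq_len, 80).astype(np.float32)
    result = compiled([data])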

 

3 Replies
Wan_Intel
Moderator
452 Views

Hi Yael1,

Thanks for reaching out to us.


You may refer to the OpenVINO™ Toolkit Optimizing Memory Usage guide for memory optimization during inference.
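
As one example of the kind of adjustment covered there (a general sketch using the 2.0 Python API, not a snippet copied from that guide), compiling directly from the model file avoids keeping the intermediate ov.Model object in memory alongside the compiled model:

from openvino.runtime import Core

core = Core()
# Compiling straight from the file means no separate ov.Model reference
# is kept alive next to the compiled model.
compiled_model = core.compile_model("model.onnx", "CPU")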



Regards,

Wan


dusktilldawn
New Contributor I
428 Views

The issue you're encountering (high and steadily increasing memory consumption during inference with OpenVINO 2023.3 on your CPU) can be approached from a few angles. For reference, a basic load with the legacy Inference Engine Python API looks like this:

 

from openvino.inference_engine import IECore  # legacy (API 1.0) Python bindings
ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")  # IR files produced by the Model Optimizer
exec_net = ie.load_network(network=net, device_name="CPU", num_requests=1)


You can reduce the memory taken by the model weights by compressing them to FP16 when converting the model; the CPU plugin itself executes in FP32 by default and has no simple FP16 load-time switch. The FP16 compression option is shown with the Model Optimizer command in the next section.


Use OpenVINO Model Optimizer:

 

The OpenVINO Model Optimizer converts models (for example, ONNX) into the OpenVINO IR format, which can be loaded and executed more efficiently; in 2023.3 it is available as the mo command from the openvino-dev pip package. Converting the model, optionally with FP16 weight compression, may reduce memory usage.

 

mo --input_model your_model.onnx --compress_to_fp16

 

You can also try OpenVINO's AUTO device mode, which picks the execution device (and a precision the hardware supports) automatically. This can help reduce memory overhead while maintaining performance, for example:
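
Continuing from the snippet above (assuming the AUTO plugin is available through this legacy API in your install):

# Let the AUTO plugin choose the device; on a CPU-only machine it falls back to CPU
exec_net = ie.load_network(network=net, device_name="AUTO")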

 

Control Memory Limits


You can try to control memory usage through system-level limits (for example, ulimit or cgroups) or through OpenVINO's CPU plugin configuration.

 

CPU plugin configuration: the legacy Inference Engine API exposes configuration keys that can affect memory usage:

 

CPU_THREADS_NUM - limits the number of threads used during inference, which can help control memory consumption.


CPU_BIND_THREAD - controls CPU core binding (thread pinning), which can indirectly affect memory behavior.

 

exec_net = ie.load_network(network=net, device_name="CPU", config={"CPU_THREADS_NUM": "4", "CPU_BIND_THREAD": "YES"})

 

These settings may help fine-tune memory usage by managing the number of active threads and CPU core utilization.
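
If you are on the newer openvino.runtime API (the default in 2023.3), the equivalent knobs are exposed as properties passed when compiling the model; a minimal sketch with placeholder values:

from openvino.runtime import Core

core = Core()
model = core.read_model("model.onnx")
compiled_model = core.compile_model(
    model,
    "CPU",
    {
        "INFERENCE_NUM_THREADS": "4",   # cap the number of inference threads
        "PERFORMANCE_HINT": "LATENCY",  # single-stream execution, usually lower memory than THROUGHPUT
    },
)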

 

Use OpenVINO's "Inference Engine" Memory Optimizations


If you're using the Inference Engine API to execute models, OpenVINO provides options that influence how memory is used during inference. You can tune execution parameters, such as the number of parallel inference requests, so that resources are allocated efficiently.

 

For instance, you might want to create the ExecutableNetwork with a fixed number of requests (asynchronous inference) and reuse them, which keeps memory spikes bounded.

 

exec_net = ie.load_network(network=net, device_name="CPU", num_requests=2)  # fixed pool of two reusable infer requests
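
A minimal sketch of driving that fixed request pool with the legacy API (the input name lookup and shape below are placeholders for your model):

import numpy as np

input_name = next(iter(net.input_info))                # first input of the network
dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)   # placeholder input shape

# Reuse the pre-allocated requests instead of creating new ones per call,
# which keeps the per-inference memory footprint bounded.
request = exec_net.requests[0]
request.async_infer({input_name: dummy})
request.wait()
outputs = {name: blob.buffer for name, blob in request.output_blobs.items()}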

 

Wan_Intel
Moderator
279 Views

Hi dusktilldawn,

Thanks for sharing in the OpenVINO™ community!

 

Hi Yael1,

Thanks for your question.

 

The suggestions provided by dusktilldawn above, such as using the OpenVINO™ Model Optimizer and the OpenVINO™ Inference Engine APIs, can be useful if you are utilizing the OpenVINO™ toolkit.

 

Please refer to the OpenVINO™ documentation for tutorials on using the latest OpenVINO™ Model Optimizer and OpenVINO™ Inference Engine APIs.

 

If you need additional information from Intel, please submit a new question as this thread will no longer be monitored.

 

 

Regards,

Wan

