I have a small neural network converted into OpenVINO format (158 KB bin + 16 KB xml file). When I load it onto the CPU using Python on Windows and run a single inference, it consumes over 7 GB of memory. The FP16-compressed model gives the same result.
What can I do to reduce memory consumption? I went through the manual hoping there would be something about batch size or the number of threads, but I couldn't find anything useful. I want to run inference on AWS Lambda, so I need to lower the memory consumption.
Hi Wsla1,
Thank you for reaching out to us.
For memory usage optimization, you can refer to the OpenVINO™ Toolkit Optimizing Memory Usage page. You might also want to check out Advanced Throughput Options: Streams and Batching for details on OpenVINO™ batching and streams.
In addition, please refer to the OpenVINO™ Python Tutorials on configuring inference threads here.
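As a starting point, the streams, thread count, and performance hint can all be set as properties when the model is compiled. The sketch below assumes a model file named "model.xml" and illustrates the standard OpenVINO™ configuration keys ("PERFORMANCE_HINT", "NUM_STREAMS", "INFERENCE_NUM_THREADS"); the specific values shown are example choices for a memory-constrained target like AWS Lambda, not tuned recommendations.

```python
# Sketch: capping OpenVINO CPU resource usage via compile-time properties.
# "model.xml" is a placeholder path; adjust the values for your workload.
config = {
    "PERFORMANCE_HINT": "LATENCY",   # favor single-request latency over throughput
    "NUM_STREAMS": "1",              # a single execution stream avoids duplicated buffers
    "INFERENCE_NUM_THREADS": "2",    # cap the CPU threads used for inference
}

try:
    import openvino as ov

    core = ov.Core()
    compiled_model = core.compile_model("model.xml", "CPU", config)
    # compiled_model.create_infer_request() would then run inference as usual.
except ImportError:
    # OpenVINO is not installed in this environment; the config dict above
    # is the part being illustrated.
    pass
```

With "PERFORMANCE_HINT" set to "LATENCY", the plugin already biases toward one stream, so the explicit "NUM_STREAMS" entry is mainly there to make the intent visible.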
On another note, I ran the Python benchmark app on the FP16 face-detection-retail-0005 model (1,994 KB bin + 220 KB xml) and it only uses 130.4 MB of memory. Could you please provide us with more details (OpenVINO™ version and CPU name), along with the model that you used, so that we can investigate further?
Regards,
Megat
Hi Wsla1,
Thank you for your question. This thread will no longer be monitored since we have provided a suggestion. If you need any additional information from Intel, please submit a new question.
Regards,
Megat