A pruned model can easily be about 1/10th the size of the original model. However, there doesn't seem to be much latency improvement when running inference on the pruned model versus the original one.

1. Why would that be the case?
2. Are there any plans to speed up the processing of pruned models?

Thank you very much for your help.
Hi,
Generally, using the OpenVINO Model Optimizer to convert a native model into Intermediate Representation (IR) improves the model's performance, since the model is optimized during the conversion process.
You may refer here for further info: https://docs.openvinotoolkit.org/latest/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html
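As a minimal sketch of the conversion step described above (the model file name and output directory are placeholders, not from this thread; in recent OpenVINO releases the converter is invoked as the `mo` command, while older releases ship it as the `mo.py` script under `deployment_tools/model_optimizer`):

```shell
# Convert a trained (e.g. pruned) model to OpenVINO IR (.xml + .bin).
# "pruned_model.onnx" and "ir_model" are hypothetical names for illustration.
mo --input_model pruned_model.onnx \
   --output_dir ir_model \
   --data_type FP16
```

The `--data_type FP16` option produces a half-precision IR, which can further reduce model size and improve throughput on hardware that supports FP16 inference.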
In addition, the OpenVINO Post-Training Optimization Tool (POT) can be used to accelerate the inference of DL models by applying special optimization methods, such as 8-bit quantization, without retraining the model.
This is the official guide:
https://docs.openvinotoolkit.org/latest/pot_README.html
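For illustration, a minimal POT configuration might look like the sketch below (all file names and the calibration data path are hypothetical; the IR files would come from the Model Optimizer step above):

```json
{
    "model": {
        "model_name": "my_model",
        "model": "my_model.xml",
        "weights": "my_model.bin"
    },
    "engine": {
        "type": "simplified",
        "data_source": "path/to/calibration_images"
    },
    "compression": {
        "algorithms": [
            {
                "name": "DefaultQuantization",
                "params": {
                    "preset": "performance",
                    "stat_subset_size": 300
                }
            }
        ]
    }
}
```

The tool is then run with `pot -c <config>.json`, producing a quantized IR that typically infers faster than the FP32 original.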
Sincerely,
Iffa
Greetings,
Intel will no longer monitor this thread since we have provided a solution. If you need any additional information from Intel, please submit a new question.
Sincerely,
Iffa