You only need to load the model once for inference. Please refer to the object_detection_demo_ssd_async sample to see how to load the model to the device, create an infer request, and run inference on a video file.
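For reference, the load-once pattern in that sample looks roughly like this with the classic C++ Inference Engine API. This is a minimal sketch, not the full demo: "model.xml" and "video.mp4" are placeholder paths, and filling the input blob / parsing the output blob is elided.

```cpp
#include <inference_engine.hpp>
#include <opencv2/opencv.hpp>

int main() {
    using namespace InferenceEngine;

    // One-time initialization: do this once per process, not per frame.
    Core ie;
    CNNNetwork network = ie.ReadNetwork("model.xml");               // placeholder path
    ExecutableNetwork execNetwork = ie.LoadNetwork(network, "CPU"); // the expensive step
    InferRequest request = execNetwork.CreateInferRequest();

    std::string inputName = network.getInputsInfo().begin()->first;

    // Per-frame loop: only this part runs repeatedly.
    cv::VideoCapture cap("video.mp4");                              // placeholder path
    cv::Mat frame;
    while (cap.read(frame)) {
        // ... fill the input blob from `frame` via request.GetBlob(inputName) ...
        request.Infer();                                            // synchronous inference
        // ... read detections from the output blob ...
    }
    return 0;
}
```

As long as the process stays alive, `LoadNetwork` runs once and every subsequent frame pays only the inference cost.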
Which model are you using? On which target device are you trying to load the model?
Different hardware platforms use different optimization semantics and different plugins, so the time it takes to load the model also differs.
For more information, please refer to the Device-Specific Optimization Guide to improve the performance.
Yes, I have already tried running that example. In my case, however, I am trying to integrate this code into a separate C# application. For now I call this code from C# through a DLL, but the problem is that every time I run the C# code, it also reloads the model onto the device, which takes too much time. I wonder if there is a way to execute the loading only once, so that on subsequent runs only the inference part is executed.
I am using a YOLO model and CPU as the target device.
OpenVINO currently supports only C++ and Python; it doesn't provide any C# API wrappers. It seems likely that the DLL you are using is the reason it is taking more time to load the model.
Also, the model has to be loaded into the Inference Engine once in each run before the inference part can be executed.
I already compared calling the DLL against running the code directly in C++, and the DLL does not add any time to model loading.
So, right now, there is no way to load the Inference Engine just once and then perform a series of inferences? I am wondering whether the initialized Inference Engine can be kept in memory, or something like that.
I really need to implement this kind of approach in order to reduce the processing time.