I have a PyTorch CNN model that needs to store and retrieve the last conv states between low-latency inferences for custom padding. ONNX does not support Assign/ReadValue ops, and InferenceEngine::LowLatency() works only with LSTMs. What is the recommended way to add Assign/ReadValue ops at arbitrary places so that the states I need are kept between inferences?
Thank you for waiting. Just a quick check: have you tried exporting the PyTorch model with a custom op to ONNX? Please refer to the following page, and then to the following article, which explains how to add support for an unsupported layer in the OpenVINO toolkit.
OpenVINO contains a special API to simplify work with networks with states. The state is automatically saved between inferences, and there is a way to reset the state when needed. You can also read the state or set it to some new value between inferences.
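To make that mechanism concrete, here is a framework-free Python sketch of the ReadValue/Assign semantics described above, applied to the custom-padding use case from the question: the tail of each input is saved as state and prepended to the next input. All names are illustrative, not OpenVINO API.

```python
class StatefulPad:
    """Simulates ReadValue/Assign semantics: the last `pad` samples of each
    input are kept as state and prepended to the next input (custom padding)."""

    def __init__(self, pad):
        self.pad = pad
        self.state = [0.0] * pad  # initial state, like ReadValue's init value

    def infer(self, chunk):
        # "ReadValue": prepend the saved state to the new input.
        padded = self.state + list(chunk)
        # "Assign": save the tail of the padded input for the next inference.
        self.state = padded[-self.pad:]
        return padded

    def reset(self):
        # Equivalent of resetting the state between independent sequences.
        self.state = [0.0] * self.pad


m = StatefulPad(pad=2)
m.infer([1.0, 2.0, 3.0])  # padded with the initial zero state
m.infer([4.0, 5.0])       # padded with the tail of the previous chunk
m.reset()                 # state returns to its initial value
```

The point of the sketch is only the lifetime of `state`: it survives between `infer` calls automatically, and `reset` is the explicit way to start a new sequence.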
Inference Engine has the InferRequest::QueryState method to get the list of states from a network and the IVariableState interface to operate with states. Please refer to the following link:
Using several threads is possible if you have several independent sequences; each sequence can then be processed in its own infer request. Note that running one sequence through several infer requests is not recommended. You can refer to the following link for an example of stateful network inference:
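The one-request-per-sequence rule can be illustrated without any framework: give each independent sequence its own stateful request object, so their states never mix. The class and names below are illustrative only.

```python
class InferRequestSim:
    """Minimal stand-in for an infer request with one state variable:
    a running sum that persists between inferences."""

    def __init__(self):
        self.state = 0.0

    def infer(self, x):
        self.state += x    # the state accumulates across calls
        return self.state  # the output depends on the accumulated state


# Each independent sequence gets its own request, so states stay separate.
req_a, req_b = InferRequestSim(), InferRequestSim()
out_a = [req_a.infer(x) for x in [1.0, 2.0, 3.0]]  # running sums of sequence A
out_b = [req_b.infer(x) for x in [10.0, 20.0]]     # running sums of sequence B
```

If both sequences were pushed through the same request, each call would see the other sequence's accumulated state, which is exactly why sharing one request across sequences is discouraged.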
The member functions GetState(), Reset(), and SetState() might be helpful for your situation:
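As a rough illustration of how QueryState and those member functions fit together, here is a Python mock that mirrors the C++ names from the documentation; it is a sketch of the calling pattern, not real OpenVINO code.

```python
class VariableStateSim:
    """Mock of the IVariableState interface: one named, persistent value."""

    def __init__(self, name, value=0.0):
        self.name = name
        self.value = value

    def GetName(self):
        return self.name

    def GetState(self):          # read the current value between inferences
        return self.value

    def SetState(self, value):   # overwrite the value between inferences
        self.value = value

    def Reset(self):             # return the state to its initial value
        self.value = 0.0


class RequestSim:
    """Mock request exposing QueryState(), like InferRequest::QueryState."""

    def __init__(self):
        # Hypothetical state name; a real network lists its own variables.
        self._states = [VariableStateSim("last_conv_state")]

    def QueryState(self):
        return self._states

    def Infer(self, x):
        s = self._states[0]
        s.SetState(s.GetState() + x)  # state persists across Infer calls
        return s.GetState()


req = RequestSim()
req.Infer(1.0)
req.Infer(2.0)
state = req.QueryState()[0]   # inspect the state between inferences
state.SetState(10.0)          # inject a new value before the next Infer
req.Infer(5.0)                # continues from the injected value
state.Reset()                 # back to the initial value for a new sequence
```

The workflow to note: run Infer as usual, then use QueryState between inferences only when you need to inspect, override, or reset a state.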
My understanding was that GetState/SetState work as long as there are ReadValue/Assign ops in the proper places. And if that's the case, I do not need GetState/SetState anyway, since the state will be saved automatically.
Are you saying that GetState/SetState work on any node, without ReadValue/Assign ops present in IR?
Thank you for your patience. Based on the documentation:
- GetState/SetState -> return the current value of a state / set a new value for a state
- ReadValue/Assign -> ops in the IR that return/assign the state value during inference
It is possible for GetState/SetState to work on any node without ReadValue/Assign operations present in the IR. In the case of the low-latency transformation, the transformation itself does not involve the state; use ReadValue/Assign to interact with the value instead.
This thread will no longer be monitored since we have provided references and recommendations. If you need any additional information from Intel, please submit a new question.