I'm using the person-detection-action-recognition-0006 model with a Neural Compute Stick 2 and a Raspberry Pi 4. I'm able to load the model on the NCS and to get the results from the output branches, but I do not know how to use them to compute the bounding boxes and the label of the recognized action. I'm using Python and I'm not fluent in C++, so the Smart Classroom demo is not of any help. Can someone help me or point me in the right direction?
I'm attaching the code I'm using to load the model and to get the outputs.
Thanks in Advance
May I know which OpenVINO sample application you are using with this person-detection-action-recognition-0006 model?
I suggest you try this out with your model: https://www.youtube.com/watch?v=OKH57mvO9k0
Good morning and thanks for the quick response. I'm not using the model inside one of the provided sample applications. I need to run person-detection-action-recognition-0006 on a Raspberry Pi 4, so I installed the OpenVINO toolkit on it. I have already used the person and vehicle detection model in Python, but its output description was very straightforward, so I had no problems interpreting the results. I need to detect people inside a video stream and classify the action: "sitting, standing, raising hand", etc. Unfortunately, while I'm able to get the results, I need help interpreting them, because they are organized in branches.
Hope someone can help me.
Thanks in Advance
I'm sorry for the double answer. I also tried to use the object_detection_demo_ssd_async.py provided in the Python demos folder. In models.lst there is an entry person-detection-????. I thought that person-detection-action-recognition-0006 could be used, but when I try to run the script an error occurs saying: "unsupported model outputs".
In addition to that,
You need to train the model to recognize the actions that you expect as outcomes.
Hence, you need to feed the model with appropriate images.
Here is some idea on how you can do that: https://www.youtube.com/watch?v=ACtMs0ETpBg
Once you feed the model with appropriate images, you may get the output as here: https://docs.openvinotoolkit.org/2020.3/_models_intel_person_detection_action_recognition_0006_description_person_detection_action_recognition_0006.html
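The detection branches of this model follow the usual SSD layout (a "loc" output with per-prior box deltas and a "conf" output with per-prior class scores), so the general decoding step can be sketched in plain NumPy. This is a hedged sketch, not the demo's exact code: the variance values and the (cx, cy, w, h) prior encoding are assumptions based on typical SSD heads, and the authoritative decoding lives in the C++ Smart Classroom demo.

```python
import numpy as np

def decode_box(prior, delta, variances=(0.1, 0.1, 0.2, 0.2)):
    """Turn one prior box plus predicted deltas into a normalized box.

    prior:  (cx, cy, w, h) of the anchor, normalized to [0, 1]
            (assumed encoding; check your model's prior generation)
    delta:  the 4 raw values from the detection-loc output for this prior
    Returns (xmin, ymin, xmax, ymax), normalized to [0, 1].
    """
    pcx, pcy, pw, ph = prior
    dx, dy, dw, dh = delta
    # Shift the prior center by the predicted offsets, scaled by variances.
    cx = pcx + dx * variances[0] * pw
    cy = pcy + dy * variances[1] * ph
    # Scale the prior size by the exponentiated size deltas.
    w = pw * np.exp(dw * variances[2])
    h = ph * np.exp(dh * variances[3])
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def person_confidence(conf_pair):
    """Softmax over the 2-class (background, person) confidence pair."""
    e = np.exp(conf_pair - np.max(conf_pair))
    probs = e / e.sum()
    return probs[1]  # probability of the "person" class
```

You would iterate over all priors, decode each box, and keep those whose person confidence exceeds a threshold before drawing them on the frame.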
Thanks for the effort
I'm already using the person-detection-action-recognition-0006 pre-trained model. I have loaded it onto the NCS and managed to get the results for a test picture I use as input. The problem is that I do not understand how to compute the bounding boxes and the classes from those results.
The OpenVINO samples usually do this automatically.
When inference is done, the application outputs data to the standard output stream and creates an output image with bounding boxes drawn atop the initial image.
The only example that uses this model is the Smart Classroom demo, which is C++ only and also uses other pre-trained models. Unfortunately, I'm not fluent in C++ and I have to use this model with the Python API. For this reason, I'm asking if there is some example/tutorial for interpreting the results of person-detection-action-recognition-0006 in Python, or if someone can point me in the right direction so I can do it myself.
Thanks in advance
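After decoding and thresholding the detections, the usual last step before drawing bounding boxes is non-maximum suppression, which can also be done in plain NumPy. This is a hedged sketch of generic greedy NMS; the IoU threshold is illustrative, not the Smart Classroom demo's exact value.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.45):
    """Greedily keep the highest-scoring boxes, dropping overlaps.

    Returns the indices of the boxes to keep.
    """
    order = np.argsort(scores)[::-1]  # indices sorted by score, descending
    keep = []
    while len(order) > 0:
        i = order[0]
        keep.append(int(i))
        # Discard remaining boxes that overlap the kept one too much.
        order = np.array([j for j in order[1:]
                          if iou(boxes[i], boxes[j]) < iou_thresh])
    return keep
```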
Currently, we only have the Smart Classroom demo, which is written in C++: https://github.com/openvinotoolkit/open_model_zoo/tree/master/demos/smart_classroom_demo
And unfortunately so far it has not been ported to Python.
Alternatively, you could take a look at a similar, yet different, action recognition algorithm implemented in two steps (and thus two models, an encoder and a decoder) with a Python application:
Action Recognition demo - https://github.com/openvinotoolkit/open_model_zoo/tree/master/demos/python_demos/action_recognition
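For the action classification half of the original question, the final step for each detected person is a softmax over that person's matched cell in one of the model's action-head outputs. This is a hedged sketch: the class list and the [num_actions, H, W] head layout are assumptions to illustrate the idea; verify both against the model description page for your model version.

```python
import numpy as np

# Hypothetical action classes, for illustration only.
ACTIONS = ["sitting", "standing", "raising hand"]

def action_label(head_logits, row, col):
    """Softmax over the class axis at the anchor cell (row, col).

    head_logits: array of shape [num_actions, H, W] from one action head
                 (assumed layout)
    Returns (label, probability) for the most likely action.
    """
    logits = head_logits[:, row, col]
    e = np.exp(logits - logits.max())  # numerically stable softmax
    probs = e / e.sum()
    idx = int(np.argmax(probs))
    return ACTIONS[idx], float(probs[idx])
```

The (row, col) cell would come from whichever prior produced the detection, which is why the detection and action branches have to be decoded together, as the Smart Classroom demo does in C++.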
Intel will no longer monitor this thread since we have provided a solution. If you need any additional information from Intel, please submit a new question.