OpenVINO multi-target re-identification threshold limitation that we need to overcome

Duggy · ‎10-10-2021

Hi,

We are having an "issue" with the re-identification model in OpenVINO used in the multi-target, multi-camera demo. This demo was not meant to be deployed to re-identify hundreds of people (understandable). It has a threshold parameter which resets after X seconds. If the tracker object (for re-identification) gets "full", due to a large number of people or a high threshold parameter, then the FPS drops or the entire application slows to a halt.

Understandably, this demo was not meant to run in production for hundreds of people. However, this is what we are looking to achieve.

What we are looking for are suggestions, assistance, or developers who can solve this re-ID issue so that it works for hundreds of people, whether that means upgrading the current model or creating a new one. This does not have to happen in real time; it can happen in parallel or "after hours". All our attempts so far, including looking for solutions on sites like Fiverr, have come up short.

Much appreciated.

Vladimir_Dudnik · ‎10-11-2021

@Duggy the problem is not the inference time of the re-identification network; it is a relatively lightweight network. The network itself does not re-identify objects, it just outputs a feature vector. With hundreds of such vectors at each frame (for hundreds of tracked objects), you have to compute the cosine distance between each vector from the current frame and each vector from the previous frame. The pairs with the minimal cosine distance between the current and the previous frame most probably belong to the same object. That is just a lot of computation besides network inference.
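To make the cost of that pairwise comparison concrete, here is a minimal NumPy sketch. It is not the demo's code: the function names, the 256-dimensional feature size, the 0.5 distance threshold and the simple nearest-neighbour assignment are all assumptions chosen for illustration.

```python
import numpy as np

def cosine_distance_matrix(current, previous):
    """Return an (N, M) matrix of cosine distances between the rows of two arrays."""
    # Scale every feature vector to unit length so a dot product equals the cosine similarity.
    cur = current / np.linalg.norm(current, axis=1, keepdims=True)
    prev = previous / np.linalg.norm(previous, axis=1, keepdims=True)
    return 1.0 - cur @ prev.T

def nearest_previous(current, previous, max_distance=0.5):
    """For each current vector, return the index of the closest previous vector.

    Returns -1 where even the closest vector is further away than max_distance
    (an assumed threshold, not a value taken from the demo).
    """
    dist = cosine_distance_matrix(current, previous)
    best = dist.argmin(axis=1)
    best_dist = dist[np.arange(len(current)), best]
    return np.where(best_dist <= max_distance, best, -1)

# Synthetic data: 300 objects with 256-dimensional features (both numbers are assumptions).
rng = np.random.default_rng(0)
previous = rng.normal(size=(300, 256))
perm = rng.permutation(300)                       # objects appear in a different order
current = previous[perm] + 0.01 * rng.normal(size=(300, 256))

matches = nearest_previous(current, previous)     # matches[i] is the previous-frame index
```

With 300 objects in both frames the distance matrix already has 300 x 300 = 90,000 entries per frame, and it grows with the product of the two counts, which is why the matching step, rather than the network, becomes the bottleneck.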

Duggy · ‎10-12-2021

Thank you Vladimir,

So just checking I understand this correctly.

For each frame (current and previous), feature vectors are generated for all objects; this is the lightweight process. Then the cosine distance between each pair of vectors is calculated (which basically measures how similar the vectors are to each other). If they are similar enough (within some threshold), they are said to be the same object. This holds regardless of whether it is done on the same camera or on another camera; the system is oblivious to this. A vector is a vector, from the current or the previous frame, from the same or a new camera (and that is why we are able to track across cameras).

Is this correct?

The tracker then keeps the feature vectors in memory, and anyone entering the frame would be compared not only to the previous frame but also to the tracker history? So if someone left the camera but returned 30 minutes later, they might still be in the tracker memory and then receive their old ID?

There might also be an issue in that the vector for a given frame might be off (due to some occlusion, lighting condition or angle), so that it is not similar enough to the existing vectors of the same object and is therefore retained separately. We might then have a lot of false vectors in the tracker, which again would be used in the calculations and slow the system down. However, as both the tracker and the previous frame are compared against (I am guessing this is the case), the ID would be assigned based on the cosine distance from either: if a tracker vector (from a couple of frames ago) is more similar, the current object's ID is reassigned to the historic ID; if the previous frame's vector is more similar, the current ID is maintained.

And this is why we are seeing IDs "flip" or "revert" to previous numbers in part of the journey: the tracker vector was more similar to the object's new vector. On top of this, if the ID is flipped back to the historic one (from the tracker), then the path of the incorrect ID is "fixed" or cleaned up and assigned to the correct ID.

Is all of the above a correct understanding of the process?

If all of the above is a correct understanding of the process, then it makes sense that the demo halts after a while, depending on the tracker threshold as well as the number of people in camera view during the day.

Would you recommend that we remove the tracker from memory, store each vector from each frame in a database, and then run the cosine distance process in parallel or "after hours" on the day's vectors in the database? It is just as much work, but independent of the main demo, which would keep populating the database with vectors as it runs. Would this be a viable way to remove the clearance threshold on the tracker and the limitations it causes for a high number of people and multiple cameras?
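As a rough illustration of the idea (a sketch only, not something taken from the demo), the live process could write each vector to SQLite and a separate job could group the per-camera tracks afterwards. The table layout, the function names, the 0.3 threshold and the choice to average vectors per (camera, track) before comparing are all assumptions.

```python
import sqlite3
import numpy as np

def open_store(path=":memory:"):
    """Open (or create) the embedding store. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS embeddings ("
        "id INTEGER PRIMARY KEY, camera INTEGER, frame INTEGER, track INTEGER, vec BLOB)"
    )
    return conn

def save_embedding(conn, camera, frame, track, vec):
    """Called by the live process: store one feature vector as raw float32 bytes."""
    blob = np.asarray(vec, dtype=np.float32).tobytes()
    conn.execute(
        "INSERT INTO embeddings (camera, frame, track, vec) VALUES (?, ?, ?, ?)",
        (camera, frame, track, blob),
    )

def merge_tracks_offline(conn, max_distance=0.3):
    """Offline job: map every (camera, track) pair to a shared group number."""
    rows = conn.execute("SELECT camera, track, vec FROM embeddings ORDER BY id").fetchall()
    keys = [(camera, track) for camera, track, _ in rows]
    vecs = np.stack([np.frombuffer(blob, dtype=np.float32) for _, _, blob in rows])

    # One mean vector per (camera, track) keeps the distance matrix small.
    unique = sorted(set(keys))
    means = np.stack([vecs[np.array([k == u for k in keys])].mean(axis=0) for u in unique])
    means = means / np.linalg.norm(means, axis=1, keepdims=True)
    dist = 1.0 - means @ means.T

    # Union-find: tracks closer than max_distance end up in the same group.
    parent = list(range(len(unique)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(unique)):
        for j in range(i + 1, len(unique)):
            if dist[i, j] <= max_distance:
                parent[find(j)] = find(i)
    return {u: find(i) for i, u in enumerate(unique)}

# Synthetic example: person A is seen by cameras 0 and 1, person B only by camera 1.
rng = np.random.default_rng(1)
person_a, person_b = rng.normal(size=(2, 256))
conn = open_store()
for frame in range(5):
    save_embedding(conn, 0, frame, 1, person_a + 0.05 * rng.normal(size=256))
    save_embedding(conn, 1, frame, 7, person_a + 0.05 * rng.normal(size=256))
    save_embedding(conn, 1, frame, 8, person_b + 0.05 * rng.normal(size=256))
conn.commit()
global_ids = merge_tracks_offline(conn)
```

Averaging per track first means the offline comparison scales with the number of tracks rather than the number of stored frames, at the cost of blurring tracks that already contain an ID switch.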

Apologies for the long response, needed to get clarity, thanks for your patience.

vsovrasov · ‎10-22-2021

Hi! The overall algorithm of the mtmc is described in the paper: https://arxiv.org/pdf/1906.01357.pdf

In most cases slow processing is caused by a large number of hidden tracks at the detections->track and multicamera association steps.
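One generic way to keep that number bounded is to drop hidden tracks by age and cap how many are kept. The sketch below is only an illustration: `HiddenTrack`, `prune_hidden_tracks`, `max_age_s` and `max_tracks` are hypothetical names and values, not parameters of the demo.

```python
from dataclasses import dataclass

@dataclass
class HiddenTrack:
    track_id: int
    last_seen: float  # timestamp in seconds when the track was last matched

def prune_hidden_tracks(tracks, now, max_age_s=300.0, max_tracks=200):
    """Keep only recent hidden tracks, newest first, and at most max_tracks of them."""
    fresh = [t for t in tracks if now - t.last_seen <= max_age_s]
    fresh.sort(key=lambda t: t.last_seen, reverse=True)
    return fresh[:max_tracks]

# 1000 hidden tracks, last seen at t = 0 .. 999 seconds; prune at t = 1000.
tracks = [HiddenTrack(track_id=i, last_seen=float(i)) for i in range(1000)]
kept = prune_hidden_tracks(tracks, now=1000.0)
```

The trade-off is that a person who returns after `max_age_s` has passed can no longer receive their old ID, so the limits have to be balanced against how long re-identification is expected to work.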

Vladimir_Dudnik · ‎10-14-2021

@Duggy I have redirected your question to the relevant experts; I hope they can provide some comments/explanation here soon.

Regards,
Vladimir

IntelSupport · ‎11-03-2021

Hi Dug Mcgee,

This thread will no longer be monitored since we have provided the information needed. If you need any additional information from Intel, please submit a new question.

Regards,

Aznie
