Hi, I realized that Intel provided some trained models of person re-id. I would like to ask if any of these can be provided:
1) Sample code (Python preferably) for running inference with the provided "person-reidentification-retail-0079" model?
2) A script for training, in case I plan to retrain the network on my own image dataset?
** I also noticed that on the model's page, it is mentioned that, quote:
The net outputs a blob with shape: [1, 256, 1, 1] named descriptor, which can be compared with other descriptors using the Cosine distance.
This might be beyond my level, but any hints/guides/reading materials on how to use/manipulate the net's output is greatly appreciated.
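On the descriptor question: the [1, 256, 1, 1] blob is just a 256-element float vector once flattened, and cosine distance is a standard formula. A minimal pure-Python sketch, with toy 4-d vectors standing in for the real 256-d descriptors:

```python
import math

def cosine_distance(a, b):
    """Cosine distance between two equal-length descriptor vectors.

    Values near 0.0 mean the descriptors point the same way (likely the
    same person); values near 1.0 mean they are dissimilar.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# The model emits a [1, 256, 1, 1] blob; flatten it to a list of 256
# floats before comparing. Toy 4-d vectors are used here instead.
d1 = [0.1, 0.9, 0.3, 0.5]
d2 = [0.1, 0.9, 0.3, 0.5]
d3 = [0.9, -0.1, 0.5, -0.3]
print(cosine_distance(d1, d2))  # ~0.0: identical descriptors
print(cosine_distance(d1, d3))  # 1.0: orthogonal descriptors
```

In practice you would pick a distance threshold by experiment; anything below it is treated as the same identity.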
You can use this model with the Crossroad Camera Sample; unfortunately it is only available in C++.
We do not provide a script for re-training this model.
I'd be interested in a Python example as well, because I'd like to track objects.
Since this model doesn't have a retraining script, I've been trying to figure out how to take a ReID network that can be trained (there are several on GitHub) and run it.
I went through the C++ code, and from what I can tell it basically does detections, then takes the results and sends them into tracker.cpp.
I'm struggling to grasp what the tracker is doing, but from what I can tell it takes a detection, reshapes it, and sends it through the ReID network to get a descriptor vector for that detection.
Then it runs several tests, trajectory affinity being one and descriptor distance another, to compute a confidence that it's the same object.
As far as I can tell, once the tracker is confident enough, it updates the tracked object and moves onto the next detection. Each tracked object has a timestamp, a vector, a trajectory, etc. On the next round of detections, it repeats the process.
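Here is my reading of that matching step as a simplified pure-Python sketch. The real tracker also weighs trajectory and timestamps; the greedy assignment, the similarity threshold, and the function names below are my own assumptions, not the sample's actual code:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two descriptor vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def match_detections(tracks, detections, threshold=0.7):
    """Greedily assign each new detection descriptor to the most similar
    existing track; unmatched detections start new tracks.

    `tracks` is a dict {track_id: descriptor}; `detections` is a list of
    descriptors. Returns (detection_index, track_id) pairs, allocating
    new ids for detections no track matched above the threshold.
    """
    assignments = []
    used = set()
    next_id = max(tracks, default=-1) + 1
    for i, det in enumerate(detections):
        best_id, best_sim = None, threshold
        for tid, desc in tracks.items():
            if tid in used:
                continue
            sim = cosine_similarity(det, desc)
            if sim > best_sim:
                best_id, best_sim = tid, sim
        if best_id is None:           # nothing similar enough: new object
            best_id = next_id
            next_id += 1
        used.add(best_id)
        tracks[best_id] = det         # update the track's stored descriptor
        assignments.append((i, best_id))
    return assignments

# Toy 3-d descriptors standing in for the 256-d ReID vectors.
tracks = {0: [1.0, 0.0, 0.0], 1: [0.0, 1.0, 0.0]}
dets = [[0.9, 0.1, 0.0],   # close to track 0
        [0.0, 0.0, 1.0]]   # unlike either track, so it becomes track 2
print(match_detections(tracks, dets))  # -> [(0, 0), (1, 2)]
```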
What I'm wondering is whether it would be possible to build a network that takes two 64x64 objects, combines them into a single 128x64 input, and then, with its weights and rules, first breaks the two images into vectors, weighs those vectors in the next layer, and finally produces a confidence array, or even a single value.
Something like: input layer -> two perceptron arrays (splitting the inputs 50/50) -> a weighted hidden layer that combines the two perceptron arrays -> a weighted hidden layer that weighs the blend -> a confidence array.
I'd be very interested to hear if that's not how NNs work. It'd help me gauge my understanding of NN and the challenges with ReID.
Now, I have *zero* idea how to build this (yet), but based on my understanding I think such a network could be done. I think it'd have to have the first layer divided into two groups of neurons that are each weighted to only half the inputs, with the next layers weighted across all of them.
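To make the idea concrete, here is a runnable toy forward pass with exactly that shape: a split first layer where each neuron group sees only one image, then fully connected blend layers down to one confidence value. All the weights are made-up constants rather than trained values, and the tiny 2-value "images" just stand in for 64x64 crops:

```python
import math

def dense(inputs, weights, biases):
    """One fully connected layer with a tanh activation."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def two_branch_forward(image_a, image_b):
    """Split first layer: one neuron group sees only image_a, the other
    only image_b; later layers are fully connected across both branches
    and blend everything down to a single confidence in (0, 1).
    """
    # Branch layers: each group is weighted to only half the input,
    # and both branches share the same (toy) weights.
    branch_a = dense(image_a, [[0.5, -0.2], [0.1, 0.3]], [0.0, 0.0])
    branch_b = dense(image_b, [[0.5, -0.2], [0.1, 0.3]], [0.0, 0.0])
    # Blend layer: fully connected to the concatenated branch outputs.
    # These weights are antisymmetric between branches, so identical
    # images cancel out to zero (a hand-built "difference" comparison).
    blended = dense(branch_a + branch_b,
                    [[0.4, 0.4, -0.4, -0.4], [-0.2, 0.2, 0.2, -0.2]],
                    [0.0, 0.0])
    # Final weighted sum squashed to a single confidence via a sigmoid.
    z = 0.8 * blended[0] + 0.8 * blended[1]
    return 1.0 / (1.0 + math.exp(-z))

# Identical toy "images" cancel in the blend layer, giving exactly 0.5;
# different images push the confidence away from 0.5.
same = two_branch_forward([0.2, 0.7], [0.2, 0.7])
diff = two_branch_forward([0.2, 0.7], [0.9, 0.1])
print(same, diff)
```

A trained network would learn these weights from pairs labeled same/different instead of having them hand-picked, which is essentially the Siamese-network setup used in several of the GitHub ReID projects.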
Dlib has a CNN for face ReID, but I've not looked at it. However, if Dlib can do ReID with a CNN, it would be awesome if Intel contributed the ability to plug MYRIAD processing for object ReID into the Dlib library. That way, the highly optimized and powerful Dlib library could be used with a 2nd NCS to speed up computer vision at the edge.
Object detection is handy at the edge, but I've found tracking frame to frame and handling occlusion and overlapping of detections to be very difficult.
One project you might look at, Loke, is SORT (here: https://github.com/abewley/sort ).
Overall, at the end of the day, ReID from frame to frame is computationally expensive, so I feel your pain.