
What is the best way to run people tracking on a D415 and extract the X, Y, Z coordinates of the approximate center of the moving pixels?

DOliv12
Beginner

I'm running a D415 on an UP2 board (Ubuntu 16.04 LTS), and it seems to be operating perfectly. Is there a way to extract such coordinates from the Viewer, or any example code I can start with?

12 Replies
MartyG
Honored Contributor III

The best body tracking system currently available that is compatible with the D415 is the Nuitrack SDK.

 

https://www.youtube.com/watch?v=gMPtV4NXtUo

 

Nuitrack also provides a program that can publish tracking information to ROS (the Robot Operating System).

 

https://github.com/shinselrobots/nuitrack_body_tracker

 

If you would like to go deeper into body tracking with RealSense, Intel recently published an online seminar video on the subject.

 

https://realsense.intel.com/deep-learning-for-vr-ar/

 

DOliv12
Beginner

Hi Marty

 

Thanks for the quick response and suggestions!

I had already watched that recorded presentation by Philip Krejov. Very interesting indeed.

Also, Nuitrack is really a very useful API for skeleton tracking. It's impressive.

However, my intention is (I think...) something much simpler: I'm not interested in the skeleton movements and poses, but rather just the coordinates of the center of each person captured by the camera. I need those coordinates to be realigned to a model's (global) X, Y, Z coordinates.

I think there's a function in SDK 2.0 that does this, right?

But since I'm not that fluent in C++, I'm wondering if there's sample code that could track people's X, Y, Z coordinates and make them available in an XML file or something similar (?)

MartyG
Honored Contributor III

It sounds as though the kind of application you are seeking to make is similar in principle to the SDK's object detection sample program. In this sample, a highlighter box is placed around a recognized object or creature (e.g. a dog). In the bottom corner of the application, the X and Y coordinates are displayed, and in the center of the application's window, the distance (Z) is displayed.

 

https://github.com/IntelRealSense/librealsense/tree/master/wrappers/opencv/dnn
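(As a rough Python illustration of reading the Z value at the center of such a box, here is a minimal sketch using the pyrealsense2 wrapper; the bounding box numbers are hypothetical placeholders standing in for a detector's output.)

```python
# Minimal sketch: read the depth (Z) at the center of a detected box.
# Assumes pyrealsense2; the bounding box values are hypothetical
# placeholders standing in for a detector's output.
import pyrealsense2 as rs

pipeline = rs.pipeline()
pipeline.start()  # default configuration includes the depth stream

try:
    frames = pipeline.wait_for_frames()
    depth_frame = frames.get_depth_frame()

    # Hypothetical bounding box (left, top, right, bottom) in pixels.
    x1, y1, x2, y2 = 200, 100, 320, 380
    cx, cy = (x1 + x2) // 2, (y1 + y2) // 2

    # get_distance() returns the depth at that pixel, in meters.
    z = depth_frame.get_distance(cx, cy)
    print("Box center (%d, %d) is %.2f m from the camera" % (cx, cy, z))
finally:
    pipeline.stop()
```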

 

If you want to get the coordinates of the center of a moving person rather than just the center of the overall image, object detection seems like a possible way forward.

 

If you would prefer a C++ based solution for getting the world coordinates, the post linked to below discusses the subject in detail and provides scripting.

 

https://github.com/IntelRealSense/librealsense/issues/1904

DOliv12
Beginner

Yes, I think this is the way to go...

Is this the one that can catch more than one element at a time?

In my case I need to track as many people as enter the field of view of the RealSense...

Again, thanks for your prompt and technically accurate responses, Marty!

MartyG
Honored Contributor III

Apologies for the delay in responding; I was not sent an email notification of your new comment.

While researching another case today that required multi-person tracking, I came across a two-person demo video using the Nuitrack SDK.

 

https://www.youtube.com/watch?v=HOm0-7qL5hk

DOliv12
Beginner

No problem, Marty!

Wow... this Nuitrack is really awesome...

But returning to my situation (of just tracking the center coordinates of a person captured by the camera): if I use the object / person tracking incorporated into RealSense SDK 2.0, can it track multiple people simultaneously? If so, what limit should I consider in terms of the number of people tracked?

And also, what is the maximum distance from the camera?

Again... thank you for your clarifications!

MartyG
Honored Contributor III

The RealSense 400 Series camera models can depth-sense up to 10 meters, though depth measuring accuracy starts to drift noticeably beyond around 3 meters from the camera.

 

I would say that the number of people that can be tracked will depend on the size of the camera's field of view (i.e. how many people you can fit into one camera's view). For tracking multiple people, the D435 may be a better option, as it has a wider field of view than the D415 model.

 

You can, though, expand the total field of view by placing multiple cameras in an arrangement where their views overlap. At an Intel event at the Sundance festival in January 2018, Intel had a demo booth where they could view three people at once with a four-camera arrangement of hardware-synced D435s that gave 180 degree coverage. Doubling the number of cameras to eight would give 360 degree coverage.

 

https://realsense.intel.com/intel-realsense-volumetric-capture/

 

360 degree coverage has been done with just six cameras, but the fewer cameras you use, the more potential there is for blind-spot areas in the data. Conversely, the more cameras you add to the arrangement, the more robust the captured data is, because blind spots are minimized and there is redundancy in the data from more than one camera covering the same area.

 

Multiple cameras can be connected to a single PC as long as you have enough USB ports. You can connect up to five USB hubs in a chain on one PC (a setup known as '5 deep'), but performance will be sub-optimal because the ports on a hub share USB controller hardware. For optimum performance, the ideal is to connect the cameras directly to USB ports on the PC, as each port should then have its own dedicated USB controller.

 

Intel's 2018 'NUC 8 VR' mini-PC is one of the best PC models for this, due to its powerful spec, very small size, and large number of USB 3 ports.

 

https://forums.intel.com/s/question/0D50P0000490UxxSAE/the-new-nuc-8-mini-pc-and-its-multiple-usb-30-ports-suitable-for-multiple-realsense-cameras?language=en_US

 

For Intel's Sundance demo, they dedicated one PC to each of the four D435 cameras and automatically sent the captured data to a fifth PC for post-processing.

DOliv12
Beginner

Thanks for this very comprehensive answer, Marty!

It clears up the coverage and simultaneous people tracking questions.

But just checking that I got the point: using SDK 2.0, I could use a function that helps me track people, either via the OpenCV wrapper or the SDK directly. Which function would be most appropriate, considering that I'm tracking just a single coordinate for the movement of each person?

Also, reading the https://github.com/IntelRealSense/librealsense/issues/1904 discussion, it was not clear to me what the final solution was for getting the world coordinates.

 

 

 

MartyG
Honored Contributor III

Getting world coordinates in RealSense SDK 2.0 is usually done with the rs2_deproject_pixel_to_point function.

The link below has a discussion of this function.

 

https://github.com/IntelRealSense/librealsense/issues/1413
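(As a rough illustration of how that deprojection looks from Python — a sketch only, assuming the pyrealsense2 wrapper; the pixel coordinate is a hypothetical placeholder for a point of interest.)

```python
# Minimal sketch: deproject a depth pixel to a 3D point in camera space.
# Assumes pyrealsense2; the pixel (u, v) is a hypothetical placeholder
# for, e.g., the center of a detected person.
import pyrealsense2 as rs

pipeline = rs.pipeline()
pipeline.start()

try:
    frames = pipeline.wait_for_frames()
    depth_frame = frames.get_depth_frame()

    # Intrinsics describe the depth stream's projection model.
    intrin = depth_frame.profile.as_video_stream_profile().get_intrinsics()

    u, v = 320, 240                          # hypothetical pixel of interest
    depth = depth_frame.get_distance(u, v)   # depth at that pixel, in meters

    # Returns [X, Y, Z] in meters, relative to the depth sensor.
    point = rs.rs2_deproject_pixel_to_point(intrin, [u, v], depth)
    print("3D point:", point)
finally:
    pipeline.stop()
```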

 

I believe this is not what you really need though. You want to get a coordinate at the center of a person, and that is always changing if the person is moving around.

 

For that reason, the object recognition examples, which put a general bounding box around detected objects / people that may be moving, are probably the best solution. Perhaps you could get the coordinates of the center of the bounding box as your coordinate.

 

https://github.com/IntelRealSense/librealsense/issues/2016

DOliv12
Beginner

Thanks, Marty.

But reviewing all these possibilities... I'm not really interested in recognizing what the object is (the classification), since in 99% if not 100% of cases, every group of moving pixels will represent one or more people walking around. So instead of spending processing time on a classification to check whether it's a person, a dog, or anything else, it would be enough for me to take groups of pixels moving at the same velocity, consider each group a person, and get the center X, Y, Z coordinates of those groups. What would you suggest for that? Does the OpenCV wrapper have some algorithm that could do that (group pixels with approximately the same depth and the same velocity)? Or maybe the SDK directly?

MartyG
Honored Contributor III

Using some form of object detection may be inevitable, not because you want to recognize the object, but because it is a practical way of tracking the centers of multiple points of interest instead of just getting a single center coordinate for the overall image. You would not necessarily have to train it to recognize every kind of object, just to learn roughly what a human looks like.

 

If you wish to try your velocity idea, though, I imagine that to calculate the velocity of pixels you would have to continuously measure the coordinates of the pixels over time and track the coordinates that are changing position above a certain rate per second (their velocity). This could be quite processing-intensive unless you could narrow down which areas of the image need to be analyzed.

 

If you think that people are likely to be in one section of the image more than others, though, you can define a Region Of Interest (ROI). If auto-exposure is enabled, the SDK will average the intensity of all the pixels inside the ROI and try to maintain this value at a predefined setpoint.
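(A minimal sketch of setting an ROI from Python, under the assumption that the pyrealsense2 wrapper exposes the ROI interface as rs.roi_sensor; this casting style may vary between SDK versions.)

```python
# Minimal sketch: restrict auto-exposure to a Region Of Interest.
# Hedged assumption: pyrealsense2 exposes the ROI interface via
# rs.roi_sensor(sensor); this may vary between SDK versions.
import pyrealsense2 as rs

pipeline = rs.pipeline()
profile = pipeline.start()

color_sensor = profile.get_device().first_color_sensor()
roi_sensor = rs.roi_sensor(color_sensor)   # cast to the ROI interface

roi = rs.region_of_interest()
# Hypothetical region: where people are most likely to appear.
roi.min_x, roi.min_y = 100, 100
roi.max_x, roi.max_y = 540, 380
roi_sensor.set_region_of_interest(roi)

pipeline.stop()
```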

 

The way you describe pixels moving in a group, and your willingness to use OpenCV, make OpenCV's built-in 'pedestrian detection' algorithm an option to consider. It uses a pre-trained HOG (Histogram of Oriented Gradients) model. The first link below is a tutorial for it; the second is a guide to HOG.

 

https://www.pyimagesearch.com/2015/11/09/pedestrian-detection-opencv/

 

https://www.pyimagesearch.com/2014/11/10/histogram-oriented-gradients-object-detection/
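(For reference, a minimal sketch of that detector; these are standard OpenCV calls, with 'frame.jpg' as a hypothetical input image.)

```python
# Minimal sketch: OpenCV's pre-trained HOG pedestrian detector.
# Standard OpenCV calls only; 'frame.jpg' is a hypothetical input image.
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

image = cv2.imread("frame.jpg")

# Returns one (x, y, w, h) rectangle per detected person.
rects, weights = hog.detectMultiScale(image, winStride=(4, 4),
                                      padding=(8, 8), scale=1.05)

for (x, y, w, h) in rects:
    cx, cy = x + w // 2, y + h // 2   # center of the person in the image
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.circle(image, (cx, cy), 4, (0, 0, 255), -1)

cv2.imshow("Pedestrians", image)
cv2.waitKey(0)
```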

 

DOliv12
Beginner

Hi Marty

As usual you come up with some more great references and ideas! Thank you again!

This OpenCV pedestrian detection seems very appropriate for people tracking...

Just to check that I got the point: the idea would be to run this algorithm over the images from the RGB stream of my D415 and then, using the image plane coordinates from the pedestrian algorithm, extract the depth coordinate from the depth stream... is this correct?

Can I implement everything in Python, using the RealSense wrapper and this OpenCV algorithm?

I'm not fluent at all in C++, so I would really feel safer working with Python.

Thanks!
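(For what it's worth, a rough Python sketch of that full pipeline, assuming pyrealsense2 and OpenCV; the stream settings and detector parameters are illustrative, not tuned.)

```python
# Rough sketch of the pipeline discussed in this thread: HOG pedestrian
# detection on the D415's RGB stream, then a depth lookup and deprojection
# at each detection's center. Assumes pyrealsense2 and OpenCV.
import cv2
import numpy as np
import pyrealsense2 as rs

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipeline.start(config)

align = rs.align(rs.stream.color)  # map depth pixels onto the color image

try:
    while True:
        frames = align.process(pipeline.wait_for_frames())
        depth_frame = frames.get_depth_frame()
        color_frame = frames.get_color_frame()
        if not depth_frame or not color_frame:
            continue

        color = np.asanyarray(color_frame.get_data())
        rects, _ = hog.detectMultiScale(color, winStride=(4, 4), scale=1.05)

        intrin = color_frame.profile.as_video_stream_profile().get_intrinsics()
        for (x, y, w, h) in rects:
            cx, cy = x + w // 2, y + h // 2       # center of the person
            z = depth_frame.get_distance(cx, cy)  # meters
            # X, Y, Z in meters, in the color camera's coordinate system.
            X, Y, Z = rs.rs2_deproject_pixel_to_point(intrin, [cx, cy], z)
            print("Person center: X=%.2f Y=%.2f Z=%.2f" % (X, Y, Z))
            cv2.rectangle(color, (x, y), (x + w, y + h), (0, 255, 0), 2)

        cv2.imshow("People", color)
        if cv2.waitKey(1) == 27:  # Esc to quit
            break
finally:
    pipeline.stop()
```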

 
