Object Localization result does not provide depth information.

Hello everyone,


I've been trying to use the object localization feature with:

- recognition mode set to "PROPOSAL_ONLY"

- localization mechanism set to "LM_EDGE_BOXES"

and the R200 consistently gives me between

10 and 14 results for most frames.


None of the RecognizedObjectData results comes with

valid properties except 'roi',

i.e. the following properties are all filled with 0:

- boundingBox

- centerPos2D

- centerPos3D

- probability

(I guess 'label' is supposed to be always 0 in this mode)


Did I forget to set some property or to call some method

in order to make it work?


Because it is giving out so many candidates, it is very

important to have other information (especially depth

info contained in 'centerPos3D') to eliminate noise.


Could anybody tell me how I can achieve this?


# I tried setting 'SINGLE_RECOGNITION' mode, and

# label recognition is working fine.

# (I'm not satisfied with the accuracy, though.)


Thank you very much for your help.


Tosh Satake


4 Replies

Could you provide a screenshot and a picture of the real object? Thanks!



Here is the screen-shot of the test application.

  • It uses PROPOSAL_ONLY mode with the LM_EDGE_BOXES method.
    • As I already described, if I change the code to use SINGLE_RECOGNITION mode, it can recognize a mug, a chair, a keyboard and so on.
  • It hardly localizes real 3D objects, but it does pick up on-screen objects such as windows and icons on my PC monitor.
  • Either way, only the ROI has values; the other 2D/3D result properties are empty (filled with zeros).
  • It is typical to have more than 10 results on every frame.

Thank you



In PROPOSAL_ONLY mode the SDK only finds regions of interest in the image, without running a classifier, which means that there is no interpretation like “this is a chair” or “this is a sofa” and so on. Hence all the irrelevant fields are filled with zeros. Today there is no support for 3D objects in the OR MW.

In SINGLE_RECOGNITION mode the whole picture (or ROI) is assumed to contain one main object that occupies most of the picture (or ROI), and it is classified as that one object.


Thank you for the response. Yes, you are correct, I do not need label classification results. I only need to detect the object locations, because RealSense does not provide a way to re-train the classifier. I only need to extract ROIs so that I can run a separate classifier later in the process. The problem is that it does not give me the depth location. I want to make use of depth information so that I can easily limit the results to a certain range of distances from the camera; the real size of the detected object is essential information as well. The SDK documentation says that the R200 (edge boxes algorithm) utilizes depth information to locate objects, so I assumed the depth information would be contained in the detection results. My expectation is that, because depth information is the key strength of RealSense, users should be allowed to make full use of it. Are there plans to include 2D/3D information in the result object? Thanks
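
Until the result object carries depth, the filtering described above can be approximated on the application side by sampling the depth image under each ROI. The following is only a sketch and does not use any RealSense SDK call. It assumes the depth frame has already been aligned to the color image, is available as rows of millimeter values with 0 meaning "no data", and that each ROI has been copied into an (x, y, w, h) tuple. The function names and the focal lengths fx and fy (in pixels) are hypothetical placeholders.

```python
from statistics import median

def roi_median_depth(depth_mm, roi):
    """Median of the valid (non-zero) depth samples inside roi = (x, y, w, h)."""
    x, y, w, h = roi
    samples = [d for row in depth_mm[y:y + h] for d in row[x:x + w] if d > 0]
    return median(samples) if samples else None

def filter_rois_by_depth(depth_mm, rois, near_mm, far_mm):
    """Keep only ROIs whose median depth lies within [near_mm, far_mm]."""
    kept = []
    for roi in rois:
        z = roi_median_depth(depth_mm, roi)
        if z is not None and near_mm <= z <= far_mm:
            kept.append((roi, z))
    return kept

def roi_size_mm(roi, z_mm, fx, fy):
    """Approximate physical width and height from the pinhole camera model."""
    _, _, w, h = roi
    return (w * z_mm / fx, h * z_mm / fy)

# Tiny synthetic depth map (assumed units: mm, 0 = invalid sample).
depth = [
    [0, 0, 800, 800, 3000, 3000],
    [0, 0, 800, 800, 3000, 3000],
    [0, 0, 800,   0, 3000, 3000],
    [0, 0,   0,   0, 3000, 3000],
]
rois = [(2, 0, 2, 3), (4, 0, 2, 4), (0, 0, 2, 4)]

# Only the first ROI (median depth 800 mm) falls inside 0.5 m to 1.5 m.
near_objects = filter_rois_by_depth(depth, rois, near_mm=500, far_mm=1500)
```

The median is used rather than the mean so that a few background pixels inside the box do not pull the estimate away from the object. ROIs with no valid depth samples are dropped rather than guessed at.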