I'm trying to test my application with recorded data. However, when I run it twice, I get different results from the realsense SDK's face module.
Is this a bug? How can I make it deterministic?
This is the code:
#include <pxcsensemanager.h>
#include <iostream>

int main(int argc, char *argv[]) {
    // Create the SenseManager instance
    PXCSenseManager *sm = PXCSenseManager::CreateInstance();
    std::cout << sm->QueryCaptureManager()->SetFileName(L"C:/Users/rggjan/Videos/groundtruth/hard.rssdk", false) << std::endl;

    // Enable stream and Initialize
    sm->EnableStream(PXCCapture::STREAM_TYPE_COLOR, 0, 0);

    // Set realtime=false and pause=false
    sm->QueryCaptureManager()->SetRealtime(false);
    sm->QueryCaptureManager()->SetPause(false);

    // Enable face tracking
    std::cout << "Face: " << sm->EnableFace() << std::endl;

    // Initialize the pipeline
    std::cout << "Init: " << sm->Init() << std::endl;

    // Stream data
    while (sm->AcquireFrame(true) >= PXC_STATUS_NO_ERROR) {
        // retrieve the face tracking results
        PXCFaceModule *face = sm->QueryFace();
        if (face) {
            PXCFaceData *face_data = face->CreateOutput();
            face_data->Update();
            int num_faces = face_data->QueryNumberOfDetectedFaces();
            if (num_faces > 0) {
                for (int i = 0; i < num_faces; i++) {
                    PXCFaceData::Face *face = face_data->QueryFaceByIndex(i);
                    PXCFaceData::DetectionData *detection_data = face->QueryDetection();
                    float average_depth;
                    bool success = detection_data->QueryFaceAverageDepth(&average_depth);
                    std::cout << "depth: " << success << "->" << (success ? average_depth : 0) << std::endl;
                }
            }
            face_data->Release();
        }

        // Resume next frame processing
        sm->ReleaseFrame();
    }

    // Clean up
    sm->Release();
}
And two possible outputs:
Output 1:
0
Face: 0
Init: 0
depth: 1->478
depth: 1->460
depth: 1->458
depth: 0->0
depth: 0->0
depth: 0->0
depth: 1->450
depth: 0->0
depth: 1->451
depth: 0->0
depth: 0->0
depth: 0->0
depth: 0->0
depth: 1->453
depth: 1->454
depth: 1->435
depth: 1->442
depth: 1->470
depth: 1->446
...
Output 2:
0
Face: 0
Init: 0
depth: 0->0
depth: 0->0
depth: 1->454
depth: 0->0
depth: 0->0
depth: 0->0
depth: 0->0
depth: 1->451
depth: 1->453
depth: 1->453
depth: 0->0
depth: 0->0
depth: 1->453
depth: 1->454
depth: 1->435
depth: 1->442
depth: 1->470
depth: 1->446
depth: 1->469
...
You can see that on exactly the same frame (color + depth, loaded from disk), sometimes it detects a face and sometimes it doesn't... Could there be some algorithms that run a random number generator that depends on the time or something?
Hi Jan,
Many machine learning algorithms are not deterministic. The search space is generally very large, so they usually subsample it. This is similar to how, for example, particle filters work.
Only the more basic approaches to object detection are deterministic, but they usually don't work very well in real life scenarios.
So, you should deal with this in your application. Also, treating a single detection as 100% true may not be the best route either. Consider requiring a positive detection over at least a few frames before accepting it.
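One way to apply the "few frames" advice is to accept a face only after several consecutive positive frames. The following is a minimal sketch under that assumption; `DetectionDebouncer` and `debounce` are hypothetical helpers, not part of the RealSense SDK, and the number of required frames is a value you would tune yourself.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not an SDK class): reports a face as present only
// after `required` consecutive per-frame detections.
class DetectionDebouncer {
public:
    explicit DetectionDebouncer(int required) : required_(required) {
        assert(required > 0);
    }

    // Feed one per-frame result; returns true once the streak is long enough.
    bool update(bool detected) {
        // Cap the streak at `required_` so it cannot overflow on long clips.
        streak_ = detected ? std::min(streak_ + 1, required_) : 0;
        return streak_ >= required_;
    }

private:
    int required_;
    int streak_ = 0;
};

// Convenience wrapper: run the debouncer over a whole sequence of frames.
std::vector<bool> debounce(const std::vector<bool>& raw, int required) {
    DetectionDebouncer debouncer(required);
    std::vector<bool> accepted;
    accepted.reserve(raw.size());
    for (bool detected : raw) {
        accepted.push_back(debouncer.update(detected));
    }
    return accepted;
}
```

In the frame loop, the value passed to `update` would be something like `num_faces > 0`. A single missed frame resets the streak, which is the strictest possible policy; a sliding-window vote would be more tolerant.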
@samontab: Thanks for the hints.
Yes, that makes sense, but non-deterministic approaches usually rely on some source of randomness, and that randomness can be faked (a fixed seed with a deterministic random number generator) to get reproducible results.
And I can actually deal with this in the application fine. However, when I want to test the application (and compare the output to some ground truth), I would still like to have reproducible test runs, so I know whether my algorithm got better or the Intel SDK just produced a better detection by chance...
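To illustrate the fixed-seed idea in isolation: a pseudo-random engine seeded with the same value yields the same sequence on every run. This is only a sketch; `sample_candidates` is a hypothetical stand-in for a randomized search step, and nothing here implies that the SDK's face module exposes a seed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical sampler: draws `count` candidate indices in [0, space_size).
// The caller supplies the seed, so a test can pin it to a constant.
std::vector<std::uint32_t> sample_candidates(std::uint32_t seed,
                                             std::size_t count,
                                             std::uint32_t space_size) {
    assert(space_size > 0);
    std::mt19937 engine(seed);  // same seed => same sequence on every run
    std::vector<std::uint32_t> picks;
    picks.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        // Plain modulo keeps the sketch short; it ignores the small modulo bias.
        picks.push_back(static_cast<std::uint32_t>(engine() % space_size));
    }
    return picks;
}
```

Calling `sample_candidates(42, 8, 1000)` twice returns identical vectors, whereas seeding from the clock or `std::random_device` would not.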
You can always run your experiment multiple times to reduce those factors.
Yes, that's true. But it just makes everything more complicated. I'm actually thinking about just running it once, saving all the output of the SDK in my own file format and then reading from there...
It's just a bit disappointing that this functionality does not seem to be provided by the SDK...
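A rough sketch of that record-once, replay-later idea, assuming only the timestamp, the success flag and the average depth need to be kept. `FrameResult`, `write_results` and `read_results` are hypothetical names, and the one-line-per-frame text layout is an arbitrary choice for illustration.

```cpp
#include <cstdint>
#include <iomanip>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical record of what one frame's face query returned.
struct FrameResult {
    std::int64_t timestamp;
    bool success;
    float average_depth;
};

bool operator==(const FrameResult& a, const FrameResult& b) {
    return a.timestamp == b.timestamp && a.success == b.success &&
           a.average_depth == b.average_depth;
}

// Write one "timestamp success depth" line per frame.
void write_results(std::ostream& out, const std::vector<FrameResult>& results) {
    // max_digits10 makes the float survive a round trip through text.
    out << std::setprecision(std::numeric_limits<float>::max_digits10);
    for (const FrameResult& r : results) {
        out << r.timestamp << ' ' << (r.success ? 1 : 0) << ' '
            << r.average_depth << '\n';
    }
}

// Read the lines back; stops at the first malformed or missing record.
std::vector<FrameResult> read_results(std::istream& in) {
    std::vector<FrameResult> results;
    std::int64_t timestamp = 0;
    int success = 0;
    float depth = 0.0f;
    while (in >> timestamp >> success >> depth) {
        results.push_back(FrameResult{timestamp, success != 0, depth});
    }
    return results;
}
```

Both functions take generic streams, so the recording run can pass a `std::ofstream` and the test runs a `std::ifstream`; the tests then never touch the SDK and see identical inputs every time.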
Hi Jan,
You can try modifying your code to also output frame timestamps, to make sure the frames (both color and depth) are really the same in both sequences. Actually, judging from your code, you should get the same sequences. Can you also upload your clip file to some shared location so we can check on our side?
@DMITRY: Thanks for the reply! The timestamps were actually different!
When I change the code to this:
#include <pxcsensemanager.h>
#include <iostream>
#include <string>
#include <fstream>

int main(int argc, char *argv[]) {
    // Create the SenseManager instance
    PXCSenseManager *sm = PXCSenseManager::CreateInstance();
    std::cout << sm->QueryCaptureManager()->SetFileName(L"C:/Users/rggjan/Pictures/bright2.rssdk", false) << std::endl;

    // Enable stream and Initialize
    sm->EnableStream(PXCCapture::STREAM_TYPE_COLOR, 0, 0);
    sm->QueryCaptureManager()->SetRealtime(false);
    sm->QueryCaptureManager()->SetPause(true);

    // Enable face tracking
    std::cout << "Face: " << sm->EnableFace() << std::endl;

    // Initialize the pipeline
    std::cout << "Init: " << sm->Init() << std::endl;

    // Stream data
    int index = 0;
    while (true) {
        // retrieve the face tracking results
        sm->QueryCaptureManager()->SetFrameByIndex(index);
        sm->FlushFrame();
        if (sm->AcquireFrame(true) != PXC_STATUS_NO_ERROR) break;

        PXCCapture::Sample *sample = sm->QuerySample();
        PXCImage *image = sample->color;
        PXCImage::ImageInfo info = image->QueryInfo();

        if (index == 0) {
            PXCImage::ImageData data;
            image->AcquireAccess(PXCImage::ACCESS_READ, PXCImage::PIXEL_FORMAT_RGB24, &data);

            // image planes are in data.planes[0-3] with pitch data.pitches[0-3]
            std::ofstream ofs;
            ofs.open("C:/Users/rggjan/tmp/output.ppm", std::ios::binary);
            ofs << "P6\n" << info.width << " " << info.height << "\n255\n";
            for (int y = 0; y < info.height; y++) {
                for (int x = 0; x < info.width; x++) {
                    for (int c = 0; c < 3; c++) {
                        ofs << *(data.planes[0] + y*data.pitches[0] + x*3 + (2 - c));
                    }
                }
            }
            image->ReleaseAccess(&data);
        }

        std::cout << "Timestamp sample: " << image->QueryTimeStamp() << std::endl;

        PXCFaceModule *face = sm->QueryFace();
        if (face) {
            PXCFaceData *face_data = face->CreateOutput();
            face_data->Update();
            int num_faces = face_data->QueryNumberOfDetectedFaces();
            if (num_faces > 0) {
                for (int i = 0; i < num_faces; i++) {
                    PXCFaceData::Face *face = face_data->QueryFaceByIndex(i);
                    PXCFaceData::DetectionData *detection_data = face->QueryDetection();
                    float average_depth;
                    bool success = detection_data->QueryFaceAverageDepth(&average_depth) != 0;
                    std::cout << "depth frame " << face_data->QueryFrameTimestamp() << ": "
                              << (success ? std::string("success ") + std::to_string(average_depth) : std::string("fail"))
                              << std::endl;
                }
            }
            face_data->Release();
        }

        sm->ReleaseFrame();
        index++;
    }

    sm->Release();
}
It seems to work fine and produces the same timestamps and results on every run.
The change is setting pause to "true" and calling "sm->QueryCaptureManager()->SetFrameByIndex(index);" with a manually incremented index.
Shouldn't the same happen (playing all frames one by one without skipping) if I set SetRealtime to false ... ?
Yes, with SetRealtime(false) playback should be consistent. Please provide the following details:
SDK Version
DCM Version
CPU & Intel Graphics driver version
Also, if possible please upload your clip file.
I attached the output of your sdk_info tool, which should contain all version information you need.
The sequence is here:
https://drive.google.com/file/d/0B0mZY8Aj3NgwNVliYjVGbWhNalE/view?usp=sharing
This is the current code I used:
#include <pxcsensemanager.h>
#include <iostream>
#include <string>
#include <fstream>

int main(int argc, char *argv[]) {
    // Create the SenseManager instance
    PXCSenseManager *sm = PXCSenseManager::CreateInstance();
    std::cout << sm->QueryCaptureManager()->SetFileName(L"C:/Users/rggjan/Pictures/bright2.rssdk", false) << std::endl;

    // Enable stream and Initialize
    sm->EnableStream(PXCCapture::STREAM_TYPE_COLOR, 0, 0);
    sm->QueryCaptureManager()->SetRealtime(false);
    sm->QueryCaptureManager()->SetPause(false);

    // Enable face tracking
    std::cout << "Face: " << sm->EnableFace() << std::endl;

    // Initialize the pipeline
    std::cout << "Init: " << sm->Init() << std::endl;

    // Stream data
    int index = 0;
    for (int i = 0; i < 20; i++) {
        // retrieve the face tracking results
        if (sm->AcquireFrame(true) != PXC_STATUS_NO_ERROR) break;

        PXCCapture::Sample *sample = sm->QuerySample();
        PXCImage *image = sample->color;
        PXCImage::ImageInfo info = image->QueryInfo();
        std::cout << "Timestamp sample: " << image->QueryTimeStamp() << std::endl;

        PXCFaceModule *face = sm->QueryFace();
        if (face) {
            PXCFaceData *face_data = face->CreateOutput();
            face_data->Update();
            int num_faces = face_data->QueryNumberOfDetectedFaces();
            if (num_faces > 0) {
                for (int i = 0; i < num_faces; i++) {
                    PXCFaceData::Face *face = face_data->QueryFaceByIndex(i);
                    PXCFaceData::DetectionData *detection_data = face->QueryDetection();
                    float average_depth;
                    bool success = detection_data->QueryFaceAverageDepth(&average_depth) != 0;
                    std::cout << "depth frame " << face_data->QueryFrameTimestamp() << ": "
                              << (success ? std::string("success ") + std::to_string(average_depth) : std::string("fail"))
                              << std::endl;
                }
            }
            face_data->Release();
        }

        sm->ReleaseFrame();
        index++;
    }

    sm->Release();
}
And two different outputs:
Run 1:
0
Face: 0
Init: 0
Timestamp sample: 130945613513023083
depth frame 130945613513023083: success 672.000000
Timestamp sample: 130945613523018033
depth frame 130945613523018033: success 604.000000
Timestamp sample: 130945613523351198
depth frame 130945613523351198: success 589.000000
Timestamp sample: 130945613523684363
depth frame 130945613523684363: success 578.000000
Timestamp sample: 130945613524017528
depth frame 130945613524017528: success 630.000000
Timestamp sample: 130945613525017023
depth frame 130945613525017023: success 605.000000
Timestamp sample: 130945613525350188
depth frame 130945613525350188: success 633.000000
Timestamp sample: 130945613525683353
depth frame 130945613525683353: success 636.000000
Timestamp sample: 130945613526016518
depth frame 130945613526016518: success 614.000000
Timestamp sample: 130945613526349683
depth frame 130945613526349683: success 601.000000
Timestamp sample: 130945613526682848
depth frame 130945613526682848: success 592.000000
Timestamp sample: 130945613527016013
depth frame 130945613527016013: success 587.000000
Timestamp sample: 130945613527349178
depth frame 130945613527349178: success 587.000000
Timestamp sample: 130945613527682343
depth frame 130945613527682343: success 567.000000
Timestamp sample: 130945613528015508
depth frame 130945613528015508: success 554.000000
Timestamp sample: 130945613528348673
depth frame 130945613528348673: success 548.000000
Timestamp sample: 130945613528681838
depth frame 130945613528681838: success 539.000000
Timestamp sample: 130945613529015003
depth frame 130945613529015003: success 533.000000
Timestamp sample: 130945613529348168
depth frame 130945613529348168: success 525.000000
Timestamp sample: 130945613529681333
depth frame 130945613529681333: success 524.000000
Run 2:
0
Face: 0
Init: 0
Timestamp sample: 130945613513023083
depth frame 130945613513023083: success 672.000000
Timestamp sample: 130945613523018033
depth frame 130945613523018033: success 589.000000
Timestamp sample: 130945613523351198
depth frame 130945613523351198: success 578.000000
Timestamp sample: 130945613523684363
depth frame 130945613523684363: success 630.000000
Timestamp sample: 130945613524017528
depth frame 130945613524017528: success 618.000000
Timestamp sample: 130945613525017023
depth frame 130945613525017023: success 605.000000
Timestamp sample: 130945613525350188
depth frame 130945613525350188: success 633.000000
Timestamp sample: 130945613525683353
depth frame 130945613525683353: success 636.000000
Timestamp sample: 130945613526016518
depth frame 130945613526016518: success 614.000000
Timestamp sample: 130945613526349683
depth frame 130945613526349683: success 601.000000
Timestamp sample: 130945613526682848
depth frame 130945613526682848: success 592.000000
Timestamp sample: 130945613527016013
depth frame 130945613527016013: success 587.000000
Timestamp sample: 130945613527349178
depth frame 130945613527349178: success 587.000000
Timestamp sample: 130945613527682343
depth frame 130945613527682343: success 567.000000
Timestamp sample: 130945613528015508
depth frame 130945613528015508: success 554.000000
Timestamp sample: 130945613528348673
depth frame 130945613528348673: success 548.000000
Timestamp sample: 130945613528681838
depth frame 130945613528681838: success 539.000000
Timestamp sample: 130945613529015003
depth frame 130945613529015003: success 533.000000
Timestamp sample: 130945613529348168
depth frame 130945613529348168: success 525.000000
Timestamp sample: 130945613529681333
depth frame 130945613529681333: success 524.000000
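To pinpoint where two runs diverge, the logs can be compared line by line. Below is a minimal sketch, assuming each run's output has already been read into a vector of strings; `first_mismatch` is a hypothetical helper, not an SDK function. Applied to the two runs above, the first difference is the depth reported for the second frame (604 vs. 589), even though the timestamps printed for that frame match.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns the index of the first line that differs between two run logs,
// or -1 if they are identical. If one log is a strict prefix of the other,
// the mismatch is reported at the end of the shorter log.
long first_mismatch(const std::vector<std::string>& run_a,
                    const std::vector<std::string>& run_b) {
    const std::size_t common = std::min(run_a.size(), run_b.size());
    for (std::size_t i = 0; i < common; ++i) {
        if (run_a[i] != run_b[i]) {
            return static_cast<long>(i);
        }
    }
    return run_a.size() == run_b.size() ? -1 : static_cast<long>(common);
}
```

A check like this can run in a test script after every playback, so a non-reproducible run is flagged immediately instead of being noticed by eye.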
Hi Jan,
We found the origin of the issue. You need to remove the line
sm->QueryCaptureManager()->SetPause(false);
from your original code. In the current SDK release it interferes with
sm->QueryCaptureManager()->SetRealtime(false);
and re-enables realtime mode for playback, so frames can be skipped. We will fix this either in the documentation or in the SDK in a future release.
Good to know, thanks for looking into this!