Intel® Distribution of OpenVINO™ Toolkit
Community support and discussions about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all things computer vision-related on Intel® platforms.

Object Detection Latency | NCS1 vs. NCS2 | Raspberry Pi


Hi there,

So I noticed that Object Detection using NCS2 + OpenVINO + Raspberry Pi seems to have a significantly higher latency than NCS1 + NCSDK + Raspberry Pi.

Here are some examples:

NCS1 (synchronous):
NCS2 (asynchronous):

At first I thought it was a synchronous/asynchronous difference, but the gap is significantly more than that: on the order of 1 second of latency on the NCS2, whereas the NCS1 has very little (a frame at most).

I'm using PINTO0309's (awesome) GitHub repo to get up and running on the Pi to run these examples, here:


Anyone have ideas on how to lower the latency on the OpenVINO object detection on the NCS2?  It seems to me that there must be some queue that's backing up on the NCS2, so that there's this significant delay.

Thanks in advance,


2 Replies
I have not looked at the code, but I was thinking the delay is caused by the NCS2's inability to run inference on frames at the speed of the input, so a backlog builds up and creates the delay. Have you tried sending it 1 frame per second? That's close to its processing rate. Slow, I know.
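The suggestion above can be sketched as a simple timestamp gate on the capture loop (a hedged sketch, not the poster's actual code; `should_process` and the simulated timestamps are hypothetical, and inference would happen where the comment indicates):

```python
def should_process(last_ts, now, target_fps):
    """Return True if enough time has passed to forward this frame."""
    return (now - last_ts) >= 1.0 / target_fps

# Simulate a 30 fps camera feed throttled down to 1 fps for inference.
target_fps = 1.0
last_ts = float("-inf")
forwarded = 0
for i in range(90):          # 90 frames = 3 seconds at 30 fps
    now = i / 30.0           # simulated capture timestamp in seconds
    if should_process(last_ts, now, target_fps):
        last_ts = now
        forwarded += 1       # here the frame would be sent to the NCS2
print(forwarded)             # 3 frames forwarded over 3 seconds
```

This throttles how fast frames reach the device, but as the follow-up below notes, it only helps if the frames that are *not* forwarded are discarded rather than queued.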

Yes, you're probably right.  I reduced the frame rate, but didn't see an immediate benefit.  I think how the queue is being set up is the crux of it: the behavior should be to drop latent frames, but I don't think that's being done properly.

When implementing something similar in CoreML on iOS, I set it up to drop all latent frames so that there was no lag (this was in the hand-off between YOLOv3 and Vision tracking): frames that can't be processed immediately don't get queued, they just get thrown away.  This prevents any additional lag from queueing.  I dumped the example code below (most of which is from hollance, here, who's awesome).


So what I probably need to do is figure out how to do the equivalent of the setup below on the NCS2 in OpenVINO.  On NCS1/NCSDK it seems to be done by default, so the latency is lower/non-existent.




extension VideoCapture: AVCaptureVideoDataOutputSampleBufferDelegate {

  public func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    // Because lowering the capture device's FPS looks ugly in the preview,
    // we capture at full speed but only call the delegate at its desired
    // framerate. This also allows the frames not used by YOLO to be used
    // by Vision tracking.
    let timestamp = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
    let deltaTime = timestamp - lastTimestamp
    if deltaTime >= CMTimeMake(value: 1, timescale: Int32(fps)) {
      // Only send the frame to YOLO if enough time has passed; right now
      // it's called 5 times a second (set by the `fps` field).
      lastTimestamp = timestamp
      let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)
      delegate?.videoCapture(self, didCaptureVideoFrame: imageBuffer, timestamp: timestamp)
    } else {
      // BLG. Send the frames that YOLO drops to Vision processing instead.
      let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)
      delegate?.trackCapture(self, didCaptureVideoFrame: imageBuffer)
    }
  }

  public func captureOutput(_ output: AVCaptureOutput, didDrop sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    //print("dropped frame")
  }
}
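One way to get the equivalent drop-latent-frames behavior on the OpenVINO side is to put a depth-1 handoff between the capture thread and the inference thread, where a new frame evicts any stale one. This is a sketch of the idea, not the toolkit's API; the `LatestFrameQueue` name and the surrounding loop are hypothetical:

```python
import queue

class LatestFrameQueue:
    """Depth-1 handoff: a new frame evicts any stale, unprocessed one.

    The capture thread calls put(); the inference thread calls get().
    Because stale frames are thrown away instead of queued, inference
    always sees the freshest frame, and latency stays bounded at
    roughly one frame regardless of how slowly inference runs.
    """
    def __init__(self):
        self._q = queue.Queue(maxsize=1)

    def put(self, frame):
        while True:
            try:
                self._q.put_nowait(frame)
                return
            except queue.Full:
                try:
                    self._q.get_nowait()  # drop the stale frame
                except queue.Empty:
                    pass                  # consumer took it first; retry

    def get(self, timeout=None):
        return self._q.get(timeout=timeout)

# Capture outruns inference: frames 1..5 arrive before the inference
# thread gets around to reading, so it should only ever see frame 5.
q = LatestFrameQueue()
for frame in range(1, 6):
    q.put(frame)
print(q.get())  # 5
```

On the NCS2, the consumer side of this would presumably pair with `start_async()`/`wait()` on the OpenVINO inference request (the asynchronous API of that era), so that capture never blocks on inference.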