Performance drop in async mode

pkhan10 · ‎09-19-2019

Hello,
I ran the Python SSD example in async mode, but when I restart the same example I see a performance drop.
Throughput on the first run is 400-500 FPS, while re-running the same program drops it to 150-200 FPS.
I am running the code in a Jupyter notebook.
Do I need to close any async process or thread manually? Please suggest what to do.

I used the following command to convert the model:
python mo_tf.py --input_model ../model_downloader/object_detection/common/ssd_mobilenet_v2_coco/tf/ssd_mobilenet_v2_coco_2018_03_29/frozen_inference_graph.pb -o ../model_files/ssd_v2/ --tensorflow_use_custom_operations_config extensions/front/tf/ssd_v2_support.json --tensorflow_object_detection_api_pipeline_config ../model_downloader/object_detection/common/ssd_mobilenet_v2_coco/tf/ssd_mobilenet_v2_coco_2018_03_29/pipeline.config --reverse_input_channels

Roy_A_Intel · ‎09-19-2019

Hi Prateek

What options do you use to run the example, and are the options the same in both runs?

Thanks

Roy

pkhan10 · ‎09-19-2019

I used the same code in both cases.
I copied this code from the async example for SSD.

When I restart the notebook, it loses speed.

This is the prediction part of the code:

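# Excerpt: exec_net, input_blob, out_blob, n, c, h, w, labels, channel and
# colors_labels are assumed to be defined in earlier notebook cells.
import datetime
import logging as log
import os
import time

import cv2
import numpy as np

threshold = .5
cv2.namedWindow("Detection Results",cv2.WINDOW_NORMAL)
write_video = False
out = None  # the VideoWriter is created lazily on the first frame when write_video is True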

if labels:
    with open(labels, 'r') as f:
        labels_map = [x.strip() for x in f]
else:
    labels_map = None

cap = cv2.VideoCapture(channel)

cur_request_id = 0
next_request_id = 1

log.info("Starting inference in async mode...")
log.info("To switch between sync and async modes press Tab button")
log.info("To stop the demo execution press Esc button")
is_async_mode = True
render_time = 0
ret, frame = cap.read()

print("To close the application, press Esc with focus on the output window")
fps = []
while cap.isOpened():
    fps = fps[-100:]  # keep only the most recent 100 samples
    if is_async_mode:
        ret, next_frame = cap.read()
    else:
        ret, frame = cap.read()
    if not ret:
        break
    initial_w = cap.get(3)
    initial_h = cap.get(4)
    # Main sync point:
    # in the truly Async mode we start the NEXT infer request, while waiting for the CURRENT to complete
    # in the regular mode we start the CURRENT request and immediately wait for its completion
    inf_start = time.time()
    if is_async_mode:
        in_frame = cv2.resize(next_frame, (w, h))
        in_frame = in_frame.transpose((2, 0, 1))  # Change data layout from HWC to CHW
        in_frame = in_frame.reshape((n, c, h, w))
        exec_net.start_async(request_id=next_request_id, inputs={input_blob: in_frame})
    else:
        in_frame = cv2.resize(frame, (w, h))
        in_frame = in_frame.transpose((2, 0, 1))  # Change data layout from HWC to CHW
        in_frame = in_frame.reshape((n, c, h, w))
        exec_net.start_async(request_id=cur_request_id, inputs={input_blob: in_frame})
    if exec_net.requests[cur_request_id].wait(-1) == 0:
        # Parse detection results of the current request
        res = exec_net.requests[cur_request_id].outputs[out_blob]
        for obj in res[0][0]:
            # Draw only objects when probability more than specified threshold
            if obj[2] > threshold:
                xmin = int(obj[3] * initial_w)
                ymin = int(obj[4] * initial_h)
                xmax = int(obj[5] * initial_w)
                ymax = int(obj[6] * initial_h)
                class_id = int(obj[1])
                # Draw box and label/class_id
                #color = (min(class_id * 12.5, 255), min(class_id * 7, 255), min(class_id * 5, 255))
                color = colors_labels.loc[class_id-1]['colors']
                cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), color, 2)
                det_label = labels_map[class_id-1] if labels_map else str(class_id)
                cv2.putText(frame, det_label + ' ' + str(round(obj[2] * 100, 1)) + ' %', (xmin, ymin - 7),
                            cv2.FONT_HERSHEY_COMPLEX, 0.6, color, 1)
        # det_time covers preprocessing, the wait on the current request and box drawing
        det_time = time.time() - inf_start
        # Record this frame first so the mean below is never taken over an empty list
        fps.append(1 / det_time)
        # Draw performance stats
#         inf_time_message = "Inference time: N/A for async mode" if is_async_mode else \
#             "Inference time: {:.03f} ms".format(det_time*1000) +  "Equivalent FPS : "+ str(1/det_time)
        inf_time_message = "Inference time: {:.03f} ms".format(det_time * 1000) + \
            " Equivalent FPS: {:.03f}".format(np.mean(fps))
        render_time_message = "OpenCV rendering time: {:.3f} ms".format(render_time * 1000)
        async_mode_message = "Async mode is on. Processing request {}".format(cur_request_id) if is_async_mode else \
            "Async mode is off. Processing request {}".format(cur_request_id)

        cv2.putText(frame, inf_time_message, (15, 20), cv2.FONT_HERSHEY_COMPLEX, .6, (200, 10, 10), 2)
        cv2.putText(frame, render_time_message, (15, 50), cv2.FONT_HERSHEY_COMPLEX, 0.5, (10, 10, 200), 1)
        cv2.putText(frame, async_mode_message, (10, int(initial_h - 20)), cv2.FONT_HERSHEY_COMPLEX, 0.5,
                    (10, 10, 200), 1)

    render_start = time.time()
    cv2.imshow("Detection Results", frame)
    render_end = time.time()
    render_time = render_end - render_start
    if write_video:
        if out is None:
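            out_name = (datetime.datetime.now().strftime("%Y_%m_%d_%H_%M_%S_SSD")
                        + os.path.basename(channel) + '_out.mp4')
            # Note: MJPG is usually written to .avi; some OpenCV builds may not write it into .mp4
            out = cv2.VideoWriter('../output_vids/' + out_name,
                                  cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'), 20,
                                  (frame.shape[1], frame.shape[0]))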
        out.write(frame)
    if is_async_mode:
        cur_request_id, next_request_id = next_request_id, cur_request_id
        frame = next_frame

    key = cv2.waitKey(1)
    if key == 27:
        break
    if key == 9:  # Tab
        is_async_mode = not is_async_mode
        log.info("Switched to {} mode".format("async" if is_async_mode else "sync"))

cv2.destroyAllWindows()
cap.release()
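# out only exists (and is only opened) when write_video is True
if write_video and out is not None:
    out.release()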
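
The two-request swap in the loop above can be hard to follow, so here is a minimal, library-free sketch of the same pattern. It assumes nothing about the Inference Engine: `fake_infer` is a hypothetical stand-in that just sleeps, and a `ThreadPoolExecutor` plays the role of the device. Unlike the loop above, the sketch also collects the last in-flight result after the loop ends.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_infer(frame):
    """Hypothetical stand-in for a device inference call; sleeps instead of computing."""
    time.sleep(0.01)
    return frame * 2


def run_pipelined(frames):
    """Start the request for frame i+1 before collecting the result for frame i."""
    results = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        pending = None  # plays the role of the CURRENT request
        for frame in frames:
            upcoming = pool.submit(fake_infer, frame)  # start the NEXT request
            if pending is not None:
                results.append(pending.result())  # wait for the CURRENT request
            pending = upcoming  # swap roles, like cur_request_id/next_request_id
        if pending is not None:
            results.append(pending.result())  # drain the last in-flight request
    return results
```

Calling `run_pipelined([1, 2, 3])` returns the results in frame order, one iteration behind the submissions, which is the same one-frame lag the demo loop has in async mode.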

Roy_A_Intel · ‎09-20-2019

Prateek, these two statements are what the SSD example uses to report the timing figure in async mode:

det_time = inf_end - inf_start
inf_time_message = "Inference time: N\A for async mode" if is_async_mode else \
"Inference time: {:.3f} ms".format(det_time * 1000)

I don't see why you need the fps list or what the subsequent operations on it are for. Could you explain?

For reference, please see this demo for the correct implementation:

<openvino_dir>\inference_engine\demos\python_demos\object_detection_demo_ssd_async

Regards

Roy

Shubha_R_Intel · ‎09-20-2019

Dear khandelwal, prateek,

May I suggest the benchmark_app? We provide it for situations precisely like yours. Also, by writing your code in Python and running it in Jupyter Notebook, you are introducing variables that definitely affect performance negatively. OpenVINO is Python compatible, but Python itself is slow.

Kindly try the benchmark_app to analyze the performance of your application. It provides many knobs, such as number of iterations, async on/off, batch size, etc.

Feel free to post your results here and we'll be glad to help.

Sincerely,

Shubha

pkhan10 · ‎09-22-2019

Hey Roy,
I am taking the mean of the last 100 FPS values, since the FPS varies a lot.
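
For what it's worth, the last-100 window can also be kept with a bounded deque instead of re-slicing a list on every frame. A minimal sketch, where `RollingFps` is a hypothetical helper name and not part of the demo. It stores frame durations and returns frames divided by elapsed time over the window, which is not the same number as the mean of the individual 1/dt values:

```python
from collections import deque


class RollingFps:
    """Throughput over the most recent `window` frames (hypothetical helper)."""

    def __init__(self, window=100):
        # A deque with maxlen drops the oldest sample automatically
        self.samples = deque(maxlen=window)

    def update(self, frame_seconds):
        """Record one frame duration in seconds and return the windowed FPS."""
        self.samples.append(frame_seconds)
        total = sum(self.samples)
        # Frames in the window divided by the time they took
        return len(self.samples) / total if total > 0 else 0.0
```

In the loop this would be used as `fps_now = rolling.update(det_time)` after creating `rolling = RollingFps(100)` once before the loop.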

Hey Shubha,
I understand Python is slow, but how does Jupyter Notebook contribute to that?
Can you please explain?

Shubha_R_Intel · ‎09-27-2019

Dear khandelwal, prateek,

Well, Jupyter Notebook runs on a Python kernel server (in your case), does it not?
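
Because the kernel process stays alive between runs, objects from the previous run (including any infer request that was still in flight when the loop exited) remain until they are released. Nothing in this thread establishes that this is the cause of the slowdown; the sketch below is only one thing to rule out. It uses `exec_net.requests` and `wait(-1)` as they appear in the code earlier in the thread. `drain_requests` is a hypothetical helper name, and how `wait` behaves on a request that was never started is not verified here.

```python
def drain_requests(exec_net, timeout_ms=-1):
    """Block until every infer request has finished and return their status codes.

    Assumes exec_net.requests is iterable and each request exposes wait(timeout),
    as used in the loop posted earlier in this thread.
    """
    return [request.wait(timeout_ms) for request in exec_net.requests]


# Possible use at the end of the notebook cell, before re-running it:
#   drain_requests(exec_net)
#   del exec_net
```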

Thanks,

Shubha

pkhan10 · ‎09-29-2019

Hello Shubha,
I will run it without Jupyter and report the performance back to you.

Shubha_R_Intel · ‎09-30-2019

Dear khandelwal, prateek,

Thanks. That would be great. But keep in mind that Python is slow. Even OpenVINO developers recognize this. Most likely, Python is for experimentation, not production.

Hope it helps,

Thanks,

Shubha