
Yeh__Angel · ‎01-13-2020

CPU model: Intel® Xeon® E-2226GE

iGPU: Intel® HD Graphics P630

I am running face_recognition_demo from your official Python demos with a MobileNet v2 model on a 15 fps video whose frame size is 1920*1080.

For GPU mode I run the command below:

python3 ./face_recognition_demo.py -i /home/data/video/v1.avi -o /home/data/video/output.avi -m_fd /home/data/pretrain_models/face-detection-retail-0005.xml -m_lm /home/data/pretrain_models/landmarks-regression-retail-0009.xml -m_reid /home/data/pretrain_models/face-reidentification-retail-0095.xml -l /opt/intel/openvino_2019.3.376/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so --verbose -fg "/home/data/face_gallery" -d_fd GPU -d_lm GPU -d_reid GPU -pc

The per-layer performance counts for this run are:

[ DEBUG ] 2020-01-14 18:24:51,360 Frame: 457/9998, detections: 1, frame time: 0.053s, fps: 19.0
[ INFO ] 2020-01-14 18:24:51,360 Performance stats:
[ INFO ] 2020-01-14 18:24:51,360 {'face_detector': [{'332': {'status': 'EXECUTED', 'exec_type': 'convolution_gpu_bfyx_to_bfyx_f16', 'layer_type': 'Convolution', 'real_time': 200, 'cpu_time': 4, 'execution_index': 2}, '334': {'status': 'OPTIMIZED_OUT', 'exec_type': 'undef', 'layer_type': 'ReLU', 'real_time': 0, 'cpu_time': 0, 'execution_index': 75}, '335': {'status': 'EXECUTED', 'exec_type': 'convolution_gpu_bfyx_f16_depthwise', 'layer_type': 'Convolution', 'real_time': 162, 'cpu_time': 4, 'execution_index': 3}, '335_reorder_0': {'status': 'EXECUTED', 'exec_type': 'reorder_data_fast_b1', 'layer_type': 'reorder', 'real_time': 283, 'cpu_time': 4, 'execution_index': 4}, '337': {'status': 'OPTIMIZED_OUT', 'exec_type': 'undef', 'layer_type': 'ReLU', 'real_time': 0, 'cpu_time': 0, 'execution_index': 76}, '338': {'status': 'EXECUTED', 'exec_type': 'convolution_gpu_bfyx_os_iyx_osv16', 'layer_type': 'Convolution', 'real_time': 139, 'cpu_time': 4, 'execution_index': 5}, '340': {'status': 'EXECUTED', 'exec_type': 'convolution_gpu_bfyx_os_iyx_osv16', 'layer_type': 'Convolution', 'real_time': 690, 'cpu_time': 4, 'execution_index': 6}, '340_reorder_5': {'status': 'EXECUTED', 'exec_type': 'reorder_data_fast_b1', 'layer_type': 'reorder', 'real_time': 3679, 'cpu_time': 4, 'execution_index': 7}, '342': {'status': 'OPTIMIZED_OUT', 'exec_type': 'undef', 'layer_type': 'ReLU', 'real_time': 0, 'cpu_time': 0, 'execution_index': 77}, '343': {'status': 'EXECUTED', 'exec_type': 'convolution_gpu_bfyx_f16_depthwise', 'layer_type': 'Convolution', 'real_time': 405, 'cpu_time': 4, 'execution_index': 8}, '345': {'status': 'OPTIMIZED_OUT', 'exec_type': 'undef', 'layer_type': 'ReLU', 'real_time': 0, 'cpu_time': 0, 'execution_index': 78}, '346': {'status': 'EXECUTED', 'exec_type': 'convolution_gpu_bfyx_f16', 'layer_type': 'Convolution', 'real_time': 105, 'cpu_time': 4, 'execution_index': 9}, '348': {'status': 'EXECUTED', 'exec_type': 'convolution_gpu_bfyx_f16', 'layer_type': 'Convolution', 
'real_time': 166, 'cpu_time': 4, 'execution_index': 10}, '350': {'status': 'OPTIMIZED_OUT', 'exec_type': 'undef', 'layer_type': 'ReLU', 'real_time': 0, 'cpu_time': 0, 'execution_index': 79}, '351': {'status': 'EXECUTED', 'exec_type': 'convolution_gpu_bfyx_f16_depthwise', 'layer_type': 'Convolution', 'real_time': 123, 'cpu_time': 4, 'execution_index': 11}, '353': {'status': 'OPTIMIZED_OUT', 'exec_type': 'undef', 'layer_type': 'ReLU', 'real_time': 0, 'cpu_time': 0, 'execution_index': 80}, '354': {'status': 'EXECUTED', 'exec_type': 'convolution_gpu_bfyx_f16', 'layer_type': 'Convolution', 'real_time': 151, 'cpu_time': 4, 'execution_index': 12}, '356': {'status': 'OPTIMIZED_OUT', 'exec_type': 'undef', 'layer_type': 'Eltwise', 'real_time': 0, 'cpu_time': 0, 'execution_index': 81}, '357': {'status': 'EXECUTED', 'exec_type': 'convolution_gpu_bfyx_f16', 'layer_type': 'Convolution', 'real_time': 161, 'cpu_time': 4, 'execution_index': 13}, '359': {'status': 'OPTIMIZED_OUT', 'exec_type': 'undef', 'layer_type': 'ReLU', 'real_time': 0, 'cpu_time': 0, 'execution_index': 82}, '360': {'status': 'EXECUTED', 'exec_type': 'convolution_gpu_bfyx_f16_depthwise', 'layer_type': 'Convolution', 'real_time': 158, 'cpu_time': 4, 'execution_index': 14}, '362': {'status': 'OPTIMIZED_OUT', 'exec_type': 'undef', 'layer_type': 'ReLU', 'real_time': 0, 'cpu_time': 0, 'execution_index': 83}, '363': {'status': 'EXECUTED', 'exec_type': 'convolution_gpu_bfyx_f16', 'layer_type': 'Convolution', 'real_time': 48, 'cpu_time': 4, 'execution_index': 15}, '365': {'status': 'EXECUTED', 'exec_type': 'convolution_gpu_bfyx_f16_1x1', 'layer_type': 'Convolution', 'real_time': 79, 'cpu_time': 4, 'execution_index': 16}, '367': {'status': 'OPTIMIZED_OUT', 'exec_type': 'undef', 'layer_type': 'ReLU', 'real_time': 0, 'cpu_time': 0, 'execution_index': 84}, '368': {'status': 'EXECUTED', 'exec_type': 'convolution_gpu_bfyx_f16_depthwise', 'layer_type': 'Convolution', 'real_time': 40, 'cpu_time': 4, 'execution_index': 17},

I also ran the demo in CPU mode:

python3 ./face_recognition_demo.py -i /home/data/video/v1.avi -o /home/data/video/output.avi -m_fd /home/data/pretrain_models/face-detection-retail-0005.xml -m_lm /home/data/pretrain_models/landmarks-regression-retail-0009.xml -m_reid /home/data/pretrain_models/face-reidentification-retail-0095.xml -l /opt/intel/openvino_2019.3.376/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so --verbose -fg "/home/data/face_gallery" -pc

[ DEBUG ] 2020-01-14 18:32:24,523 Frame: 329/9998, detections: 0, frame time: 0.042s, fps: 24.1
[ INFO ] 2020-01-14 18:32:24,523 Performance stats:
[ INFO ] 2020-01-14 18:32:24,523 {'face_detector': [{'332': {'status': 'EXECUTED', 'exec_type': 'jit_avx2_FP32', 'layer_type': 'Convolution', 'real_time': 155, 'cpu_time': 155, 'execution_index': 2}, '334': {'status': 'NOT_RUN', 'exec_type': 'undef', 'layer_type': 'ReLU', 'real_time': 0, 'cpu_time': 0, 'execution_index': 3}, '335': {'status': 'EXECUTED', 'exec_type': 'jit_avx2_dw_FP32', 'layer_type': 'Convolution', 'real_time': 81, 'cpu_time': 81, 'execution_index': 4}, '337': {'status': 'NOT_RUN', 'exec_type': 'undef', 'layer_type': 'ReLU', 'real_time': 0, 'cpu_time': 0, 'execution_index': 5}, '338': {'status': 'EXECUTED', 'exec_type': 'jit_avx2_1x1_FP32', 'layer_type': 'Convolution', 'real_time': 100, 'cpu_time': 100, 'execution_index': 6}, '340': {'status': 'EXECUTED', 'exec_type': 'jit_avx2_1x1_FP32', 'layer_type': 'Convolution', 'real_time': 138, 'cpu_time': 138, 'execution_index': 7}, '342': {'status': 'NOT_RUN', 'exec_type': 'undef', 'layer_type': 'ReLU', 'real_time': 0, 'cpu_time': 0, 'execution_index': 8}, '343': {'status': 'NOT_RUN', 'exec_type': 'undef', 'layer_type': 'Convolution', 'real_time': 0, 'cpu_time': 0, 'execution_index': 9}, '345': {'status': 'NOT_RUN', 'exec_type': 'undef', 'layer_type': 'ReLU', 'real_time': 0, 'cpu_time': 0, 'execution_index': 10}, '346': {'status': 'EXECUTED', 'exec_type': 'jit_avx2_1x1_FP32', 'layer_type': 'Convolution', 'real_time': 33, 'cpu_time': 33, 'execution_index': 11}, '348': {'status': 'EXECUTED', 'exec_type': 'jit_avx2_1x1_FP32', 'layer_type': 'Convolution', 'real_time': 50, 'cpu_time': 50, 'execution_index': 12}, '350': {'status': 'NOT_RUN', 'exec_type': 'undef', 'layer_type': 'ReLU', 'real_time': 0, 'cpu_time': 0, 'execution_index': 13}, '351': {'status': 'EXECUTED', 'exec_type': 'jit_avx2_dw_FP32', 'layer_type': 'Convolution', 'real_time': 47, 'cpu_time': 47, 'execution_index': 14}, '353': {'status': 'NOT_RUN', 'exec_type': 'undef', 'layer_type': 'ReLU', 'real_time': 0, 'cpu_time': 0, 'execution_index': 15}, 
'354': {'status': 'EXECUTED', 'exec_type': 'jit_avx2_1x1_FP32', 'layer_type': 'Convolution', 'real_time': 69, 'cpu_time': 69, 'execution_index': 16}, '356': {'status': 'NOT_RUN', 'exec_type': 'undef', 'layer_type': 'Eltwise', 'real_time': 0, 'cpu_time': 0, 'execution_index': 17}, '357': {'status': 'EXECUTED', 'exec_type': 'jit_avx2_1x1_FP32', 'layer_type': 'Convolution', 'real_time': 50, 'cpu_time': 50, 'execution_index': 18}, '359': {'status': 'NOT_RUN', 'exec_type': 'undef', 'layer_type': 'ReLU', 'real_time': 0, 'cpu_time': 0, 'execution_index': 19}, '360':
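The two dumps above are hard to compare by eye. A minimal sketch of how they could be summarized follows, assuming each model's counters are available as a plain dict shaped like the printed output (layer name mapped to status, exec_type and real_time, with real_time in microseconds as the Inference Engine reports it). The function name summarize and the sample dict are illustrative only; the sample holds four entries copied from the GPU log.

```python
from collections import defaultdict


def summarize(perf_counts, top_n=3):
    """Summarize one model's per-layer counters.

    Returns (total real_time of EXECUTED layers,
             real_time summed per exec_type,
             names of the top_n slowest layers).
    """
    executed = {
        name: info
        for name, info in perf_counts.items()
        if info["status"] == "EXECUTED"
    }
    total = sum(info["real_time"] for info in executed.values())

    by_type = defaultdict(int)
    for info in executed.values():
        by_type[info["exec_type"]] += info["real_time"]

    slowest = sorted(
        executed, key=lambda name: executed[name]["real_time"], reverse=True
    )[:top_n]
    return total, dict(by_type), slowest


# Four entries copied from the GPU log above (not the full dump).
gpu_sample = {
    "332": {"status": "EXECUTED", "exec_type": "convolution_gpu_bfyx_to_bfyx_f16",
            "layer_type": "Convolution", "real_time": 200, "cpu_time": 4},
    "334": {"status": "OPTIMIZED_OUT", "exec_type": "undef",
            "layer_type": "ReLU", "real_time": 0, "cpu_time": 0},
    "340": {"status": "EXECUTED", "exec_type": "convolution_gpu_bfyx_os_iyx_osv16",
            "layer_type": "Convolution", "real_time": 690, "cpu_time": 4},
    "340_reorder_5": {"status": "EXECUTED", "exec_type": "reorder_data_fast_b1",
                      "layer_type": "reorder", "real_time": 3679, "cpu_time": 4},
}

total, by_type, slowest = summarize(gpu_sample)
print("total real_time:", total)
print("slowest layers:", slowest)
```

In this sample the reorder layer 340_reorder_5 accounts for most of the time, which suggests checking layout conversions on the GPU path, although four entries are too few to draw a firm conclusion.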

The table below compares the results of the CPU and GPU runs:

                                CPU      GPU
FPS                             40~50    25~33
CPU usage (by top)              100%     100%
GPU usage (by intel_gpu_top)    1%       35%

My questions are:

Q1: Whether I use -d CPU or -d GPU, the CPU usage is almost 99~100% on every core. Why?
Q2: It does not seem reasonable that the GPU FPS is lower than the CPU FPS. What is the cause?

Thank you very much !!

David_C_Intel · ‎01-23-2020

Hi Angel,

Thanks for reaching out.

This issue can be addressed by setting the GPU plugin configuration key KEY_CLDNN_PLUGIN_THROTTLE to its low value, 1. Plugin configuration parameters must be set before calling the Inference Engine's LoadNetwork.

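#include <inference_engine.hpp>
#include <cldnn/cldnn_config.hpp>

// "ie" is an InferenceEngine::Core; set this before calling LoadNetwork.
ie.SetConfig({ { InferenceEngine::CLDNNConfigParams::KEY_CLDNN_PLUGIN_THROTTLE, "1" } }, "GPU");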
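Since the question uses the Python demo, a rough Python equivalent may be more convenient. The sketch below assumes that the string behind KEY_CLDNN_PLUGIN_THROTTLE is "CLDNN_PLUGIN_THROTTLE" and that the object passed in exposes set_config(config, device_name), as IECore does in the Python API. The helper name apply_gpu_throttle is hypothetical, and how the demo creates its own plugin objects is not shown in this thread, so wiring it into face_recognition_demo.py is left open.

```python
# Assumption: this is the string value behind KEY_CLDNN_PLUGIN_THROTTLE.
GPU_THROTTLE_KEY = "CLDNN_PLUGIN_THROTTLE"


def apply_gpu_throttle(core, level=1, device="GPU"):
    """Set the GPU plugin throttle on `core` and return the config used.

    `core` is any object with set_config(config, device_name), for example
    openvino.inference_engine.IECore. Call this before loading the network,
    as the reply above requires.
    """
    config = {GPU_THROTTLE_KEY: str(level)}  # config values are strings
    core.set_config(config, device)
    return config


# Intended usage on a machine with OpenVINO installed (not executed here):
#
#   from openvino.inference_engine import IECore
#   ie = IECore()
#   apply_gpu_throttle(ie)        # must come before the network is loaded
```

Passing the core in as an argument keeps the helper independent of how the demo constructs it, and it also makes the call order (configure first, load afterwards) explicit.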

You can check this documentation for additional information.

Best regards,

David

100% CPU usage when using OpenVINO GPU inference in face recognition