I'm trying to perform image classification on an FPGA card with OpenVINO, using the classification_sample_async sample included in the toolkit.
I have succeeded in running inference on both the CPU and the FPGA, but inference on the FPGA is very slow, taking more than three times as long as on my CPU. The FPGA is an Intel A10, and the CPU is an Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz.
The slowdown is suggested to be caused by too much interaction between the FPGA and host memory. However, when I looked through the C++ and Python code of classification_sample_async, it seems the OpenVINO API only allows transferring a single batch of data at a time.
I have also tried increasing the batch size in my model, but the FPGA only accepts batch sizes up to 64.
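For reference, here is roughly how I set the batch size (a minimal sketch assuming the 2018-era openvino.inference_engine Python API; the model paths are placeholders, not my actual files):

> from openvino.inference_engine import IENetwork, IEPlugin
>
> # Placeholder paths for my actual model files.
> net = IENetwork(model="model.xml", weights="model.bin")
> input_blob = next(iter(net.inputs))
> out_blob = next(iter(net.outputs))
>
> # Reshape the network to a larger batch; anything above 64
> # falls back to the CPU handler on my setup.
> net.batch_size = 64
>
> plugin = IEPlugin(device="HETERO:FPGA,CPU")
> exec_net = plugin.load(network=net)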
The source files and key lines for C++ and Python are:
/opt/intel/computer_vision_sdk/deployment_tools/inference_engine/samples/classification_sample_async/main.cpp
> inferRequests[0].StartAsync();
and
/opt/intel/computer_vision_sdk/deployment_tools/inference_engine/samples/python_samples/classification_sample_async.py
> infer_request_handle = exec_net.start_async(request_id=0, inputs={input_blob: images})
These two lines transfer the collected batch of images to the FPGA handler and perform inference. If the batch size exceeds 64, the CPU handler is used instead.
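As far as I can tell, the only way to get more data in flight with this API is to create several infer requests and overlap them. Continuing from the sketch above (images0 and images1 are assumed to be two preprocessed batches):

> # Load with two infer requests so host-to-FPGA transfers and
> # inference can overlap across batches.
> exec_net = plugin.load(network=net, num_requests=2)
>
> # Kick off both batches asynchronously.
> exec_net.start_async(request_id=0, inputs={input_blob: images0})
> exec_net.start_async(request_id=1, inputs={input_blob: images1})
>
> # Wait for each request and collect its results.
> for rid in (0, 1):
>     exec_net.requests[rid].wait(-1)
>     res = exec_net.requests[rid].outputs[out_blob]

But even with multiple requests in flight, each individual transfer is still limited to one batch.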
Is there a better way to transfer a larger batch of data to the FPGA at once? PCIe can move several GB per second, but a batch of 64 images is only a few MB (e.g., 64 × 224 × 224 × 3 bytes ≈ 9.6 MB).