idata
Community Manager
367 Views

Some problems with half float in ncGraphQueueInferenceWithFifoElem()

I used OpenCV to resize an image, then used a 'float2half' function to convert the values from float to half. When I pass the processed tensor as input to ncGraphQueueInferenceWithFifoElem(), I get the error 'ncFifoWriteElem:2570 input tensor length (540000) doesn't match expected value (1080000)'.

 

Please give me some help and advice, thanks!

 

The code is as follows.

 

Mat preprocessed_image_mat;
preprocess_image(inputMat, preprocessed_image_mat);

// three values for each pixel in the image
float_t tensor32[3];
unsigned short tensor16[NETWORK_IMAGE_HEIGHT * NETWORK_IMAGE_WIDTH * 3];
uint8_t* image_data_ptr = (uint8_t*)preprocessed_image_mat.data;
int chan = preprocessed_image_mat.channels();
int tensor_index = 0;
for (int row = 0; row < preprocessed_image_mat.rows; row++) {
    for (int col = 0; col < preprocessed_image_mat.cols; col++) {
        int pixel_start_index = row * preprocessed_image_mat.cols * chan + col * chan;
        // TODO: don't hard code
        // assuming the image is in BGR format here
        uint8_t blue  = image_data_ptr[pixel_start_index + 0];
        uint8_t green = image_data_ptr[pixel_start_index + 1];
        uint8_t red   = image_data_ptr[pixel_start_index + 2];
        tensor32[0] = (float_t)blue;
        tensor32[1] = (float_t)green;
        tensor32[2] = (float_t)red;
        tensor16[tensor_index++] = float2half(*((unsigned*)(&(tensor32[0]))));
        tensor16[tensor_index++] = float2half(*((unsigned*)(&(tensor32[1]))));
        tensor16[tensor_index++] = float2half(*((unsigned*)(&(tensor32[2]))));
    }
}
unsigned int inputTensorLength = NETWORK_IMAGE_HEIGHT * NETWORK_IMAGE_WIDTH * 3 * sizeof(unsigned short);

// queue the inference to start; when it's done the result will be placed on the output fifo
retCode = ncGraphQueueInferenceWithFifoElem(graphHandlePtr, inFifoHandlePtr, outFifoHandlePtr,
                                            tensor16, &inputTensorLength, NULL);
8 Replies
idata
Community Manager

@curry_best You can use either FP16 or FP32 with FIFOs, but the default is FP32. You can set the FIFO data type to FP16 by using fifo.set_option to set NC_FIFO_RW_DATA_TYPE to FP16 (https://movidius.github.io/ncsdk/ncapi/ncapi2/c_api/ncFifoOption_t.html).

 

In short, you should be able to just pass an FP32 tensor to ncGraphQueueInferenceWithFifoElem() and it should work. See https://github.com/movidius/ncappzoo/blob/ncsdk2/caffe/AlexNet/cpp/run.cpp for a point of reference.

idata
Community Manager

@Tome_at_Intel

 

Indeed, I can pass an FP32 tensor to ncGraphQueueInferenceWithFifoElem(). Thanks for your help! But could FP16 speed up the inference? If so, by how much? Thank you again; looking forward to your reply.
idata
Community Manager

@curry_best If you're reading input images from a hard drive and resize and convert them all to FP16 before inference, the time spent writing to the input FIFO and reading from the output FIFO should be shorter if the FIFOs are FP16. However, if you are using frames from a camera, doing this input pre-processing ahead of time won't be possible, since the camera frames come in live, one at a time.

idata
Community Manager

@Tome_at_Intel

 

Yes, I am using frames from a camera, and I found a function (which converts FP32 to FP16) at https://github.com/movidius/ncappzoo/blob/ncsdk2/apps/gender_age_lbp/cpp/gender_age_lbp.cpp#L417. What is the difference between that method and fifo.set_option?

Best regards!
idata
Community Manager

@curry_best The lines of code you are referring to convert the input tensor from FP32 to FP16. The fifo.set_option() call changes a FIFO option, including NC_FIFO_RW_OPTION_DATA_TYPE; that option changes the FIFO's element data type to FP16, and therefore its expected size. In short, you have to match your input tensor data type to your FIFO data type, or else you will get an expected-size mismatch like the one in your initial post. Hope this helps.

idata
Community Manager

@Tome_at_Intel

 

Thank you Tome, I have figured out the difference between the FIFO type and the input tensor type. Now I use ncGraphAllocateWithFifosEx() to set the FIFO type to FP16, like this:

ncGraphAllocateWithFifosEx(deviceHandle, graphHandle, graphBuffer, graphBufferLength,
                           &inputFIFO, NC_FIFO_HOST_WO, 2, NC_FIFO_FP16,
                           &outputFIFO, NC_FIFO_HOST_RO, 2, NC_FIFO_FP16);

and an fp16 conversion function to set the input tensor type to FP16. But when I call ncGraphQueueInferenceWithFifoElem, I get a "Segmentation fault (core dumped)" error, and I am sure the graph and FIFOs were allocated successfully.

Is there anything wrong here? Looking forward to your reply.
idata
Community Manager

@Tome_at_Intel

 

I have solved this problem; unfortunately, FP16 doesn't boost inference speed. Is that normal?
idata
Community Manager

@curry_best If you pre-process all of the input images and convert them to FP16, you'll save the NCSDK the time it would spend converting the input from FP32 to FP16. Theoretically, if all images are converted before any inferences are made, the NCSDK won't have to do any conversion at all, but it's likely a very small difference.
