I am deploying a VGG-based segmentation model using OpenVINO R3. I was able to convert the model from TensorFlow to OpenVINO's IR and run inference on the CPU. However, when I executed inference in heterogeneous mode across an Arria 10 Dev Kit and the CPU, the results looked very different from both the CPU and the original TensorFlow output. Upon closer inspection, the output blob returned by the FPGA inference contains mostly "nan". What could be causing the problem?
FPGA bitstream tried: 2-0-1_A10DK_FP16_VGG.aocx
Link to the TensorFlow frozen model, OpenVINO IR, console outputs, and segmentation visualizations: https://drive.google.com/open?id=1F_Y0f8A8khP98mWx9VzrUuoRTnXWsgJ3
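In case it helps with reproducing this, the bitstream was programmed onto the board with aocl before running inference, roughly as follows (the device name acl0 is an assumption from my setup and may differ on yours):
# Program the Arria 10 Dev Kit with the FP16 VGG bitstream (assumes the aocl/board environment is already initialized)
aocl program acl0 2-0-1_A10DK_FP16_VGG.aocx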
Hi James,
I'll take a look and try to reproduce the behavior that you are seeing to get an understanding of the issue.
Kind Regards,
Monique Jones
Hi Monique,
I just realized that I uploaded the wrong console dumps. In case you'd like to refer to them, please re-download nD_nIN_256x128_cpu.txt and nD_nIN_256x128_fpga.txt from the link in my first post.
Regards,
James
James,
Can you send me the command you used to convert the model?
Kind Regards,
Monique Jones
Hi Monique,
Here is the bash script I used for model conversion:
#!/bin/bash
python3 ${INTEL_CVSDK_DIR}/deployment_tools/model_optimizer/mo_tf.py \
    --input_model lane_segm_256x128_noDropout_noIN.pb \
    --model_name lane_segm_256x128_noDropout_noIN \
    --data_type FP32 \
    --output_dir . \
    --input real_images \
    --input_shape [1,128,256,3] \
    --output FCN/upconv5/Conv2d_transpose/Relu \
    --reverse_input_channels \
    --log_level DEBUG
Regards,
James
What bitstream have you programmed on the Arria 10 to run this model?
Also, you should use --data_type FP16 instead of FP32 if you are running on FPGA.
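For example, your conversion command with only the data type changed (all other options as in your script, and --log_level DEBUG dropped since it is optional) would look roughly like:
python3 ${INTEL_CVSDK_DIR}/deployment_tools/model_optimizer/mo_tf.py \
    --input_model lane_segm_256x128_noDropout_noIN.pb \
    --model_name lane_segm_256x128_noDropout_noIN \
    --data_type FP16 \
    --output_dir . \
    --input real_images \
    --input_shape [1,128,256,3] \
    --output FCN/upconv5/Conv2d_transpose/Relu \
    --reverse_input_channels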
Hi Monique,
I used 2-0-1_A10DK_FP16_VGG.aocx.
I have actually tried converting the model to an FP16 IR and running it in heterogeneous mode. However, the MKLDNN (CPU) plugin doesn't seem to support FP16.
Moreover, I've successfully executed inference on an FP32 VGG-16 model converted from Caffe using the same FPGA bitstream (via the validation app), so perhaps the data type mismatch is not to blame?
Regards,
James
Hi James,
I programmed the VGG FP16 bitstream and successfully ran the segmentation sample with your model converted to FP16 precision (--data_type FP16 during Model Optimizer conversion), using the HETERO plugin targeting the FPGA first and falling back on the CPU, as you can see below:
./segmentation_sample -i /opt/intel/computer_vision_sdk/deployment_tools/demo/car.png -m ~/lane_segm_256x128_noDropout_noIN.xml -d HETERO:FPGA,CPU
When layers fall back to the CPU, they are converted to FP32 precision dynamically. Can you upload your original image to the Google Drive directory so that I can check whether I get the same results on CPU as I do with HETERO:FPGA,CPU?
Kind Regards,
Monique Jones
Hi Monique,
Thanks for the help. I actually just solved the problem. Similar to the mistake made by this poster: https://software.intel.com/en-us/forums/computer-vision/topic/798732, I had overlooked the fact that my frozen model expects input pixels normalized around 0. After I normalized the pixels, the FPGA outputs mostly agreed with the CPU outputs.
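For anyone hitting the same issue: instead of normalizing in application code, you can also ask the Model Optimizer to bake the normalization into the IR with --mean_values and --scale. The 127.5 values below are only an illustration for pixels mapped to roughly [-1, 1]; use whatever your training pipeline actually used:
# Hypothetical example: insert mean subtraction and scaling into the IR at conversion time (values are placeholders)
python3 ${INTEL_CVSDK_DIR}/deployment_tools/model_optimizer/mo_tf.py \
    --input_model lane_segm_256x128_noDropout_noIN.pb \
    --model_name lane_segm_256x128_noDropout_noIN \
    --data_type FP16 \
    --output_dir . \
    --input real_images \
    --input_shape [1,128,256,3] \
    --output FCN/upconv5/Conv2d_transpose/Relu \
    --reverse_input_channels \
    --mean_values [127.5,127.5,127.5] \
    --scale 127.5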
Regards,
James
Hi James,
I'm having a similar problem. I have converted a TensorFlow model and run inference on both CPU and FPGA. The CPU results agreed with the original TensorFlow ones exactly (the labels are identical, and the errors in the probabilities are < 1e-5), but the FPGA results were quite different (more than 1% difference in labels, which is not acceptable in my case). I also tried another model on ImageNet data, and the Top-5 and Top-1 accuracies dropped dramatically (from 96.5% and 82.7% to 89% and 68.9%, respectively).
I have also tried changing the data type from FP11 to FP16 on my FPGA, but the differences are negligible.
I'm wondering what precision you finally achieved when you say "mostly agreed"? Are the labels exactly the same, or are the error rates at least acceptable?
On the other hand, OpenVINO inference on the CPU also requires the scale and mean_values parameters when running the Model Optimizer; does inference on the FPGA perform extra normalization?
Best regards,
Sijie
Hi Sijie,
When I said "mostly agreed", I meant that the visualization of the semantic segmentation based on the FPGA's output looked mostly the same as the one produced by the CPU. I acknowledge that this is nowhere near as rigorous as computing metrics on a value-by-value basis.
As for the IR's precision, I used the default during model conversion, which is FP32. This is nominally inconsistent with the precision supported by the FPGA bitstream, but on other models the difference this causes was not as large as what I first observed on this particular model. The precisions I used for the input and output blobs are FP32 as well.
Finally, I believe the scale and mean_values parameters are optional when running the Model Optimizer. Those options add extra layers to the IR that perform the input scaling and mean subtraction, which should be faster than normalizing the input by other means, say with OpenCV.
These are just based on my experience as a user. Maybe the Intel representatives can give you more tips.
Regards,
James
Hi James,
Thanks a lot for the reply.
Maybe the batch normalization layer can also normalize the inputs.
I'm going to start another thread in the forum, thanks again for the help.
Best regards,
Sijie
Hello,
How do you program your FPGA? Is it connected via PCIe, or do you just use a JTAG cable?
Thanks