Hi there -
I am trying to evaluate both the NCS2 and the Google Accelerator on a custom TensorFlow model trained on the CIFAR dataset.
Both devices are connected to a freshly installed Raspberry Pi 3B.
I used quantization-aware training with a TensorFlow Estimator to build the eval graph. My estimator fn is detailed in the attached txt.
I saved the eval model of the Estimator with classifier.experimental_export_all_saved_models().
Then I froze the graph as detailed in the attached txt.
Interestingly, I was able to build a TFLite model with TOCO. This TFLite model was then converted successfully with the online Google converter for the Edge TPU.
When I run this quantized model on the Google accelerator, the inference time is 0.2 s.
I used the Model Optimizer to convert the graph to FP16 (DEBUG log attached) with the following command:
python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_tf.py --input_model frozen.pb --input model_input/input:0 --input_shape [1,32,32,3] --output softmax_tensor --data_type FP16 --log_level DEBUG
When I run this model on my RPi3, net.forward() takes 3 s.
I first ran the same model (with FP32 outputs) on my macOS desktop, where it runs in 0.15 s.
The object detection example runs on both devices with no performance issues.
- Does this mean that inference is happening on the RPi3 CPU rather than on the NCS? How can I dig into what is really happening?
- Am I missing something in the Model Optimizer? Do I have to use a config file or a pipeline config? (I don't see any unsupported operations in the attached log.)
- Is my methodology for assessing the inference time correct?
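On the methodology point, here is a minimal sketch of one way to time inference, assuming Python. The names `time_inference`, `summarize`, and `run_once` are hypothetical and do not come from any library. The idea is to discard a few warm-up calls, since the first call on a device often includes one-time setup cost, and then report the spread over many runs rather than a single measurement.

```python
import statistics
import time


def time_inference(run_once, warmup=2, runs=20):
    """Return per-call latencies in seconds, excluding warm-up calls.

    run_once is a hypothetical zero-argument callable that performs
    exactly one inference (for example, a wrapper around net.forward()).
    """
    for _ in range(warmup):
        run_once()  # discarded: early calls may include one-time setup
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Summarize a list of latencies with min, median, and max."""
    return {
        "min": min(latencies),
        "median": statistics.median(latencies),
        "max": max(latencies),
    }


if __name__ == "__main__":
    # Stand-in workload; replace the lambda with the real inference call.
    print(summarize(time_inference(lambda: sum(range(10000)))))
```

If the median stays near 3 s after warm-up, the slowdown is probably in the steady-state inference path and not in model loading.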
Any advice is very welcome!
The inference time does not seem correct. I would like to reproduce the issue. Would it be possible to send me your code and model to test?
By the way, it doesn't look like there is an attachment on this post.