So I compiled a Caffe network with mvNCCompile, set the number of SHAVE cores to 12, and tested the network running on the NCS in some Python code. I timed it with time.clock() and was getting inference times on the order of 150 ms/image. However, when I run the mvNCProfile tool on the same network, also with "-s 12", it tells me that inference is taking about 1280 ms. I don't understand how there could be an order-of-magnitude difference between my Python tests and the results from mvNCProfile.
The Python code is something like this:
for test_image in inference_images:
    try:
        t0 = time.clock()
        graph.LoadTensor(test_image.astype(np.float16), 'user object')
    except:
        print("Error Loading Tensor into Movidius Graph")
    try:
        output, userobj = graph.GetResult()
        t1 = time.clock()
    except:
        print("Error getting result from NCS")
    times.append(t1 - t0)
Printing times and the average of times yields:
[0.1364160000000001, 0.14857699999999996, 0.15344800000000003, 0.14042500000000002, 0.15225599999999995, 0.16213199999999972, 0.1494970000000002, 0.16063800000000006, 0.14944800000000003, 0.1511720000000003]
0.15040090000000003
According to the documentation these values are seconds represented as floats, so I'm right around 150 ms/image.
However, here is what I see from mvNCProfile:
mvNCProfile v02.00, Copyright @ Movidius Ltd 2016
/usr/local/bin/ncsdk/Controllers/FileIO.py:52: UserWarning: You are using a large type. Consider reducing your data sizes for best performance
"Consider reducing your data sizes for best performance\033[0m")
USB: Transferring Data...
Time to Execute : 1358.95 ms
USB: Myriad Execution Finished
Time to Execute : 1286.72 ms
USB: Myriad Execution Finished
USB: Myriad Connection Closing.
USB: Myriad Connection Closed.
Network Summary
Detailed Per Layer Profile
#    Name                 MFLOPs    Bandwidth (MB/s)    time (ms)
===============================================================================
0    data                 0.0       102011.9            0.005
1 conv1_1 311.0 298.2 15.556
2 conv1_2 6635.5 668.4 148.043
3 pool1 5.8 956.9 11.481
4 conv2_1 3317.8 419.5 59.264
5 conv2_2 6635.5 482.0 103.162
6 pool2 2.9 944.1 5.819
7 conv3_1 3317.8 168.7 76.599
8 conv3_2 6635.5 176.2 146.712
.
.
.
71 conv9_2_mbox_priorbox 0.0 20.4 0.024
72 mbox_conf_reshape 0.0 171.0 0.390
73 mbox_conf_softmax 0.1 526.9 0.126
74 mbox_conf_flatten 0.0 374.0 0.178
75 detection_out 0.0 38.3 1.738
Total inference time 1281.14
Generating Profile Report 'output_report.html'...
Can anyone give me some insight as to why using the Profile tool yields an inference time about 10x larger than what I see when I actually test the function calls in Python?
One thought I had was data type - what data type does mvNCProfile use? FP16, FP32, or FP64? Is there any way to change this and test inference times with different data types?
I'm using FP16 in my Python code, if that's relevant.
Thanks!
@sgiardi3 The time that the mvNCProfile tool reports is strictly the time spent processing on the chip. It should typically be less than the time for LoadTensor() + GetResult(). The issue you are seeing could be due to a bug. I would like to reproduce this issue on my bench if you can provide a link to your network. mvNCProfile uses FP16 data. Also, by the way, in your code the LoadTensor() line is actually timing the numpy FP16 conversion as well; if you move the FP16 conversion outside of the timed code, you may get an even lower result. Thanks for reporting this. I hope you can provide a network for testing purposes.
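Something like this, as a rough sketch (keeping your time.clock() calls and variable names), is what I mean:
# Do the numpy FP16 conversion outside the timed region so that
# only the NCS calls (LoadTensor + GetResult) are measured
fp16_image = test_image.astype(np.float16)
t0 = time.clock()
graph.LoadTensor(fp16_image, 'user object')
output, userobj = graph.GetResult()
t1 = time.clock()
times.append(t1 - t0)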
@Tome_at_Intel Thanks for the response! It seems as though I am having this issue with the SSD networks/weights available on GitHub as well as the ones we fine-tuned. For testing you can download the Pascal VOC model from the SSD GitHub repo:
https://github.com/weiliu89/caffe/tree/ssd
If you scroll down to the bottom, I just verified this issue on the 07+12+COCO: SSD300* model/weights.
This was the Download Link
All I did was delete the following lines from deploy.prototxt to drop the dependency on the label map:
save_output_param {
  label_map_file: "data/VOC0712/labelmap_voc.prototxt"
}
I then compiled with:
mvNCCompile deploy.prototxt -w VGG_VOC0712_SSD_300x300_ft_iter_120000.caffemodel -s 12 -in conv1_1 -on detection_out -o mov_graph_coco
Profiled with:
mvNCProfile deploy.prototxt -w VGG_VOC0712_SSD_300x300_ft_iter_120000.caffemodel -s 12 -in conv1_1 -on detection_out
Profile Result:
mvNCProfile v02.00, Copyright @ Movidius Ltd 2016
/usr/local/bin/ncsdk/Controllers/FileIO.py:52: UserWarning: You are using a large type. Consider reducing your data sizes for best performance
"Consider reducing your data sizes for best performance\033[0m")
USB: Transferring Data...
Time to Execute : 1412.68 ms
USB: Myriad Execution Finished
Time to Execute : 1336.27 ms
USB: Myriad Execution Finished
USB: Myriad Connection Closing.
USB: Myriad Connection Closed.
Network Summary
Detailed Per Layer Profile
#    Name                 MFLOPs    Bandwidth (MB/s)    time (ms)
===============================================================================
0 data 0.0 98093.3 0.005
1 conv1_1 311.0 298.0 15.569
2 conv1_2 6635.5 668.3 148.056
3 pool1 5.8 956.8 11.483
4 conv2_1 3317.8 420.0 59.199
5 conv2_2 6635.5 482.3 103.094
6 pool2 2.9 945.4 5.811
7 conv3_1 3317.8 169.4 76.295
.
.
.
69 conv9_2_mbox_conf_perm 0.0 7.0 0.023
70 conv9_2_mbox_conf_flat 0.0 22.4 0.014
71 conv9_2_mbox_priorbox 0.0 4.6 0.107
72 mbox_conf_reshape 0.0 152.7 2.290
73 mbox_conf_softmax 0.6 600.7 0.582
74 mbox_conf_flatten 0.0 196.3 1.782
75 detection_out 0.0 19.3 3.468
-------------------------------------------------------------------------------
Total inference time 1330.09
-------------------------------------------------------------------------------
Python Testing code:
import mvnc.mvncapi as fx
import os
import cv2
import numpy as np
import time

fx.SetGlobalOption(fx.GlobalOption.LOG_LEVEL, 2)

dim = (300, 300)

ncs_names = fx.EnumerateDevices()
if len(ncs_names) < 1:
    print('Error: No NCS devices detected!')
    quit()
ncs_device = fx.Device(ncs_names[0])

network_graph = #Path to "mov_graph_coco"
test_image_path = #Path to test images

inference_images = []
for image_name in os.listdir(test_image_path):
    image = cv2.imread(os.path.join(test_image_path, image_name))
    image = cv2.resize(image, dim)
    image = image.astype(np.float32)
    inference_images.append(image)
    print('Image')

try:
    ncs_device.OpenDevice()
except:
    print('Error: NCS Device could not be opened')
    quit()
print('Device Opened Normally!')

with open(network_graph, mode='rb') as f:
    blob = f.read()
graph = ncs_device.AllocateGraph(blob)

img_num = 0
times = []
for test_image in inference_images:
    try:
        t0 = time.clock()
        graph.LoadTensor(test_image.astype(np.float16), 'user object')
    except:
        print("Error Loading Tensor into Movidius Graph")
    try:
        output, userobj = graph.GetResult()
        t1 = time.clock()
    except:
        print("Error getting result from NCS")
    times.append(t1 - t0)
    try:
        detection = []
        detections = []
        for i in range(len(output)):
            detection.append(output[i])
            if i > 0 and i % 7 == 0:
                print("Detection ", int(i/7), " for Image: ", img_num, "-------------------")
                print(detection)
                detections.append(detection)
                detection = []
    except:
        print("Detection Processing Error")

try:
    ncs_device.CloseDevice()
except:
    print('Error: Could not close NCS Device!')
    quit()
print('Closed NCS Device! Everything worked properly')

print(times)
print(np.average(times))
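For reference, the grouping-by-7 in the parsing loop above assumes each detection comes back as 7 floats (image_id, label, confidence, xmin, ymin, xmax, ymax); if that assumption holds, an equivalent numpy-based unpacking would just be a reshape:
# Sketch: unpack the output assuming 7 floats per detection,
# in the order (image_id, label, confidence, xmin, ymin, xmax, ymax)
num_detections = len(output) // 7
detections = np.reshape(output[:num_detections * 7], (num_detections, 7))
for image_id, label, confidence, xmin, ymin, xmax, ymax in detections:
    print(int(label), confidence, (xmin, ymin, xmax, ymax))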
Python Script Results:
[0.16148699999999971, 0.15405300000000022, 0.15758600000000023, 0.1580459999999997, 0.16678099999999985, 0.16459499999999982, 0.16467399999999977, 0.170118, 0.15936899999999987, 0.16317100000000018]
0.16198799999999994
And again the mvNCProfile tool is returning an inference time almost 10x larger than my Python test.
Thanks for your help!
@sgiardi3 Thanks for providing the network and the thorough instructions. We were able to reproduce the issue and are investigating. Thank you for your patience.
@Tome_at_Intel Just wanted to check in and see if there has been any update on this.
@sgiardi3 Sorry, no update on this particular issue yet. As soon as I find something out, I will let you know. Thanks.
I'm having the same issue. The profile tool reports 540 ms, but I'm measuring 45 ms in Python (and that includes the time to send and receive the data from the compute stick).
An update on this would be very helpful.
@sgiardi3 - try time.time(). My results match the profiler now.
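That would explain the gap: on Linux, time.clock() measures CPU time of the Python process, so the time spent blocked waiting on the USB transfer and the Myriad barely counts, while time.time() (or time.perf_counter()) measures wall-clock time. A minimal sketch of the corrected timing, keeping the same NCSDK calls as the code above:
# Wall-clock timing around the NCS calls; time.clock() only counts process CPU
# time on Linux, so it misses the time the process spends blocked on the NCS.
fp16_image = test_image.astype(np.float16)  # conversion kept outside the timed region
t0 = time.time()  # or time.perf_counter()
graph.LoadTensor(fp16_image, 'user object')
output, userobj = graph.GetResult()
t1 = time.time()
times.append(t1 - t0)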
