I'm trying to use four NCS devices to run inference on a large image. Currently I'm breaking the image into 32x32 pixel tiles and loading one tile onto each NCS using LoadTensor(). Once the tiles are loaded I iterate over the NCS devices and call GetResult(), at which point each result is added to a 2D array of probabilities. I repeat this process until the entire large image has been inferenced, like so:
```python
while len(inputs) != 0:
    # Queue one 32x32 tile on each of the four sticks
    graph_handle[0].LoadTensor(inputs.pop(), None)
    graph_handle[1].LoadTensor(inputs.pop(), None)
    graph_handle[2].LoadTensor(inputs.pop(), None)
    graph_handle[3].LoadTensor(inputs.pop(), None)
    # Collect the results one stick at a time
    res1, _ = graph_handle[0].GetResult()
    res2, _ = graph_handle[1].GetResult()
    res3, _ = graph_handle[2].GetResult()
    res4, _ = graph_handle[3].GetResult()
    # process results (add probabilities to the 2D output array)
```
As you can see, this means the inferencing occurs sequentially - only one NCS is actually busy at any given time. Is there any way to have EACH of these NCS devices managed by a parallel process that continually calls LoadTensor() and GetResult() (or any other way to have each NCS inferencing in parallel)? I would like each NCS to be making inferences in parallel with the others to reduce processing time, as the main bottleneck is the inferencing (the input data is streamed at a faster rate than the inferences occur).
I've tried using os.fork() and Python's multiprocessing module with pipes, with one process taking care of one NCS, but GetResult() inevitably fails each time it is called. I have tried initialising the NCS devices (loading the compiled graph and opening the device) both before the fork in the parent process and after the fork in the child process, but either way GetResult() still fails when called.
My current results on a subset of the large image suggest that the majority of the time is spent inferencing, so ideally this time could be quartered by using four NCS devices:
Time spent:
- evaluating (total): 0:04:09.721723
- receiving & formatting input: 0:00:31.952764
- loading tensors: 0:00:15.504957
- inferencing: 0:03:22.264002
Is this possible? Any advice would be much appreciated.
- Tags:
- Tensorflow
@Isaac You can take a look at the examples in the ncappzoo (https://github.com/movidius/ncappzoo/tree/master/apps). Some of the examples, like our MultiStick GoogLeNet example (https://github.com/movidius/ncappzoo/blob/master/apps/MultiStick_GoogLeNet/MultiStick_GoogLeNet.py), use threading to run inferences on multiple sticks in parallel.
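The general pattern in those examples is one worker thread per stick: each thread opens its own device, allocates its own copy of the graph, and loops over LoadTensor()/GetResult() for its share of the inputs. A minimal sketch of that pattern with the NCSDK v1 Python API (the graph file path and the list of input tiles here are placeholders, not code from this thread):

```python
import threading
import numpy
from mvnc import mvncapi as mvnc

GRAPH_PATH = 'graph'  # placeholder: path to your compiled graph file

def infer_worker(device_name, tiles, results, index):
    # Each thread owns one stick: open it, allocate the graph, run its share of tiles
    device = mvnc.Device(device_name)
    device.OpenDevice()
    with open(GRAPH_PATH, mode='rb') as f:
        graph = device.AllocateGraph(f.read())
    outputs = []
    for tile in tiles:
        graph.LoadTensor(tile.astype(numpy.float16), None)
        output, _ = graph.GetResult()
        outputs.append(output)
    results[index] = outputs
    graph.DeallocateGraph()
    device.CloseDevice()

device_names = mvnc.EnumerateDevices()
tiles = []  # placeholder: list of 32x32x3 float16 tiles
chunks = [tiles[i::len(device_names)] for i in range(len(device_names))]
results = [None] * len(device_names)

threads = [threading.Thread(target=infer_worker, args=(name, chunk, results, i))
           for i, (name, chunk) in enumerate(zip(device_names, chunks))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each stick gets its own Device and Graph object, so the threads never share an mvnc handle; the only shared state is the results list, which each thread writes to at a distinct index.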
@Tome_at_Intel I am able to successfully run the examples you've provided, thanks.
I was initially unable to run my own implementation; every time I called GetResult() in any parallel thread I'd receive the following error:
File "/usr/local/lib/python3.5/dist-packages/mvnc/mvncapi.py", line 264, in GetResult
raise Exception(Status(status))
Exception: mvncStatus.ERROR
which is followed by the device(s) disconnecting and not reconnecting (dmesg -w output):
```
[95834.145350] usb 2-2.2: USB disconnect, device number 60
[95834.224190] usb 1-2.1.3: USB disconnect, device number 74
```
However, the problem seems to be fixed by introducing a small wait between LoadTensor() and GetResult():
```python
GRAPH_HANDLES[device_number].LoadTensor(image_input, image_loc_string)
time.sleep(0.08)  # without this wait, GetResult() raises mvncStatus.ERROR
output, image_loc_string = GRAPH_HANDLES[device_number].GetResult()
```
The custom TensorFlow graph I'm using is very simple (3 conv layers, 2 dense layers), and the images I'm loading are 32x32x3 float16s; I'm not sure if that has anything to do with it. Note that if I reduce the wait to less than 0.06 seconds the original error occurs again; 0.08 seconds seems to be the minimum.
Would you know of any other workaround? I'll be running inferences in batches of 100,000+ images, so a 0.08 second wait per inference will begin to add up. Perhaps I can make the graph more complex and see if that makes any difference.
By switching from multiprocessing's Process to threading's Thread I was able to get the NCS parallelism working.
I am still unsure why time.sleep(0.08) was needed to prevent the GetResult() error when I was using multiprocessing's Process.
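For anyone else hitting this, the change was essentially just swapping the worker launch from multiprocessing.Process to threading.Thread; a rough sketch of what I mean (per_stick_worker here is a stand-in for the per-device open/LoadTensor()/GetResult() loop, not my actual code):

```python
import threading

def per_stick_worker(device_index):
    # stand-in: open the stick, allocate the graph, then loop
    # LoadTensor()/GetResult() over that stick's share of the tiles
    pass

# previously: multiprocessing.Process(target=per_stick_worker, args=(i,))
workers = [threading.Thread(target=per_stick_worker, args=(i,)) for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```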
@Isaac I think the problem is occurring because you may not have set the graph option to block. After the script runs LoadTensor(), it calls GetResult() right away, but because LoadTensor() takes a little time to process, the result may not actually be available yet, which may be why GetResult() is giving you errors. You can read more about the Python API and setting the graph options at https://movidius.github.io/ncsdk/py_api/, https://movidius.github.io/ncsdk/py_api/GraphOption.html and https://movidius.github.io/ncsdk/py_api/Graph.html.
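If it helps, you can also read the blocking mode back from the graph before you start loading tensors; a small sketch, assuming an already-allocated Graph object named graph (not code from this thread):

```python
from mvnc import mvncapi as mvnc

# 0 means LoadTensor()/GetResult() block until the result is ready;
# 1 means they return without waiting
dont_block = graph.GetGraphOption(mvnc.GraphOption.DONT_BLOCK)
print('DONT_BLOCK =', dont_block)
```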
@Tome_at_Intel I can confirm that the graphs were explicitly set to block (though I believe that is the default behaviour) in the following manner:
```python
# Set graph options so that calls to LoadTensor() block until complete
graph_handles[device_index].SetGraphOption(mvnc.GraphOption.DONT_BLOCK, 0)
```
Nonetheless, the implementation has been working since switching to threading. Thank you for your help.
