Hi, I successfully ran the ncs-fullcheck example and used it to run inference on several pictures. AlexNet takes around 200 ms and GoogLeNet around 550 ms. However, when I ran the profiler from the toolkit (make example), it showed inference around 90 ms for both AlexNet and GoogLeNet. There seems to be a gap between the profiled figures and the real inference time. Does anyone know where this gap comes from (e.g., transferring the image to the stick and retrieving the result), and how do I get the same performance as profiled?
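One way to locate a gap like this is to time each stage of the pipeline separately instead of only the end-to-end call. A minimal, API-agnostic sketch (the `load` and `infer` stages below are placeholders, not part of the NCS API):

```python
from timeit import default_timer as timer

def timed(label, fn, *args):
    # Run one pipeline stage and report how long it took in milliseconds.
    t0 = timer()
    result = fn(*args)
    print('%-10s %.1f ms' % (label, (timer() - t0) * 1000.0))
    return result

# Placeholder stages; substitute the real image-load, transfer,
# inference, and result-readback calls to see where the time goes.
image = timed('load', lambda: [0.0] * 3 * 227 * 227)
probs = timed('infer', lambda img: img[:5], image)
```

Wrapping the tensor transfer and the result readback this way shows how much of the wall-clock time is USB traffic rather than on-device compute.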
Another question: the inference results differ from Caffe running the same caffemodel (using the C++ classifier). How do I get the same results as with Caffe?
Caffe: AlexNet
0.3094 - "n02124075 Egyptian cat"
0.1761 - "n02123159 tiger cat"
0.1221 - "n02123045 tabby, tabby cat"
0.1132 - "n02119022 red fox, Vulpes vulpes"
0.0421 - "n02085620 Chihuahua"
NCS
AlexNet
Egyptian cat (69.19%)
tabby, tabby cat (6.59%)
grey fox, gray fox, Urocyon cinereoargenteus (5.42%)
tiger cat (3.93%)
hare (3.52%)
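Some divergence between the two result sets is expected: the NCS computes in fp16 while Caffe on the CPU uses fp32, and any difference in preprocessing (mean subtraction, resize/crop) shifts the scores further. A small sketch of the fp16 effect alone, using made-up logits (not taken from either run):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array of logits.
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical fp32 logits; round-trip through fp16 as the stick would store them.
logits = np.array([2.13, 1.46, 1.08, 0.97, 0.21], dtype=np.float32)
p32 = softmax(logits)
p16 = softmax(logits.astype(np.float16).astype(np.float32))
print('max probability shift: %.2e' % np.abs(p32 - p16).max())
```

Rounding a single set of logits to fp16 barely moves the probabilities, which suggests most of the gap comes from preprocessing differences and from fp16 error accumulating across many layers rather than from the final quantization step.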
Hi akey,
We found an issue with our "ncapi/tools/convert_models.sh" script: you need to add the argument "-s 12" to mvNCCompile.pyc to enable all 12 vector (SHAVE) engines. Please re-run that script to regenerate the graph files, and you should see performance similar to what you saw with "make example01".
Thank You
Ramana @ Intel
Before the change
ubuntu@ubuntu-UP:~/workspace/MvNC_SDK/ncapi/c_examples$ ./ncs-fullcheck ../networks/GoogLeNet/ ../images/512_Amplifier.jpg
OpenDevice 4 succeeded
Graph allocated
radio, wireless (46.97%) CD player (31.79%) tape player (11.16%) cassette player (6.71%) cassette (1.78%)
Inference time: 569.302185 ms, total time 575.650308 ms
radio, wireless (46.97%) CD player (31.79%) tape player (11.16%) cassette player (6.71%) cassette (1.78%)
Inference time: 556.881409 ms, total time 562.636079 ms
Deallocate graph, rc=0
Device closed, rc=0
Change
cd ../tools
vi convert_models.sh
** Add -s 12 to all the compile commands
#!/bin/sh
NCS_TOOLKIT_ROOT='../../bin'
echo $NCS_TOOLKIT_ROOT
python3 $NCS_TOOLKIT_ROOT/mvNCCompile.pyc ../networks/SqueezeNet/NetworkConfig.prototxt -w ../networks/SqueezeNet/squeezenet_v1.0.caffemodel -o ../networks/SqueezeNet/graph -s 12
python3 $NCS_TOOLKIT_ROOT/mvNCCompile.pyc ../networks/GoogLeNet/NetworkConfig.prototxt -w ../networks/GoogLeNet/bvlc_googlenet.caffemodel -o ../networks/GoogLeNet/graph -s 12
python3 $NCS_TOOLKIT_ROOT/mvNCCompile.pyc ../networks/Gender/NetworkConfig.prototxt -w ../networks/Gender/gender_net.caffemodel -o ../networks/Gender/graph -s 12
python3 $NCS_TOOLKIT_ROOT/mvNCCompile.pyc ../networks/Age/deploy_age.prototxt -w ../networks/Age/age_net.caffemodel -o ../networks/Age/graph -s 12
python3 $NCS_TOOLKIT_ROOT/mvNCCompile.pyc ../networks/AlexNet/NetworkConfig.prototxt -w ../networks/AlexNet/bvlc_alexnet.caffemodel -o ../networks/AlexNet/graph -s 12
Execute the script
./convert_models.sh
cd ../c_examples
After the change
ubuntu@ubuntu-UP:~/workspace/MvNC_SDK/ncapi/c_examples$ ./ncs-fullcheck ../networks/GoogLeNet/ ../images/512_Amplifier.jpg
OpenDevice 4 succeeded
Graph allocated
radio, wireless (46.97%) CD player (31.79%) tape player (11.16%) cassette player (6.71%) cassette (1.78%)
Inference time: 108.950851 ms, total time 115.101073 ms
radio, wireless (46.97%) CD player (31.79%) tape player (11.16%) cassette player (6.71%) cassette (1.78%)
Inference time: 88.571877 ms, total time 95.765275 ms
Deallocate graph, rc=0
Device closed, rc=0
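From the second-run times in the logs above, the `-s 12` change takes GoogLeNet from roughly 557 ms to 89 ms per inference. A quick check of the speedup and the implied device-side frame rate:

```python
# Second-run GoogLeNet inference times from the logs above, in milliseconds.
before_ms = 556.881409
after_ms = 88.571877

print('speedup:    %.1fx' % (before_ms / after_ms))   # ~6.3x
print('throughput: %.1f fps' % (1000.0 / after_ms))   # ~11.3 fps, device-side ceiling
```

The ~11 fps ceiling is inference time only; once camera capture and USB transfer are added, an end-to-end figure somewhat below it is what one would expect.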
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Much faster now. Continuous inference speed from a webcam is about 9.5 FPS for GoogLeNet. Thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@akey can you tell me how you calculate the FPS for GoogLeNet, please?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@ibrahimsoliman in Python you can use:
from timeit import default_timer as timer
time_start = timer()
# CODE — the inference call you want to time
time_end = timer()
print('FPS: %.2f fps' % (1.0 / (time_end - time_start)))
Note that default_timer returns seconds, so FPS is 1/elapsed, not 1000/elapsed.
One thing I don't get, though, about NCS speed is why it does not run at the full 100 GOPS as advertised. For example, in the SqueezeNet profile below (and in all the other networks) we can see:
- The MFLOPs estimate is 2x the actual op count. Is that because of fp16?
- The MFLOPs are processed at roughly 1/3 of the 100 GOPS peak, and the ratio varies from 1/4 to 1/2 depending on the tensor and convolution type.
Detailed Per Layer Profile
Layer  Name               MFLOPs  Bandwidth (MB/s)  Time (ms)
…
25     fire9/squeeze1x1   12.845            587.19       0.43
26     fire9/expand1x1     6.423            150.65       0.37
27     fire9/expand3x3    57.803            318.67       1.57
28     conv10            200.704            272.92       4.28
29     pool10              0.392            722.59       0.52
30     prob                0.003             10.49       0.18
Total inference time                                    26.89
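To put numbers on that ratio: effective throughput per layer is just MFLOPs divided by time, since MFLOPs per ms equals GFLOPS. Using the convolution rows from the profile above:

```python
# (name, MFLOPs, time_ms) taken from the per-layer profile rows above.
layers = [
    ('fire9/squeeze1x1',  12.845, 0.43),
    ('fire9/expand1x1',    6.423, 0.37),
    ('fire9/expand3x3',   57.803, 1.57),
    ('conv10',           200.704, 4.28),
]

for name, mflops, ms in layers:
    gflops = mflops / ms  # MFLOPs per millisecond equals GFLOPS
    print('%-18s %5.1f GFLOPS (%2.0f%% of a 100 GOPS peak)' % (name, gflops, gflops))
```

The larger convolutions sit near 30–47 GFLOPS while tiny layers fall well below that, consistent with the observation that small layers pay a fixed per-layer overhead and never approach the advertised peak.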