I want to test the relationship between batch size and parallel computing on the GPU, so I ran the following experiment.
Device: GPU (Intel UHD Graphics 630)
Model input shape: (1, 3, 8, 8)
Input image shape: (8, 8)
My command is:
python ./benchmark_app.py -m xxx_8x8_fp16_b1.xml -i 0801_8x8.png -d GPU -api sync -niter 1000 -b batchsize
Since we need to focus on latency, I set -api to sync mode. The only thing I change between runs is the batch size. The test results are in the following chart (a scripted version of this sweep is sketched below the chart):
| Batch size | Input shape | GPU usage | Latency |
|---|---|---|---|
| 8 | 8x8 | 50% | 0.35 |
| 16 | 8x8 | 53.5% | 0.39 |
| 32 | 8x8 | 59% | 0.51 |
| 64 | 8x8 | 68% | 0.66 |
| 128 | 8x8 | 75% | 0.86 |
| 256 | 8x8 | 83% | 1.38 |
| 512 | 8x8 | 89% | 2.57 |
| 1024 | 8x8 | 95% | 6.54 |
| 1200 | 8x8 | 94% | 5.7 |
| 1280 | 8x8 | 97% | 9.94 |
| 1300 | 8x8 | 94% | 5.96 |
| 1500 | 8x8 | 95% | 6.80 |
| 1536 | 8x8 | 98% | 13.19 |
| 2000 | 8x8 | 97% | 9.56 |
| 2048 | 8x8 | 98% | 19.19 |
| 3000 | 8x8 | 97.5% | 14.23 |
| 3500 | 8x8 | 98% | 17.41 |
| 3700 | 8x8 | 99% | 17.76 |
| 3850 | 8x8 | 99% | 18.68 |
| 3900 | 8x8 | 98% | 18.61 |
| 4000 | 8x8 | 100% | 41.57 |
| 4096 | 8x8 | 100% | 42.41 |
Looking at the results, it is strange that latency does not simply increase with batch size. Specifically, at certain batch sizes (for example 1200, 1300, 2000, and 3000) the GPU usage is slightly lower and the latency is much lower than at the somewhat smaller batch size just above it in the chart.
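For reference, here is a rough sketch of how this sweep can be scripted instead of being run by hand. It simply shells out to benchmark_app.py with the same flags shown above; the regular expression that pulls the latency out of the console output is an assumption about the tool's report format and may need adjusting for your OpenVINO version.

```python
import re
import subprocess

# Batch sizes taken from the chart above.
BATCH_SIZES = [8, 16, 32, 64, 128, 256, 512, 1024, 1200, 1280, 1300,
               1500, 1536, 2000, 2048, 3000, 3500, 3700, 3850, 3900,
               4000, 4096]

# File names as used in the command above.
MODEL = "xxx_8x8_fp16_b1.xml"
IMAGE = "0801_8x8.png"

# Assumption: benchmark_app prints a line such as "Latency: 6.54 ms".
LATENCY_RE = re.compile(r"Latency:\s*([\d.]+)\s*ms", re.IGNORECASE)

for batch in BATCH_SIZES:
    cmd = ["python", "./benchmark_app.py",
           "-m", MODEL, "-i", IMAGE,
           "-d", "GPU", "-api", "sync",
           "-niter", "1000", "-b", str(batch)]
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    match = LATENCY_RE.search(out)
    latency = match.group(1) if match else "n/a"
    print(f"batch={batch}\tlatency={latency}")
```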
Hi XiaoXiong,
Thanks for reaching out to us and sharing your findings.
First and foremost, I was unable to run the Benchmark App at such high batch sizes because it exceeded the maximum size of a memory object allocation: the Intel public models are trained with a large input shape (227x227), so I do not have a model with a very small input shape to reproduce your issue.
Could you share your model with us for further investigation?
Besides, please try specifying the number of streams explicitly and carry out the experiment again:
python ./benchmark_app.py -m xxx_8x8_fp16_b1.xml -i 0801_8x8.png -d GPU -api sync -niter 1000 -b batchsize -nstreams 1
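In case it is useful, the stream count can also be pinned from the Python API instead of through the -nstreams flag. This is only a rough sketch, assuming the older Inference Engine Python API (openvino.inference_engine, as in OpenVINO 2021.x) and that the model's .bin file sits next to the .xml; adjust it for your OpenVINO version.

```python
import time

import numpy as np
from openvino.inference_engine import IECore

BATCH = 64  # example value; change per experiment

ie = IECore()
# Assumption: the .bin weights file is found automatically next to the .xml.
net = ie.read_network(model="xxx_8x8_fp16_b1.xml")
net.batch_size = BATCH  # reshape the network to the desired batch size

# Pin the GPU plugin to a single stream, mirroring "-nstreams 1".
exec_net = ie.load_network(network=net, device_name="GPU",
                           config={"GPU_THROUGHPUT_STREAMS": "1"})

input_name = next(iter(net.input_info))
dummy = np.random.rand(BATCH, 3, 8, 8).astype(np.float32)

# Time a synchronous inference, analogous to "-api sync".
start = time.perf_counter()
exec_net.infer(inputs={input_name: dummy})
print(f"batch={BATCH} latency={(time.perf_counter() - start) * 1000:.2f} ms")
```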
Regards,
Peh
Hi XiaoXiong,
Any new updates will be posted in this community thread.
Thank you for your question. If you need any additional information from Intel, please submit a new question as this thread is no longer being monitored.
Regards,
Peh
