Intel® Distribution of OpenVINO™ Toolkit
Community assistance for the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

multi-batch latency comparative experiment

shirleyliu
New Contributor I
4,208 Views

I want to test the relationship between batch size and parallel computing on the GPU, so I ran the following experiment.

device: GPU(Intel UHD Graphics 630)

model input shape: (1,3,8,8)

input image shape:(8,8)

my command is like this:

python ./benchmark_app.py -m xxx_8x8_fp16_b1.xml -i 0801_8x8.png -d GPU -api sync -niter 1000 -b batchsize

I need to focus on latency, so I set -api sync (synchronous mode).

The only thing I changed between runs was the batch size. The test results are in the following chart:

| Batch size | Input shape | GPU usage | Latency |
| ---------- | ----------- | --------- | ------- |
| 8    | 8x8 | 50%   | 0.35  |
| 16   | 8x8 | 53.5% | 0.39  |
| 32   | 8x8 | 59%   | 0.51  |
| 64   | 8x8 | 68%   | 0.66  |
| 128  | 8x8 | 75%   | 0.86  |
| 256  | 8x8 | 83%   | 1.38  |
| 512  | 8x8 | 89%   | 2.57  |
| 1024 | 8x8 | 95%   | 6.54  |
| 1200 | 8x8 | 94%   | 5.7   |
| 1280 | 8x8 | 97%   | 9.94  |
| 1300 | 8x8 | 94%   | 5.96  |
| 1500 | 8x8 | 95%   | 6.80  |
| 1536 | 8x8 | 98%   | 13.19 |
| 2000 | 8x8 | 97%   | 9.56  |
| 2048 | 8x8 | 98%   | 19.19 |
| 3000 | 8x8 | 97.5% | 14.23 |
| 3500 | 8x8 | 98%   | 17.41 |
| 3700 | 8x8 | 99%   | 17.76 |
| 3850 | 8x8 | 99%   | 18.68 |
| 3900 | 8x8 | 98%   | 18.61 |
| 4000 | 8x8 | 100%  | 41.57 |
| 4096 | 8x8 | 100%  | 42.41 |
 

It's strange that the results do not show latency increasing regularly with batch size. Specifically, at certain batch sizes (for example 1200, 1300, and 2000), GPU usage drops slightly and latency drops a lot compared with the next-smaller batch size.
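The anomaly is easier to see by dividing total latency by batch size to get a per-image cost. A quick sketch in Python, using a few values copied from the chart above (same units as reported):

```python
# Latency values copied from the chart, for the batch sizes where the
# anomaly shows up (batch size -> total latency as reported).
data = {1024: 6.54, 1200: 5.70, 1280: 9.94, 1300: 5.96, 1500: 6.80, 1536: 13.19}

# Per-image latency = total latency / batch size.
per_image = {b: lat / b for b, lat in data.items()}

for b in sorted(per_image):
    print(f"batch {b:>4}: total {data[b]:>5}, per image {per_image[b]:.5f}")
```

Going from batch 1280 to batch 1300, both the total latency and the per-image latency drop, which is the opposite of what one would expect if the same kernel configuration were used for both sizes.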

Could you please give me some advice on this phenomenon? Thanks.

 

Tags (2)
0 Kudos
1 Solution
Peh_Intel
Moderator
3,664 Views

Hi shirleyliu,

 

Thank you for your patience.

 

Our GPU development team has looked into this reported behavior. Based on their input, such behavior can occur because, when the batch size changes, different kernels and/or different blocking parameters may be selected. To date, they have not examined such large batch sizes in detail, as their main priority has been batch size = 1. They have recently started looking into larger batch sizes for discrete GPUs, but mainly batch sizes of 32 and 64.
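One way to check this explanation yourself is to dump the executable graph (benchmark_app's -exec_graph_path exec.xml option) at two different batch sizes and compare the per-layer kernel names between the two dumps. A rough sketch of such a comparison, assuming the selected kernel is stored in a `primitiveType` attribute as it typically is; the embedded XML snippet below is illustrative, not a real dump:

```python
import xml.etree.ElementTree as ET

# Illustrative stand-in for a dumped executable graph (not a real
# benchmark_app dump); the real file has one <layer> per executed node.
EXEC_GRAPH = """
<net name="model">
  <layers>
    <layer id="0" name="conv1" type="Convolution">
      <data primitiveType="convolution_gpu_bfyx_os_iyx_osv16"/>
    </layer>
    <layer id="1" name="relu1" type="ReLU">
      <data primitiveType="activation_ref"/>
    </layer>
  </layers>
</net>
"""

def kernels_by_layer(xml_text):
    """Map each layer name to the kernel (primitiveType) selected for it."""
    root = ET.fromstring(xml_text)
    result = {}
    for layer in root.iter("layer"):
        data = layer.find("data")
        if data is not None and "primitiveType" in data.attrib:
            result[layer.get("name")] = data.get("primitiveType")
    return result

kernels = kernels_by_layer(EXEC_GRAPH)
```

Diffing the two resulting dictionaries (one per batch size) would show whether the plugin switched kernels between the two runs.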

 

We hope the above explanation suits your needs.

 

 

Best Regards,

Peh

 

View solution in original post

9 Replies
Peh_Intel
Moderator
4,196 Views

Hi Shirley,


Thanks for reaching out to us and sharing your findings.


First and foremost, I was unable to run the Benchmark App at very high batch sizes, because doing so exceeds the maximum size of a memory object allocation: the Intel public models are trained with a large input shape (227x227). Hence, I do not have a model with a very small input shape for reproducing your issue.


Could you share your model with us for further investigation?


Besides, try specifying the same number of streams and carry out the experiment again:

python ./benchmark_app.py -m xxx_8x8_fp16_b1.xml -i 0801_8x8.png -d GPU -api sync -niter 1000 -b batchsize -nstreams 1
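For reference, the latency that benchmark_app reports in sync mode is essentially a statistic (the median) over per-iteration wall-clock times. A simplified sketch of that measurement loop in Python; `infer` here is a stand-in for a compiled model's synchronous inference call, not an OpenVINO API:

```python
import time
from statistics import median

def measure_sync_latency_ms(infer, niter):
    """Time `niter` synchronous calls to `infer` and return the
    median per-iteration latency in milliseconds."""
    times = []
    for _ in range(niter):
        start = time.perf_counter()
        infer()  # one synchronous inference request
        times.append((time.perf_counter() - start) * 1000.0)
    return median(times)

# Dummy "inference" that just burns a little CPU time:
latency = measure_sync_latency_ms(lambda: sum(range(1000)), niter=50)
```

Using the median rather than the mean keeps a few outlier iterations (for example the first, cold-cache one) from skewing the reported number.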



Regards,

Peh


shirleyliu
New Contributor I
4,184 Views

Thanks. I have tried setting nstreams = 1, but the phenomenon is similar.

Peh_Intel
Moderator
4,149 Views

Hi Shirley,

 

Thanks for sharing your model with us.

 

I also obtained unexpected results when running the Benchmark App with your model on the GPU plugin.

 

However, running the Benchmark App with your model on the CPU plugin produced the expected results, in which latency increases as batch size increases. You can have a look at the attached picture.

 

As such, we will investigate this unexpected behavior and get back to you as soon as possible.

 

 

Regards,

Peh

 

shirleyliu
New Contributor I
4,106 Views
Peh_Intel
Moderator
4,067 Views

Hi Shirley,

 

We have channeled this issue to our development team for a better explanation of this behavior. It might take some time, and we will get back to you once we receive their feedback.

 

 

Regards,

Peh

Peh_Intel
Moderator
4,004 Views

Hi Shirley,

 

We received an update from our developers: they will only be able to look at this unexpected behavior at the beginning of January 2022, as their current resources are focused on enabling the 2022.1 release.

 

For now, we will keep this case open until we obtain updates from them.

 

 

Regards,

Peh

shirleyliu
New Contributor I
3,997 Views

Thank you very much. Please let me know when they have any results.

Peh_Intel
Moderator
3,665 Views

Hi shirleyliu,

 

Thank you for your patience.

 

Our GPU development team has looked into this reported behavior. Based on their input, such behavior can occur because, when the batch size changes, different kernels and/or different blocking parameters may be selected. To date, they have not examined such large batch sizes in detail, as their main priority has been batch size = 1. They have recently started looking into larger batch sizes for discrete GPUs, but mainly batch sizes of 32 and 64.

 

We hope the above explanation suits your needs.

 

 

Best Regards,

Peh

 

Peh_Intel
Moderator
3,612 Views

Hi shirleyliu,


Thank you for your question. If you need any additional information from Intel, please submit a new question, as this thread is no longer being monitored.



Regards,

Peh

