I'm working with the text-recognition-0012 model (http://docs.openvinotoolkit.org/2019_R1/_text_recognition_0012_description_text_recognition_0012.html) to recognize text. I need to recognize more than one text in the same image, so I tried changing the network's batch size.
These are my input and output shapes when recognizing a single text:
Network input: [B x C x H x W] --> [1 x 1 x 32 x 120]
Network output: [W x B x L] --> [30 x 1 x 37]
Then I applied the following change, with value > 1:
netReader.getNetwork().setBatchSize(value);  // netReader is an InferenceEngine::CNNNetReader
After that, the output shape changed to the following:
value = 2 --> [60 x 1 x 37]
value = 3 --> [90 x 1 x 37]
What I expected instead was:
value = 2 --> [60 x 2 x 37]
value = 3 --> [90 x 3 x 37]
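As a quick sanity check on these shapes, here is a small sketch that compares element counts for batch size 2. The helper and constant names are hypothetical, and the [30 x 2 x 37] shape is only my assumption of what scaling just the B axis of the single-image output would give.

```cpp
#include <array>
#include <cstddef>

// Number of elements in a 3-D blob with the given dimensions.
inline std::size_t elementCount(const std::array<std::size_t, 3>& dims) {
    return dims[0] * dims[1] * dims[2];
}

// Shapes for batch size 2, taken from the post above.
const std::array<std::size_t, 3> kObserved = {60, 1, 37};  // what the network reports
const std::array<std::size_t, 3> kExpected = {60, 2, 37};  // what I expected

// Assumption: the single-image output [30 x 1 x 37] with only B scaled to 2.
const std::array<std::size_t, 3> kBatchScaled = {30, 2, 37};
```

The observed [60 x 1 x 37] holds the same number of elements as [30 x 2 x 37], while [60 x 2 x 37] holds twice as many. That might mean the batch is being folded into the W axis, but I have not confirmed this.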
Am I wrong? With this change the network recognizes the texts incorrectly. How can I process more than one text in the same image?
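For context, this is roughly how several text crops could be packed into one [B x 1 x 32 x 120] input buffer. It is a minimal sketch, assuming each crop is already resized to 32 x 120 and converted to single-channel float data. packBatch is a hypothetical helper, not part of the Inference Engine API, and the result would still need to be copied into the network's input blob.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: concatenates single-channel crops into one
// contiguous buffer laid out as [B x 1 x H x W], one crop after another.
std::vector<float> packBatch(const std::vector<std::vector<float>>& crops,
                             std::size_t height, std::size_t width) {
    const std::size_t imageSize = height * width;  // C == 1
    std::vector<float> batch;
    batch.reserve(crops.size() * imageSize);
    for (const auto& crop : crops) {
        // Every crop must already have the network's input size.
        if (crop.size() != imageSize) {
            throw std::invalid_argument("crop size does not match H x W");
        }
        batch.insert(batch.end(), crop.begin(), crop.end());
    }
    return batch;
}
```

With this layout, crop b starts at offset b * 32 * 120 in the buffer.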