Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all things computer vision-related on Intel® platforms.

What is the expected FPS for the text OCR sample?

shimondoodkin
Beginner
1,332 Views
I have a 10th-gen Intel i7 U-series CPU.

In CPU device mode I am getting about 1 FPS on 1080x1920 images.

It feels very slow.

In GPU device mode, loading the model takes 5 minutes, and a single inference takes 7000 ms.

Something feels very wrong here. What am I doing wrong? Or is this performance expected?

7 Replies
Peh_Intel
Moderator
1,273 Views

Hi Shimon,


Greetings to you.


You are not doing anything wrong. I also observe the same performance issues with CPU and GPU. Thanks for pointing out these issues. We will report them to our development team and get back to you if there are any updates.



Regards,

Peh


shimondoodkin
Beginner
1,261 Views

I found out why LoadNetwork takes a hugely unrealistic amount of time (5 to 20 minutes): the method compiles the OpenCL kernels on the fly.
The workaround was to create a folder named "cl_cache" in the application directory (see https://github.com/intel/compute-runtime/blob/master/FAQ.md); on the second run it loaded fine.

It is a problem of expectations: judging by the function name, I expect a call to LoadNetwork to transfer data from disk to memory. When it takes this long, I start to suspect a bottleneck in the CPU transfer and could conclude that this technology simply does not work. That is what my customer concluded: for him it is unrealistic to wait even 3 minutes before playing a game.

So the solution is to split the function in two: one step that compiles the kernels into a specified cache folder, and a LoadNetwork method that loads the network together with the precompiled kernels from that folder (or from an in-memory structure). Then the wait time would match expectations. A sketch of the cl_cache workaround is shown below.
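
For reference, here is a minimal sketch of the cl_cache workaround, assuming the 2021.x Inference Engine C++ API. The model path, the cache folder handling and the timing code are illustrative only, not part of the demo:

#include <chrono>
#include <filesystem>
#include <iostream>
#include <inference_engine.hpp>

int main() {
    // The GPU compute runtime persists compiled OpenCL kernels into a
    // "cl_cache" folder in the working directory if that folder exists,
    // so subsequent LoadNetwork calls skip most of the JIT compilation.
    std::filesystem::create_directory("cl_cache");

    InferenceEngine::Core ie;
    auto network = ie.ReadNetwork("horizontal-text-detection-0001.xml");

    auto start = std::chrono::steady_clock::now();
    // Slow on the first run (kernels are compiled), fast once cl_cache is populated.
    auto exec_network = ie.LoadNetwork(network, "GPU");
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                  std::chrono::steady_clock::now() - start).count();
    std::cout << "LoadNetwork took " << ms << " ms" << std::endl;

    auto infer_request = exec_network.CreateInferRequest();
    return 0;
}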

shimondoodkin
Beginner
1,259 Views

my performance is like this:

PS C:\Users\user\Documents\Intel\OpenVINO\omz_demos_build\intel64\Release> ./text_detection_demo.exe -m_td "intel\horizontal-text-detection-0001\FP16\horizontal-text-detection-0001.xml" -m_tr "intel\text-recognition-0012\FP16\text-recognition-0012.xml" -i test1.jpg -loop -d_tr "GPU" -d_td "GPU" -b 1
InferenceEngine: API version ......... 2.1
Build ........... 2021.2.0-1877-176bdf51370-releases/2021/2
[ INFO ] Parsing input parameters
[ INFO ] Loading Inference Engine
[ INFO ] Device info:
[ INFO ] GPU
clDNNPlugin version ......... 2.1
Build ........... 2021.2.0-1877-176bdf51370-releases/2021/2

[ INFO ] Loading network files
[ INFO ] Loading network files 1
ReadNetwork
ReadNetwork is 294ms
change shape is 0ms
configure is 0ms
prepere output blobs is 0ms
LoadNetwork
loadnetwork is 209653ms
CreateInferRequest
create inferrequest is 17ms
LoadNetwork done
[ INFO ] Loading network files 2
ReadNetwork
ReadNetwork is 179ms
change shape is 0ms
configure is 0ms
prepere output blobs is 0ms
LoadNetwork
loadnetwork is 600387ms
CreateInferRequest
create inferrequest is 5ms
LoadNetwork done
[ INFO ] openImagesCapture
[ INFO ] Starting inference
To close the application, press 'CTRL+C' here or switch to the output window and press ESC or Q
text detection model inference (ms) (fps): 203 4.92611
text detection postprocessing: took no time
text recognition model inference (ms) (fps): 72.55 13.7836
text recognition postprocessing (ms) (fps): 0.932 1072.96

text crop (ms) (fps): 0.48815 2048.55

----

After caching the OpenCL kernels, the load times are more reasonable:

[ INFO ] Loading network files
[ INFO ] Loading network files 1
ReadNetwork
ReadNetwork is 381ms
change shape is 0ms
configure is 0ms
prepere output blobs is 0ms
LoadNetwork
loadnetwork is 4270ms
CreateInferRequest
create inferrequest is 15ms
LoadNetwork done
[ INFO ] Loading network files 2
ReadNetwork
ReadNetwork is 108ms
change shape is 0ms
configure is 0ms
prepere output blobs is 0ms
LoadNetwork
loadnetwork is 8671ms
CreateInferRequest
create inferrequest is 6ms
LoadNetwork done
[ INFO ] openImagesCapture
[ INFO ] Starting inference
To close the application, press 'CTRL+C' here or switch to the output window and press ESC or Q

Mode: OpenCL
text detection model inference (ms) (fps): 153 6.53595
text detection postprocessing: took no time
text recognition model inference (ms) (fps): 84.2 11.8765
text recognition postprocessing (ms) (fps): 0.02635 37950.7

text crop (ms) (fps): 0.1501 6662.23

---

Another problem: when pressing Ctrl+C, the demo does not print the performance info.

It only prints it when pressing Esc on the image window, which made things even more confusing.

Peh_Intel
Moderator
1,192 Views

Hi Shimon,


Thanks for sharing these helpful findings with the community. We sincerely appreciate your contribution. We would appreciate it if you could create a pull request with the proposed changes in the openvinotoolkit GitHub repository so that the developers can review them.


You may refer to this contribution guide: https://github.com/openvinotoolkit/openvino/wiki/Contribute

 

Regarding the "Ctrl+C", 'Esc' and 'Q' question: "Ctrl+C" is designed to stop the program immediately, without executing anything further, whereas 'Esc' and 'Q' only terminate the display loop and let the program continue to the next step (for example, close the output window and then print the performance statistics).
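
If the performance summary is also wanted on Ctrl+C, one possible approach is to install a SIGINT handler that only sets a flag which the main loop checks, so the program still reaches the code that prints the statistics. This is a rough sketch, not the shipped demo code; the Metrics struct stands in for whatever counters the demo actually uses:

#include <atomic>
#include <csignal>
#include <iostream>

// Set by the SIGINT handler, checked once per frame by the main loop.
static std::atomic<bool> stop_requested{false};

static void handle_sigint(int) {
    stop_requested = true;  // defer all real work to the main loop
}

int main() {
    std::signal(SIGINT, handle_sigint);

    // Hypothetical per-stage timing accumulator standing in for the demo's counters.
    struct Metrics {
        double total_ms = 0.0;
        int frames = 0;
        void printTotal() const {
            if (frames > 0)
                std::cout << "average latency: " << total_ms / frames << " ms" << std::endl;
        }
    } metrics;

    for (int frame = 0; frame < 1000 && !stop_requested; ++frame) {
        // ... grab a frame, run inference, update metrics ...
        metrics.total_ms += 153.0;  // placeholder timing value
        ++metrics.frames;
    }

    // Reached on normal exit *and* on Ctrl+C, so the summary is always printed.
    metrics.printTotal();
    return 0;
}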



Regards,

Peh


shimondoodkin
Beginner
1,181 Views

There is no solution in my messages; they just show more clearly where the problem is.

Anyway, I stopped working on this. I don't see how it will work.

Hari_B_Intel
Moderator
1,156 Views

Hi @shimondoodkin 

After further investigation, the Text Detection C++ Demo you are referring to is designed to work with input images, based on this documentation:

>Text Detection C++ Demo - Text Detection demo. It detects and recognizes multi-oriented scene text on an input image and puts a bounding box around detected area.

 

So when you feed video into the Text Detection demo, around 1 FPS is expected, since the text detection model is not lightweight enough for real-time video processing.

Also, based on the Text Detection C++ Demo description,

$ text_detection_demo -h

-i  Required. An input to process. The input must be a single image, a folder of images or anything that cv::VideoCapture can process

 

To answer your second question, regarding the GPU taking a long time to load: the load time includes on-the-fly compilation of the OpenCL kernels in clDNN, so a longer load time is expected when the GPU device is used.
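
As an aside, besides the cl_cache folder mentioned above, newer OpenVINO releases also expose a model cache option that persists the compiled GPU blobs across runs, which removes most of this delay from the second run onwards. A minimal sketch, assuming a release that supports the "CACHE_DIR" configuration key (it is not available in the 2021.2 build used above, so treat it as an assumption about later versions):

#include <inference_engine.hpp>

int main() {
    InferenceEngine::Core ie;

    // Assumed: the "CACHE_DIR" config key from newer Inference Engine releases.
    // Compiled GPU kernels/blobs are written to "model_cache" on the first run
    // and reused afterwards, so LoadNetwork drops from minutes to seconds.
    ie.SetConfig({{"CACHE_DIR", "model_cache"}}, "GPU");

    auto network = ie.ReadNetwork("text-recognition-0012.xml");
    auto exec_network = ie.LoadNetwork(network, "GPU");  // fast once the cache is warm
    return 0;
}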

 

Hope this answers your questions.

 

Thank you

Hari_B_Intel
Moderator
1,114 Views

Hi shimondoodkin


This thread will no longer be monitored since we have provided a solution. If you need any additional information from Intel, please submit a new question.


Thank you

