Intel® Distribution of OpenVINO™ Toolkit

What is the expected FPS for the text OCR sample?

shimondoodkin
Beginner
667 Views
I have a 10th-generation Intel Core i7 U-series CPU.

In CPU device mode I am getting 1 fps on 1080x1920 images. It feels very slow.

In GPU device mode, loading the model takes 5 minutes, and one inference takes 7000 ms.

Something feels very wrong here. What am I doing wrong? Or is this performance expected?

7 Replies
Peh_Intel
Moderator
608 Views

Hi Shimon,

Greetings to you.

You are not doing anything wrong. I also observe the same performance issues with CPU and GPU. Thanks for pointing out these issues. We will report them to our development team and get back to you if there are any updates.

Regards,
Peh

shimondoodkin
Beginner
596 Views

I found out why LoadNetwork takes such a huge, unrealistic amount of time (5 to 20 minutes): the method compiles the OpenCL kernels on the fly.
The solution was to create a folder named "cl_cache" in the application directory (see https://github.com/intel/compute-runtime/blob/master/FAQ.md). On the second run, it loaded OK.

It is a problem of expectations. Going by the function name, I expect a call to LoadNetwork to transfer data from disk to memory. When it takes this much time, I start to suspect a bottleneck in the CPU transfer, and could conclude that this technology does not work. That is what my customer concluded: for him it is unrealistic to wait even 3 minutes before playing a game.

So the solution is to split the function into two: one step that compiles the kernels (like building the cache) into a specific folder, and a LoadNetwork method that loads the network with the precompiled kernels, either from a specified cache folder or from an in-memory struct. Then the wait time would match expectations.
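
To illustrate the workaround described above, here is a minimal Python sketch, not the demo's actual code. It creates the "cl_cache" folder before the network is loaded and times a load step in the same "X is Nms" style as the logs posted in this thread. The helper names ensure_cl_cache and timed are hypothetical, and fake_load_network is only a placeholder for the real LoadNetwork call. Whether the driver actually picks up the folder depends on the compute-runtime version and platform, so the FAQ linked above remains the reference.

```python
import tempfile
import time
from pathlib import Path


def ensure_cl_cache(app_dir):
    """Create the "cl_cache" folder under app_dir if it is missing."""
    cache_dir = Path(app_dir) / "cl_cache"
    cache_dir.mkdir(parents=True, exist_ok=True)
    return cache_dir


def timed(label, fn, *args, **kwargs):
    """Run fn, print "<label> is <N>ms", and return (result, elapsed_ms)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    print(f"{label} is {elapsed_ms:.0f}ms")
    return result, elapsed_ms


def fake_load_network():
    """Placeholder for the real LoadNetwork call (assumption: does no GPU work)."""
    time.sleep(0.01)
    return "network"


# A temporary directory stands in for the application directory here.
with tempfile.TemporaryDirectory() as app_dir:
    cache_dir = ensure_cl_cache(app_dir)
    network, load_ms = timed("loadnetwork", fake_load_network)
```

Creating the folder up front means the slow kernel compilation should only be paid on the first run, which is the behavior reported in this thread.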

shimondoodkin
Beginner
594 Views

My performance looks like this:

PS C:\Users\user\Documents\Intel\OpenVINO\omz_demos_build\intel64\Release> ./text_detection_demo.exe -m_td "intel\horizontal-text-detection-0001\FP16\horizontal-text-detection-0001.xml" -m_tr "intel\text-recognition-0012\FP16\text-recognition-0012.xml" -i test1.jpg -loop -d_tr "GPU" -d_td "GPU" -b 1
InferenceEngine: API version ......... 2.1
Build ........... 2021.2.0-1877-176bdf51370-releases/2021/2
[ INFO ] Parsing input parameters
[ INFO ] Loading Inference Engine
[ INFO ] Device info:
[ INFO ] GPU
clDNNPlugin version ......... 2.1
Build ........... 2021.2.0-1877-176bdf51370-releases/2021/2

[ INFO ] Loading network files
[ INFO ] Loading network files 1
ReadNetwork
ReadNetwork is 294ms
change shape is 0ms
configure is 0ms
prepere output blobs is 0ms
LoadNetwork
loadnetwork is 209653ms
CreateInferRequest
create inferrequest is 17ms
LoadNetwork done
[ INFO ] Loading network files 2
ReadNetwork
ReadNetwork is 179ms
change shape is 0ms
configure is 0ms
prepere output blobs is 0ms
LoadNetwork
loadnetwork is 600387ms
CreateInferRequest
create inferrequest is 5ms
LoadNetwork done
[ INFO ] openImagesCapture
[ INFO ] Starting inference
To close the application, press 'CTRL+C' here or switch to the output window and press ESC or Q
text detection model inference (ms) (fps): 203 4.92611
text detection postprocessing: took no time
text recognition model inference (ms) (fps): 72.55 13.7836
text recognition postprocessing (ms) (fps): 0.932 1072.96

text crop (ms) (fps): 0.48815 2048.55

----

After caching the OpenCL kernels, the load times are more reasonable:

[ INFO ] Loading network files
[ INFO ] Loading network files 1
ReadNetwork
ReadNetwork is 381ms
change shape is 0ms
configure is 0ms
prepere output blobs is 0ms
LoadNetwork
loadnetwork is 4270ms
CreateInferRequest
create inferrequest is 15ms
LoadNetwork done
[ INFO ] Loading network files 2
ReadNetwork
ReadNetwork is 108ms
change shape is 0ms
configure is 0ms
prepere output blobs is 0ms
LoadNetwork
loadnetwork is 8671ms
CreateInferRequest
create inferrequest is 6ms
LoadNetwork done
[ INFO ] openImagesCapture
[ INFO ] Starting inference
To close the application, press 'CTRL+C' here or switch to the output window and press ESC or Q

Mode: OpenCL
text detection model inference (ms) (fps): 153 6.53595
text detection postprocessing: took no time
text recognition model inference (ms) (fps): 84.2 11.8765
text recognition postprocessing (ms) (fps): 0.02635 37950.7

text crop (ms) (fps): 0.1501 6662.23
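
The numbers in the two logs above can be cross-checked with a little arithmetic. The sketch below is in Python with hypothetical helper names (fps_from_ms, speedup). It assumes the fps column is simply 1000 divided by the latency in milliseconds, which matches the printed values, and it compares the LoadNetwork times before and after the kernel cache was in place.

```python
def fps_from_ms(latency_ms):
    """Frames per second for one inference that takes latency_ms."""
    return 1000.0 / latency_ms


def speedup(before_ms, after_ms):
    """How many times faster the cached load is than the uncached one."""
    return before_ms / after_ms


# Values copied from the logs in this thread.
inference_ms = {
    "text detection (run without cache)": 203,
    "text recognition (run without cache)": 72.55,
    "text detection (run with cache)": 153,
    "text recognition (run with cache)": 84.2,
}
for name, ms in inference_ms.items():
    print(f"{name}: {ms} ms -> {fps_from_ms(ms):.4f} fps")

load_ms = {
    "network 1": (209653, 4270),
    "network 2": (600387, 8671),
}
for name, (before, after) in load_ms.items():
    print(f"{name}: LoadNetwork {before} ms -> {after} ms, "
          f"{speedup(before, after):.1f}x faster")
```

With these figures the cached loads come out roughly 49 and 69 times faster, while detection stays at about 5 to 7 fps in both runs. In other words, the cache addresses startup time, not per-frame throughput.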

---

Another problem: pressing Ctrl+C does not print the performance info.

It is only printed when pressing ESC in the image window, which made things more confusing.

Peh_Intel
Moderator
527 Views

Hi Shimon,

Thanks for sharing these helpful solutions with the community. We sincerely appreciate your contribution. We would appreciate it if you could create a pull request with the proposed changes in the openvinotoolkit GitHub repository so that the developers can review them.

You may refer to this contribution guide: https://github.com/openvinotoolkit/openvino/wiki/Contribute

Regarding the Ctrl+C, ESC, and Q question: Ctrl+C is designed to stop the program completely without continuing to the next step, whereas ESC and Q terminate the processing loop and continue to the next step (for example, closing the output display and proceeding to the performance printout).
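
One way to get the performance printout on Ctrl+C as well is to catch the interrupt, leave the processing loop, and fall through to the same reporting code that ESC reaches. The following is a minimal Python sketch of that pattern, not the demo's actual C++ code. StopFlag, install_sigint_handler, run_loop, and report are hypothetical names, and the frame processing is a stand-in.

```python
import signal
import time


class StopFlag:
    """Becomes True when the user asks the loop to stop."""

    def __init__(self):
        self.requested = False

    def request(self, signum=None, frame=None):
        # The (signum, frame) signature lets this method act as a signal handler.
        self.requested = True


def install_sigint_handler(stop):
    """Make Ctrl+C set the flag instead of ending the process (main thread only)."""
    signal.signal(signal.SIGINT, stop.request)


def run_loop(process_frame, stop, max_frames=None):
    """Process frames until a stop is requested; return (frames, elapsed_s)."""
    frames = 0
    start = time.perf_counter()
    while not stop.requested and (max_frames is None or frames < max_frames):
        process_frame()
        frames += 1
    return frames, time.perf_counter() - start


def report(frames, elapsed_s):
    """Build the summary line that is printed however the loop ended."""
    fps = frames / elapsed_s if elapsed_s > 0 else 0.0
    return f"processed {frames} frames, {fps:.2f} fps"


# Short self-terminating demonstration. A real program would call
# install_sigint_handler(stop) first and omit max_frames.
stop = StopFlag()
frames, elapsed_s = run_loop(lambda: time.sleep(0.001), stop, max_frames=5)
print(report(frames, elapsed_s))
```

Because the handler only sets a flag, the loop finishes its current frame and the summary is printed on every exit path.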

Regards,
Peh

shimondoodkin
Beginner
516 Views

There is no solution in my messages; they just show more clearly where the problem is.

Anyway, I stopped working on that. I don't see how it will work.

Hari_B_Intel
Moderator
491 Views

Hi @shimondoodkin 

After further investigation, we found that the Text Detection C++ Demo you are referring to is designed to work with input images, according to this documentation:

>Text Detection C++ Demo - Text Detection demo. It detects and recognizes multi-oriented scene text on an input image and puts a bounding box around detected area.

So when you input video to the Text Detection demo, 1 FPS is expected, since the text detection model is not lightweight enough for video processing.

Also, based on the Text Detection C++ Demo description,

$ text_detection_demo -h

-i  Required. An input to process. The input must be a single image, a folder of images or anything that cv::VideoCapture can process

To answer your second question, regarding the GPU taking longer to load:

This is because load time depends on the compilation of OpenCL kernels in clDNN, so loading is expected to take longer when the GPU device is used.

Hope this answers your questions.

Thank you

Hari_B_Intel
Moderator
449 Views

Hi shimondoodkin


This thread will no longer be monitored since we have provided a solution. If you need any additional information from Intel, please submit a new question.


Thank you