Intel® oneAPI Threading Building Blocks

Using OpenCV

ninhngt
I am using OpenCV in my project. Since OpenCV uses OpenMP for multithreading, can I use TBB in other modules of my project? Is there a way to avoid oversubscription?

Anton_Pegushin
Quoting - ninhngt
I am using OpenCV in my project. Since OpenCV uses OpenMP for multithreading, can I use TBB in other modules of my project? Is there a way to avoid oversubscription?
Hi,

Could you describe your OpenCV usage model in a bit more detail? As I understand it, there are ways to either avoid oversubscription or at least limit its effect, but the right solution really depends on the use case.
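To make the oversubscription concern concrete, here is a small back-of-the-envelope sketch. It assumes that every TBB thread which calls an OpenMP-parallelized OpenCV function becomes the master of its own OpenMP team, and that the team size defaults to the number of cores. That is common runtime behavior, not a guarantee. The function names are hypothetical.

```cpp
// Hypothetical helper: worst-case number of runnable threads when every TBB
// thread is inside an OpenMP parallel region at the same time.
// Assumes each such TBB thread starts its own OpenMP team of omp_team_size
// threads (typical of common OpenMP runtimes, not guaranteed).
int peak_threads(int tbb_threads, int omp_team_size) {
    return tbb_threads * omp_team_size;
}

// Oversubscription factor: runnable threads per hardware thread.
// 1.0 means no oversubscription; larger values mean time-slicing overhead.
double oversubscription(int tbb_threads, int omp_team_size, int hw_threads) {
    return static_cast<double>(peak_threads(tbb_threads, omp_team_size)) / hw_threads;
}
```

On a 4-core machine with default settings (4 TBB threads, OpenMP teams of 4), this gives 16 runnable threads, a factor of 4. Limiting the OpenMP team size to 1 brings the factor back to 1.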

ninhngt
I am using OpenCV for pattern recognition on a large number of photos. I want to do the work in parallel.
Anton_Pegushin
Quoting - ninhngt

I am using OpenCV for pattern recognition on a large number of photos. I want to do the work in parallel.
Hi,

I see. The most important question, then: does the current implementation (without TBB) scale, and how well? If you are already seeing a near-4x speedup on a 4-core machine (and/or ~8x on an 8-core machine), then adding parallelism at the outer level (processing the stack of images in parallel) might even slow the application down, because you would be adding parallel overhead with no chance for it to pay off.
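One way to answer "does it scale?" is to time the same batch of images twice: once with OpenMP limited to a single thread (for example by launching with OMP_NUM_THREADS=1) and once with default settings, then compute speedup and parallel efficiency. The sketch below is a minimal harness under that assumption; the helper names are hypothetical.

```cpp
#include <chrono>
#include <functional>

// Wall-clock time of a single call to f, in seconds.
double time_seconds(const std::function<void()>& f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Speedup S = T_serial / T_parallel.
double speedup(double t_serial, double t_parallel) {
    return t_serial / t_parallel;
}

// Parallel efficiency E = S / P. Values close to 1.0 suggest the existing
// (OpenMP) parallelism already keeps all P cores busy.
double efficiency(double t_serial, double t_parallel, int cores) {
    return speedup(t_serial, t_parallel) / cores;
}
```

For example, 40 s with one OpenMP thread and 11 s with default settings on 4 cores gives S of about 3.6 and E of about 0.91. That is close to the near-4x case, where outer-level parallelism has little headroom left.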

And even if your application does not scale very well, I would not jump to the conclusion that you need to bring another threading framework into the picture right away. A good next step would be to run a performance and threading analysis of the application with either Intel Thread Profiler or the Concurrency and Locks and Waits analyses in Intel Parallel Amplifier. The results will point out bottlenecks, serialization points, and areas of poor concurrency, which may show you what is wrong with the way OpenMP is used in your application (through OpenCV, of course) and how to fix it.

However, if after the analysis you decide to introduce TBB into the application, there is one approach that showed acceptable performance in our experiments: serialize OpenMP (which is easy to do with an environment variable) and express all of the parallelism with TBB only. But I'm not sure this will be the optimal solution in your particular case. Simply going through a stack of images in parallel introduces very coarse-grained parallelism, and depending on the number of images in the stack and their sizes, you might end up with noticeable imbalance between worker threads at the end of your parallel algorithm.

Unfortunately, I'm not aware of any general solution to this problem; each application of this type (a complex mix of OpenMP and TBB) needs to be looked at separately. Would you be able and willing to share the source code, so that other community members or TBB developers could look at it and evaluate different ways of combining the threading runtimes?
ninhngt
Thank you for replying. Unfortunately, my company doesn't allow me to share the source code. Thank you for your help anyway.