I want to replace the OpenCV KLT with the IPP implementation. The IPP function has a nice interface for reusing the image pyramids.
However, with the IPP function, performance drops by a factor of two or more. Profiling the program with VTune (see the list of top consumers below) shows that most of the time is spent on threading organization.
I tried both static and dynamic linking with IPP, but the result is the same. Decreasing the number of threads with ippSetNumThreads improves performance.
Any idea what is wrong?
Thanks,
Rasmus
Used: IPP 7.0.5, static IPP libraries, Release|x64, Visual Studio 2008 C++ compiler
"Function / Call Stack","CPU Time","Module","Function (Full)"
"SleepEx","253.3485215","KERNELBASE.dll","SleepEx"
"_kmp_fork_call","153.1760516","libiomp5md.dll","_kmp_fork_call"
"RtlEnterCriticalSection","148.438029","ntdll.dll","RtlEnterCriticalSection"
"y8_ippiOpticalFlowPyrLK_8u_C1R","130.25269","SFMPlugin.dll","y8_ippiOpticalFlowPyrLK_8u_C1R"
"vcomp_for_static_simple_init","70.78258687","libiomp5md.dll","vcomp_for_static_simple_init"
"_kmpc_omp_taskyield","65.49529073","libiomp5md.dll","_kmpc_omp_taskyield"
"y8_ownCopySubpix_8u16u_C1R_Sfs_U8","9.835349407","SFMPlugin.dll","y8_ownCopySubpix_8u16u_C1R_Sfs_U8"
"_kmp_invoke_microtask","9.093397538","libiomp5md.dll","_kmp_invoke_microtask"
"RtlLeaveCriticalSection","6.78507023","ntdll.dll","RtlLeaveCriticalSection"
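The hotspot list above can be summarized by module to see how much of the listed CPU time falls outside the IPP code itself. The following is only a rough sketch: the names `Hotspot`, `hotspotsFromPost`, and `overheadShare` are made up for illustration, and treating every row outside SFMPlugin.dll as threading or synchronization overhead is an assumption, not something VTune reports directly.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One row of the hotspot list: function name, CPU time, module.
struct Hotspot {
    std::string function;
    double cpuTime;
    std::string module;
};

// Rows copied from the VTune export in the post.
inline std::vector<Hotspot> hotspotsFromPost() {
    return {
        {"SleepEx", 253.3485215, "KERNELBASE.dll"},
        {"_kmp_fork_call", 153.1760516, "libiomp5md.dll"},
        {"RtlEnterCriticalSection", 148.438029, "ntdll.dll"},
        {"y8_ippiOpticalFlowPyrLK_8u_C1R", 130.25269, "SFMPlugin.dll"},
        {"vcomp_for_static_simple_init", 70.78258687, "libiomp5md.dll"},
        {"_kmpc_omp_taskyield", 65.49529073, "libiomp5md.dll"},
        {"y8_ownCopySubpix_8u16u_C1R_Sfs_U8", 9.835349407, "SFMPlugin.dll"},
        {"_kmp_invoke_microtask", 9.093397538, "libiomp5md.dll"},
        {"RtlLeaveCriticalSection", 6.78507023, "ntdll.dll"},
    };
}

// Fraction of the listed CPU time spent outside `computeModule`.
// Assumption: rows outside the application module are runtime or OS overhead.
inline double overheadShare(const std::vector<Hotspot>& rows,
                            const std::string& computeModule) {
    assert(!rows.empty());
    double total = 0.0;
    double overhead = 0.0;
    for (const Hotspot& row : rows) {
        total += row.cpuTime;
        if (row.module != computeModule) {
            overhead += row.cpuTime;
        }
    }
    return overhead / total;
}
```

Calling `overheadShare(hotspotsFromPost(), "SFMPlugin.dll")` gives roughly 0.83 for these nine rows, which matches the impression that most of the time goes to the OpenMP runtime and OS synchronization rather than to the optical flow kernel.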
I tried static and dynamic linking with IPP but the result is the same. Decreasing the number of threads using ippSetNumThreads improves performance.
...
- There are fewer context switches between threads
- There are fewer "fights" over data when a synchronization object (a critical section) is used
- It is possible that the CPU's cache lines are used more efficiently
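The second point, contention on a synchronization object, can be illustrated with a small sketch. It uses `std::thread` and `std::mutex` instead of OpenMP and a Windows critical section, and the function names are hypothetical; it does not reproduce what IPP does internally. Both functions compute the same sum, but the first takes the shared lock on every item while the second takes it once per thread.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Every increment takes the shared lock, so the threads serialize on it.
inline std::uint64_t sumWithSharedLock(unsigned numThreads,
                                       std::uint64_t itemsPerThread) {
    assert(numThreads > 0);
    std::uint64_t total = 0;
    std::mutex guard;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t) {
        workers.emplace_back([&] {
            for (std::uint64_t i = 0; i < itemsPerThread; ++i) {
                std::lock_guard<std::mutex> lock(guard);
                ++total;
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return total;
}

// Each thread accumulates privately and takes the lock once at the end.
inline std::uint64_t sumWithLocalAccumulators(unsigned numThreads,
                                              std::uint64_t itemsPerThread) {
    assert(numThreads > 0);
    std::uint64_t total = 0;
    std::mutex guard;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t) {
        workers.emplace_back([&] {
            std::uint64_t local = 0;
            for (std::uint64_t i = 0; i < itemsPerThread; ++i) {
                ++local;
            }
            std::lock_guard<std::mutex> lock(guard);
            total += local;
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return total;
}
```

Timing the two variants with increasing thread counts is one way to see how lock traffic, rather than useful work, can come to dominate as threads are added.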
How big are the data sets?
How many threads did you set, before and after?
Best regards,
Sergey
I'm using a 640x480 video with approximately 600 frames. Each frame is processed. The OpenCV implementation needs ~25 s for processing (not all of it for optical flow). Switching to the IPP implementation raises the time to ~30 s with one thread. Two threads take about 50 s, and 8 threads (the default, because I have a quad-core with hyper-threading) take more than 100 s.
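To put these timings on a per-frame basis, here is a small arithmetic sketch. It assumes exactly 600 frames and uses 100 s as a lower bound for the 8-thread run; the totals include work other than optical flow, so the figures are only indicative. The names `Run`, `ippRunsFromPost`, `msPerFrame`, and `slowdown` are illustrative.

```cpp
#include <cassert>
#include <vector>

// Total wall time reported for one IPP run at a given thread count.
struct Run {
    int threads;
    double totalSeconds;
};

// IPP timings quoted in the post (the 8-thread value is a lower bound).
inline std::vector<Run> ippRunsFromPost() {
    return {{1, 30.0}, {2, 50.0}, {8, 100.0}};
}

// Average time per frame in milliseconds.
inline double msPerFrame(double totalSeconds, int frames) {
    assert(frames > 0);
    return totalSeconds * 1000.0 / frames;
}

// How many times slower `run` is than `baseline`.
inline double slowdown(const Run& run, const Run& baseline) {
    assert(baseline.totalSeconds > 0.0);
    return run.totalSeconds / baseline.totalSeconds;
}
```

Under these assumptions the single-threaded IPP run costs about 50 ms per frame, and the 8-thread run is at least 3.3 times slower than that, so adding threads makes each frame more expensive rather than cheaper.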
I tried with some other test videos and the results are the same.
Threading seems to work: the load on my machine is proportional to the number of threads configured. As far as I understand the VTune result, a significant amount of time is spent in the OpenMP code?
I've attached a snippet, tracking.cpp, containing the initialization code and the implementation of the track function.
Best regards,
Rasmus
It seems we can't expect OpenMP threading to always benefit a particular application, especially since your application uses OpenMP threads intermittently. Two quick points:
1) You mentioned static IPP linking. Do you link the threaded IPP (ipp*_t.lib) or the serial static IPP (ipp*_l.lib)?
2) It is good to reuse the pyramid. Is it possible to also reuse the "state" for all frames, since it should be the same in each operation, and thus save the repeated work?
For example, the functions below should be called only once across the whole processing if the image size remains unchanged:
IppiPyramid* pyr = NULL;
stat = ippiPyramidInitAlloc(&pyr, maxLevel, roi, rate);
if (stat != ippStsNoErr)
{
    USES_CONVERSION;
    VGDebug(L"function %s status %s\n", L"ippiPyramidInitAlloc", A2T(ippGetStatusString(stat)));
    return NULL;
}
IppiPyramidDownState_8u_C1R** pyrState = (IppiPyramidDownState_8u_C1R**) &(pyr->pState);
Ipp8u** pyr0 = pyr->pImage;
int* pStep = pyr->pStep;
IppiSize* pRoi = pyr->pRoi;
ippiPyramidLayerDownInitAlloc_8u_C1R(pyrState, roi, rate, kernel, _countof(kernel), IPPI_INTER_LINEAR);
// Level 0 refers to the input image directly.
pyr0[0] = (Ipp8u*) img.data;
pStep[0] = (int) img.step;
pRoi[0] = roi;
// Allocate the buffers for the remaining levels.
for (int i = 1; i <= maxLevel; ++i)
{
    pyr->pImage[i] = ippiMalloc_8u_C1(pRoi[i].width, pRoi[i].height, pStep + i);
}
and
stat = ippiOpticalFlowPyrLKInitAlloc_8u_C1R(&of, roi, winSize.width, hint);
and all the corresponding free functions.
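The allocate-once, reuse-per-frame idea can be sketched without IPP as follows. This is a minimal illustration in standard C++ under stated assumptions: `PyramidLevel` and `ReusablePyramid` are hypothetical helpers, not IPP types; a rate of 2 per level is assumed; a 2x2 box average stands in for the kernel-based filter; and level 0 is copied here, whereas the code in the post points level 0 at the image data directly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not an IPP type): one pixel buffer per pyramid level.
struct PyramidLevel {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels;
};

class ReusablePyramid {
public:
    // One-time setup: allocate every level for a fixed image size.
    ReusablePyramid(int width, int height, int maxLevel) {
        assert(width > 0 && height > 0 && maxLevel >= 0);
        for (int l = 0; l <= maxLevel; ++l) {
            PyramidLevel level;
            level.width = width >> l;   // assumed rate of 2 per level
            level.height = height >> l;
            assert(level.width > 0 && level.height > 0);
            level.pixels.resize(static_cast<std::size_t>(level.width) * level.height);
            levels_.push_back(std::move(level));
        }
    }

    // Per-frame work: refill the existing buffers, no allocation.
    void update(const std::vector<std::uint8_t>& frame) {
        assert(frame.size() == levels_[0].pixels.size());
        std::copy(frame.begin(), frame.end(), levels_[0].pixels.begin());
        for (std::size_t l = 1; l < levels_.size(); ++l) {
            downsample(levels_[l - 1], levels_[l]);
        }
    }

    const PyramidLevel& level(std::size_t l) const { return levels_.at(l); }
    std::size_t levelCount() const { return levels_.size(); }

private:
    // 2x2 box average with rounding; a stand-in for the filter in the post.
    static void downsample(const PyramidLevel& src, PyramidLevel& dst) {
        for (int y = 0; y < dst.height; ++y) {
            for (int x = 0; x < dst.width; ++x) {
                const std::size_t i =
                    static_cast<std::size_t>(2 * y) * src.width + 2 * x;
                const int sum = src.pixels[i] + src.pixels[i + 1] +
                                src.pixels[i + src.width] +
                                src.pixels[i + src.width + 1];
                dst.pixels[static_cast<std::size_t>(y) * dst.width + x] =
                    static_cast<std::uint8_t>((sum + 2) / 4);
            }
        }
    }

    std::vector<PyramidLevel> levels_;
};
```

A tracker would construct one `ReusablePyramid` per image stream before the frame loop and call `update` for each frame, so the buffers stay the same across the whole video as long as the image size does not change.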
Best Regards
Ying
What is the result if you use the serial library?
Or could you please provide a small, runnable, and comparable test case so we can evaluate the problem (including some basic info, such as the OpenCV version)? You can attach it privately if it is confidential.
Best Regards,
Ying
I extracted the tracking part and generated a small demo program. The demo has an OpenCV and an IPP tracker.
KLTTest.cpp
Please note that the feature extraction is a little bit different in the original program.
Environment: OpenCV 2.3.1, Visual Studio 2008, Release|64bit, Unicode, IPP support added to the project using the Intel Composer.
Best regards,
Rasmus
I have escalated your problem to the IPP engineering team.
Your current result looks plausible. As I read on the OpenCV website,
the release notes for version 2.3.1 (August 2011) make this claim:
Optimization
- Performance of the sparse Lucas-Kanade optical flow has been greatly improved. On 4-core machine it is now 9x faster than the previous version.
Best Regards,
Ying
Thank you for the info. If your engineering team needs any further information, feel free to contact me.
Best Regards,
Rasmus