Hi,
Is there any documentation that describes how to use single-threaded IPP functions in a multithreaded environment?
'Simple' functions such as ippiHSVToRGB_8u_C3R that do not use buffers, specs or borders are no problem: each thread works on its own ROI. But what about ROI sizes (CPU caches, memory alignment and so on)?
And how do I use filters with borders (ippiFilterScharrHorizMaskBorder_8u16s_C1R), an FFT with a spec (ippiFFTFwd_RToPack_32f_C1R), or other functions that use a buffer (ippiFastMarching_8u32f_C1R)?
Hello,
Can you check the following IPP sample? It provides examples showing how to call IPP functions with external threading:
ipp\examples\ipp-examples.zip
The folder examples\ipp-examples\examples\ipp_thread provides example code for filters and other functions. The document folder in that example has the steps to build and run the sample code.
Thanks,
Chao
What are the border flags ippBorderInMemTop and ippBorderInMemBottom used for?
In function ippiFilterScharrHorizMaskBorderGetBufferSize_mt of your example:
what's the logic of:
bufsize = (bufsize + 63) & ~63;
steffenroeber wrote:
In function ippiFilterScharrHorizMaskBorderGetBufferSize_mt of your example:
what's the logic of:
bufsize = (bufsize + 63) & ~63;
Hi,
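It rounds bufsize up to the nearest multiple of 64; a value that is already a multiple of 64 stays unchanged. It is similar to "bufsize += 64 - bufsize % 64;", except that the latter adds a full 64 when bufsize is already divisible by 64.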
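To make the rounding concrete, here is a minimal sketch in plain C. The helper names align_up_64 and add_pad_mod_64 are made up for this illustration and are not part of IPP; the sketch also assumes a non-negative size.

```c
#include <assert.h>

/* Round size up to the next multiple of 64.
   A size that is already a multiple of 64 is returned unchanged.
   Hypothetical helper for illustration, not an IPP function. */
static int align_up_64(int size)
{
    assert(size >= 0);
    return (size + 63) & ~63;
}

/* The modulo form, for comparison. It always adds between 1 and 64,
   so an already aligned size grows by a full 64. */
static int add_pad_mod_64(int size)
{
    assert(size >= 0);
    return size + 64 - size % 64;
}
```

The two forms give the same result for every size that is not a multiple of 64, so the difference only costs up to 64 extra bytes per buffer in the aligned case.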
OK, wrong question. Why must bufSize be divisible by 64?
And why is there bufsize *= mxnthr? bufSize is already computed for the complete image ROI.
1.
What are the border flags ippBorderInMemTop and ippBorderInMemBottom used for?
In this example the source image is cut into horizontal stripes. For the top stripe (processed by thread #0) the pixels below the stripe are real image pixels, so its border type must include ippBorderInMemBottom. Conversely, for the bottom stripe (processed by thread #(max_nums_thread-1)) the pixels above the stripe are real image pixels, so its border type must include ippBorderInMemTop. For the same reason, the border type of every stripe in between must be ippBorderInMemTop+ippBorderInMemBottom.
2.
Why must bufSize be divisible by 64?
Every thread needs its own buffer, and this bufSize defines the buffer size for one thread. If bufSize is divisible by 64 and the start of the whole allocation is 64-byte aligned, then the buffers of all threads are aligned to 64 bytes, which is important for performance.
3.
Why is there bufsize *= mxnthr? bufSize is already computed for the complete image ROI.
Yes, that is a mistake. Thanks for noticing.
The code below is better; in this version memory is saved:
void ippiFilterScharrHorizMaskBorderGetBufferSize_mt(IppiSize dstRoiSize, IppiMaskSize mask,
    IppDataType srcDataType, IppDataType dstDataType, int numChannels,
    int *pBufferSize, int *bufStep, int *numthr)
{
    int bufsize;
    int mxnthr = 1;
    if (*numthr <= 1) {
        /* Single thread: one buffer for the whole ROI. */
        ippiFilterScharrHorizMaskBorderGetBufferSize(dstRoiSize, mask, srcDataType, dstDataType, numChannels, &bufsize);
        *bufStep = bufsize;
    } else {
        int hd;
        int hr;
        mxnthr = omp_get_max_threads();
        if (mxnthr > *numthr) {
            mxnthr = *numthr;
            omp_set_num_threads(mxnthr);
        }
        /* Size the buffer for the tallest stripe: hd rows plus the hr remainder rows. */
        hd = dstRoiSize.height / mxnthr;
        hr = dstRoiSize.height % mxnthr;
        dstRoiSize.height = hd + hr;
        ippiFilterScharrHorizMaskBorderGetBufferSize(dstRoiSize, mask, srcDataType, dstDataType, numChannels, &bufsize);
        bufsize = (bufsize + 63) & ~63; /* round up to a multiple of 64 */
        *bufStep = bufsize;             /* size of one thread's buffer */
        bufsize *= mxnthr;              /* total size for all threads */
    }
    *pBufferSize = bufsize;
    *numthr = mxnthr;
}
Thanks!
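For readers who want to see how answers 1 and 2 fit together, here is a rough sketch in plain C of how each thread could derive its stripe, its in-memory border flags and its slice of the shared work buffer. It is not taken from the IPP sample: the Stripe struct, make_stripe, and the IN_MEM_TOP / IN_MEM_BOTTOM constants are stand-ins invented for this illustration (real code would use ippBorderInMemTop and ippBorderInMemBottom from the IPP headers), and giving the remainder rows to the last stripe is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in flags for this sketch only; real code would use
   ippBorderInMemTop and ippBorderInMemBottom from the IPP headers. */
enum { IN_MEM_TOP = 1, IN_MEM_BOTTOM = 2 };

typedef struct {
    int    y0;         /* first image row of the stripe */
    int    height;     /* number of rows in the stripe */
    int    inMemFlags; /* stripe edges that have real image rows beyond them */
    size_t bufOffset;  /* byte offset of this thread's work buffer */
} Stripe;

/* Describe stripe tid out of nThreads for an image with height rows.
   Assumption: the remainder rows go to the last stripe, so no stripe
   is taller than hd + hr. bufStep is the per-thread buffer size. */
static Stripe make_stripe(int height, int nThreads, int tid, int bufStep)
{
    Stripe s;
    int hd, hr;

    assert(nThreads >= 1 && tid >= 0 && tid < nThreads);
    hd = height / nThreads;
    hr = height % nThreads;

    s.y0 = tid * hd;
    s.height = (tid == nThreads - 1) ? hd + hr : hd;

    /* Rows above exist for every stripe except the first,
       rows below exist for every stripe except the last. */
    s.inMemFlags = 0;
    if (tid > 0)
        s.inMemFlags |= IN_MEM_TOP;
    if (tid < nThreads - 1)
        s.inMemFlags |= IN_MEM_BOTTOM;

    s.bufOffset = (size_t)tid * (size_t)bufStep;
    return s;
}
```

With a 64-byte aligned base pointer and a bufStep that is a multiple of 64, every pBuffer + bufOffset is again 64-byte aligned. The flags would then be combined with the border type that applies at the outer image edges.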
Thank you for that explanation. Now it works.
But here are the next questions:
What about linear transformations, for example ippiFFTInit_R_32f? There are
IppiFFTSpec_R_32f* pFFTSpec
Ipp8u* pMemInit
Can these be shared between threads?
Is it possible to parallelize ippiHoughLine_Region_8u32f_C1R and similar functions?
Hi,
You can't share a single pMemInit buffer in multiple threads, but you can share FFTSpec.
If you need to run the same type and size of FFT in multiple threads, you can initialize FFTSpec only once, and then free pMemInit after initialization. Once you initialize FFTSpec (IppiFFTSpec_R_32f), you can share it between multiple threads since it is not modified by ippiFFT processing functions. But you will need a separate work buffer (pBuffer) for each ippiFFT function running in its own thread.
You can find an example for ippiFFTInit_R_32f here:
https://software.intel.com/en-us/node/504249
Best regards,
Alexey
What about the ippiHoughLine_Region_8u32f_C1R?
Hi Steffen.
steffenroeber wrote:
Is there any description available that describes how to use single threaded IPP functions in a multithreaded environment?
I am attaching an example of threading morphology functions. You can compile and run it to understand how it works.
Thanks for using IPP.
steffenroeber wrote:
What about the ippiHoughLine_Region_8u32f_C1R?
The result of this function is a sorted list of lines. You can split the region by angles and process the parts in parallel. But you need to keep the following in mind:
Suppose the single-threaded version returns 10 lines, sorted by weight, from the whole image.
In parallel mode every thread also returns 10 lines, so the total number of lines is 10*(number of threads), for example 50 lines for 5 threads. After the multithreaded version finishes, you need to analyze all of these lines and select the 10 strongest ones. They will be equal to the result of the single-threaded function. For example, you can count the number of pixels on every returned line. Of course this is overhead, but unfortunately the current API does not provide information about the weight of a line.
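The selection step could look roughly like the sketch below, in plain C without IPP. It assumes you have already gathered the lines from all threads into one array and computed a weight for each one yourself (for example the pixel count mentioned above). WeightedLine, by_weight_desc and select_strongest are hypothetical names for this illustration, not IPP types or functions.

```c
#include <assert.h>
#include <stdlib.h>

/* One candidate line plus a weight computed by the caller,
   e.g. the number of set pixels along the line. Hypothetical type. */
typedef struct {
    float rho;
    float theta;
    int   weight;
} WeightedLine;

/* qsort comparator: larger weight first. */
static int by_weight_desc(const void *a, const void *b)
{
    const WeightedLine *la = (const WeightedLine *)a;
    const WeightedLine *lb = (const WeightedLine *)b;
    return (lb->weight > la->weight) - (lb->weight < la->weight);
}

/* Sort all candidates in place, strongest first, and return how many
   of them to keep (at most maxLines). */
static int select_strongest(WeightedLine *cand, int count, int maxLines)
{
    assert(count >= 0 && maxLines >= 0);
    qsort(cand, (size_t)count, sizeof *cand, by_weight_desc);
    return count < maxLines ? count : maxLines;
}
```

After the call, the first returned-count entries of the array are the lines to keep. How ties between equal weights are ordered is left to qsort in this sketch.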
What do you mean by "split region by angles"?
steffenroeber wrote:
What do you mean by "split region by angles"?
Please look at the description of the function in the manual:
IppStatus ippiHoughLine_Region_8u32f_C1R(const Ipp8u* pSrc, int srcStep, IppiSize roiSize, IppPointPolar* pLine, IppPointPolar dstRoi[2], int maxLineCount, int* pLineCount, IppPointPolar delta, int threshold, Ipp8u* pBuffer);
"dstRoi Specifies the range of parameters of straight lines to be detected." This means that the function returns only lines whose angles lie between dstRoi[0].theta and dstRoi[1].theta. For a multithreaded version you can split the region into N parts with the angle step (dstRoi[1].theta-dstRoi[0].theta)/N and call every thread with its own dstRoi parameter. The code could be:
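deltaTheta = (dstRoiST[1].theta-dstRoiST[0].theta)/N;
for(n=0;n<N;n++){
    /* the rho range stays the same; only the theta range is split */
    dstRoiMT[n][0].rho = dstRoiST[0].rho;
    dstRoiMT[n][1].rho = dstRoiST[1].rho;
    dstRoiMT[n][0].theta = dstRoiST[0].theta + n*deltaTheta;
    dstRoiMT[n][1].theta = dstRoiST[0].theta + (n+1)*deltaTheta;
}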
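A self-contained sketch of this angle split, written as a function that compiles without IPP, might look like this. PolarPoint is a stand-in for IppPointPolar and is assumed to have the fields rho and theta; split_theta is a hypothetical helper name. Pinning the last part to the exact upper bound is my own addition to avoid floating-point drift.

```c
#include <assert.h>

/* Stand-in for IppPointPolar so that the sketch compiles without IPP. */
typedef struct {
    float rho;
    float theta;
} PolarPoint;

/* Fill range[0] and range[1] with part n of N of the theta range given
   by whole[0] and whole[1]. The rho range is copied unchanged. */
static void split_theta(const PolarPoint whole[2], int N, int n, PolarPoint range[2])
{
    float deltaTheta;

    assert(N >= 1 && n >= 0 && n < N);
    deltaTheta = (whole[1].theta - whole[0].theta) / (float)N;

    range[0].rho = whole[0].rho;
    range[1].rho = whole[1].rho;
    range[0].theta = whole[0].theta + (float)n * deltaTheta;
    /* The last part ends exactly at the original upper bound. */
    range[1].theta = (n == N - 1) ? whole[1].theta
                                  : whole[0].theta + (float)(n + 1) * deltaTheta;
}
```

Each thread n would then pass its own range as the dstRoi argument, together with its own pLine array, pLineCount and pBuffer.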
OK. This function also works. Now the next one: ippiHoughLine_8u32f_C1R
Here I have a ROI. Can I use that for parallelization?
steffenroeber wrote:
OK. This function also works. Now the next one: ippiHoughLine_8u32f_C1R
Here I have a ROI. Can I use that for parallelization?
Sorry, but I don't fully understand the question.
You cannot parallelize ippiHoughLine_8u32f_C1R or ippiHoughLine_Region_8u32f_C1R by splitting the ROI into tiles. Both functions use the pixels of the whole image, so for a correct parallelization you can only split the Hough space. Therefore ippiHoughLine_8u32f_C1R cannot be parallelized, because it has no API for splitting the Hough space. But you can replace ippiHoughLine_8u32f_C1R with ippiHoughLine_Region_8u32f_C1R, using lines from the range [0..2PI].
