Hi all,
I have been working on vision algorithms using IPP 5.3, and we are now evaluating IPP 6.0 to try to squeeze out some additional performance with the new DMIP.
I have experimented with it a bit by implementing a simple Sobel-based edge detection algorithm.
I compared the performance of the new DMIP implementation with that of the "standard" IPP implementation when computing the edges of a relatively large matrix (11 MB). A speedup of about 3x was indeed achieved on a Core 2 Duo T7300 @ 2 GHz CPU. Impressive.
But two things still bother me:
1. Although the IPP implementation uses ~90% CPU, the DMIP implementation uses only ~50%, which suggests that DMIP is utilizing only a single core. Any comments?
2. I would like to verify that the speedup is indeed caused by a reduction in CPU cache misses in DMIP (the whole idea behind DMIP). How can I validate this on a Windows platform?
Cheers
Hagay
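On the cache question, a small stand-alone experiment may help build intuition before reaching for a profiler. The sketch below is not DMIP code; it assumes that the benefit comes from running a whole chain of operations on cache-sized slices instead of sweeping the full image once per operation. The function names (`runWholeImage`, `runSliced`, `elapsedMs`) and the three-stage chain (convert to float, square, square root, convert back) are made up for illustration.

```cpp
#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stage by stage: every stage sweeps the whole image, so each full-size
// intermediate buffer is written out to memory and read back again.
std::vector<std::uint8_t> runWholeImage(const std::vector<std::uint8_t>& src) {
    std::vector<float> a(src.begin(), src.end());                        // 8u -> 32f
    std::vector<float> b(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) b[i] = a[i] * a[i];       // square
    for (std::size_t i = 0; i < b.size(); ++i) a[i] = std::sqrt(b[i]);   // square root
    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t i = 0; i < a.size(); ++i)                           // 32f -> 8u
        dst[i] = static_cast<std::uint8_t>(std::min(a[i], 255.0f));
    return dst;
}

// Sliced: the same stages, but all of them run on one small slice before
// moving on, so the scratch buffers can stay resident in cache.
std::vector<std::uint8_t> runSliced(const std::vector<std::uint8_t>& src,
                                    std::size_t sliceLen) {
    if (sliceLen == 0) sliceLen = 1;  // guard against an endless loop
    std::vector<std::uint8_t> dst(src.size());
    std::vector<float> a(sliceLen), b(sliceLen);
    for (std::size_t start = 0; start < src.size(); start += sliceLen) {
        const std::size_t n = std::min(sliceLen, src.size() - start);
        for (std::size_t i = 0; i < n; ++i) a[i] = static_cast<float>(src[start + i]);
        for (std::size_t i = 0; i < n; ++i) b[i] = a[i] * a[i];
        for (std::size_t i = 0; i < n; ++i) a[i] = std::sqrt(b[i]);
        for (std::size_t i = 0; i < n; ++i)
            dst[start + i] = static_cast<std::uint8_t>(std::min(a[i], 255.0f));
    }
    return dst;
}

// Wall-clock time of an arbitrary callable, in milliseconds.
template <typename F>
double elapsedMs(F&& work) {
    const auto t0 = std::chrono::steady_clock::now();
    work();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Timing both variants with `elapsedMs` on an image of about 11 MB, for several values of `sliceLen`, gives a rough picture of how much of the difference is due to memory traffic. A profiler that reports cache miss counts is still needed to confirm the cause.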
4 Replies
Could you please provide your test case so that we can investigate the 50% CPU usage issue?
You may use VTune to measure the average cache miss count and see the advantage of DMIP.
Regards,
Vladimir
Hi Vladimir,
Thanks for the response.
The code snippet is below. Please note: (1) it is in C++/CLI; (2) the input and output matrices are allocated on the managed heap; (3) the generic Matrix data type is a one-dimensional array of T; (4) the code works as expected.
About VTune: thanks, I'll check it out.
Hagay
Code snippet:
[cpp]
Matrix ^ EdgeDetectDmip::DetectEdges(Matrix ^input)
{
    Matrix ^output = gcnew Matrix(input->Size);
    IppiSize roi = {input->Size.Width, input->Size.Height};
    pin_ptr inData = &(input->GetBuffer()[0]);
    pin_ptr outData = &(output->GetBuffer()[0]);
    IppDataType dataType = ipp8u;
    IppChannels channels = ippC1;
    Image A(inData, dataType, channels, roi, roi.width);
    Image D(outData, dataType, channels, roi, roi.width);
    Kernel KH(idmFilterSobelHoriz);
    Kernel KV(idmFilterSobelVert);
    Graph O;
    O = To32f(A);
    D = MaxVal(To8u(Sqrt(Sqr(O * KH) + Sqr(O * KV))), 150, 0);
    return output;
}
[/cpp]
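For reference, the computation that the graph expression in the snippet describes (gradient magnitude from two Sobel responses, converted back to 8-bit) can be sketched in plain C++ without IPP or DMIP. This is only an illustration under assumptions: the function name `sobelMagnitude8u`, the replicated border, the kernel sign convention, and truncation with saturation when converting to 8-bit are choices made here, not taken from the library. The final `MaxVal(..., 150, 0)` step is left out because its exact semantics are not stated in the thread.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Clamp a coordinate into [0, n-1]; this replicates the border pixels
// (an assumption, the library may treat borders differently).
static int clampIndex(int v, int n) { return std::max(0, std::min(v, n - 1)); }

// Sobel gradient magnitude of a single-channel 8-bit image stored row by row.
std::vector<std::uint8_t> sobelMagnitude8u(const std::vector<std::uint8_t>& src,
                                           int width, int height) {
    std::vector<std::uint8_t> dst(src.size());
    auto at = [&](int x, int y) {
        const std::size_t idx =
            static_cast<std::size_t>(clampIndex(y, height)) * width + clampIndex(x, width);
        return static_cast<float>(src[idx]);
    };
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Horizontal change: right column minus left column, weights 1-2-1.
            const float gx = (at(x + 1, y - 1) + 2 * at(x + 1, y) + at(x + 1, y + 1)) -
                             (at(x - 1, y - 1) + 2 * at(x - 1, y) + at(x - 1, y + 1));
            // Vertical change: bottom row minus top row, weights 1-2-1.
            const float gy = (at(x - 1, y + 1) + 2 * at(x, y + 1) + at(x + 1, y + 1)) -
                             (at(x - 1, y - 1) + 2 * at(x, y - 1) + at(x + 1, y - 1));
            const float mag = std::sqrt(gx * gx + gy * gy);
            // Saturate to the 8-bit range, then truncate.
            dst[static_cast<std::size_t>(y) * width + x] =
                static_cast<std::uint8_t>(std::min(mag, 255.0f));
        }
    }
    return dst;
}
```

A reference like this can be handy for checking the output of an optimized implementation on a small test image, keeping in mind that border pixels and rounding may legitimately differ.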
Quoting - Vladimir Dudnik
Could you please provide your test case so that we can investigate the 50% CPU usage issue?
You may use VTune to measure the average cache miss count and see the advantage of DMIP.
Regards,
Vladimir
Hi Vladimir,
I have supplied the code snippet. Could you please go over it and let me know what you find?
Also, the problem appears to be some kind of OMP failure.
Running the following code, it appears that DMIP cannot use OMP for some reason:
int i = Control::GetMaxNumThread(); // i get the value 1, although this is a dual core CPU
DMIP::idmStatus stat = Control::SetNumThread(2); // stat is -4, meaning OMP error
However, IPP uses both cores without a problem:
int cores = ippGetNumCoresOnDie(); // returns 2
int numThreads;
IppStatus status1 = ippGetNumThreads(&numThreads); // returns 2
Any explanations/solutions?
Thanks, Hagay
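As a library-independent cross-check of these numbers, the machine-wide CPU percentage can be translated into an approximate count of busy cores. The sketch below uses only the C++ standard library; `logicalCores` and `approxBusyCores` are hypothetical helper names, and it assumes the percentage is normalized over all logical cores, as Windows Task Manager reports it.

```cpp
#include <thread>

// Logical cores visible to the process; the standard allows 0 when unknown.
unsigned logicalCores() { return std::thread::hardware_concurrency(); }

// Convert a machine-wide CPU percentage into an approximate number of
// fully busy cores (assumes the percentage is normalized over all cores).
double approxBusyCores(double totalCpuPercent, unsigned cores) {
    return totalCpuPercent / 100.0 * static_cast<double>(cores);
}
```

With two cores, ~50% corresponds to about one busy core and ~90% to about 1.8, which is consistent with DMIP running single-threaded while IPP uses both cores.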
Hi Hagay,
We've found an issue in the IPP 6.0 DMIP DLL that does not allow the number of threads to be set to more than 1. This will be fixed in the next version. Please submit a bug report to Intel Premier Support so that you will be notified when a fix is ready.
Regards,
Vladimir