
Evaluating DMIP

Hi all,

I work on vision algorithms and have used IPP 5.3 so far. We are now evaluating IPP 6.0 to see whether the new DMIP (Deferred Mode Image Processing) layer can give us an additional performance boost.

I have experimented with it a bit by implementing a simple Sobel-based edge detection algorithm.
I compared the performance of the new DMIP implementation with the "standard" IPP implementation when computing the edges of a relatively large matrix (11 MB). DMIP was indeed roughly 3x faster on a Core2 Duo T7300 @ 2 GHz CPU. Impressive.

But two things still bother me:
1. The IPP implementation uses ~90% CPU, while the DMIP implementation uses only ~50%, which suggests that DMIP is utilizing a single core. Any comments?
2. I would like to check whether the performance boost is really caused by a reduction in CPU cache misses (the whole idea behind DMIP). How can I validate this on Windows?
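
For intuition about why slicing could reduce cache misses, here is a minimal sketch in plain C++. It does not use DMIP or IPP; the function names `processWholeImage` and `processSliced` and the slice length are hypothetical, and the pipeline (convert to float, square, square root, convert back to 8 bit) is only a stand-in built from the same kinds of stages used in this thread. The first function writes full-size intermediate buffers in every pass. The second pushes one small slice through all stages before moving on, so its intermediates can stay in cache.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stage-by-stage processing: every pass walks the whole image and
// writes a full-size intermediate buffer.
std::vector<std::uint8_t> processWholeImage(const std::vector<std::uint8_t>& src) {
    const std::size_t n = src.size();
    std::vector<float> asFloat(n), squared(n);
    std::vector<std::uint8_t> dst(n);
    for (std::size_t i = 0; i < n; ++i) asFloat[i] = static_cast<float>(src[i]);
    for (std::size_t i = 0; i < n; ++i) squared[i] = asFloat[i] * asFloat[i];
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = static_cast<std::uint8_t>(std::sqrt(squared[i]));
    return dst;
}

// Sliced processing: the same stages, but applied to one slice at a time,
// so the intermediate buffers are only sliceLen elements long.
std::vector<std::uint8_t> processSliced(const std::vector<std::uint8_t>& src,
                                        std::size_t sliceLen) {
    assert(sliceLen > 0 && "slice length must be positive");
    std::vector<float> asFloat(sliceLen), squared(sliceLen);
    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t begin = 0; begin < src.size(); begin += sliceLen) {
        const std::size_t n = std::min(sliceLen, src.size() - begin);
        for (std::size_t j = 0; j < n; ++j) asFloat[j] = static_cast<float>(src[begin + j]);
        for (std::size_t j = 0; j < n; ++j) squared[j] = asFloat[j] * asFloat[j];
        for (std::size_t j = 0; j < n; ++j)
            dst[begin + j] = static_cast<std::uint8_t>(std::sqrt(squared[j]));
    }
    return dst;
}
```

Both functions produce identical output; only the memory access pattern differs. Stages with a neighbourhood, such as a 3x3 Sobel filter, would additionally need overlapping border rows per slice, which this sketch leaves out.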

Cheers
Hagay
Vladimir_Dudnik
Employee


Could you please provide your test case so that we can investigate the 50% CPU usage issue?

You can use VTune to measure the average cache miss count and see the advantage of DMIP.

Regards,
Vladimir

Hi Vladimir,

Thanks for the response.
The code snippet is below. Please note: (1) it is written in C++/CLI; (2) the input and output matrices are allocated on the managed heap; (3) the generic Matrix type stores its data as a one-dimensional array of T; (4) the code works as expected.

As for VTune, thanks, I'll look into it.

Hagay

Code snippet:
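[cpp]// Note: the template arguments (<Byte>, <Ipp8u>) seem to have been stripped when the
// snippet was posted. They are restored here as an assumption based on dataType = ipp8u.
Matrix<Byte>^ EdgeDetectDmip::DetectEdges(Matrix<Byte>^ input)
{
	Matrix<Byte>^ output = gcnew Matrix<Byte>(input->Size);

	IppiSize roi = {input->Size.Width, input->Size.Height};
	// Pin the managed buffers so the GC cannot move them while native code uses the pointers.
	pin_ptr<Ipp8u> inData = &(input->GetBuffer()[0]);
	pin_ptr<Ipp8u> outData = &(output->GetBuffer()[0]);

	IppDataType dataType = ipp8u;
	IppChannels channels = ippC1;

	// The row step equals roi.width: one byte per pixel, no row padding.
	Image A(inData, dataType, channels, roi, roi.width);
	Image D(outData, dataType, channels, roi, roi.width);
	Kernel KH(idmFilterSobelHoriz);
	Kernel KV(idmFilterSobelVert);
	
	// Gradient magnitude from both Sobel responses, converted back to 8u, then MaxVal.
	Graph O;
	O=To32f(A);
	D=MaxVal(To8u(Sqrt(Sqr(O*KH)+Sqr(O*KV))), 150, 0);

	return output;
}[/cpp]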
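
As a cross-check for the graph above, here is a minimal plain C++ sketch of the Sobel gradient magnitude, without DMIP or IPP. Several things are assumptions rather than documented DMIP behaviour: the function names `pixelAt` and `sobelMagnitude8u` are hypothetical, borders are replicated, and the conversion to 8 bit saturates and truncates. The final `MaxVal(..., 150, 0)` step is left out because its exact semantics are not described in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a pixel as float, replicating the border (assumed border mode).
static float pixelAt(const std::vector<std::uint8_t>& img, int width, int height,
                     int x, int y) {
    x = std::max(0, std::min(x, width - 1));
    y = std::max(0, std::min(y, height - 1));
    return static_cast<float>(img[static_cast<std::size_t>(y) * width + x]);
}

// Gradient magnitude sqrt(gh^2 + gv^2) from 3x3 Sobel responses,
// saturated to the 8-bit range.
std::vector<std::uint8_t> sobelMagnitude8u(const std::vector<std::uint8_t>& img,
                                           int width, int height) {
    assert(width > 0 && height > 0);
    assert(img.size() == static_cast<std::size_t>(width) * height);
    std::vector<std::uint8_t> out(img.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const float p00 = pixelAt(img, width, height, x - 1, y - 1);
            const float p01 = pixelAt(img, width, height, x,     y - 1);
            const float p02 = pixelAt(img, width, height, x + 1, y - 1);
            const float p10 = pixelAt(img, width, height, x - 1, y);
            const float p12 = pixelAt(img, width, height, x + 1, y);
            const float p20 = pixelAt(img, width, height, x - 1, y + 1);
            const float p21 = pixelAt(img, width, height, x,     y + 1);
            const float p22 = pixelAt(img, width, height, x + 1, y + 1);
            // Horizontal kernel: difference between the row above and the row below.
            const float gh = (p00 + 2.0f * p01 + p02) - (p20 + 2.0f * p21 + p22);
            // Vertical kernel: difference between the right and left columns.
            const float gv = (p02 + 2.0f * p12 + p22) - (p00 + 2.0f * p10 + p20);
            const float mag = std::sqrt(gh * gh + gv * gv);
            out[static_cast<std::size_t>(y) * width + x] =
                static_cast<std::uint8_t>(std::min(mag, 255.0f));
        }
    }
    return out;
}
```

Running this on a small test image and comparing against the DMIP output before the threshold step is one way to confirm that both paths compute the same thing. Differences near the image border or off-by-one values would point to the border or rounding assumptions above.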


Quoting - Vladimir Dudnik

Could you please provide your test case so that we can investigate the 50% CPU usage issue?

You can use VTune to measure the average cache miss count and see the advantage of DMIP.

Regards,
Vladimir

Hi Vladimir,

I have posted the code snippet above. Could you please take a look and let me know what you find?
Also, the problem appears to be some kind of OpenMP (OMP) failure.

Running the following code suggests that DMIP cannot use OMP for some reason:
int i = Control::GetMaxNumThread(); // returns 1, although this is a dual-core CPU
DMIP::idmStatus stat = Control::SetNumThread(2); // stat is -4, meaning OMP error

However, IPP is using the two cores without a problem:
int cores = ippGetNumCoresOnDie(); // returns 2
int numThreads;
IppStatus status1 = ippGetNumThreads(&numThreads); // numThreads is set to 2

Any explanations/solutions?

Thanks, Hagay
Vladimir_Dudnik
Employee

Hi Hagay,


We have found an issue in the IPP 6.0 DMIP DLL that prevents the number of threads from being set higher than 1. This will be fixed in the next version. Please submit a bug report to Intel Premier Support so that you are notified when the fix is ready.

Regards,
Vladimir