Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

Evaluating DMIP

hagay_lupeskoinscan-
Hi all,

I work on vision algorithms and have used IPP 5.3 so far. We are now evaluating IPP 6.0 to see whether the new DMIP can squeeze out some additional performance.

I have experimented with it a bit by implementing a simple Sobel-based edge detection algorithm.
I compared the performance of the new DMIP implementation against the "standard" IPP implementation when computing the edges of a relatively large matrix (11 MB). The DMIP version was indeed about 3x faster on a Core 2 Duo T7300 @ 2 GHz CPU. Impressive.

But two things still bother me:
1. Although the IPP implementation uses ~90% CPU, the DMIP implementation uses only ~50%, which suggests that DMIP is utilizing just a single core. Any comments?
2. I would like to verify that the performance boost is indeed due to fewer CPU cache misses in DMIP (the whole idea behind DMIP). How can I validate this on a Windows platform?
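
To make the cache question concrete, here is a minimal sketch in plain C++ (no IPP or DMIP calls) of the effect being asked about. It assumes the gain comes from running the same pipeline stages over small row slices, so that the intermediate buffers stay cache-sized, instead of over the whole image at once. All names (`edgeRows`, `edgesMultiPass`, `edgesSliced`) are hypothetical, and the border handling (replication) is an assumption, not what IPP or DMIP does internally.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

using Image8u = std::vector<std::uint8_t>;

// Read a pixel as float, replicating the border for out-of-range coordinates.
static float pixel(const Image8u& img, int w, int h, int x, int y) {
    x = std::min(std::max(x, 0), w - 1);
    y = std::min(std::max(y, 0), h - 1);
    return static_cast<float>(img[static_cast<std::size_t>(y) * w + x]);
}

// Run four stages (x-gradient, y-gradient, magnitude, saturate to 8 bits)
// over rows [y0, y1). Each stage is a separate pass, so the working set is
// three float buffers of (y1 - y0) * w elements each.
static void edgeRows(const Image8u& src, int w, int h, int y0, int y1, Image8u& dst) {
    const std::size_t n = static_cast<std::size_t>(y1 - y0) * w;
    std::vector<float> gx(n), gy(n), mag(n);
    auto at = [&](int x, int y) { return static_cast<std::size_t>(y - y0) * w + x; };

    for (int y = y0; y < y1; ++y)          // pass 1: Sobel derivative along x
        for (int x = 0; x < w; ++x)
            gx[at(x, y)] = (pixel(src, w, h, x + 1, y - 1) + 2 * pixel(src, w, h, x + 1, y) + pixel(src, w, h, x + 1, y + 1))
                         - (pixel(src, w, h, x - 1, y - 1) + 2 * pixel(src, w, h, x - 1, y) + pixel(src, w, h, x - 1, y + 1));

    for (int y = y0; y < y1; ++y)          // pass 2: Sobel derivative along y
        for (int x = 0; x < w; ++x)
            gy[at(x, y)] = (pixel(src, w, h, x - 1, y + 1) + 2 * pixel(src, w, h, x, y + 1) + pixel(src, w, h, x + 1, y + 1))
                         - (pixel(src, w, h, x - 1, y - 1) + 2 * pixel(src, w, h, x, y - 1) + pixel(src, w, h, x + 1, y - 1));

    for (std::size_t i = 0; i < n; ++i)    // pass 3: gradient magnitude
        mag[i] = std::sqrt(gx[i] * gx[i] + gy[i] * gy[i]);

    const std::size_t base = static_cast<std::size_t>(y0) * w;
    for (std::size_t i = 0; i < n; ++i)    // pass 4: round and saturate to 8 bits
        dst[base + i] = static_cast<std::uint8_t>(std::min(mag[i] + 0.5f, 255.0f));
}

// Whole image at once: the intermediates are as large as the image itself.
Image8u edgesMultiPass(const Image8u& src, int w, int h) {
    Image8u dst(src.size());
    edgeRows(src, w, h, 0, h, dst);
    return dst;
}

// Same stages, but over slices of sliceRows rows: the intermediates are small.
Image8u edgesSliced(const Image8u& src, int w, int h, int sliceRows) {
    Image8u dst(src.size());
    for (int y0 = 0; y0 < h; y0 += sliceRows)
        edgeRows(src, w, h, y0, std::min(y0 + sliceRows, h), dst);
    return dst;
}
```

Both functions produce identical output, so timing them on the same large image (for example with `std::chrono::steady_clock`) isolates the effect of the working-set size. A profiler's cache-miss counters can then confirm whether the difference really comes from the cache.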

Cheers
Hagay
Vladimir_Dudnik
Employee

Could you please provide your test case so that we can investigate the 50% CPU usage issue?

You can use VTune to measure the average cache miss count and see the DMIP advantage.

Regards,
Vladimir
hagay_lupeskoinscan-
Hi Vladimir,

Thanks for the response.
The code snippet is below. Please note: (1) it is written in C++/CLI; (2) the input and output matrices are allocated on the managed heap; (3) Matrix is a generic type backed by a one-dimensional array of T; (4) the code works as expected.

About VTune: thanks, I'll check it out.

Hagay

Code snippet:
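[cpp]Matrix ^EdgeDetectDmip::DetectEdges(Matrix ^input)
{
	// Allocate the managed output matrix with the same size as the input.
	Matrix^ output = gcnew Matrix(input->Size);

	IppiSize roi = {input->Size.Width, input->Size.Height};

	// Pin the managed buffers so the GC cannot move them while native code uses them.
	pin_ptr inData = &(input->GetBuffer()[0]);
	pin_ptr outData = &(output->GetBuffer()[0]);

	// 8-bit unsigned, single-channel image data.
	IppDataType dataType = ipp8u;
	IppChannels channels = ippC1;

	// Wrap the pinned buffers as DMIP images; the row step equals the width
	// because each pixel is one byte and the rows are not padded.
	Image A(inData, dataType, channels, roi, roi.width);
	Image D(outData, dataType, channels, roi, roi.width);
	Kernel KH(idmFilterSobelHoriz);
	Kernel KV(idmFilterSobelVert);
	
	// Build the expression: convert to 32f, filter with both Sobel kernels,
	// take the gradient magnitude, convert back to 8u, apply MaxVal, store in D.
	Graph O;
	O=To32f(A);
	D=MaxVal(To8u(Sqrt(Sqr(O*KH)+Sqr(O*KV))), 150, 0);

	return output;
}[/cpp]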

hagay_lupeskoinscan-
Hi Vladimir,

I have supplied the code snippet; could you please go over it and update me?
Also, the problem appears to be some kind of OpenMP (OMP) failure.

Running the following code, it appears that DMIP cannot use OMP for some reason:
int i = Control::GetMaxNumThread(); // i get the value 1, although this is a dual core CPU
DMIP::idmStatus stat = Control::SetNumThread(2); // stat is -4, meaning OMP error

However, IPP is using the two cores without a problem:
int cores = ippGetNumCoresOnDie(); // returns 2
int numThreads;
IppStatus status1 = ippGetNumThreads(&numThreads); // returns 2

Any explanations/solutions?
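
As a rough cross-check, independent of IPP and DMIP, the sketch below reports what the C++ runtime and (when the program is built with OpenMP enabled) the OpenMP runtime see on the machine. `ThreadReport`, `queryThreads`, and `ompLooksSerial` are hypothetical names. Note the assumption: this inspects the OpenMP runtime linked into the test program, which is not necessarily the one the DMIP DLL uses, so it can only rule out environment-level causes such as `OMP_NUM_THREADS` being set to 1.

```cpp
#include <cstdlib>
#include <string>
#include <thread>
#ifdef _OPENMP
#include <omp.h>
#endif

struct ThreadReport {
    unsigned hardwareThreads;      // logical CPUs seen by the C++ runtime (0 if unknown)
    int ompMaxThreads;             // omp_get_max_threads(), or -1 if built without OpenMP
    std::string ompNumThreadsEnv;  // value of OMP_NUM_THREADS, empty if unset
};

// Collect the thread counts visible to this process.
ThreadReport queryThreads() {
    ThreadReport r;
    r.hardwareThreads = std::thread::hardware_concurrency();
#ifdef _OPENMP
    r.ompMaxThreads = omp_get_max_threads();
#else
    r.ompMaxThreads = -1;
#endif
    const char* env = std::getenv("OMP_NUM_THREADS");
    r.ompNumThreadsEnv = env ? env : "";
    return r;
}

// True when OpenMP is limited to one thread although more hardware threads exist.
bool ompLooksSerial(const ThreadReport& r) {
    return r.ompMaxThreads == 1 && r.hardwareThreads > 1;
}
```

If `ompLooksSerial(queryThreads())` is false and `ompNumThreadsEnv` is empty, the environment is probably not what restricts the thread count, which would point at the library itself.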

Thanks, Hagay
Vladimir_Dudnik
Employee
Hi Hagay,


We have found an issue in the IPP 6.0 DMIP DLL that prevents the number of threads from being set to more than 1. This will be fixed in the next version. Please submit a bug report to Intel Premier Support so that you are notified when the fix is ready.

Regards,
Vladimir