Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

Performance loss between IPP3.0 and IPP6.1

rohwedder
Beginner
Hi,

We had to switch to Windows XP 64-bit because our software needs too much memory for a 32-bit Windows XP OS (huge images for computer vision). On 32-bit Windows we were using IPP 3.0. Because IPP 3.0 is not available for 64-bit, we had to upgrade, so now we use IPP 6.1. We were also hoping for a small performance boost as a side effect of the new optimizations in the new version (e.g. multithreading, since our system is a dual-core system), but on the contrary: we encountered cases in which the old IPP 3.0 was faster than the new version, and the effect got even worse when we used ippSetNumThreads(1) to limit computation to a single thread, just as IPP 3.0 did. In that configuration IPP 6.1 was more than two times slower than the old IPP 3.0.
Example:
ippiFilterMedian_16s_C1R(...) with the following parameters
srcStep=4090
dstStep=4050
dstRoiSize={ width=2025 height=6337 }
maskSize={ width=21 height=1 }
anchor={ x=0 y=0 }
Size of source image={ width=2045 height=6337 }
Image Content=Random noise created by a randomizer

Our test computer:
Intel Core 2 Duo CPU (E6750 @ 2.66 GHz) with 8 GB of RAM on Windows XP 64-bit

Does anyone have an idea why the newer version is slower, and is there a solution for our problem?

Thank you,

Rohwedder AG

Emmanuel_W_
New Contributor I
Did you validate that IPP is initialized correctly and is actually using optimized code for your processor?

Emmanuel

Vladimir_Dudnik
Employee
Yeah, those seem like odd results. Could you please check which optimized code was dispatched by IPP 6.1? The difference in performance is similar to the usual difference between generic C code (PX libraries, or MX libraries in the case of the EM64T architecture) and the SSE-optimized libraries (W7, T7, V8, P8 for 32-bit and M7, N8, U8 for the EM64T architecture).

Could this be a case where you link with the static libraries but do not call the ippStaticInit function?

Regards,
Vladimir

rohwedder
Beginner
We link the files from the stublib directory, which should be the dynamic libraries. We haven't used the ippStaticInit function so far, but I guess it should only be used with the static libraries anyway, right? Of course I would like to find out which DLLs are being used and whether they are the optimized versions, but I don't know how. Suggestions?

Thanks

kdiamond
Beginner
Run your application in debug mode from the IDE you use (Visual Studio?) and look at which DLLs are actually being loaded; all common IDEs show that information.

rohwedder
Beginner
The following DLLs are loaded by the 64-bit version of our application using IPP 6.1:
ntdll.dll, mscoree.dll, KERNEL32.dll,
advapi32.dll, RPCRT4.dll, Secur32.dll,
MSVCR80D.dll, msvcrt.dll,
ippiem64t-6.1.dll, ippcoreem64t-6.1.dll,
libiomp5md.dll, USER32.dll, GDI32.dll,
msvcm80d.dll, ole32.dll, ippiu8-6.1.dll,
SHLWAPI.dll, mscorwks.dll, MSVCR80.dll,
shell32.dll, comctl32.dll, mscorlib.ni.dll,
mscorjit.dll, diasymreader.dll, rsaenh.dll,
PSAPI.DLL, System.ni.dll

This means that the u8 processor code (new optimizations for 64-bit applications on Intel Core 2 and Intel Xeon 5100 processors) seems to be used, which should be the right one, shouldn't it?
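A small sketch of how the two-letter code in a loaded DLL name could be checked against the list Vladimir gave above. It assumes the naming pattern seen in the module list (domain, CPU code, then `-6.1.dll`); `classifyIppDll` and `IppCodePath` are hypothetical names made up for this illustration and are not part of IPP.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Categories taken from the library codes listed earlier in the thread.
enum class IppCodePath { GenericC, Optimized32, OptimizedEm64t, Unknown };

// Classify a module name such as "ippiu8-6.1.dll" by the two characters
// that precede the first '-'. Assumption: the CPU code sits in that position.
IppCodePath classifyIppDll(const std::string& dllName) {
    const std::string::size_type dash = dllName.find('-');
    if (dash == std::string::npos || dash < 2) return IppCodePath::Unknown;

    std::string code = dllName.substr(dash - 2, 2);
    assert(code.size() == 2);
    for (char& c : code)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));

    if (code == "px" || code == "mx") return IppCodePath::GenericC;
    if (code == "w7" || code == "t7" || code == "v8" || code == "p8")
        return IppCodePath::Optimized32;
    if (code == "m7" || code == "n8" || code == "u8")
        return IppCodePath::OptimizedEm64t;
    return IppCodePath::Unknown;  // no recognised CPU code in that position
}
```

With the names from the list, `ippiu8-6.1.dll` classifies as an optimized EM64T library, while `ippiem64t-6.1.dll` has no CPU code in that position and comes back as `Unknown`.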

So again: Why is our 64Bit IPP6.1 (running on 2 cores) slower than the old 32Bit IPP3.0 (running on 1 core)???

Would it help if I provided the source code? If so, how would you like to receive it?

Thanks

Vladimir_Dudnik
Employee
Hello,

We would certainly like to get a test case for this issue. From what you described above, everything seems to be done the right way. That means it is now our turn to take a look at the problem.

Regards,
Vladimir

rohwedder
Beginner
The observed timings are:

IPP 3.0:
32Bit 1 Thread: 606 ms
32Bit 2 Threads: not supported
64Bit 1 Thread: not supported
64Bit 2 Threads: not supported

IPP 6.1:
32Bit 1 Thread: 1182 ms
32Bit 2 Threads: 596 ms
64Bit 1 Thread: 1297 ms
64Bit 2 Threads: 655 ms

As can be seen, 64-bit IPP 6.1 is slower than 32-bit IPP 3.0. Only 32-bit IPP 6.1 with 2 threads is slightly faster than 32-bit IPP 3.0, but even then I would expect a more significant performance boost from using 2 threads.
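To put numbers on the comparison, here is a minimal sketch that turns the timings listed above into ratios. The helper names `slowdown` and `threadSpeedup` are made up for this illustration; the input values are the ones from the list.

```cpp
#include <cassert>

// Timings in milliseconds, copied from the list above.
const double kIpp30_32bit_1thread  = 606.0;
const double kIpp61_32bit_1thread  = 1182.0;
const double kIpp61_32bit_2threads = 596.0;
const double kIpp61_64bit_1thread  = 1297.0;
const double kIpp61_64bit_2threads = 655.0;

// How many times longer a run took than the baseline; > 1.0 means slower.
double slowdown(double ms, double baselineMs) {
    assert(baselineMs > 0.0);
    return ms / baselineMs;
}

// Speedup from one thread to two; 2.0 would be ideal scaling on a dual core.
double threadSpeedup(double oneThreadMs, double twoThreadsMs) {
    assert(twoThreadsMs > 0.0);
    return oneThreadMs / twoThreadsMs;
}
```

For example, `slowdown(kIpp61_64bit_1thread, kIpp30_32bit_1thread)` is about 2.14 and `slowdown(kIpp61_32bit_1thread, kIpp30_32bit_1thread)` is about 1.95, while `threadSpeedup` is about 1.98 for both the 32-bit and the 64-bit build. In these numbers the second thread scales almost ideally; the gap to IPP 3.0 is already present in the single-threaded runs.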

These times were produced by the following code (C++/CLI; MS VS2005 SP1; Release) and the CPU and computer mentioned earlier in this discussion:

[cpp]#include "memory.h"
#include "ippi.h"
#include "ipps.h"

#ifdef IPP61
#include "ippcore.h"
#endif

#ifdef IPP61
#define THREADTESTS 2
#else
#define THREADTESTS 1
#endif

using namespace System;
using namespace System::Diagnostics;

int main(array<System::String ^> ^args)
{
    int iSrcWidth	 = 2045;
    int iSrcHeight	 = 6337;
    int iFilterHalfSizeX = 10;
    int iFilterHalfSizeY = 0;
    int iDstWidth	 = iSrcWidth - iFilterHalfSizeX * 2;
    int iDstHeight	 = iSrcHeight - iFilterHalfSizeY * 2;
    int iFilterSizeX	 = iFilterHalfSizeX * 2 + 1;
    int iFilterSizeY	 = iFilterHalfSizeY * 2 + 1;

    short *pSrcData = 0;
    short *pDstData = 0;
    try
    {
        pSrcData = new short[iSrcWidth * iSrcHeight];
        memset(	pSrcData,
                0,
                iSrcWidth *
                iSrcHeight *
                sizeof(short));

        Random random(0);
        for(int iY = 0; iY < iSrcHeight; iY++)
        {
            for(int iX = 0; iX < iSrcWidth; iX++)
            {
                pSrcData[iX + iY * iSrcWidth] = (short)(random.Next());
            }
        }


        pDstData = new short[iDstWidth * iDstHeight];

        for(int i = 1; i <= THREADTESTS; i++)
        {
#ifdef IPP61
            ippSetNumThreads(i);
#endif
            Console::WriteLine("Threads: " + i.ToString());

            for(int j = 0; j < 10; j++)
            {
                memset(	pDstData,
                        0,
                        iDstWidth *
                        iDstHeight *
                        sizeof(short));

                
                IppiSize ippiMask = { iFilterSizeX,
                                      iFilterSizeY};

                IppiPoint ippiPoint = { 0,
                                        0};

                IppiSize ippiSize = {   iDstWidth,
                                        iDstHeight};

                Stopwatch ^tStopwatch = gcnew Stopwatch();
                tStopwatch->Start();
                
                ippiFilterMedian_16s_C1R( (Ipp16s*)pSrcData,
                                          iSrcWidth * sizeof(short),
                                          (Ipp16s*)pDstData,
                                          iDstWidth * sizeof(short),
                                          ippiSize,
                                          ippiMask,
                                          ippiPoint);

                tStopwatch->Stop();

                Console::WriteLine(  tStopwatch->ElapsedMilliseconds +
                                    " ms");
            }
        }
    }
    finally
    {
        if(0 != pSrcData)
        {
            delete [] pSrcData;
            pSrcData = 0;
        }

        if(0 != pDstData)
        {
            delete [] pDstData;
            pDstData = 0;
        }
    }

    Console::WriteLine("Press any key to continue . . .");
    Console::ReadKey();

    return 0;
}
[/cpp]



Thanks for your help.

Emmanuel_W_
New Contributor I
Hi,

I didn't try to run your sample, so I might be off, but you should try to allocate memory using the ippiMalloc functions, or otherwise ensure that the memory is aligned on 16-byte boundaries. This sometimes has a huge impact on performance.

Emmanuel

rohwedder
Beginner
I tried your hint but it didn't change anything. All timings are equal to the ones I mentioned earlier in this discussion. The updated code is the following:

[cpp]#include "memory.h"
#include "ippi.h"
#include "ipps.h"

#ifdef IPP61
#include "ippcore.h"
#endif

#ifdef IPP61
#define THREADTESTS 2
#else
#define THREADTESTS 1
#endif

using namespace System;
using namespace System::Diagnostics;

int main(array<System::String ^> ^args)
{
	unsigned char *pSrcData = 0;
	unsigned char *pDstData = 0;
	try
	{
		int iSrcWidth			= 2045;
		int iSrcHeight			= 6337;
		int	iSrcPitch			= 0;
		int iFilterHalfSizeX	= 10;
		int iFilterHalfSizeY	= 0;
		int iDstWidth			= iSrcWidth - iFilterHalfSizeX * 2;
		int iDstHeight			= iSrcHeight - iFilterHalfSizeY * 2;
		int	iDstPitch			= 0;
		int iFilterSizeX		= iFilterHalfSizeX * 2 + 1;
		int iFilterSizeY		= iFilterHalfSizeY * 2 + 1;

		pSrcData = (unsigned char*)ippiMalloc_16s_C1(iSrcWidth, iSrcHeight, &iSrcPitch);

		memset(	pSrcData,
				0,
				iSrcPitch *
				iSrcHeight);

		Random random(0);
		for(int iY = 0; iY < iSrcHeight; iY++)
		{
			for(int iX = 0; iX < iSrcWidth; iX++)
			{
				short *pData = (short*)(pSrcData + iX * sizeof(short) + iY * iSrcPitch);
				*pData = (short)(random.Next());
			}
		}

		pDstData = (unsigned char*)ippiMalloc_16s_C1(iDstWidth, iDstHeight, &iDstPitch);

		for(int i = 1; i <= THREADTESTS; i++)
		{
#ifdef IPP61
			ippSetNumThreads(i);
#endif
			Console::WriteLine("Threads: " + i.ToString());

			for(int j = 0; j < 10; j++)
			{
				memset(	pDstData,
						0,
						iDstPitch *
						iDstHeight);
				
				IppiSize ippiMask = {	iFilterSizeX,
										iFilterSizeY};
					
				IppiPoint ippiPoint = {	0,
										0};

				IppiSize ippiSize = {	iDstWidth,
										iDstHeight};

				Stopwatch ^tStopwatch = gcnew Stopwatch();
				tStopwatch->Start();

				ippiFilterMedian_16s_C1R(	(Ipp16s*)pSrcData,
											iSrcPitch,
											(Ipp16s*)pDstData,
											iDstPitch,
											ippiSize,
											ippiMask,
											ippiPoint);

				tStopwatch->Stop();

				Console::WriteLine(	tStopwatch->ElapsedMilliseconds +
									" ms");
			}
		}
	}
	finally
	{
		if(0 != pSrcData)
		{
			ippiFree(pSrcData);
			pSrcData = 0;
		}

		if(0 != pDstData)
		{
			ippiFree(pDstData);
			pDstData = 0;
		}
	}

	Console::WriteLine("Press any key to continue . . .");
	Console::ReadKey();

    return 0;
}[/cpp]

Thank you anyway

Chao_Y_Intel
Moderator
Hi,

This code does not appear to take the image border into account. For filter functions, IPP assumes that the adjacent border pixels also exist. See this article for more information:

http://software.intel.com/en-us/articles/intel-integrated-performance-primitives-intel-ipp-processing-an-image-from-edge-to-edge/

Otherwise the test will read some uninitialized memory, which can cause errors. See if it works after fixing this problem.

Thanks,
Chao

rohwedder
Beginner
I don't see how I am failing to consider the image borders. The destination image is smaller than the source image, which should be sufficient.

The relevant code snippet:

[cpp]int iSrcWidth    = 2045;  
int iSrcHeight   = 6337;  
int iFilterHalfSizeX = 10;  
int iFilterHalfSizeY = 0;  
int iDstWidth    = iSrcWidth - iFilterHalfSizeX * 2;  
int iDstHeight   = iSrcHeight - iFilterHalfSizeY * 2;  
int iFilterSizeX     = iFilterHalfSizeX * 2 + 1;  
int iFilterSizeY     = iFilterHalfSizeY * 2 + 1;  [/cpp]

If you still think that it is wrong, please let me know how it is being done correctly. Thanks
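The border argument can be checked with a little arithmetic. The sketch below assumes that, for destination column x, the mask covers source columns x - anchor.x through x - anchor.x + maskWidth - 1, counted from the source pointer passed to the filter; that placement is my reading of the anchor convention and is not stated in this thread. `ColumnRange`, `sourceColumnsRead` and `staysInsideSource` are hypothetical helper names.

```cpp
#include <cassert>

// First and last source column touched while filtering one destination row.
struct ColumnRange { int first; int last; };

// Assumed mask placement: destination column x reads source columns
// [x - anchorX, x - anchorX + maskWidth - 1].
ColumnRange sourceColumnsRead(int dstWidth, int maskWidth, int anchorX) {
    assert(dstWidth > 0 && maskWidth > 0);
    assert(anchorX >= 0 && anchorX < maskWidth);
    return { -anchorX, (dstWidth - 1) - anchorX + (maskWidth - 1) };
}

// True when every column read lies inside a source row of srcWidth pixels.
bool staysInsideSource(int srcWidth, int dstWidth, int maskWidth, int anchorX) {
    const ColumnRange r = sourceColumnsRead(dstWidth, maskWidth, anchorX);
    return r.first >= 0 && r.last < srcWidth;
}
```

With the values from the snippet (source width 2045, destination width 2025, mask width 21, anchor x = 0), the last column read is 2044, which is still inside the source row. With a centered anchor (x = 10) and the same unshifted source pointer, the first column read would be -10, so that combination would need the pointer moved 10 pixels into the row.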

Chao_Y_Intel
Moderator
Hi rohwedder,

I overlooked that part of the code. It looks fine. We will run some performance tests.

Thanks,
Chao

rohwedder
Beginner
Are you running the performance tests you mentioned, or what is happening? I still have no idea where this performance problem comes from, or how to solve it.

Thanks

Chao_Y_Intel
Moderator
Hello,

This problem can be reproduced here. The engineer who owns this function checked its performance. The algorithm of the ippiFilterMedian_16s_C1R function was changed between v3.0 and v6.1 to support OpenMP threading. For small and low masks, the algorithm in IPP 3.0 is a bit better. For other masks, IPP 6.1 is better.

In this test case, it looks like you are using a 1D mask. This function is not optimized for that case. For a 1D mask, you can try the corresponding function in the IPPS (signal processing) domain.

Thanks,
Chao
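When trying a row-by-row 1D approach, it helps to have a slow but obviously correct reference to compare results against. The following is a plain C++ sketch, not an IPP call and not tuned for speed; `medianOfWindow` and `medianRows` are hypothetical names, pitches are given in elements rather than bytes, and output pixel x is taken as the median of source pixels x through x + maskWidth - 1, matching the anchor x = 0 setup of the test program.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of the maskWidth values starting at `window`; maskWidth must be odd.
template <typename T>
T medianOfWindow(const T* window, int maskWidth) {
    assert(maskWidth > 0 && maskWidth % 2 == 1);
    std::vector<T> tmp(window, window + maskWidth);
    std::nth_element(tmp.begin(), tmp.begin() + maskWidth / 2, tmp.end());
    return tmp[maskWidth / 2];
}

// Horizontal median over every row of an image. The source rows must hold
// at least dstWidth + maskWidth - 1 valid pixels.
template <typename T>
void medianRows(const T* src, std::size_t srcPitch,
                T* dst, std::size_t dstPitch,
                int dstWidth, int height, int maskWidth) {
    for (int y = 0; y < height; ++y) {
        const T* srcRow = src + y * srcPitch;
        T* dstRow = dst + y * dstPitch;
        for (int x = 0; x < dstWidth; ++x)
            dstRow[x] = medianOfWindow(srcRow + x, maskWidth);
    }
}
```

Because the functions are templates, the same reference works for `short` (16s) and `unsigned short` (16u) buffers, which makes it usable as a cross-check for either variant.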


shyaki
Beginner
I'd like to bring to your attention that your IPP 3.0 was on the 32-bit OS while your IPP 6.1 is on a 64-bit OS. In my tests, IPP 6.1 is much slower on a 64-bit OS than on a 32-bit OS.

Chao_Y_Intel
Moderator
Is that for the ippiFilterMedian_16s_C1R function, or for others as well?

thanks,
Chao

rohwedder
Beginner
We used the old IPP 3.0 (32-bit edition) and the new IPP 6.1 (32-bit and 64-bit editions) on a 64-bit Windows XP operating system, and there were no big performance differences between 32-bit and 64-bit IPP 6.1 (see the timings earlier in the discussion). In any case, using 32-bit is not an option for us because of the amount of memory we need.
We only tried ippiFilterMedian_16s_C1R because this is the function we need right now. Actually, we need ippiFilterMedian_16u_C1R, but that function doesn't exist in IPP 3.0, so in the old implementation we used ippiFilterMedian_16s_C1R (and a few 16u<->16s conversion functions) to replace it. With IPP 6.1 we will of course use ippiFilterMedian_16u_C1R, but our tests showed that it is not faster than ippiFilterMedian_16s_C1R. As you can imagine, posting our code with ippiFilterMedian_16u_C1R in this forum would have been a bad example, because it would not have been comparable.

Thanks

rohwedder
Beginner
I tried calling the ippsFilterMedian_16s for each line of my image but unfortunately it seems to produce wrong results. Have I found another bug? Anyway, my images will be 16u in the end and there is no ippsFilterMedian_16u. By the way, generally we also use ippiFilterMin_8/16u_C1R, ippiFilterMax_8/16u_C1R, ippiFilterBox_8/16u_C1R but for our current project we only need ippiFilterMedian_8/16u_C1R.
Can this performance issue be fixed?

Chao_Y_Intel
Moderator
Hello,

There appear to be two problems here:

1> The ippsFilterMedian_16s error: maybe you can attach your file here, so we can check whether there is any problem with this function.

2> The performance issue with ippiFilterMin_8/16u_C1R:
This looks a little different from the performance problem in ippiFilterMedian_16s. The earlier issue compared the performance of IPP 3.0 and IPP 6.1 for ippiFilterMedian_16s.
Here you are comparing the performance of ippiFilterMin_8/16u_C1R with ippiFilterMedian_16s in IPP 6.1, and you find that the 16u functions are slower than the 16s ones.

Do I understand it correctly?

Thanks,
Chao

rohwedder
Beginner
I don't know whether there are performance issues with any function other than ippiFilterMedian_16s/16u_C1R. I have not compared Min to Median, 16s to 16u, or any other unlike combination; I only compared IPP 3.0 to IPP 6.1. Furthermore, I was assuming that the 16s Median should in general perform quite similarly to the 16u version, right? The implementations of Min, Max and Median (I don't know about the Box filter) are usually pretty similar, so I also guess that these functions all have the same performance issue (when comparing IPP 3.0 to IPP 6.1).

The other problem, with ippsFilterMedian_16s, can be seen with the following code using IPP 6.1:
[cpp]#include "memory.h"
#include "ipps.h"
#include "ippcore.h"

using namespace System;
using namespace System::Diagnostics;

int main(array<System::String ^> ^args)
{
	const int iInputArraySize = 3;
	const int iFilterSize = 3;
	const int iOutputArraySize = iInputArraySize - iFilterSize + 1;

	//Prepare input array
	short inputArray[iInputArraySize];
	Console::WriteLine("InputArray:");
	Random random(0);
	for(int i = 0; i < iInputArraySize; i++)
	{
		inputArray[i] = (short)(random.Next());
		Console::Write(inputArray[i].ToString() + " ");
	}
	Console::WriteLine();
	Console::WriteLine();

	//Prepare output array
	short outputArray[iOutputArraySize];
	memset(outputArray, 0, iOutputArraySize * sizeof(short));

	IppStatus ippStatus = ippsFilterMedian_16s(inputArray, outputArray, iOutputArraySize, iFilterSize);
	Console::WriteLine(gcnew String(ippGetStatusString(ippStatus)));
	Console::WriteLine();

	Console::WriteLine("OutputArray:");
	for(int i = 0; i < iOutputArraySize; i++)
	{
		Console::Write(outputArray[i].ToString() + " ");
	}
	Console::WriteLine();

	Console::WriteLine("Press any key to continue . . .");
	Console::ReadKey();

    return 0;
}[/cpp]

This code produces the following output. The correct median should be -28346 and see for yourself what ippsFilterMedian_16s computes:

InputArray:
-30182 7692 -28346

ippStsNoErr: No error, it's OK

OutputArray:
-30182
Press any key to continue . . .


Maybe I made a programming error, but at the moment I don't see where. Any ideas? Anyway, since we will be using 16u images, ippsFilterMedian_16s won't be an option for us because there is no 16u version.

Thanks
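One possible reading of this output, offered as an assumption and not confirmed anywhere in this thread: the 1D function may produce one output per input element, with the window positioned around that element and positions outside the array filled from the nearest edge value, instead of producing only the outputs whose window fits entirely inside the input. The plain C++ sketch below (no IPP calls; all function names are hypothetical) compares the two conventions on the three input values from the test.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Median of an odd-sized window (the copy is partially sorted in place).
short medianOf(std::vector<short> window) {
    assert(!window.empty() && window.size() % 2 == 1);
    std::nth_element(window.begin(), window.begin() + window.size() / 2,
                     window.end());
    return window[window.size() / 2];
}

// Convention the test program expects: output i is the median of
// in[i] .. in[i + mask - 1], so only full windows produce an output.
short medianValidAt(const std::vector<short>& in, int i, int mask) {
    assert(i >= 0 && i + mask <= static_cast<int>(in.size()));
    return medianOf(std::vector<short>(in.begin() + i, in.begin() + i + mask));
}

// Hypothetical convention (an assumption, not taken from the IPP manual):
// the window is centered on element i and out-of-range positions repeat
// the nearest edge value.
short medianCenteredAt(const std::vector<short>& in, int i, int mask) {
    assert(!in.empty() && i >= 0 && i < static_cast<int>(in.size()));
    const int last = static_cast<int>(in.size()) - 1;
    std::vector<short> window;
    for (int k = i - mask / 2; k <= i + mask / 2; ++k)
        window.push_back(in[std::min(std::max(k, 0), last)]);
    return medianOf(window);
}
```

For the input -30182, 7692, -28346 with a mask of 3, `medianValidAt(in, 0, 3)` gives -28346, the value the test expects, while `medianCenteredAt(in, 0, 3)` gives -30182, the value that was actually printed. If the library does behave this way, a request for a single output element would return exactly that edge-padded first value; the function's reference documentation would be the place to confirm it.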