ippiResize Threading Problems

Shawn_Gibson · ‎04-12-2017

I have been migrating a very large application from the very old IPP 6.x to the latest IPP 2017 (R2). This is a direct migration; I have not used IPP 7, IPP 8, or IPP 9. I would say I have about 98% of the functionality working with the new API, but I ran into some problems with image resizing.

In IPP 6.x, we were using the ippiResizeSqrPixel function call. In this case we are scaling the image down, so we were using IPPI_INTER_SUPER.

status = ippiResizeSqrPixel_8u_C3R(pImageData, srcSize, iStride * sizeof(Ipp8u), srcRect, pDst, (dstWidth * channels * sizeof(Ipp8u)), dstRect, scale, scale, 0, 0, IPPI_INTER_SUPER, pExtBuffer);

I figured out how to do this with IPP 2017 using the standard functions and all was working.

ippiResizeGetSize_8u, ippiResizeSuperInit_8u, ippiResizeGetBufferSize_8u, to set things up and then call ippiResizeSuper_8u_C3R.

However, I noticed this approach was much slower than the older call to ippiResizeSqrPixel: roughly 5 to 10 times slower, depending on the case. I read somewhere on these forums that ippiResizeSqrPixel used internal threading, hence its speed. In the latest API, I ran across the _LT methods: ippiResizeGetSize_LT, ippiResizeSuperInit_LT, ippiResizeGetBufferSize_LT, and finally ippiResizeSuper_8u_C3R_LT.

After implementing this resize path, the times were much more comparable to ippiResizeSqrPixel. However, the images are now garbled in the 'y' direction, sometimes replicating previous sections of the image. If I call ippSetNumThreads_LT(1) before using the _LT methods, the image is correct again, but there is no speedup and the times are comparable to (if not slightly slower than) just using ippiResize without _LT.

Since setting ippSetNumThreads_LT(1) causes the code to work properly, this seems like a bug, or more likely I haven't quite figured out how to use the new functionality correctly. Is this a known issue?

Thanks!

Aleksey_Y_ · ‎04-12-2017

"In Intel® Integrated Performance Primitives (Intel® IPP) 8.2 and later versions, multi-threading (internal threading) libraries are deprecated due to issues with performance and interoperability with other threading models, but made available for legacy applications....

For new application development, it is highly recommended to use the single-threaded versions with application-level threading"

https://software.intel.com/en-us/articles/intel-integrated-performance-primitives-intel-ipp-threading-openmp-faq
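
For readers unfamiliar with the term, application-level threading for a resize usually means the application itself splits the destination image into row bands and resizes each band on its own thread with the single-threaded functions. Below is a minimal sketch of just the partitioning step, in plain C++ with no IPP calls. `Band` and `splitRows` are hypothetical names introduced for illustration and are not part of IPP.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: one horizontal band of the destination image.
struct Band {
    int y0;                 // first destination row in the band
    int rows;               // number of destination rows in the band
    std::size_t byteOffset; // offset of row y0 from the start of the destination buffer
};

// Split dstHeight rows into at most numBands contiguous, non-overlapping bands.
// The first (dstHeight % count) bands get one extra row, so every row is covered exactly once.
std::vector<Band> splitRows(int dstHeight, std::size_t dstStepBytes, int numBands)
{
    assert(dstHeight > 0 && numBands > 0);
    const int count = std::min(numBands, dstHeight);
    const int base = dstHeight / count;
    const int extra = dstHeight % count;

    std::vector<Band> bands;
    bands.reserve(static_cast<std::size_t>(count));
    int y = 0;
    for (int i = 0; i < count; ++i) {
        const int rows = base + (i < extra ? 1 : 0);
        bands.push_back({y, rows, static_cast<std::size_t>(y) * dstStepBytes});
        y += rows;
    }
    return bands;
}
```

For example, `splitRows(6600, 10944, 8)` yields eight bands of 825 rows each, with the second band starting at row 825 (byte offset 9028800). Each band is independent of the others, so each could be handed to a separate worker thread with its own work buffer.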

Valentin_K_Intel · ‎04-13-2017

Hi Shawn,

Could you please provide a code example, where ippiResizeSuper_8u_C3R_LT works incorrectly?

Thanks,
Valentin

Shawn_Gibson · ‎04-13-2017

This is the main entry call.  Usually before getting to this stage, setNumThreads has already been called (tested on a laptop with 8 logical cores, and workstation with 48).  The incoming image will get scaled 2 different times.
Original image size: 7296 x 13200, stride = 21888 (BGR image).
1st scale down will be 0.5 in x & y, so the new size will be 3648 x 6600, stride = 10944.
A 2nd 'thumbnail' image will also be created with a 2nd call to this method and will be scaled down massively (53 x 96, stride = 159).
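
The sizes and strides above follow from simple arithmetic, which can be sketched as below. This is an illustration only: `PackedLayout` and `scaledLayout` are hypothetical names, the rounding to the nearest pixel is an assumption (the post does not say how the destination size is rounded), and the thumbnail scale is assumed to be 96/13200, which is about 0.00727.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper type: size and row stride of a tightly packed,
// 8-bit interleaved image (no padding at the end of each row).
struct PackedLayout {
    int width;
    int height;
    std::size_t stepBytes;
};

// Assumption: destination dimensions are rounded to the nearest pixel.
PackedLayout scaledLayout(int srcWidth, int srcHeight, double scale, int channels)
{
    assert(srcWidth > 0 && srcHeight > 0 && scale > 0.0 && channels > 0);
    PackedLayout out;
    out.width = static_cast<int>(std::lround(srcWidth * scale));
    out.height = static_cast<int>(std::lround(srcHeight * scale));
    // One byte per 8-bit sample, so the stride is width * channels bytes.
    out.stepBytes = static_cast<std::size_t>(out.width) * static_cast<std::size_t>(channels);
    return out;
}
```

With the 7296 x 13200 BGR source, a scale of 0.5 gives 3648 x 6600 with a stride of 10944 bytes, and a scale of 96/13200 gives 53 x 96 with a stride of 159 bytes.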

template<>
	inline IppStatus IppiFunction<Ipp8u, Any>::ResizeImage_LT(double scale, int32_t channels, const Ipp8u* pSrc, IppiSize srcSize, int32_t srcStep, IppiRect srcRect,
		Ipp8u* pDst, IppiRect dstRect)
	{
		IppStatus status;

		IppiInterpolationType interpolation;

		if (scale < 1)
		{
			interpolation = IppiInterpolationType::ippSuper; //IPPI_INTER_SUPER only works with downsizing an image
		}
		else
		{
			interpolation = IppiInterpolationType::ippLinear;
		}

		IppiSizeL srcSizeL;
		srcSizeL.width = srcSize.width;
		srcSizeL.height = srcSize.height;

		IppiSizeL dstSizeL;
		dstSizeL.width = dstRect.width;
		dstSizeL.height = dstRect.height;

		IppSizeL init_buffer_size = 0; // may not be required
		IppSizeL spec_size;

		status = ippiResizeGetSize_LT(srcSizeL, dstSizeL, IppDataType::ipp8u, interpolation, 0, &spec_size, &init_buffer_size);

		if (ippStsNoErr == status)
		{
			IppiResizeSpec_LT* pSpec = (IppiResizeSpec_LT*)IppsFunction<IppSizeL>::Malloc(spec_size);

			if (scale < 1)
			{
				status = ippiResizeSuperInit_LT(srcSizeL, dstSizeL, IppDataType::ipp8u, channels, pSpec);
			}
			else
			{
				status = ippiResizeLinearInit_LT(srcSizeL, dstSizeL, IppDataType::ipp8u, channels, pSpec);
			}

			IppSizeL working_buffer_size = 0;
			if (ippStsNoErr == status) // do not overwrite a failed Init status
			{
				status = ippiResizeGetBufferSize_LT(pSpec, &working_buffer_size);
			}

			if (ippStsNoErr == status)
			{
				Ipp8u * pExtBuffer = IppsFunction<Ipp8u>::Malloc(working_buffer_size); // a working buffer is required for resizing

				IppSizeL srcStepL = srcStep * sizeof(Ipp8u);
				IppSizeL dstStepL = dstRect.width * channels * sizeof(Ipp8u);

				status = IppiFunction<Ipp8u, Any>::Resize_LT(channels, interpolation, pSrc, srcStepL, pDst, dstStepL, pSpec, pExtBuffer);

				IppsFixedFunction::Free(pExtBuffer);
			}

			IppsFixedFunction::Free(pSpec);
		}
		}

		return status;
	}

The method IppiFunction<Ipp8u, Any>::Resize_LT will essentially call the ResizeSuper function.

template<>
	inline IppStatus IppiFunction<Ipp8u, Any>::ResizeSuper_LT(int32_t channels, const Ipp8u* pSrc, IppSizeL srcStep, Ipp8u* pDst, IppSizeL dstStep,
		const IppiResizeSpec_LT* pSpec, Ipp8u* pBuffer)
	{
		IppStatus status = ippStsNumChannelsErr; // returned unchanged when channels is not 1, 3, or 4

		switch (channels)
		{
		case 1:
			status = ippiResizeSuper_8u_C1R_LT(pSrc, srcStep, pDst, dstStep, pSpec, pBuffer);
			break;
		case 3:
			status = ippiResizeSuper_8u_C3R_LT(pSrc, srcStep, pDst, dstStep, pSpec, pBuffer);
			break;
		case 4:
			status = ippiResizeSuper_8u_C4R_LT(pSrc, srcStep, pDst, dstStep, pSpec, pBuffer);
			break;
		}

		return status;
	}

If setNumThreads is 8 on the laptop, or 12 on the desktop, a scaled image is created, but it is usually not correct: multiple chunks of the image are replicated in the y direction, or chunks of the image are misplaced (pieces that should be at the top end up at the bottom, or vice versa).

Timing information

ippiResizeSqrPixel (Ipp v6)

Original size by a scale of 0.5: Average is around 37 ms. Original size to thumbnail by a scale of about 0.00727: Average is around 230.7 ms.

ippiResize (Ipp 2017)

Original size by a scale of 0.5: Average is around 182.5 ms. Original size to thumbnail: Average is around 58.1 ms. I found it interesting that the thumbnail scaling is much faster. The image, however, is correct.

ippiResize_LT (Ipp 2017) - setNumThreads = 12

Original size by a scale of 0.5: Average is around 33 ms. Original size to thumbnail: Average is around 59.4 ms. The image is incorrect and garbled.

ippiResize_LT (Ipp 2017) - setNumThreads = 1

Original size by a scale of 0.5: Average is around 187.2 ms. Original size to thumbnail: Average is around 72.3 ms. The image is correct.
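
The post does not say how these averages were measured. For anyone wanting to reproduce the comparison, one common approach is sketched below; `averageMilliseconds` is a hypothetical helper, and the untimed warm-up call is an assumption about good practice rather than something taken from the original measurements.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: call fn `iterations` times and return the mean
// wall-clock time per call in milliseconds.
template <typename Fn>
double averageMilliseconds(Fn&& fn, int iterations)
{
    assert(iterations > 0);
    fn(); // warm-up call, not included in the timing

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        fn();
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

The callable would wrap one complete resize call, for example a lambda that invokes the ResizeImage_LT wrapper shown earlier with fixed arguments.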

Hope this is enough information.

Valentin_K_Intel · ‎04-13-2017

Thank you for the code example, Shawn. We will investigate the problem.

Best regards,
Valentin

Valentin_K_Intel · ‎04-26-2017

Hi Shawn,

I cannot reproduce the problem. Could you please provide some details?
1. What platform (CPU, OS, ia32 or intel64) do you use?
2. Does the problem appear for IPP 2017 Update 2?

Best regards,
Valentin

Shawn_Gibson · ‎04-26-2017

I have repeated the problem on 2 different systems:

System 1:

Win7, intel64, Intel Core i7-4800MQ CPU @ 2.70 GHz, 4 cores, 8 logical processors.
IPP2017 Update 2.

System 2:

Win7, intel64, Intel Xeon E5-2690 v3 @ 2.6 GHz, 2x12 cores, 48 logical processors
IPP2017 Update 2.

Valentin_K_Intel · ‎04-28-2017

Hi Shawn,

Thank you very much for providing the details. I have reproduced the issue. The problem will be fixed in a future IPP release.

Best regards,
Valentin

Valentin_K_Intel · ‎05-10-2017

Hi Shawn,

As a workaround, you can build the TL libraries from the sources. The instructions can be found here: https://software.intel.com/en-us/node/684665

Best regards,
Valentin