IPP cannot allocate images larger than 2 GB in the em64t version

dirk · ‎08-30-2010

Hello,

I was a little shocked when I realized that IPP cannot allocate memory blocks larger than 2 GB in a 64-bit environment. The ippMalloc function takes an (int length), and all ippiMalloc functions internally perform an integer multiplication to calculate the size of the image buffer. As a result, image buffers larger than 2 GB cannot be allocated.
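A minimal sketch of the size arithmetic, assuming a 50000 x 50000 image with 4 bytes per pixel as an example. The helpers `image_bytes` and `fits_in_int` are hypothetical names, not IPP functions; they only show that the byte count has to be computed in a 64-bit type before it is compared against what an int length can carry.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper (not part of IPP): image size in bytes, computed in
   64-bit arithmetic so that sizes such as 50000 x 50000 x 4 do not wrap. */
static uint64_t image_bytes(int width, int height, int bytes_per_pixel)
{
    assert(width >= 0 && height >= 0 && bytes_per_pixel >= 0);
    return (uint64_t)width * (uint64_t)height * (uint64_t)bytes_per_pixel;
}

/* Returns 1 if the size can be passed through an int length parameter. */
static int fits_in_int(uint64_t bytes)
{
    return bytes <= (uint64_t)INT_MAX;
}
```

For the example image, `image_bytes(50000, 50000, 4)` is 10^10, which `fits_in_int` rejects; doing the same multiplication in plain int would overflow, which is undefined behavior in C.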

I have now switched back to allocating the image buffer myself, but I'm wary of using any IPP function, since I expect severe trouble from any function that calculates a pointer offset into the image with a simple

> ptr + size.height*step

calculation. Since all ipp* functions define size.height and step as int, the calculation must fail unless the operands are cast from int to size_t or another 64-bit data type before multiplying.
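The cast in question could look like the following sketch. `row_offset_bytes` and `row_ptr` are hypothetical helpers written for illustration; nothing here says how IPP computes offsets internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of a row, with both int operands
   widened BEFORE the multiplication so the product is 64-bit. */
static uint64_t row_offset_bytes(int row, int step)
{
    assert(row >= 0 && step >= 0);
    return (uint64_t)row * (uint64_t)step;
}

/* Hypothetical helper: pointer to the start of a row. */
static unsigned char *row_ptr(unsigned char *base, int row, int step)
{
    uint64_t off = row_offset_bytes(row, step);
    assert(off <= SIZE_MAX); /* can only fail on a 32-bit platform */
    return base + (size_t)off;
}
```

With step = 200000, row 49999 starts at byte 9999800000, far beyond INT_MAX. Writing `ptr + row*step` with both operands left as int would overflow before the result is ever added to the pointer.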

One way out of this problem would be to change the step parameter from int to size_t.

I have checked version 5.3, 6.0 and 6.1 and all have the problem. I have not looked into IPP 7 so far.

Dirk

PS: Your IPP website clearly states that there is no limitation on allocation, see:
http://software.intel.com/en-us/articles/performance-tools-for-software-developers-memory-function-faq/

What is the maximum amount of memory that can be allocated by Intel IPP functions?
There are no restrictions to the amount of memory that can be allocated except as defined by the user's operating system and system hardware.

Chao_Y_Intel · ‎08-30-2010

Dirk,

Thanks for posting the problem here. It is true that the IPP malloc functions are limited by the 32-bit "int" length. The system malloc can be used for large memory allocations. Currently, the image processing functions use the data type "int" for length/size parameters, so they have the same 32-bit constraint. Our engineering team is reviewing the request to support 64-bit sizes and checking what we can improve for future versions.

To use IPP functions with very large images today, users can first divide the image into chunks and call the relevant IPP function on each chunk, passing the correct pointer to the start of that chunk. For example, if we have:

Image size = 50000 x 50000 x 4 = 10^10 bytes

Max int = 2147483647 = 2^31 - 1

Therefore we need at least 5 chunks (10^10 / 2^31 = ~4.7). Let's use 5:

Chunk1: pSrc1 = pSrcImage; pDst1 = pDstImage; step (both Src & Dst) = 200000; chunk.width = 50000; chunk.height = 10000;

Chunk2: pSrc2 = pSrc1 + step*chunk.height; pDst2 = pDst1 + step*chunk.height; step = 200000; chunk.width = 50000; chunk.height = 10000;
....

Chunk5: pSrc5 = pSrc4 + step*chunk.height; pDst5 = pDst4 + step*chunk.height; step = 200000; chunk.width = 50000; chunk.height = 10000;
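The chunk layout above can be sketched as a loop. This is only an illustration under stated assumptions: `chunk_fn`, `copy_chunk` and `process_in_chunks` are hypothetical names, and `copy_chunk` is a plain row copy standing in for whatever IPP function is actually called on each chunk. The limit is passed as a parameter (INT_MAX in real use) so the logic can be tried on a small buffer.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type standing in for an IPP image call. */
typedef void (*chunk_fn)(const unsigned char *src, unsigned char *dst,
                         int step, int width_bytes, int rows);

/* Stand-in operation: copies one chunk row by row. */
static void copy_chunk(const unsigned char *src, unsigned char *dst,
                       int step, int width_bytes, int rows)
{
    for (int r = 0; r < rows; ++r)
        memcpy(dst + (size_t)r * (size_t)step,
               src + (size_t)r * (size_t)step, (size_t)width_bytes);
}

/* Splits the image into chunks with step * rows <= max_chunk_bytes.
   Returns the number of chunks, or -1 if a single row does not fit. */
static int process_in_chunks(const unsigned char *src, unsigned char *dst,
                             int step, int width_bytes, int height,
                             int max_chunk_bytes, chunk_fn fn)
{
    assert(step > 0 && width_bytes > 0 && width_bytes <= step && height > 0);
    int max_rows = max_chunk_bytes / step;
    if (max_rows < 1)
        return -1;
    int chunks = 0;
    for (int y = 0; y < height; ) {
        int rows = (height - y < max_rows) ? height - y : max_rows;
        /* The chunk start offset may exceed INT_MAX, so use size_t here. */
        size_t off = (size_t)y * (size_t)step;
        fn(src + off, dst + off, step, width_bytes, rows);
        y += rows;
        ++chunks;
    }
    return chunks;
}
```

With step = 200000 and a limit of INT_MAX, at most 10737 rows fit into one chunk, so a 50000-row image again needs 5 chunks. Only the per-chunk start offset is computed in size_t; everything handed to the callback stays within int range.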

Before calling an IPP function, the user calculates the correct pointer for each image chunk, then calls the relevant IPP function for each chunk of data.

Thanks,
Chao

dirk · ‎08-31-2010

Hello Chao,

thanks for the answer, but frankly speaking, chunking the image buffer is not a solution. It is not even a workaround; it is a design decision that was necessary years ago, when no 64-bit operating systems were available. You could also take this further and make every line a chunk, providing an interface that takes an array of pointers to the start of each line. Then there would be no real size limitation. This would be an interesting approach too, since it would enable better deployment on 32-bit platforms, where memory is limited not only by the maximum amount but also by the largest available contiguous block.

The design decision of IPP is to require a contiguous block of memory, and not providing an interface for large images on 64-bit platforms is a real issue (and also a deviation from what you claim in the product announcement on your website).

As I wrote, there is a workaround for the allocation issue. But what about all the image processing functions? What about ippiMean, where the sum of all pixels is calculated? Will this overflow? Will these functions work correctly when given a larger buffer?

Another person on this forum pointed out a similar problem with the Fourier transform not being able to deal with large buffers. So please give an overview of what is and is not possible for each function, and also fix this issue in the next version. Having a size_t step variable would probably solve 90% of all issues.
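To make the overflow concern concrete, here is a sketch of a mean over 8-bit pixels that uses a 64-bit accumulator and size_t indexing. It is a plain reference loop with hypothetical names (`mean_u8`, `worst_case_sum_fits_u32`); how ippiMean accumulates internally is not stated anywhere in this thread, so nothing here describes its actual behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the sum of pixel_count 8-bit pixels, all at the maximum
   value 255, would still fit into a 32-bit unsigned accumulator. */
static int worst_case_sum_fits_u32(uint64_t pixel_count)
{
    return pixel_count <= UINT32_MAX / 255u;
}

/* Reference mean over a single-channel 8-bit image: 64-bit sum,
   size_t step and indices, so large buffers do not wrap. */
static double mean_u8(const unsigned char *base, size_t step,
                      size_t width, size_t height)
{
    assert(width > 0 && height > 0 && step >= width);
    uint64_t sum = 0;
    for (size_t y = 0; y < height; ++y) {
        const unsigned char *row = base + y * step;
        for (size_t x = 0; x < width; ++x)
            sum += row[x];
    }
    return (double)sum / ((double)width * (double)height);
}
```

A 32-bit sum is only safe in the worst case up to 16843009 pixels, which is well below the image sizes discussed here, so the accumulator width matters independently of the int step parameter.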

Dirk

Gennady_F_Intel · ‎08-31-2010

"I have checked version 5.3, 6.0 and 6.1 and all have the problem. I have not looked into IPP 7 so far"

Actually, we are working on this problem. It has not been solved in 7.0 yet, but we hope this functionality will be available in the next version.

--Gennady

dirk · ‎09-03-2012

Hello Gennady, is there any news on this front? The 7.1 beta program is currently running, but I can't see from the change list that the >2GB problem is addressed. Dirk