Solved: Maximum Size of Float Image in ippiCrossCorrValid

rsapaico · ‎04-03-2012

Hello,

I hope somebody can help me with the following.

I am using Intel IPP 6.1, and I am running on Windows 7 x64 (16GB of memory).

I am trying to use the "ippiCrossCorrValid_NormLevel_8u32f_C1R" function; however, it returns the following error: ippStsMemAllocErr (Not enough memory allocated for the operation).

I have searched in the forum; however I haven't been able to find anything with regard to this problem.

The relevant part of my code is something like (the pointers to the sample and template have been defined previously without any problem):

[cpp]
IppStatus ippsta;

IppiSize ippszSam = { 16443, 3284 };
int iStepSam = ippszSam.width;

IppiSize ippszTmpl = { 4899, 3280 };
int iStepTmpl = ippszTmpl.width;

int iXncc = ippszSam.width - ippszTmpl.width + 1;    // result width
int iYncc = ippszSam.height - ippszTmpl.height + 1;  // result height
int iStepncc = iXncc * sizeof(float);                // result step size
IppiSize ippszncc = { iXncc, iYncc };

std::vector<float> vfNCC( iXncc * iYncc );           // result container
float* pfNCC = &(vfNCC[0]);

// Calling the correlation routine
ippsta = ippiCrossCorrValid_NormLevel_8u32f_C1R(
    pucSam, iStepSam, ippszSam,
    pucTemplate, iStepTmpl, ippszTmpl,
    pfNCC, iStepncc );
[/cpp]

I have seen in the following page that the maximum allowed size for an image should be ~2^29:

http://software.intel.com/en-us/articles/mkl-ipp-choosing-an-fft/

However, in this case, even if the Sample image is converted to float, it will require less than 2^28 bytes (16443 x 3284 x 4 bytes).

Therefore, I don't have it clear why the IPP function is returning the "ippStsMemAllocErr" status. So, I would like to ask a couple of things:

1. For the above function, and the above image sizes, what is the number of bytes allocated in memory? Is my assumption correct?

2. Is the maximum allowed size actually ~2^29 bytes?

If so, why is the call to the IPP functions returning an error?

If not, then which is the maximum allowed size when dealing with images?

If there is any other information that I may have missed, please kindly let me know.

Thank you very much in advance for your kind cooperation.

Best regards,

Luis

igorastakhov · ‎04-17-2012

Hi ALL,

I see your problems. The main issue here is IPP API constraints. First of all, IPP uses its own malloc, which is just a wrapper over the runtime malloc, and the first constraint is that ippMalloc takes an int parameter for the allocation size, so you (and we internally) can't allocate more than 2 GB. Second, there are the ippiCrossCorr API constraints: there is a plan to remove all internal memory allocations in future IPP versions and to add a parameter, pBuffer, to every function that requires an additional memory buffer, so it will be the customer's responsibility to provide the required buffers according to ippiCrossCorrGetBufferSize() (also planned).
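
To make the first constraint concrete, here is a minimal sketch in plain C++ (no IPP calls). The helper names bytes32f and fitsInIntAlloc are hypothetical and only illustrate that an allocator taking an int size cannot express a single request above INT_MAX bytes, assuming the usual 32-bit int:

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Size in bytes of a width x height buffer of 32-bit floats, computed in
// 64 bits so that the product itself cannot overflow an int.
std::uint64_t bytes32f(int width, int height)
{
    assert(width > 0 && height > 0);
    return static_cast<std::uint64_t>(width) *
           static_cast<std::uint64_t>(height) * 4u;
}

// Hypothetical helper (not an IPP function): true if `bytes` can be passed
// to an allocator whose size parameter is an int.
bool fitsInIntAlloc(std::uint64_t bytes)
{
    return bytes <= static_cast<std::uint64_t>(INT_MAX);
}
```

For example, a single 32768x4096 32f buffer (512 MB) passes this check, while a 32768x32768 one (4 GB) does not.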

PS just for Sergey K: actually there is no full-size buffer for transposition. After all FFTs by rows are done, only 4 to 16 columns are transposed at once (depending on architecture and cache size), so the cyclic buffer for this purpose is rather small.

Regards,
Igor

SergeyKostrov · ‎04-03-2012

Please take a look at my comments.

Quoting rsapaico

...

I am using Intel IPP 6.1, and I am running on Windows 7 x64 (16GB of memory).

I am trying to use the "ippiCrossCorrValid_NormLevel_8u32f_C1R" function; however, it returns the following error: ippStsMemAllocErr (Not enough memory allocated for the operation).

[SergeyK] It is possibly related to small values assigned for Heap ( Reserve / Commit ) and Stack ( Reserve / Commit ). Try to check the Linker Settings.

I have searched in the forum; however I haven't been able to find anything with regard to this problem.

The relevant part of my code is something like (the pointers to the sample and template have been defined previously without any problem):

IppStatus ippsta;
IppiSize ippszSam = { 16443, 3284 };

[SergeyK] This is ~205MB and this is not a big size, actually.
int iStepSam = ippszSam.width;
IppiSize ippszTmpl = { 4899, 3280 };
int iStepTmpl = ippszTmpl.width;
int iXncc = ippszSam.width - ippszTmpl.width + 1; // result width
int iYncc = ippszSam.height - ippszTmpl.height + 1; // result height
int iStepncc = iXncc * sizeof(float); // result step size
IppiSize ippszncc = { iXncc, iYncc };
std::vector<float> vfNCC( iXncc * iYncc ); // result container

[SergeyK] I would recommend allocating memory for the array from the Heap instead; later it could be
copied to the STL 'vector' for processing.
float* pfNCC = &(vfNCC[0]);
// Calling the correlation routine
ippsta = ippiCrossCorrValid_NormLevel_8u32f_C1R( pucSam, iStepSam, ippszSam,
pucTemplate, iStepTmpl, ippszTmpl,
pfNCC, iStepncc );

[cpp]...[/cpp]

I have seen in the following page that the maximum allowed size for an image should be ~2^29:

[SergeyK] I did some testing some time ago on a 32-bit Windows platform and it is ~2^30, or 1.09GB,
and if your image is based on the '8u' type this is an image of size 34207x34207.
However, on a 64-bit Windows platform the numbers could be higher.

http://software.intel.com/en-us/articles/mkl-ipp-choosing-an-fft/

However, in this case, even if the Sample image is converted to float, it will require less than 2^28 bytes (16443 x 3284 x 4 bytes).

Therefore, I don't have it clear why the IPP function is returning the "ippStsMemAllocErr" status. So, I would like to ask a couple of things:

1. For the above function, and the above image sizes, what is the number of bytes allocated in memory? Is my assumption correct?

[SergeyK] It is not clear if 'ippiCrossCorrValid...' uses some additional memory inside.

2. Is the maximum allowed size actually ~2^29 bytes?

[SergeyK] As I mentioned already, it is ~2^30 for a 32-bit platform.

If so, why is the call to the IPP functions returning an error?

If not, then which is the maximum allowed size when dealing with images?

[SergeyK] In the case of a 32-bit Windows platform and the '8u' type:

34207x34207 if the memory is allocated with 'ippiMalloc...' IPP functions
34208x34208 if the memory is allocated with the 'malloc' CRT function

...

rsapaico · ‎04-03-2012

Dear Sergey,

Thank you very much for your prompt response.

Please let me continue with the discussion:

I am using Intel IPP 6.1, and I am running on Windows 7 x64 (16GB of memory).
I am trying to use the "ippiCrossCorrValid_NormLevel_8u32f_C1R" function; however, it returns the following error: ippStsMemAllocErr (Not enough memory allocated for the operation).

[SergeyK] It is possibly related to small values assigned for Heap ( Reserve / Commit ) and Stack ( Reserve / Commit ). Try to check the Linker Settings.

- The default heap reserve size is 1MB, and the default heap commit size is 4KB.

The same applies for the Stack's reserve and commit sizes.

- I tried increasing the reserve size little by little until I got to 1GB.

I also increased the commit size up to 1MB.

- However, the same "ippStsMemAllocErr" happens.

Shouldn't those sizes be enough?

Just in case it is relevant, I am using Visual Studio 2010.

2. Is the maximum allowed size actually ~2^29 bytes?

[SergeyK] As I mentioned already, it is ~2^30 for a 32-bit platform.

If so, why is the call to the IPP functions returning an error?
If not, then which is the maximum allowed size when dealing with images?

[SergeyK] In the case of a 32-bit Windows platform and the '8u' type:

34207x34207 if the memory is allocated with 'ippiMalloc...' IPP functions
34208x34208 if the memory is allocated with the 'malloc' CRT function

You are right. I tried "ippiMalloc" with the image sizes you mentioned (34207x34207), both for '8u' and '32f' types, and they were both successful.

It makes me wonder even more why the call to ippiCrossCorrValid would fail, given that my image is much smaller.

1. For the above function, and the above image sizes, what is the number of bytes allocated in memory? Is my assumption correct?

[SergeyK] It is not clear if 'ippiCrossCorrValid...' uses some additional memory inside.

Is there any way for me to know, before making the call to "ippiCrossCorrValid", whether there will be a memory allocation error?

In general, is there any way to know why this allocation error happens?

Thank you very much in advance for your help.

Best regards,

Luis

SergeyKostrov · ‎04-04-2012

Hi Luis,

Quoting rsapaico

...
- However, the same "ippStsMemAllocErr" happens.

Shouldn't those sizes be enough?

[SergeyK] It has to be enough. Please take a look at a Note below.

...

I tried "ippiMalloc" with the image sizes you mentioned (34207x34207), both for '8u' and '32f' types, and they were both successful.

[SergeyK] Thank you for confirming my results!

Is there any way for me to know, before making the call to "ippiCrossCorrValid", whether there will be a memory allocation error?

In general, is there any way to know why this allocation error happens?

[SergeyK] I'll try to spend some time today with your initial test case.

I have two more questions:

Are you making a call to 'ippiCrossCorrValid...' from a DLL or from an EXE?
Did you set new Heap ( Reserve / Commit ) and Stack ( Reserve / Commit ) values for a DLL or for an EXE?

Note:

Unfortunately, similar problems are happening almost every day with software developers using different
software products. Please take a look at two threads ( my comments are in Posts #2 ):

http://software.intel.com/en-us/forums/showthread.php?t=104191&o=a&s=lr
http://software.intel.com/en-us/forums/showthread.php?t=104137&o=a&s=lr

Also, for one of my software subsystems the following Heap ( Reserve / Commit ) and Stack ( Reserve / Commit )
values are used. They are defined in a cpp source file for the EXE module and they work:

[cpp]
#pragma message ( "*** New HEAP Commit:Reserve and STACK Commit:Reserve Values Defined ***" )

#ifdef _RTDEBUG
// Case 1
// #pragma comment( linker, "/HEAP:1069547520,1069547520" )
// #pragma comment( linker, "/STACK:4194304,4194304" )
// Case 2
// #pragma comment( linker, "/HEAP:134217728,134217728" )
// #pragma comment( linker, "/STACK:134217728,134217728" )
// Case 3
// #pragma comment( linker, "/HEAP:268435456,268435456" )
// #pragma comment( linker, "/STACK:268435456,268435456" )
// Case 4
#pragma comment( linker, "/HEAP:536870912,536870912" )
#pragma comment( linker, "/STACK:536870912,536870912" )
#endif

#ifdef _RTRELEASE
// Case 1
// #pragma comment( linker, "/HEAP:1069547520,1069547520" )
// #pragma comment( linker, "/STACK:4194304,4194304" )
// Case 2
// #pragma comment( linker, "/HEAP:134217728,134217728" )
// #pragma comment( linker, "/STACK:134217728,134217728" )
// Case 3
// #pragma comment( linker, "/HEAP:268435456,268435456" )
// #pragma comment( linker, "/STACK:268435456,268435456" )
// Case 4
#pragma comment( linker, "/HEAP:536870912,536870912" )
#pragma comment( linker, "/STACK:536870912,536870912" )
#endif
[/cpp]

rsapaico · ‎04-04-2012

Hello Sergey,

Thank you very much for your support on this matter.

Please let me reply to your questions:

Are you making a call to 'ippiCrossCorrValid...' from a DLL or from an EXE?

The call to "ippiCrossCorrValid" is made from a DLL.

Did you set new Heap ( Reserve / Commit ) and Stack ( Reserve / Commit ) values for a DLL or for an EXE?

I set the Heap and Stack values for the DLL.

Did I have to set them for the EXE that is using the DLL?

Also, what would be a good Heap(Reserve) size considering the image size I am working with (~205MB)?

Finally, if the Stack(Reserve) size is not enough, shouldn't the system allocate the memory using the Virtual Memory? This is just something I had on my mind...

As for your other Notes, I will take a look at them. Thanks.

Thanks a lot for your help, I hope I can have it working soon.

Best regards,

Luis

SergeyKostrov · ‎04-05-2012

Hi Luis,

Quoting rsapaico

...

Did I have to set them for the EXE that is using the DLL?

[SergeyK] Yes, I would try it. Some time ago I had some issues and setting these values for a DLL
didn't help.

Also, what would be a good Heap(Reserve) size considering the image size I am working with (~205MB)?

[SergeyK] What about 512MB?

Finally, if the Stack(Reserve) size is not enough, shouldn't the system allocate the memory using the Virtual Memory?

[SergeyK] If an application allocates a maximum allowed amount then a new request for more Stack
memory will fail. That is why some users see an error message like 'Stack Overflow'

Please review the part of your code where you allocate memory for the destination buffer 'pDst'.

// Purpose: ippiCrossCorr_Norm() function allows you to compute the
// cross-correlation of an image and a template (another image).
// The cross-correlation values are image similarity measures: the
// higher cross-correlation at a particular pixel, the more
// similarity between the template and the image in the neighborhood
// of the pixel. If IppiSize's of image and template are Wa * Ha and
// Wb * Hb correspondingly, then the IppiSize of the resulting
// matrice with normalized cross-correlation coefficients will be
//
// a) in case of 'Full' suffix:
// ( Wa + Wb - 1 )*( Ha + Hb - 1 ).
// b) in case of 'Same' suffix:
// ( Wa )*( Ha ).
// c) in case of 'Valid' suffix:
// ( Wa - Wb + 1 )*( Ha - Hb + 1 ).
...
// suffix 'R' (ROI) means only scanline alingment (srcStep), in
// 'Same' and 'Full' cases no any requirements for data outstand
// the ROI - it's assumes that template and src are zerro padded.
//
...
// Arguments:
// pSrc - pointer to the source image ROI;
// srcStep - step in bytes through the source image buffer;
// srcRoiSize- size of the source ROI in pixels;
// pTpl - pointer to the template ( feature ) image ROI;
// tplStep - step in bytes through the template image buffer;
// tplRoiSize- size of the template ROI in pixels;
// pDst - pointer to the destination buffer;
// dstStep - step in bytes through the destination image buffer;
...
// Return:
// ippStsNoErr - Ok
// ippStsNullPtrErr - at least one of the pointers to pSrc, pDst or pTpl is NULL;
// ippStsSizeErr - at least one of the sizes of srcRoiSize or tplRoiSize
// is less or equal zero, or at least one of the sizes
// of srcRoiSize is smaller then the corresponding size
// of the tplRoiSize;
// ippStsStepErr - at least one of the srcStep, tplStep or dstStep is less or equal zero;
//
// ippStsMemAllocErr - an error occur during allocation memory for internal buffers.
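
As a quick check of the 'Valid' formula quoted above, here is a small sketch. Size2D is a hypothetical stand-in for IppiSize so that the snippet compiles without the IPP headers; the helper names are likewise assumptions for illustration only:

```cpp
#include <cassert>

// Hypothetical stand-in for IppiSize (width, height in pixels).
struct Size2D { int width; int height; };

// Result size for the 'Valid' variant: (Wa - Wb + 1) x (Ha - Hb + 1).
Size2D validResultSize(Size2D src, Size2D tpl)
{
    assert(tpl.width > 0 && tpl.height > 0);
    assert(src.width >= tpl.width && src.height >= tpl.height);
    return Size2D{ src.width - tpl.width + 1, src.height - tpl.height + 1 };
}

// Minimum dstStep in bytes for a tightly packed 32f destination row.
int validDstStepBytes(Size2D src, Size2D tpl)
{
    return validResultSize(src, tpl).width * static_cast<int>(sizeof(float));
}
```

For the sizes in this thread (source 16443x3284, template 4899x3280) this gives a destination of 11545x5 floats, so the destination buffer itself is tiny compared with the source image.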

I can't confirm it practically but it looks like your problem is related to a wrong size of some internal buffer(s).

Best regards,
Sergey

rsapaico · ‎04-06-2012

Hi Sergey,

Thanks a lot for your response.

I can't confirm it practically but it looks like your problem is related to a wrong size of some internal buffer(s).

Unfortunately, my problem is not related to that. I have checked several times the allocation of those pointers.

Also, if I use a smaller image, the "ippiCrossCorr_Norm" works without any problem, by using exactly the same code (I am reading the image from a BMP file).

The problem, I think, is related to what the "ippiCrossCorr_Norm" does inside once it is called.

It seems to me that while I can allocate 2^30 bytes, the working image cannot be this size, because it appears that the "ippiCrossCorr_Norm" is allocating memory internally for the processing.

Unfortunately, the only ones who can respond are the developers of the library.

I kindly invite you to try (whenever you have time) to use the "ippiCrossCorr_Norm" for an image size similar to the one of my example, to confirm that the function will not work.

I am still wondering what the maximum size for an image is...and I think that the only ones who can tell me are IPP developers. What do you think?

Best regards,

Luis Sapaico

SergeyKostrov · ‎04-06-2012

Hi Luis,

Quoting rsapaico

...The problem, I think, is related to what the "ippiCrossCorr_Norm" does inside once it is called.

It seems to me that while I can allocate 2^30 bytes, the working image cannot be this size, because it appears that the "ippiCrossCorr_Norm" is allocating memory internally for the processing.

[SergeyK] You could easily verify it under a Debugger and monitor 'PF Usage' in the Task Manager
on the 'Performance' property page.

Unfortunately, the only ones who can respond are the developers of the library.

I kindly invite you to try (whenever you have time) to use the "ippiCrossCorr_Norm" for an image size similar to the one of my example, to confirm that the function will not work.

I am still wondering what the maximum size for an image is...and I think that the only ones who can tell me are IPP developers. What do you think?

[SergeyK] I hope that IPP developers will respond.

Best regards,

Luis Sapaico

Best regards,
Sergey

SergeyKostrov · ‎04-07-2012

I confirm that there are memory-related problems with many 'ippiCrossCorr_...' functions on 32-bit Windows platforms and
I'd like to provide some technical details.

Here is a summary, because lots of questions were asked by LuisS and there are some responses of mine:

[LuisS] ...However, in this case, even if the Sample image is converted to float, it will require less than 2^28 bytes (16443 x 3284 x 4 bytes)...

Significantly less than 2^30 bytes.

[LuisS] ..Therefore, I don't have it clear why the IPP function is returning the "ippStsMemAllocErr" status...

It is because the 'ippiCrossCorr_...' functions are allocating a significant amount of memory internally.
On 32-bit Windows platforms ( without AWE ) an application cannot allocate more than 2GB of memory.

[LuisS] ...1. For the above function, and the above image sizes, what is the number of bytes allocated in memory? Is my assumption correct?..

Could Intel Software Engineers answer the question?

[LuisS] ...2. Is the maximum allowed size actually ~2^29 bytes?..

It depends on an IPP function and it is already confirmed by two Software Developers that this is about 2^30 bytes for a 32-bit Windows platform.

[LuisS] ...If so, why is the call to the IPP functions returning an error?..

Because in your case the function tries to allocate lots of memory and it exceeds the 2GB limit for a 32-bit platform.

[LuisS] ...If not, then which is the maximum allowed size when dealing with images?..

It depends on the platform, that is, 32-bit or 64-bit. I wonder if Intel Software Engineers could follow up?

[SergeyK] ...It is not clear if 'ippiCrossCorrValid...' uses some additional memory inside...

Confirmed after a series of tests and it uses a lot.

[SergeyK] ...It is possibly related to small values assigned for Heap ( Reserve / Commit ) and Stack ( Reserve / Commit )...

No. I don't confirm this. Please do your own tests with a modified Test-Case in the next Post.

[LuisS] ...Is there any way for me to know, before making the call to "ippiCrossCorrValid", whether there will be a memory allocation error?..

I think you could make a series of tests and as a result you will have some numbers.

[LuisS] ...In general, is it any way to know why this allocation error happens?..

Already answered.

[LuisS] ...Unfortunately, my problem is not related to that. I have checked several times the allocation of those pointers...
[LuisS] ...Also, if I use a smaller image, the "ippiCrossCorr_Norm" works without any problem...
[LuisS] ...The problem, I think, is related to what the "ippiCrossCorr_Norm" does inside once it is called...

All three statements are confirmed after a series of tests.

[LuisS] ...I kindly invite you to try (whenever you have time) to use the "ippiCrossCorr_Norm" for
an image size similar to the one of my example, to confirm that the function will not work...

Please take a look at a modified Test-Case in the next Post.

SergeyKostrov · ‎04-07-2012

A modified Test-Case:

[cpp]
// Case 1
// #pragma comment( linker, "/HEAP:1069547520,1069547520" )
// #pragma comment( linker, "/STACK:4194304,4194304" )
// Case 2
// #pragma comment( linker, "/HEAP:134217728,134217728" )
// #pragma comment( linker, "/STACK:134217728,134217728" )
// Case 3
// #pragma comment( linker, "/HEAP:268435456,268435456" )
// #pragma comment( linker, "/STACK:268435456,268435456" )
// Case 4
// #pragma comment( linker, "/HEAP:536870912,536870912" )
// #pragma comment( linker, "/STACK:536870912,536870912" )

// Version 3
{
    // Downsample Factor
    #define DS_FACTOR 1
    // #define DS_FACTOR 2
    // #define DS_FACTOR 4
    // #define DS_FACTOR 8
    // #define DS_FACTOR 16
    // #define DS_FACTOR 32
    // #define DS_FACTOR 64

    // Test 1 - It works when the DS_FACTOR is greater than 1
    //          It doesn't work when the DS_FACTOR is equal to 1
    #define IMAGE_W ( RTint )( 16433 / DS_FACTOR )
    #define IMAGE_H ( RTint )( 3284 / DS_FACTOR )
    #define TEMPL_W ( RTint )( 4899 / DS_FACTOR )
    #define TEMPL_H ( RTint )( 3280 / DS_FACTOR )

    // Test 2 - It works
    // #define IMAGE_W ( RTint )( 16384 / DS_FACTOR )
    // #define IMAGE_H ( RTint )( 4096 / DS_FACTOR )
    // #define TEMPL_W ( RTint )( 8192 / DS_FACTOR )
    // #define TEMPL_H ( RTint )( 2048 / DS_FACTOR )

    static Ipp8u pucSam[ IMAGE_W * IMAGE_H ] = { 0x0 };
    static Ipp8u pucTemplate[ TEMPL_W * TEMPL_H ] = { 0x0 };

    RTint iStepSam = IMAGE_W * sizeof( Ipp8u );
    RTint iStepTmpl = TEMPL_W * sizeof( Ipp8u );

    IppiSize ippRoiSam = { IMAGE_W, IMAGE_H };
    IppiSize ippRoiTmpl = { TEMPL_W, TEMPL_H };

    RTint iXncc = ippRoiSam.width - ippRoiTmpl.width + 1;    // Result width
    RTint iYncc = ippRoiSam.height - ippRoiTmpl.height + 1;  // Result height
    RTint iStepncc = iXncc * sizeof( RTfloat );              // Result step size

    // It works
    // std::vector< RTfloat > vfNCC( iXncc * iYncc );        // Result container
    // RTfloat *pfNCC = &( vfNCC[0] );

    // It works
    static Ipp32f pfNCC[ ( IMAGE_W - TEMPL_W + 1 ) * ( IMAGE_H - TEMPL_H + 1 ) ] = { 0.0f };

    // It doesn't work ( Win32 exception 0xC00000FD - Stack Overflow )
    // Ipp32f pfNCC[ ( IMAGE_W - TEMPL_W + 1 ) * ( IMAGE_H - TEMPL_H + 1 ) ] = { 0.0f };

    CrtPrintf( RTU("Image Size: %5ld x %5ld\nTemplate Size: %5ld x %5ld\nDownsample Factor: %ld\n"),
               ( RTint )IMAGE_W, ( RTint )IMAGE_H,
               ( RTint )TEMPL_W, ( RTint )TEMPL_H,
               ( RTint )DS_FACTOR );

    st = ::ippiCrossCorrValid_NormLevel_8u32f_C1R( &pucSam[0], iStepSam, ippRoiSam,
                                                   &pucTemplate[0], iStepTmpl, ippRoiTmpl,
                                                   &pfNCC[0], iStepncc );

    CrtPrintfA( "%s\n", ::ippGetStatusString( st ) );
}
//*/
}
[/cpp]

Outputs for different Downsample Factors are as follows:

...
Image Size : 16433 x 3284
Template Size : 4899 x 3280
Downsample Factor: 1
Not enough memory allocated for the operation
...

...
Image Size : 8216 x 1642
Template Size : 2449 x 1640
Downsample Factor: 2
No error, it's OK
...

...
Image Size : 4108 x 821
Template Size : 1224 x 820
Downsample Factor: 4
No error, it's OK
...

SergeyKostrov · ‎04-07-2012

A screenshot demonstrates the total amount of memory allocated for a Test-Case with
a source image 16384x4096 and a template image 8192x2048:

SergeyKostrov · ‎04-07-2012

A screenshot demonstrates total amounts of memory allocated for Test-Cases with ( from left to right ):

source image 4096x1024 and a template image 2048x512
source image 8192x2048 and a template image 4096x1024
source image 16384x4096 and a template image 8192x2048

SergeyKostrov · ‎04-07-2012

Hi Luis,

I wonder if a downsampling workaround could work for you? That is, downsample your big source image, or both images ( source and template ),
by some factor.
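
A minimal sketch of such a downsampling step, in plain C++ without IPP. The function name downsample8u and the simple block-averaging scheme are assumptions for illustration (IPP has its own resize functions, which are not shown here); the image is assumed to be 8-bit, single-channel and tightly packed:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Downsample a tightly packed 8-bit, single-channel image by an integer
// factor, averaging each factor x factor block. Trailing rows/columns that
// do not fill a complete block are dropped.
std::vector<std::uint8_t> downsample8u(const std::vector<std::uint8_t>& src,
                                       int width, int height, int factor)
{
    assert(factor >= 1 && width >= factor && height >= factor);
    assert(src.size() == static_cast<std::size_t>(width) * height);

    const int outW = width / factor;
    const int outH = height / factor;
    std::vector<std::uint8_t> dst(static_cast<std::size_t>(outW) * outH);

    for (int y = 0; y < outH; ++y) {
        for (int x = 0; x < outW; ++x) {
            unsigned sum = 0;
            for (int dy = 0; dy < factor; ++dy) {
                for (int dx = 0; dx < factor; ++dx) {
                    const std::size_t row = static_cast<std::size_t>(y * factor + dy);
                    sum += src[row * width + (x * factor + dx)];
                }
            }
            dst[static_cast<std::size_t>(y) * outW + x] =
                static_cast<std::uint8_t>(sum / static_cast<unsigned>(factor * factor));
        }
    }
    return dst;
}
```

With a factor of 2 the 16433x3284 test image becomes 8216x1642, which matches the sizes printed in the test output above.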

Best regards,
Sergey

igorastakhov · ‎04-09-2012

Hi All,

ippiCrossCorr for such huge images/templates is based on a 2D 32f FFT and uses several internal buffers proportional to the template size (extended to the nearest power of two). If you use the threaded version of the IPP library, this number is multiplied by the number of available cores. The best approach is the one provided above: perform downsampling, or use pyramids, and run CrossCorr on the reduced image/template. With the approximate coordinates of the max coefficient you'll be able to reduce the search area for the full-resolution image (CrossCorr is very expensive, up to ~300-600 CPU clocks/pixel, while downsampling is ~100x faster).
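
The coarse-to-fine idea can be sketched as follows. This is only an illustration in plain C++: SearchWindow, refineWindow and the margin parameter are hypothetical names, and the sketch covers only the step of mapping a peak found in the downsampled result back to a small range of full-resolution offsets:

```cpp
#include <algorithm>
#include <cassert>

// Inclusive range of candidate template offsets in the full-resolution result.
struct SearchWindow { int x0, y0, x1, y1; };

// coarseX/coarseY: peak position in the downsampled 'Valid' result.
// factor: downsample factor; fullW/fullH: size of the full-resolution
// 'Valid' result; margin: extra offsets kept on each side.
SearchWindow refineWindow(int coarseX, int coarseY, int factor,
                          int fullW, int fullH, int margin)
{
    assert(factor >= 1 && margin >= 0 && fullW > 0 && fullH > 0);
    assert(coarseX >= 0 && coarseY >= 0);

    // One coarse pixel covers `factor` full-resolution pixels per axis.
    const int cx = coarseX * factor;
    const int cy = coarseY * factor;

    SearchWindow w;
    w.x0 = std::min(fullW - 1, std::max(0, cx - margin));
    w.y0 = std::min(fullH - 1, std::max(0, cy - margin));
    w.x1 = std::min(fullW - 1, cx + factor - 1 + margin);
    w.y1 = std::min(fullH - 1, cy + factor - 1 + margin);
    return w;
}
```

The full-resolution correlation then only needs a source ROI of (x1 - x0 + 1) + Wb - 1 by (y1 - y0 + 1) + Hb - 1 pixels starting at (x0, y0), which follows from the 'Valid' size formula.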

Regards,
Igor

SergeyKostrov · ‎04-10-2012

Quoting igorastakhov

...ippiCrossCorr for such huge images/templates is based on 2D 32f FFT and uses several internal buffers
proportional to template size (extended to the nearest power of two)...

The size of the source image that Luis uses cannot be considered huge.

Regarding the 2D FFT: that is very interesting technical information, but I really don't understand why almost
1.5GB of memory is used to calculate it.

Best regards,
Sergey

igorastakhov · ‎04-11-2012

Sergey,

take a look at the formula and try to analyse how many buffers you need for separate calculation of the numerator (via FFT) and the denominator. Take into account that the FFT itself requires internal twiddle tables and a buffer for the FFT itself, plus a buffer for the optimal transpose of columns to rows for the 2D operation. For template sizes that are small in comparison with the image size (~8x less), the so-called "frame" algorithm is used and the FFT size is proportional to the template size; for your case the FFT size is proportional to the src size:

16443*3284 -> 32768*4096*4 = ~500 MB for 1 buffer

The algorithm requires 3 buffers, so 1.5 GB
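
The arithmetic above can be reproduced with a few lines of C++. This is only a rough estimate under the assumptions stated in this post (each dimension padded to the next power of two, 4 bytes per 32f element, 3 buffers); nextPow2 and estimateFftBytes are hypothetical helper names, not IPP functions, and the real internal allocation may differ:

```cpp
#include <cassert>
#include <cstdint>

// Smallest power of two that is >= n (n >= 1).
std::uint64_t nextPow2(std::uint64_t n)
{
    assert(n >= 1);
    std::uint64_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Rough estimate: both dimensions padded to a power of two, 4 bytes per
// 32f element, multiplied by the assumed number of buffers.
std::uint64_t estimateFftBytes(int width, int height, int buffers)
{
    assert(width > 0 && height > 0 && buffers > 0);
    return nextPow2(static_cast<std::uint64_t>(width)) *
           nextPow2(static_cast<std::uint64_t>(height)) *
           4u * static_cast<std::uint64_t>(buffers);
}
```

For 16443x3284 this gives 536870912 bytes per buffer and 1610612736 bytes for three. Under the same assumptions a 16384x4096 source needs only half of that, because 16384 is already a power of two.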

Regards,
Igor

SergeyKostrov · ‎04-11-2012

I'd like to make a note regarding a matrix transpose operation:

Why is a diagonal-based ( in-place ) algorithm not used for the matrix transpose?

Quoting igorastakhov

Sergey,

take a look at the formula and try to analyse how many buffers you need for separate calculation of the numerator (via FFT) and the denominator. Take into account that the FFT itself requires internal twiddle tables and a buffer for the FFT itself, plus a buffer for the optimal transpose of columns to rows for the 2D operation. For template sizes that are small in comparison with the image size (~8x less), the so-called "frame" algorithm is used and the FFT size is proportional to the template size; for your case the FFT size is proportional to the src size:

16443*3284 -> 32768*4096*4 = ~500 MB for 1 buffer

The algorithm requires 3 buffers, so 1.5 GB

Regards,
Igor

The memory problem with 'ippiCrossCorr...' IPP functions is almost 10 years old: it is easily
reproducible with IPP v3.x.

Best regards,
Sergey

rsapaico · ‎04-17-2012

Hello Sergey, Igor,

Thank you very much for your feedback.

16443*3284 -> 32768*4096*4 = ~500 M for 1 buffer

algorithm requires 3 buffers - so 1.5 Gb

Just to add, I am using 64-bit IPP Libraries, and I also have 16GB of memory.

I also have 8 CPUs.

There is one thing I didn't quite understand:

Is 1.5GB the total memory allocated, regardless of the number of threads that I am using?

Or is 1.5GB allocated per thread?

On the other hand, the image size (16443x3284) is already a subsampled version of the original image, and at present I cannot afford to further subsample it because I will lose some details.

Thank you very much for your feedback, it is very helpful.

I hope we can manage to find a "maximum size" for an image, so that I can know beforehand whether "ippiCrossCorrValid" will return the allocation error.

Best regards,

Luis

rsapaico · ‎04-17-2012

Hello again,

I just did some experiments regarding the number of threads.

I set the number of threads using "ippSetNumThreads".

ippSetNumThreads = 1

ippSetNumThreads = 2

"ippiCrossCorrValid_NormLevel_8u32f_C1R" works (returns "ippStsNoErr").

ippSetNumThreads > 2

"ippiCrossCorrValid_NormLevel_8u32f_C1R" returns "ippStsMemAllocErr".

So it seems that the number of threads is the deciding factor, as Igor pointed out before.
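
The experiment above could be automated roughly as follows. This is a sketch only: largestWorkingThreadCount is a hypothetical helper, and the attempt callback stands for whatever code sets the thread count (for example via ippSetNumThreads), runs the correlation, and reports whether the status was ippStsNoErr. No IPP call is made in the snippet itself:

```cpp
#include <cassert>
#include <functional>

// Try thread counts from maxThreads down to 1 and return the first count
// for which `attempt` reports success, or 0 if none succeeds.
int largestWorkingThreadCount(int maxThreads,
                              const std::function<bool(int)>& attempt)
{
    assert(maxThreads >= 1);
    for (int n = maxThreads; n >= 1; --n) {
        if (attempt(n)) {
            return n;
        }
    }
    return 0;
}
```

With the behaviour I observed (success only for 1 or 2 threads), such a probe would return 2 on my 8-CPU machine.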

Just for the record, if I am using 4 threads, does it mean that I need 4*1.5GB of memory, for the image size in discussion?

Best regards,

Luis Sapaico

SergeyKostrov · ‎04-17-2012

Hi everybody,

Quoting rsapaico

...

Just to add, I am using 64-bit IPP Libraries, and I also have 16GB of memory.

I also have 8 CPUs.

[SergeyK] In that case the situation gets even more interesting because there is lots of memory.
What is the size of the Virtual Memory file?

There is one thing I didn't quite understand:

Is 1.5GB the total memory allocated, regardless of the number of threads that I am using?

Or is 1.5GB allocated per thread?

On the other hand, the image size (16443x3284) is already a subsampled version of the original image, and at present
I cannot afford to further subsample it because I will lose some details.

[SergeyK] In that case a solution based on Partitioning of the source image ( suggested by Igor ) looks good.

Have you considered other solutions, such as SAD-based or Neural Network-based ones?

SAD - Sum of Absolute Differences
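
For reference, a minimal sketch of SAD at a single template offset, in plain C++ without IPP. The function name sadAt is an assumption for illustration; both images are assumed to be 8-bit, single-channel and tightly packed:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Sum of Absolute Differences between the template and the image window
// whose top-left corner is at offset (ox, oy).
std::uint64_t sadAt(const std::uint8_t* img, int imgW, int imgH,
                    const std::uint8_t* tpl, int tplW, int tplH,
                    int ox, int oy)
{
    assert(img != nullptr && tpl != nullptr);
    assert(ox >= 0 && oy >= 0 && ox + tplW <= imgW && oy + tplH <= imgH);

    std::uint64_t sum = 0;
    for (int y = 0; y < tplH; ++y) {
        const std::uint8_t* irow = img + static_cast<std::size_t>(oy + y) * imgW + ox;
        const std::uint8_t* trow = tpl + static_cast<std::size_t>(y) * tplW;
        for (int x = 0; x < tplW; ++x) {
            sum += static_cast<std::uint64_t>(
                std::abs(static_cast<int>(irow[x]) - static_cast<int>(trow[x])));
        }
    }
    return sum;
}
```

Unlike the correlation coefficient, a lower SAD means a better match. This direct form needs no large work buffers, but it costs one pass over the whole template per tested offset.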

Best regards,
Sergey