ippMalloc delay in memory availability for large amounts of data

Ockham_s_Razor · ‎11-12-2020

There appears to be a major difference between ippMalloc and malloc under Windows in when pages are zeroed for large amounts of data.

In particular, consider a "4k RGB" image in Ipp32f. (The 4k refers to 4k pixels per scan line.) This has size

n = 4096*3112*3*sizeof(Ipp32f), or about 153 MB
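As a quick check of that figure, here is a small sketch. It assumes Ipp32f is a 4-byte float (float stands in for it so that no IPP header is needed), and the constant names are only illustrative:

```cpp
#include <cstddef>

// Image geometry from the post: 4096 pixels per scan line, 3112 lines, 3 channels.
constexpr std::size_t kWidth = 4096;
constexpr std::size_t kHeight = 3112;
constexpr std::size_t kChannels = 3;

// Assumption: Ipp32f is a 32-bit float, so float is used in its place here.
constexpr std::size_t kImageBytes = kWidth * kHeight * kChannels * sizeof(float);

static_assert(sizeof(float) == 4, "this sketch expects a 4-byte float");
// 152,961,024 bytes: about 153 MB in decimal units, or roughly 146 MiB.
static_assert(kImageBytes == 152961024, "unexpected image size");
```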

Allocate it using auto *data = ippMalloc(n);

Now time the loop

for (auto i = 0; i < 5; i++)
{
memset(data, 0, n);
}
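For anyone who wants to reproduce this, here is a minimal sketch of how each pass of that loop could be timed. The helper time_memset_passes is hypothetical (it is not part of IPP), and it takes the allocator and deallocator as parameters so the same code can be run with malloc/free or with ippMalloc/ippFree. It is not necessarily the exact harness behind the numbers in this post.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical helper: allocates n bytes with alloc_fn, times `passes`
// consecutive memset calls over the whole buffer, frees it with free_fn,
// and returns one wall-clock duration in milliseconds per pass.
template <typename AllocFn, typename FreeFn>
std::vector<double> time_memset_passes(std::size_t n, int passes,
                                       AllocFn alloc_fn, FreeFn free_fn)
{
    std::vector<double> ms;
    void *data = alloc_fn(n);
    if (data == nullptr)
        return ms;  // allocation failed: no timings to report

    // Calling memset through a volatile function pointer keeps the compiler
    // from merging or discarding the repeated stores to the buffer.
    void *(*volatile memset_fn)(void *, int, std::size_t) = std::memset;

    for (int i = 0; i < passes; i++)
    {
        const auto start = std::chrono::steady_clock::now();
        memset_fn(data, 0, n);
        const auto stop = std::chrono::steady_clock::now();
        ms.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }

    free_fn(data);
    return ms;
}
```

With n as above, time_memset_passes(n, 5, std::malloc, std::free) gives the malloc timings. Including ipp.h and passing ippMalloc and ippFree instead gives the ippMalloc timings. Note that ippMalloc takes an int length, so n has to fit in an int, which it does here.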

The timing results for each pass of the loop, in milliseconds, are

[0]: 18.111
[1]: 4.8840
[2]: 5.7340
[3]: 5.3860
[4]: 5.4100

Change the allocation to

auto *data = malloc(n);

The timing becomes

[0]: 5.1219
[1]: 5.4199
[2]: 5.0430
[3]: 5.6440
[4]: 5.5810

Notice that with ippMalloc the first pass takes about 12 milliseconds longer, which on my system is the time to zero this amount of data.

This delay is present in any call that needs to access all of the data.

Windows malloc works by having a background thread zero the memory handed out by VirtualAlloc, so the program usually doesn't wait for the zeroing of memory.

I'm more pointing this out than asking a question as it took many hours to track down this delay.

If this was not an intentional design choice, ippMalloc may be able to use the Windows background zeroing of pages to improve performance when large amounts of memory are allocated.

PS: Although 12 ms may seem small, when processing 150,000 4k images it adds up (about 30 minutes in total).

Abhinav_S_Intel · ‎11-18-2020

To investigate this issue further, could you please provide us with a reproducer?