Intel® Integrated Performance Primitives

ippMalloc delay in memory availability for large amounts of data

Ockham_s_Razor
Beginner

There appears to be a major difference in when pages are zeroed between ippMalloc and malloc under Windows for large amounts of data.

In particular, consider a "4k RGB" image in Ipp32f.  (The 4k refers to 4k pixels per scan line.)  This has size

 n = 4096*3112*3*sizeof(Ipp32f) = 152,961,024 bytes, or about 153 MB

Allocate it using  auto *data = ippMalloc(n);
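For anyone checking the size, here is a small sketch of the arithmetic. It assumes Ipp32f is a 4-byte float (plain float stands in for it so the snippet builds without IPP), and the names kWidth, kHeight, kChannels and image_bytes are made up for illustration. As far as I know ippMalloc takes an int length, so the second check confirms the size fits.

```cpp
#include <climits>
#include <cstddef>

// Dimensions from the post: 4096 x 3112 pixels, 3 channels, 32-bit float samples.
constexpr std::size_t kWidth = 4096;
constexpr std::size_t kHeight = 3112;
constexpr std::size_t kChannels = 3;

// float stands in for Ipp32f (assumption: Ipp32f is a 4-byte float).
constexpr std::size_t image_bytes() {
    return kWidth * kHeight * kChannels * sizeof(float);
}

// 4096 * 3112 * 3 * 4 = 152,961,024 bytes, roughly 153 MB (about 146 MiB).
static_assert(image_bytes() == 152961024u, "unexpected image size");

// The size must fit in an int if the allocator takes an int length.
static_assert(image_bytes() <= static_cast<std::size_t>(INT_MAX),
              "size does not fit in an int length argument");
```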

Now time the loop

for (auto i = 0; i < 5; i++)
{
    memset(data, 0, n);
}

The timing results for each pass of the loop, in milliseconds, are

[0]: 18.111
[1]: 4.8840
[2]: 5.7340
[3]: 5.3860
[4]: 5.4100

Change the allocation to

auto *data = malloc(n);

The timing becomes

[0]: 5.1219
[1]: 5.4199
[2]: 5.0430
[3]: 5.6440
[4]: 5.5810

Notice that with ippMalloc the first access takes about 12 milliseconds longer than the later ones, which on my system is the time to zero this amount of data.
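A minimal sketch of the kind of timing harness behind numbers like these, in case it helps as a starting point for a reproducer. This is not the exact code I ran: the function names time_memset_passes and time_malloc_buffer are made up for illustration, and the version below uses malloc/free so it builds without IPP. To test the IPP path, swap in ippMalloc/ippFree.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

// Time `passes` consecutive memset calls over the same buffer and
// return the duration of each pass in milliseconds. Requires n > 0.
std::vector<double> time_memset_passes(void* data, std::size_t n, int passes) {
    std::vector<double> ms;
    ms.reserve(static_cast<std::size_t>(passes));
    for (int i = 0; i < passes; ++i) {
        auto start = std::chrono::steady_clock::now();
        std::memset(data, 0, n);
        // Read one byte back through a volatile pointer to discourage the
        // optimizer from discarding the stores as dead.
        volatile unsigned char sink =
            static_cast<volatile unsigned char*>(data)[n - 1];
        (void)sink;
        auto stop = std::chrono::steady_clock::now();
        ms.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return ms;
}

// Allocate n bytes with malloc, time the passes, then free the buffer.
// Returns an empty vector if the allocation fails.
// For the IPP case, replace malloc/free with ippMalloc/ippFree.
std::vector<double> time_malloc_buffer(std::size_t n, int passes) {
    void* data = std::malloc(n);
    if (data == nullptr) {
        return {};
    }
    std::vector<double> ms = time_memset_passes(data, n, passes);
    std::free(data);
    return ms;
}
```

Calling time_malloc_buffer(n, 5) with the n above and printing the five values gives one column of timings; the interesting comparison is element [0] against the rest for each allocator.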

This delay is present in any call that needs to access all of the data.

Windows malloc works by having a background thread zero the memory handed out by VirtualAlloc, so the program usually doesn't wait for the zeroing of memory.

I'm pointing this out more than asking a question, as it took many hours to track down this delay.

If this was not an intentional design, it may be that ippMalloc could use the Windows background zeroing of pages to improve performance when large amounts of memory are allocated.

PS: although 12 ms may seem small, when processing 150,000 4k images it adds up to about 30 minutes.

1 Reply
Abhinav_S_Intel
Moderator

To investigate this issue further, can you please provide us with a reproducer?
