There appears to be a major difference in when pages are zeroed by ippMalloc and by malloc under Windows for large allocations.
In particular, consider a "4k RGB" image stored as Ipp32f. (The 4k refers to 4k pixels per scan line.) This has size
n = 4096*3112*3*sizeof(Ipp32f), or about 153 MB
Allocate it using auto *data = ippMalloc(n);
Now time the loop
for (auto i = 0; i < 5; i++)
memset(data, 0, n);
the timing results for each loop, in milliseconds, are
change the allocation to
auto *data = malloc(n);
The timing becomes
Notice that the first access is about 12 milliseconds longer, which on my system is the time needed to zero this amount of data.
This delay is present in any call that needs to access all of the data.
Windows malloc works by having a background thread zero the memory handed out by VirtualAlloc, so the program usually doesn't wait for the memory to be zeroed.
I'm more pointing this out than asking a question as it took many hours to track down this delay.
If this was not an intentional design, it may be that ippMalloc can use the Windows background zeroing of pages to improve performance when large amounts of memory are allocated.
PS: although 12 ms may seem small, when processing 150,000 4k images it adds up (150,000 x 12 ms is about 30 minutes if every image pays the delay).