There appears to be a major difference in when pages are zeroed by ippMalloc versus malloc under Windows when allocating large amounts of data.
In particular, consider a "4k RGB" image in Ipp32f. (The 4k refers to 4k pixels per scan line.) This has size
n = 4096*3112*3*sizeof(Ipp32f), or about 153 MB.
Allocate it using auto *data = ippMalloc(n);
Now time the loop:
for (auto i = 0; i < 5; i++)
{
memset(data, 0, n);
}
The timing results for each pass, in milliseconds, are:
[0]: 18.111
[1]: 4.8840
[2]: 5.7340
[3]: 5.3860
[4]: 5.4100
Change the allocation to
auto *data = malloc(n);
and the timings become:
[0]: 5.1219
[1]: 5.4199
[2]: 5.0430
[3]: 5.6440
[4]: 5.5810
Notice that the first pass is about 12 ms slower (18.1 ms versus roughly 5.4 ms), which on my system is the time it takes to zero this amount of data.
The delay is not specific to memset: any call that needs to access all of the data on first use incurs it.
Windows malloc relies on a background thread that zeroes the pages backing VirtualAlloc allocations, so the program usually doesn't have to wait for memory to be zeroed.
I'm pointing this out rather than asking a question, since it took many hours to track down this delay.
If this is not an intentional design, ippMalloc might be able to take advantage of Windows' background page zeroing to improve performance when large amounts of memory are allocated.
PS: Although 12 ms may seem small, when processing 150,000 4k images it adds up (12 ms × 150,000 = 1,800 s, or 30 minutes).
To investigate this issue further, could you please provide us with a reproducer?