I need to periodically zero-initialize large float arrays (100+ MB). I currently use memset (which calls intel_new_memset), and it shows up as the top hotspot in VTune analysis. How can I optimize this part? Thanks.
5 Replies
Zhu Wang,
What this may indicate is that you are zero-initializing the large float arrays unnecessarily.
If that is not the case, then zeroing out the array may well be the hottest spot,
but it will not benefit from further optimization.
Jim Dempsey
Thanks for your response. I do have to zero-initialize the large arrays. I wonder whether I should chop them into smaller arrays or use some other method. This is a sparse array. Are you saying this is the best I can do with memset?
Are you sure you must zero-initialize the arrays?
Are your arrays allocated as full arrays, initialized as full arrays, then sparsely used?
(i.e., are you zero-initializing elements that will never be used?)
Many methods that require initial values of 0.0 are loops performing summations:
for(i=0; i
This can easily be converted to
if(firstTime) // or test the outer loop control variable's value
{
firstTime = false;
for(i=0; i
}
else
{
for(i=0; i
}
Jim Dempsey
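A minimal sketch of the pattern described above, assuming a plain accumulation over several passes. The function name `accumulate_pass`, the array names `acc` and `src`, and the `first_pass` flag are hypothetical and not taken from the original post. The first pass assigns instead of adding, so the destination never needs a separate zero-fill:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: add src into acc over several passes.
 * On the first pass the value is assigned rather than added, so acc
 * does not have to be zero-filled beforehand. */
void accumulate_pass(float *acc, const float *src, size_t n, bool first_pass)
{
    assert(acc != NULL && src != NULL);
    if (first_pass) {
        for (size_t i = 0; i < n; ++i)
            acc[i] = src[i];   /* overwrite: stands in for memset followed by += */
    } else {
        for (size_t i = 0; i < n; ++i)
            acc[i] += src[i];  /* later passes accumulate as usual */
    }
}
```

This only helps when the first pass writes every element that is read later; if some elements are skipped on the first pass, they still hold stale data.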
Thank you for the example, but my case is not as simple as this. It is more like an FFT, which requires a valid initial value for every matrix element.
#include <xmmintrin.h>
_mm_stream_ps() may be a good bet, since it bypasses the cache. Write-combining still occurs, so essentially a cache line at a time is zeroed in memory without polluting the cache. This may cut the overhead in half. I suspect you are operating on doubles, but since you are just zeroing out memory, it's OK to pretend you're zeroing (twice as many) floats. I think the address of your arrays needs to be 16-byte aligned to use this feature. You can use posix_memalign() in place of malloc() if you are allocating memory from the heap.
Good luck,
-Jeff
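A rough sketch of the suggestion above, assuming an x86 target with SSE and a POSIX system. The helper names `zero_floats_stream` and `alloc_aligned_floats` are made up for illustration. Whether this actually beats the library memset depends on the machine and the array size, so it should be measured rather than assumed:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* exposes posix_memalign in <stdlib.h> */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <xmmintrin.h>

/* Hypothetical helper: zero n floats at dst using non-temporal stores.
 * Assumes dst is 16-byte aligned, as _mm_stream_ps requires. */
void zero_floats_stream(float *dst, size_t n)
{
    assert(((uintptr_t)dst & 15u) == 0);
    const __m128 zero = _mm_setzero_ps();
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_stream_ps(dst + i, zero);  /* 16 bytes per store, bypassing the cache */
    for (; i < n; ++i)
        dst[i] = 0.0f;                 /* scalar tail when n is not a multiple of 4 */
    _mm_sfence();                      /* order the streaming stores before later accesses */
}

/* Hypothetical helper: allocate n floats on a 16-byte boundary.
 * Returns NULL on failure; release the buffer with free(). */
float *alloc_aligned_floats(size_t n)
{
    void *p = NULL;
    if (posix_memalign(&p, 16, n * sizeof(float)) != 0)
        return NULL;
    return p;
}
```

To zero an array of doubles the same way, the buffer can be cast to `float *` and the element count doubled, as suggested above.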