BKM for zero init large float array

I need to periodically zero-initialize large float arrays (100+ MB). I currently use memset (which calls intel_new_memset), and it shows up as the top hotspot in VTune analysis. How can I optimize this part? Thanks.

Zhu Wang,

What this may indicate is that you are zero-initializing the large float arrays unnecessarily.
If that is not the case, then zeroing the array may well be the hottest spot,
but it will not benefit from further optimization.

Jim Dempsey

Thanks for your response. I do have to zero-initialize the large arrays. I wonder whether I should chop them into smaller arrays or use some other method. This is a sparse array. Are you saying this is the best I can do with memset?

Are you sure you must zero init the arrays?

Are your arrays allocated as full arrays, initialized as full arrays, then sparsely used?
(i.e., are you zero-initializing elements that will never be used?)

Many methods that require initial values of 0.0 are loops performing summations:

for(i=0; i<n; ++i) Array[i] += function(...); // n = number of elements

This can easily be converted to:

if(firstTime) // or test the outer loop control variable value
{
  firstTime = false;
  for(i=0; i<n; ++i) Array[i] = function(...); // first pass assigns, so no prior zero init is needed
}
else
{
  for(i=0; i<n; ++i) Array[i] += function(...); // later passes accumulate
}
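
As a self-contained illustration of the same idea, here is a minimal sketch in C. The names `contribution` and `accumulate_passes` are hypothetical stand-ins (the first plays the role of `function(...)` above), and the sketch assumes every element is written on the first pass, which is what makes the separate zeroing step unnecessary.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for function(...): any per-element contribution. */
static float contribution(size_t i, int pass)
{
    return (float)(i + 1) * (float)(pass + 1);
}

/* Accumulate `passes` contributions into array[0..n) without zeroing it first.
   The first pass assigns; later passes add. If passes == 0 the array is
   left untouched. */
static void accumulate_passes(float *array, size_t n, int passes)
{
    bool firstTime = true;
    for (int pass = 0; pass < passes; ++pass) {
        if (firstTime) {
            firstTime = false;
            for (size_t i = 0; i < n; ++i)
                array[i] = contribution(i, pass);   /* overwrites stale data */
        } else {
            for (size_t i = 0; i < n; ++i)
                array[i] += contribution(i, pass);  /* normal accumulation */
        }
    }
}
```

The array can hold arbitrary stale values on entry; after the call each element holds only the sum of the contributions.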

Jim Dempsey

Thank you for the example, but my case is not as simple as this. It is more like an FFT, which requires a valid initial value for every matrix element.

#include <xmmintrin.h>
_mm_stream_ps() may be a good bet, since it bypasses the cache. Write-combining still occurs, so basically a cache line at a time is zeroed in memory without polluting the cache. This may cut the overhead in half. I suspect you are operating on doubles, but since you are just zeroing out memory, it's OK to pretend you're zeroing (twice as many) floats. The address of your arrays needs to be 16-byte aligned to use this feature. You can use posix_memalign() in place of malloc() if you are allocating memory from the heap.
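
A rough sketch of what that could look like, in case it helps. The helper names `zero_floats_stream` and `stream_zero_demo` are made up for illustration, and the demo uses C11 `aligned_alloc` rather than posix_memalign() only to stay within standard C; either should give the 16-byte alignment that `_mm_stream_ps` requires. Whether this actually beats the compiler's memset on your machine is something you would have to measure.

```c
#include <stddef.h>
#include <stdlib.h>
#include <xmmintrin.h>  /* SSE: _mm_setzero_ps, _mm_stream_ps, _mm_sfence */

/* Hypothetical helper: zero n floats using non-temporal (streaming) stores.
   Assumption: dst is 16-byte aligned. */
static void zero_floats_stream(float *dst, size_t n)
{
    const __m128 zero = _mm_setzero_ps();
    size_t i = 0;

    /* Four floats (16 bytes) per streaming store. */
    for (; i + 4 <= n; i += 4)
        _mm_stream_ps(dst + i, zero);

    /* Ordinary stores for the last 0..3 elements. */
    for (; i < n; ++i)
        dst[i] = 0.0f;

    /* Order the streaming stores before any later stores. */
    _mm_sfence();
}

/* Hypothetical self-check: allocate, dirty, zero, verify.
   Returns 0 on success, 1 if a nonzero element remains, -1 on allocation failure. */
static int stream_zero_demo(size_t n)
{
    /* aligned_alloc wants the size to be a multiple of the alignment. */
    size_t bytes = (n * sizeof(float) + 15) & ~(size_t)15;
    float *a = aligned_alloc(16, bytes);
    if (a == NULL)
        return -1;

    for (size_t i = 0; i < n; ++i)
        a[i] = 1.0f;

    zero_floats_stream(a, n);

    int bad = 0;
    for (size_t i = 0; i < n; ++i)
        if (a[i] != 0.0f)
            bad = 1;

    free(a);
    return bad;
}
```

For a double array, the same helper could be called with the pointer cast to `float *` and twice the element count, as described above.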

Good luck,
-Jeff