How to zero out a huge vector?

soroosh · ‎01-29-2004

I'm trying to figure out a faster way to zero out a huge vector (1000000+ elements) in Fortran. I'm using A = 0.0 instead of looping through the vector, which reduces the time significantly, but my code still spends most of its time zeroing out the vector, and I have to do this multiple times. I'd appreciate it if someone out there could share other techniques for zeroing out a huge vector more efficiently.
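For readers following along, the two forms being compared can be sketched as below. This is only an illustration: the program name, the array name a, the default real kind and the exact length are assumptions, not the poster's actual code.

```fortran
program zero_forms
  implicit none
  integer, parameter :: n = 1000000   ! length taken from the question
  real, allocatable :: a(:)           ! default real kind is an assumption
  integer :: i

  allocate(a(n))

  ! Form 1: explicit loop over every element.
  a = 1.0
  do i = 1, n
    a(i) = 0.0
  end do
  if (any(a /= 0.0)) error stop "loop form left nonzero elements"

  ! Form 2: whole-array assignment, the "A = 0.0" mentioned above.
  a = 1.0
  a = 0.0
  if (any(a /= 0.0)) error stop "array form left nonzero elements"

  print *, "both forms zeroed all elements"
end program zero_forms
```

Both forms leave the same result; any speed difference comes from the code the compiler generates for each.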

TimP · ‎01-29-2004

If your vector is that large, I guess you're using IPF. In that case, you should be compiling with the -O3 option, so that you get automatic software prefetch. Assuming it's not worth your while to evaluate the IPP library, you can dig into this further by examining the compiler optimization reports.

On Xeon, of course, for a maximum size vector, you would be using options like -xW -vec_report3, looking for generation of parallel store instructions.

For example, here is a script to compile some Fortran sources and generate optimization reports. The options turned on include a request for prefetch statistics, if any. You would be looking for the loop to report unrolling by 8 (possibly more, for less than 64-bit data) and to have set a prefetch distance (given in iterations of the unrolled loop) equivalent to at least 300 clock cycles (for a fast Itanium 2). If the unrolled loop is scheduled for 3 clock cycles per iteration, a prefetch distance of 100 should be good. Longer distances may be desirable if you are running multi-processor and using threaded parallel compilation.

ifort -c -O3 -ip \
-opt_report -opt_report_file=$*.opt -opt_report_phase ecg \
-opt_report_phase hlo -opt_report_level max $*.f

As you can see, if you didn't give the proper clues about what you want, my answer may be well off target. So I won't go any further.

soroosh · ‎01-30-2004

Thanks for the info. I'm already using the O3 option along with all sorts of standard optimization techniques, but it still takes a lot of time. I was hoping to find another way to do this faster (bit shifting or other neat tricks which I'm not aware of). If I'm understanding you correctly, it looks like I'm already doing this as fast as possible. It's still taking almost 30 percent of the total run time just to zero out vectors.

rahzan · ‎02-02-2004

I don't know if this will do it, but the Win SDK has a ZeroMemory routine. If used correctly it might be faster than =0.

Tim

Jugoslav_Dujic · ‎02-03-2004

I doubt it. ZeroMemory is one of the rare "fake" APIs: it is not actually stored in a system DLL, but provided as a C macro or an RTL implementation. Here's an excerpt from winnt.h:

#define RtlZeroMemory(Destination,Length) memset((Destination),0,(Length))

In CVF, it's implemented in DFWIN.lib, which is not a stub for any system DLL, but a collection of "macro" functions. I'm sure the compiler can optimize the expression:

f=0

at least a bit better than

call ZeroMemory(f, sizeof(f))

Jugoslav

Zhanghong_T_ · ‎02-03-2004

But I have tested this in CVF and found that ZeroMemory is faster than the normal way.
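Since the answers disagree, the simplest way to settle it for a particular compiler is to time both approaches. The following is a minimal sketch, not the test that was actually run: it calls the C library memset through bind(C) as a stand-in for ZeroMemory (which, per the winnt.h excerpt above, expands to memset), and the names c_memset, reps and ignored are made up for illustration.

```fortran
program zero_timing
  use, intrinsic :: iso_c_binding, only: c_ptr, c_int, c_size_t, c_float, &
                                         c_loc, c_sizeof
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none

  interface
    ! C library memset, used here in place of ZeroMemory (assumption).
    function c_memset(dest, val, nbytes) bind(C, name="memset") result(res)
      import :: c_ptr, c_int, c_size_t
      type(c_ptr), value :: dest
      integer(c_int), value :: val
      integer(c_size_t), value :: nbytes
      type(c_ptr) :: res
    end function c_memset
  end interface

  integer, parameter :: n = 1000000   ! vector length from the question
  integer, parameter :: reps = 200    ! arbitrary repeat count (assumption)
  real(c_float), allocatable, target :: a(:)
  type(c_ptr) :: ignored
  integer(int64) :: t0, t1, t2, t3, rate
  integer(c_size_t) :: nbytes
  integer :: i

  allocate(a(n))
  nbytes = int(n, c_size_t) * c_sizeof(a(1))

  ! Time the whole-array assignment.
  call system_clock(t0, rate)
  do i = 1, reps
    a(i) = 1.0        ! dirty one element so each pass has something to clear
    a = 0.0
  end do
  call system_clock(t1)
  if (any(a /= 0.0)) error stop "array assignment left nonzero elements"

  ! Time memset; all-zero bytes represent 0.0 for IEEE floating point.
  call system_clock(t2)
  do i = 1, reps
    a(i) = 1.0
    ignored = c_memset(c_loc(a), 0_c_int, nbytes)
  end do
  call system_clock(t3)
  if (any(a /= 0.0)) error stop "memset left nonzero elements"

  print '(a, f10.6, a)', "a = 0.0 : ", real(t1 - t0) / real(rate), " s"
  print '(a, f10.6, a)', "memset  : ", real(t3 - t2) / real(rate), " s"
end program zero_timing
```

Treat the numbers as indicative only: results depend on the compiler, its options and the machine, and an aggressive optimizer may simplify the repeated zeroing in either loop.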