I'm trying to figure out a faster way to zero out a huge vector (1,000,000+ elements) in Fortran. I'm using A = 0.0 instead of looping through the vector, which reduces the time significantly, but my code still spends most of its time zeroing out the vector, and I have to do this multiple times. I'd appreciate it if someone out there could share other techniques that can be used to zero out a huge vector more efficiently.
5 Replies
If your vector is that large, I guess you're using IPF (Itanium). In that case, you should be compiling with the -O3 option so that you get automatic software prefetch. Assuming it's not worth your while to evaluate the IPP library, you can dig into this further by examining the compiler optimization reports.
On Xeon, of course, for a vector of maximum size you would use options such as -xW -vec_report3 and look for the generation of parallel store instructions.
For example, here is a script to compile some Fortran sources and generate optimization reports. The options turned on include a request for prefetch statistics, if any. You would be looking for the loop to report unrolling by 8 (possibly more for data narrower than 64 bits) and for it to have set a prefetch distance (given in iterations of the unrolled loop) equivalent to at least 300 clock cycles (for a fast Itanium 2). If each unrolled iteration is scheduled in 3 clock cycles, a prefetch distance of 100 should be good. Longer distances could be desirable if you are running on a multiprocessor and using threaded parallel compilation.
# $* is the source file name without the .f extension; reports from the
# ecg (Itanium code generator) and hlo (high-level optimizer) phases go to $*.opt
ifort -c -O3 -ip \
    -opt_report -opt_report_file=$*.opt -opt_report_phase ecg \
    -opt_report_phase hlo -opt_report_level max $*.f
As you can see, since you didn't give enough clues about your platform and what you want, my answer may be well off target, so I won't go any further.
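The prefetch-distance rule of thumb in the reply above amounts to dividing the latency to be hidden by the scheduled length of one unrolled iteration and rounding up. Writing L for the latency in clock cycles and c for the cycles per unrolled iteration (symbols chosen here for illustration, not taken from any compiler report), with the figures quoted for a fast Itanium 2:

```latex
d_{\min} = \left\lceil \frac{L}{c} \right\rceil
         = \left\lceil \frac{300\ \text{cycles}}{3\ \text{cycles/iteration}} \right\rceil
         = 100\ \text{iterations}
```

A longer latency to cover (for example on a busy multiprocessor) or a shorter schedule per iteration raises the required distance proportionally.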
Thanks for the info. I'm already using the -O3 option along with all sorts of standard optimization techniques, but it still takes a lot of time. I was hoping to find another way to do this faster (bit shifting or other neat tricks that I'm not aware of). If I understand you correctly, it looks like I'm already doing this as fast as possible. Zeroing out vectors still takes almost 30 percent of the total run time.
I don't know if this will do it, but the Win SDK has a ZeroMemory routine. If used correctly, it might be faster than =0.
Tim
I doubt it. ZeroMemory is one of the rare "fake" APIs: it is not actually exported from a system DLL but is provided as a C macro or a run-time library (RTL) implementation. Here's an excerpt from winnt.h:
#define RtlZeroMemory(Destination,Length) memset((Destination),0,(Length))
In CVF, it's implemented in DFWIN.lib, which is not a stub for any system DLL but a collection of "macro" functions. I'm sure the compiler can optimize the expression:
f=0
at least a bit better than
call ZeroMemory(f, sizeof(f))
Jugoslav
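Because RtlZeroMemory expands to memset, its Length argument is a byte count, which is why the Fortran call above passes sizeof(f) rather than the number of elements. The following is a minimal C sketch of the same pattern, assuming IEEE 754 doubles (where all-zero bytes represent +0.0); ZERO_MEMORY and zero_doubles are hypothetical names chosen to avoid clashing with the real Windows headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same shape as the winnt.h RtlZeroMemory macro: Length is in BYTES.
   Hypothetical name, so it cannot collide with <windows.h>. */
#define ZERO_MEMORY(Destination, Length) memset((Destination), 0, (Length))

/* Zero n doubles. Passing n alone (an element count) instead of
   n * sizeof *v would clear only the first n bytes of the array. */
void zero_doubles(double *v, size_t n)
{
    if (n == 0)
        return;                /* nothing to clear; also avoids memset(NULL, 0, 0) */
    assert(v != NULL);
    ZERO_MEMORY(v, n * sizeof *v);   /* relies on all-zero bits == +0.0 (IEEE 754) */
}
```

For a fixed-size array the sizeof form mirrors the Fortran call: with `double f[1000]`, `ZERO_MEMORY(f, sizeof f)` clears all 8000 bytes, assuming 8-byte doubles.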
But I have tested this in CVF and found that ZeroMemory is faster than the normal way.
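To check a claim like this on your own machine, time both forms on the same buffer. Below is a minimal C sketch under stated assumptions: zero_loop stands in for an explicit element-by-element loop and zero_memset for what ZeroMemory expands to; it does not exercise CVF's array assignment itself. The function names are hypothetical, clock() measures processor time, and results will vary with compiler, flags, and hardware. Optimizing C compilers (GCC, for example) may also recognize the zeroing loop and turn it into a memset call, in which case the two timings converge.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>   /* malloc/free for callers of the harness */
#include <string.h>
#include <time.h>

/* Explicit loop: one store per element. */
void zero_loop(double *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = 0.0;
}

/* Bulk clear, as ZeroMemory/RtlZeroMemory expands to (length in bytes). */
void zero_memset(double *a, size_t n)
{
    if (n > 0)
        memset(a, 0, n * sizeof *a);
}

/* Time `reps` calls of `zero` on n doubles; returns total CPU seconds.
   The buffer is dirtied before each call and one element is read back
   afterwards so the stores cannot be discarded as dead. */
double time_zeroing(void (*zero)(double *, size_t), double *a, size_t n, int reps)
{
    double total = 0.0, sink = 0.0;
    if (n == 0)
        return 0.0;
    assert(a != NULL);
    for (int r = 0; r < reps; r++) {
        for (size_t i = 0; i < n; i++)
            a[i] = 1.0;                 /* make sure there is something to clear */
        clock_t t0 = clock();
        zero(a, n);
        total += (double)(clock() - t0) / CLOCKS_PER_SEC;
        sink += a[n / 2];               /* read back one element */
    }
    if (sink != 0.0)
        fprintf(stderr, "unexpected nonzero element after zeroing\n");
    return total;
}
```

To use it, allocate a buffer of, say, 1000000 doubles with malloc, then compare time_zeroing(zero_loop, a, 1000000, 50) against time_zeroing(zero_memset, a, 1000000, 50). Repeating the call many times smooths out the coarse resolution of clock().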
