I have a single-thread simulation code that spends about 15% of its time writing output data. Many speedup tricks have been implemented (auto-parallelization, vectorization, etc., but not (yet!) OpenMP, BLAS, or similar), and my latest brilliant idea was to fork a child process to do the writing. Given that I have idle processors, it offered a theoretical speedup of 15% and an actual speedup of 13%. Not bad!
However, execution goes like this:
parallelizable stuff (60%); non-parallelizable stuff (25%); write output (15%); repeat.
So writing the data now happens in parallel with tasks that are themselves parallelizable, rather than during the part that is more or less non-parallelizable. Is there an easy way to signal my child process to wait until I say "go" to write its data to a file? After all, I have its process ID. That way I could parallelize the parallelizable stuff without losing a processor to the writer, and put the idle processors to good use during the non-parallelizable part. I would like the solution to work with ifort 11.x on both Linux and Mac. Also, is my now-idle child process going to get paged out to disk? That would be sad. Or is this kind of thinking just not the right way to go when converting single-thread code to scalable parallel code?
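For concreteness, here is the kind of gating I have in mind: give the pair a pipe before forking, and let the child block in read() until the parent sends a byte. This sketch is purely illustrative, not code from my program; it calls libc through ISO_C_BINDING and assumes ssize_t maps to c_long, which holds on LP64 Linux/Mac.

program fork_gate_sketch
   use iso_c_binding, only: c_int, c_long, c_size_t, c_char
   implicit none

   interface
      function c_fork() bind(c, name="fork")
         import :: c_int
         integer(c_int) :: c_fork
      end function c_fork
      function c_pipe(fds) bind(c, name="pipe")
         import :: c_int
         integer(c_int), intent(out) :: fds(2)
         integer(c_int) :: c_pipe
      end function c_pipe
      function c_read(fd, buf, n) bind(c, name="read")
         import :: c_int, c_long, c_size_t, c_char
         integer(c_int), value :: fd
         character(kind=c_char), intent(out) :: buf(*)
         integer(c_size_t), value :: n
         integer(c_long) :: c_read      ! ssize_t, assumed c_long here
      end function c_read
      function c_write(fd, buf, n) bind(c, name="write")
         import :: c_int, c_long, c_size_t, c_char
         integer(c_int), value :: fd
         character(kind=c_char), intent(in) :: buf(*)
         integer(c_size_t), value :: n
         integer(c_long) :: c_write     ! ssize_t, assumed c_long here
      end function c_write
   end interface

   integer(c_int) :: fds(2), pid
   character(kind=c_char) :: token(1)
   integer(c_long) :: nbytes

   if (c_pipe(fds) /= 0) stop 'pipe failed'
   pid = c_fork()

   if (pid == 0) then
      ! child: sleeps in the kernel (not spinning) until the parent says go
      nbytes = c_read(fds(1), token, 1_c_size_t)
      ! ... write the output file here ...
   else
      ! parent: do the non-parallelizable stuff first, then release the child
      ! ... non-parallelizable work here ...
      token(1) = c_char_'g'
      nbytes = c_write(fds(2), token, 1_c_size_t)
      ! ... parallelizable work continues; wait for the child before re-forking ...
   end if
end program fork_gate_sketch

One POSIX caveat I'm aware of: the parent should eventually wait() on each child, or finished writers accumulate as zombies.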
I realize there are other ways to accomplish speedup (ignore the problem; rewrite the "non-parallelizable stuff" so it is parallelizable; don't overwrite data from the previous timestep, and fork when the non-parallelizable tasks begin), but those require large rewrites I'm not ready to handle.
One other option I haven't tried is to write the output to a big variable in memory, then fork and let the child write it to a file when the non-parallelizable tasks start. How much speedup do you think that would get? I suspect a lot less than what I got. The output is one 4+ MB ASCII text file of 12-digit-precision real numbers per iteration. (Yeah, writing a binary file would be a lot better, wouldn't it? Not gonna happen for a couple of months.)
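Concretely, I'm imagining something like this sketch (the subroutine and the names BUF_LEN, buf, and pos are illustrative, not from my actual code; it assumes Unix-style text, where an embedded achar(10) acts as a line break):

subroutine buffer_then_write(gmv_file, x_lin, max_nodes)
   implicit none
   integer, intent(in) :: gmv_file, max_nodes
   real(8), intent(in) :: x_lin(max_nodes, 3)
   integer, parameter :: BUF_LEN = 8*1024*1024   ! comfortably > 4 MB of text
   character(len=BUF_LEN), save :: buf           ! static, keeps it off the stack
   integer :: pos, i, k

   pos = 1
   do k = 1, 3                                   ! the x line, y line, z line
      do i = 1, max_nodes
         write(buf(pos:pos+19), '(f20.12)') x_lin(i, k)   ! internal write: memory only
         pos = pos + 20
      end do
      buf(pos:pos) = achar(10)                   ! newline ends the line
      pos = pos + 1
   end do
   ! one big external write (this part would run in the forked child);
   ! the final newline comes from the write's own record terminator
   write(gmv_file, '(a)') buf(1:pos-2)
end subroutine buffer_then_write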
Any help would be appreciated!
- Fred
Asynchronous I/O? Whoa, I didn't know that existed! I'll give it a try! (We won't know how well it worked until I manage to max out all the processors.)
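For reference, the standard Fortran 2003 form looks roughly like this (a sketch, assuming ifort 11.x accepts the standard ASYNCHRONOUS syntax; the unit number, file name, and variable names are made up):

program async_write_sketch
   implicit none
   integer :: req_id, i
   real(8), asynchronous :: vals(10000)   ! may be "in flight" during the overlap

   vals = (/ (dble(i), i = 1, 10000) /)
   open(unit=20, file='out.txt', form='formatted', asynchronous='yes')

   ! one large request; id= lets us wait for this specific transfer later
   write(20, '(10000f20.12)', asynchronous='yes', id=req_id) vals

   ! ... computation can overlap the write here, as long as vals isn't touched ...

   wait(20, id=req_id)   ! block until that transfer has completed
   close(20)
end program async_write_sketch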
I have one follow-up question, though, which came up just now when I investigated asynchronous I/O: what defines a "small" write (which asynchronous I/O is bad at), versus a "large" write (which it's good at)?
My code looks like this (lots of "small" writes; see below). How can I convert it to "large" writes? I have complete control over the shape and arrangement of the data within the arrays that are written. (That is to say, if I knew what "large" writes looked like, I could probably rewrite the code to use them; one possible rewrite follows the code below.)
- Fred
! this writes x, y, and z values of data points in 3-space.
! all the x's come first, on one line.
! then all the y's, on a second line.
! then all the z's, on a third line.
! (note: the width-less '(f)' edit descriptor is an Intel extension,
! not standard Fortran; an explicit width like '(f20.12)' is portable.)
do i = 1, MAX_NODES ! roughly 100 to 10,000
   write(gmv_file,'(f)',advance="NO") x_lin(i,1)
enddo
write(gmv_file,*)
do i = 1, MAX_NODES
   write(gmv_file,'(f)',advance="NO") x_lin(i,2)
enddo
write(gmv_file,*)
do i = 1, MAX_NODES
   write(gmv_file,'(f)',advance="NO") x_lin(i,3)
enddo
write(gmv_file,*)
write(gmv_file,*)

! now write integers representing node numbers which
! form hexahedral elements (ie, are shaped like deformed cubes).
do i = 1, MAX_ELEMENTS ! roughly 100 to 10,000
   do j = 1, 8
      write(gmv_file,'(i8)', advance="NO") IEN_LIN(i,j)
   enddo
   write(gmv_file,*)
enddo
write(gmv_file,*)
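For comparison, here is one possible "large write" rewrite of the loops above (a sketch; the f20.12 width and the fmt variable are illustrative). Each write statement hands the runtime a whole array section in a single request instead of one number at a time:

! build a run-time format wide enough for a whole line of reals
character(len=32) :: fmt

write(fmt, '(a,i0,a)') '(', MAX_NODES, 'f20.12)'

write(gmv_file, fmt) x_lin(1:MAX_NODES, 1)   ! all the x's: one record
write(gmv_file, fmt) x_lin(1:MAX_NODES, 2)   ! all the y's: one record
write(gmv_file, fmt) x_lin(1:MAX_NODES, 3)   ! all the z's: one record
write(gmv_file, *)

! format reversion: when the 8i8 is exhausted, a new record begins,
! so each element's 8 node numbers land on their own line
write(gmv_file, '(8i8)') ((IEN_LIN(i, j), j = 1, 8), i = 1, MAX_ELEMENTS)
write(gmv_file, *)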
