Hi, how do I write to a file from a parallel_for loop? I do some calculations in parallel and need to write the results to a file. I have looked at the pipeline examples. Is there another way to write to a file from parallel execution?
5 Replies
Your question is incomplete. In what sequence do you want your output?
a) in order of the iteration space (i.e. collated)
or
b) in arbitrary order (i.e. order of completion of processing)
Assuming you want collated output without using parallel_pipeline, consider the following:
[cpp]
// non-parallel for this part
// up to 4 objects at a time
for(int i = 0; i < nObj; i += 4)
{
    parallel_invoke(
        [&]() { doWork(Obj[i]); },
        [&]() { if(i+1 < nObj) doWork(Obj[i+1]); },
        [&]() { if(i+2 < nObj) doWork(Obj[i+2]); },
        [&]() { if(i+3 < nObj) doWork(Obj[i+3]); });
    doWrite(Obj[i]);
    if(i+1 < nObj) doWrite(Obj[i+1]);
    if(i+2 < nObj) doWrite(Obj[i+2]);
    if(i+3 < nObj) doWrite(Obj[i+3]);
}
[/cpp]
Obviously the above is hard wired and not as optimal as a parallel_pipeline, but on the other hand it is relatively easy to implement.
Jim Dempsey
And a revision:
[cpp]
if(nObj)
{
    // non-parallel for this part
    // up to 4 objects at a time
    parallel_invoke(
        [&]() { doWork(Obj[0]); },
        [&]() { if(1 < nObj) doWork(Obj[1]); },
        [&]() { if(2 < nObj) doWork(Obj[2]); },
        [&]() { if(3 < nObj) doWork(Obj[3]); });
    if(nObj <= 4)
    {
        doWrite(Obj[0]);
        if(1 < nObj) doWrite(Obj[1]);
        if(2 < nObj) doWrite(Obj[2]);
        if(3 < nObj) doWrite(Obj[3]);
    }
    else
    {
        // nObj > 4
        int i;
        for(i = 4; i < nObj; i += 4)
        {
            parallel_invoke(
                [&]() {
                    // output prior work data (all 4)
                    doWrite(Obj[i-4]);
                    doWrite(Obj[i-3]);
                    doWrite(Obj[i-2]);
                    doWrite(Obj[i-1]); },
                [&]() { doWork(Obj[i]); },
                [&]() { if(i+1 < nObj) doWork(Obj[i+1]); },
                [&]() { if(i+2 < nObj) doWork(Obj[i+2]); },
                [&]() { if(i+3 < nObj) doWork(Obj[i+3]); });
        } // for
        // write out the last batch
        if(i-4 < nObj) doWrite(Obj[i-4]);
        if(i-3 < nObj) doWrite(Obj[i-3]);
        if(i-2 < nObj) doWrite(Obj[i-2]);
        if(i-1 < nObj) doWrite(Obj[i-1]);
    }
} // if(nObj)
[/cpp]
Jim Dempsey
If the output records are all the same size, random I/O might work (lseek/fseek/...), but I doubt whether the O.S. would be equipped to transparently and efficiently handle multiple write buffers to avoid killing performance with long seek-related waits, so maybe there's an opportunity here for another parallel data structure, or even an active adapter. You might also try mapping the output file to memory to improve random-access performance, e.g., using mmap().
But we need more information than just whether the output should be in order. The pipeline example is just that, an example, not a design pattern. If you want something comparable just to get your feet wet, without regard for the ultimate goal of performance, go ahead and try the random I/O (properly synchronised, of course), otherwise you need to question everything, including the choice of parallel_for(). Perhaps all it takes is buffering everything in resident memory first: if you have the gigabytes, use them!
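The fixed-size-record idea above can be sketched as follows. All names and the record width are illustrative, and a std::vector stands in for the seekable or mmap()ed file so the sketch needs no platform I/O; a real version would use pwrite() or a mapped region at the same offsets.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// With fixed-size records, iteration i owns the byte range
// [i*kRecordSize, (i+1)*kRecordSize), so concurrent writes from
// different iterations never overlap and need no locking.
constexpr std::size_t kRecordSize = 16;  // assumed fixed record width

void writeRecord(std::vector<char>& file, std::size_t i, const std::string& rec) {
    assert(rec.size() <= kRecordSize);   // record must fit its slot
    std::memcpy(file.data() + i * kRecordSize, rec.data(), rec.size());
}
```

Each iteration of the parallel loop would call writeRecord with its own index; because the target ranges are disjoint, ordering falls out of the offsets rather than out of the completion order.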
I ended up writing to a buffer and then using a mutex around the file writer. That works, but I kind of hoped there was something in the TBB library that would handle the synchronization automatically. The data is calculated from GBs of input, but the results are not that big, so size is no issue there ;)
If you can hold the buffer in memory, and only write it after the parallel_for() has ended, I don't see the need for a mutex?
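A no-mutex variant along those lines could look like this: each iteration fills its own slot in a pre-sized vector, and one ordered write happens after the parallel phase ends. std::thread again stands in for parallel_for and all names are illustrative.

```cpp
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Because every iteration writes only to results[i], there are no
// shared writes during the parallel phase and thus nothing to lock;
// the single sequential pass afterwards also preserves iteration order.
std::string bufferThenWrite(int n) {
    std::vector<std::string> results(n);      // one slot per iteration
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i)
        workers.emplace_back([&results, i] {
            results[i] = "result " + std::to_string(i);  // disjoint slots
        });
    for (auto& t : workers) t.join();          // parallel phase ends here
    std::ostringstream out;                    // stands in for the file
    for (const auto& r : results)              // ordered, single-threaded write
        out << r << '\n';
    return out.str();
}
```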
I think the art in designing languages and libraries is in capturing and facilitating useful patterns (it's a lot easier to do object-oriented programming in C++ than in C), not to throw in everything and the kitchen sink.