Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Writing to a file in parallel

rolf-anders
Beginner
388 Views

Hi, how do I write to a file from a parallel_for loop? I do some calculations in parallel and need to write the results to a file. I have looked at the pipeline examples. Is there another way to write to a file from parallel execution?

jimdempseyatthecove
Honored Contributor III
Your question is incomplete. In what order do you want your output written?

a) in order of the iteration space (i.e. collated)
or
b) in arbitrary order (i.e. order of completion of processing)
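Option (b) is the simpler case: each task writes its result as soon as it is done, and the only requirement is that writes do not interleave. A minimal sketch, assuming a hypothetical per-item calculation `computeResult()` and using `std::thread` as a stand-in for `tbb::parallel_for` so that it builds without TBB:

```cpp
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real per-item calculation.
long computeResult(int i) { return static_cast<long>(i) * i; }

// Option (b): lines appear in order of completion, not iteration order.
// std::thread stands in for tbb::parallel_for here.
void writeUnordered(int n, int nThreads, std::ostream& out) {
    std::mutex outMutex;  // serialises access to the shared stream
    std::vector<std::thread> workers;
    for (int t = 0; t < nThreads; ++t) {
        workers.emplace_back([&, t] {
            // Each thread handles a strided subset of the iteration space.
            for (int i = t; i < n; i += nThreads) {
                long r = computeResult(i);                   // computed without the lock
                std::lock_guard<std::mutex> lock(outMutex);  // only the write is serialised
                out << i << ' ' << r << '\n';
            }
        });
    }
    for (auto& w : workers) w.join();
}
```

Writing the iteration index alongside each result keeps the unordered output usable, since it can be sorted afterwards if needed.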

Assuming you want collated output without using parallel_pipeline, consider the following:

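[cpp]#include <tbb/parallel_invoke.h>

// Not a parallel_for for this part:
// process up to 4 objects at a time, then write them in order.
int i;
for (i = 0; i < nObj; i += 4)
{
    // Compute up to 4 objects in parallel; parallel_invoke returns
    // only after all of its lambdas have finished.
    tbb::parallel_invoke(
        [&]() { doWork(Obj[i]); },
        [&]() { if (i + 1 < nObj) doWork(Obj[i + 1]); },
        [&]() { if (i + 2 < nObj) doWork(Obj[i + 2]); },
        [&]() { if (i + 3 < nObj) doWork(Obj[i + 3]); });
    // Write sequentially, in iteration order (collated output).
    doWrite(Obj[i]);
    if (i + 1 < nObj) doWrite(Obj[i + 1]);
    if (i + 2 < nObj) doWrite(Obj[i + 2]);
    if (i + 3 < nObj) doWrite(Obj[i + 3]);
}
[/cpp]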

Obviously the above is hard-wired and less efficient than a parallel_pipeline, but on the other hand it is relatively easy to implement.

Jim Dempsey
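The batch-of-4 loop above generalises to any batch size. A minimal sketch, assuming caller-supplied `doWork`/`doWrite` callbacks (hypothetical names mirroring the post) and using `std::thread` as a stand-in for `tbb::parallel_invoke`:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Collated output in batches: compute one batch in parallel, wait for all of it,
// then write it in iteration order. Assumes batch > 0.
// std::thread stands in for tbb::parallel_invoke here.
template <typename T, typename Work, typename Write>
void processInBatches(std::vector<T>& objs, std::size_t batch, Work doWork, Write doWrite) {
    for (std::size_t i = 0; i < objs.size(); i += batch) {
        const std::size_t end = std::min(i + batch, objs.size());
        std::vector<std::thread> workers;
        for (std::size_t j = i; j < end; ++j)
            workers.emplace_back([&, j] { doWork(objs[j]); });  // one task per object
        for (auto& w : workers) w.join();                        // barrier: batch complete
        for (std::size_t j = i; j < end; ++j) doWrite(objs[j]); // sequential, in order
    }
}
```

The cost of this design is that the writer sits idle while a batch is computed, and the workers sit idle while it is written.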
jimdempseyatthecove
Honored Contributor III
And a revision that writes the previous batch while the next one is being computed:

[cpp]#include <tbb/parallel_invoke.h>

if (nObj)
{
    // Not a parallel_for for this part:
    // compute the first batch of up to 4 objects.
    tbb::parallel_invoke(
        [&]() { doWork(Obj[0]); },
        [&]() { if (1 < nObj) doWork(Obj[1]); },
        [&]() { if (2 < nObj) doWork(Obj[2]); },
        [&]() { if (3 < nObj) doWork(Obj[3]); });
    if (nObj <= 4)
    {
        // Only one batch: write it and we are done.
        doWrite(Obj[0]);
        if (1 < nObj) doWrite(Obj[1]);
        if (2 < nObj) doWrite(Obj[2]);
        if (3 < nObj) doWrite(Obj[3]);
    }
    else
    {
        // nObj > 4: write the previous batch while computing the current one.
        int i;
        for (i = 4; i < nObj; i += 4)
        {
            tbb::parallel_invoke(
                [&]() {
                    // Output the prior batch (always a full 4, since i < nObj).
                    doWrite(Obj[i - 4]);
                    doWrite(Obj[i - 3]);
                    doWrite(Obj[i - 2]);
                    doWrite(Obj[i - 1]);
                },
                [&]() { doWork(Obj[i]); },
                [&]() { if (i + 1 < nObj) doWork(Obj[i + 1]); },
                [&]() { if (i + 2 < nObj) doWork(Obj[i + 2]); },
                [&]() { if (i + 3 < nObj) doWork(Obj[i + 3]); });
        } // for
        // Write the last batch, which has been computed but not yet written.
        if (i - 4 < nObj) doWrite(Obj[i - 4]);
        if (i - 3 < nObj) doWrite(Obj[i - 3]);
        if (i - 2 < nObj) doWrite(Obj[i - 2]);
        if (i - 1 < nObj) doWrite(Obj[i - 1]);
    }
} // if (nObj)
[/cpp]

Jim Dempsey
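The overlap in the revision can also be written for an arbitrary batch size. A minimal sketch, again assuming hypothetical caller-supplied `doWork`/`doWrite` callbacks and using `std::thread` in place of `tbb::parallel_invoke`:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// While batch k is computed, one extra task writes batch k-1, so the sequential
// writes overlap with computation. Assumes batch > 0.
// std::thread stands in for tbb::parallel_invoke here.
template <typename T, typename Work, typename Write>
void processOverlapped(std::vector<T>& objs, std::size_t batch, Work doWork, Write doWrite) {
    const std::size_t n = objs.size();
    std::size_t prevBegin = 0, prevEnd = 0;  // previous batch: computed, not yet written
    for (std::size_t i = 0; i < n; i += batch) {
        const std::size_t end = std::min(i + batch, n);
        std::vector<std::thread> tasks;
        // Single writer per round, so output stays in iteration order.
        tasks.emplace_back([&, prevBegin, prevEnd] {
            for (std::size_t j = prevBegin; j < prevEnd; ++j) doWrite(objs[j]);
        });
        // Workers touch [i, end), disjoint from the batch being written.
        for (std::size_t j = i; j < end; ++j)
            tasks.emplace_back([&, j] { doWork(objs[j]); });
        for (auto& t : tasks) t.join();
        prevBegin = i;
        prevEnd = end;
    }
    // The final batch was computed in the last round but never written.
    for (std::size_t j = prevBegin; j < prevEnd; ++j) doWrite(objs[j]);
}
```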
RafSchietekat
Valued Contributor III
If the output records are all the same size, random I/O (lseek/fseek/...) might work, but I doubt whether the O.S. would be equipped to handle multiple write buffers transparently and efficiently, without killing performance through long seek-related waits. So maybe there's an opportunity here for another parallel data structure, or even an active adapter. You might also try mapping the output file into memory to improve random-access performance, e.g., using mmap().
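The fixed-size-record idea can be sketched with standard streams: record `i` always lives at offset `i * kRecordSize`, so completion order does not matter, and a mutex keeps each seek-plus-write atomic. The record layout (8-byte text records), the file name handling, and `writeRecordsByPosition` are illustrative assumptions, and `std::thread` stands in for `tbb::parallel_for`. This does nothing about the seek-performance concern above; it only shows the synchronisation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Assumed layout: 7 zero-padded digits + '\n' per record (values must fit in 7 digits).
constexpr std::size_t kRecordSize = 8;

void writeRecordsByPosition(const std::string& path, int n, int nThreads) {
    {   // Create/truncate and pre-size the file so every record offset exists.
        std::ofstream create(path, std::ios::binary | std::ios::trunc);
        std::string blank(kRecordSize * n, ' ');
        create.write(blank.data(), static_cast<std::streamsize>(blank.size()));
    }
    std::fstream file(path, std::ios::in | std::ios::out | std::ios::binary);
    std::mutex fileMutex;  // seek + write must happen as one step
    std::vector<std::thread> workers;
    for (int t = 0; t < nThreads; ++t) {
        workers.emplace_back([&, t] {
            for (int i = t; i < n; i += nThreads) {
                char rec[kRecordSize + 1];  // +1 for snprintf's terminating NUL
                std::snprintf(rec, sizeof rec, "%07ld\n", static_cast<long>(i) * i);
                std::lock_guard<std::mutex> lock(fileMutex);
                file.seekp(static_cast<std::streamoff>(i) * static_cast<std::streamoff>(kRecordSize));
                file.write(rec, static_cast<std::streamsize>(kRecordSize));
            }
        });
    }
    for (auto& w : workers) w.join();
}   // file is flushed and closed by its destructor here
```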

But we need more information than just whether the output should be in order. The pipeline example is just that, an example, not a design pattern. If you want something comparable just to get your feet wet, without regard for the ultimate goal of performance, go ahead and try the random I/O (properly synchronised, of course); otherwise you need to question everything, including the choice of parallel_for(). Perhaps all it takes is buffering everything in resident memory first: if you have the gigabytes, use them!
rolf-anders
Beginner
I ended up writing to a buffer and then using a mutex around the file writer. That works, but I kind of hoped that there was something in the TBB library that would handle the synchronization automatically. The results are calculated from GBs of input data, but the results themselves are not that big, so size is not an issue ;)
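One way to keep the mutex cheap with this approach is to lock once per chunk rather than once per result: each task formats its results into a private buffer and only takes the lock to append the whole buffer. A minimal sketch, with `writeChunkBuffered` as a hypothetical name and `std::thread` standing in for the chunks `tbb::parallel_for` would hand to each task; lines stay contiguous within a chunk, but chunks appear in completion order:

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Assumes nChunks > 0. std::thread stands in for tbb::parallel_for here.
void writeChunkBuffered(int n, int nChunks, std::ostream& out) {
    std::mutex outMutex;
    std::vector<std::thread> workers;
    const int chunk = (n + nChunks - 1) / nChunks;  // ceiling division
    for (int c = 0; c < nChunks; ++c) {
        const int begin = c * chunk;
        const int end = std::min(n, begin + chunk);
        workers.emplace_back([&, begin, end] {
            std::ostringstream local;  // private buffer: no locking while formatting
            for (int i = begin; i < end; ++i) local << i << ' ' << 2L * i << '\n';
            std::lock_guard<std::mutex> lock(outMutex);  // one short critical section per chunk
            out << local.str();
        });
    }
    for (auto& w : workers) w.join();
}
```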
RafSchietekat
Valued Contributor III
If you can hold the buffer in memory and only write it after parallel_for() has ended, why do you need a mutex?
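The reason no mutex is needed: if every iteration writes only to its own slot of a preallocated buffer, the parallel phase shares nothing mutable, and the file is written by a single thread afterwards, in iteration order. A minimal sketch, with `computeThenWrite` as a hypothetical name and `std::thread` standing in for `tbb::parallel_for`:

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// std::thread stands in for tbb::parallel_for here.
void computeThenWrite(int n, int nThreads, std::ostream& out) {
    std::vector<std::string> results(n);  // one slot per iteration, sized up front
    std::vector<std::thread> workers;
    for (int t = 0; t < nThreads; ++t) {
        workers.emplace_back([&, t] {
            // Each index is written by exactly one thread, so no lock is needed.
            for (int i = t; i < n; i += nThreads)
                results[i] = std::to_string(i) + ' ' + std::to_string(3L * i);
        });
    }
    for (auto& w : workers) w.join();                // the "parallel_for has ended" point
    for (const auto& r : results) out << r << '\n';  // single-threaded, collated output
}
```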

I think the art in designing languages and libraries is in capturing and facilitating useful patterns (it's a lot easier to do object-oriented programming in C++ than in C), not to throw in everything and the kitchen sink.