Solved: Performance of Defined Input/Output Procedure

hentall_maccuish__ja · ‎08-12-2020

Hello,

I’m new to defined input/output procedures and I find that the one I have written is eating up 80% of my runtime. I’m trying to save a lot of data to disk, but this still doesn’t seem correct to me; I am generating and processing all this data in the program as well, so just saving it to disk taking 80% of the runtime seems disproportionate. I can see why my I/O procedure would induce a lot of loops and be slow; however, being new to defined I/O procedures, I’m not sure what I can do about it. Any suggestions would be greatly appreciated. The defined type I am trying to save, with its routines, is below.

Thanks,

Jamie

    module Policy
        implicit none

        type sparseCOOType
            integer :: col
            integer :: row
            real :: val
        end type sparseCOOType

        type policyType
            type(sparseCOOType), allocatable :: COO(:)
        contains
            procedure :: write_sample => write_container_sample_impl
            procedure :: read_sample  => read_container_sample_impl

            generic   :: write(unformatted) => write_sample
            generic   :: read(unformatted) => read_sample
        end type policyType

    contains

        subroutine write_container_sample_impl(this, unit, iostat, iomsg)
            class(policyType), intent(in)    :: this
            integer, intent(in)         :: unit
            integer, intent(out)        :: iostat
            character(*), intent(inout) :: iomsg
            integer :: i

            ! Element count first, then three separate writes per element.
            write(unit, iostat=iostat, iomsg=iomsg) size(this%COO)
            do i=1,size(this%COO)
                write(unit, iostat=iostat, iomsg=iomsg) this%COO(i)%col
                write(unit, iostat=iostat, iomsg=iomsg) this%COO(i)%row
                write(unit, iostat=iostat, iomsg=iomsg) this%COO(i)%val
            end do
        end subroutine write_container_sample_impl

        subroutine read_container_sample_impl(this, unit, iostat, iomsg)
            class(policyType), intent(inout) :: this
            integer, intent(in)         :: unit
            integer, intent(out)        :: iostat
            character(*), intent(inout) :: iomsg
            integer :: i, sizeCOO

            ! Element count first, then three separate reads per element.
            read(unit, iostat=iostat, iomsg=iomsg) sizeCOO
            allocate(this%COO(sizeCOO))

            do i=1,sizeCOO
                read(unit, iostat=iostat, iomsg=iomsg) this%COO(i)%col
                read(unit, iostat=iostat, iomsg=iomsg) this%COO(i)%row
                read(unit, iostat=iostat, iomsg=iomsg) this%COO(i)%val
            end do

        end subroutine read_container_sample_impl
    end module Policy

Arjen_Markus · ‎08-12-2020

Your current procedure writes and reads all the data elements in turn. But the sparse matrix COO is an array of a simple derived type that just contains three scalars. Such a type can be written directly via the default facilities. So, you could write your policy object "this" via:

    write(lun) this%COO

That will make the writing (and similarly the reading) much faster.
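As a minimal sketch of that suggestion (not the original poster's final code), the defined I/O procedures could transfer the whole COO array in one statement each. The names `policy_bulk_io`, `write_bulk`, `read_bulk` and the file name `policy_demo.bin` are made up for this illustration. The element count is still written first so that the reader can allocate, and the sketch assumes COO is allocated whenever an object is written. The `iostat` and `iomsg` dummy arguments are kept because they are part of the interface the standard requires for a defined I/O procedure.

```fortran
module policy_bulk_io
    implicit none

    type sparseCOOType
        integer :: col
        integer :: row
        real    :: val
    end type sparseCOOType

    type policyType
        type(sparseCOOType), allocatable :: COO(:)
    contains
        procedure :: write_bulk
        procedure :: read_bulk
        generic :: write(unformatted) => write_bulk
        generic :: read(unformatted)  => read_bulk
    end type policyType

contains

    subroutine write_bulk(this, unit, iostat, iomsg)
        class(policyType), intent(in)    :: this
        integer,           intent(in)    :: unit
        integer,           intent(out)   :: iostat
        character(*),      intent(inout) :: iomsg

        ! Store the element count first so the reader knows how much to allocate.
        ! Assumption: this%COO is allocated.
        write(unit, iostat=iostat, iomsg=iomsg) size(this%COO)
        if (iostat /= 0) return
        ! One transfer for the whole array instead of three per element.
        write(unit, iostat=iostat, iomsg=iomsg) this%COO
    end subroutine write_bulk

    subroutine read_bulk(this, unit, iostat, iomsg)
        class(policyType), intent(inout) :: this
        integer,           intent(in)    :: unit
        integer,           intent(out)   :: iostat
        character(*),      intent(inout) :: iomsg
        integer :: n

        read(unit, iostat=iostat, iomsg=iomsg) n
        if (iostat /= 0) return
        if (allocated(this%COO)) deallocate(this%COO)
        allocate(this%COO(n))
        read(unit, iostat=iostat, iomsg=iomsg) this%COO
    end subroutine read_bulk

end module policy_bulk_io

program demo_bulk_io
    use policy_bulk_io
    implicit none
    type(policyType) :: p(2), q(2)
    integer :: lun, i

    ! Two small policy entries with different numbers of non-zeros.
    allocate(p(1)%COO(3), p(2)%COO(1))
    p(1)%COO = [(sparseCOOType(i, 2*i, real(i)), i = 1, 3)]
    p(2)%COO = [sparseCOOType(7, 8, 9.0)]

    open(newunit=lun, file='policy_demo.bin', form='unformatted', &
         status='replace', action='write')
    write(lun) p        ! defined I/O is invoked once per array element
    close(lun)

    open(newunit=lun, file='policy_demo.bin', form='unformatted', status='old')
    read(lun) q
    close(lun, status='delete')

    ! Self-check: stop with an error if the round trip changed anything.
    if (size(q(1)%COO) /= 3 .or. size(q(2)%COO) /= 1) error stop 'size mismatch'
    if (any(q(1)%COO%col /= p(1)%COO%col)) error stop 'col mismatch'
    if (any(q(1)%COO%val /= p(1)%COO%val)) error stop 'val mismatch'
    if (q(2)%COO(1)%row /= 8) error stop 'row mismatch'
    print *, 'round trip OK'
end program demo_bulk_io
```

The calling code stays the same as before: a single `write (unit) policy_array` still works, because the generic bindings route each element through the bulk routines.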

mecej4 · ‎08-12-2020

When it comes to performance questions, one is able to provide useful guidance only in the context of what_actions/how_often/in_what_way. You showed the code for the module, but provided little information regarding how that module is used, how much data is generated and written, etc.

If your program did nothing more than storing and restoring the row, col and val arrays using your defined I/O procedures, it is quite reasonable for the I/O to take most of the run time. It is only when we compare the I/O time with the rest of the time spent (in doing other useful things) that we can decide whether the I/O is taking up more than a reasonable fraction of the run time.

hentall_maccuish__ja · ‎08-12-2020

The I/O procedure is called 16 times to save large allocatable arrays of type policyType with dimensions ranging from (30,8,1,1,10) to (30,8,3003,11,10). I have included below the open, write, close snippet that makes those 16 calls.

Beyond that, I don’t see how any additional information is relevant to diagnosing the efficiency of the I/O procedure. All the code for the I/O procedure is posted, as is all the information about the calls to it. When I gave the 80% figure it was meant to be indicative, and I thought you would assume some average level of computational complexity behind the data I am trying to save, not that I might be generating large amounts of random numbers and saving them to disk. Also, whether I am right or wrong about 80% of runtime being very high for saving data, whether there is a way to make the code more efficient is a separate issue that can be addressed without this information. I did not post additional information about the program because leaving it out seemed to make my issue as clear as possible. That said, if you think it may be useful, here is an attempt at a summary of how the data is generated. Each entry in the allocatable array of type policyType is the solution to a sequential quadratic programming problem, the input function values to which have to be found by fixed point iteration, and which is itself embedded in an outer fixed point iteration.

Thanks,

Jamie

    open (unit=201,form="unformatted", file=outfile, status='unknown', action='write')
    write (201)  modelObjects%policy
    close( unit=201)

mecej4 · ‎08-12-2020

I suspect that there are two reasons for the I/O taking up much CPU time.

The first is that you have a DO loop with an iteration count as large as 30*8*3003*11*10.

Each iteration of the loop writes or reads 12 bytes of "payload" as 3 records plus 24 bytes of record length markers. For each such record, your defined I/O method gets called, with 4+ arguments on the stack. That is a lot of overhead.

In addition, you include the optional arguments IOSTAT= and IOMSG=. Finding and stuffing in the values to return, even the zero and blank that are normally expected, is an additional overhead.

All this is done 16 times, as you wrote.

Is it possible to restructure the work so that the entire col array is written with one WRITE? Similarly for the row and val arrays?
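One way to get a feel for the difference is to time both styles directly. The following is only a rough sketch under stated assumptions: the array length `n`, the file names, and the program name are arbitrary choices for illustration, and the WRITE statements here are ordinary ones outside any defined I/O procedure, so each produces its own record. Absolute timings depend on the compiler, its buffering settings, and the disk, so the numbers are indicative at best.

```fortran
program time_write_styles
    use iso_fortran_env, only: int64
    implicit none
    integer, parameter :: n = 1000000      ! hypothetical element count
    integer, allocatable :: col(:)
    integer :: lun, i
    integer(int64) :: t0, t1, t2, rate

    allocate(col(n))
    do i = 1, n
        col(i) = i
    end do

    call system_clock(t0, rate)

    ! Style 1: one WRITE statement per element.
    open(newunit=lun, file='elementwise_demo.bin', form='unformatted', &
         status='replace', action='write')
    do i = 1, n
        write(lun) col(i)
    end do
    close(lun, status='delete')
    call system_clock(t1)

    ! Style 2: the entire array in a single WRITE statement.
    open(newunit=lun, file='whole_demo.bin', form='unformatted', &
         status='replace', action='write')
    write(lun) col
    close(lun, status='delete')
    call system_clock(t2)

    print '(a, f8.3, a)', 'one WRITE per element: ', real(t1 - t0) / real(rate), ' s'
    print '(a, f8.3, a)', 'one WRITE for array:   ', real(t2 - t1) / real(rate), ' s'
end program time_write_styles
```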

hentall_maccuish__ja · ‎08-12-2020

Great thanks, I will remove IOSTAT= and IOMSG= as a start. I had wondered if removing these might help a bit but hadn’t tried yet. It’s good to have that confirmed.

“Is it possible to restructure the work so that the entire col array is written with one WRITE? Similarly for the row and val arrays?” I don’t know; that was the kind of question I was trying to get at when I said I could see why my code would imply lots of loops but didn’t know how to change it. This is my first user defined I/O procedure, so I don’t know what options are available there. As I have written it, it seems that there is one call for each entry in the array of type policyType, so I have access to just the object COO. Are there other options for the read/write routine that would pass me the whole object?

If there aren’t other options with the I/O procedure itself, I can’t see how to reorganise the data to be able to write en masse. COO represents a sparse matrix, so it will have a variable size for each entry in the array of type policyType, and I need to make sure the values of each COO for each entry in the main array are kept separate. Previously I didn’t use a user defined policyType; it was just an allocatable array with a couple of extra dimensions (i.e. I wasn’t taking advantage of the sparsity of the matrix), which was much easier to write to disk but exhausted my memory for the largest arrays.

mecej4 · ‎08-12-2020

You have selected the (row, col, val) triplet as your basic entity. It gives you great flexibility in the parts of the code where you may visit row and col values in arbitrary order, computing the corresponding val. On the other hand, there is no way of storing information regarding the relation(s) of one triplet to the myriads of other triplets.

We could, instead, have selected as basic entity a sparse matrix type, with (n, nnz, row(:), col(:), val(:)) as our basic entity. In that case, the I/O would be done without any defined I/O procedures, and we would process a moderate number of large records, instead of processing a huge number of tiny records using defined I/O procedures, as you are doing now.

That brings me back to asking the kind of questions that you may not like: What advantages does type sparseCOOType give you in the portions of the code that you have not shown but have described rather tersely? How difficult would it be to define a sparse matrix type of the type that I mentioned, and use that instead of the disjointed triplets in those portions?

Arjen_Markus · ‎08-12-2020

I have only followed the discussion from afar, but would it be a solution to gather the triplets into arrays as suggested by mecej4 and then write them to file (and on input revert the process)? That would reduce the number of reads/writes, while increasing the memory usage.

Another possibility: write to a memory-based file first and then dump its contents to a file on disk.

hentall_maccuish__ja · ‎08-12-2020

Hello Arjen,

Increasing memory usage is a no-go as I'm pretty much at the limit there. I don't follow the memory-based file suggestion; is the suggestion just to work with a binary stream inside the program? If so, that sounds really messy.

Thanks,

Jamie

hentall_maccuish__ja · ‎08-12-2020

The sparseCOOType gives no real advantage in the very long, very messy section of the code I have described tersely; that’s why I described it tersely. I just don’t know what alternative basic entity structures fit my data and would be more efficient for I/O. This is probably at least partly because I do not clearly understand when a user defined I/O procedure is required and when it is not. What I am struggling with is the fact that storing a sparse array needs the size to be variable, which from my limited understanding means I need a user defined I/O procedure.

“We could, instead, have selected as basic entity a sparse matrix type, with (n, nnz, row(:), col(:), val(:)) as our basic entity. In that case, the I/O would be done without any defined I/O procedures, and we would process a moderate number of large records, instead of processing a huge number of tiny records using defined I/O procedures, as you are doing now.” That sounds great, but I’m not sure I follow. What are n and nnz here? Are you suggesting I use compressed sparse row (or column) storage, where nnz is the number of non-zeros and n collapses all the other dimensions I have in the array of type policyType? If so, I have coded up below what I understand you to be saying, and it gives me the error “error #5514: A derived type I/O list item that contains a pointer or an allocatable component (ROW) requires a user-defined derived-type input/output procedure.” So it doesn’t appear to be saveable without a defined I/O procedure, but maybe this wasn’t what you were suggesting.

    program scratch
    implicit none

    type policyType
        integer :: n
        integer :: nnz
        integer, allocatable :: row(:)
        integer, allocatable :: col(:)
        real, allocatable :: val(:)
    end type policyType

    type (policyType):: policy

    open (unit=201,form="unformatted", file="outputFile", status='unknown', action='write')
    write (201)  policy
    close( unit=201)

    end program

hentall_maccuish__ja · ‎08-12-2020

EDIT: What I wrote here is wrong but I'll leave for posterity.

I just thought even if I hadn't misunderstood you and I can't save without a user defined routine this format seems like it would be much more efficient than the one I have now as I would only have one call to the user defined I/O. That's great, thanks. Being able to save without user defined I/O would be even better!

mecej4 · ‎08-12-2020

Assuming that the arrays have been allocated and values filled in properly, just use

    write (201)  policy%n,policy%nnz,policy%row,policy%col,policy%val

instead of

    write (201) policy

Depending on how the work is structured, you may have to write (and read) two records per matrix:

    write (201) policy%n,policy%nnz
    write (201) policy%row,policy%col,policy%val

for facilitating the subsequent READ, with allocation of arrays between the two READs.
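The matching read side of the two-record layout might look like the sketch below. The variable names `saved` and `loaded`, the file name, and the small 3 x 3 example values are invented for illustration; the point is only the order of operations: read the header record, allocate from `nnz`, then read the data record.

```fortran
program read_two_records
    implicit none

    type spMatType
        integer :: n = 0
        integer :: nnz = 0
        integer, allocatable :: row(:)
        integer, allocatable :: col(:)
        real,    allocatable :: val(:)
    end type spMatType

    type(spMatType) :: saved, loaded
    integer :: lun

    ! Small made-up matrix: 3 x 3 with 4 non-zero entries.
    saved%n   = 3
    saved%nnz = 4
    allocate(saved%row(4), saved%col(4), saved%val(4))
    saved%row = [1, 1, 2, 3]
    saved%col = [1, 3, 2, 3]
    saved%val = [1.0, -2.0, 5.0, 4.0]

    open(newunit=lun, file='spmat_demo.bin', form='unformatted', &
         status='replace', action='write')
    write(lun) saved%n, saved%nnz                  ! record 1: sizes
    write(lun) saved%row, saved%col, saved%val     ! record 2: data
    close(lun)

    open(newunit=lun, file='spmat_demo.bin', form='unformatted', status='old')
    read(lun) loaded%n, loaded%nnz                 ! record 1 tells us how much to allocate
    allocate(loaded%row(loaded%nnz), loaded%col(loaded%nnz), loaded%val(loaded%nnz))
    read(lun) loaded%row, loaded%col, loaded%val   ! record 2 fills the arrays
    close(lun, status='delete')

    ! Self-check: stop with an error if the round trip changed anything.
    if (loaded%n /= saved%n .or. loaded%nnz /= saved%nnz) error stop 'header mismatch'
    if (any(loaded%row /= saved%row) .or. any(loaded%col /= saved%col)) error stop 'index mismatch'
    if (any(loaded%val /= saved%val)) error stop 'value mismatch'
    print *, 'round trip OK'
end program read_two_records
```

Splitting the sizes into their own record is what makes the allocation possible: with everything in one record, the array lengths would not be known at the time the READ statement needs them.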

hentall_maccuish__ja · ‎08-12-2020

Sorry, I'm still unclear about how you are suggesting I store the data in this solution. Currently I have an array policy(:), and for each entry in policy I have a sparse matrix saved as COO. With the code I wrote interpreting your suggestion, I need one object of type policyType for each of the entries in the previous array, only now they are stored as CSR. So there are still multiple calls to write (although now they don't need a user defined routine), and I no longer understand what the point of n is.

mecej4 · ‎08-12-2020

Perhaps this sketch of a program to form and dump two square matrices will help.

n = number of rows = number of columns

nnz = number of non-zero entries in matrix

    program scratch
    implicit none

    type spMatType
        integer :: n
        integer :: nnz
        integer, allocatable :: row(:)
        integer, allocatable :: col(:)
        real, allocatable :: val(:)
    end type spMatType

    type (spMatType):: mat1, mat2
    
    mat1%n = 5
    mat1%nnz = 13
    allocate(mat1%row(mat1%nnz),mat1%col(mat1%nnz),mat1%val(mat1%nnz))
    mat1%row = [1,1,1, 2,2, 3,3,3, 4,4,4, 5,5]
    mat1%col = [1,2,3, 1,2, 3,4,5, 1,3,4, 2,5]
    mat1%val = [1.0,-1.0,-3.0, -2.0,5.0, 4.0,6.0,4.0, -4.0,2.0,7.0, 8.0,-5.0]

! ...
! similar code to assign values for sparse matrix mat2
! ...
    open (unit=201,form="unformatted", file="outputFile", status='unknown', action='write')
 
    write (201) mat1%n, mat1%nnz
    write (201) mat1%row, mat1%col, mat1%val
    
    write (201) mat2%n, mat2%nnz
    write (201) mat2%row, mat2%col, mat2%val
! 
! pairs of WRITE statements for mat3, etc.
!    
    close( unit=201)

    end program

hentall_maccuish__ja · ‎08-12-2020

OK, understood. I thought you were saying I could have one single object of type spMatType for all my sparse matrices, but I need one object for each sparse matrix. In that case I understand, but the number of objects I will need and the number of write statements will still be very large. The advantage is in removing the loop at the object COO level. Does that seem right? Thanks

mecej4 · ‎08-12-2020

Instead of variables mat1, mat2, etc., you can declare an array mat(5), say, and use mat(1) in place of mat1, mat(2) in place of mat2, and so on.

hentall_maccuish__ja · ‎08-12-2020

Yes, I understood that, but that is still as many write statements as there are elements in mat(:), which would be just as many calls as I currently have to my user defined routine. This makes me think the gains would be pretty much the same as with the suggestion of replacing the loop in my I/O routine with write(lun) this%COO, but that the amount of code rewriting is greater.

mecej4 · ‎08-12-2020

Originally, you were writing three I/O records for each element of each matrix. Instead, now you could be writing two I/O records for each matrix.

You may also be able to write code such as

    do i = 1, n_mat
       write (201) mat(i)%n,   mat(i)%nnz
       write (201) mat(i)%row, mat(i)%col, mat(i)%val
    end do

if all the information is formed and collected before dumping to file.
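To show how such a loop pairs with a read loop, here is a hedged sketch that stores the number of matrices as an extra leading record and deletes the file once it has been read back, in the spirit of using the disk as scratch space. The module name `spmat_file_io`, the routines `write_all` and `read_all`, the extra count record, and the example values are all assumptions made for this illustration, not something taken from the code discussed in the thread.

```fortran
module spmat_file_io
    implicit none

    type spMatType
        integer :: n = 0
        integer :: nnz = 0
        integer, allocatable :: row(:)
        integer, allocatable :: col(:)
        real,    allocatable :: val(:)
    end type spMatType

contains

    subroutine write_all(fname, mat)
        character(*),    intent(in) :: fname
        type(spMatType), intent(in) :: mat(:)
        integer :: lun, i

        open(newunit=lun, file=fname, form='unformatted', &
             status='replace', action='write')
        write(lun) size(mat)                 ! leading record: how many matrices follow
        do i = 1, size(mat)
            write(lun) mat(i)%n, mat(i)%nnz
            write(lun) mat(i)%row, mat(i)%col, mat(i)%val
        end do
        close(lun)
    end subroutine write_all

    subroutine read_all(fname, mat)
        character(*),                 intent(in)  :: fname
        type(spMatType), allocatable, intent(out) :: mat(:)
        integer :: lun, i, n_mat, nnz

        open(newunit=lun, file=fname, form='unformatted', status='old')
        read(lun) n_mat
        allocate(mat(n_mat))
        do i = 1, n_mat
            read(lun) mat(i)%n, mat(i)%nnz
            nnz = mat(i)%nnz
            allocate(mat(i)%row(nnz), mat(i)%col(nnz), mat(i)%val(nnz))
            read(lun) mat(i)%row, mat(i)%col, mat(i)%val
        end do
        close(lun, status='delete')          ! scratch use: remove the file once read
    end subroutine read_all

end module spmat_file_io

program demo_spmat_array
    use spmat_file_io
    implicit none
    type(spMatType) :: mat(2)
    type(spMatType), allocatable :: back(:)

    ! Two made-up matrices with different numbers of non-zeros.
    mat(1)%n = 3
    mat(1)%nnz = 2
    allocate(mat(1)%row(2), mat(1)%col(2), mat(1)%val(2))
    mat(1)%row = [1, 3]
    mat(1)%col = [2, 3]
    mat(1)%val = [1.5, -2.0]

    mat(2)%n = 4
    mat(2)%nnz = 1
    allocate(mat(2)%row(1), mat(2)%col(1), mat(2)%val(1))
    mat(2)%row = [4]
    mat(2)%col = [1]
    mat(2)%val = [7.0]

    call write_all('spmat_array_demo.bin', mat)
    call read_all('spmat_array_demo.bin', back)

    ! Self-check: stop with an error if the round trip changed anything.
    if (size(back) /= 2) error stop 'count mismatch'
    if (back(1)%nnz /= 2 .or. back(2)%nnz /= 1) error stop 'nnz mismatch'
    if (any(back(1)%col /= mat(1)%col)) error stop 'col mismatch'
    if (any(back(1)%val /= mat(1)%val)) error stop 'val mismatch'
    if (back(2)%row(1) /= 4) error stop 'row mismatch'
    print *, 'round trip OK'
end program demo_spmat_array
```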

hentall_maccuish__ja · ‎08-12-2020

In regard to your first point: I understand the performance advantage over my original solution, but I don’t understand whether there is one over simply replacing my original I/O routine with:

    subroutine write_container_sample_impl(this, unit, iostat, iomsg)
    class(policyType), intent(in)    :: this
    integer, intent(in)         :: unit
    integer, intent(out)        :: iostat
    character(*), intent(inout) :: iomsg
    integer :: i
    
    write(unit, iostat=iostat, iomsg=iomsg) this%COO

    end subroutine write_container_sample_impl

as suggested above. My best guess is there isn’t as this seems to be a single I/O record per matrix.

Regarding the second point, are you saying that the compiler would be better able to optimize the loop in

    do i = 1, n_mat
       write (201) mat(i)%n,   mat(i)%nnz
       write (201) mat(i)%row, mat(i)%col, mat(i)%val
    end do

than

    write (201) policy

where policy is an array of size n_mat with the I/O as above? If so, couldn't I get the same advantage from doing

    do i1=1,n1
        do i2=1,n2
            do i3=1,n3
                do i4=1,n4
                    do i5=1,n5
                        write (201) policy(i1,i2,i3,i4,i5)%coo
                    end do
                end do
            end do            
        end do
    end do

which requires less rewriting of code?

JohnNichols · ‎08-12-2020

    do i1=1,n1
        do i2=1,n2
            do i3=1,n3
                do i4=1,n4
                    do i5=1,n5
                        write (201) policy(i1,i2,i3,i4,i5)%coo
                    end do
                end do
            end do
        end do
    end do

At a pinch, when you are developing a new program, you might do this. But, based on a reasonable assumption about the other code, writing will consume a lot of time. The real question is why you are writing: is there a better way to use the data in the program, or is this a permanent write to an output file?

If I were doing something like this, I would open a SQL database and shove the whole thing in as a blob.

hentall_maccuish__ja · ‎08-13-2020

Hi John,
This is being saved as unformatted, so it is already a binary blob; surely formatting it as an SQL database would only add overhead. Given I have no need for this data to be in a database, I don't see the advantage.
The only reason to save to disk is to get around insufficient RAM (pretty much every machine I might run this on will have an order of magnitude more hard drive space than RAM, so it is not a question of going to a better machine). I then read the files one by one back into the same program to run simulations on the results, and delete the files. So there is no reason for SQL, and I want the files to be in the format that Intel Fortran will work fastest with, which presumably is its own unformatted binary.
Thanks,
Jamie