Hello,
I'm new to defined input/output procedures, and I find that the one I have written is eating up 80% of my runtime. I am saving a lot of data to disk, but this still doesn't seem right to me: I generate and process all of this data in the same program, so having the save to disk alone take 80% of the runtime seems disproportionate. I can see why my I/O procedure would involve a lot of loops and be slow; however, being new to defined I/O procedures, I'm not sure what I can do about it. Any suggestions would be greatly appreciated. The derived type I am trying to save, along with its I/O routines, is below.
Thanks,
Jamie
module Policy
   implicit none

   type sparseCOOType
      integer :: col
      integer :: row
      real    :: val
   end type sparseCOOType

   type policyType
      type(sparseCOOType), allocatable :: COO(:)
   contains
      procedure :: write_sample => write_container_sample_impl
      procedure :: read_sample  => read_container_sample_impl
      generic   :: write(unformatted) => write_sample
      generic   :: read(unformatted)  => read_sample
   end type policyType

contains

   subroutine write_container_sample_impl(this, unit, iostat, iomsg)
      class(policyType), intent(in) :: this
      integer, intent(in)           :: unit
      integer, intent(out)          :: iostat
      character(*), intent(inout)   :: iomsg
      integer :: i

      write(unit, iostat=iostat, iomsg=iomsg) size(this%COO)
      do i = 1, size(this%COO)
         write(unit, iostat=iostat, iomsg=iomsg) this%COO(i)%col
         write(unit, iostat=iostat, iomsg=iomsg) this%COO(i)%row
         write(unit, iostat=iostat, iomsg=iomsg) this%COO(i)%val
      end do
   end subroutine write_container_sample_impl

   subroutine read_container_sample_impl(this, unit, iostat, iomsg)
      class(policyType), intent(inout) :: this
      integer, intent(in)              :: unit
      integer, intent(out)             :: iostat
      character(*), intent(inout)      :: iomsg
      integer :: i, sizeCOO

      read(unit, iostat=iostat, iomsg=iomsg) sizeCOO
      ! Guard against reading into an object that is already allocated
      if (allocated(this%COO)) deallocate(this%COO)
      allocate(this%COO(sizeCOO))
      do i = 1, sizeCOO
         read(unit, iostat=iostat, iomsg=iomsg) this%COO(i)%col
         read(unit, iostat=iostat, iomsg=iomsg) this%COO(i)%row
         read(unit, iostat=iostat, iomsg=iomsg) this%COO(i)%val
      end do
   end subroutine read_container_sample_impl

end module Policy
Your current procedure writes and reads all the data elements one at a time. But the sparse matrix COO is an array of a simple derived type that contains just three intrinsic scalars. Such a type can be written directly via the default facilities, so you could write your policy object "this" in a single statement:
write(lun) this%COO
That will make the writing (and, analogously, the reading) much faster.
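Concretely, the two type-bound procedures from the original post could be condensed along these lines (a sketch; the allocated-check in the reader is an addition to guard against re-reading into the same object):

```fortran
subroutine write_container_sample_impl(this, unit, iostat, iomsg)
   class(policyType), intent(in) :: this
   integer, intent(in)           :: unit
   integer, intent(out)          :: iostat
   character(*), intent(inout)   :: iomsg

   ! One transfer for the size, one for the whole array: the runtime
   ! moves every element (with all its components) in a single call
   ! instead of three calls per element.
   write(unit, iostat=iostat, iomsg=iomsg) size(this%COO)
   if (iostat /= 0) return
   write(unit, iostat=iostat, iomsg=iomsg) this%COO
end subroutine write_container_sample_impl

subroutine read_container_sample_impl(this, unit, iostat, iomsg)
   class(policyType), intent(inout) :: this
   integer, intent(in)              :: unit
   integer, intent(out)             :: iostat
   character(*), intent(inout)      :: iomsg
   integer :: sizeCOO

   read(unit, iostat=iostat, iomsg=iomsg) sizeCOO
   if (iostat /= 0) return
   if (allocated(this%COO)) deallocate(this%COO)
   allocate(this%COO(sizeCOO))
   read(unit, iostat=iostat, iomsg=iomsg) this%COO
end subroutine read_container_sample_impl
```

This works because sparseCOOType itself has no defined I/O of its own, so this%COO is processed by the default unformatted transfer even inside the parent's defined I/O procedure.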
Thank you Arjen. I thought I couldn't do this because COO is itself allocatable, but I re-read the passage of the standard that had been quoted to me (reposted below) and made me think that, and now I believe I would only need defined I/O if one of the components of the COO elements were itself allocatable. Is that right?
That should speed it up a fair amount. My guess is that the bigger problem is the number of calls coming from the array of policy objects, but I will definitely use this.
“If a list item of derived type in an unformatted input/output statement is not processed by a defined input/output procedure (12.6.4.8), and if any subobject of that list item would be processed by a defined input/output procedure, the list item is treated as if all of the components of the object were specified in the list in component order (7.5.4.7); those components shall be accessible in the scoping unit containing the data transfer statement and shall not be pointers or allocatable. If a derived-type list item is not processed by a defined input/output procedure and is not treated as a list of its individual components, all the subcomponents of that list item shall be accessible in the scoping unit containing the data transfer statement and shall not be pointers or allocatable.”
The key point to note is that if the I/O list contains only objects of a derived type, and every component of that type is of intrinsic type, you do not need to write defined I/O procedures.
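A minimal self-contained illustration of that rule (the type and file names here are hypothetical stand-ins, not from the original code):

```fortran
program direct_io_demo
   implicit none
   ! All components are intrinsic scalars, so no defined I/O is needed.
   type coo_entry
      integer :: col, row
      real    :: val
   end type coo_entry
   type(coo_entry) :: a(3), b(3)
   integer :: lun

   a = [coo_entry(1,1,1.0), coo_entry(2,3,0.5), coo_entry(4,2,2.0)]

   open(newunit=lun, file='coo.bin', form='unformatted', &
        access='sequential', status='replace')
   write(lun) a           ! whole array in a single record
   rewind(lun)
   read(lun) b            ! read it back the same way
   close(lun, status='delete')
end program direct_io_demo
```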
Thanks Arjen, this simple change cut the runtime in half! Down to 4.5 hours.
Does setting "buffered" on the open statement help?
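For reference, BUFFERED= is an Intel Fortran extension to the OPEN statement, not standard Fortran; a sketch of how it might look on the writer's open (unit and file name hypothetical):

```fortran
! BUFFERED='YES' asks the Intel Fortran runtime to accumulate records
! in an internal buffer and flush them in larger chunks, reducing the
! number of OS-level write calls. Nonstandard (Intel) extension.
open(newunit=lun, file='policy.bin', form='unformatted', &
     access='sequential', status='replace', buffered='yes')
```

If I remember the Intel options correctly, the same behavior can also be requested globally with the -assume buffered_io compiler option or the FORT_BUFFERED environment variable, without touching the source.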
Hi Andrew,
Thanks, that's a good suggestion. I'm currently locked out of my development environment but will try this when I get in a couple of weeks.
Hello,
I'm late to this discussion, and it seems that you have already received a satisfactory answer.
The consumption of 80% of your runtime is not surprising. Your code calls the Fortran runtime library function for_write_seq three times per iteration in a tight loop. You do not state the OS, but on Windows for_write_seq would probably call WriteFile in NTDLL, which in turn calls Zw/NtWriteFile in NTOSKRNL, executing in kernel space and invoking the low-level disk driver hierarchy. A large overhead is expected in this case, and a more accurate picture can be obtained with the VTune profiler.
From reading your post I'm not sure whether the kernel-side overhead of writing the data to disk was included in that 80% figure.
--Bernard
Hello Bernard,
The 80% figure came from the VTune profiler. If memory serves, the time was consumed by the write statements inside the user-defined I/O routine, so I guess by the writes themselves and not by kernel overhead. Or is there a better way to check this in VTune?
Thanks,
Jamie
Hi Jamie,
I presume that the 80% was the total execution time, containing both the user-mode and the kernel-mode parts. I'm fairly sure the kernel-mode part will dominate the execution time anyway. By default the VTune GUI sets the counters to count both user and kernel event triggers, so it is probably, as mentioned, the total value.
Personally I use the VTune CLI, where I can tweak the performance events and their modifiers more finely. I think you can do the same in the GUI version by creating a custom analysis and choosing the relevant performance events, mainly the fixed-counter events.
"is there a better way to check this in VTune profiler?"
A hotspot analysis with kernel code cycle counting and call-stack collection should suffice to visualize the distribution of the load across user and kernel modules.
Thanks, I'll look into this and give it a try when I get access back to my development environment (I'm locked out for a few weeks).
Hi,
Looking back at your example, I came to the conclusion that the 80% mentioned may consist of user-mode processing time only. Most of the time is spent in for_write_seq and further down in WriteFile, possibly without even crossing the kernel boundary.