Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

To PACK and read or not?

Vishnu
Novice
442 Views

I have a section of my code where the main bottleneck is non-contiguous reads. So one option is to PACK the data and then read it. Packing also gives other benefits: I can reuse modules I have already written. I cannot test this very well yet, because I would need to scale my system up a bit to see appreciable effects, and there are other obstacles to doing that right now.

So my question is: is the packing operation itself equivalent in cost to one pass of non-contiguous reads over the array?
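For context, a minimal sketch of what is being compared (array names and mask values are hypothetical): PACK performs one gather pass over the scattered elements, writing them contiguously, and UNPACK scatters them back with the same mask.

```fortran
! Minimal sketch (hypothetical data) of PACK gathering scattered
! elements into a contiguous array, then UNPACK scattering them back.
program pack_demo
  implicit none
  real :: big(8)
  logical :: mask(8)
  real, allocatable :: packed(:)

  big  = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
  mask = [.true., .false., .false., .true., .false., .true., .false., .false.]

  ! PACK is one gather pass: each selected element is read once
  ! (non-contiguously) and written once, contiguously.
  packed = pack(big, mask)
  print *, packed                    ! the elements at indices 1, 4, 6

  ! UNPACK scatters the (possibly updated) values back using the same mask;
  ! unselected elements of big are left unchanged.
  big = unpack(packed * 2.0, mask, big)
  print *, big(1), big(4), big(6)
end program pack_demo
```

All subsequent references to the packed data are then contiguous, which is the point of the question: the pack itself costs one non-contiguous pass, so it pays off when the packed data is read more than once.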

6 Replies
jimdempseyatthecove
Honored Contributor III

Can you show or describe the non-contiguous nature (and if you are referring to file or array reads)? Use of PACK implies array.

After PACK, how many times will the packed data be referenced by your procedures?
Will you also return the data using UNPACK?

Would reorganizing your data, to not require PACK, be more beneficial?

Jim Dempsey

Vishnu
Novice

jimdempseyatthecove wrote:

Can you show or describe the non-contiguous nature (and if you are referring to file or array reads)? Use of PACK implies array.

Yes, it is an array. The PACKing will be in a SUBROUTINE inside a MODULE, of which the array will be an INTENT(IN) argument.

It is non-contiguous in that the few elements I am reading may be spread anywhere in this large array. The ratio of the number of elements read to the size of the array can easily be as low as 10^(-5) to 10^(-6).

jimdempseyatthecove wrote:

After PACK, how may times will the packed data be referenced by your procedures?

Each element of the PACKed array will be read/referenced about SIZE(packed_array) times. The same number of reads would be required even if the elements were left scattered, non-contiguous.

jimdempseyatthecove wrote:

Will you also return the data using UNPACK?

Yes, a new array of same size as the packed one, of INTENT(OUT) in the SUBROUTINE, will be UNPACKed with the same mask.
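A sketch of the subroutine shape described above (module, subroutine, and the doubling "compute" step are all placeholder names, not the actual code):

```fortran
! Sketch of the pack/compute/unpack pattern described in this thread.
! The names and the trivial compute kernel are placeholders.
module pack_work_mod
  implicit none
contains
  subroutine process(a, mask, b)
    real,    intent(in)  :: a(:)      ! large source array
    logical, intent(in)  :: mask(:)   ! selects the scattered elements
    real,    intent(out) :: b(:)      ! same size as a
    real, allocatable :: packed(:)
    integer :: i

    packed = pack(a, mask)            ! one gather over the scattered elements

    ! Compute on the contiguous packed data (placeholder: double each value).
    do i = 1, size(packed)
       packed(i) = 2.0 * packed(i)
    end do

    ! Scatter back with the same mask; unselected positions get 0.0 here.
    b = unpack(packed, mask, 0.0)
  end subroutine process
end module pack_work_mod
```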

jimdempseyatthecove wrote:

Would reorganizing your data, to not require PACK, be more beneficial?

Yes, that is an option I am considering, but apart from requiring me to rewrite lots of other modules, its benefits over this approach are not very clear to me (due to certain complexities of the problem).

jimdempseyatthecove
Honored Contributor III

>>Each element of the PACKed array will be read/referenced about SIZE(packed_array) times.

If SIZE(packed_array) is more than a handful of elements, say > 100, then there would be little benefit to reorganizing your data.

If your usage pattern repeats several/many times, it may be of use to sort the array such that the otherwise PACKed sections are in contiguous sections of the resultant array. Then use Array(From:To) to select each subsection.
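To illustrate the suggestion (with made-up data and a made-up permutation): reorder the array once so the elements that would otherwise be PACKed every iteration occupy a leading contiguous block, then take plain slices.

```fortran
! Sketch (hypothetical data): sort/permute once so the "hot" elements
! form a contiguous block, then use Array(From:To) instead of PACK.
program reorder_demo
  implicit none
  real    :: a(6), sorted(6)
  integer :: perm(6), i, nsel

  a = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
  ! Permutation placing the frequently used elements (here 2, 5, 6) first.
  perm = [2, 5, 6, 1, 3, 4]
  nsel = 3

  do i = 1, size(a)
     sorted(i) = a(perm(i))
  end do

  ! Every repeated use is now a contiguous slice, not a gather:
  print *, sorted(1:nsel)
end program reorder_demo
```

The one-time cost of building the permutation is amortized over every later iteration that would otherwise have re-packed.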

When you pack the data, do the packing such that it is vector friendly as well as cache line friendly.

Jim Dempsey

Vishnu
Novice

jimdempseyatthecove wrote:

If the SIZE(packed_array) is more than a handful of elements, say > 100 then there would be little benefit to reorganizing your data.

Why is that so? I would have thought that it is precisely when there are a lot of elements that packing would be useful, because if they are not PACKed, their random placement in the larger array makes the reads increasingly inefficient. Also, is a PACKing operation as computationally intensive as one (non-contiguous) read of each element? If so, I think there will be an improvement, because otherwise each element would be read SIZE(packed_array) times, non-contiguously.

jimdempseyatthecove wrote:

If your usage pattern repeats several/many times, it may be of use to sort the array such that the otherwise PACKed sections are in contiguous sections of the resultant array. Then use Array(From:To) to select each subsection.

Yes, this action does repeat lots of times, but, as I mentioned earlier, due to certain complexities of the problem, this is not trivially the best solution, and requires some analysis.

jimdempseyatthecove
Honored Contributor III

>>Why is that so?

My interpretation of "reorganize your data" is such that PACK becomes unnecessary (the data is maintained in a PACKed manner). You stated that reorganizing your data would affect many files. When a sizable number of elements is referenced many times (once or many times by each of your many files), the overhead of performing the PACK is negligible. Rewriting your many files may introduce errors while returning only a small percentage improvement.

When you structure your PACK-ing, you would want to pack in a manner that is vector friendly as well as cache line friendly. If your algorithm is not vector friendly, it may benefit from being cache line friendly.

If your PACKing time is a substantial portion of the compute time .AND. if your program is such that you are performing:

loop:
   Pack part of the data
   Compute using packed data
   Unpack (same) part of the data
end loop

Consider using multiple threads: one to pack the data, one or more to compute on the packed data, one to unpack the data.

This will better utilize your memory bandwidth as well as potentially the cache utilization.
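A rough sketch of the overlap idea (not a drop-in solution; block sizes, names, and the doubling kernel are placeholders, and a simple block copy stands in for the gather/scatter): double-buffer so that packing the next block proceeds concurrently with compute on the current one, here via OpenMP sections.

```fortran
! Sketch: overlap "pack next block" with "compute current block" using
! double buffering and OpenMP sections. Without OpenMP enabled, the
! directives are comments and the loop runs correctly in serial.
program pipeline_sketch
  implicit none
  integer, parameter :: nblocks = 4, blk = 1000
  real :: data(blk, nblocks), out(blk, nblocks)
  real :: buf(blk, 2)               ! double buffer
  integer :: ib, cur, nxt

  call random_number(data)
  buf(:, 1) = data(:, 1)            ! "pack" (here: copy) the first block

  do ib = 1, nblocks
     cur = mod(ib - 1, 2) + 1
     nxt = mod(ib, 2) + 1
     !$omp parallel sections
     !$omp section
     ! Compute on the current packed block (placeholder kernel).
     buf(:, cur) = 2.0 * buf(:, cur)
     !$omp section
     ! Meanwhile, "pack" the next block into the other buffer.
     if (ib < nblocks) buf(:, nxt) = data(:, ib + 1)
     !$omp end parallel sections
     out(:, ib) = buf(:, cur)       ! "unpack" (scatter) the result
  end do
end program pipeline_sketch
```

The two sections touch different halves of the double buffer, so there is no race; with enough memory bandwidth, the gather cost largely hides behind the compute.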

Jim Dempsey

Vishnu
Novice

Thanks for that, Jim. Yes, I will be looking into reorganizing the data.

Yes, that is the rough structure of my loop & data access, and I am using OpenMP (and later MPI) to parallelize.
