Solved: Here is something that works

mecej4 · ‎09-09-2015

The 16.0 compiler issues the following warning message about a structure:

..\include\dmumps_struc.h(14): warning #6379: The structure contains one or more misaligned fields.   [DMUMPS_STRUC]
      TYPE DMUMPS_STRUC

I would like to make this message go away by inserting padding or rearranging the sequence, but in this case the declaration of the structure (i.e., user-defined type) spans 265 lines, so the number of suspects is large.

It would help if the compiler flagged (or flagged when requested) the particular structure members that caused the misalignment.

An older version of the file in question: http://mumps.sourcearchive.com/documentation/4.9.2.dfsg/dmumps__struc_8h_source.html .

TimP · ‎09-10-2015

I'm guessing that you're not looking for alignments suitable for vectorization of short (or longer) arrays, and that your message may be about failing to get 8-byte alignment of real64 and the like. As the build procedure probably has to work for combinations such as ifort and MSVC, there's no chance of automatic padding working, thus SEQUENCE must be in effect, and padding must be explicit. As Jim said, the traditional rule to avoid padding is to order the elements in decreasing order of size. The tradition goes back to f77 and earlier, when putting single byte elements in the same structure with larger ones wasn't supported by standards. Even then, the different padding treatments of real(10) by various compilers were among the reasons for many Fortrans not supporting it.
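To make the "decreasing order of size" rule concrete, here is a minimal sketch. The two types and their member names are hypothetical (not taken from MUMPS), and the offsets are measured with C_LOC from ISO_C_BINDING rather than the LOC extension. The printed values depend on the compiler and its alignment options, so none are quoted here.

```fortran
program member_order
    use iso_c_binding, only: c_loc, c_ptr, c_intptr_t
    implicit none

    ! Hypothetical type: a 4-byte member precedes an 8-byte member.
    ! If the compiler packs SEQUENCE types, d lands on a 4-byte boundary.
    type mixed_order
        sequence
        integer(4) :: flag
        real(8)    :: d
        integer(4) :: cnt
    end type mixed_order

    ! Same members, largest first: no padding is needed for natural alignment.
    type sorted_order
        sequence
        real(8)    :: d
        integer(4) :: flag
        integer(4) :: cnt
    end type sorted_order

    type(mixed_order),  target :: a
    type(sorted_order), target :: b

    print *, 'mixed order:  offset of d    =', byte_offset(c_loc(a%flag), c_loc(a%d))
    print *, 'mixed order:  offset of cnt  =', byte_offset(c_loc(a%flag), c_loc(a%cnt))
    print *, 'sorted order: offset of flag =', byte_offset(c_loc(b%d), c_loc(b%flag))
    print *, 'sorted order: offset of cnt  =', byte_offset(c_loc(b%d), c_loc(b%cnt))

contains

    ! Distance in bytes between two addresses obtained from C_LOC.
    integer(c_intptr_t) function byte_offset(base, member)
        type(c_ptr), intent(in) :: base, member
        byte_offset = transfer(member, 0_c_intptr_t) - transfer(base, 0_c_intptr_t)
    end function byte_offset

end program member_order
```

If the offset of d in the mixed-order type is not a multiple of 8, that member is the kind of field the warning is about.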

For 32-bit Windows, mis-aligned 64-bit data used to be routine, and throwing a warning by default would have been annoying. Some programmers claimed that attempts to get alignment were in violation of the ABI. Microsoft even advertised a feature whereby their compiler performed all 64-bit moves, including doubles, by pairs of 32-bit integer operations, to mitigate the performance consequences.

I was just looking into whether Windows gcc can support 32-byte alignment. Apparently, the answer is no, even for 64-bit mode, although I didn't try reconfiguring ld for that purpose.

ifort still doesn't support 32-byte alignment of COMMON (to my knowledge), although Intel compilers do support it for individual arrays, and this was on the wish list from the time when 32-byte alignment became important.

Kevin_D_Intel · ‎09-12-2015

I submitted a feature enhancement requesting inclusion of names of structure members causing the misalignment in the warning #6379.

(Internal tracking id: DPD200376029)

mecej4 · ‎09-12-2015

Thanks, Kevin. That will be quite helpful.

jimdempseyatthecove · ‎09-12-2015

R.O.

There is no assurance that the size of a POINTER array descriptor will remain the same over time.

Though it is highly probable that the size of a POINTER to an array descriptor will be a multiple of the architecture C_INTPTR_T (32-bit, 64-bit, other-bit), there is no assurance that the number of such units will remain the same.

The intrinsic SIZEOF does not return the size of the POINTER array descriptor; it returns the size of that which it points to.

There is no assurance that a previously declared user-defined type containing a pointer to an array (of the same rank) can be used with SIZEOF to determine the size of a pointer to an array (of the same rank), though it may work today. IOW, declare a type containing SEQUENCE and xxx, POINTER :: x(your rank here), then use SIZEOF(that type) in determining the pad amount.

Therefore, there is no (universal) compile time way to know the size of the POINTER.

Of the above assumptions, the safest may be to assume the size of the POINTER array descriptor is some multiple of INTEGER. Therefore the type layout should be

non-allocatable REAL(8) and INTEGER(8) arrays and scalars
non-allocatable REAL(4) and INTEGER(4) arrays and scalars
(note: if you require alignment of any of those, you know the counts and can figure out the pads)
(note 2: at this point you are only assured of an offset that is a multiple of the size of INTEGER(4))
Now declare your pointers
Lastly, declare the character (though with known sizes, you can place them in front of the pointers).

I tend to place the characters in front of the pointers, since this declares a POD structure in the front of the type, and unknowns at the back (easier for interoperability purposes).

Getting the pads right (and keeping them right) is always a headache.
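As an illustration of the ordering described above, here is a hedged sketch of a type laid out that way. The type name, member names, and sizes are all made up for the example. The character member is given a length of 16 so that everything in front of the pointers adds up to a multiple of 8 bytes (48 + 8 + 16 = 72); with other lengths an explicit pad would be needed.

```fortran
program layout_sketch
    implicit none

    ! Hypothetical type following the suggested order:
    ! 8-byte data, then 4-byte data, then characters, then pointers.
    type solver_state
        sequence
        ! 1. fixed-size 8-byte members (8 + 8 + 4*8 = 48 bytes)
        real(8)    :: tol
        integer(8) :: work_len
        real(8)    :: stats(4)
        ! 2. fixed-size 4-byte members (2*4 = 8 bytes)
        integer(4) :: n
        integer(4) :: nnz
        ! 3. fixed-length characters (16 bytes)
        character(len=16) :: label
        ! 4. pointers last: their descriptor sizes are compiler-specific
        integer(4), pointer :: row(:)
        real(8),    pointer :: val(:)
    end type solver_state

    type(solver_state) :: s

    ! STORAGE_SIZE returns bits; the total depends on the descriptor sizes.
    print *, 'total size in bytes:', storage_size(s) / 8

end program layout_sketch
```

Everything whose size is known sits at a fixed, checkable offset, and only the tail of the type changes when the descriptor size changes between compilers or between 32-bit and 64-bit builds.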

Jim Dempsey

mecej4 · ‎09-12-2015

I have constructed a small reproducer to make a case for a request to have the compiler issue not just a warning but provide additional information to help the programmer with padding for alignment. If you compile the program at the end of this post with the Version 16.0 32-bit IFort compiler, with options /Zi /list, in the cross-reference section of the listing file you will see the following:

SYMBOL CROSS REFERENCE

 Name                       Object Declared Type            Bytes Dimen Elements Attributes       References                        
                                                                                                                                    
 COL                        Dummy  30       I(4)            4     1     0        ARG,IN           38                                
 COL                        Local  38       I(4)            4     1     1        PTR              38                                
 N                          Dummy  30       I(4)            4           scalar   ARG,IN           35                                
 N                          Local  35       I(4)            4           scalar                    35                                
 NNZ                        Dummy  30       I(4)            4           scalar   ARG,IN           30,31,36,37,38,39                 
 NNZ                        Local  36       I(4)            4           scalar                    36

Note the apparent duplication of variables. In fact, the second item of each pair is the structure member of spmat with that member name. To discern that the item is a structure member, you will have to look away from the cross-reference listing and examine the source code listing. This deficiency of the /list option needs to be worked on (OTOH, we are happy to have the /list option available to us). It would help if the address of the (immediate) parent structure were listed.

The second instance of COL in the excerpt above points out a second deficiency of the listing, the one that pertains to misalignment. Note the helpful tag PTR in the second line with COL. Nowhere in the listing can you find that the actual length of the item is 36 bytes, and the reported length of 4 bytes is that of the target of the pointer, not the length of the pointer variable. This is exactly what Jim has been stressing, in #24 and elsewhere. Without that information (36 byte memory footprint of variable) available, padding becomes a guessing game, and Jim correctly observed "Getting the pads right (and keeping them right) is always a headache".

There is one way of finding that missing information (structure member memory size = 36). That is to generate a PDB file (the reason for my specifying /Zi above), and to build and run the MSVC tool dia2dump on the PDB file. That generates a huge file, but we can use the -type SPARSE_MATRIX option to narrow things down. The following output lines contain the missing piece of the puzzle (note the 0x24, which = 36, the size of the structure member whose type is "Fortran pointer" ).

 long<NoType>, Data: 0x01 0x00 0x00 0x00 0x24 0x00 0x00 0x00 , ROW
 long<NoType>, Data: 0x01 0x00 0x00 0x00 0x24 0x00 0x00 0x00 , COL
float<NoType>, Data: 0x01 0x00 0x00 0x00 0x24 0x00 0x00 0x00 , VA

It would be great if the cross reference listing showed

a mark such as M for misaligned structure members,
for Fortran pointer variables, the size of the pointer variable rather than the length of its anonymous target (which is already shown in the Type column as "I(4)"; if the target is an array, this length is that of just one array element).

Here is the reproducer code:

module sp_mod
type sparse_matrix
   sequence
   integer :: n
   integer :: nnz
   integer, pointer, dimension(:) :: row
   integer, pointer, dimension(:) :: col
   real,    pointer, dimension(:) :: val
end type
end module

program sp_test
use sp_mod
implicit none
type(sparse_matrix) :: spmat
integer :: n=3, nnz=5, row(5),col(5)
real :: val(5)

row=[1,2,2,3,3]
col=[1,2,3,1,3]
val=[2.0,3.0,5.0,7.0,4.0]

call sp_mat_create(n,nnz,row,col,val,spmat)

contains

subroutine sp_mat_create(n,nnz,row,col,val,spmat)
implicit none
integer, intent(in) :: n, nnz,row(nnz),col(nnz)
real, intent(in) :: val(nnz)

type (sparse_matrix),intent(out)  :: spmat

spmat%n   = n
spmat%nnz = nnz
allocate(spmat%row(nnz)); spmat%row = row
allocate(spmat%col(nnz)); spmat%col = col
allocate(spmat%val(nnz)); spmat%val = val

return
end subroutine sp_mat_create

end program sp_test

jimdempseyatthecove · ‎09-13-2015

Here is something that works now. Feel free to massage to your own purposes:

    program SizeofPointer
        implicit none
        type Pointer_real_8_rank_1_t
            sequence
            real(8), pointer :: x(:)
        end type Pointer_real_8_rank_1_t
        type Pointer_real_8_rank_2_t
            sequence
            real(8), pointer :: x(:,:)
        end type Pointer_real_8_rank_2_t
        type(Pointer_real_8_rank_1_t) :: t_Pointer_real_8_rank_1_t
        type(Pointer_real_8_rank_2_t) :: t_Pointer_real_8_rank_2_t
        integer, parameter :: SizeofPointer_real_8_rank_1 = sizeof(t_Pointer_real_8_rank_1_t)
        integer, parameter :: SizeofPointer_real_8_rank_2 = sizeof(t_Pointer_real_8_rank_2_t)
        print *, 'SizeofPointer_real_8_rank_1', SizeofPointer_real_8_rank_1
        print *, 'SizeofPointer_real_8_rank_2', SizeofPointer_real_8_rank_2
    end program SizeofPointer
----
 SizeofPointer_real_8_rank_1          36
 SizeofPointer_real_8_rank_2          48

The above is compiled as 32-bit.

Jim Dempsey

jimdempseyatthecove · ‎09-13-2015

Using something like R.O. suggests in determining your pads:

    program SizeofPointer
        implicit none
        type Pointer_real_8_rank_1_t
            sequence
            real(8), pointer :: x(:)
        end type Pointer_real_8_rank_1_t
        type Pointer_real_8_rank_2_t
            sequence
            real(8), pointer :: x(:,:)
        end type Pointer_real_8_rank_2_t
        type(Pointer_real_8_rank_1_t) :: t_Pointer_real_8_rank_1_t
        type(Pointer_real_8_rank_2_t) :: t_Pointer_real_8_rank_2_t
        integer, parameter :: SizeofPointer_real_8_rank_1 = sizeof(t_Pointer_real_8_rank_1_t)
        integer, parameter :: SizeofPointer_real_8_rank_2 = sizeof(t_Pointer_real_8_rank_2_t)
        integer, parameter :: SizeofInteger = sizeof(SizeofInteger)
        type foo
            sequence
            integer :: i
            real(8), pointer :: p(:)
            ! pad to cache line offset
            integer(1):: pad(MODULO(64*123456 - (SizeofPointer_real_8_rank_1+SizeofInteger),64)) 
            real(8) :: vec(8)
        end type foo
        
        type(foo) :: test
        print *, 'SizeofPointer_real_8_rank_1', SizeofPointer_real_8_rank_1
        print *, 'SizeofPointer_real_8_rank_2', SizeofPointer_real_8_rank_2
        print *, "Test for alignment", loc(test.vec)-loc(test.i), ubound(test.pad)
    end program SizeofPointer
---
 SizeofPointer_real_8_rank_1          36
 SizeofPointer_real_8_rank_2          48
 Test for alignment          64          24

Jim Dempsey

mecej4 · ‎09-13-2015

Thanks for the illustrative code with SIZEOF() used to find the memory footprints of the structure members.

I worked through the MUMPS package, after removing the original padding, which did not appear correct to me. I got the package to compile and run without misalignment warnings, but ran into another problem: the padding that worked for 32-bit compiles did not work for 64-bit compiles, and vice versa. This implies that two sets of include files would need to be used. The MUMPS FAQ page says that one could try to remove SEQUENCE from the structure declarations, but when I tried that some source files would not compile with MAKE (mismatched actual and dummy arguments). I did not investigate further.

andrew_4619 · ‎09-13-2015

I prefer to use C_SIZEOF from ISO_C_BINDING; SIZEOF is an extension.
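A short sketch of what the standard-conforming route might look like. One caveat: C_SIZEOF requires an interoperable argument, and a derived type with a Fortran POINTER array component is not interoperable, so for the wrapper-type trick shown earlier in the thread the Fortran 2008 intrinsic STORAGE_SIZE (which returns bits) is used here instead. The wrapper type name is hypothetical, and whether a given compiler accepts STORAGE_SIZE of such a type in a constant expression is something to verify before relying on it for pad lengths.

```fortran
program standard_sizes
    use iso_c_binding, only: c_sizeof, c_int, c_double
    implicit none

    ! Hypothetical wrapper around a rank-1 pointer, as in the earlier posts.
    type ptr_rank1_wrapper
        sequence
        real(8), pointer :: x(:)
    end type ptr_rank1_wrapper

    type(ptr_rank1_wrapper) :: w
    integer(c_int)          :: i
    real(c_double)          :: d

    ! C_SIZEOF works for interoperable entities.
    print *, 'c_sizeof of a c_int    =', c_sizeof(i)
    print *, 'c_sizeof of a c_double =', c_sizeof(d)

    ! STORAGE_SIZE works for any type and returns bits, hence the division.
    print *, 'wrapper size in bytes  =', storage_size(w) / 8

end program standard_sizes
```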

jimdempseyatthecove · ‎09-13-2015

integer(C_INTPTR_T), parameter :: SizeofC_PTR = sizeof(SizeofC_PTR) ! or simply C_INTPTR_T where kind values equal byte sizes

If you can determine the MUMPS requirements, you may be able to use one INCLUDE file that can make the determination.

I am surprised you did not ask why the "64*123456"; you probably figured it out. For those scratching their heads, the idea is to produce some multiple of the alignment you are interested in that exceeds the size of the array descriptor (of some arbitrarily large rank, larger than you will ever use), but is not larger than HUGE(INTEGER).
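The arithmetic can be checked with the 32-bit figures reported earlier in the thread: a 36-byte rank-1 descriptor plus a 4-byte INTEGER occupy 40 bytes, so 24 bytes of pad reach the next 64-byte boundary. The sketch below evaluates the large-multiple form next to two equivalent forms. The helper pad_len is a hypothetical name introduced for the example. The last form relies on the standard definition of MODULO, which yields a non-negative result for a positive modulus even when the first argument is negative; MOD would not.

```fortran
program pad_arithmetic
    implicit none

    ! 32-bit figures from the thread: 36-byte descriptor + 4-byte integer.
    integer, parameter :: used  = 36 + 4
    integer, parameter :: align = 64

    ! All three forms give 24 for these inputs.
    print *, 'large-multiple form:', modulo(64*123456 - used, align)
    print *, 'two-step form:      ', pad_len(used, align)
    print *, 'negative-argument:  ', modulo(-used, align)

contains

    ! Bytes needed after used_bytes to reach the next multiple of alignment
    ! (zero if used_bytes is already aligned).
    pure integer function pad_len(used_bytes, alignment)
        integer, intent(in) :: used_bytes, alignment
        pad_len = modulo(alignment - modulo(used_bytes, alignment), alignment)
    end function pad_len

end program pad_arithmetic
```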

Jim Dempsey

Kevin_D_Intel · ‎09-14-2015

@mecej4 - I captured your feature enhancement in post #25 in a separate request to Development.

(Internal tracking id: DPD200376076)

Warning about misaligned elements in structure