Solved: Aliasing problem when passing the private components of a base type to its child

fedor_R_ · 01-02-2016

From what I gather, it's best to avoid any kind of aliasing in Fortran. What would be a proper way of implementing the following?

Say I have two derived types, base_t and child_t, each defined in its own separate module. child_t extends base_t (I want to model an is-a relationship). base_t is an abstract type and has a deferred procedure, which it calls at some point. The child type must provide a definition for this procedure. The problem is that this procedure is supposed to change a private attribute of the base type which is not accessible to the child (it adds new coordinates to the coordinates array). I was going to pass it as an argument to this deferred procedure, but after reading the thread on aliasing and derived types (https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/597073) it seems like a bad idea. I originally posted this question there, but following FortranFan's comment I moved it to a separate thread. The next paragraph is my response to the comment.

The consumer is not supposed to be able to add new coordinates. Take a look at the simplified example I added below (tried to make it as short as possible). Generally, I'm trying to avoid a setter procedure, because it would be public, and would be accessible to anyone, which contradicts encapsulation principles.

Here's what a consumer is expected to do.

real, dimension(:), allocatable :: f
class(base_t), allocatable :: b

allocate(child_t::b)
! call b%init with some arguments.
! do some other work
call b%compute_differential_operator(f)

And here's base_t's module.

module base
    implicit none
    private
    public :: base_t

    type, abstract :: base_t
        private
        real, dimension(:,:), allocatable :: x
        integer :: num_bndr = 0
    contains
        private
        procedure, public :: init ! creates initial x array, left it out in this example
        procedure, public :: compute_differential_operator
        procedure(create_boundary_particles_func_t), deferred :: create_boundary_particles
    end type base_t

    interface
        subroutine create_boundary_particles_func_t (this, x)
            import :: base_t
            implicit none
            class(base_t), intent(in) :: this
            real, dimension(:,:), allocatable, intent(in out) :: x ! aliasing would appear when calling this function
        end subroutine create_boundary_particles_func_t
    end interface

contains

    subroutine compute_differential_operator (this, f)
        implicit none
        class(base_t), intent(in out) :: this
        real, dimension(:), intent(out) :: f
        integer :: num_non_bndr

        if (this%num_bndr <= 0) then
            num_non_bndr = size(this%x, 2)
            call this%create_boundary_particles(this%x) ! FIXME: ALIASING
            this%num_bndr = size(this%x, 2) - num_non_bndr
        else
            num_non_bndr = size(this%x, 2) - this%num_bndr
        end if

        ! Here we somehow fill f array based on the positions
        ! of regular and boundary particles.
    end subroutine compute_differential_operator

end module base

P.S. I'm especially interested in the "right" way to do it, not just a way. Suggestions on changing the code structure (like how to deal with inheriting from a type that has a private component) are welcome too. Thanks in advance for any help. I originally learnt C++, and when it comes to things like OOP it is sometimes difficult to do things the Fortran way rather than simply translate C++ principles into Fortran.

Fedor

fedor_R_ · 01-02-2016

I think the most straightforward solution here would be to add a local allocatable array x inside compute_differential_operator and do the following.

call move_alloc(this%x, x)
call this%create_boundary_particles(x)
call move_alloc(x, this%x)

But it looks a lot like a workaround, not a natural thing to do.
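For illustration, here is a minimal, self-contained sketch of that move_alloc round trip. The names holder_t and grow are hypothetical stand-ins for base_t and the deferred procedure, and the component is left public only to keep the demo short. One consequence worth noting: while the callee runs, the component itself is unallocated, so the callee cannot read the old coordinates through the passed object.

```fortran
module holder_mod
    implicit none
    private
    public :: holder_t, grow

    ! Hypothetical stand-in for base_t.
    type :: holder_t
        real, dimension(:,:), allocatable :: x   ! public only to keep the demo short
    end type holder_t

contains

    ! Hypothetical stand-in for the deferred procedure:
    ! appends one column of zeros to x.
    subroutine grow(this, x)
        class(holder_t), intent(in) :: this
        real, dimension(:,:), allocatable, intent(in out) :: x
        real, dimension(:,:), allocatable :: tmp
        integer :: n

        ! The caller moved the allocation away, so this%x is unallocated here.
        if (allocated(this%x)) error stop "unexpected alias"

        n = size(x, 2)
        allocate(tmp(size(x, 1), n + 1))
        tmp(:, :n) = x
        tmp(:, n + 1) = 0.0
        call move_alloc(tmp, x)
    end subroutine grow

end module holder_mod

program demo
    use holder_mod, only: holder_t, grow
    implicit none
    type(holder_t) :: h
    real, dimension(:,:), allocatable :: x

    allocate(h%x(2, 3))
    h%x = 1.0

    call move_alloc(h%x, x)   ! x takes over the storage, h%x becomes unallocated
    call grow(h, x)           ! h and x no longer overlap
    call move_alloc(x, h%x)   ! hand the reallocated storage back

    if (size(h%x, 2) /= 4) error stop "unexpected size"
    print *, "columns:", size(h%x, 2)
end program demo
```

No array data is copied by the two move_alloc calls themselves; only the allocation is transferred.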

FortranFan · 01-02-2016

Out of curiosity, why do you need a deferred procedure for "create_boundary_particles"? If your simplified example in the original post is any indication, it appears the deferred procedure will only operate on data x, which is private to the base module. So why not make this procedure bound to the base type itself?

In addition, why is it important that the passed object of the base type be INTENT(IN) for the deferred procedure?

Also, why do you say, "I'm trying to avoid a setter procedure, because it would be public, and would be accessible to anyone, which contradicts encapsulation principles"? Setters (and getters) complement information hiding and data encapsulation, providing a controlled way to work with private data: how is that contradictory?

Anyways, as you might know, the private attribute in Fortran applies to the module, so if the base type and the child are part of the same module, then the child can operate on x. But you do say you are "interested in the "right" way to do it, not a way", and having both the base type and all the children in the same module may not be practical or the best option. It's doubtful anyone can indeed suggest the "right" way for you without a proper understanding of your actual code.

fedor_R_ · 01-03-2016

I'll answer each of your paragraphs in order. Hope it clarifies things.

I need this deferred procedure, because this base type will have a number of children, each defining its own way of "creating boundary particles", while the interface stays the same.

I guess INTENT(IN) is not that important, although I like to protect myself from myself, and in case I implement another child, say, a year from now, a clear indicator that the base type should stay the same would be helpful. Still, even if I made it INTENT(IN OUT), the aliasing problem would still be there (even though the two arguments would have the same INTENTs).

Getters are nice and safe, but I can't say the same about setters. See, the x array is an internal array of the base type, and is expected to behave in a certain fashion. Now, if I allow it to be changed directly through a dedicated public procedure, someone (including me after a year's time) might mistakenly call this setter at the wrong time and end up with a bug which could be avoided otherwise. Just like in an airplane you want the pilots to be able to access certain controls, but making them accessible to everyone, including passengers, is a threat to the flight.

Yeah, I'm familiar with the concept. It seems a bit weird, but I got used to it. Unfortunately, in this particular case I can't put everything into the same module because of the number of lines. See, currently my base type's module has about 750 lines, while each of its children (currently two of them) has a separate module of approximately 500 lines. Having a module of 1750 lines = 750 + 2*500 with three substantial derived types is too much for my brain.

It's doubtful anyone can indeed suggest the "right" way for you without a proper understanding of your actual code.

You might be right, but even if I don't get a definite answer it still helps to discuss the matter.

jimdempseyatthecove · 01-03-2016

Would giving the setter's dummy argument the VALUE attribute resolve the aliasing issue? In other words, when calling a derived type's function with a base component (which would otherwise present aliasing issues), specifying pass by value would eliminate the aliasing issue (assuming it did not create copy constructor issues).

Jim Dempsey

FortranFan · 01-03-2016

fedor R. wrote:

.. I need this deferred procedure, because this base type will have a number of children, each defining its own way of "creating boundary particles", while the interface stays the same. .. Getters are nice and safe, but I can't say the same about setters. ..

Based on what you have shown thus far, I would be inclined to do away with the base abstract type altogether, make x data of each class that needs it, and have init and compute_differential_operator as procedures of such classes.

By the way, I only noticed your C++ comments now, but isn't the situation the same in C++, in that member functions of a derived class cannot access the private parts of a base class? So how would you do this using C++ without aliasing (pointers)?

IanH · 01-03-2016

Some observations (not recommendations):

In addition to your approach in #2, of creating a temporary (not so costly here because of the ALLOCATABLE nature of the data), you could also give both dummy arguments of the procedure with potential aliasing the TARGET attribute and remove INTENT(IN). However, this may result in the compiler generating less performant code. In a similar vein, the value of a pointer data component is not considered part of the value of the object with the component (but the pointer association status is) - depending on details that may obviate the problem, though the use of a pointer component when you don't really need reference semantics will introduce other complications.

Jim's suggestion of using VALUE also solves the problem, but with the obvious cost of creating a temporary every time the relevant procedure is called, regardless of whether aliasing is an issue for a particular invocation or not. You could also explicitly create a temporary for the actual argument corresponding to the passed argument in the calling code in problematic cases... but you still have the cost of a temporary. If you are going to create temporaries I'd take advantage of the allocatable argument instead.

From a design perspective, it may make sense (...or it may be nonsense...) to split the type hierarchy into two - a base traits type that clients extend, and a data storage type, with private components, that is defined in the same module as the base traits type. This may increase the complexity of some of your procedure interfaces (you need to pass both traits type and data type - but that is what you are basically doing anyway given the nature of the interfaces you show) - if that is a significant issue then you may be able to ameliorate that impact by having a third type that aggregates the two.
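As a rough sketch of that split, under assumptions: the names traits_t, storage_t, create_func_t and add_boundary_particles are hypothetical, and the child's rule for creating particles is invented for the demo. The storage type keeps x private to the defining module, while the traits type is what client modules extend. Because the passed traits object and the storage component are different objects, the call no longer passes an object together with one of its own components.

```fortran
module particles
    implicit none
    private
    public :: traits_t, storage_t

    ! Traits type: clients extend this in their own modules.
    type, abstract :: traits_t
    contains
        procedure(create_func_t), deferred :: create_boundary_particles
    end type traits_t

    ! Storage type: its components stay private to this module.
    type :: storage_t
        private
        real, dimension(:,:), allocatable :: x
        integer :: num_bndr = 0
    contains
        procedure :: init
        procedure :: num_particles
        procedure :: add_boundary_particles
    end type storage_t

    abstract interface
        subroutine create_func_t(this, x)
            import :: traits_t
            class(traits_t), intent(in) :: this
            real, dimension(:,:), allocatable, intent(in out) :: x
        end subroutine create_func_t
    end interface

contains

    subroutine init(this, n)
        class(storage_t), intent(in out) :: this
        integer, intent(in) :: n
        if (allocated(this%x)) deallocate(this%x)
        allocate(this%x(2, n))
        this%x = 0.0
        this%num_bndr = 0
    end subroutine init

    integer function num_particles(this)
        class(storage_t), intent(in) :: this
        num_particles = size(this%x, 2)
    end function num_particles

    subroutine add_boundary_particles(this, traits)
        class(storage_t), intent(in out) :: this
        class(traits_t), intent(in) :: traits
        integer :: num_non_bndr
        num_non_bndr = size(this%x, 2)
        ! traits and this%x belong to different objects, so the dummies do not overlap.
        call traits%create_boundary_particles(this%x)
        this%num_bndr = size(this%x, 2) - num_non_bndr
    end subroutine add_boundary_particles

end module particles

module child
    use particles, only: traits_t
    implicit none
    private
    public :: child_t

    type, extends(traits_t) :: child_t
    contains
        procedure :: create_boundary_particles
    end type child_t

contains

    ! Invented rule for the demo: append a single particle at (-1, -1).
    subroutine create_boundary_particles(this, x)
        class(child_t), intent(in) :: this
        real, dimension(:,:), allocatable, intent(in out) :: x
        real, dimension(:,:), allocatable :: tmp
        allocate(tmp(size(x, 1), size(x, 2) + 1))
        tmp(:, :size(x, 2)) = x
        tmp(:, size(x, 2) + 1) = -1.0
        call move_alloc(tmp, x)
    end subroutine create_boundary_particles

end module child

program demo
    use particles, only: storage_t
    use child, only: child_t
    implicit none
    type(storage_t) :: s
    type(child_t) :: c

    call s%init(3)
    call s%add_boundary_particles(c)
    if (s%num_particles() /= 4) error stop "append failed"
    print *, "particles:", s%num_particles()
end program demo
```

The cost, as noted above, is that the caller now handles two objects; an aggregating third type could hide that from the consumer.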

You can use submodules to break a large module up into smaller program units. Procedures in a submodule are considered to be part of the ancestor module from the point of view of Fortran's accessibility rules.
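A minimal sketch of the submodule point, with hypothetical names (shapes, box_t, shapes_impl): the procedure bodies live in a submodule yet can still touch the private component, because the submodule counts as part of its ancestor module for accessibility. A submodule is its own program unit, so it could equally sit in a separate source file; it is kept in one listing here only for brevity. This needs a compiler with Fortran 2008 submodule support.

```fortran
module shapes
    implicit none
    private
    public :: box_t

    type :: box_t
        private
        real :: side = 0.0     ! private component
    contains
        procedure :: set_side
        procedure :: volume
    end type box_t

    ! Interfaces only; the bodies are supplied by the submodule below.
    interface
        module subroutine set_side(this, side)
            class(box_t), intent(in out) :: this
            real, intent(in) :: side
        end subroutine set_side

        module function volume(this) result(v)
            class(box_t), intent(in) :: this
            real :: v
        end function volume
    end interface

end module shapes

submodule (shapes) shapes_impl
    implicit none
contains

    module procedure set_side
        this%side = side       ! private component is accessible here
    end procedure set_side

    module procedure volume
        v = this%side**3
    end procedure volume

end submodule shapes_impl

program demo
    use shapes, only: box_t
    implicit none
    type(box_t) :: b

    call b%set_side(2.0)
    if (abs(b%volume() - 8.0) > 1.0e-6) error stop "unexpected volume"
    print *, "volume:", b%volume()
end program demo
```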

Fortran does not have the equivalent of C++'s protected access (the PROTECTED attribute in Fortran does something else) for components in a base type - accessibility in Fortran is only based on modules, not on the type hierarchy.

FortranFan · 01-04-2016

@fedor R.,

Please do report back if you find an approach acceptable to you and decide to implement in your code.

Also, if you can provide a brief sketch of how you would do this using C++, it'd be great too. I'm curious as to whether the approach with C++ [where you also have the same OO design with "a deferred procedure (virtual function), .. base type will have a number of children, each defining its own way of "creating boundary particles", while the interface stays the same"] would be any different, in principle, from the POINTER component approach that is possible with Fortran; in both cases - C++ and Fortran - close attention would be needed with the pointer component and its finalization, etc.

Good luck,

fedor_R_ · 01-04-2016

To Jim Dempsey
I'm not sure I understand you correctly. Here's what I understood: create_boundary_particles(this, x) could have "this" dummy marked VALUE. I've never actually passed a big derived type with a VALUE attribute, but I thought it by default performs a deep copy of the object, including allocating all the allocatables (which is costly). And even if I redefined it I would still have to do that manually, because some of these arrays are needed inside the procedure (they are accessible through the base type's getter procedures). Also, I just googled it to see if I was missing something, and IBM's webpage says I must not specify the VALUE attribute with polymorphic items (don't know if that's credible). As IanH wrote, there is the cost of the temporary. In my case, it's mostly the huge memory size of it.

To FortranFan
I see why you might think so. I provided a small example where the base type indeed looks redundant. But the actual type is 1000 lines of code. Doing away with it would mean that the derived types get an extra 1000 lines each (approximately). Not only is it extra lines, it's also an extra amount of work when updating or fixing the code. As for C++, the situation is a bit different there. Take a look at this example http://www.tutorialspoint.com/cplusplus/cpp_inheritance.htm - it's surely clearer than anything I could come up with right now.

To IanH
I usually try to avoid TARGETs/POINTERs in Fortran because it's usually possible, and I particularly like Fortran for that. What I noticed in the thread I cited in #1 is your example of "option_one" procedure. It has arguments, as you said, marked TARGET, but the object you pass to this procedure is not marked target. I somehow assumed that I could only pass items defined as TARGET in such cases. Your version is standard, right?
I haven't looked into submodules yet due to the support issues. GCC says it supports them since 6.0, Intel since 16 (if I googled it right). Unfortunately I don't always get to choose the versions of the compilers I have to work with, and currently the oldest versions are gfortran-4.9 and ifort 14. So you can define submodules in separate files?

Now your design perspective seems great to me at this point. Breaking a complex type into two is good anyway, and it also solves the aliasing. Where would you store the object of this data storage type? I mean, if I have a third type (the ameliorating type) that holds the base/derived type and the storage type, the user of the module would have to provide the object of the data type to each call of the base type's procedures. Or the third type could model each of the base type's calls and do the job itself, but it's a whole lot of new procedures which all do the same (add the extra argument of data storage type). There is the option of making a single global object of data storage type in the module, and it would be sufficient for me at the moment, but it's certainly not good style. How would you approach it?

To FortranFan's new comment
Take a look at my response to your previous comment (in terms of accessibility). As for pointers, C++ is always pointers, but you can always use a smart pointer to free yourself from freeing memory. I'm honestly afraid to write any examples in C++ right now because it's been almost two years since I last wrote serious OOP code in C++. If I went for the POINTER approach here it would be as if I tried to write C/C++ code in Fortran, which is a different language with its own pros and cons. Yeah, I'll surely write a comment when I make a decision.

To everybody
Thanks to all of you for your input! A whole lot of information! And I also learned the word obviate.

Fedor

fedor_R_ · 01-07-2016

Since there are no further suggestions, I decided to go with a sort of temporary array.

interface
    subroutine create_boundary_particles_func_t (this, new_x)
        import :: base_t
        implicit none
        class(base_t), intent(in) :: this
        real, dimension(:,:), allocatable, intent(out) :: new_x
    end subroutine create_boundary_particles_func_t
end interface

This create_boundary_particles function was not supposed to change the existing x-values of the base object in the first place, only append new coordinates to it. It needs however to be able to read the existing x-values, which will now be done through a designated getter function (base type). The new values will be written to new_x, which base type will append to its this%x array afterwards. This involves some extra allocation and copying but I don't think it should really affect the performance. On the plus side, it's totally safe (I think), clear, and requires little rewriting.
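To make the idea concrete, here is one possible sketch of how the pieces could fit together. Only the interface above comes from the actual code; the getter get_coordinate, the driver add_boundary_particles, the simplified init and the child's mirroring rule are all assumptions made for the demo.

```fortran
module base
    implicit none
    private
    public :: base_t

    type, abstract :: base_t
        private
        real, dimension(:,:), allocatable :: x
        integer :: num_bndr = 0
    contains
        procedure :: init
        procedure :: num_particles
        procedure :: get_coordinate            ! hypothetical getter
        procedure :: add_boundary_particles    ! hypothetical driver
        procedure(create_boundary_particles_func_t), deferred :: create_boundary_particles
    end type base_t

    abstract interface
        subroutine create_boundary_particles_func_t(this, new_x)
            import :: base_t
            class(base_t), intent(in) :: this
            real, dimension(:,:), allocatable, intent(out) :: new_x
        end subroutine create_boundary_particles_func_t
    end interface

contains

    subroutine init(this, n)   ! simplified: call once per object
        class(base_t), intent(in out) :: this
        integer, intent(in) :: n
        allocate(this%x(2, n))
        this%x = 1.0
    end subroutine init

    integer function num_particles(this)
        class(base_t), intent(in) :: this
        num_particles = size(this%x, 2)
    end function num_particles

    function get_coordinate(this, i) result(xi)
        class(base_t), intent(in) :: this
        integer, intent(in) :: i
        real, dimension(:), allocatable :: xi
        xi = this%x(:, i)
    end function get_coordinate

    subroutine add_boundary_particles(this)
        class(base_t), intent(in out) :: this
        real, dimension(:,:), allocatable :: new_x, tmp
        integer :: n
        call this%create_boundary_particles(new_x)   ! new_x is a local, not part of this
        n = size(this%x, 2)
        allocate(tmp(size(this%x, 1), n + size(new_x, 2)))
        tmp(:, :n) = this%x
        tmp(:, n + 1:) = new_x
        call move_alloc(tmp, this%x)
        this%num_bndr = size(new_x, 2)
    end subroutine add_boundary_particles
end module base

module child
    use base, only: base_t
    implicit none
    private
    public :: child_t
    type, extends(base_t) :: child_t
    contains
        procedure :: create_boundary_particles
    end type child_t
contains
    subroutine create_boundary_particles(this, new_x)
        class(child_t), intent(in) :: this
        real, dimension(:,:), allocatable, intent(out) :: new_x
        ! Invented rule: mirror the first particle through the origin.
        allocate(new_x(2, 1))
        new_x(:, 1) = -this%get_coordinate(1)
    end subroutine create_boundary_particles
end module child

program demo
    use child, only: child_t
    implicit none
    type(child_t) :: c
    call c%init(3)
    call c%add_boundary_particles()
    if (c%num_particles() /= 4) error stop "append failed"
    if (any(c%get_coordinate(4) /= -1.0)) error stop "wrong coordinates"
    print *, "particles:", c%num_particles()
end program demo
```

The child reads existing coordinates only through the getter and writes only to new_x, so the private array is never handed out as a modifiable argument.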

Fedor

jimdempseyatthecove · 01-07-2016

Fedor,

>>it's totally safe

If your application is multi-threaded, and if appending new coordinates to the same object could occur concurrently, then you will have to protect the append with a critical section (or another mechanism such as an OpenMP lock).
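A sketch of what such protection might look like with a named OpenMP critical section. The type store_t and its procedures are hypothetical; compiled without OpenMP support, the directives are treated as comments and the program simply runs serially.

```fortran
module store
    implicit none
    private
    public :: store_t

    type :: store_t
        private
        real, dimension(:,:), allocatable :: x
    contains
        procedure :: init
        procedure :: append
        procedure :: num_columns
    end type store_t

contains

    subroutine init(this, dim)
        class(store_t), intent(in out) :: this
        integer, intent(in) :: dim
        if (allocated(this%x)) deallocate(this%x)
        allocate(this%x(dim, 0))    ! start with zero columns
    end subroutine init

    integer function num_columns(this)
        class(store_t), intent(in) :: this
        num_columns = size(this%x, 2)
    end function num_columns

    subroutine append(this, new_x)
        class(store_t), intent(in out) :: this
        real, dimension(:,:), intent(in) :: new_x
        real, dimension(:,:), allocatable :: tmp
        integer :: n

        ! Only one thread at a time may resize and refill this%x.
        !$omp critical (append_x)
        n = size(this%x, 2)
        allocate(tmp(size(this%x, 1), n + size(new_x, 2)))
        tmp(:, :n) = this%x
        tmp(:, n + 1:) = new_x
        call move_alloc(tmp, this%x)
        !$omp end critical (append_x)
    end subroutine append

end module store

program demo
    use store, only: store_t
    implicit none
    type(store_t) :: s
    integer :: i

    call s%init(2)

    !$omp parallel do shared(s)
    do i = 1, 100
        call s%append(reshape([real(i), real(i)], [2, 1]))
    end do
    !$omp end parallel do

    if (s%num_columns() /= 100) error stop "lost an append"
    print *, "columns:", s%num_columns()
end program demo
```

A named critical section serializes every append in the program, whichever object it targets; a per-object OpenMP lock would be finer grained, at the price of more bookkeeping.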

Jim Dempsey

fedor_R_ · 01-08-2016

jimdempseyatthecove wrote:

If your application is multi-threaded, and if appending new coordinates to the same object could occur concurrently, then you will have to protect the append with a critical section (or another mechanism such as an OpenMP lock).

Thank you, Jim. Will keep that in mind.

Fedor