Re: Killing subroutine memory?

Intel_C_Intel · ‎03-04-2006

Hello,

Imagine a large subroutine (millions of lines!) that uses a lot of memory but does not deallocate all of it before exiting. Is there a way to free all the memory used by the subroutine after it has terminated? That is, is it possible to start the subroutine in a separate memory space, and then kill this memory after the subroutine has been executed? This would, for example, make it easier to run many instances of the subroutine in parallel, because each instance would be unaware of the other instances running in the system, including their global memory variables. Also, the subroutine could be executed many times in succession without increasing the memory used by the program, as the memory used by the subroutine is "killed" just after execution. Something like this:

call launch(sub) !launch subroutine sub() in a separate memory region

call kill(sub) ! kill memory allocated by sub()

Regards,

Lars Petter Endresen

TimP · ‎03-05-2006

If you allocate memory by standard Fortran methods, according to f95, all allocations are freed automatically on exit from the subroutine. Certainly, a subroutine with millions of source lines is almost guaranteed to exhibit bugs, both in its own code and in the compiler.

Steven_L_Intel1 · ‎03-05-2006

In addition to Tim's advice, be sure that you give the routine the RECURSIVE attribute if it is going to be called in parallel. This will cause local, un-SAVEd variables to be allocated on the stack (which also means your stack usage may go up) and they'll go away on routine exit.

As Tim says, local ALLOCATABLE variables are supposed to be automatically deallocated on routine exit. We recently fixed a bug in that area (in 9.0.030) and there may be more. If you find such a case, report it to us.
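A minimal sketch of the automatic deallocation described above, assuming a Fortran 95 compliant compiler. The names auto_dealloc_demo, work and scratch are made up for illustration. Note that this applies to local ALLOCATABLE variables without the SAVE attribute; memory allocated through POINTER variables is not released automatically.

```fortran
program auto_dealloc_demo
    implicit none
    integer :: i

    ! Call the routine several times; memory use should not grow.
    do i = 1, 3
        call work(1000000)
    end do

contains

    subroutine work(n)
        integer, intent(in) :: n
        real, allocatable :: scratch(:)   ! local, no SAVE attribute

        ! The array is unallocated on every entry, because the
        ! allocation from the previous call was released on return.
        print *, 'allocated on entry? ', allocated(scratch)

        allocate(scratch(n))
        scratch = 1.0
        print *, 'sum = ', sum(scratch)

        ! No explicit DEALLOCATE here: scratch is freed automatically
        ! when the subroutine returns.
    end subroutine work

end program auto_dealloc_demo
```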

Intel_C_Intel · ‎03-05-2006

Hello,

I will try the RECURSIVE attribute. The subroutine has many USE statements and calls a whole bunch of other subroutines. I saw that the second time I called the routine, variables and data defined in the modules were still there. I even tried CreateThread() to start the routine in a separate thread, but the module variables were still around the second time. Maybe if I use both RECURSIVE and CreateThread(), the module variables will go away (I hope)?

Lars Petter


Steven_L_Intel1 · ‎03-06-2006

If you don't use RECURSIVE, then many variables will be statically allocated. Using RECURSIVE when writing threaded programs is a must.

Intel_C_Intel · ‎03-06-2006

Hello.

Recursive did not help, module variables were still around. Here is a rough sketch of the program.

PROGRAM FISH

CALL FORGETALL

END PROGRAM FISH

SUBROUTINE FORGETALL

USE M_FISH ! CONTAINS MANY VARIABLES AND SUBROUTINES INIT, FISHING

CALL INIT

CALL FISHING

END SUBROUTINE FORGETALL

Now, the second time I call FORGETALL, it seems that variables initialized in INIT are still around. I want FORGETALL to forget all! Is that possible?

:-)

Regards,

Lars Petter

Message Edited by lpe@scandpowerpt.com on 03-06-2006 09:22 AM

Steven_L_Intel1 · ‎03-06-2006

Module variables are always static and are shared across all your threads. RECURSIVE applies to local variables only. It sounds to me as if you want to avoid all use of global variables. Perhaps have a local variable of derived type that you pass around for context?
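A rough sketch of the context idea, not the actual M_FISH code: the type fish_context, its components, and the argument lists of INIT and FISHING are all assumptions made for illustration. It also assumes the compiler supports allocatable components in derived types (TR 15581 / Fortran 2003). The state that used to live in module variables moves into a local variable of derived type, so every call to FORGETALL starts from a fresh context.

```fortran
module fish_context_mod
    implicit none

    ! Hypothetical type holding what used to be module variables.
    type fish_context
        integer :: catch_count = 0          ! default-initialized per instance
        real, allocatable :: weights(:)     ! freed with the enclosing variable
    end type fish_context

contains

    subroutine init(ctx, n)
        type(fish_context), intent(inout) :: ctx
        integer, intent(in) :: n
        allocate(ctx%weights(n))
        ctx%weights = 0.0
    end subroutine init

    subroutine fishing(ctx)
        type(fish_context), intent(inout) :: ctx
        ctx%catch_count = ctx%catch_count + 1
        ctx%weights(1) = 2.5
    end subroutine fishing

end module fish_context_mod

subroutine forgetall()
    use fish_context_mod
    implicit none
    type(fish_context) :: ctx   ! local and un-SAVEd: fresh on every call

    call init(ctx, 10)
    call fishing(ctx)
    print *, 'catches this call: ', ctx%catch_count
    ! ctx and its allocatable component go away on return.
end subroutine forgetall

program fish
    implicit none
    call forgetall()
    call forgetall()   ! sees no state left over from the first call
end program fish
```

Because nothing is stored in the module itself, two threads calling FORGETALL at the same time would each work on their own ctx.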

Intel_C_Intel · ‎03-06-2006

Module variables are always static and are shared across all your threads.

Aha! That explains a lot. Not easy to get rid of module variables then! The option is then to completely avoid modules and global variables in code that is going to be efficiently threaded. Maybe it would have been a good idea to make some sort of extension to Fortran that makes it easy to start a subroutine in a separate memory space? For example this way:

separate subroutine FISH

Meaning that everything memorywise that happens inside FISH is not to be confused with anything else in the program.

Regards,

Lars Petter

Message Edited by lpe@scandpowerpt.com on 03-06-2006 10:27 AM

Steven_L_Intel1 · ‎03-06-2006

Your extension essentially exists. It's called thread-private storage in OpenMP. Consider using OpenMP rather than creating your own threads. You get a LOT of control over which variables are private to threads and which aren't.
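A small sketch of what thread-private storage looks like for a module variable, assuming the code is compiled with OpenMP enabled. The module m_fish_tp and the variable fish_caught are hypothetical stand-ins, not the real M_FISH contents. As far as I know each variable (or common block) has to be named in a THREADPRIVATE directive; there is no single switch that makes a whole module private.

```fortran
module m_fish_tp
    implicit none

    ! Hypothetical module variable; each thread gets its own copy.
    integer, save :: fish_caught = 0
    !$omp threadprivate(fish_caught)

contains

    subroutine fishing(n)
        integer, intent(in) :: n
        fish_caught = fish_caught + n   ! updates this thread's copy only
    end subroutine fishing

end module m_fish_tp

program fish_omp
    use omp_lib
    use m_fish_tp
    implicit none

    !$omp parallel
    fish_caught = 0                          ! reset this thread's copy
    call fishing(omp_get_thread_num() + 1)

    ! Serialize the output so lines from different threads do not mix.
    !$omp critical
    print *, 'thread', omp_get_thread_num(), 'caught', fish_caught
    !$omp end critical
    !$omp end parallel

end program fish_omp
```

This isolates threads from each other, but it does not make a thread "forget" its own copy between successive calls; that still needs an explicit reset like the one at the top of the parallel region.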

Intel_C_Intel · ‎03-06-2006

Hello,

OpenMP sounds great! Do I have to explicitly declare all private variables, or is there some way to make everything in the subroutine private?

Lars Petter

From the documentation:

Data Sharing

Data sharing is specified at the start of a parallel region or worksharing construct by using the SHARED and PRIVATE clauses. All variables in the SHARED clause are shared among the members of a team. The application must synchronize access to these variables.

All variables in the PRIVATE clause are private to each team member. For the entire parallel region, assuming t team members, there are t+1 copies of all the variables in the PRIVATE clause: one global copy that is active outside parallel regions and a PRIVATE copy for each team member. The application must do the following:

Initialize PRIVATE variables at the start of a parallel region, unless the FIRSTPRIVATE clause is specified. In this case, the PRIVATE copy is initialized from the global copy at the start of the construct at which the FIRSTPRIVATE clause is specified.
Update the global copy of a PRIVATE variable at the end of a parallel region. However, the LASTPRIVATE clause of a DO directive enables updating the global copy from the team member that executed serially the last iteration of the loop.

In addition to SHARED and PRIVATE variables, individual variables and entire common blocks can be privatized using the THREADPRIVATE directive.

Message Edited by lpe@scandpowerpt.com on 03-06-2006 11:06 AM

Intel_C_Intel · ‎03-06-2006

Will this work?

PROGRAM FISH

!$OMP PARALLEL DEFAULT(PRIVATE)
!$OMP SECTIONS
!$OMP SECTION
CALL FORGETALL
!$OMP SECTION
CALL FORGETALL
!$OMP END SECTIONS
!$OMP END PARALLEL

END PROGRAM FISH

SUBROUTINE FORGETALL

USE M_FISH ! CONTAINS MANY VARIABLES AND SUBROUTINES INIT, FISHING

CALL INIT

CALL FISHING

END SUBROUTINE FORGETALL

Steven_L_Intel1 · ‎03-06-2006

I am not even close to being an OpenMP expert. I would recommend reading the Optimizing Applications manual chapter on parallel programming, and perhaps check out our forum on Threading for Parallel Processing.

Lars, I think your idea will not work as the module variables are still static.

Intel_C_Intel · ‎03-06-2006

OK.

I will read the Optimizing Applications manual chapter on parallel programming and check out the Threading for Parallel Processing forum. I will have to learn more about these things, in particular as multi-core seems to be the future. Thank you for the guidance.

Regards,

Lars Petter

jim_dempsey · ‎03-07-2006

Lars,

It sounds like you want something similar to the C++ destructor capability. A problem you have with Fortran is the use of RETURN inside the routine, whereby you forget to deallocate something. It is really difficult to keep this synchronized, and GOTO is often not clean enough. A possible workaround is to use a shell subroutine that performs the allocation, calls the original subroutine, and then on return deallocates the temporary memory. On the call to the original routine you add one argument, which is equivalent to the C++ this pointer.

Jim Dempsey

Intel_C_Intel · ‎03-07-2006

Hello,

As the subroutine is so large (millions of lines counting all modules), I do not have the slightest chance of keeping control of all global (module) variables. In other languages like C++ this may also be tricky; it is always easy to forget to deallocate something. Fortunately we have some Fortran 77 code within the subroutine that can be easily parallelised, as it has no global variables, no common blocks and (of course) no modules. These are essentially a set of time-consuming and memoryless functions where all input/output goes through the function/subroutine interfaces, so they should be an ideal target for parallelisation. Still, I think that it would have been convenient to invent some new functionality, like the proposed "separate" statement, that would make it easier to parallelise standard spaghetti code with global variables, common blocks, returns and gotos. My experience is that most software is like this anyway. :-) Would it be difficult to implement the "separate" statement?

separate subroutine FISH

In a way this becomes some sort of "program within program" technology.

:-)

Lars Petter

jim_dempsey · ‎03-07-2006

In your M_FISH module, define a user-defined type that encompasses all the data for the context of your outermost subroutine. Then create a shell routine to perform the allocation/deallocation. This way you don't have to worry about a RETURN forgetting to clean up allocations.

Code:

! go_fishingSub is original subroutine with one additional argument
! The argument myFishingData is the context information
! Allocation and deallocation is performed in one location in
! a shell routine of the original subroutine name (go_fishing)
! This permits untidy returns.

subroutine go_fishingSub(myFishingData, A, B, C)
    use fishing_data
    use fishing_code
    ! calling args (the declared name must match the dummy argument)
    type(fishing_data_type) :: myFishingData
    real :: A, B, C
   ...
end subroutine go_fishingSub

! Shell routine using original name
! Allocations performed prior to call to original subroutine
! Deallocation called after
subroutine go_fishing(A, B, C)
    use fishing_data
    use fishing_code
    ! calling args
    real :: A, B, C
    ! pointer to temporary data
    type(fishing_data_type), pointer :: pFishingData
    ! allocate working data set
    allocate(pFishingData)
    ! call working routine
    call go_fishingSub(pFishingData, A, B, C)
    ! release working data
    deallocate(pFishingData)
end subroutine go_fishing

! Note, go_fishing is OpenMP safe

Also, use the preprocessor to perform name changes. For example, if your code used

integer :: FishingHoleNumber

which is now inside myFishingData, simply define

#define FishingHoleNumber myFishingData%FishingHoleNumber

This way you do not have to edit any of your code.

Jim Dempsey

Intel_C_Intel · ‎03-07-2006

Hello,

With several thousand Fortran modules it will take quite some time to write all the shell routines. But thanks for the suggestions, I will try to see if this can be used in places where Fortran 77 (without globals) cannot be used.

Lars Petter

jim_dempsey · ‎03-07-2006

I recently converted a legacy F77 code into F90 with OpenMP and went through all the headaches you may be having now (over 700 modules). OpenMP is quite easy to use once you have the hang of it. The following is not obvious from the documentation but is so after you use it a few times:

Code:

type TypeThreadContext
    SEQUENCE
    type(TypeObject), pointer :: pObject
    type(TypeTether), pointer :: pTether
    type(TypeFSInput), pointer :: pFSInput
    integer :: LastObjectLoaded
end type TypeThreadContext

type(TypeThreadContext) :: ThreadContext
COMMON /CONTEXT/ ThreadContext

!$OMP THREADPRIVATE(/CONTEXT/)

In my case I have pointers in each thread private area. You could have your common blocks and static data.

You still have an allocation issue and using the shell subroutine mentioned earlier could eliminate memory leaks.

In tackling a conversion project you want to touch as little of the code as possible.

In my conversion effort the use of the preprocessor and #defines meant that I could compile the same source files the old F77 way and the new F90 w/OpenMP way. There were a few exceptions.

Then once the old code was converted and running I could then add features.

Jim Dempsey