Solved: Mixed Language Programming and OpenMP

rostin · ‎10-13-2008

Apologies if this issue has been addressed already, or if this is the wrong forum. I did a little searching and couldn't find any similar questions. I'm using the Intel C++ and Fortran compilers on Linux. icpc --version and ifort --version both report 10.1 20080801.

I have a code written in C++ and Fortran. The C++ portion contains a section of code that I am attempting to parallelize using OpenMP. I call some Fortran subroutines within the parallelized region.

First, I tried to build the app by compiling all the C++ source files using icpc with the -openmp option, then compiling the Fortran source and linking everything in one step with ifort, also with the -openmp option.

When I tried to run the code, I got a segfault. I tried debugging to find out what's causing the segfault, but the result only confused me: the problem occurs when one user-defined Fortran subroutine calls another. No statements within the second subroutine are executed. The call itself appears to cause the fault.

I read about the -openmp option in the manual, and I noticed that -parallel also causes the multithreading run-time libs to be linked. Out of desperation, I tried compiling the Fortran portion using that option. I stopped getting the segfault, but the results from the Fortran subroutines are garbage.

I am pretty confident that the problem is not with my code itself. When I compile it using g++/gfortran with -fopenmp, it runs fine. (Unfortunately, the machine I normally want to work on doesn't have new enough versions of the GNU compilers to support OpenMP, and I can't upgrade them.)

Thanks in advance for your advice.

jimdempseyatthecove · ‎10-13-2008

Rostin,

Try performing a simple test.

Place a breakpoint at the CALL statement that will fail. At the break, have the debugger open a Disassembly window and follow the code as you use Step Into (I do not use the Mac/Linux debugger, so you will have to figure this out for yourself). As you step into the code (the result of the Step Into on the CALL statement), see if you end up at the called subroutine's preamble. The preamble is the small section of hidden code used to initialize local variables.

What you are looking for is one of

a) Trash for code - indicates something walked over the code (probably a static initialization in the C++ code)
b) A stack based memory allocation for a local array. Either the argument that specifies the size of the local array is wrong or your reserved stack space is too small (linker or compiler option).

Finding the cause for a) is somewhat unusual. First, you have to determine what location(s) in the subroutine got altered. This could be the first byte of code in the subroutine or some point further into it. To do this, perform the Disassembly window Step Into, then, using the address of the first instruction of the subroutine, open a memory watch, extend the window to show as much of the following code as possible, and capture the screen to a file. Second, restart the debug session by performing a Step Into on the executable. This should show a disassembly window before the static initialization takes place. Next, using the address of the subroutine that fails, examine memory again. Now compare the saved screenshot with the current one.

If you find a difference, take the lowest address in the subroutine where the difference occurs and enter it as a break-on-data (memory changed) breakpoint. Then continue the program. It should break in the constructor in the C++ code that is stomping on the program.

Potentially your problem is b) and you won't have to do the disassembly bit above.

Jim Dempsey

rostin · ‎10-14-2008

Thanks very much for your response. If you wouldn't mind, could you please explain a bit more? I am not a very experienced programmer, and I am mostly self-taught, so I don't fully understand the problem you describe in a).

Let's say I am able to do what you suggest and I identify the constructor that is causing the problem. Is there something I should be doing as a programmer to prevent this kind of problem? Shouldn't the compiler or operating system prevent this from happening? If a) is the problem, what can I do about it? I don't just ask that as a practical matter. I hope your response will help me to better understand what might be occurring.

Thanks!

jimdempseyatthecove · ‎10-14-2008

Rostin,

To quote a line from Casablanca: "Round up the usual suspects"...

Without seeing your complete code, I can only suggest what constitutes the "usual suspects". For your case:

a) The CALL that triggers the failure may have an argument on the CALL that creates an overly large stack temporary
b) The subroutine called creates an overly large stack temporary
c) Something stomped on (inadvertently modified) the code in the subroutine called
d) Something prior to the CALL trashed the stack context of the subroutine making the call, but without the error becoming visible until during or after the CALL

The C++ static data initialization of objects could cause c) under some circumstances (e.g. when an uninitialized pointer containing junk is used in the constructor of a different object).
Calling a C++ function or subroutine with an incorrect calling convention could cause d).

Also, not listed above: programmers often write their C/C++ code to assume that if a pointer != NULL, it points to a valid object. This requires the programmer to ensure that a pointer is set to NULL whenever it is not valid. Fortran by default does not (is not required to) initialize data to NULLs. Uninitialized variables (pointers) may cause your code to assume objects are allocated when they are not.
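
As a minimal sketch of the pointer point above: one way to avoid junk pointers on the C++ side is to value-initialize the structure before any arrays are attached, so every pointer member starts out null. The struct lattice_c, its members, and the helper names are hypothetical and are not taken from the code discussed in this thread.

```cpp
#include <cassert>

// Hypothetical C mirror of a Fortran derived type; the member names are
// illustrative only.
struct lattice_c {
    int     natoms;
    double *positions;
    double *forces;
};

// True only if every array member has actually been attached.
bool is_allocated(const lattice_c &lat) {
    return lat.positions != nullptr && lat.forces != nullptr;
}

// Value-initialization ("lattice_c()") zeroes every member, so the pointers
// start out null instead of holding whatever happened to be in memory.
lattice_c make_empty_lattice() {
    return lattice_c();
}
```

With this, a check such as is_allocated(lat) gives a meaningful answer before the structure is handed to the Fortran side.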

Jim Dempsey

TimP · ‎10-14-2008

Unfortunately, the compiler can't predict run-time stack size requirements. Read previous posts about increasing the stack size allowance in your OS, and about the heap-arrays compile option for reducing ifort stack usage.

rostin · ‎10-14-2008

Thanks, Jim and Tim!

To rule out the possibility that all the stack space is being used up, here's what I've tried:

1. Compiling with -heap-arrays

2. ulimit -s unlimited

3. ulimit -s

4. Reducing the number of threads to 2.

5. Putting the offending section of code in a CRITICAL section

None of those things worked.

I tried to follow Jim's instructions to see if part of the code in either the called subroutine or the calling subroutine was being modified. I am restricted to using gdb because the machine is remote, and the graphical debugger (DDT) doesn't seem to have a disassembly window. Using the disassemble command, I dumped out both those subroutines in three places: Before the execution of "very much" code, after the seg fault, and before the execution of any code after a restart. (Specifically, after the segfault, I used the gdb command "start".) I used diff to compare the result in those three cases, and they were all identical. I'm not totally sure that what I'm describing here is what he suggested, but to the best of my understanding, it is.

Sorry to keep posting, but are there any other ideas?

jimdempseyatthecove · ‎10-14-2008

In the subroutine that fails when called...
Immediately after the SUBROUTINE insert

!DEC$ IF (.FALSE.)

and immediately before the END SUBROUTINE place

!DEC$ ENDIF

This will make your subroutine a do-nothing subroutine.

Note: you may have to turn off gen interfaces and check interfaces.

If you can step into and back out of the subroutine, then start moving the !DEC$ IF (.FALSE.) downwards.

First, move it down to your code (past all the dummy and variable declarations). Recompile, run, and try to step into, through, and back out.

If that fails, then move the !DEC$ IF (.FALSE.) halfway up through the variable declarations. Repeat, using a binary-search-like process, until you find the offending statement.

If the data declaration section succeeds, then start moving the !DEC$ IF (.FALSE.) downwards into the code. The code section is a little bit trickier since it generally has loops and labels, so you may have to accommodate those.

Is there a possibility that the subroutine you are looking at is not the one called (a duplicate name or some such thing)?

Jim Dempsey

rostin · ‎10-15-2008

Thanks again.

I wrapped the sub in !DEC$ IF (.FALSE.)...!DEC$ ENDIF, but I still got a segfault at the point where the sub is called. I was not able to step into it. I commented out that call and filled the arguments (described below) with dummy data so the code could continue running, and the segfault then occurred at a "call" to a different function in the same calling subroutine.

Let me describe my code a little further and add to the list of symptoms.

The collection of Fortran subroutines is used to calculate energy and forces on a configuration of atoms. There are some other subs that do things like read in parameters from disk, read in the atomic configuration from disk, etc. But the sub that is called most often from C++ is called Compute_EAM_Forces. Compute_EAM_Forces accepts three user-defined types. One contains info about the configuration (lattice_type), the second contains parameters for the energy and force equations (meam_type), and the third contains the results of the calculation (results_type).

On the C++ side, I have a class (meam_wrapper) that contains a configuration of atoms and is also an interface to the Fortran subs. Objects of this class have unique lattice_types, but they can share a meam_type (using a class static variable) because it is not normally necessary to have more than one version of the parameters.

My main function contains essentially this:

#pragma omp parallel
{
    meam_wrapper *mw_ptr = new meam_wrapper(some initializing info);
    double energy = mw_ptr->calc_energy();
    delete mw_ptr;
}

Within meam_wrapper::calc_energy, I have a call to the fortran sub compute_eam_forces(lattice, potential, results)

Where lattice, potential, and results are C structures that mimic the Fortran user defined types.
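
For readers unfamiliar with this kind of interface, here is a rough sketch of what such a C++-to-Fortran call usually looks like. It assumes the default ifort/gfortran conventions on Linux for an external (non-module) subroutine: the symbol is lowercased with a trailing underscore, and every argument is passed by reference. It also assumes the derived types are laid out identically on both sides (for example via SEQUENCE or BIND(C)). The struct fields and the stand-in function body are hypothetical, since the real ones are not shown in this thread.

```cpp
#include <cassert>

// Hypothetical mirrors of the Fortran derived types (fields are illustrative).
struct lattice_c   { int natoms; };
struct potential_c { double cutoff; };
struct results_c   { double energy; };

// Declaration as the C++ side would see the Fortran subroutine
// Compute_EAM_Forces: lowercase name, trailing underscore, pointer arguments.
extern "C" void compute_eam_forces_(lattice_c *lat, potential_c *pot, results_c *res);

// Stand-in definition so this sketch links without the Fortran object file.
// In a real build this symbol would come from the compiled Fortran source.
extern "C" void compute_eam_forces_(lattice_c *lat, potential_c *pot, results_c *res) {
    res->energy = lat->natoms * pot->cutoff;
}

// Wrapper in the spirit of meam_wrapper::calc_energy.
double calc_energy(lattice_c &lat, potential_c &pot) {
    results_c res = {0.0};
    compute_eam_forces_(&lat, &pot, &res);
    return res.energy;
}
```

If the name decoration or the argument passing does not match what the Fortran compiler actually generates, the call can fail at link time or misbehave at run time, so the compiler's mixed-language documentation is the authority here.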

The call that causes the segfault is within Compute_EAM_Forces. It's to a sub called Update_List. Its arguments are two user-defined types, lattice and potential. They are two of the very same objects that are passed into compute_eam_forces.

That's pretty much how it works.

Here's what I've observed.

As I said, when I run this code with OMP_NUM_THREADS=1, everything is fine. With two threads, things are more interesting. At first, I noticed that sometimes one thread was able to successfully calculate the energy before the segfault. The second thread always failed. So, I enclosed the energy calculation in a critical section and added a statement to output omp_get_thread_num. It appears that when Thread 0 is the first to execute the critical section, it succeeds, then Thread 1 causes a segfault. However, when Thread 1 is the first to execute the critical section, it never succeeds. I only know this through observing the output, but I've run the code dozens of times, and this is always the result, without exception.

jimdempseyatthecove · ‎10-16-2008

Rostin,

Is the "main" a C++ function or a Fortran PROGRAM? This matters because the C++ compiler and runtime system and the Fortran compiler and runtime system may be using different OpenMP libraries. If so, it is recommended that only one of the languages contain (control) the OpenMP directives. The other portion can be called by the multiple threads provided the code is thread-safe. Any OpenMP synchronization should pass through the language in control of the OpenMP directives. This is cautionary (note the "may" above) because mixing languages from the same vendor may or may not use the same OpenMP libraries; you will have to check the documentation for compatibility issues.

Make sure you are using the multi-threaded versions of all library functions for all languages.

Another "gotcha" on the Fortran side is that subroutine local arrays often end up being static as opposed to on the stack:

subroutine foo(A)
real :: A
real :: TEMP(100)

In the above it is ambiguous whether TEMP should be static (SAVE) or on the stack. Option switches implicitly make the selection. However, it is safer to state your requirements explicitly. Use

subroutine foo(A)
real :: A
real, automatic:: TEMP(100)
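
The same distinction can be illustrated with a C++ analogy (this is not the Fortran code under discussion, and the function names are made up): a static local behaves like a SAVEd variable, with one copy shared by every call and, in a threaded program, by every thread, while an ordinary local gets a fresh copy on the calling thread's stack.

```cpp
#include <cassert>

// Analogue of a SAVEd (static) local: a single copy that persists between
// calls and would be shared by all threads calling this function.
int bump_static() {
    static int counter = 0;
    return ++counter;
}

// Analogue of an AUTOMATIC local: a fresh copy on the stack for every call,
// so concurrent callers cannot interfere with each other.
int bump_automatic() {
    int counter = 0;
    return ++counter;
}
```

Two consecutive calls to bump_static() see each other's state; two calls to bump_automatic() do not. Two threads writing to a shared static work array without synchronization is the kind of interference that produces garbage results.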

The above is good thread-safe programming practice, but it does not sound like this is your problem. The interesting indicator is that the "master" thread can make it through but the additional thread team members cannot (at least to the extent of your testing). What this indicates (not exclusive of other potential problems) is that the amount of stack for thread 1 is different from (less than) the amount of stack for thread 0. There are "generally" three places for specifying stack size: a compiler option (may vary with vendor), a linker option (may vary with vendor), and the thread-spawning call (in your case this is outside your control); a potential fourth place is a vendor-specific environment variable. Look at your C++ documentation in all places (compiler, linker, environment variables), as well as the OpenMP section of the C++ documentation, to see how you specify the stack size for spawned threads. Do not assume that the reserved stack space is equal.
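
A small sketch, assuming a POSIX system, of how one might inspect the settings involved. The process stack limit (what ulimit -s controls) bounds the initial thread only; worker threads created by the OpenMP runtime are sized separately. OMP_STACKSIZE is the environment variable defined by OpenMP 3.0 and KMP_STACKSIZE is the Intel-specific one; which of them a given runtime honors is an assumption to verify in its documentation. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <sys/resource.h>

// Soft stack limit of the process in bytes ("ulimit -s" reports it in KB).
// Returns RLIM_INFINITY when the limit is "unlimited".
rlim_t main_thread_stack_limit() {
    struct rlimit rl;
    int rc = getrlimit(RLIMIT_STACK, &rl);
    assert(rc == 0);
    (void)rc;  // keeps release builds (NDEBUG) free of unused warnings
    return rl.rlim_cur;
}

// Value of an environment variable, or "(unset)" if it is not defined.
std::string env_or_unset(const char *name) {
    const char *value = std::getenv(name);
    return value ? std::string(value) : std::string("(unset)");
}
```

Printing main_thread_stack_limit() next to env_or_unset("OMP_STACKSIZE") and env_or_unset("KMP_STACKSIZE") at program start makes it easy to see that raising the first does not, by itself, change the stack given to the other threads.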

The next thing to check is that the calling convention is the same between the Fortran subroutine and the C++ call. If it is not the same, then the stack pointer may either a) not get restored after the call, or b) get "restored" twice (constant added twice). Since you cannot look at the disassembly, try this:

{
    meam_wrapper *mw_ptr = new meam_wrapper(some initializing info);
    printf("&energy = %p\n", &energy);
    double energy = mw_ptr->calc_energy();
    printf("&energy = %p\n", &energy);
    delete mw_ptr;
}

(replace p with format specifier for size of pointer (32/64))

If the address of energy is different, then the stack pointer got hosed, indicating a calling convention problem.

The next thing to look at is whether the C++ structures passed in contain pointers to C++-allocated arrays. If so, these C++-allocated arrays will NOT have a Fortran array descriptor. It then becomes your responsibility on the Fortran side to properly construct the Fortran array descriptor (the documentation will explain how to do this). There is a similar problem with the confusion of string lengths.
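
As an alternative to building a descriptor (a different technique from the one suggested above), a common approach is to pass the array address and its extent as separate arguments, so the Fortran side can declare an explicit-shape dummy such as x(n), which is received as a bare address. A sketch from the C++ side follows; the routine name sum_positions_ is hypothetical and is defined here in C++ only so the example is self-contained.

```cpp
#include <cassert>
#include <vector>

// Stand-in for a Fortran routine declared roughly as
//   subroutine sum_positions(x, n, total)
//     integer :: n
//     real(8) :: x(n), total
// An explicit-shape dummy like x(n) needs no array descriptor.
extern "C" void sum_positions_(const double *x, const int *n, double *total) {
    *total = 0.0;
    for (int i = 0; i < *n; ++i) {
        *total += x[i];
    }
}

// C++ caller: hands over the raw address plus the explicit extent.
double sum_positions(const std::vector<double> &x) {
    int n = static_cast<int>(x.size());
    double total = 0.0;
    sum_positions_(x.data(), &n, &total);
    return total;
}
```

The trade-off is that the extent must be kept in sync by hand, but nothing on the Fortran side depends on compiler-specific descriptor layouts.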

Jim Dempsey

jimdempseyatthecove · ‎10-16-2008

*** OOPS, forward reference; use the address of mw_ptr instead

printf("&mw_ptr = %p\n", &mw_ptr);
double energy = mw_ptr->calc_energy();
printf("&mw_ptr = %p\n", &mw_ptr);

Jim Dempsey

jimdempseyatthecove · ‎10-16-2008

Note, this is the address of the pointer and not the address of the object to which it points. The pointer is on the stack. If the call/return was not using the same protocol, then the address of the pointer will change (and you may also end up returning to the incorrect address). If you never see the second print statement, then the calling convention is wrong and you returned to the incorrect address.
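
The check described above can be packaged as a small helper. This is only a sketch: the function names are hypothetical, and fn stands in for the real C++-to-Fortran call. On a compiler that addresses locals through a frame pointer, or that folds the comparison away, the check can pass even when the stack pointer is off, so treat it as a rough diagnostic rather than proof.

```cpp
#include <cassert>
#include <cstdio>

// Calls fn() and reports whether the address of a local variable in this
// frame reads the same before and after the call.
bool local_address_stable(void (*fn)()) {
    int marker = 0;
    const void *before = static_cast<const void *>(&marker);
    std::printf("&marker = %p\n", before);
    fn();
    const void *after = static_cast<const void *>(&marker);
    std::printf("&marker = %p\n", after);
    return before == after;
}

// Placeholder callee that follows the normal calling convention.
void well_behaved_callee() {}
```

If the second line never prints at all, that matches the "returned to the incorrect address" case described above.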

Jim Dempsey

rostin · ‎10-16-2008

Jim,

I think that straightened me out. Before, I was building the app by compiling each of the C++ source files, then compiling the Fortran source file and linking everything together in the final step using ifort. By doing it that way, I could avoid having to explicitly name the various libs the Fortran part of the code needs. Your suggestion about the OpenMP libs made me try it the other way: linking with icpc instead of ifort.

Just FYI: the main function is in C++, and all the OpenMP sections are in C++.

That fixed the original segfault issue.

It was also very lucky for me that you mentioned the issue about automatic vs saved arrays. That was the next hurdle, but -automatic took care of it. After that, I still got a segfault, but this time it really was related to stack size. The info here got me going: http://forum.cgd.ucar.edu/archive/index.php/t-22.html

Anyway, hopefully everything will work now. Thanks so much for continuing to think about my problem and make suggestions. :)

jimdempseyatthecove · ‎10-16-2008

You are more than welcome, Rostin. Forward the favor when you see someone else with this problem. Also, remember to rank the reply.

Jim Dempsey