Intel® Fortran Compiler

Can I avoid a shared object calling the wrong subroutine in another shared object?

Greg_T_
Valued Contributor I

Hello,

I have a C# program that calls routines in two Fortran shared objects on Linux (it runs under Mono on Linux and uses DLLs on Windows).  I've found that if both shared objects have been loaded (when the program does several tasks), a call to a subroutine within the first shared object incorrectly reaches a different subroutine of the same name in the second shared object.  If only one of the shared objects is loaded (when the program does only one set of tasks), either shared object works fine alone.  The same Fortran source code is used on Windows and works there as two DLLs.  To work around the problem I've renamed the subroutine in the first shared object so that it no longer collides with the one in the second shared object.  Needing to rename subroutines in separate shared objects is a surprise to me.

Instead of renaming subroutines, is there a more general approach to limit access to subroutines in shared objects on Linux?  Is there syntax to tell the shared object to look for the subroutine only within the same shared object, or to limit external access to just a few selected routines in the shared object?

The two subroutines with the same name do have a different signature with a different number of arguments, so it seems that finding the correct subroutine by signature is not occurring.

In Windows DLLs the "DLLEXPORT" syntax helps limit access to just the intended externally accessible subroutines in the library.  When compiling on Linux I see a message that DLLEXPORT is ignored. Not needing DLLEXPORT is also explained by Steve in this topic:

https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/270148

On Linux can any subroutine in the shared object be accessed externally?  Should I expect to need to avoid subroutines of the same name within all the shared objects used by the program?

Thank you for your help.

Regards,
Greg T.

8 Replies
jimdempseyatthecove
Honored Contributor III

Greg,

Let me state this first: I am not an expert on Linux shared object libraries. The following is subject to being incorrect.

From my limited understanding, on Windows, DLLEXPORT creates an equivalent to a (virtual) vector table to the entry points so specified. All other functions linked/loaded into the DLL not DLLEXPORTed are essentially private to the DLL. The executable using the DLL links in a specially constructed version of the vector table such that on first use of an entry, the DLL gets loaded (if not loaded at program image load), and then the vector table entry gets updated to point directly at the appropriate function/variable location.

On Linux, the .so files are equivalent to a Windows .lib file (no vector/dispatch table) and get linked at runtime. This means that all non-static global variables and function entry points are visible. Thus only one instance of a same-named function would be used.

What you can do is place the same-named functions from each Fortran .so file into a module that is used within the Fortran library (but not external to the library). Assuming the modules have different module names, these will then be different subroutines.

Additionally, if your code permits, you may be able to take advantage of attributing the subroutine names with PRIVATE.

Jim Dempsey

mecej4
Honored Contributor III

By default on Linux, shared objects (*.so) are created with all symbols exported, whereas the Windows linker must be told which symbols to export. Please see https://www.gnu.org/software/gnulib/manual/html_node/Exported-Symbols-of-Shared-Libraries.html and other related sources of information for finer control of the shared library creation process.

Greg_T_
Valued Contributor I

Thank you both for the information and explanation.  Knowing that the default on Linux is for all the subroutines and functions to be externally available helps. 

I should have mentioned the run-time symptoms of the subroutine name collision between the two shared objects.  In one case a subroutine call was just skipped over and execution went on to the next line, which was baffling: the debug output from the statements before and after the call appeared, but nothing from the call itself.  In another case there was a core dump, which was more helpful because the crash message gave the subroutine name in the other shared object library instead of in the shared object that was being used.

The www.gnu.org reference was very helpful and led me to add the --version-script option to the linker options in the makefile.  The web search words I was using didn't even get me close to that information; having the correct keywords makes a big difference in useful search results.  Reading about controlling which shared object symbols (subroutines and functions) are external and which are private led me to this article, which describes using a GNU linker version script to list the routines that should be externally available while keeping all the other routines hidden (i.e. private):

https://www.gnu.org/software/gnulib/manual/html_node/LD-Version-Scripts.html

Instead of adding DLLEXPORT syntax in the Fortran source to make a routine public, as on Windows, it looks like I can use the GNU linker option on Linux to limit access to the subroutines listed in a version script file.  Here is an example of a makefile using the --version-script option to reference the "exportlist.txt" script text file; this example has 4 source files, two of which are externally available:

# define variables for directory, file extension, commands
ODIR = obj_64
O = .o
OD = $(ODIR)
RMC = /bin/rm -f
MVC = /bin/mv -f
# define the output shared object file name
LIB_NAME = libMyLib.so
# define Intel Fortran compiler, link and compile options
compiler = ifort
#
# key note: use the gcc linker --version-script option to reference the "exportlist.txt"
# file that lists externally accessible subroutine names and set all others to hidden
# 
options = -shared -fPIC -static-intel -L. -Wl,-rpath,.,--version-script=exportlist.txt
#
#
opt1 = -fPIC -O2 -fpconstant -warn declarations -warn unused -fpp
# define the source files in the project
objects = $(OD)/subA$O \
 $(OD)/subB$O \
 $(OD)/sub1$O \
 $(OD)/sub2$O
# build the project (recipe lines must begin with a tab character)
$(LIB_NAME) : $(objects)
	$(compiler) -o $(LIB_NAME) $(objects) $(options)
	$(MVC) $@ $(ODIR)/$@
# define compile for each source file
$(OD)/subA$O : subA.f90
	$(compiler) -c subA.f90 $(opt1)
	$(MVC) subA$O $@

$(OD)/subB$O : subB.f90
	$(compiler) -c subB.f90 $(opt1)
	$(MVC) subB$O $@

$(OD)/sub1$O : sub1.f90
	$(compiler) -c sub1.f90 $(opt1)
	$(MVC) sub1$O $@

$(OD)/sub2$O : sub2.f90
	$(compiler) -c sub2.f90 $(opt1)
	$(MVC) sub2$O $@

Example of the corresponding "exportlist.txt" file to make subA and subB public and keep sub1 and sub2 hidden:

{
  global: subA; subB;
  local: *;
};
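
A version script of this shape can also be generated from a list of names rather than maintained by hand. The following is only a sketch: the function name make_export_list is made up for illustration, and it assumes the names passed in are the actual symbol names in the object files. With ifort on Linux, a routine without BIND(C) typically appears in lowercase with a trailing underscore (for example suba_ rather than subA), so it is worth checking the real names with nm before listing them.

```shell
#!/bin/sh
# Print a GNU ld version script that exports only the named symbols
# and hides everything else.
# make_export_list is a hypothetical helper name, not an existing tool.
make_export_list() {
    printf '{\n  global:'
    for sym in "$@"; do
        printf ' %s;' "$sym"
    done
    printf '\n  local: *;\n};\n'
}

# Example use (writes the file referenced by --version-script):
#   make_export_list subA subB > exportlist.txt
```

Keeping the list of public names in one place this way makes it easier to apply the same treatment to a second shared object.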

I used the "nm" Linux command to list the external public subroutine symbol names in the shared object file to help check the correct symbols are public (a space before and after the T character for grep):

>nm libMyLib.so | grep ' T '
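
The same check can be scripted so that it prints only the symbol names. This is a minimal sketch, assuming GNU nm (whose -D option lists the dynamic symbol table, which is what other shared objects resolve against, and whose --defined-only option drops undefined references); the helper name exported_text_symbols is hypothetical.

```shell
#!/bin/sh
# Read nm output on standard input and print the names of global
# text (code) symbols, i.e. lines whose type column is an uppercase T.
# Local symbols (lowercase t) and undefined symbols (U) are skipped.
exported_text_symbols() {
    awk '$2 == "T" { print $3 }'
}

# Example use:
#   nm -D --defined-only libMyLib.so | exported_text_symbols
```

If the version script is doing its job, the output should contain only the routines named in the global: list.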

Using the --version-script option is working so far and I'll test it on both shared object library files.

Regards,

Greg T.

jimdempseyatthecove
Honored Contributor III

Greg,

Thanks for posting back your solution. Too often users getting pointers to solutions do not return the favor by posting a complete solution. Your post #4 should help others in the same situation.

FWIW, I wish Intel would compile a Tips document of problems and solutions, with a proper index and table of contents. While much of this information is within this forum (and, I imagine, not visible on premier.intel.com), the valuable information is not easily findable unless you know the hard-to-determine keywords.

Jim Dempsey

Simon_C
Novice

Jim,

This thread came up when I was looking for help and, as so often, you have provided the answer. I'm writing to ask for a little clarification. (I don't really follow Greg_T's solution.)

My problem:

1. I have created a Fortran DLL (Windows) and shared object library (Linux) that is called from Matlab on each platform. Only one routine from the library is called by Matlab. The code is identical on both platforms, apart from the single !DIR$ ATTRIBUTES DLLEXPORT directive in the Windows version.

2. Within the Fortran code are some NAG versions of BLAS subroutines which retain the original BLAS names as ENTRY points. Most of the Fortran code - including these versions of the BLAS routines - runs in extended precision.

3.  In Windows, calls of the Fortran DLL from the Matlab mex function work OK for the reasons given in your answer. However, on Linux the call of my library hangs the application. A colleague obtained this trace:

Stack trace of the code, most recent function (i.e. where we are) at the top; the function name is first, and the file that contains it second:

mkl_blas_cnr_def_xdnrm2 : /usr/local/MATLAB/R2022b/bin/glnxa64.mkl.so
nkl_blas_dnrm2 : /usr/local/MATLAB/R2022b/bin/glnxa64.mkl.so
mkl_blas.dnrm2 : /usr/local/MATLAB/R2022b/bin/glnxa64.mkl.so
e04ucf_ : libmarchemspec.so
solve_routines_mp_calcspeciation_ : libmarchemspec.so
solve_routines_mp_fastpitz_solve_ : libmarchemspec.so
marchem_dll_routines_mp_marchemspec_ : libmarchemspec.so
mexFunction : MarChemSpec.c:335

Starting at the bottom is our Matlab mex function MarChemSpec, and then further up the NAG solver routine e04ucf is called from my shared object library. It is at that point that the NAG versions of the BLAS routines start getting called, and it appears that the BLAS names used as entry points have been resolved to the external MKL library, which is then being called.   Not a good thing!

Jim's statement about "all non-static global variables and function entry points are visible" explains this.
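
One way to confirm a clash of this kind is to compare the names exported by the two libraries. The sketch below assumes the output of GNU nm (run with -D --defined-only) has been saved to two text files; the function name colliding_symbols, the file names, and the placeholder path are hypothetical, and only global text symbols (type T) are compared.

```shell
#!/bin/sh
# Print the symbol names that are exported as global text symbols
# in BOTH nm listings. Arguments: two files holding nm output.
colliding_symbols() {
    a=$(mktemp) && b=$(mktemp) || return 1
    awk '$2 == "T" { print $3 }' "$1" | sort -u > "$a"
    awk '$2 == "T" { print $3 }' "$2" | sort -u > "$b"
    comm -12 "$a" "$b"      # keep only lines common to both sorted lists
    rm -f "$a" "$b"
}

# Example use (the second path is a placeholder):
#   nm -D --defined-only libmarchemspec.so > mine.txt
#   nm -D --defined-only /path/to/other/library.so > other.txt
#   colliding_symbols mine.txt other.txt
```

Any name printed is defined in both libraries, so a call to it may be bound to either definition depending on load order.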

My question relates to Jim's suggestions regarding the use of Fortran modules and the PRIVATE attribute. So here goes: If I were to place all of the NAG routines in a module, and make sure that all of the routines that made use of the BLAS names were PRIVATE, would that solve the problem?

Put more generally: in a Linux shared object library are module routines that are private visible outside the library or not? I assume that if the answer is 'no' then I will not have the spurious calls to routines from glnxa64.mkl.so that cause the application to fail.

I'd be grateful for some guidance. It is only gradually that the nature of the problem has become clear to me.

 

Thank you.

mecej4
Honored Contributor III

I second Jim's comments. A lot of useful information remains squirreled away, or embedded in posts on unrelated topics. It would be very helpful if, after a search produced a list, we could order the list by date (ascending/descending), posting person, forum where posted, etc. As it is, I do not see any particular order in which the list is presented, and the date is often not displayed in the listing. The dates, if displayed at all, can be misleading, because only the date of the first post of a thread is shown. Some of the list items point to posts in, for example, the Brazilian Portuguese forum, where the date format is different, even when the original post was in the EN-US forum.

At the bottom of the list that is presented after a search, we see "1 Next", without any hint on how many "next"s there are to click through. What if I wanted to go to list page 11? Do I have to click ten times to get there?

Here is a valuable document which goes in depth into Greg T.'s questions: https://software.intel.com/sites/default/files/m/a/1/e/dsohowto.pdf .

Kevin_D_Intel
Employee

Thank you for the feedback/suggestions Jim and mecej4. I will pass this back to our support teams/management. mecej4, I also directed your specific forum/search feedback to our Intel Developer Zone (IDZ) development team. Thank you again.

jimdempseyatthecove
Honored Contributor III

I am not sure if the following is correct or not. My assumption is that a .lib file is a container of .obj files, and that the librarian does not link them with one another when it places them in the .lib file. Instead, any .obj files that reference one another are extracted from the .lib file and linked together by the linker. On Windows you may be out of luck.

For Linux .so files, I cannot say whether they follow my assumptions about .lib files.

I do have a suggestion if you are up to it. This will work on both Windows and Linux. It is simplified by your stated requirement:

    Only one routine from the library is called by Matlab.

You make two executables to run as separate processes. The Matlab one contains a shell function for calling the DLL/.so code indirectly. The other is a shell PROGRAM that calls the DLL/.so code directly. The two executables are written as MPI processes and launched via mpiexec, mpirun, or a similar launcher. You can encapsulate the MPI command line in a .bat or .sh file to simplify the program launch.

Yes, mpiexec and its equivalents can specify different executables for each rank. It is relatively easy to pass data between ranks. I will leave that as an exercise for you. MPI has all the functionality to pass data between ranks (you would be passing function arguments with the shell functions/subroutines). In this manner, each rank process (executable) can be linked with the required version/build library.
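
The launch line described above might look like the following sketch. It assumes an MPI installation whose mpiexec accepts the colon-separated form for starting different executables in one job; the function name launch_pair and both executable names are placeholders, and the data exchange between the two ranks is not shown.

```shell
#!/bin/sh
# Start two different executables as rank 0 and rank 1 of one MPI job.
# Set DRY_RUN=1 to print the command line instead of running it.
# launch_pair and its arguments are hypothetical names for illustration.
launch_pair() {
    front=$1   # executable hosting the caller-side shell function
    back=$2    # shell PROGRAM linked against the DLL/.so
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo mpiexec -n 1 "$front" : -n 1 "$back"
    else
        mpiexec -n 1 "$front" : -n 1 "$back"
    fi
}

# Example use:
#   launch_pair ./caller ./solver
```

Because each rank is a separate process, the symbols of the library loaded in one cannot interpose on the other.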

Jim Dempsey