Intel® Fortran Compiler

Mixed language - Windows vs. Linux

gib

I am developing a Fortran program that calls C functions, which in turn call Fortran routines (BLAS and LAPACK).  On Windows I use IVF and MSVC, and it all works fine. I have just started porting the program to Linux (on our cluster), and the first run crashed with SIGSEGV.  I have not yet had time to narrow down the crash location, but it occurs to me that it could be caused by the mix of compilers being used.

The Fortran is compiled with ifort, the C object files are currently compiled with gcc and statically linked, while BLAS and LAPACK are dynamic libraries, probably also built with gcc.  Is it a mistake to use gcc with ifort?  Presumably Intel C is also available on the cluster.

Thanks

Gib

gib

The SIGSEGV turned out to be an easy fix.  I repeatedly open and close a log file in the C code, and was not resetting the FILE pointer to NULL.  It seems that MSVC must do it for you, but the Linux compilers (icc or gcc) do not, hence the error.  An example of why it's a good idea to test code on different OSes.

It seems that ifort works happily with either icc or gcc.  If anybody has any comments on why one compiler should be preferred I'd be keen to hear them.

jimdempseyatthecove

Can you show the MSVC statement that you suspect is NULLing the file pointer? The following C code will not NULL the file pointer (and if it did get NULLed by fclose, that would be a serious error):

{
  FILE * pFile;
  pFile = fopen ("myfile.txt","wt");
  fprintf (pFile, "fclose example");
  fclose (pFile);
  // pFile is not null here
  pFile = NULL; // You must insert this
  ...
  if(pFile) {
    ...
  }
}

Jim Dempsey

mecej4

You should not use the file pointer after you close the file by calling fclose().

The POSIX man page for fclose() at http://pubs.opengroup.org/onlinepubs/009695399/functions/fclose.html says:

After the call to fclose(), any use of stream results in undefined behavior.

Unless your code uses some of the latest features of C/C++, or exposes a bug in one of the compilers, you can use either gcc or icc. Sometimes the choice is determined more by familiarity with a compiler's options than by the performance of the resulting code.

gib

Hi guys, thanks for responding.  I am not a C expert (or any sort of programming expert), so it doesn't surprise me that I leapt to the wrong conclusion. 

My Fortran program statically links C functions that are defined thus:

#include <stdio.h>
#include <stdlib.h>
#include <string.h> 
#include <math.h>
#include "globheads.h"
#include "defs.h" 
#include "protos.h"
#include "ios.h"  

//#define flog stdout

int ndim[16];
int nnz[16];
int lfil;
int ILUtype;            // 1 = ILUK, 2 = VBILUK, 3 = ILUT, 4 = ARMS
int nBlock;
int *nB = NULL;
int *perm = NULL;
csptr csmat[16];        // matrix in csr format          
iluptr lu[16];          // ilu preconditioner structure
vbsptr vbmat = NULL; 
vbiluptr vblu = NULL;   // vbilu preconditioner structure
arms ArmsSt = NULL;     // arms preconditioner structure
FILE *flog[16]={NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL};

//-----------------------------------------------------------------------
//-----------------------------------------------------------------------
void itsol_create_matrix(int ic, int n, int nz, double *AA, int *JA, int *IA, int *ierr) 
{
    int rsa;
    char fname[64];

    if (!flog[ic]) {
        sprintf(fname,"itsol_glue%d.log",ic);
//      flog = fopen("itsol_glue.log","w");
        flog[ic] = fopen(fname,"w");
    }
    *ierr = 0;
    ndim[ic] = n;
    nnz[ic] = nz;
    csmat[ic] = (csptr)Malloc( sizeof(SparMat), "itsol_create_matrix" );
    rsa = 0;
    if ((*ierr = CSRcs(ndim[ic], AA, JA, IA, csmat[ic], rsa)) != 0) {
        fprintf(flog[ic],"CSRcs error\n");
        return;
    }
    *ierr = 0;
    return;
}

//-----------------------------------------------------------------------
//-----------------------------------------------------------------------
void itsol_free_matrix(int ic, int *ierr)
{
    *ierr = 0;
    fprintf(flog[ic],"itsol_free_matrix: ic: %d\n",ic);
    if (csmat[ic]) {
        fprintf(flog[ic],"cleanCS: %d\n",ic);
        cleanCS( csmat[ic] );
    }
    fflush(flog[ic]);
    fclose(flog[ic]);
    flog[ic] = NULL;    // <<<< added for Linux
    return;
}

...

To invoke the solver, itsol_create_matrix() is called first, then some other functions that do the work, and finally itsol_free_matrix().  This happens within an OpenMP-parallelised loop over a small number of values of ic, which is why the log file pointer is an array entry (I wanted to be able to see what the solver was doing for each value of ic separately).  The log file is for debugging purposes, and the reason for closing and reopening it was to be able to focus on the most recent solver step.

There might be something wrong with what I'm doing.  The code works with IVF and MSVC (without 'flog[ic] = NULL;'), but it initially crashed with SIGSEGV on Linux with ifort and icc (or gcc).  By inserting printf statements I determined that the crash was happening at the point of returning from itsol_free_matrix(): it never returned.  I discovered that adding the line to reset the pointer to NULL made the program run OK, without ever really understanding the cause of the problem.

Maybe somebody can see if I'm doing something wrong somewhere else and tell me how to fix it.

Thanks, Gib

mecej4

If your Fortran program is multi-threaded and it calls the C functions whose code you posted in #5, I suspect that there will be race conditions because of the many global variables in the C code. With such variables you have to be careful to arrange matters so that each thread writes only to those global variables that conceptually "belong" to that thread.

If you run your program through Intel Inspector, it may detect and tell you about those race conditions.
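
As a minimal sketch of that ownership rule (not taken from the posted code; `slot_sum`, `accumulate` and `run_all` are hypothetical names), each loop iteration below writes only the array element indexed by its own `ic` and keeps its scratch value in a local variable. It assumes the compiler is run with OpenMP enabled; without it the pragma is ignored and the loop simply runs serially.

```c
#define NSLOT 16

/* Hypothetical per-ic global state, one element per solver instance. */
static double slot_sum[NSLOT];

/* Writes only slot_sum[ic]; concurrent calls with different ic values
   therefore never touch the same global element. */
void accumulate(int ic, const double *x, int n)
{
    double s = 0.0;               /* local variable: private to the caller */
    if (ic < 0 || ic >= NSLOT) return;
    for (int i = 0; i < n; i++) {
        s += x[i];
    }
    slot_sum[ic] = s;             /* the only write to shared storage */
}

/* Runs one accumulate() per slot; x is only read, never written. */
void run_all(int nslots, const double *x, int n)
{
    #pragma omp parallel for
    for (int ic = 0; ic < nslots; ic++) {
        accumulate(ic, x, n);
    }
}
```

A scalar global such as a single shared counter written by every iteration would not follow this pattern and would need to be made per-slot or protected.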

IanH

Setting `flog[ic]` to NULL would be required if within a program you were calling `itsol_create_matrix` and `itsol_free_matrix` more than once with the same value of `ic`.  If your program had such repeated calls (I'm not sure whether this is the case or not based on your description) then perhaps previously the resulting undefined behaviour was simply undetectable.
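
A minimal sketch of the open-if-NULL, close-and-reset pattern, assuming a simplified stand-in for the `flog[]` array (the names `logs`, `log_open`, `log_close` and `log_is_open` are hypothetical, not from the posted code). Without the reset in `log_close`, a second `log_open` for the same `ic` would see a non-NULL but already closed pointer, skip `fopen`, and any later `fprintf` on it would use a closed stream, which is undefined behaviour.

```c
#include <stdio.h>

#define NLOG 16

/* Hypothetical stand-in for flog[]; static storage starts out all NULL. */
static FILE *logs[NLOG];

/* Open the log for slot ic only if it is not already open.
   Returns 0 on success, -1 on a bad index or fopen failure. */
int log_open(int ic, const char *path)
{
    if (ic < 0 || ic >= NLOG) return -1;
    if (!logs[ic]) {
        logs[ic] = fopen(path, "w");
    }
    return logs[ic] ? 0 : -1;
}

/* Close the log for slot ic and reset the pointer, so that the
   NULL test in log_open() is meaningful on the next call. */
void log_close(int ic)
{
    if (ic < 0 || ic >= NLOG || !logs[ic]) return;
    fclose(logs[ic]);
    logs[ic] = NULL;   /* fclose() does not do this for you */
}

/* Report whether slot ic currently holds an open stream. */
int log_is_open(int ic)
{
    return ic >= 0 && ic < NLOG && logs[ic] != NULL;
}
```

Calling `log_open(ic, name)` at the start of each solver invocation and `log_close(ic)` at the end then reopens the file cleanly on every time step, and a repeated `log_close(ic)` is a harmless no-op.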
 

gib

Hi Ian,

That's exactly what I do.  In each time step (of many thousands) the solver is called once for each of the ic values (currently between 2 and 12 of them, depending on the case I'm running).  Each invocation of the solver starts with a call to itsol_create_matrix, and ends with a call to itsol_free_matrix.

The little puzzle (which does not really require solving) is why the Windows program does not require NULLing the pointer, while the Linux version does.  Or perhaps that's the wrong way to describe the situation.

gib

Hi mecej4,

The situation isn't as bad as it looks.  Potentially there are four preconditioners, but after testing I found one was much faster than the others on my problem, and this is the only one I'm using.  All the global variables used for this case are stored in arrays.  If I ever wanted to use one of the other preconditioners I would have to follow the same pattern.  The program runs surprisingly fast on both Windows and Linux (thanks Yousef Saad!).
