Hi,

Jeremie_V_ · ‎04-01-2018

Dear all,

I wrote a Coarray Fortran program for solving sparse linear systems of equations using the Preconditioned Conjugate Gradient algorithm. I tested it on 2 datasets on an HPC cluster with SLURM as the scheduler. For compilation, I used Intel Fortran 17.0.0. Sparse matrix-vector multiplications are done with MKL subroutines.
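For readers unfamiliar with the method, here is a minimal single-process sketch of a preconditioned conjugate gradient iteration. It is written in Python for brevity and is not the code from the repository: the function name `pcg`, the dense list-of-lists matrix, and the Jacobi (diagonal) preconditioner are all assumptions made for illustration.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product (the real program uses sparse MKL routines)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A.

    Hypothetical helper: uses a Jacobi preconditioner M = diag(A).
    Returns the solution and the number of iterations performed.
    """
    n = len(b)
    x = [0.0] * n
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # M^{-1} for Jacobi
    r = list(b)                                   # r = b - A x, with x = 0
    b_norm = math.sqrt(dot(b, b)) or 1.0
    if math.sqrt(dot(r, r)) / b_norm < tol:
        return x, 0
    z = [m * ri for m, ri in zip(inv_diag, r)]    # z = M^{-1} r
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) / b_norm < tol:
            return x, it
        z = [m * ri for m, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    solution, iterations = pcg(A, b)
    print(solution, iterations)
```

In a distributed version, the inner products and the matrix-vector product are typically the steps where the images exchange data, so they are the natural places to compare intermediate values between a run that converges and one that does not.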

For one dataset, the Coarray Fortran program always gave the correct answer, whether all the images were on the same node or each image was on a different node (i.e., 1 image per node).

For a second dataset, the same Coarray Fortran program gave the correct answer when all the images were on the same node.

ISSUE: However, when there was 1 image per node, the program diverged and gave wrong answers!

I compiled the same program with the options "-g -check all -traceback" and tested the debug version on the second dataset with 1 image per node. The debug version gave the correct answer, even with 1 image per node!

So, I am lost and cannot find the reason the program goes wrong with a specific dataset and a specific configuration (1 image per node). Using debug options did not help because no warning/error messages were written! Any hints, please?

The source code can be found on GitHub:

https://github.com/jvdp1/pcgcoarray

The dataset can be provided on request.

Thank you in advance for your help.

Yours sincerely,

Jeremie

Michael_S_17 · ‎04-05-2018

Hi,
I did not test your coarray program, but I seem to remember that ifort 17.0.0 had a bug with character components of a derived-type coarray. The bug occurred only when the code was compiled without ifort's -check option; compiling with the -check option worked well. Maybe your problem is somehow related to this. I also seem to remember that the bug was fixed in ifort 17.1, although I am not entirely sure about the version numbers.
Best Regards

Jeremie_V_ · ‎04-09-2018

Dear Michael,

Thank you for your answer. I guess it is related to the problem you mentioned. I will try to find a more recent version of ifort and test it on the same dataset.

The funny thing is that it only happens with this dataset. I tested other (smaller and bigger) datasets, and the results were always fine (with and without the -check options).

Yours sincerely,

Jeremie

Issue with a distributed memory Coarray Fortran program