Hi, I am running my program with parallel computing, and the program needs to read many files within each thread. I assign a different I/O unit number to each file and make sure the unit numbers do not conflict between threads.
However, when I run my program, sometimes the program will show errors below:
forrtl: No such file or directory
forrtl: severe (29): file not found, unit 5012707, file /'foler'/fort.5012707
This error occurs at different times and in different places. I think it happens because the file name changes. What I do not understand is that the model opens the correct file and connects it to an I/O unit, yet afterwards the unit no longer recognizes the file name.
I attached my source code and the data it needs.
Can anyone help me with this? Thank you very much.
I can tell you that the filename in the error message is what you get when you have a READ or WRITE to unit 5012707 without OPENing it first. It might be that something closed that unit when it shouldn't.
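One way to narrow this down is to ask the run-time library whether the unit is still connected just before the failing READ. The following is only a minimal sketch: the program name is made up, the unit number is copied from the error message, and a scratch file stands in for the real data file so nothing is left on disk.

```fortran
program check_unit_before_read
  implicit none
  integer, parameter :: iu = 5012707   ! unit number copied from the error message
  logical :: is_open

  ! Connect the unit to a scratch file (stand-in for the real input file).
  open(unit=iu, status='scratch', action='readwrite')
  inquire(unit=iu, opened=is_open)
  print '(a,l1)', 'after OPEN:  opened = ', is_open

  ! Simulate the suspected fault: something closes the unit too early.
  close(iu)
  inquire(unit=iu, opened=is_open)
  print '(a,l1)', 'after CLOSE: opened = ', is_open

  ! Guard that could be placed just before the failing READ.
  if (.not. is_open) then
     print '(a,i0,a)', 'unit ', iu, ' is not connected; skipping READ'
  end if
end program check_unit_before_read
```

Placing a similar INQUIRE in front of the READ named in the traceback would show whether the unit was already disconnected when the thread reached that statement.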
Which compile options did you choose?
When your program prints the error message, it should also have given you a line number traceback, from which you can ascertain the statement(s) that failed to be executed.
Yeah, I have them (shown below)
program 00000000005A2CB8 Unknown Unknown Unknown
program 0000000000483507 updateprofile_dai 185 updateprofile.f
program 000000000047F326 hyd_run_ 65 hyd_run.f
program 000000000047ED6F MAIN__ 22 hru_loop.f
libiomp5.so 00001514D0362623 Unknown Unknown Unknown
However, this error occurs at different times and in different places.
As Steve pointed out, I think it might be that something closed that unit. But I am not sure what closed the i/o unit. I cannot figure this out. This only happens with parallel computing and with many threads.
I tried a run with only two threads, and the program went through. But two threads are too slow for me.
I am not sure what version I am using since I am compiling it on my school's supercomputer. I will need to ask.
But do you think the errors occur because of the bug? I am not sure if my source code is correct (I think it is, but just not sure).
If there is a bug related to this, it will be in the run-time library and not the compiler. Sadly, the 2021 oneAPI installers do not update the run-time library (at least on Windows - not sure how it works on Linux.) You may need to install the separate "standalone" run-time installer from https://software.intel.com/content/www/us/en/develop/articles/oneapi-standalone-components.html
Do you think the errors occur because of the bug?
So you think I should know what runtime version I am using, then install another one from the website?
The runtime options for Linux are APT, YUM and DNF, and Intel oneAPI Runtime Libraries, am I correct?
Thank you very much.
Your code has some bugs, I think.
There are some local variables that are used before they have been set; there are a couple of instances of DO loops whose index variable is REAL.
The presence of undefined variables may well cause the program to abort in an unpredictable way. Finding and fixing these bugs is not going to be easy, since your program is quite complex, a single run may take many hours, and it creates, reads and writes thousands of files.
I do not know if a bug in the run-time library is responsible for the behavior you see. I was just saying that the compiler itself is not involved.
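To illustrate the two kinds of source bug mentioned above (a REAL loop index and a variable used before it is set), here is a small sketch of the usual repair. All names and values are invented for the example and are not taken from the posted code. The loop counts with an INTEGER and derives the REAL quantity from it, and the accumulator is assigned before its first use. Run-time checks such as `-check uninit` (ifort) or `-finit-real=snan -ffpe-trap=invalid` (gfortran) may help locate the real cases, though neither is guaranteed to catch every one.

```fortran
program integer_loop_index
  implicit none
  integer :: i, nsteps
  real :: x, x0, dx, total

  x0 = 0.0
  dx = 0.1
  nsteps = 10
  total = 0.0          ! assigned before first use

  ! Count with an integer and compute the real value from it,
  ! instead of "do x = 0.0, 1.0, 0.1" with a REAL index.
  do i = 0, nsteps
     x = x0 + real(i) * dx
     total = total + x
  end do

  print *, 'last x =', x, ' sum =', total
end program integer_loop_index
```

A REAL index makes the trip count depend on rounding, so the loop can run one iteration more or fewer than intended; the integer form always runs nsteps + 1 times.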
I may be wrong, but it appears that there is a 2021 version of Hydrus available. Your version is noted as 2009, which means that a lot of Fortran errors that compilers did not necessarily pick up in 2009 are now being picked up. There are also 12 years of bug fixes.
If the two Hydrus versions are from the same base code, then when you publish any results, someone like me reviewing the paper will ask why you did not use the latest one.
Of course your Hydrus may be completely different, but it appears to be very similar.
Siyuliben: In addition to the bugs that I mentioned above, the source code that you posted may have another problem, which it shares with the other codes such as SWMS_2D, etc., written by the same group of authors.
This bug pertains to insufficient accuracy in the calculation of 1/tanh(x) - 1/x for small values of x when the FPU does not promote intermediate results to ten-byte reals, as the X87 did. This inaccuracy may affect subroutine Pecour in file solute.f90.
For details of this bug and a solution, see my post in the PC-Progress user forum.
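For readers who want to see the cancellation problem in isolation, the sketch below compares the direct formula with the Taylor series 1/tanh(x) - 1/x = x/3 - x^3/45 + 2x^5/945 - ... for small x. This is a generic remedy, not necessarily the fix described in the forum post referred to above. The function name, the switch-over threshold of 0.03 and the use of double precision are all assumptions made for the illustration.

```fortran
program small_x_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: x
  integer :: k

  ! Columns: x, direct formula, series-guarded version.
  do k = 1, 8
     x = 10.0_dp**(-k)
     print '(es10.2,2es24.15)', x, 1.0_dp/tanh(x) - 1.0_dp/x, coth_minus_inv(x)
  end do

contains

  ! Hypothetical helper: evaluates 1/tanh(x) - 1/x without cancellation near 0.
  pure function coth_minus_inv(x) result(f)
    real(dp), intent(in) :: x
    real(dp) :: f, x2
    real(dp), parameter :: x_switch = 0.03_dp   ! illustrative threshold (assumption)

    if (abs(x) < x_switch) then
       ! Truncated Taylor series: x/3 - x**3/45 + 2*x**5/945
       x2 = x * x
       f = x * (1.0_dp/3.0_dp + x2 * (-1.0_dp/45.0_dp + x2 * (2.0_dp/945.0_dp)))
    else
       f = 1.0_dp/tanh(x) - 1.0_dp/x
    end if
  end function coth_minus_inv

end program small_x_demo
```

As x shrinks, the direct column should drift away from x/3 because two nearly equal large numbers are subtracted, while the series column should stay close to it.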
I am not sure if I could ask this question here, but I saw you also used gfortran to compile the source code.
I tried to use gfortran to compile the program with '-fopenmp' to see if gfortran will work. But when I run the program, it does not create any threads.
The number of threads was set to 64 using 'set omp_num_threads = 64'
I used 'gfortran -o program.exe *.f *.f90 -fbacktrace -fdollar-ok -fopenmp' to compile the file.
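One possible cause, offered as a guess: the OpenMP run-time reads the environment variable OMP_NUM_THREADS, and in a bash shell 'set omp_num_threads = 64' does not create an environment variable at all. The usual forms are 'export OMP_NUM_THREADS=64' in bash and 'setenv OMP_NUM_THREADS 64' in csh/tcsh. A tiny test program like the sketch below (the program name is made up) can confirm what the run-time actually sees before the full model is run.

```fortran
program show_threads
  use omp_lib
  implicit none

  ! Upper bound the run-time would use for the next parallel region.
  print '(a,i0)', 'max threads:       ', omp_get_max_threads()

  !$omp parallel
  !$omp single
  ! Team size actually created for this region (printed by one thread only).
  print '(a,i0)', 'threads in region: ', omp_get_num_threads()
  !$omp end single
  !$omp end parallel
end program show_threads
```

Build it with the same '-fopenmp' option. If it reports the expected thread count but the full program still runs serially, the next thing to check is whether the parallel loops in the source carry OpenMP directives in a form that gfortran accepts.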
I recommend that you work first on getting rid of the bugs in your source code, using whichever compiler/OS is most effective in catching and fixing bugs. Only after doing that does it make sense to consider using OpenMP and other ways of parallelizing the program. I also suspect that your modifications to Hydrus create far more temporary files and do needless I/O to those files.
If you were to describe how you modified the original Hydrus sources and what your objectives are, it would be possible to give more constructive comments.
Thank you, I will fix the bug you mentioned above.
It didn't show errors without parallelizing the program. So I thought the source code was ok.
I will follow your instructions.
I didn't modify much of the original source code. I just changed the I/O units to make it work in parallel, and also took some variables out of the HYDRUS subroutine and used them to create new input files.
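As an alternative to hand-assigning unit numbers per thread, the run-time library can be asked to pick a free unit with NEWUNIT= (Fortran 2008), with the unit variable kept private to each thread. The sketch below is only an illustration under stated assumptions: it uses scratch files instead of the real HYDRUS input files, every name in it is invented, and it must be built with the compiler's OpenMP option (for example -qopenmp or -fopenmp) so that a thread-safe run-time library is linked.

```fortran
program per_thread_units
  use omp_lib
  implicit none
  integer :: iu, tid, val, ios

  !$omp parallel private(iu, tid, val, ios)
  tid = omp_get_thread_num()

  ! Each thread gets its own unit number chosen by the run-time library.
  open(newunit=iu, status='scratch', action='readwrite', iostat=ios)
  if (ios == 0) then
     write(iu, *) tid          ! write something thread-specific
     rewind(iu)
     read(iu, *) val           ! read it back through the same unit
     close(iu)                 ! only the owning thread closes its unit

     !$omp critical
     print '(a,i0,a,i0)', 'thread ', tid, ' read back ', val
     !$omp end critical
  end if
  !$omp end parallel
end program per_thread_units
```

Because the unit variable is private and is opened and closed by the same thread, no other thread holds a number it could accidentally close, which is the failure suspected earlier in this thread.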