Intel® Fortran Compiler

bug with open statement

jvandeven
New Contributor I

I recently upgraded to VS2015 and IVF Composer 2016.  Since upgrading the Fortran compiler, the application I work with returns IOSTAT error code 30 on an OPEN statement.  A file is created, but cannot be written to.  I have tried various permutations of the OPEN statement, including:

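! Original form (get_unit is an application routine that returns a free unit number)
call get_unit( file_id )
open ( unit = file_id, file = file_name, status = 'replace', iostat = istat )
if ( istat.ne.0 ) then
    ! error handling: this is where istat = 30 is reported
end if

! Alternatives tried in place of the OPEN above:
open ( newunit = file_id, file = file_name, status = 'replace', action = 'write', iostat = istat )
open ( newunit = file_id, file = file_name, status = 'replace', iostat = istat )
open ( newunit = file_id, file = file_name, iostat = istat )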

None of these seems to help.  The affected code writes output to a number of CSV (comma-separated values) files.  The OPEN statement works fine to begin with, but appears to fail after a sufficiently large number of data points have been written to disk (i.e., it will write 10 variables of size X, and 8 variables of size Y, where Y>X).  I am returning to IVF Composer 2015 Update 4, which processes the data without incident, until this problem can be resolved.
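One way to get more than the bare IOSTAT value out of a failing OPEN is the standard IOMSG= specifier (Fortran 2003), which returns the run-time library's own error text. The sketch below is a minimal illustration, not the poster's code: the module and routine names (csv_open_diag, open_replace) are hypothetical, and it assumes the file should be replaced on every open, as in the variants above.

```fortran
module csv_open_diag
  implicit none
contains
  ! Open file_name for writing, replacing any existing file.
  ! On failure, print IOSTAT together with the run-time library's IOMSG text.
  subroutine open_replace(file_name, unit, istat)
    character(len=*), intent(in)  :: file_name
    integer,          intent(out) :: unit
    integer,          intent(out) :: istat
    character(len=256) :: msg

    msg = ''
    open (newunit=unit, file=file_name, status='replace', action='write', &
          iostat=istat, iomsg=msg)
    if (istat /= 0) then
      write (*, '(a,i0,2a)') 'OPEN failed, iostat=', istat, ', iomsg=', trim(msg)
    end if
  end subroutine open_replace
end module csv_open_diag
```

Calling open_replace in place of the bare OPEN would print the library's message next to iostat=30 if the failure recurs.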

I have tried doing a complete rebuild (both after "cleaning" the solution, and deleting the x64 working directory), tried working on different drives, tried rebooting the computer, and tried working on different computers.  None of these quick fixes helped.

I hope someone is able to identify the root cause of this issue.

Justin.

jimdempseyatthecove
Honored Contributor III

The Windows error code will be more informative (or more confusing).

(This may be unrelated to your problem.) In a different thread in this forum, a user had similar problems where the errors would appear on one system and not the other. On the system where the error occurred, it turned out that the system administrator had installed one of these "continuous backup" programs. Intermittently, the backup program would open the file that the Fortran program was writing (or intended to overwrite), and this would generate a sharing violation in the Fortran program. A similar situation can occur with anti-virus programs. The solution usually is to exclude those files/folders from being mucked with by the backup/AV.

Did you run the fixed IOUNIT test? (use same unit number for all files, assuming non-concurrent file writes).
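The fixed-unit test can be sketched as follows: every file is written through one unit number, each file is closed before the next is opened, and any OPEN failure is counted and reported instead of aborting the run. The unit number 10, the file names and the routine name are illustrative assumptions only; in the real application the inner loop would be the existing CSV output code.

```fortran
module fixed_unit_test
  implicit none
  ! Arbitrary unit number, assumed not to be used elsewhere in the program
  integer, parameter :: fixed_unit = 10
contains
  ! Write n_files CSV files one after another through the same unit,
  ! closing each before the next is opened. Returns the number of OPEN failures.
  subroutine write_files_fixed_unit(n_files, n_rows, n_fail)
    integer, intent(in)  :: n_files, n_rows
    integer, intent(out) :: n_fail
    integer :: i, j, istat
    character(len=32) :: file_name

    n_fail = 0
    do i = 1, n_files
      write (file_name, '(a,i0,a)') 'unit_test_', i, '.csv'
      open (unit=fixed_unit, file=file_name, status='replace', action='write', iostat=istat)
      if (istat /= 0) then
        n_fail = n_fail + 1
        write (*, '(3a,i0)') 'OPEN of ', trim(file_name), ' failed, iostat=', istat
        cycle
      end if
      do j = 1, n_rows
        write (fixed_unit, '(i0,",",es15.7)') j, real(j) * 0.5
      end do
      close (fixed_unit)
    end do
  end subroutine write_files_fixed_unit
end module fixed_unit_test
```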

Jim Dempsey

jvandeven
New Contributor I

Because I have pending deliverables and was getting a bit nervous, I uninstalled the IVF 2016.0 compiler and reinstalled the 2015.4 compiler, and the error no longer appears (which is something of a relief).  This means that I have not been able to implement the suggestions you make above.  I will try to install the 2016 compiler tomorrow without removing the 2015 compiler, and will get back to you as soon as I have made further progress.

Many thanks,

Justin. 

jvandeven
New Contributor I

GETLASTERROR returned code 1450:

ERROR_NO_SYSTEM_RESOURCES
1450 (0x5AA)

Insufficient system resources exist to complete the requested service.

This is odd, as I do not encounter this issue using the 15.4 compiler.  

Interestingly, there is a thread entitled "IOSTAT 30 problems", that mentions the same error code.  In that post Steve Lionel suggests using perfmon to diagnose the cause.  

Unfortunately, this issue exceeds my technical understanding - why is it that past data writes are affecting the resources available after they have been completed and the associated file closed?  

jvandeven
New Contributor I

I should also add that I set the file_id = 1 for all instances of the csv_file_write_r subroutine, and this had no effect.

jvandeven
New Contributor I

In addition to the above, I have now tried the following, all with no effect:

  1. fixed the unit number (file_id) to 1
  2. added a sleep step into the code following the error, and then re-tried the open statement
  3. suppressed the /assume:buffered_io optimisation
  4. added a call to the commitqq(unit) function

The description here was somewhat informative, and supports all of the recommendations that Jim and App4619 make above. Unfortunately this problem remains unresolved.

The error is reported when the 16.0 compiler is used, and 59.4MB of data have been written via the csv_file_write_r subroutine that is described above.   It does not appear when the 15.4 compiler is used. 

andrew_4619
Honored Contributor III

A word of caution: either use SETLASTERROR(0), or call GETLASTERROR before and after the OPEN statement, to make sure that ERROR_NO_SYSTEM_RESOURCES was in fact generated by the OPEN. I am pretty sure it will be, but assumptions often make an ass of me!

Are you always running the application from VS? If you close all other applications and a shed load of background applications and services (e.g. run in safe mode), does the program behave in exactly the same way? That would help establish whether the issue is Windows being overstressed for some reason, which would need further investigation.

You could also try a sleep BEFORE the OPEN, as it seems that once this error has happened there is no recovery?

jimdempseyatthecove
Honored Contributor III

Possible causes:

https://social.msdn.microsoft.com/Forums/vstudio/en-US/e4414c2e-664d-4ad6-9c93-c08aa5306239/possible-causes-for-errornosystemresources-error-during-fwrite?forum=vclanguage

The reason for the ERROR_NO_SYSTEM_RESOURCES(1450) is that on x86 (32-bit) or IA64 (64-bit) systems, the maximum buffer size is just under 64MB. For X64 systems, the maximum buffer size is just under 32MB. The maximum unbuffered read and write size limits are imposed by the design of the IO manager inside the Windows executive. When an application reads or writes files that are opened with FILE_FLAG_NO_BUFFERING, the IO Manager locks the application's buffer into physical RAM and then maps the virtual addresses into physical addresses to pass to the disk device by making a memory descriptor list (MDL). The buffer size limitation comes from the maximum size MDL that the IO Manager will create. The reason for the difference between platforms is the way the maximum buffer size is calculated from the memory page size and pointer size.

http://cbloomrants.blogspot.com/2009/03/03-12-09-errornosystemresources.html

In terms of File IO, this can hit you in a whole variety of crazy ways :

1. There's a limit on the number of file handles. When you try to open a file you can get an out-of-resources error.

2. There's a limit on the number of Async ops pending, because the Kernel needs to allocate some internal resources and can fail.

3. There's a limit on how many pages of disk cache you can get. Because windows secretly runs everything you do through the cache (note that this is even true to some extent if you use FILE_FLAG_NO_BUFFERING - there are a lot of subtleties to when you actually get to do direct IO which I have written about before), any IO op can fail because windows couldn't allocate a page to the disk cache (even though you already have memory allocated in user space for the buffer).

4. Even ignoring the disk cache issue, windows has to mirror your memory buffer for the IO into kernel address space. I guess this is because the disk drivers talk to kernel memory so your user virtual address has to be moved to kernel for the disk to fill it. This can fail if the kernel can't find a block of kernel address space.

5. When you are sure that you are doing none of the above, you can still run into some other mysterious shit about the kernel failing to allocate internal pages for its own book-keeping of IOs. This is error 1450 (0x5AA) , ERROR_NO_SYSTEM_RESOURCES.

...

So I have made sure I don't have too many handles open. I have made sure I don't have too many IO ops pending. I have made sure my IO ops are not too big. I have done all that, and I still randomly get ERROR_NO_SYSTEM_RESOURCES depending on what else is happening on my machine. I sort of have a solution, which seems to be the standard hack solution around the net - just sleep for a few millis and try the IO again. Eventually it magically clears up and works.

BTW while searching for this problem I found this code snippet : Novell Eclipse FTK file io . It's quite good. It's got a lot of the little IO magic that I've only recently learned, such as using "SetFileValidData" when extending files for async writes, and it also has a retry loop for ERROR_NO_SYSTEM_RESOURCES.

Further investigation reveals that this problem is not caused by me at all - the kernel is just out of paged pool. If I do a very small IO (64k or less) or if I do non-overlapped IO, or if I just wait and retry later, I can get the IO to go through. Oh, and if you use no buffering, that also succeeds.

Hope this leads you to a solution.
Jim Dempsey
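The "sleep for a few millis and try the IO again" workaround quoted above could look roughly like this in Fortran. Standard Fortran has no sleep routine, so this sketch spins on SYSTEM_CLOCK as a portable stand-in (with Intel Fortran, SLEEPQQ from the IFPORT module would be the more natural choice). The routine names, retry count and doubling delay are assumptions. Note that later in this thread a 5 s delay did not cure the problem, so treat this as a mitigation for transient failures, not a fix.

```fortran
module open_retry
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
contains
  ! Portable stand-in for a sleep call: spins on SYSTEM_CLOCK for about ms milliseconds.
  subroutine wait_ms(ms)
    integer, intent(in) :: ms
    integer(int64) :: t0, t, rate
    call system_clock(t0, rate)
    if (rate <= 0) return          ! no clock available: do not wait
    do
      call system_clock(t)
      if (real(t - t0) * 1000.0 / real(rate) >= real(ms)) exit
    end do
  end subroutine wait_ms

  ! Try to open file_name up to max_tries times, doubling the delay after each failure.
  ! On return, istat holds the IOSTAT of the last attempt (0 on success).
  subroutine open_with_retry(file_name, unit, istat, max_tries, first_delay_ms)
    character(len=*), intent(in)  :: file_name
    integer,          intent(out) :: unit, istat
    integer,          intent(in)  :: max_tries, first_delay_ms
    integer :: attempt, delay_ms

    istat = -1
    delay_ms = first_delay_ms
    do attempt = 1, max_tries
      open (newunit=unit, file=file_name, status='replace', action='write', iostat=istat)
      if (istat == 0) return
      if (attempt < max_tries) then
        call wait_ms(delay_ms)
        delay_ms = 2 * delay_ms
      end if
    end do
  end subroutine open_with_retry
end module open_retry
```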
jvandeven
New Contributor I

Following app4619's suggestion, I checked the error code prior to the call that I know is failing.  It stayed at 2 (ERROR_FILE_NOT_FOUND) from the start of the I/O block of code until the suspect call, suggesting that the call is where things go wrong.

I also added a 5 s sleep before the OPEN call, again following app4619's suggestion (once the iostat=30 error comes up, there seems to be no going back).  This didn't change anything either.

The blog by cbloom that Jim quotes above is the same one that I tagged - it is the clearest explanation that I have found, even if the suggested solutions have not worked in my case.

jimdempseyatthecove
Honored Contributor III

Jvandeven,

I seem to recall a post somewhere (not much help, is it?) about a similar problem that was corrected by changing a Windows parameter that affected file caching and/or file write buffering. Sorry, I do not have the link. Searching for "windows file caching problem" or "windows file buffering problem" might yield an informative hit.

Jim Dempsey

jvandeven
New Contributor I

Jim - thanks for the additional pointer.  Unfortunately, I have just looked for over an hour and turned up nothing immediately useful.  I have started to think of alternative ways to write the associated output, and will continue to use the 15.4 compiler in the meantime.

Justin.

Steven_L_Intel1
Employee

Might you be able to reduce your application to something smaller that still shows the problem?

What is the project setting of Fortran > Libraries > Use Runtime Library? 

Are you using the 15.0.4 compiler under VS2015? Do you have an older VS you can try building the 16.0 version with?

jvandeven
New Contributor I

Hi Steve - I have created a small solution that I attach here.  This solution replicates the error on my system on the fourth attempt to save the data (it saves the third attempt as intended).  I have tested this using all three "format" options (defined by "flag" in the SUBROUTINE csv_file_write_r), and using alternative datasets, and it appears in every case I have considered. I am using both 15.0.4 and 16.0 compilers under VS2015 (OS is Windows 10).  The problem is not encountered with 15.0.4.

The project setting of Fortran > Libraries > Use Runtime Library is "Multithreaded".

I will try to test using VS2010 later today,

Justin.

jvandeven
New Contributor I

An update: I have recompiled the example solution I attached in my previous post using the 16.0 compiler in 32-bit mode (I use 64-bit by default) through VS2010 on a separate computer.  The same issue I note above was replicated.

Steven_L_Intel1
Employee

So far I've been unable to reproduce the problem, using 16.0 under VS2015. I can get it up to 15 iterations without error. The one thing I changed was the path to where the file gets created - I used C:\TEMP. I can't see why that would make a difference but you might want to try it for yourself.

Also, when it comes to issues like this there are sometimes background programs (AV, backup) that grab channels to files and temporarily "lock" them, causing problems. I don't know if that's the case for you.

jvandeven
New Contributor I

Your message made me think that it is possibly a problem associated with Windows 10, as that OS is on both of the computers I had tested the problem on.  I have re-tested the problem on Windows 7 Professional, and the same error comes up with the 16.0 compiler, on iteration 4.  

Each of the three computers I have now tested on has very different specs (one dual Xeon E5-2670v3 with 64GB of RAM, one i5-4200 with 8GB of RAM, and one old i7 with 4GB of RAM).  They have different operating systems (Windows 7 and Windows 10) and different versions of VS (2010/2015).  The one important factor common to all three is the compiler (Intel Parallel Studio XE 2016, Fortran Composer edition).

I have also tried to fiddle a little with my code.  If I economise on the "write" statements (e.g. when defining the fmat variable in the csv_data_append routine of the solution that I had previously attached), then the error comes up after a larger number of iterations (5 rather than 3 in the version I tested).
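For context, a CSV row format such as fmat is commonly assembled at run time with an internal WRITE into a character variable; the Fortran 2008 unlimited format item (*) lets the same row be written with a constant format and no internal WRITE at all. The sketch below shows both forms side by side. It is a generic illustration with hypothetical names, not the code from the attached solution, and whether avoiding the internal WRITE sidesteps the 16.0 problem has not been tested here.

```fortran
module csv_rows
  implicit none
contains
  ! Row format built with an internal WRITE, e.g. (3(es15.7,:,",")) for 3 values.
  subroutine write_row_internal(unit, x)
    integer, intent(in) :: unit
    real,    intent(in) :: x(:)
    character(len=32)   :: fmat
    write (fmat, '(a,i0,a)') '(', size(x), '(es15.7,:,","))'
    write (unit, fmat) x
  end subroutine write_row_internal

  ! Same row with a constant Fortran 2008 format: '*' repeats the group as often
  ! as needed and ':' stops output before a trailing comma, so no internal WRITE.
  subroutine write_row_unlimited(unit, x)
    integer, intent(in) :: unit
    real,    intent(in) :: x(:)
    write (unit, '(*(es15.7,:,","))') x
  end subroutine write_row_unlimited
end module csv_rows
```

Both routines produce the same line for the same data; the second simply removes one internal WRITE per row.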

jvandeven
New Contributor I

I should note that I have also been running this test from a variety of folder locations (including c:\temp), none of which makes any difference.  Furthermore, I have run the test after shutting down as many non-essential processes as I could identify (including real-time virus checking), which also had no effect.  In all cases (except the one where I limited the use of write statements), the application built with the 16.0 compiler returns an iostat=30 error code, with Windows error 1450, ERROR_NO_SYSTEM_RESOURCES.  Could Composer be at fault?  Or the fact that I have the Fortran-only version of the compiler (no C++)?

Steven_L_Intel1
Employee

That you have the Fortran-only version would make no difference whatsoever.

I see in another thread a user is reporting that internal writes are using up handles - there may be a connection there. I will take this up with the I/O library folks in the morning.

Steven_L_Intel1
Employee

I instrumented your test program for handle consumption and didn't see a problem there, so it must be something different.

jvandeven
New Contributor I

Were you able to replicate my error prior to testing for handle use?

Steven_L_Intel1
Employee

No.

jvandeven
New Contributor I

Despite repeated trials, I have failed to find a set-up that gets through the test code without an error.  I have few ideas about how to proceed. Is this an issue that I can pursue through "premier support" (I have never used this service before)?
