
OMP: System error #1455: The paging file is too small for this operation to complete

SergeyKostrov
Valued Contributor II

The title of the thread describes the problem with MKL and I'll provide additional technical details.

SergeyKostrov
Valued Contributor II

1. On an Ivy Bridge system with 32GB of RAM running 64-bit Windows 7, set the following values for the Windows paging file:
   Initial size (MB): 98304
   Maximum size (MB): 131072
2. A test project is attached.
3. The recommended matrix size is 16,384x16,384.
4. Set OMP_NUM_THREADS and KMP_NUM_THREADS to 4 ( for both ).
5. Try to increase the values of these environment variables by two. The error should be reproduced when OMP_NUM_THREADS and KMP_NUM_THREADS are set to 16 ( for both ) or higher values.
6. Screenshots will be posted.
7. The MKL version is 11.0.

Let me know if you have questions and thanks in advance.
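Steps 4 and 5 can be scripted so that each thread count is run in turn. The following is a minimal sketch in Python, not part of the attached test project: the executable path passed to `sweep` is hypothetical, the function names are illustrative, and the timeout is an assumption added because a later post in this thread reports that the application hangs when thread creation fails.

```python
import os
import subprocess

def thread_env(num_threads, base=None):
    """Return a copy of the environment with both thread-count variables set."""
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = str(num_threads)
    env["KMP_NUM_THREADS"] = str(num_threads)
    return env

def sweep(exe, thread_counts, timeout_s=3600):
    """Run exe once per thread count; map each count to its exit code (None on timeout)."""
    results = {}
    for n in thread_counts:
        try:
            proc = subprocess.run([exe], env=thread_env(n), timeout=timeout_s)
            results[n] = proc.returncode
        except subprocess.TimeoutExpired:
            # The child is killed when the timeout expires; record that it did not finish.
            results[n] = None
    return results
```

A call such as `sweep(r"path\to\test_app.exe", [4, 8, 16])` (with the path replaced by the real location of the test binary) would then cover the thread counts named in the steps above.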
SergeyKostrov
Valued Contributor II
[ Screenshot 1 ] forttestapp1.jpg
SergeyKostrov
Valued Contributor II
[ Screenshot 2 ] forttestapp2.jpg
SergeyKostrov
Valued Contributor II
[ Screenshot 3 ] pagingfiletoosmall.jpg

Note: I wonder if the problem could possibly be related to the default values of OMP_STACKSIZE or KMP_STACKSIZE?
Chao_Y_Intel
Moderator

Hello Sergey,

From the attached picture, it looks like the virtual memory used by the applications is close to the limit. The maximum size is about 130 GB for the system, and the applications use almost all of it.

Also, from a performance point of view, the virtual memory in use is now far beyond the physical memory (32 GB). This may cause a lot of memory swapping, which can lead to poor performance.

Regards,
Chao

SergeyKostrov
Valued Contributor II

Hi Chao,

I've provided an example and lots of technical details in order to avoid any discussion that I would rate as "speculative". Please try to reproduce the problem. On my side, I'll try to increase the VM Max size. Don't worry about the performance of the calculations, since that is another issue, Not related to the problem.

Thanks in advance.

Chao_Y_Intel
Moderator

Sergey, 

I downloaded the code and checked it. I did not find any MKL function calls there. Did I miss something?

Regards
Chao 

SergeyKostrov
Valued Contributor II

What about the Fortran MATMUL call? Doesn't it use MKL ( parallel version ) indirectly?

SergeyKostrov
Valued Contributor II

Hi Chao,

Please take a look at the new test results. Thanks.

For all tests:
No of rows N = 16384
No of columns N = 16384

Note: Tests 1 to 8 use the following Windows paging file settings:
Initial size (MB): 65536
Maximum size (MB): 98304
OMP_STACKSIZE - Default value
KMP_STACKSIZE - Default value

[ Test 1 ]
OMP_NUM_THREADS=4
KMP_NUM_THREADS=4
...
Calculated ( in seconds ): 364.6980
...
Number of CPUs used: 4
Number of Threads used: 4

[ Test 2 ]
OMP_NUM_THREADS=8
KMP_NUM_THREADS=8
...
Calculated ( in seconds ): 331.7810
...
Number of CPUs used: 8
Number of Threads used: 8

[ Test 3 ]
OMP_NUM_THREADS=12
KMP_NUM_THREADS=12
...
Calculated ( in seconds ): 334.0590
...
Number of CPUs used: 8
Number of Threads used: 12

[ Test 4 ]
OMP_NUM_THREADS=16
KMP_NUM_THREADS=16
...
Calculated ( in seconds ): 332.1400
...
Number of CPUs used: 8
Number of Threads used: 16

[ Test 5 ]
OMP_NUM_THREADS=24
KMP_NUM_THREADS=24
...
Calculated ( in seconds ): 331.1880
...
Number of CPUs used: 8
Number of Threads used: 24

[ Test 6 ]
OMP_NUM_THREADS=32
KMP_NUM_THREADS=32
...
Calculated ( in seconds ): 331.8120
...
Number of CPUs used: 8
Number of Threads used: 32

[ Test 7 ]
OMP_NUM_THREADS=40
KMP_NUM_THREADS=40
...
Calculated ( in seconds ): 330.6590
...
Number of CPUs used: 8
Number of Threads used: 40

[ Test 8 ]
OMP_NUM_THREADS=48
KMP_NUM_THREADS=48
...
OMP: Error #136: Cannot create thread.
OMP: System error #1455: The paging file is too small for this operation to complete.
OMP: Error #178: Function GetExitCodeThread() failed:
OMP: System error #6: The handle is invalid.
...
Number of CPUs used: N/A
Number of Threads used: N/A
SergeyKostrov
Valued Contributor II

[ Test 9 ]
OMP_NUM_THREADS=48
KMP_NUM_THREADS=48

All tests with the following environment variables ( different values ) failed:

OMP_STACKSIZE=512K
KMP_STACKSIZE=512K
Note: Test Failed

OMP_STACKSIZE=256K
KMP_STACKSIZE=256K
Note: Test Failed

OMP_STACKSIZE=128K
KMP_STACKSIZE=128K
Note: Test Failed

OMP_STACKSIZE=64K
KMP_STACKSIZE=64K
Note: Test Failed

OMP_STACKSIZE=32K
KMP_STACKSIZE=32K
Note: Test Failed

OMP_STACKSIZE=16K
KMP_STACKSIZE=16K
Note: Test Failed / 32K used instead
SergeyKostrov
Valued Contributor II

[ Test 10 ]
Note: New Windows paging file settings:
Initial size (MB): 98304
Maximum size (MB): 131072
OMP_NUM_THREADS=48
KMP_NUM_THREADS=48
OMP_STACKSIZE=512K
KMP_STACKSIZE=512K
...
Calculated ( in seconds ): 329.5980
...
Number of CPUs used: 8
Number of Threads used: 48

[ Test 11 ]
OMP_NUM_THREADS=56
KMP_NUM_THREADS=56
...
Calculated ( in seconds ): 331.2350
...
Number of CPUs used: 8
Number of Threads used: 56

[ Test 12 ]
OMP_NUM_THREADS=64
KMP_NUM_THREADS=64
...
OMP: Error #136: Cannot create thread.
OMP: System error #1455: The paging file is too small for this operation to complete.
OMP: Error #178: Function GetExitCodeThread() failed:
OMP: System error #6: The handle is invalid.
...
Number of CPUs used: N/A
Number of Threads used: N/A

[ Test 13 ]
Note: New Windows paging file settings:
Initial size (MB): 163840
Maximum size (MB): 196608
OMP_NUM_THREADS=64
KMP_NUM_THREADS=64
...
Calculated ( in seconds ): 332.8110
...
Number of CPUs used: 8
Number of Threads used: 64

[ Test 14 ]
OMP_NUM_THREADS=80
KMP_NUM_THREADS=80
...
Calculated ( in seconds ): 328.1630
...
Number of CPUs used: 8
Number of Threads used: 80

[ Test 15 ]
OMP_NUM_THREADS=96
KMP_NUM_THREADS=96
...
OMP: Error #136: Cannot create thread.
OMP: System error #1455: The paging file is too small for this operation to complete.
OMP: Error #178: Function GetExitCodeThread() failed:
OMP: System error #6: The handle is invalid.
...
Number of CPUs used: N/A
Number of Threads used: N/A
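The failures in Tests 8, 12 and 15 line up with the paging file maximum in a simple way, which a quick calculation makes visible. The following is a minimal sketch in Python (not part of the test project); it only divides the maximum paging file size by the thread counts reported in the tests. The names are illustrative, and the calculation ignores physical RAM and memory committed by other processes.

```python
# (maximum paging file size in MB, last passing thread count, first failing thread count)
# taken from Tests 7/8, 11/12 and 14/15.
RUNS = [
    (98304, 40, 48),
    (131072, 56, 64),
    (196608, 80, 96),
]

def mb_per_thread(pagefile_max_mb, threads):
    """Paging file maximum divided evenly across the given number of threads."""
    return pagefile_max_mb / threads

for pagefile_max_mb, last_ok, first_fail in RUNS:
    at_fail = mb_per_thread(pagefile_max_mb, first_fail)
    at_pass = mb_per_thread(pagefile_max_mb, last_ok)
    print(f"max={pagefile_max_mb} MB: {at_fail:.1f} MB/thread at failure "
          f"({first_fail} threads), {at_pass:.1f} MB/thread at last pass ({last_ok} threads)")
```

In all three configurations the failing run works out to 2048 MB of paging file per thread. That suggests, but does not prove, that each thread commits on the order of 2 GB. Thread counts between the last passing and first failing values were not tested, so the exact threshold is unknown.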
SergeyKostrov
Valued Contributor II

Consolidated results of processing are as follows:
...
Calculated ( in seconds ): 364.6980
Calculated ( in seconds ): 331.7810
Calculated ( in seconds ): 334.0590
Calculated ( in seconds ): 332.1400
Calculated ( in seconds ): 331.1880
Calculated ( in seconds ): 331.8120
Calculated ( in seconds ): 330.6590
Calculated ( in seconds ): 329.5980
Calculated ( in seconds ): 331.2350
Calculated ( in seconds ): 332.8110
Calculated ( in seconds ): 328.1630
...
and, as you can see, there is no performance impact from high values for Virtual Memory ( VM ). Actually, only 9GB of memory is needed to calculate the product of 16384x16384 matrices with MATMUL.

My conclusions are as follows:
- The issue is resolved when the size of VM is increased
- Smaller values for OMP_STACKSIZE and KMP_STACKSIZE do not resolve the issue
- Possibly there is a problem with libiomp5md.dll and this is Not related to MKL
- Possibly there is a problem with scalable_malloc or mkl_malloc functions
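The 9GB figure can be sanity-checked with a back-of-the-envelope calculation. The sketch below is in Python and rests on assumptions the thread does not confirm: the element size (4 bytes for single precision, 8 for double) and the number of N x N arrays alive at once (two operands and a result, plus possibly a temporary for the MATMUL result). The function name is illustrative.

```python
N = 16384  # matrix dimension used in all tests in this thread

def matrices_gib(n, bytes_per_element, num_matrices):
    """Memory for num_matrices dense n x n arrays, in GiB."""
    return n * n * bytes_per_element * num_matrices / 2**30

# Assumed scenarios; the REAL kind used by the test project is not stated in the thread.
for label, elem_bytes in (("single precision", 4), ("double precision", 8)):
    for count in (3, 4):
        print(f"{label}, {count} arrays: {matrices_gib(N, elem_bytes, count):.0f} GiB")
```

Under the double-precision assumption, three or four arrays come to 6 to 8 GiB, the same order as the 9GB quoted above. Under either assumption the array data is a small fraction of the paging file sizes used in the tests.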
Chao_Y_Intel
Moderator

Sergey,

I did a quick run here and saw a similar error report. MATMUL is a Fortran intrinsic function, not a call to an MKL function, so the problem is not likely to be related to the scalable_malloc/mkl_malloc functions. It is more likely related to the MATMUL function itself, which is auto-parallelized when compiling with the /Qparallel switch.

Regards,
Chao

SergeyKostrov
Valued Contributor II

>>...Possibly there is a problem with libiomp5md.dll and this is Not related to MKL...

Chao, could you inform the software engineers responsible for libiomp5md.dll about a possible problem with memory allocation? I've provided lots of technical details.
Chao_Y_Intel
Moderator

Hi Sergey,

I will move this thread to the Fortran forum so that the experts there can help. A quick summary of the problem:

When code that uses the Fortran intrinsic function MATMUL is compiled with /Qparallel, it reports the error below, even with a small matrix:
  OMP: System error #1455: The paging file is too small for this operation to complete

The test code in the first post reproduces the problem.

jimdempseyatthecove
Honored Contributor III

Sergey,

Look at your screenshot 3. You circled the commit size of your Fortran app and circled the page file size to show that the app used less page file than the total size. You forgot to include the commit sizes for devenv, WLTray, explorer, ... This gets you to ~130.583GB; then you must add the pageable portions of the Services (not shown on the tab). Also not shown in your commit number is the size of the failing allocation.

At least the system is taking a graceful exit in reporting to the application that it does not have the resources to create the thread. Some OS's will choke down and crawl at this point. (no error back to the app, no progress for anything else on the system)

Your experiment is pushing the system to break your app. Well, it did. Go out and buy a 256GB SSD and reserve all of that for your page file.

Jim Dempsey

SergeyKostrov
Valued Contributor II

>>...At least the system is taking a graceful exit in reporting to the application...

This is Not true: the application hangs, and I needed to use Windows Task Manager to end it. The test project is attached, and if you have a couple of free minutes you could try to reproduce the problem.
SergeyKostrov
Valued Contributor II

>>...I will move this thread to the Fortran forum, so more expert there could help...

Thanks. However, I consider that the problem is related to libiomp5md.dll and memory allocation. The MATMUL function works well and I didn't have any issues or problems with it. I would also like to stress that only 9GB (!) of memory is needed to calculate the product of 16384x16384 matrices with MATMUL. It is Not clear why some function in libiomp5md.dll reserves an excessive amount of memory ( see screenshots ).
jimdempseyatthecove
Honored Contributor III

Sergey,

Look at your 8 (9) thread screenshot: the commit size is 27,320,672 KB.
Your 16 (17) thread screenshot, before the crash, shows 44,125,712 KB.
Subtracting the 9-thread figure from the 17-thread figure gives 16,805,040 KB for 8 additional threads.
Dividing by 8 gives 2,100,630 KB required per thread.
Your error message said omp could not add a thread when at 130,282,476 KB (+2,100,630 = 132,383,106 KB); this exceeds your page file size.

Now, if we take 9 x 2,100,630 = 18,905,670 KB off the total for 9 threads, we get 8,415,002 KB for heap and program. This is in line with your 9GB estimate.
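The arithmetic above can be replayed as a short script. This is a sketch in Python using only the commit sizes quoted in this post (in KB); the variable names are illustrative, and the thread totals follow the post's own "8 (9)" and "16 (17)" notation.

```python
commit_9_threads_kb = 27_320_672    # screenshot with 8 (9) threads
commit_17_threads_kb = 44_125_712   # screenshot with 16 (17) threads, before the crash
commit_at_failure_kb = 130_282_476  # commit level when the thread could not be added

extra_threads = 17 - 9
per_thread_kb = (commit_17_threads_kb - commit_9_threads_kb) // extra_threads
baseline_kb = commit_9_threads_kb - 9 * per_thread_kb   # heap and program
next_thread_kb = commit_at_failure_kb + per_thread_kb   # commit needed for one more thread

print(f"per thread:        {per_thread_kb:,} KB ({per_thread_kb / 2**20:.2f} GiB)")
print(f"heap and program:  {baseline_kb:,} KB ({baseline_kb / 2**20:.2f} GiB)")
print(f"after next thread: {next_thread_kb:,} KB")
```

The per-thread figure comes out at roughly 2 GiB, and the remainder attributed to heap and program at roughly 8 GiB.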

What is this extra thread, is it one you are spawning or a watchdog of OpenMP?

Jim Dempsey

SergeyKostrov
Valued Contributor II

>>Your error message said omp could not add a thread when at 130,282,476 (+2,100,630 = 132,383,106) this exceeds
>>your page file size.

This is correct, and the question still remains: why does it need so much committed memory?

>>What is this extra thread, is it one you are spawning or a watchdog of OpenMP?..

This is the thread for the main application.