Intel® Fortran Compiler

32 bit Code Compiled on Win XP Pro 64 with IVF 10 and VS 2005 Will Not Run On Win HPC Server 2008

jregan5
Beginner
1,259 Views
Hi all.

I've got a program that was compiled in 32 bit mode on a 64 bit machine running Windows XP Pro 64 using Intel Visual Fortran 10 and Visual Studio 2005. When I take this code over to a 64 bit machine running Windows HPC Server 2008, the code will not run. No error messages or anything, just nothing happens whether I double click or run from the console (it's a number crunching console program).

The server has IVF 11 and VS 2008 installed on it, and when I try compiling a 32 bit version there, I get the same results (or rather lack thereof). Any clues as to where I should be looking for a solution?
8 Replies
Steven_L_Intel1
Employee
First thing I would do is compile with /heap-arrays (Fortran > Optimization > Heap Arrays > 0). I have seen odd behavior like this when a program needs too much stack space.
jregan5
Beginner
Thanks for your response, Steve. Unfortunately, setting the Heap Arrays value to 0 had no effect.

Something else interesting (and irritating): when I try to debug the code with IVF11/VS2008 on the Win HPC 2008 server, a window pops up that says "Unable to start program {path and name of program}. Attempt to access invalid address."
Steven_L_Intel1
Employee
I wonder if this program has very large arrays that push into the 2GB static code and data limit. Is that possible? Please relink the program with the linker's map option (/MAP) enabled, and attach the .map file to a reply here.
jregan5
Beginner
I have attached the requested file.
Steven_L_Intel1
Employee
Yep - that's the problem. You have some 1,981,906,556 bytes of static data, mostly COMMON (or module variables). This pushes the total static address space used by the program close enough to 2GB that Windows can't activate the image. Even 64-bit Windows has this limitation. But it is not quite over 2GB total, so the linker doesn't complain.

My advice is to replace the largest COMMONs and static arrays with ALLOCATABLE variables. The biggest offenders seem to be C_WOE_mp_IWOE and C_IGF_mp_IG.
jregan5
Beginner
That seems to have fixed the problem. Thanks!

As a side note, can you post a link (if you know one) about reading that map file?
Steven_L_Intel1
Employee
Ok, let's start with the first section. It looks like this:

[plain] Start         Length     Name                   Class
 0001:00000000 0030cbf3H .text                   CODE
 0002:00000000 000001bcH .idata$5                DATA
 0002:000001bc 00000004H .CRT$XCA                DATA
 0002:000001c0 00000004H .CRT$XCF                DATA
 0002:000001c4 00000004H .CRT$XCZ                DATA
...
 0002:0009a9e8 000007c8H .idata$6                DATA
 0002:0009b1b0 00000000H .edata                  DATA
 0003:00000000 0009ebc0H .data                   DATA
 0003:0009ebc0 76217e7cH .bss                    DATA
 0004:00000000 0003bfb8H .trace                  DATA
 0005:00000000 00000020H _RDATA                  DATA[/plain]
These are the "image sections" of the image. An image section is a named collection of storage that is contributed to by the various object files. The first is called .text, and as the "Class" shows, this is where all the actual instruction code goes. This is section 0001, it starts at offset 0 (into the section) and has length (in hex) of 30CBF3 bytes.

The next group is various initialized data. There are several different named contributions to section 0002, and you can see how the linker laid them out in memory with start addresses and lengths. Nothing is too remarkable until we get to section 0003, which is uninitialized data. There are two contributions to this, one called .data with length 9EBC0 (not very long) and one called .bss with length 76217E7C (yow!). So just looking at this I knew there was going to be trouble.
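
To put those hex lengths in perspective, they can be converted to decimal and compared against the 2GB figure. The following is only a rough sketch in Python, not something from the original build: the `sections` dictionary is hand-copied from the listing above, `to_bytes` is a made-up helper name, and 2GB is assumed to mean 2**31 bytes.

```python
# Section lengths copied by hand from the map excerpt above
# (hex, without the trailing "H"). Only a few sections are listed.
sections = {
    ".text": "0030cbf3",
    ".data": "0009ebc0",
    ".bss": "76217e7c",
    ".trace": "0003bfb8",
}

LIMIT = 2**31  # assumption: the 2GB static limit taken as 2**31 bytes


def to_bytes(hex_len: str) -> int:
    """Convert a map-file hex length such as '76217e7c' to a byte count."""
    return int(hex_len, 16)


sizes = {name: to_bytes(h) for name, h in sections.items()}

for name, size in sizes.items():
    # Show each contribution in decimal and as a share of the 2GB limit.
    print(f"{name:8s} {size:>15,d} bytes  ({size / LIMIT:6.1%} of 2GB)")
```

The .bss entry works out to 1,981,906,556 bytes, which is the static data figure quoted earlier in the thread. This ignores section alignment and the contributions elided from the listing, so treat it as an estimate.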

The next section tells you where each global symbol, either code or data, is laid out in memory. It starts out as:

[plain]  Address         Publics by Value              Rva+Base       Lib:Object

 0000:00000000       ___safe_se_handler_table   00000000     
 0000:00000000       ___safe_se_handler_count   00000000     
 0000:00000000       __except_list              00000000     
 0000:00000000       ___ImageBase               00400000     
 0001:00000000       _CSC_MODS.                 00401000 f   osc21_CSC-custom-modules.obj
 0001:00000008       _OSC_INTERFACES.           00401008 f   osc21_modules_CSC.obj[/plain]

This tells you the image section and relative address, the name, the base address where it actually got put in memory, and the name of the object that contributed the symbol. You will note some extra punctuation, or "decoration", added by the compiler to the names in your code.

As you go down this list, the Rva+Base address increases. Nothing gets too exciting until we get to image section 0003, the uninitialized data. Even then, it looks pretty boring (small contributions) until we get here:

[plain] 0003:001a4218       _C_WIFDEF_mp_BDWIF         0094e218     osc21_modules_CSC.obj
 0003:001a4220       _C_WOE_mp_IWOE             0094e220     osc21_modules_CSC.obj
 0003:11f0707c       _for__reentrancy_initialized 126b107c     libifcoremt:for_reentrancy.obj
[/plain]
Note how the address jumps a LOT between the start of _C_WOE_mp_IWOE and a piece of data from the Fortran run-time library. This decorated name means it is a variable named IWOE in module C_WOE (the _mp_ is the separator used for module names.) So this is the first big variable, some 299,249,244 bytes.
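
The map does not record a size for each symbol, so the figure above comes from subtracting one symbol's offset from the next one's. A small sketch of that arithmetic in Python (the helper name `symbol_gap` is made up for illustration, and the two addresses are copied from the excerpt above):

```python
def symbol_gap(addr: str, next_addr: str) -> int:
    """Bytes between two 'section:offset' map addresses in the same section."""
    sec, off = addr.split(":")
    next_sec, next_off = next_addr.split(":")
    if sec != next_sec:
        raise ValueError("addresses are in different image sections")
    return int(next_off, 16) - int(off, 16)


# _C_WOE_mp_IWOE starts at 0003:001a4220; the next public symbol
# (_for__reentrancy_initialized) starts at 0003:11f0707c.
iwoe_size = symbol_gap("0003:001a4220", "0003:11f0707c")
print(f"{iwoe_size:,d} bytes")
```

The gap is an upper bound on the variable's size, since it also includes any alignment padding or non-public data the linker placed in between.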

This is followed by lots more data, some of it large but not monstrous, and then we get to:

[plain] 0003:53be8760       _C_TPVRGB_mp_REDTPV        54392760     
 0003:53be87e0       _C_WKDATA_mp_SHRAT         543927e0     
 0003:7412e5e0       _C_WKDATA_mp_UNAREA        748d85e0     
 0003:74137320       _C_WATRKO_mp_AOPOS         748e1320     
[/plain]
And here we have SHRAT in module C_WKDATA at 542,400,000 bytes. (I may have misidentified some of the variables in my earlier post.)
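
Hunting for the big jumps by eye gets tedious in a long map file. Here is a hedged sketch of how the search could be automated in Python; the regular expression is an assumption about the "Publics by Value" line layout based only on the excerpts in this post, `parse_publics` and `gaps` are hypothetical helper names, and the sample lines are the four shown above rather than the full file.

```python
import re

# Lines copied from the excerpt above; a real map file has many more.
MAP_LINES = """\
 0003:53be8760       _C_TPVRGB_mp_REDTPV        54392760
 0003:53be87e0       _C_WKDATA_mp_SHRAT         543927e0
 0003:7412e5e0       _C_WKDATA_mp_UNAREA        748d85e0
 0003:74137320       _C_WATRKO_mp_AOPOS         748e1320
"""

# Assumed layout: section:offset, symbol name, Rva+Base (all addresses in hex).
PUBLIC_RE = re.compile(
    r"^\s*([0-9a-fA-F]{4}):([0-9a-fA-F]{8})\s+(\S+)\s+([0-9a-fA-F]{8})"
)


def parse_publics(text):
    """Return (section, offset, name) tuples for lines that look like publics."""
    found = []
    for line in text.splitlines():
        m = PUBLIC_RE.match(line)
        if m:
            found.append((m.group(1), int(m.group(2), 16), m.group(3)))
    return found


def gaps(symbols):
    """Estimate each symbol's size as the distance to the next symbol
    in the same image section, largest first."""
    symbols = sorted(symbols, key=lambda s: (s[0], s[1]))
    result = []
    for (sec, off, name), (next_sec, next_off, _) in zip(symbols, symbols[1:]):
        if sec == next_sec:
            result.append((next_off - off, name))
    return sorted(result, reverse=True)


ranked = gaps(parse_publics(MAP_LINES))
for size, name in ranked:
    print(f"{size:>13,d}  {name}")
```

On these four lines the largest gap belongs to _C_WKDATA_mp_SHRAT at 542,400,000 bytes. The last symbol in each section gets no estimate, because there is nothing after it to subtract from.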

None of these, by themselves, is a problem, but the accumulation of the sizes puts the total static code and data so close to the 2GB limit that there is no room for the OS data structures and code that also go in this part of the address space, hence the silent exit.

I did say earlier that even 64-bit Windows has the same 2GB static limit, so even building this for 64-bits might not help, though it is close. And changing some of the arrays to allocatable would certainly help it run 64-bits, but a 32-bit program might still run out of memory for the allocate.
jregan5
Beginner
Very informative. Thanks again, Steve.