topic KMP_MALLOC vs allocate in Intel® Moderncode for Parallel Architectures

KMP_MALLOC vs allocate

jim_dempsey — Sat, 03 Dec 2005 04:18:03 GMT

When running within an OpenMP thread, does the Fortran ALLOCATE obtain memory from the thread-local heap, as KMP_MALLOC does?

Jim Dempsey

Re: KMP_MALLOC vs allocate

Henry_G_Intel — Sat, 03 Dec 2005 07:52:39 GMT

Hi Jim,

The Fortran ALLOCATE statement does not allocate memory from the thread stack, even if it is executed within an OpenMP parallel region. Thread stacks are generally small, and it's not always necessary to allocate memory in thread-private storage.

Henry

Re: KMP_MALLOC vs allocate

jim_dempsey — Sat, 03 Dec 2005 12:10:17 GMT

I am referring not to stack allocation but to allocation from a heap local to a thread's default processor. This applies to NUMA-based systems, where memory is distributed across multiple nodes and access times are not uniform. See:

http://www.microsoft.com/whdc/system/platform/server/datacenter/numa_isv.mspx

for information on NUMA systems.

Jim Dempsey
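
To make the idea of thread-local allocation concrete, here is a minimal C sketch in which each OpenMP thread allocates and initializes its own buffer. It uses plain malloc rather than KMP_MALLOC so that it builds with any compiler. The function name per_thread_mean_sum is hypothetical, and the comment about "first touch" page placement is an assumption about the operating system, not something confirmed for the platform discussed in this thread.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not from the thread): every OpenMP thread allocates
 * its own buffer, writes to it, and sums it. Returns the mean of the
 * per-thread sums, which equals n_per_thread when every allocation succeeds.
 * Without OpenMP enabled the pragma is ignored and the code runs serially. */
double per_thread_mean_sum(size_t n_per_thread)
{
    double total = 0.0;
    int threads = 0;

    assert(n_per_thread > 0);

    #pragma omp parallel reduction(+:total, threads)
    {
        /* Allocation happens inside the parallel region, once per thread. */
        double *buf = malloc(n_per_thread * sizeof *buf);
        if (buf != NULL) {
            /* The allocating thread is also the first to write the pages.
             * ASSUMPTION: on an OS with a first-touch policy this tends to
             * place the pages on that thread's NUMA node. */
            for (size_t i = 0; i < n_per_thread; ++i)
                buf[i] = 1.0;

            for (size_t i = 0; i < n_per_thread; ++i)
                total += buf[i];

            threads += 1;
            free(buf);
        }
    }
    return threads > 0 ? total / threads : 0.0;
}
```

Calling per_thread_mean_sum(1000) returns 1000.0 regardless of the thread count. The sketch only shows where the allocation and initialization occur; it does not pin threads to processors, so on its own it cannot show a local versus remote access difference.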

Re: KMP_MALLOC vs allocate

Henry_G_Intel — Fri, 09 Dec 2005 06:16:40 GMT

Jim,

As far as I know, the Intel 9.0 compiler does not generate NUMA-specific code. The Fortran 90 ALLOCATE statement simply allocates memory from the global heap. I'll try to get someone from the compiler team to verify this.

Henry

Re: KMP_MALLOC vs allocate

jim_dempsey — Fri, 09 Dec 2005 23:57:54 GMT

Thanks. When you refer the question, please include the following additional information.

The platform is WinXP Pro SP2, installed from my MSDN subscription; i.e., the installation was plain WinXP, then Windows Update was run (several times) until it was through SP2. I also modified BOOT.INI to include /PAE.

I see no performance difference as I migrate a pair of threads between processors on a 2-node NUMA system with 4 cores.

I believe I have the system BIOS set to not interleave the NUMA nodes, but maybe that setting isn't functioning on this system's BIOS. If all the memory were allocated on one node, you would expect a performance change as the processing moved from one node to the other (while the data remained in the node of allocation).

I am trying to get the most out of the system.

Jim Dempsey