Using Large Pages in C/C++ or Fortran Code under 64-bit Linux
I was wondering whether an Intel C or Fortran compiler (under 64-bit Linux) can be
instructed to generate a binary that uses large pages (say 2 MiB or 1 GiB)
upon loading. Ideally, it should be possible to request, for instance,
that the system allocate 2 MiB pages for the stack or heap segments.
If this is not possible at compile time, is there any linker option which
can accomplish this?
Or can a binary request large pages via some environment variable upon
execution? I am familiar with UNIX platforms where all of the above can be
accomplished, but I was unable to find anything similar with the Intel
compilers and tools on Linux.
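For what it's worth, one Linux mechanism (not Intel-specific) that comes close to the environment-variable approach is libhugetlbfs, which can be preloaded so that an unmodified binary draws its heap from huge pages. A sketch of the usual invocation, assuming the library is installed and huge pages have been reserved:

```shell
# Reserve some 2 MiB huge pages first (as root):
echo 128 > /proc/sys/vm/nr_hugepages

# Preload libhugetlbfs so that malloc()/sbrk() allocate from huge pages,
# with no recompilation of ./my_app required:
LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes ./my_app
```

Whether this helps in practice depends on the kernel configuration and on how many huge pages are actually available when the program starts.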
As far as I can tell, applications could gain a lot if it became easier to let them use large pages selectively, especially those that suffer many TLB misses. Whether large or regular pages are used should not be visible to the code, just selectable. Especially for data/heap segments: if it is not transparent, one has to write one's own malloc/calloc like the one you pointed me to, and for multi-threaded code that becomes hairy.
I am familiar with the Linux approach where SHM, or a shared file
mmapped into a process with the large-page attribute set,
can request large pages for that particular segment. Unfortunately this
does not address how to let code use large pages for its stack or data segments.
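The per-segment mechanism alluded to above can be sketched as follows, assuming a reasonably recent kernel with hugetlbfs support; the function name and fallback policy are my own, and `mmap(MAP_HUGETLB)` will fail unless huge pages have been reserved (e.g. via `/proc/sys/vm/nr_hugepages`):

```c
/* Sketch: request an anonymous 2 MiB huge-page mapping via
 * mmap(MAP_HUGETLB), falling back to regular 4 KiB pages when
 * no huge pages are reserved on the system. */
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_HUGETLB
#define MAP_HUGETLB 0x40000 /* older headers may lack the flag */
#endif

/* Returns a length-byte read/write mapping; *used_huge is set to 1 if
 * huge pages were granted, 0 if we fell back to regular pages. */
static void *map_maybe_huge(size_t length, int *used_huge)
{
    void *p = mmap(NULL, length, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) {
        *used_huge = 1;
        return p;
    }
    *used_huge = 0; /* e.g. /proc/sys/vm/nr_hugepages is 0 */
    p = mmap(NULL, length, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Note that this only covers memory the program maps explicitly; it does nothing for the stack or the statically allocated data segment, which is exactly the gap discussed here.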
I think a good place to implement large-page allocation is the
loader: when exec prepares the address space, it could request
regular or large pages to back a particular segment. That would
make the feature easily accessible to applications. Is there any plan by Intel to
make this happen sooner? I know it is really an OS issue, but I am
certain the data- and compute-intensive communities would really appreciate it.
Posting once again, just in case: the Intel compilers don't provide their own malloc and the like; it's up to you to take care of it if you aren't satisfied with the defaults provided by your Linux distribution. If your kernel has been built with hugetlbfs enabled, you have the possibility of using a malloc() built to allocate from a huge page. Supposedly this is a normal tactic in Java implementations, for example. When all your dynamic memory requirements can be satisfied from a single huge page, this tactic may be effective. If the application still requires multiple pages in spite of using huge pages, an extremely fast file system such as an SSD, as well as careful programming to minimize paging, may be required to make it work.
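A minimal sketch of the kind of huge-page-backed malloc() this reply alludes to: a bump allocator over a single 2 MiB region. The name `huge_malloc` and the fallback policy are mine; a real replacement would also need free(), pool growth, and locking — which is precisely the multi-threading hairiness mentioned above.

```c
/* Single-threaded bump allocator over one 2 MiB huge-page region.
 * Hypothetical sketch, not a production allocator. */
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_HUGETLB
#define MAP_HUGETLB 0x40000
#endif

#define POOL_SIZE (2UL * 1024 * 1024) /* one 2 MiB huge page */

static char  *pool;      /* base of the backing region */
static size_t pool_used; /* bytes handed out so far */

static void *huge_malloc(size_t n)
{
    if (pool == NULL) {
        void *p = mmap(NULL, POOL_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (p == MAP_FAILED) /* no huge pages reserved: fall back */
            p = mmap(NULL, POOL_SIZE, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return NULL;
        pool = p;
    }
    n = (n + 15) & ~(size_t)15; /* keep 16-byte alignment */
    if (pool_used + n > POOL_SIZE)
        return NULL; /* pool exhausted */
    void *out = pool + pool_used;
    pool_used += n;
    return out;
}
```

As the reply says, this works well only while everything fits in the one huge page; past that, a more elaborate allocator is needed.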
The main motivation behind large pages in my case is the minimization of TLB misses, plus the likely benefits for prefetching, which I believe is restricted to the page size.
In other UNIX systems I am familiar with, large pages can be requested at load time, where say the data or stack segments can choose large pages over regular ones. When 64 KiB pages are used instead of 4 KiB, the data-access benefits are significant.
Ideally one should be able to tell exec() which page size to use for the different segments. That way no source code changes or recompilations are necessary.
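As far as I know Linux offers no way to tell exec() a per-segment page size, but the closest in-kernel mechanism is transparent huge pages requested per region via `madvise(MADV_HUGEPAGE)` — a one-line source change rather than zero. A sketch, with the wrapper name being my own:

```c
/* Sketch: map an anonymous region and hint to the kernel that it
 * should be backed by transparent huge pages. The hint is advisory:
 * the kernel may ignore it (e.g. if THP is disabled). */
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

static void *map_with_thp_hint(size_t length)
{
    void *p = mmap(NULL, length, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
#ifdef MADV_HUGEPAGE
    madvise(p, length, MADV_HUGEPAGE); /* best-effort, ignore failure */
#endif
    return p;
}
```

Since the hint applies only to regions the program maps itself, it still does not cover the stack or static data segments the loader sets up.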
Cutting down on TLB misses by using huge pages doesn't necessarily compensate for the larger overhead involved in page swapping. Software prefetch can be issued across pages; while hardware prefetch stops at page boundaries, it also stops at strides large enough for the TLB effects to be significant. One could wish for a system designed to support 64 KiB pages, as you said, but huge pages often don't work as well as 64 KiB pages might. The huge pages apparently were introduced to support their role as graphics buffers.
I am interested in cases where paging does not occur often enough to cause a significant problem. I have no experience with large pages in Linux; I have used them on UNIX systems, where the results favor their controlled use.