Hi!
I have a program (written in C++ and compiled with Intel Parallel Studio XE 2018) that needs a lot of memory and many CPU cores (it uses MPI). During execution, the maximum memory usage for a test input data set is about 110 GB. When I run it on a server with 512 GB of RAM (or more), the computation time is 124 min. When I remove 10 of the 12 memory modules, so that total memory drops to 128 GB, the computation time drops to 30 min.
The server is a Lenovo with 2 x Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz, running Windows Server 2016 Standard. The memory modules are DDR4 64 GB ECC 2666 MHz, and they passed all tests.
I see the same effect on HP, Dell, and SuperMicro servers.
Can anybody explain this?
- Tags:
- CC++
- Development Tools
- Intel® C++ Compiler
- Intel® Parallel Studio XE
- Intel® System Studio
- Optimization
- Parallel Computing
- Vectorization
Can you run VTune on this? Either "HPC Performance Characterization" or "Memory Access Analysis" should provide relevant information.
This may be of use: https://lenovopress.com/lp0697.pdf
Of particular interest may be Page 9, Socket Interleave.
Because your program uses MPI, you will likely get the best performance with the memory configured as NUMA. If the memory is not set to NUMA (that is, it is interleaved across sockets), then memory allocations will be distributed across sockets and, depending on the physical placement, your program could have the bad luck of having its most heavily used RAM located on the other socket. In other words, bad luck of the draw.
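To put a rough number on the remote-socket effect, here is a minimal sketch of the arithmetic. It assumes memory is interleaved evenly across sockets and that every access costs either a fixed local or a fixed remote latency. The function names and the 80 ns / 140 ns figures used below are hypothetical, chosen only for illustration; they are not measurements from your server.

```cpp
// Fraction of memory accesses that land on another socket when pages are
// interleaved evenly across `sockets` sockets (simplifying assumption).
constexpr double interleaved_remote_fraction(int sockets) {
    return 1.0 - 1.0 / sockets;
}

// Expected latency per access when `remote_fraction` of the accesses go to a
// remote socket. local_ns and remote_ns are hypothetical inputs, not
// measured values.
constexpr double average_latency_ns(double remote_fraction,
                                    double local_ns,
                                    double remote_ns) {
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns;
}
```

For example, with two sockets `interleaved_remote_fraction(2)` is 0.5, and `average_latency_ns(0.5, 80.0, 140.0)` gives 110 ns against 80 ns for purely local access. This simple model does not by itself account for a 4x difference in run time, which is why a VTune memory access profile is the better evidence.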
Another potential cause: depending on the page size and the total RAM, the number of TLB entries needed during execution may be smaller in one case and larger in the other. The TLB is part of virtual-to-physical memory address translation. It is not a cache of the data; think of it as a cache of the page tables. A TLB miss requires accessing the page table(s).
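The TLB point can be made concrete with some arithmetic. The sketch below treats your "110 Gb" as 110 GiB and uses a TLB entry count of 1536, which is an assumed figure for illustration only; check the documentation for your CPU. All names here are hypothetical.

```cpp
#include <cstdint>

constexpr std::uint64_t KiB = 1024;
constexpr std::uint64_t MiB = 1024 * KiB;
constexpr std::uint64_t GiB = 1024 * MiB;

// Number of pages needed to map `bytes` of memory, rounded up.
constexpr std::uint64_t pages_needed(std::uint64_t bytes,
                                     std::uint64_t page_size) {
    return (bytes + page_size - 1) / page_size;
}

// Amount of memory a TLB with `entries` entries can translate without a miss.
constexpr std::uint64_t tlb_reach_bytes(std::uint64_t entries,
                                        std::uint64_t page_size) {
    return entries * page_size;
}

// Assumption: working set of about 110 GiB, taken from the original post.
constexpr std::uint64_t working_set = 110 * GiB;
// Assumption: illustrative TLB entry count, not read from the hardware.
constexpr std::uint64_t tlb_entries = 1536;
```

With 4 KiB pages, `pages_needed(working_set, 4 * KiB)` is 28,835,840 pages while `tlb_reach_bytes(tlb_entries, 4 * KiB)` is only 6 MiB. With 2 MiB large pages the same working set needs 56,320 pages and the reach grows to 3 GiB. Either way the working set is far larger than the reach, so how physical pages end up laid out can change how expensive each miss is.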
Jim Dempsey