As far as I know, the mapping inside the MCDRAMs is not documented. The mapping of cache lines to MCDRAM controllers is easy enough to determine using the hardware performance counters.
Bandwidth testing in "Flat-Quadrant" or "Flat-All2All" modes shows big performance drops when accessing arrays that are separated by a multiple of 64 KiB. This suggests that each of the 8 EDC controllers uses an interleave that results in a bank conflict every 8 KiB, but the details have not been disclosed.
Given measurements from directed benchmarks and knowledge of the size of the MCDRAM (8 banks of 2 GiB each), one can speculate about the lower-level details. Some of these speculations lead to testable hypotheses, but the limited EDC performance counter events make it difficult to disambiguate among possible implementations.
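For concreteness, here is a sketch of the kind of directed test that exposes the 64 KiB pattern. It is illustrative only (sizes, the 64-byte shift, and all names are mine, not the exact benchmark used for the measurements above): stream through two arrays whose bases are separated by a multiple of 64 KiB, then repeat with one array shifted by a cache line.

```c
/* Illustrative microbenchmark: read two streams whose base addresses differ
 * by an exact multiple of 64 KiB, then repeat with one stream shifted by a
 * cache line. If the EDC interleave causes bank conflicts at 64 KiB
 * separation, the offset = 0 case should run measurably slower.
 * Run with the buffers placed in MCDRAM, e.g. under "numactl --membind=1". */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define N (1 << 24)                 /* 16 Mi doubles (128 MiB) per stream */

static double sum_two(const double *a, const double *b, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i] + b[i];
    return s;
}

int main(void)
{
    void *pa, *pb;
    /* Both buffers aligned to 64 KiB, so their bases are separated by a
     * multiple of 64 KiB; pb gets extra padding so it can also be tested
     * with a 64-byte (one cache line) shift that breaks the relation. */
    if (posix_memalign(&pa, 64 * 1024, N * sizeof(double)) ||
        posix_memalign(&pb, 64 * 1024, N * sizeof(double) + 4096))
        return 1;

    for (size_t off = 0; off <= 64; off += 64) {
        double *a = pa;
        double *b = (double *)((char *)pb + off);
        memset(a, 0, N * sizeof(double));   /* touch pages before timing */
        memset(b, 0, N * sizeof(double));

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        volatile double s = sum_two(a, b, N);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        (void)s;

        double sec = (t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec);
        printf("offset %2zu B: %8.0f MB/s\n",
               off, 2.0 * N * sizeof(double) / sec / 1e6);
    }
    free(pa);
    free(pb);
    return 0;
}
```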
[ KNL Modes: MCDRAM = Flat - Cluster = Quadrant ]

[ NUMA Information ]

[guest@xxxx-xxxx ~]$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255
node 0 size: 98178 MB
node 0 free: 95051 MB
node 1 cpus:
node 1 size: 16384 MB
node 1 free: 15934 MB
node distances:
node   0   1
  0:  10  31
  1:  31  10

[ Test 1.1 - hbw_malloc ]
Processing Started
KNL Modes : MCDRAM = Flat - Cluster = Quadrant
HBW Memory Available
HBW Memory Policy : HBW_POLICY_BIND
HBW Memory Allocated by : hbw_malloc
HBW Memory Allocation Error Code: 0
HBW Memory Allocated : 4.00 GB
HBW Memory Initialization
HBW Memory Processing
Iteration: 1 - HBW Memory Processed ( 608 ms )
Iteration: 2 - HBW Memory Processed ( 611 ms )
Iteration: 3 - HBW Memory Processed ( 610 ms )
Iteration: 4 - HBW Memory Processed ( 610 ms )
HBW Memory Released
Processing Completed

[ Test 1.2 - hbw_malloc ]
Processing Started
KNL Modes : MCDRAM = Flat - Cluster = Quadrant
HBW Memory Available
HBW Memory Policy : HBW_POLICY_BIND
HBW Memory Allocated by : hbw_malloc
HBW Memory Allocation Error Code: 0
HBW Memory Allocated : 8.00 GB
HBW Memory Initialization
HBW Memory Processing
Iteration: 1 - HBW Memory Processed ( 1231 ms )
Iteration: 2 - HBW Memory Processed ( 1229 ms )
Iteration: 3 - HBW Memory Processed ( 1229 ms )
Iteration: 4 - HBW Memory Processed ( 1227 ms )
HBW Memory Released
Processing Completed

[ Test 1.3 - hbw_malloc ]
Processing Started
KNL Modes : MCDRAM = Flat - Cluster = Quadrant
HBW Memory Available
HBW Memory Policy : HBW_POLICY_BIND
HBW Memory Allocated by : hbw_malloc
HBW Memory Allocation Error Code: 0
HBW Memory Allocated : 16.00 GB
HBW Memory Initialization
HBW Memory Processing
Iteration: 1 - HBW Memory Processed ( 191376 ms )
Iteration: 2 - HBW Memory Processed ( 194846 ms )
Iteration: 3 - HBW Memory Processed ( 197098 ms )
Iteration: 4 - HBW Memory Processed ( 197608 ms )
HBW Memory Released
Processing Completed

[ Test 1.4 - hbw_malloc ]
Processing Started
KNL Modes : MCDRAM = Flat - Cluster = Quadrant
HBW Memory Available
HBW Memory Policy : HBW_POLICY_BIND
HBW Memory Allocated by : hbw_malloc
HBW Memory Allocation Error Code: 0
HBW Memory Allocated : 15.36 GB
HBW Memory Initialization
HBW Memory Processing
Iteration: 1 - HBW Memory Processed ( 2353 ms )
Iteration: 2 - HBW Memory Processed ( 2331 ms )
Iteration: 3 - HBW Memory Processed ( 2347 ms )
Iteration: 4 - HBW Memory Processed ( 2346 ms )
HBW Memory Released
Processing Completed

[ Test 2.1 - hbw_posix_memalign ]
Processing Started
KNL Modes : MCDRAM = Flat - Cluster = Quadrant
HBW Memory Available
HBW Memory Policy : HBW_POLICY_BIND
HBW Memory Allocated by : hbw_posix_memalign
HBW Memory Allocation Error Code: 0
HBW Memory Allocated : 4.00 GB
HBW Memory Initialization
HBW Memory Processing
Iteration: 1 - HBW Memory Processed ( 614 ms )
Iteration: 2 - HBW Memory Processed ( 614 ms )
Iteration: 3 - HBW Memory Processed ( 607 ms )
Iteration: 4 - HBW Memory Processed ( 608 ms )
HBW Memory Released
Processing Completed

[ Test 2.2 - hbw_posix_memalign ]
Processing Started
KNL Modes : MCDRAM = Flat - Cluster = Quadrant
HBW Memory Available
HBW Memory Policy : HBW_POLICY_BIND
HBW Memory Allocated by : hbw_posix_memalign
HBW Memory Allocation Error Code: 0
HBW Memory Allocated : 8.00 GB
HBW Memory Initialization
HBW Memory Processing
Iteration: 1 - HBW Memory Processed ( 1229 ms )
Iteration: 2 - HBW Memory Processed ( 1231 ms )
Iteration: 3 - HBW Memory Processed ( 1232 ms )
Iteration: 4 - HBW Memory Processed ( 1231 ms )
HBW Memory Released
Processing Completed

[ Test 2.3 - hbw_posix_memalign ]
Processing Started
KNL Modes : MCDRAM = Flat - Cluster = Quadrant
HBW Memory Available
HBW Memory Policy : HBW_POLICY_BIND
HBW Memory Allocated by : hbw_posix_memalign
HBW Memory Allocation Error Code: 0
HBW Memory Allocated : 16.00 GB
HBW Memory Initialization
HBW Memory Processing
Iteration: 1 - HBW Memory Processed ( 191176 ms )
Iteration: 2 - HBW Memory Processed ( 197176 ms )
Iteration: 3 - HBW Memory Processed ( 197110 ms )
Iteration: 4 - HBW Memory Processed ( 199635 ms )
HBW Memory Released
Processing Completed

[ Test 2.4 - hbw_posix_memalign ]
Processing Started
KNL Modes : MCDRAM = Flat - Cluster = Quadrant
HBW Memory Available
HBW Memory Policy : HBW_POLICY_BIND
HBW Memory Allocated by : hbw_posix_memalign
HBW Memory Allocation Error Code: 0
HBW Memory Allocated : 15.36 GB
HBW Memory Initialization
HBW Memory Processing
Iteration: 1 - HBW Memory Processed ( 2351 ms )
Iteration: 2 - HBW Memory Processed ( 2362 ms )
Iteration: 3 - HBW Memory Processed ( 2358 ms )
Iteration: 4 - HBW Memory Processed ( 2365 ms )
HBW Memory Released
Processing Completed
~80x slower!?!?!
Jim Dempsey
Since you are asking for more memory than is available, it is not clear what the system is doing under the covers. Depending on how the system is configured, it might even do something as stupid as swapping pages to/from MCDRAM.
If you intend to use only MCDRAM, then you should use an interface that causes the allocation to fail if sufficient MCDRAM is not available. "numactl --membind=1" will manage this without requiring program changes.
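A minimal sketch of the programmatic equivalent, using libnuma (include <numa.h>, link with -lnuma). The node number and size are illustrative, taken from the flat-quadrant output above, and note that because Linux defers physical page allocation, even a strict binding typically fails at first touch rather than by returning NULL:

```c
/* Sketch of the programmatic equivalent of "numactl --membind=1".
 * Node 1 is the MCDRAM node in the configuration shown earlier;
 * the node number may differ on other systems. */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "libnuma not available\n");
        return 1;
    }
    struct bitmask *mcdram = numa_parse_nodestring("1");
    numa_set_membind(mcdram);   /* bind all future allocations to node 1 */
    numa_set_strict(1);         /* prefer failure over silent fallback   */
    numa_bitmask_free(mcdram);

    size_t bytes = 4UL << 30;   /* 4 GiB: fits in the 16 GiB of MCDRAM */
    char *buf = numa_alloc_onnode(bytes, 1);
    if (buf == NULL) {
        fprintf(stderr, "allocation on node 1 failed\n");
        return 1;
    }
    memset(buf, 0, bytes);      /* first touch: pages actually placed */
    numa_free(buf, bytes);
    return 0;
}
```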
Because the policy is HBW_POLICY_BIND, libnuma is instructed to use only the high-bandwidth NUMA node. When that node is full, there is no choice but to swap out to disk; the NUMA node with DRAM is unavailable because of BIND.
Try HBW_POLICY_PREFERRED, which allocates from another NUMA node when the high-bandwidth node is full, so DRAM is used once MCDRAM is exhausted.
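A minimal sketch of that suggestion using memkind's hbwmalloc interface (include <hbwmalloc.h>, link with -lmemkind); the size and error handling are illustrative:

```c
#include <hbwmalloc.h>
#include <stdio.h>

int main(void)
{
    if (hbw_check_available() != 0) {   /* 0 means HBW memory exists */
        fprintf(stderr, "no high-bandwidth memory detected\n");
        return 1;
    }
    /* The policy must be set before the first hbw_* allocation. */
    if (hbw_set_policy(HBW_POLICY_PREFERRED) != 0) {
        fprintf(stderr, "policy could not be changed\n");
        return 1;
    }

    size_t bytes = 16UL << 30;          /* 16 GiB: more than is free in MCDRAM */
    double *buf = hbw_malloc(bytes);    /* excess pages fall back to DDR4 */
    if (buf == NULL)
        return 1;
    /* ... initialize and process buf as in the tests above ... */
    hbw_free(buf);
    return 0;
}
```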
Yet another reason why we run all of our compute nodes with swapping completely disabled....
In my tests, the Out Of Memory killer is triggered.
I don't get an error code until I ask for more than 96 GB, which is the amount of DRAM in the system.
To contact the memkind developers, try the mailing list: memkind@lists.01.org
The concept of virtual memory (VM) is not new; it has been in use since the days of the DEC VAX/VMS OS.
Actually, it is about twenty years older than that (1959, on the Atlas machine in Manchester). https://en.wikipedia.org/wiki/Virtual_memory
From a former memkind developer: handling out-of-memory conditions in the Linux kernel is complicated and depends strongly on system configuration. In the most common scenario, libnuma returns NULL only if you request more memory than the amount of free physical memory in the system at the time the virtual memory is allocated. The memkind documentation is a bit inaccurate in that respect; it is rather nontrivial to write a comprehensive explanation that covers every possible kernel behavior.
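To make the deferred-allocation point concrete, a hypothetical sketch (the 20 GiB size is arbitrary; actually running this with BIND on a 16 GiB MCDRAM node will swap or invoke the OOM killer):

```c
#include <hbwmalloc.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    hbw_set_policy(HBW_POLICY_BIND);  /* the policy used in the tests above */

    size_t bytes = 20UL << 30;        /* 20 GiB: more than the 16 GiB of MCDRAM */
    char *p = hbw_malloc(bytes);
    printf("hbw_malloc(20 GiB) returned %p\n", (void *)p);
    if (p == NULL)
        return 1;
    /* The pointer is typically non-NULL: only virtual address space has been
     * reserved so far. The trouble starts here, at first touch, when the
     * kernel must find physical pages on the bound node; what you observe is
     * swapping or the OOM killer, not a NULL return from hbw_malloc.
     * (Do not run this on a machine you care about.) */
    memset(p, 0, bytes);
    hbw_free(p);
    return 0;
}
```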
Ah, good old VMS -- those were the days. I miss having the OS automatically keep the last 5 versions of a file. (Relying on GNU/Emacs for that now.)
I am also talking about MCDRAM.
I get the Out Of Memory killer when I fault slightly more memory than is currently available on the MCDRAM NUMA node, using either memkind with the HBW_POLICY_BIND policy or numactl --membind.
The memkind documentation for HBW_POLICY_BIND is oversimplified. Linux, of course, does deferred page allocation, and the kernel behavior can be quite complex when an application allocates more memory than is available on a NUMA node. If, for example, an application grabs just enough memory to thrash against the ~0.5 GB that the kernel keeps in MCDRAM, the application could become extremely slow.
So for the Intel Xeon Phi x200 processor, the best practice is to use HBW_POLICY_PREFERRED. It could be argued that BIND is there as an option simply because it always has been, but it lacks a practical use for this specific processor.
Binding is definitely useful if swapping is disabled, since it prevents you from silently getting pages allocated where you don't want them.
Failover allocation to the wrong NUMA node can be very bad for performance testing or for multi-node (synchronous) production jobs. Better to have the job fail immediately than to either get misleading performance results or waste time by having many nodes waiting on a slow node.