On my Xeon Phi 7250 systems, the "energy unit" that I read from the MSR_RAPL_POWER_UNIT register works out to 61.04 micro-Joules. This is consistent with what I see on my Xeon E5-2690 v3 systems, and makes sense for a 215 Watt part.
BUT, when I run STREAM from DDR4 memory on this system, it is clear that this unit is much too large. Average DRAM power computed for a 60-second STREAM run using this value for the energy unit is over 103 Watts. This is about 17 Watts per DIMM, which should not be possible. If the DRAM energy unit is really supposed to be 15.3 micro-Joules (as it is on Xeon E5 v3 and Xeon E5 v4), the power consumption would be about 4.3 Watts per DIMM, which is pretty close to what I get from the Micron DDR4 power consumption spreadsheet (using the observed bandwidths and estimates of the page hit rates extrapolated from Xeon E5 v3 results). The 103 Watt number also makes no sense when compared to the 75 Watt maximum DRAM power listed in the MSR_DRAM_POWER_INFO register.
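To make the unit question concrete, here is a minimal Python sketch of how the energy unit is decoded and how it feeds into an average-power figure. It assumes the Energy Status Units field sits in bits 12:8 of MSR_RAPL_POWER_UNIT and that the energy counter is 32 bits wide, as described in the Intel SDM. The function names are hypothetical, and the fixed 2^-16 Joule DRAM unit is the hypothesis under discussion for Xeon Phi x200, not a confirmed value.

```python
# Sketch only: decode the RAPL energy unit and convert counter deltas to Watts.
# Function names are hypothetical; no MSR access is performed here.

def energy_unit_joules(rapl_power_unit: int) -> float:
    """Energy Status Units field (bits 12:8): unit = 1 / 2**ESU Joules."""
    esu = (rapl_power_unit >> 8) & 0x1F
    return 1.0 / (1 << esu)


def average_power_watts(start: int, end: int, seconds: float, unit_j: float) -> float:
    """Average power from two readings of a 32-bit energy counter.

    Handles at most one wraparound, so the counter must be sampled
    often enough that it cannot wrap twice between readings.
    """
    delta = (end - start) % (1 << 32)
    return delta * unit_j / seconds


if __name__ == "__main__":
    default_unit = energy_unit_joules(14 << 8)  # ESU = 14 gives 61.04 uJ
    assumed_dram_unit = 1.0 / (1 << 16)         # assumption: fixed DRAM unit
    print(f"default unit:      {default_unit * 1e6:.2f} uJ")
    print(f"assumed DRAM unit: {assumed_dram_unit * 1e6:.2f} uJ")
```

The two units differ by exactly a factor of four, so any DRAM power computed with the default unit would be four times too high if the smaller unit is the right one.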
Normally this model-specific information about the DRAM energy unit would be in Volume 2 of the processor datasheet (as it is for Xeon E5 v3 and Xeon E5 v4), but that document does not appear to have been released yet for the Xeon Phi x200 series....
I put in a request to include the override of the DRAM energy unit for Xeon E5 v3 and v4 into the tacc_stats package (https://github.com/TACC/tacc_stats), and am sufficiently convinced by my measurements to request that it be included for Xeon Phi x200 as well. It would be nice if Intel confirmed or denied this so that I can quit thinking about it....
Have you placed an ammeter (or watt meter) on your system? You may be able to indirectly determine the DDR4 power consumption by varying the number of DIMMs and recording total power consumption while running your STREAM benchmark. You will have to factor in the CPU power lost waiting for data. Also, Micron may have a special DDR4 chip riser that can be used to measure statistics during the run.
It would probably be possible to guesstimate the power consumption of the DIMMs by measuring wall power while running with 1, 2, 3, 4, 5, and 6 DIMMs installed, but that would be a fairly inconvenient project to undertake.
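If someone did collect wall-power readings at each DIMM count, one simple way to reduce them would be a least-squares line, with the slope taken as incremental Watts per DIMM. The sketch below is only that arithmetic. The function name is hypothetical, and it assumes power grows roughly linearly with DIMM count, which may not hold because STREAM bandwidth (and the CPU power spent waiting for data) also changes as DIMMs are removed.

```python
# Sketch only: fit wall power (Watts) against installed DIMM count.

def watts_per_dimm(dimm_counts, wall_watts):
    """Return (slope, intercept) of the least-squares line.

    slope     ~ incremental wall Watts per added DIMM
    intercept ~ extrapolated wall Watts with zero DIMMs
    """
    n = len(dimm_counts)
    if n < 2 or n != len(wall_watts):
        raise ValueError("need at least two matched (count, watts) pairs")
    mean_x = sum(dimm_counts) / n
    mean_y = sum(wall_watts) / n
    sxx = sum((x - mean_x) ** 2 for x in dimm_counts)
    if sxx == 0:
        raise ValueError("DIMM counts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(dimm_counts, wall_watts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Note that the slope includes power-supply and voltage-regulator losses, so it would be an upper bound on what the DIMMs themselves draw.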
DIMM risers are certainly the ideal way to measure power, but the ones I know about are specialized and expensive pieces of equipment (since they are intended to tap the full-speed signal lines as well as the much easier DC lines).
In this case it really should be a lot easier....
I have (essentially) the same DIMMs running the same workload on a Haswell (where the DRAM energy unit is clearly defined as being a fixed 15.3 micro-Joules per increment) showing about 4.5 Watts/DIMM, while on KNL the corresponding value is 17.2 Watts/DIMM if I use the RAPL energy unit from the MSR_RAPL_POWER_UNIT MSR (or 4.3 Watts/DIMM if I use 15.3 micro-Joules per increment instead).
Micron's power estimation spreadsheet is traditionally loaded with very conservative (high) values, and I can't get it to estimate a power level higher than 6.6 Watts/DIMM at the observed bandwidth levels and worst-case page hit rates. If I replace the IDD/IPP values with the ones from the Samsung DIMMs that we actually have installed in our KNL systems, the spreadsheet estimate drops to 5.4 Watts/DIMM. This is likely still a conservative (high) estimate, because no DRAM vendor wants parts to be returned for consuming more power than the datasheet specifies. If I assume that the RAPL energy unit for DRAM is supposed to be 15.3 micro-Joules, then the RAPL estimate is about 4.3 Watts per DIMM, which is almost exactly what I expect (and what I have measured on Haswell systems).
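The comparison above is just a rescaling, which can be sketched in a few lines of Python. The inputs are the figures quoted in this thread (roughly 103 Watts of reported DRAM power, 6 DIMMs, and the 6.6 Watts/DIMM spreadsheet ceiling). The constant and function names are hypothetical, and the 2^-16 Joule DRAM unit remains an assumption for KNL.

```python
# Sketch only: rescale a RAPL DRAM power figure from one energy unit to another.

DEFAULT_UNIT_J = 1.0 / (1 << 14)       # 61.04 uJ, from MSR_RAPL_POWER_UNIT
ASSUMED_DRAM_UNIT_J = 1.0 / (1 << 16)  # 15.26 uJ, assumed to apply to KNL DRAM


def rescale_power(watts, old_unit_j, new_unit_j):
    """Power is linear in the energy unit, so rescaling is a simple ratio."""
    return watts * new_unit_j / old_unit_j


if __name__ == "__main__":
    reported_watts = 103.0       # DRAM power computed with the default unit
    dimms = 6
    spreadsheet_ceiling = 6.6    # Watts/DIMM, worst case from the spreadsheet

    corrected = rescale_power(reported_watts, DEFAULT_UNIT_J, ASSUMED_DRAM_UNIT_J)
    for label, total in (("default unit", reported_watts),
                         ("assumed DRAM unit", corrected)):
        per_dimm = total / dimms
        verdict = "plausible" if per_dimm <= spreadsheet_ceiling else "implausible"
        print(f"{label}: {per_dimm:.1f} W/DIMM ({verdict})")
```

With these inputs the default unit gives about 17.2 W/DIMM and the assumed DRAM unit gives about 4.3 W/DIMM. Only the second falls under the spreadsheet's worst-case estimate.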
So it looks like an open-and-shut case of missing documentation, and yet I am still here wasting my time and energy trying to get the issue closed....
I would imagine that TACC has provided you with a dedicated workstation with 1, 2 or 4 KNLs. While you may not get the watts/DIMM, you could get the total watts for each vendor's DIMMs while running your STREAM benchmark, with the system configured with each vendor's DRAM in turn. Using that data, you can rank the vendors' DIMMs by relative performance versus difference in watts. Don't forget to get the idle watts as well.
With the number of nodes you have, I would physically test each prospective vendor's DRAMs.
How do you get ~81.5 cores per node? (6400 nodes, 522080 processing cores)
Assuming 6400 KNLs, each with 6 memory sticks: at 4.5 Watts per stick that is 172.8 kW, and at 17.2 Watts per stick it is about 660 kW.
Selecting the wrong memory vendor can be an expensive decision.
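For anyone who wants to redo that system-level estimate with other numbers, the arithmetic is a one-liner. The sketch below uses the hypothetical configuration from this post (6400 nodes, 6 DIMMs each), and the function name is made up for illustration.

```python
# Sketch only: total DRAM power for a cluster, in kilowatts.

def fleet_dram_kw(nodes, dimms_per_node, watts_per_dimm):
    """nodes * DIMMs per node * Watts per DIMM, converted to kW."""
    return nodes * dimms_per_node * watts_per_dimm / 1000.0


if __name__ == "__main__":
    for watts in (4.5, 17.2):
        total = fleet_dram_kw(6400, 6, watts)
        print(f"{watts} W/DIMM -> {total:.1f} kW")
```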
Our system vendors usually choose the DRAM supplier, but this is a tiny part of our power budget. Even for STREAM, the DRAM contributes less than 20% of the node power.
Stampede is a mildly heterogeneous system, so the core counts are not obvious....
- All 6400 compute nodes have 2 8-core Xeon E5-2680 processors, accounting for 102,400 cores.
- All 6400 compute nodes have at least one 61-core Xeon Phi SE10P Coprocessor, adding 390,400 more cores (running total 492,800).
- 480 of the compute nodes have two 61-core Xeon Phi SE10P Coprocessors, adding 29,280 more cores (running total 522,080).
- 128 of the compute nodes have an NVIDIA K20 GPGPU, but we don't count those cores in the total.
- Stampede also includes some large memory nodes, login nodes, filesystem nodes, etc., but those cores are not counted in the total.
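The tally in the list above can be checked with a short Python sketch. The group labels and the function name are only for illustration; the node and core counts are the ones given in the list.

```python
# Sketch only: running core count for the counted Stampede components.

def tally(groups):
    """Return [(label, added_cores, running_total), ...] for each group."""
    rows, running = [], 0
    for label, nodes, cores_per_node in groups:
        added = nodes * cores_per_node
        running += added
        rows.append((label, added, running))
    return rows


if __name__ == "__main__":
    groups = [
        ("2 x 8-core Xeon E5-2680 per node", 6400, 2 * 8),
        ("first Xeon Phi SE10P per node", 6400, 61),
        ("second Xeon Phi SE10P (480 nodes)", 480, 61),
    ]
    rows = tally(groups)
    for label, added, running in rows:
        print(f"{label}: +{added:,} (running total {running:,})")
    print(f"average cores per node: {rows[-1][2] / 6400:.1f}")
```

The final line is where the "~81.5 cores per node" figure comes from: it is an average over nodes with different coprocessor counts.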