Cycles spent servicing requests to DRAM on Ivy/Sandy Bridge

Matthew_C_6
Beginner

Hi,

I was wondering if anyone could point me to the proper hardware counters (and equation) for measuring the fraction of cycles spent servicing memory requests (loads/stores) to DRAM on Ivy Bridge/Sandy Bridge?

McCalpinJohn
Honored Contributor III

It depends on what you mean by "cycles servicing memory requests to DRAM"....

For the Xeon E5-2600 and Xeon E5-2600 v2 processors, you can read the memory controller DRAM CAS counters to determine the exact number of memory reads and writes that went to the DRAMs. Each transaction takes four DRAM (major) cycles, so you can easily compare the amount of time that the DRAM is busy with data transfers against the elapsed wall-clock time.
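As a rough sketch of that comparison, assuming the CAS read and write counts for one memory channel have already been collected over a known interval: the function name, parameter names, and sample numbers below are hypothetical and not taken from any Intel tool. The example assumes DDR3-1600, whose DRAM clock runs at 800 MHz.

```python
def dram_busy_fraction(cas_reads, cas_writes, dram_clock_hz, elapsed_s,
                       cycles_per_cas=4):
    """Estimate the fraction of DRAM clock cycles spent on data transfers.

    cas_reads / cas_writes: CAS counts for ONE channel over the interval
                            (assumed to be read from the memory controller).
    dram_clock_hz:          DRAM (major) clock frequency, e.g. 800e6 for DDR3-1600.
    elapsed_s:              wall-clock length of the measurement interval.
    cycles_per_cas:         DRAM clock cycles occupied per transaction
                            (four for DDR3/DDR4 burst-of-8 transfers).
    """
    if dram_clock_hz <= 0 or elapsed_s <= 0:
        raise ValueError("clock frequency and elapsed time must be positive")
    busy_cycles = (cas_reads + cas_writes) * cycles_per_cas
    total_cycles = dram_clock_hz * elapsed_s
    return busy_cycles / total_cycles


# Hypothetical sample: 100M reads and 50M writes on one channel in 1 second.
example = dram_busy_fraction(100_000_000, 50_000_000, 800e6, 1.0)
```

With these made-up counts the channel would be transferring data for three quarters of the interval. On a multi-channel system the same calculation would be applied to each channel separately.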

SudarshanSrinivasan

Dear McCalpinJohn,
Does this hold for other processors as well, i.e. does each transaction take four DRAM (major) cycles there too?

I have a Rocket Lake 11th Gen Intel(R) Core(TM) i5-11400 @ 2.60GHz and would like to measure the cycles during which the DRAM was busy servicing requests.

Thanks 

Sudarshan S

McCalpinJohn
Honored Contributor III

DDR3 and DDR4 DRAMs are designed for "burst of 8" data transfers operating on the full DIMM width of 64 bits (or 72 bits with ECC), so the transfer size is 64 Bytes (72 Bytes with ECC) for any read or write operation.  These are "double-data-rate" technologies, so a "burst of 8" occupies four DRAM clock cycles (with transfers on both rising and falling edges of each cycle).   It is possible to "chop" a burst in half, but I have never seen this used for cacheable memory accesses in a production system.

DDR5 is designed for "burst of 16" data transfers operating on either of the DIMM's two sub-channels (each with 32 bits for data and either 36 bits or 40 bits for data plus ECC). This provides the same minimum transfer size of 64 Bytes (data) as DDR3/DDR4, with each access occupying a sub-channel for 8 DRAM clock cycles (16 data transfers), rather than the 4 cycles (8 transfers) of earlier generations.
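The arithmetic behind these figures can be checked with a small sketch. The helper names are hypothetical; the only inputs are the burst lengths and bus widths quoted above, plus the fact that double-data-rate signalling moves two transfers per clock cycle.

```python
def burst_bytes(burst_length, width_bits):
    """Bytes moved by one burst: number of transfers times bus width."""
    return burst_length * width_bits // 8


def burst_clock_cycles(burst_length):
    """DRAM clock cycles per burst: two transfers per cycle (rising and falling edge)."""
    return burst_length // 2


# DDR3/DDR4: burst of 8 on a 64-bit DIMM (72 bits with ECC)
ddr4_data_bytes = burst_bytes(8, 64)
ddr4_ecc_bytes = burst_bytes(8, 72)
ddr4_cycles = burst_clock_cycles(8)

# DDR5: burst of 16 on one 32-bit sub-channel
ddr5_data_bytes = burst_bytes(16, 32)
ddr5_cycles = burst_clock_cycles(16)
```

Both generations deliver 64 data bytes per access, but the DDR5 access occupies its sub-channel for twice as many clock cycles, so a busy-time estimate for DDR5 would use 8 cycles per transaction on each sub-channel instead of 4 per channel.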
