The Moscow State University supercomputer "Lomonosov 2" uses **Intel Haswell Xeon E5-2697 v3** processors. I am trying to find both the theoretical maximum and the practically achievable maximum rate, per second, of the MEM_UOPS_RETIRED:ALL_LOADS event counter, for 1 core and for all 14 cores.

I can measure the practically achievable value with a simple synthetic test. However, it is not clear what the theoretical maximum per second is, either for 1 core or for 14 cores.

Can you please tell me this theoretical maximum, or how I can calculate it myself?

I would really appreciate any answer. This will help us evaluate the performance of applications running on our supercomputer.

The Haswell core has two load ports, so it is able to retire a maximum of 2 load uops per cycle.

The maximum single-core Turbo frequency is 3.60 GHz, so a single core can retire 7.2 billion load uops per second.

The maximum all-core Turbo frequency for "non-AVX" code is 3.10 GHz, so the 14 cores could reach a maximum of 86.8 billion load uops per second in aggregate (2 × 3.10 GHz × 14 cores). Running 256-bit AVX arithmetic code limits the maximum all-core Turbo frequency to 2.90 GHz, or 81.2 billion load uops per second.
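The arithmetic above can be reproduced with a small sketch. The function name `peak_load_uops_per_sec` is made up for illustration; the 2 loads per cycle and the Turbo frequencies are the figures quoted in this answer.

```python
# Peak load-uop retirement rate = loads per cycle * frequency * active cores.
LOADS_PER_CYCLE = 2  # two load ports per Haswell core (figure from the answer)


def peak_load_uops_per_sec(freq_ghz, cores=1, loads_per_cycle=LOADS_PER_CYCLE):
    """Return the theoretical peak number of load uops retired per second."""
    return loads_per_cycle * freq_ghz * 1e9 * cores


# Frequencies quoted in the answer for the Xeon E5-2697 v3.
single_core = peak_load_uops_per_sec(3.6)            # max single-core Turbo
all_non_avx = peak_load_uops_per_sec(3.1, cores=14)  # all-core Turbo, non-AVX
all_avx256 = peak_load_uops_per_sec(2.9, cores=14)   # all-core Turbo, 256-bit AVX

for label, rate in [("1 core", single_core),
                    ("14 cores, non-AVX", all_non_avx),
                    ("14 cores, AVX", all_avx256)]:
    print(f"{label}: {rate / 1e9:.1f} billion load uops/s")
```

These are upper bounds only: they assume every core sustains its maximum Turbo frequency and retires two load uops on every cycle.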

The actual frequency of operation will depend on the number of active cores, the types of instructions being used, and the effectiveness of the cooling system, so it is common to report the uop retirement rate in uops per cycle, rather than uops per second, and also report the average number of active cores and their average frequencies to provide the rest of the context....
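A minimal sketch of that per-cycle normalization follows. It assumes you collect an unhalted core cycle count alongside the load uop count over the same interval; the function names and the sample counter values are hypothetical and are not measurements from any real run.

```python
PEAK_LOADS_PER_CYCLE = 2  # two load ports per Haswell core (figure from the answer)


def loads_per_cycle(load_uops, core_cycles):
    """Average load uops retired per unhalted core cycle."""
    if core_cycles <= 0:
        raise ValueError("core_cycles must be positive")
    return load_uops / core_cycles


def fraction_of_peak(load_uops, core_cycles, peak=PEAK_LOADS_PER_CYCLE):
    """Share of the theoretical per-cycle load rate that was achieved."""
    return loads_per_cycle(load_uops, core_cycles) / peak


# Hypothetical counter readings for one core, for illustration only.
sample_load_uops = 3_000_000_000
sample_core_cycles = 2_000_000_000

rate = loads_per_cycle(sample_load_uops, sample_core_cycles)
share = fraction_of_peak(sample_load_uops, sample_core_cycles)
print(f"{rate:.2f} load uops/cycle, {share:.0%} of peak")
```

Reporting the result per cycle keeps it comparable across runs where the Turbo frequency differed.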
