Hey all,
I'm trying a simple textbook example of false sharing, where four threads continuously write to distinct data that sits on the same cache line. With the General Exploration analysis, I used to see the "Store Bound" value flagged, with high Store Latency and False Sharing child metrics.
Now the summary looks like this:
Elapsed Time: 0.828s
Clockticks: 10,659,000,000
Instructions Retired: 5,616,800,000
CPI Rate: 1.898
MUX Reliability: 0.998
Front-End Bound: 5.0%
Bad Speculation: 6.1%
Back-End Bound: 72.5%
    Memory Bound: 21.3%
        L1 Bound: 24.2%
        L2 Bound: N/A with HT on
        L3 Bound: N/A with HT on
        DRAM Bound: N/A with HT on
        Store Bound: 0.0%
    Core Bound: 51.3%
        Divider: 32.1%
        Port Utilization: 65.2%
            Cycles of 0 Ports Utilized: 51.3%
            Cycles of 1 Port Utilized: 41.0%
            Cycles of 2 Ports Utilized: 21.7%
            Cycles of 3+ Ports Utilized: 5.0%
Retiring: 16.4%
Total Thread Count: 5
Paused Time: 0s
Note how Store Bound is reported as 0.0%. I can increase the workload, but that doesn't change anything. The child metrics also look OK, which makes this even more odd: Store Latency is at 76.6% but is flagged as unreliable, supposedly due to MUX issues or a lack of PMU events, while the False Sharing metric is at 13.9% and is not flagged as unreliable.
Can anyone explain what's going on here? The code for my test case is the following:
#include <cmath>
#include <cstddef>
#include <iostream>
#include <memory>   // std::unique_ptr
#include <thread>
#include <future>

struct Results
{
    virtual ~Results() = default;
    virtual unsigned int* data1() = 0;
    virtual unsigned int* data2() = 0;
    virtual unsigned int* data3() = 0;
    virtual unsigned int* data4() = 0;
};

// the four data elements are distinct but are packed next to each other,
// so they normally share a single cache line
struct Unaligned : Results
{
    unsigned int* data1() override { return &m_data1; }
    unsigned int* data2() override { return &m_data2; }
    unsigned int* data3() override { return &m_data3; }
    unsigned int* data4() override { return &m_data4; }

    unsigned int m_data1 = 0;
    unsigned int m_data2 = 0;
    unsigned int m_data3 = 0;
    unsigned int m_data4 = 0;
};

// the four data elements are distinct and live on separate cache lines
struct Aligned : Results
{
    unsigned int* data1() override { return &m_data1; }
    unsigned int* data2() override { return &m_data2; }
    unsigned int* data3() override { return &m_data3; }
    unsigned int* data4() override { return &m_data4; }

    unsigned int m_data1 = 0;
    alignas(64) unsigned int m_data2 = 0;
    alignas(64) unsigned int m_data3 = 0;
    alignas(64) unsigned int m_data4 = 0;
};

// each thread runs this loop, repeatedly writing to its own counter
void do_something(volatile unsigned int* result)
{
    for (std::size_t i = 0; i < 100000000; ++i) {
        *result += std::sqrt(i);
    }
}

int main(int argc, char** /*argv*/)
{
    // when any command line argument is given, use the aligned data, otherwise
    // use the unaligned data to show the effect of false sharing
    std::unique_ptr<Results> results(argc > 1
        ? static_cast<Results*>(new Aligned)
        : static_cast<Results*>(new Unaligned));
    {
        // the futures block in their destructors, so all four threads have
        // finished by the time this scope is left
        const auto f1 = std::async(std::launch::async, do_something,
                                   results->data1());
        const auto f2 = std::async(std::launch::async, do_something,
                                   results->data2());
        const auto f3 = std::async(std::launch::async, do_something,
                                   results->data3());
        const auto f4 = std::async(std::launch::async, do_something,
                                   results->data4());
    }
    std::cout << "random sums: " << *results->data1()
              << ", " << *results->data2()
              << ", " << *results->data3()
              << ", " << *results->data4() << '\n';
    return 0;
}
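To rule out the data layout as the cause, independently of VTune, here is a minimal sketch that checks whether counters fall on the same cache line. It is not part of the test case above: `kLineSize`, `line_of`, `same_line`, `Packed`, `Padded` and `layouts_as_expected` are hypothetical names, and the 64-byte line size is an assumption taken from the `alignas(64)` in the code. `Packed` and `Padded` only mimic the data members of `Unaligned` and `Aligned`, without the vtable pointer.

```cpp
#include <cstdint>

// Assumed cache-line size in bytes; matches the alignas(64) in the test case.
constexpr std::uintptr_t kLineSize = 64;

// Hypothetical helper: index of the cache line that an address falls on.
inline std::uintptr_t line_of(const volatile void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) / kLineSize;
}

// Hypothetical helper: true if both addresses fall on the same cache line.
inline bool same_line(const volatile void* a, const volatile void* b)
{
    return line_of(a) == line_of(b);
}

// Stand-in for the data members of Unaligned: four adjacent counters.
struct Packed
{
    unsigned int d1 = 0, d2 = 0, d3 = 0, d4 = 0;
};

// Stand-in for the intent of Aligned: every counter starts its own line.
struct Padded
{
    alignas(64) unsigned int d1 = 0;
    alignas(64) unsigned int d2 = 0;
    alignas(64) unsigned int d3 = 0;
    alignas(64) unsigned int d4 = 0;
};

// Returns true if the packed counters share one line and the padded
// counters all sit on different lines.
inline bool layouts_as_expected()
{
    alignas(64) Packed packed; // forced onto a line boundary for the check
    Padded padded;

    const bool packed_shared = same_line(&packed.d1, &packed.d2)
                            && same_line(&packed.d1, &packed.d3)
                            && same_line(&packed.d1, &packed.d4);
    const bool padded_separate = !same_line(&padded.d1, &padded.d2)
                              && !same_line(&padded.d2, &padded.d3)
                              && !same_line(&padded.d3, &padded.d4)
                              && !same_line(&padded.d1, &padded.d4);
    return packed_shared && padded_separate;
}
```

Calling `same_line(results->data1(), results->data4())` in `main` of the test case would show the actual placement for a given run. Two caveats apply there: a heap-allocated `Unaligned` is not guaranteed to start on a line boundary, so its four counters could in principle straddle two lines, and as far as I know `new Aligned` is only guaranteed to honour `alignas(64)` when building as C++17 or later.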
Ah, and I forgot to add: I have seen this on the following hardware, all with VTune 2017 Update 4, running Ubuntu 17.04.
Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz
Ping? Can someone please have a look at this and explain to me what's going on here?
Thanks
Hello,
When you say that you used to see it flagged as Store Bound, what do you mean? Did you see that with previous VTune versions, or on other platforms? What has changed since then?
Yes, it was with a previous VTune version, but also potentially an older Linux kernel. I don't know the exact version anymore, sorry. Can you or anyone else reproduce this issue?