Hello,
I have written some code with OpenMP. I use the critical construct so that a specific storage location is never updated by more than one thread at the same time.
I run the code on a Supermicro X7QC3 server board with four quad-core Intel Xeon E7320 processors. The OS is Linux and the compiler is gcc 4.3.0.
The main part of my code is as follows:
#pragma omp parallel for private(j, k, s, e)
for (i = 0; i < N; i++)
{
    ...
    for (k = s; k < e; k++)
    {
        j = n[...];
        ...
        ...
        #pragma omp critical
        {
            m[i] += ... ;
            m[j] += ... ;
        }
    }
}
But one of my friends says: "On Intel platforms, the hardware maintains memory coherence, so the code does not need atomic or critical."
So my question is: should I use the critical construct for the reduction on the array m[] in this code?
Thanks!
There's a difference between memory coherence and data races. Intel processors do have a cache coherence protocol that assures predictable updates of memory, but they can't protect programmers from their own coding errors.
In the example you provided, the omp parallel for will partition its worker threads across the span of i, so in the innermost block you could probably get away without the critical section for the references to m[i]: each worker thread would operate on its own chunk of m[], and while there may be a little cache flipping where the thread partition boundaries don't line up with the cache line boundaries, there should be little problem.
However, the critical section also modifies m[j].
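To make the difference between coherence and a data race concrete, here is a minimal, deterministic sketch in C. It is not from the original post: the `Worker` type and the helper names are hypothetical, and the two "threads" are simulated by hand so that the interleaving is fixed. Every load and store sees perfectly coherent memory, yet one of the two increments is lost because the read-modify-write is not performed as a single indivisible step.

```c
#include <assert.h>

/* One simulated thread. Its update "m += 1" is split into the three
   steps a plain (non-atomic) read-modify-write consists of. */
typedef struct {
    int reg;   /* thread-private register holding the working value */
} Worker;

static void load(Worker *w, const int *shared)  { assert(w && shared); w->reg = *shared; }
static void add_one(Worker *w)                  { assert(w); w->reg += 1; }
static void store(const Worker *w, int *shared) { assert(w && shared); *shared = w->reg; }

/* Worker a completes its whole update before worker b starts.
   This is the ordering that critical or atomic enforces. */
int run_serialized(void)
{
    int m = 0;
    Worker a, b;
    load(&a, &m); add_one(&a); store(&a, &m);
    load(&b, &m); add_one(&b); store(&b, &m);
    return m;
}

/* Worker b loads m before worker a has stored its result.
   Both workers start from the same old value, so one update is lost. */
int run_interleaved(void)
{
    int m = 0;
    Worker a, b;
    load(&a, &m);
    load(&b, &m);
    add_one(&a);
    add_one(&b);
    store(&a, &m);
    store(&b, &m);
    return m;
}
```

In this sketch `run_serialized()` yields 2 while `run_interleaved()` yields 1. Cache coherence only guarantees that each individual load and store observes a consistent value; it says nothing about the steps in between.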
I build my code with only one compiler option (-O3).
Thanks!
Hi Robert Reed (Intel), thank you very much!
More than one thread will update the same m[j].
But I do not really understand the cache coherence protocol. I only know that Intel uses the MESI protocol, which maintains cache coherence: if some data has been updated by core A, then core B will be notified.
Does MESI not work across multiple sockets?
Thanks!
Hi,
If you are sure that multiple threads will access the same memory location through m[j], you need to protect that update with either the critical or the atomic construct.
While both can be used for mutual exclusion, there are key differences:
- The critical construct can protect any sequence of code enclosed by curly braces. Internally, it takes a lock and ensures that only one thread at a time enters the protected code sequence.
- The atomic construct can only protect simple updates of memory locations (e.g. read, add, write). If two threads access different memory locations through m[j], both may run without synchronizing. Only if two threads access the same memory location (that is, access m[j] with the same value of j) will atomic enforce mutual exclusion.
So, my advice would be to use atomic instead of critical.
Cheers
-michael
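As an illustration of the advice above, here is a minimal sketch of the same scatter-style update written once with atomic and once with critical. It is not the original poster's code: the function names and the `bins`, `idx` and `val` parameters are hypothetical stand-ins for m[], the computed index j and the elided right-hand side. It assumes a compiler with OpenMP support (for example `gcc -fopenmp`); without that flag the pragmas are ignored and the loops simply run serially.

```c
#include <assert.h>

/* bins[idx[k]] += val[k] for all k. Several k may map to the same bin,
   so the update must be protected. With atomic, only threads that hit
   the same bin at the same time are serialized. */
void scatter_add_atomic(double *bins, int n_bins,
                        const int *idx, const double *val, int n)
{
    int k;
    #pragma omp parallel for
    for (k = 0; k < n; k++) {
        int j = idx[k];
        assert(j >= 0 && j < n_bins);
        #pragma omp atomic
        bins[j] += val[k];
    }
}

/* Same result, but critical takes one lock for the whole block, so
   threads updating different bins still wait for each other. */
void scatter_add_critical(double *bins, int n_bins,
                          const int *idx, const double *val, int n)
{
    int k;
    #pragma omp parallel for
    for (k = 0; k < n; k++) {
        int j = idx[k];
        assert(j >= 0 && j < n_bins);
        #pragma omp critical
        {
            bins[j] += val[k];
        }
    }
}
```

Both versions produce the same sums; the difference is how much unrelated work gets serialized along the way.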
But I do not really understand the cache coherence protocol. I only know that Intel uses the MESI protocol, which maintains cache coherence: if some data has been updated by core A, then core B will be notified.
Does MESI not work across multiple sockets?
MESI is designed to handle multiple memory masters, but rather than me regurgitating details on MESI, I recommend you read the Wikipedia article. Michael has given you good advice on critical versus atomic.
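Consider changing
#pragma omp critical
{
    m[i] += ... ;
    m[j] += ... ;
}
to:
m[i] += ... ;
#pragma omp atomic
m[j] += ... ;
or something like this:
#pragma omp parallel private(j, k, s, e)
{
    // per-thread private copy of m; must fit on stack
    mType* _m = (mType*)_alloca(sizeof(mType)*sizeFor_m);
    for (int q = 0; q < sizeFor_m; ++q)
        _m[q] = 0;
    #pragma omp for
    for (i = 0; i < N; i++)
    {
        ...
        for (k = s; k < e; k++)
        {
            j = n[...];
            ...
            ...
            m[i] += ... ;
            _m[j] += ... ; // private accumulation, no synchronization needed
        }
    } // end of omp for
    // merge each thread's private copy into m, one thread at a time
    #pragma omp critical
    {
        for (int q = 0; q < sizeFor_m; ++q)
            m[q] += _m[q];
    }
}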
Jim Dempsey
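A compilable sketch of the private-array idea is shown below. It is an illustration under assumptions, not Jim's exact code: the function name and the `bins`, `idx` and `val` parameters are hypothetical, the element type is fixed to `double`, and the per-thread buffer is taken from the heap with `calloc` instead of the stack with `_alloca`, so that large arrays do not overflow the stack. It assumes OpenMP is enabled (for example `gcc -fopenmp`); without it the region runs once, serially, and still gives the same result.

```c
#include <assert.h>
#include <stdlib.h>

/* bins[idx[k]] += val[k] for all k, using one private accumulator
   array per thread and a single merge step at the end. */
void scatter_add_private(double *bins, int n_bins,
                         const int *idx, const double *val, int n)
{
    assert(n_bins > 0);

    #pragma omp parallel
    {
        int k, q;
        /* Per-thread scratch copy. calloc zero-fills the bytes, which
           reads back as 0.0 on IEEE 754 platforms. */
        double *local = calloc((size_t)n_bins, sizeof *local);
        assert(local != NULL);

        /* Iterations are divided among the threads; each thread only
           touches its own local[], so no synchronization is needed. */
        #pragma omp for
        for (k = 0; k < n; k++)
            local[idx[k]] += val[k];

        /* One critical section per thread instead of one per update. */
        #pragma omp critical
        {
            for (q = 0; q < n_bins; q++)
                bins[q] += local[q];
        }

        free(local);
    }
}
```

The trade-off is memory: every thread holds a full copy of the array, which pays off when the number of updates is much larger than the array size.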
I understand what you mean. Thank you very much!
Using a private array is a good idea.
Thank you very much!
zhouyi1999
Hi Robert and Michael,
Though my question is not related to this post, I thought you guys might be in a good position to help me.
I am a graduate student at UT Austin, currently working on cache algorithms for multicore architectures. I need a few data points from Intel for my paper. Could you please help me with these questions?
1. Does Intel enforce the inclusion property at the L2 and L3 cache levels in its latest multicore processors?
2. How does Intel solve the cache coherency problem? I have read about the QuickPath technology, but I am not sure exactly how it helps.
Thanks,
Anil.
The last level cache (L3) in the Intel Core i7 is inclusive, as you might have been able to glean from a quick scan of Part 1 of the System Programming Guide (available here):
Intel Architecture uses the MESI protocol to ensure cache coherency, which is true whether you're on one of the older processors that use a common bus to communicate or using the new Intel QuickPath point-to-point interconnection technology. (The SPG also has a section on the MESI protocol as implemented in IA.)
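For readers unfamiliar with MESI, the sketch below models the textbook state transitions for a single cache line shared by two caches. It is a simplified illustration only: the type and function names are hypothetical, write-backs are mentioned in comments rather than modeled, and it makes no claim about how any Intel processor actually implements the protocol.

```c
#include <assert.h>

typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } MesiState;

/* State of one cache line as seen by each of two caches. */
typedef struct {
    MesiState st[2];
} Line;

void line_init(Line *l)
{
    l->st[0] = INVALID;
    l->st[1] = INVALID;
}

/* Cache c reads the line. */
void cpu_read(Line *l, int c)
{
    int other = 1 - c;
    assert(c == 0 || c == 1);

    if (l->st[c] != INVALID)
        return;                     /* read hit: state unchanged */

    if (l->st[other] == INVALID) {
        l->st[c] = EXCLUSIVE;       /* nobody else holds a copy */
    } else {
        /* The other cache holds the line; a MODIFIED copy would be
           written back first. Both copies end up SHARED. */
        l->st[other] = SHARED;
        l->st[c] = SHARED;
    }
}

/* Cache c writes the line. */
void cpu_write(Line *l, int c)
{
    int other = 1 - c;
    assert(c == 0 || c == 1);

    /* Any copy in the other cache is invalidated (a MODIFIED copy
       would be written back first); the writer's copy is MODIFIED. */
    l->st[other] = INVALID;
    l->st[c] = MODIFIED;
}
```

In this model a write by one cache always invalidates the other cache's copy, which is the "notification" mentioned earlier in the thread. The protocol keeps copies consistent; it does not make a multi-step update such as `m[j] += x` indivisible.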
Thank you Robert,
The information you provided was very valuable.
Do you have any insights into how the inclusion property is implemented? Or can you point me to resources that discuss it?
As I understand it, block requests that miss in L1 go as requests to L2, and those that miss in L2 go as requests to L3. Is my understanding correct with respect to Intel architectures (say, Nehalem)?
L3 is made inclusive in order to prevent L1 and L2 from wasting resources snooping the main memory, right? When the MESIF protocol takes care of coherency, is there any need for snooping at any level of cache?
If someone writes into a location in main memory, it should be L3, right? In that case, why should L3 snoop?
Thanks Again Robert!
- Anil.
Hi Robert,
I really appreciate your response. I am getting a clear picture now. I had created a new thread for my questions here http://software.intel.com/en-us/forums/showthread.php?t=72135 but I couldn't get any response there. I guessed that you would be subscribed to this thread, and you seemed super resourceful. That made me intrude on this thread :)
Anyway, I'll post my response to your reply at the other location. Please help me out with a few more details.
Thanks,
Anil.