Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

_mm_sfence and memory barriers

David_W_11
Beginner
4,807 Views

In another thread in this forum (http://software.intel.com/en-us/forums/topic/305582), there was a comment:

The _mm_?fence therefore serves two purposes: 1) to inform the compiler that pending reads or writes must not be moved before or after the specified fence statement, and 2) to have the compiler insert an appropriate processor fence instruction or, lacking that, a function call that performs the equivalent fencing behavior.

My question is, is there an authoritative source for #1?  I have yet to find a credible reference that says that _mm_mfence generates a compiler ReadWriteBarrier.

1 Solution
JenniferJ
Moderator
4,816 Views

DavidW and all,

The Intel Compiler treats the _mm_mfence, _mm_lfence, and _mm_sfence intrinsics as ReadWriteBarrier, ReadBarrier, and WriteBarrier, respectively. Hope this clears up any confusion.

Thanks,

Jennifer

25 Replies
SergeyKostrov
Valued Contributor II
4,244 Views
>>...My question is, is there an authoritative source for #1?..

Please take a look at:

1. Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2 (2A, 2B & 2C): Instruction Set Reference, A-Z
2. Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3 (3A, 3B & 3C): System Programming Guide
3. MSDN

...

void _mm_lfence(void)

Guarantees that every load instruction that precedes, in program order, the load fence instruction is globally visible before any load instruction that follows the fence in program order.

void _mm_mfence(void)

Guarantees that every memory access that precedes, in program order, the memory fence instruction is globally visible before any memory instruction that follows the fence in program order.

...
David_W_11
Beginner
4,243 Views

Sergey Kostrov wrote:

void _mm_mfence(void)

Guarantees that every memory access that precedes, in program order, the memory fence instruction is globally visible before any memory instruction that follows the fence in program order.

Hey Sergey, thanks for the response.

I have seen those passages.  However, they don't actually answer the question.  For example, let's say that *all* you wanted to document was the MFENCE opcode.  That text would work very well for that.  Now, what if (as I believe) _mm_mfence performed both MFENCE + _ReadWriteBarrier and you wanted to document that?  That same text would work for that as well.

I see nothing in that text that excludes either case.

SergeyKostrov
Valued Contributor II
4,244 Views
Hi David,

Wouldn't it be better to go practical? In 2012 I created a very small test case ( just with the _ReadWriteBarrier function ) and I could post it. However, I'm not sure if it will help you. So, let me know if you need the test case and I'll find it.
David_W_11
Beginner
4,244 Views

What I was hoping for was some authoritative source I could quote.

As for the sample, that's an interesting question.  After pondering questions about how this works, it seems like the compiler itself must "special case" the _mm_?fence instructions to generate both the ?FENCE opcode, and (as I believe) the implicit ReadWriteBarrier associated with it.  And if the compiler is doing it, then conceivably different compilers could handle this differently.  So your sample may or may not show what you think it does, depending on exactly what compiler I'm using.

So while I appreciate your offer, I don't think your sample is going to get me what I need.

Again, if there were some authoritative source that said "calls to _mm_mfence implicitly do a ReadWriteBarrier," then developers would know what to expect, and compiler writers would know what to write.  Instead we have vagueness in an area that is already notoriously complex.

I'm still holding out hope that an Intel Compiler developer may chime in here and at least describe what their product does.

SergeyKostrov
Valued Contributor II
4,244 Views
>>...Again, if there were some authoritative source that said "calls to _mm_mfence implicitly do a ReadWriteBarrier,"
>>then developers would know what to expect, and compiler writers would know what to write. Instead we have vagueness
>>in an area that is already notoriously complex.

I agree that the documentation is sometimes too fuzzy, and I think a forum dedicated to Intel Manuals, User Guides, References, etc, would really help to improve the quality of technical information. If you find some reference(s) on the Intel web-site for the _mm_?fence and _ReadWriteBarrier intrinsic functions, look on that web-page for a Feedback web-link and post your comments / suggestions. I know that all these posts are monitored.
jimdempseyatthecove
Honored Contributor III
4,244 Views

The _ReadWriteBarrier intrinsic function is a compiler directive, no different than a #pragma, that ensures the compiler does not move reads or writes that appear in the source code on one side of the _ReadWriteBarrier to the other side of it. No code is inserted (other than any reads or writes the compiler may have buffered). Also note that compiler optimizations (IMHO) are free to remove reads or writes on either side of the _ReadWriteBarrier (provided they are otherwise permitted to do so).

I sympathize with you in that the various fences and barriers (software and hardware) should be unambiguously specified, with code examples accompanied by comments that explain them clearly, in terms a layman could understand (as opposed to terms that assume the reader already knows what the writer of the example knows).

Jim Dempsey

David_W_11
Beginner
4,244 Views

Hello Mr. Cove, I was hoping to hear from you.  Thanks for your response.

Yes, I do understand the purpose for _ReadWriteBarrier().  I don't believe a sample is appropriate on the link that Sergey sent.  Ideally what I'd like to see is for the docs to add one tiny, but hugely clarifying phrase to each of:

  • _mm_lfence: Performs an implicit _ReadBarrier()
  • _mm_sfence: Performs an implicit _WriteBarrier()
  • _mm_mfence: Performs an implicit _ReadWriteBarrier()

Assuming that they do in fact do so. 

Consider for a moment what happens if they don't:

_mm_mfence();

_ReadWriteBarrier();

If _mm_mfence *doesn't* imply a barrier, then the compiler is free to move statements after it and before the ReadWriteBarrier.  This completely defeats the purpose of creating an MFENCE at all.

In theory you could wrap the _mm_mfence with barriers on both sides.  However compilers don't necessarily see the world like you or I would, so I would not feel 100% confident that this has the same effect.  And besides, the whole thing becomes unnecessary if (as seems likely) the Intel compiler does the sensible thing here. 

But they need to SAY they did it.  Or say they didn't and describe how people should cope.

SergeyKostrov
Valued Contributor II
4,244 Views
>>...Ideally what I'd like to see is for the docs to add one tiny, but hugely clarifying phrase to each of:
>>
>>- _mm_lfence: Performs an implicit _ReadBarrier()
>>- _mm_sfence: Performs an implicit _WriteBarrier()
>>- _mm_mfence: Performs an implicit _ReadWriteBarrier()
>>
>>Assuming that they do in fact do so...

David,

Here is the link: http://www.intel.com/software/products/softwaredocs_feedback

Please leave your comments there.
jimdempseyatthecove
Honored Contributor III
4,244 Views

At issue here is not only compiler documentation but also C++ standards compliance. IOW, if vendor XYZ documents and implements an implicit _????Barrier() for _mm_?fence(), will vendor ABC implement the implicit barrier as well?

Due to this uncertainty, it appears that you will be required to use a macro, defined in some header (that you supply), that figures out just what to do depending on the compiler and processor/platform.
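
One possible shape for such a header is sketched below. This is a minimal sketch, not a vendor-documented recipe. The names hyp_compiler_barrier, hyp_full_fence, hyp_smoke_test and HYP_HAVE_MFENCE are hypothetical, and the preprocessor checks (_MSC_VER, __GNUC__, __SSE2__, _M_X64, _M_IX86_FP) are assumptions about the toolchains in use. It places a compiler barrier on both sides of the hardware fence, so it does not depend on whether _mm_mfence itself implies one.

```cpp
#include <atomic>
#include <cassert>

// Pull in the SSE2 fence intrinsics only when the target is known to have them.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
  #include <emmintrin.h>   // _mm_mfence
  #define HYP_HAVE_MFENCE 1
#else
  #define HYP_HAVE_MFENCE 0
#endif

#if defined(_MSC_VER)
  #include <intrin.h>      // _ReadWriteBarrier
#endif

// Compiler-only barrier: emits no instruction, only restricts compiler reordering.
static inline void hyp_compiler_barrier() {
#if defined(_MSC_VER)
    _ReadWriteBarrier();
#elif defined(__GNUC__)
    __asm__ __volatile__("" ::: "memory");
#else
    // Assumed fallback for other compilers: the standard compiler-level fence.
    std::atomic_signal_fence(std::memory_order_seq_cst);
#endif
}

// Hardware fence wrapped in compiler barriers on both sides.
static inline void hyp_full_fence() {
    hyp_compiler_barrier();
#if HYP_HAVE_MFENCE
    _mm_mfence();
#else
    // Non-x86 (or pre-SSE2) fallback: the standard full fence.
    std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
    hyp_compiler_barrier();
}

// Single-threaded smoke test: the fence must not change program results.
static inline void hyp_smoke_test() {
    int slot = 0;
    slot = 7;
    hyp_full_fence();
    assert(slot == 7);
}
```

If a given compiler already treats _mm_mfence as a compiler barrier, the two extra barriers cost nothing at run time, since they emit no instructions.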

Jim Dempsey

David_W_11
Beginner
4,244 Views

> you will be required to ... figure out just what to do dependent on compiler

It sounds reasonable when you say it fast.  However this thread illustrates the intrinsic problem with the proposal: If no one will tell me what their compiler does (via docs, forums, etc), how do you create a header?

At this point, it's become clear that Intel's chief architect for the compiler team is unlikely to happen by and describe to me how the Intel compiler handles these specific intrinsics.  I have posted a message to the form Sergey suggested.  I'm sure they'll resolve this soon.

Even though we haven't come to a resolution here, I'd like to thank both of you for your responses.

jimdempseyatthecove
Honored Contributor III
4,244 Views

David, when you are unable to get an answer, construct a proper work around:

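// Function-like macros: each expands to a single statement (hardware fence,
// then compiler barrier), so they stay safe inside an unbraced if/else.
#define _MM_LFENCE() do { _mm_lfence(); _ReadBarrier(); } while (0)
#define _MM_SFENCE() do { _mm_sfence(); _WriteBarrier(); } while (0)
#define _MM_MFENCE() do { _mm_mfence(); _ReadWriteBarrier(); } while (0)
...
_MM_LFENCE();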

How many times have you had to define INT8 because of uncertainty about __int8, int8_t, int8, char?

Jim Dempsey

David_W_11
Beginner
4,244 Views

Quoting myself from above:

If _mm_mfence *doesn't* imply a barrier, then the compiler is free to move statements after it and before the ReadWriteBarrier.  This completely defeats the purpose of creating an MFENCE at all.

And while I haven't run into the INT8 issue you describe, I did just bump into a case where __LONG32 isn't 32bits.

SergeyKostrov
Valued Contributor II
4,244 Views
>>...And while I haven't run into the INT8 issue you describe, I did just bump into a case where __LONG32 isn't 32bits.

Where have you seen this? Please provide technical details on how it is defined. Long type always must be 4 bytes ( 32-bits ) for signed and unsigned values.

...
typedef long _RTV64C RTlong;
typedef unsigned long RTulong;
...

Where _RTV64C is:

...
#define _RTV64C __w64
...
CrtPrintf( RTU("\tRTlong - %2d\n"), sizeof( RTlong ) ); // 4 4 4 4 4 4 4
...

and for all 4s from left to right:

- WIN32 MSC VS20xx - 4 bytes
- WIN32 CE MSC VS20xx - 4 bytes
- WIN32 MSC VS98 - 4 bytes
- WIN32 ICC - 4 bytes
- WIN32 MinGW - 4 bytes
- WIN32 BCC - 4 bytes
- WIN32 TCC - 4 bytes ( Note: 23-year-old legacy C/C++ compiler )

PS: David, I've spent lots of time on making that stuff as portable and as compatible as possible in some software. So, if you saw that __LONG32 isn't 32 bits, then this is very wrong.
David_W_11
Beginner
4,244 Views

> Long type always must be 4 bytes ( 32-bits ) for signed and unsigned values.

While this is true in the Windows world, other OSs running on the i386 (like linux) have chosen different paths.  For example check out the first answer on (http://stackoverflow.com/questions/384502/what-is-the-bit-size-of-long-on-64-bit-windows).  It gives an excellent explanation about the differences between linux's LP64 and MS's LLP64.

SergeyKostrov
Valued Contributor II
4,244 Views
>>...While this is true in the Windows world, other OSs running on the i386 (like linux) have chosen different paths...

This is not just for Windows OSs, Desktop or Embedded, and you're mixing different types, that is long and long int.

>>...differences between linux's LP64 and MS's LLP64...

Microsoft doesn't use LLP64 at all (!) and you're talking about different types for a 64-bit "world". Here is a quote from GCC docs:

...
`__LP64__'
`_LP64'
These macros are defined, with value 1, if (and only if) the compilation is for a target where 'long int' and pointer both use 64-bits...
...
David_W_11
Beginner
4,244 Views

> you're mixing different types, that is long and long int.

This code:

   printf("%d\n", (int)sizeof(long));   /* cast: sizeof yields size_t, which %d does not match */

Will print 4 whether it is compiled with 32bit or 64bit versions of MSVC.  However, on 64bit linux, you get 8.  There are all kinds of articles that describe this fact (for example http://www.unix.org/version2/whatsnew/lp64_wp.html).
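
The difference can be checked at compile time rather than by reading articles. The sketch below is illustrative only: hyp_data_model and hyp_print_sizes are hypothetical names, and the classification assumes the three common models (ILP32, LP64, LLP64) with everything else reported as "other". Where a width must be exact, the fixed-width types from <cstdint> avoid the question altogether.

```cpp
#include <climits>
#include <cstdint>
#include <cstdio>

// Hypothetical helper: names the data model from the sizes the compiler reports.
constexpr const char* hyp_data_model() {
    return (sizeof(void*) == 8 && sizeof(long) == 8) ? "LP64"    // e.g. 64-bit Linux
         : (sizeof(void*) == 8 && sizeof(long) == 4) ? "LLP64"   // e.g. 64-bit Windows
         : (sizeof(void*) == 4 && sizeof(long) == 4) ? "ILP32"   // typical 32-bit targets
         : "other";
}

// When exactly 32 bits are required, say so in the type instead of relying on long.
static_assert(sizeof(std::int32_t) * CHAR_BIT == 32, "int32_t is exactly 32 bits");
static_assert(sizeof(std::int64_t) * CHAR_BIT == 64, "int64_t is exactly 64 bits");

// Prints the sizes that distinguish the models on the current target.
inline void hyp_print_sizes() {
    std::printf("model=%s long=%zu long long=%zu void*=%zu\n",
                hyp_data_model(), sizeof(long), sizeof(long long), sizeof(void*));
}
```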

> Microsoft doesn't use LLP64 at all

They say they do: http://msdn.microsoft.com/en-us/library/windows/desktop/aa384083%28v=vs.85%29.aspx

There's also wikipedia (http://en.wikipedia.org/wiki/LLP64#64-bit_data_models).

SergeyKostrov
Valued Contributor II
4,244 Views
>>...Will print 4 whether it is compiled with 32bit or 64bit versions of MSVC... This is absolutely correct for all Windows platforms and for 8 different C++ compilers I use.
jimdempseyatthecove
Honored Contributor III
4,244 Views

We are drifting off the topic of this thread (_mm_sfence and memory barriers).

btw>>Long type always must be 4 bytes ( 32-bits ) for signed and unsigned values.

From Turbo C User's Guide V 2.0 (1988):

"The actual sizes of short, int, and long are dependent upon the implementation; all that C guarantees is that a variable of type short will not be larger than (that is, will not take up more bytes than) one of type long. In Turbo C, these types occupy 16 bits (short), 16 bits (int), and 32 bits (long)."

Granted, ~1988 this was referring to a compiler targeting a 16-bit platform (re int being 16 bits). The point being that in C, the only restriction relating to sizes was that type short could not take up more bytes than type long. No guarantee on the exact sizes at all.
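
For comparison, the guarantees that current C++ does make can be written down as compile-time checks. This is a sketch under the assumption of a C++11 (or later) compiler. It reflects the modern standard, not the 1988 manual quoted above, and hyp_bits is a hypothetical helper name. Note that only minimum ranges and a relative ordering are guaranteed, never an exact size for long.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper: storage width of a type in bits on the current target.
template <typename T>
constexpr std::size_t hyp_bits() {
    return sizeof(T) * CHAR_BIT;
}

// Relative ordering: each type provides at least as much storage as the previous one.
static_assert(sizeof(short) <= sizeof(int), "short is no larger than int");
static_assert(sizeof(int) <= sizeof(long), "int is no larger than long");
static_assert(sizeof(long) <= sizeof(long long), "long is no larger than long long");

// Minimum ranges inherited from the C limits.
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");
static_assert(SHRT_MAX >= 32767, "short holds at least 16 bits");
static_assert(INT_MAX >= 32767, "int holds at least 16 bits");
static_assert(LONG_MAX >= 2147483647L, "long holds at least 32 bits");
static_assert(LLONG_MAX >= 9223372036854775807LL, "long long holds at least 64 bits");
```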

Jim Dempsey

levicki
Valued Contributor I
4,244 Views

It is important to understand that x86 fencing instructions deal with architectural state, preventing the CPU from executing reads and writes on either side of the instruction out of the specified program order. They do not imply or guarantee in any way that read and write instructions won't be reordered by the compiler.
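
For contrast, the C++11 standard library fences are specified at the language level, so they constrain both the compiler and the hardware. The sketch below uses them for a simple hand-off between threads. It is an illustration that assumes a C++11 compiler, not a statement about what _mm_mfence does in any particular compiler. HypMailbox, hyp_publish, hyp_try_consume and hyp_demo are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical hand-off slot: a plain payload guarded by an atomic flag.
struct HypMailbox {
    int payload = 0;
    std::atomic<bool> ready{false};
};

// Producer side: write the payload, then publish it.
inline void hyp_publish(HypMailbox& box, int value) {
    box.payload = value;
    // Release fence: the payload write may not be moved below the flag store,
    // neither by the compiler nor by the CPU.
    std::atomic_thread_fence(std::memory_order_release);
    box.ready.store(true, std::memory_order_relaxed);
}

// Consumer side: returns false until the payload has been published.
inline bool hyp_try_consume(HypMailbox& box, int& out) {
    if (!box.ready.load(std::memory_order_relaxed)) {
        return false;
    }
    // Acquire fence: the payload read may not be moved above the flag load.
    std::atomic_thread_fence(std::memory_order_acquire);
    out = box.payload;
    return true;
}

// Single-threaded walk-through of the protocol; returns 42.
inline int hyp_demo() {
    HypMailbox box;
    int out = -1;
    bool early = hyp_try_consume(box, out);   // nothing published yet
    assert(!early);
    hyp_publish(box, 42);
    bool ok = hyp_try_consume(box, out);
    assert(ok);
    return out;
}
```

In real use hyp_publish and hyp_try_consume would run on different threads; the single-threaded hyp_demo only walks through the order of operations.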

However, I do agree with you that intrinsic _should_ imply barrier for the compiler code reordering and I suggest submitting a feature request to Premier Support. My personal opinion is that Intel won't change it because they are blindly copying Microsoft in every compiler feature be it good or bad.

David_W_11
Beginner
3,584 Views

> It is important to understand

Actually, I do understand this.  The MFENCE opcode only affects the processor, not the compiler.  However, the (unanswered) question is: What does _mm_mfence affect?  Conceivably it could just be a wrapper for MFENCE.  Or it could also imply a barrier.  Since this is done as a compiler intrinsic every place I've seen, only the compiler writers can say what they did.

Since I started this thread, I have been talking with the GCC people to see what they do.  It turns out that they do an implicit _ReadWriteBarrier with their _mm_mfence (although you have to look hard to find it).

I asked in the MS forum about what they do, but received no useful answers.  I have since opened a bug (https://connect.microsoft.com/VisualStudio/feedback/details/790233/mm-mfence-needs-to-perform-implicit-readwritebarrier) saying that _mm_mfence should have a barrier.  I said that the reason I don't believe it does is that the docs don't say it does.  If (as code inspection suggests) they really do the right thing here, hopefully this will turn into a doc bug.

> I do agree with you that intrinsic _should_ imply barrier

I thank you for this.  Not everyone I've talked to seems to feel this way, but to me it's obvious.

> I suggest submitting a feature request to Premier Support.

I am not currently on any paid support program with Intel.  Would they accept feature requests from me?  Link?  Since I'm hoping this is just a simple doc omission, I have sent a doc request to the link Sergey provided.  No response so far.

> Intel won't change it because they are blindly copying Microsoft

Actually, this may be a good thing, since MS *appears* to be doing the right thing here.  But even if everyone is doing the wrong thing, writing down what you've done seems like the least you could do.
