Fabio_F_1 · ‎12-09-2014

The manual says that memory writes up to 8 bytes are atomic if aligned.

I ran some multi-threaded tests on a Haswell that seem to indicate that 16/32-byte writes are also atomic when using SSE/AVX intrinsics properly.

So, assuming the memory locations are 16/32 byte aligned, and you are using a single SSE/AVX store instruction, in what cases would the write not be atomic?
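For readers who want to try this, a test of this kind might look like the sketch below. It is not the code used for the results above. The names `Slot`, `is_torn`, and `count_torn_reads` are made up for illustration, and it assumes an x86 target with SSE2. One thread repeatedly stores a 16-byte vector whose two 64-bit lanes are equal, while another thread loads the same location and counts any value whose lanes differ.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <emmintrin.h>  // SSE2: _mm_store_si128, _mm_load_si128

// One 16-byte-aligned location shared by the writer and the reader.
struct alignas(16) Slot {
    std::uint64_t lane[2];
};

// The writer only ever stores vectors whose two lanes are equal, so a
// loaded value with different lanes must come from a torn store or load.
inline bool is_torn(__m128i v) {
    alignas(16) std::uint64_t lane[2];
    _mm_store_si128(reinterpret_cast<__m128i*>(lane), v);
    return lane[0] != lane[1];
}

// Hypothetical helper: performs `iterations` loads while a second thread
// stores continuously, and returns how many loads looked torn.
// NOTE: the unsynchronized access is formally a data race in C++. This
// probes what the hardware does; it says nothing about the language.
inline std::uint64_t count_torn_reads(std::uint64_t iterations) {
    Slot slot{};  // both lanes start at zero
    auto* p = reinterpret_cast<__m128i*>(&slot);
    std::atomic<bool> stop{false};

    std::thread writer([&stop, p] {
        std::uint64_t n = 0;
        while (!stop.load(std::memory_order_relaxed)) {
            ++n;
            // A single aligned 16-byte store with both lanes set to n.
            _mm_store_si128(p, _mm_set1_epi64x(static_cast<long long>(n)));
            // Compiler barrier so the store is not hoisted or merged.
            std::atomic_signal_fence(std::memory_order_seq_cst);
        }
    });

    std::uint64_t torn = 0;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        // Compiler barrier so the load is repeated on every iteration.
        std::atomic_signal_fence(std::memory_order_seq_cst);
        if (is_torn(_mm_load_si128(p))) ++torn;
    }

    stop.store(true, std::memory_order_relaxed);
    writer.join();
    return torn;
}
```

Calling `count_torn_reads` with a large iteration count and getting zero is only evidence for the machine and thread placement that was tested, not a guarantee. A 32-byte variant would presumably use `alignas(32)`, `_mm256_store_si256` and `_mm256_load_si256` from `<immintrin.h>`, and be compiled with AVX enabled (for example `-mavx` on GCC or Clang).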

McCalpinJohn · ‎12-09-2014

Given the publicly available information on Haswell's L1 Data Cache implementation, it certainly seems plausible that aligned 16-Byte and 32-Byte stores will be executed atomically.

Intel's comments in Section 8.1.1 of Volume 3 of the Intel 64 and IA-32 Architectures Software Developer's Manual (document 325384-052, September 2014) should not be read as a statement that no other operations can be atomic on a particular platform. Instead, they should be read as a statement that no other operations are guaranteed to be atomic across all platforms.
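For code that has to stay correct on platforms where wider stores might tear, one option is to rely only on atomic accesses of 8 bytes or less. The sketch below is a single-writer sequence lock built from `std::atomic<std::uint64_t>`. It is a swapped-in technique, not something proposed in this thread, and the names `Wide32` and `SeqLocked32` are hypothetical. A reader retries until it sees the same even sequence number before and after copying the 32-byte payload.

```cpp
#include <atomic>
#include <cstdint>

// A 32-byte payload, handled as four 8-byte words.
struct Wide32 {
    std::uint64_t w[4];
};

// Single-writer sequence lock: readers never block the writer and never
// return a half-updated payload.
class SeqLocked32 {
public:
    // Must only be called from one writer thread.
    void store(const Wide32& v) {
        const std::uint64_t s = seq_.load(std::memory_order_relaxed);
        seq_.store(s + 1, std::memory_order_relaxed);  // odd: write in progress
        std::atomic_thread_fence(std::memory_order_release);
        for (int i = 0; i < 4; ++i)
            data_[i].store(v.w[i], std::memory_order_relaxed);
        seq_.store(s + 2, std::memory_order_release);  // even: write complete
    }

    // May be called from any number of reader threads.
    Wide32 load() const {
        for (;;) {
            const std::uint64_t s0 = seq_.load(std::memory_order_acquire);
            Wide32 v;
            for (int i = 0; i < 4; ++i)
                v.w[i] = data_[i].load(std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_acquire);
            const std::uint64_t s1 = seq_.load(std::memory_order_relaxed);
            // Accept the copy only if no write started or finished meanwhile.
            if (s0 == s1 && (s0 & 1) == 0) return v;
        }
    }

private:
    std::atomic<std::uint64_t> seq_{0};
    std::atomic<std::uint64_t> data_[4]{};
};
```

This costs more than a single vector store, and readers can spin while a write is in flight, but it does not depend on behavior that the manual leaves unspecified.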

Fabio_F_1 · ‎12-12-2014

John D. McCalpin wrote:

Given the publicly available information on Haswell's L1 Data Cache implementation, it certainly seems plausible that aligned 16-Byte and 32-Byte stores will be executed atomically.

Intel's comments in Section 8.1.1 of Volume 3 of the Intel 64 and IA-32 Architectures Software Developer's Manual (document 325384-052, September 2014) should not be read as a statement that no other operations can be atomic on a particular platform. Instead, they should be read as a statement that no other operations are guaranteed to be atomic across all platforms.

Thanks, John. Any suggestions on how to get an official answer for a specific platform (e.g. Haswell i7-4770 for 32-byte writes and Westmere Xeon X5680 for 16-byte writes)?

McCalpinJohn · ‎12-12-2014

I would assume that Intel would decline to make any guarantees of atomicity in these cases. Providing a guarantee would not provide any direct benefits to Intel, but might result in costs to Intel.

One potential cost is the risk that users will build code depending on this behavior, which may break in future processors. This would require additional support and/or damage Intel's reputation in the market and/or push Intel to incur the expense of supporting the feature in future processors.

Another potential cost is that any official statement might (in combination with other public statements) reveal microarchitectural details that may embolden patent trolls. Patent infringement cases are expensive even if you win.

For this particular issue I don't think Intel would be taking a large risk in making a definitive statement, but this is just one issue of a great many technical issues with similar risks and (lack of) rewards. Intel appears to be well disciplined in avoiding these potential costs.

Could Intel confirm whether Haswell can write 16/32 bytes atomically?