a) When in Read ( vers2 == vers1 ) you will have a consistent load of m_a...m_d
(including the initial uninitialized data)
b) Read may experience drop-outs, e.g. Read m_vers1==1, then m_vers1==3 (missing sequence 2)
c) Multiple readers may read the same packet
Jim Dempsey
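Point (a) above — version equality implying a consistent snapshot — is the classic seqlock pattern. As a reference point, here is a minimal sketch of it in portable C++11 atomics (my illustration, not code from this thread; the odd/even counter trick replaces the two-counter scheme discussed here):

```cpp
#include <atomic>
#include <cassert>

// Minimal single-writer seqlock sketch (illustrative only).
// The writer makes the counter odd while a write is in progress and even when
// done; a reader's snapshot is consistent only if the counter was even and
// unchanged across its data reads.
struct SeqLockBox {
    std::atomic<unsigned> seq{0};
    unsigned a{0}, b{0};

    void write(unsigned va, unsigned vb) {
        unsigned s = seq.load(std::memory_order_relaxed);
        seq.store(s + 1, std::memory_order_relaxed);      // odd: write in progress
        std::atomic_thread_fence(std::memory_order_release);
        a = va; b = vb;                                   // payload stores
        seq.store(s + 2, std::memory_order_release);      // even: write complete
    }

    void read(unsigned& ra, unsigned& rb) {
        for (;;) {
            unsigned s1 = seq.load(std::memory_order_acquire);
            if (s1 & 1u) continue;                        // writer is mid-update
            ra = a; rb = b;                               // tentative snapshot
            std::atomic_thread_fence(std::memory_order_acquire);
            if (seq.load(std::memory_order_relaxed) == s1)
                return;                                   // snapshot is consistent
        }
    }
};
```

Unlike the CAS-based code later in the thread, this variant lets every reader observe every version and never blocks the writer on readers.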
a) In the code above, I omitted parts not related to the subject. The first write is always present (initializing). Spinning/sleep is also present in the Read cycle. Writes are rare (1 per second); reads are relatively frequent (tens/hundreds per second).
b), c) It is correct by design.
Maybe you have something to say on the subject?
[cpp]
class X {
protected:
    volatile DWORD m_vers1;
    volatile DWORD m_a;
    volatile DWORD m_b;
    volatile DWORD m_c;
    volatile DWORD m_d;
    volatile DWORD m_vers2;
public:
    X() : m_vers1( 0 ), m_vers2( 0 ) {}
    void Write( DWORD a, DWORD b, DWORD c, DWORD d )
    {
        while( m_vers1 != m_vers2 )
            continue;
        m_a = a; m_b = b; m_c = c; m_d = d;
        ++m_vers1;
    }
    void Read( DWORD& ra, DWORD& rb, DWORD& rc, DWORD& rd )
    {
        for ( ;; )
        {
            while( m_vers1 == m_vers2 )
                continue;
            DWORD vers1 = m_vers1;
            ra = m_a; rb = m_b; rc = m_c; rd = m_d;
            DWORD vers2 = vers1 - 1; // one behind
            // swap only if vers2 is one behind vers1
            if ( CAS( &m_vers2, vers1, &vers2 ) )
                break;
        }
    }
};
[/cpp]
Jim Dempsey
on many platforms.
Best regards,
Sergey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
My variables are located in shared memory. The writer and readers run in different processes. Critical sections are not interprocess primitives. Second, critical sections are not efficient for single-writer-multiple-readers tasks.
So I had a choice: implement a SWMR interprocess critical section, or think up something simpler for this particular task (and similar ones)...
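For reference, one way to place such a structure in memory shared between a writer process and reader processes is POSIX shared memory (a sketch of mine — the thread's own code is Windows-flavored, where CreateFileMapping/MapViewOfFile would play the same role; the object name and error handling here are illustrative):

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <new>
#include <cassert>

// Plain-old-data packet with the two version counters, suitable for shared memory.
struct SharedPacket {
    volatile unsigned vers1;
    volatile unsigned a, b, c, d;
    volatile unsigned vers2;
};

// Map (creating and zero-initializing if requested) a named shared-memory
// object holding one SharedPacket. Returns nullptr on failure.
SharedPacket* map_packet(const char* name, bool create) {
    int fd = shm_open(name, O_RDWR | (create ? O_CREAT : 0), 0600);
    if (fd < 0) return nullptr;
    if (create && ftruncate(fd, sizeof(SharedPacket)) != 0) {
        close(fd);
        return nullptr;
    }
    void* p = mmap(nullptr, sizeof(SharedPacket),
                   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) return nullptr;
    return create ? new (p) SharedPacket{} : static_cast<SharedPacket*>(p);
}
```

The writer process would call map_packet with create=true; each reader maps the same name and polls the counters exactly as in the code above.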
Your initial statement was "...from different threads...". It wasn't clear that the Writer and Readers are working in different processes. Thanks for the clarification.
Sorry, when I wrote the post I focused on the out-of-order execution, not the nuances of use. The code above is a sample only and is intended for better understanding. I am interested in the out-of-order execution principle itself.
a) The writer could write multiple times before a reader could read. Even though you have multiple readers and the chance of this is low, it will still happen. Possible causes for missed reads (or bunged-up data) are: processing time for all readers exceeds production time for the writer; reader threads get preempted or suspended (e.g. page-file expansion, I/O block time, etc.). This causes missed packets.
b) The writer could begin a second write while a reader is in the process of reading the prior write. This causes bunged-up data packets.
c) Multiple threads could have read the same, or a bunged-up, packet.
The code example I provided earlier avoids these issues.
Using a critical section is heavier weight. However, a critical section does contain a SpinWait, which in this case does you no good (you have a polling structure).
To incorporate thread suspension, you will need to use a condition variable on Linux or an Event on Windows. This will add overhead in the writer and latency/overhead in the reader. Since you state your write rate is at relatively long intervals, it would seem the use of a condition variable or event should be investigated. Note: if you are using a threading toolkit, it may have alternate means to trigger (enqueue) a reader task.
Jim Dempsey
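The condition-variable approach mentioned above could look roughly like this (my sketch; note that std::condition_variable only works within one process — an interprocess variant would need a process-shared pthread condvar or a named Windows Event):

```cpp
#include <condition_variable>
#include <mutex>
#include <cassert>

// Sketch of writer-notifies-readers: readers sleep instead of polling, and
// the writer wakes them after publishing a new packet. Illustrative only.
struct NotifyBox {
    std::mutex m;
    std::condition_variable cv;
    unsigned version = 0;
    unsigned a = 0, b = 0, c = 0, d = 0;

    void write(unsigned va, unsigned vb, unsigned vc, unsigned vd) {
        {
            std::lock_guard<std::mutex> lk(m);
            a = va; b = vb; c = vc; d = vd;
            ++version;
        }
        cv.notify_all(); // wake every waiting reader
    }

    // Blocks until a packet newer than last_seen is available;
    // returns the version of the packet that was read.
    unsigned read(unsigned last_seen, unsigned& ra, unsigned& rb,
                  unsigned& rc, unsigned& rd) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return version != last_seen; });
        ra = a; rb = b; rc = c; rd = d;
        return version;
    }
};
```

Each reader remembers the version it last returned, so every reader sees every packet exactly once — at the cost of the lock and wakeup overhead Jim describes.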
I understand that by "bunged up" you mean lost (never-read) messages, not a-b-c-d data inconsistency. So:
a) It is correct by design.
b) It is correct by design.
c) It is correct by design.
Did you even read my last response to you? I am interested in the out-of-order processor execution principle, not in a particular data-exchange model.
a(at t=1), b(at t=1), c(at t=2), d(at t=2)
The above would occur if the t=2 write occurred while the reader was part way through the read of the t=1 packet.
The IA32 and Intel64 architecture, up to but not necessarily including the Transactional Synchronization Extensions (a 22nm Haswell feature), is designed to preserve write ordering (although adjacent writes to the same cache line may be write-combined). Should your a and b reside in one cache line, and c and d reside in a different (next) cache line, then you should observe a & b updated first, c & d updated second.
With Haswell TSX/HLE, all the updates within the "protected" section will appear to the other threads to have been updated simultaneously (e.g. memory bus lock, multiple writes, memory bus unlock).
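The store-ordering guarantee described above is what C++11 expresses portably with release/acquire semantics. A minimal sketch of mine of the publish pattern (on x86 the release store and acquire load compile to plain MOVs precisely because the hardware already preserves this ordering):

```cpp
#include <atomic>
#include <cassert>

// Release/acquire publication sketch: the release store to `ready` cannot be
// observed before the earlier payload store, and an acquire load of `ready`
// orders the subsequent payload load.
std::atomic<int> data{0};
std::atomic<bool> ready{false};

void writer() {
    data.store(42, std::memory_order_relaxed);    // payload store first
    ready.store(true, std::memory_order_release); // then publish
}

int reader() {
    while (!ready.load(std::memory_order_acquire)) { } // spin until published
    return data.load(std::memory_order_relaxed);       // guaranteed to see 42
}
```

On weaker architectures (ARM, POWER) the same source generates real barrier instructions, which is why expressing the intent this way is safer than relying on volatile alone.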
[cpp]
class X {
protected:
    volatile DWORD m_vers1;
    volatile DWORD m_a;
    volatile DWORD m_b;
    volatile DWORD m_c;
    volatile DWORD m_d;
    volatile DWORD m_vers2;
    Lock_t ReadLock; // Lock_t provided in toolkit
public:
    X() : m_vers1( 0 ), m_vers2( 0 ) {}
    void Write( DWORD a, DWORD b, DWORD c, DWORD d )
    {
        // assure reader has finished reading prior write
        while( m_vers1 != m_vers2 )
            continue;
        m_a = a; m_b = b; m_c = c; m_d = d;
        ++m_vers1; // now one ahead of m_vers2
    }
    void Read( DWORD& ra, DWORD& rb, DWORD& rc, DWORD& rd )
    {
        for ( ;; )
        {
            { // Scope for LockLock
                LockLock lock( ReadLock ); // returns from ctor when we have lock
                                           // i.e. entered critical section
                // test for writer next write
                if( m_vers1 != m_vers2 )
                {
                    // have new data
                    ra = m_a; rb = m_b; rc = m_c; rd = m_d;
                    ++m_vers2; // tell writer we have packet
                    return;    // dtor of LockLock lock(ReadLock) releases lock
                } // if( m_vers1 != m_vers2 )
            } // End scope for LockLock, releases lock
        } // for(;;)
    } // Read
};
[/cpp]
Jim Dempsey
For additional information look at:
http://software.intel.com/en-us/articles/choosing-between-synchronization-primitives/
Jim Dempsey
To protect against the situation you described, the m_vers1/m_vers2 variables are used. If a(at t=1), b(at t=1), c(at t=2), d(at t=2), then vers2 will not be equal to vers1. Therefore, the Reader will do one more (or several more) iterations of the loop, until the data is consistent: a(at t=N), b(at t=N), c(at t=N), d(at t=N) and, of course, vers1(at t=N), vers2(at t=N).
Yes, I want to avoid the use of synchronization primitives. For this task, I think they are redundant.
By the way, I learned that instructions prefixed with LOCK (which are used within a critical section) are also a memory barrier (Intel Architectures Software Developer's Manual, Volume 3), and thus guarantee the correct order of execution.
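That full-barrier property of LOCK-prefixed instructions can be sketched with a sequentially-consistent atomic read-modify-write, which on x86 typically compiles down to exactly such an instruction (LOCK INC / LOCK XADD). This is my illustration of the point, not code from the thread:

```cpp
#include <atomic>
#include <cassert>

// A seq_cst atomic RMW acts as a full memory barrier: neither the compiler
// nor the CPU may move loads or stores across it. On x86 it typically
// compiles to a LOCK-prefixed instruction, which the SDM documents as a
// full fence.
std::atomic<unsigned> vers{0};
unsigned payload = 0;

void publish(unsigned value) {
    payload = value; // ordinary store...
    vers.fetch_add(1); // ...cannot be reordered past this LOCK-prefixed increment
}
```

This is why bracketing the plain data stores between two interlocked increments, as in the modified code below, pins their order — at the price of the bus-lock cost noted there.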
Thinking further, I came to a conclusion: to insure against the compiler misunderstanding my intentions, the code can be modified like this:
[cpp]
class X {
protected:
    volatile DWORD m_vers1;
    volatile DWORD m_a;
    volatile DWORD m_b;
    volatile DWORD m_c;
    volatile DWORD m_d;
    volatile DWORD m_vers2;
public:
    X() : m_vers1( 0 ), m_vers2( 0 ) {}
    void Write( DWORD a, DWORD b, DWORD c, DWORD d )
    {
        _InterlockedIncrement( reinterpret_cast< volatile LONG* >( &m_vers2 ) );
        m_a = a; m_b = b; m_c = c; m_d = d;
        _InterlockedIncrement( reinterpret_cast< volatile LONG* >( &m_vers1 ) );
    }
    void Read( DWORD& ra, DWORD& rb, DWORD& rc, DWORD& rd )
    {
        for ( ;; )
        {
            DWORD vers1 = (DWORD)_InterlockedCompareExchange(
                reinterpret_cast< volatile LONG* >( &m_vers1 ), 0, 0 );
            ra = m_a; rb = m_b; rc = m_c; rd = m_d;
            DWORD vers2 = (DWORD)_InterlockedCompareExchange(
                reinterpret_cast< volatile LONG* >( &m_vers2 ), 0, 0 );
            if ( vers2 == vers1 )
                break;
        }
    }
};
[/cpp]
But it adds processor bus locks and thus reduces performance...
If you wish, you can insert an _mm_sfence() following the ++ in the Write function. The effects this may have on your operation are: Write will be a tad slower; Read will (may) observe the ++ earlier. In the scheme of things, I think you are looking for throughput, not latency between write and read.
The CAS (InterlockedCompareExchange) need only be performed once in your Read. Capture the before value of m_vers1, read the data, then CAS m_vers2 from vers1-1 to vers1. If true then return, else loop.
[cpp]
for ( ;; )
{
    DWORD vers1 = m_vers1;
    ra = m_a; rb = m_b; rc = m_c; rd = m_d;
    if( _InterlockedCompareExchange(
            reinterpret_cast< volatile LONG* >( &m_vers2 ), // loc
            vers1,                                          // xchg
            vers1 - 1 )                                     // cmp
        == (LONG)( vers1 - 1 ) )                            // returned prior value == cmp means swap occurred
        break;
}
[/cpp]
Here you could optionally insert an _mm_lfence() in front of the capture of vers1.
Jim Dempsey
In this particular task neither bandwidth nor latency is the main criterion. I'm concerned about the correctness of the implementation (data consistency) and the absence of synchronization primitives.
I think the topic is closed. Thanks to all!