Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.
Announcements
FPGA community forums and blogs on community.intel.com are migrating to the new Altera Community and are read-only. For urgent support needs during this transition, please visit the FPGA Design Resources page or contact an Altera Authorized Distributor.

Lockfree_mpmc and lockfree ParallelQueue ...

aminer10
Novice
7,283 Views


Hello all,

As you will notice, even inside NASA and other institutions
in France (Ariane...) are making errors in there programs...

Read this:

"Although unexpected computer resets and human errors
continue to frustrate NASA/JPL scientists, both Pathfinder
and Sojourner... Current priorities involve fixing
the software bug that is causing Pathfinder to reset its
computer"

"The trouble experienced by the Mars lander "Mars Pathfinder"
is a classic example of problems caused by priority inversion
in realtime systems"

And don't forget about the french Ariane 5 rocket explosion:

http://www.youtube.com/watch?v=z-r9cYp3tTE

So, i have just updated and corrected a problem with
lockfree_mpmc, if you take a look at lockfree_mpmc.pas
inside my lockfree Parallel Queue zip file:

http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm

You will notice that i have modified the algorithm in the push()
side, and i have used a test like this:

if getlength >= fsize
then
begin
result:=false;
exit;
end;


Now i have tested my new algorithm and it is working very well.

Correctness:

To not make the correctness verification longer, i will concentrate on
the more important parts. If you take a look at the lock-free ParallelQueue
source code , inside lockfree_mpmc.pas you will read this in push side:

function TLockfree_MPMC.push(tm : tNodeQueue):boolean;//stdcall;
var lasttail,newtemp:longword;
begin
if getlength >= fsize
then
begin
result:=false;
exit;
end;
result:=true;
//newTemp:=windows.interlockedincrement(temp);
newTemp:=LockedIncLong(temp);

lastTail:=newTemp-1;
setObject(lastTail,tm);
repeat
if CAS(tail[0],lasttail,newtemp)
then
begin
exit;
end;
sleep(0);
until false;
end;


So, let's say the size of the bounded queue is 1000 , and imagine
that the threads are executing the "if getlength >= fsize " all at the
same time, and imagine that the getlength returns 999, so, the
"if getlength >= fsize" will returns false , and since we have
fSize:=(1 shl aPower) - margin , with a margin of 1000
(margin must be >= to the number of cores) , there will be no
problem(overflow..)...


Now in the pop side, you will read this:

function TLockfree_MPMC.pop(var obj:tNodeQueue):boolean;

var lastHead : longword;
begin
repeat
lastHead:=head[0];
if tail[0]<>head[0]
then
begin
obj:=getObject(lastHead);
if CAS(head[0],lasthead,lasthead+1)
then
begin
result:=true;
exit;
end;
end
else
begin
result:=false;
exit;
end;
until false;
end;


So, as you have noticed, there is a test like this:
if CAS(head[0],lasthead,lasthead+1) , and this test
avoids something like the ABA problem, cause if head
have changed , the CAS() will fail.

I have updated my Threadpool engine with the new lockfree_mpmc,
please download my other modules that use lockfree_mpmc.


Sincerely
Amine Moulay Ramdane.

0 Kudos
60 Replies
Dmitry_Vyukov
Valued Contributor I
1,614 Views
Quoting aminer10

When we execute something like a lock instruction in assembler
does the OScan deschedulethe thread even if it isexecuting
inside the lockassembler instruction inside the CAS ?

Nope. Os can interrupt either before LOCK or after LOCK. Modern architectures does not feature interrupts on micro-op level.

0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,614 Views
Quoting aminer10

Dmitry wrote:
>For example, in C/C++ plain loads and stores provide neither atomicity nor memory ordering

I will ask the compiler designer...

When the variables are aligned in 32 bit , are the loads and stored not atomic ?


>memory ordering

Is the CAS (in assembler) that i am usingnot sufficient?
is it necessaryto use the interlocked functions provided by the Windows OS ?


Amine.

> When the variables are aligned in 32 bit , are the loads and stored not atomic ?

It depends on generated machine code.

> Is the CAS (in assembler) that i am usingnot sufficient?

CAS will be sufficient if you use it to do all loads and stores.

0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,614 Views
Quoting aminer10

Must i use a fence inside my CAS in assembler to force the ordering ?

Most likely yes

0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,614 Views
Quoting aminer10
Am i correct or not, Dmitry?

No, you are not. You are not coding in assembly, so all that does not apply to you. You are coding in Pascal and should consult with Pascal guarantees.

0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,614 Views
Quoting aminer10

Hello,

What is important for us is MEMORY VISIBILITY across threads..

There is nothing you can do with visibility on cache-coherent hardware - it's just as it is. You can only control ordering (that is mutual order of memory accesses as they are seen by different threads).

0 Kudos
aminer10
Novice
1,614 Views

Dmitriy wrote:
>Nope. Os can interrupt either before LOCK or after LOCK. Modern architectures does not feature interrupts >on micro-op level

I mean if you execute something like

1- lock cmpxchg [ecx],edx

or


2- LOCK XADD [ECX], EAX


canthe OSstill deschedule the threadsif it executing
1 or 2 ?




Amine Moulay Ramdane.




0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,614 Views
Quoting aminer10

Dmitriy wrote:
>Nope. Os can interrupt either before LOCK or after LOCK. Modern architectures does not feature interrupts >on micro-op level

I mean if you execute something like

1- lock cmpxchg [ecx],edx

2- LOCK XADD [ECX], EAX

canthe OSstill deschedule the threadsif it executing
1 or 2 ?

OS can't de-schedule a thread in the middle of ANY instruction (only before or after, but not in between).

0 Kudos
aminer10
Novice
1,614 Views

Dmitriy wrote:
>OS can't de-schedule a thread in the middle of ANY instruction (only before or after, but not in between).



Thank you Dmitriy.



Amine Moulay Ramdane.


0 Kudos
aminer10
Novice
1,614 Views


Hello Dmitriy,



I was thinking more about this, and here it is:

LOCKed instruction includes previous and subsequent full memory fences.


But LOCK may not synchronize non-temporal stores and WC-memory:

Read this:

--

WC is a weakly ordered memory type. System memory locations are not

cached and COHERENCY is NOT ENFORCED by the processor's bus coherency protocol.

Speculative reads are allowed. Writes may be delayed and

combined in the write combining buffer to reduce memory accesses..

What does this really mean? Writes to WC memory are not cached in the typical sense of the

word cached. They are delayed in an internal buffer that is separate from the internal L1 and

L2 caches. The buffer is not snooped and thus does not provide data coherency. The write

buffering is done to allow software a small window of time to supply more modified data to the

buffer while remaining as non-intrusive to software as possible.

----



And about non-Temporal stores read for example this:

---


Non-Temporal SSE instructions (MOVNTI, MOVNTQ, etc.), don't follow

the normal CACHE-COHERENCY rules. Therefore non-temporal stores must

be followed by an SFENCE instruction in order for their results to be seen by other processors in a timely fashion.


---


And

---

Opcode SFENCE


Instruction: SFENCE


Description:

Guranteed that every store instruction that precedes in program

order the store fence instruction is globally visible before any

store instruction follows the fence is globally visible.

---


So, Dmitiy, what do you think if i add in lockfree_mpmc.pas
source code an sfence before the LOCK , like this, i think
it will work:

asm

mov ecx,Target

mov edx,NewValue

mov eax,Comperand

sfence

lock cmpxchg [ecx],edx

JNZ @@2

MOV AL,01

JMP @@Exit

@@2:

XOR AL,AL

@@Exit:

end;

and

function LockedIncLong(var Target: longword): Integer;

asm

{$IFDEF CPU32}

// --> EAX Target

// <-- EAX Result

MOV ECX, EAX

MOV EAX, 1

sfence

LOCK XADD [ECX], EAX

INC EAX

{$ENDIF CPU32}

{$IFDEF CPU64}

// --> RCX Target

// <-- EAX Result

MOV EAX, 1

LOCK XADD [RCX], EAX

INC EAX

{$ENDIF CPU64}

end;


Amine Moulay Ramdane.






0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,614 Views
I think that it's the responsibility of a guy who issues non-temporal stores/uses WC memory to issue trailing SFENCE.
I was talking about the following thing. For example, in size() you issue 2 separate memory loads for tail and head - they are unordered, most likely compiler is allowed to emit them in reverse order in machine code.
0 Kudos
aminer10
Novice
1,614 Views

Dmitriy wrote
>think that it's the responsibility of a guy who issues non-temporal stores/uses WC memory to issue trailing >SFENCE.

Anyway, I have added an sfence before the lock, and it doesn't change
anything to the performance (i have tested it)

>I was talking about the following thing. For example, in size()
>you issue 2 separate memory loads for tail and head - they
>are unordered, most likely compiler is allowed to emit them in reverse order in machine code

In wich method , getlength?

Butin the Intel x86 memory model, Loads are not reordered with other loads.

What do you mean ?


Amine Moulay Ramdane.



0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,614 Views
Quoting aminer10






Amine Moulay Ramdane.



> In wich method , getlength?

yes.

> Butin the Intel x86 memory model, Loads are not reordered with other loads. What do you mean ?

Once again, you are not coding in x86 assembly so these rules has nothing to do with your situation. You are coding in Pascal, does Pascal guarantees that loads are not reordered with other loads? I think that not.

Consider, Pascal compiler reverses the loads in machine code, then x86 processor COMPLETELY HAS NO MEANS to restore original program order, it does even not know what it was.


0 Kudos
aminer10
Novice
1,614 Views


>> In wich method , getlength?

>yes.


Ok

> Butin the Intel x86 memory model, Loads are not reordered with other loads. What do you mean ?

>Once again, you are not coding in x86 assembly so these rules has nothing to do with your situation. You >are coding in Pascal, does Pascal guarantees that loads are not reordered with other loads? I think that not.

>Consider, Pascal compiler reverses the loads in machine code, then x86 processor COMPLETELY HAS NO >MEANS to restore original program order, it does even not know what it was.



Are we not getting too schizophrenic here ?


But imagine that the too loads are reordered, wthat will be the problem in getlength() ?

i don't thing that there will be a problem...

Do you see anyproblem ifthe compiler reorders the loads ?



Amine Moulay Ramdane.




0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,614 Views
> Are we not getting too schizophrenic here ?

No. Optimizing compilers do that on a regular basis.

> Do you see anyproblem ifthe compiler reorders the loads?

There are several such loads and stores, not only that two. It's basically impossible to analyze for a human.

0 Kudos
aminer10
Novice
1,614 Views

>> Are we not getting too schizophrenic here ?

>No. Optimizing compilers do that on a regular basis.


So i will take a look at the assembler source code of Freepascal
and delphi and see what's going on ..

But even if i take a look at the current assembler source code of
freepascal and delphi, does thatwarranty that it will be ok
with future Delphi and Freepascal compilers ?



Amine Moulay Ramdane.

0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,614 Views
Quoting aminer10
But even if i take a look at the current assembler source code of
freepascal and delphi, does thatwarranty that it will be ok
with future Delphi and Freepascal compilers ?

Consider, you want to cross a highway. You look at one direction, in another - no cars. You safely cross the highway. Now, is it safe to cross the highway everyday w/o looking in both directions?

0 Kudos
aminer10
Novice
1,614 Views

Dmitriy wrote:
>Consider, you want to cross a highway. You look at one direction,
>in another - no cars. You safely cross the highway. Now, is it safe
>to cross the highway everyday w/o looking in both directions?

I understand.

And don't *FORGET*to look at both directions, before crossing a highway !

But even if you say it Dmitriy, is it still safe at 100%? i mean is it
possible to be sure at 100% that we will not *FORGET*
to look at both directions ? :)


Amine Moulay Ramdane.





0 Kudos
aminer10
Novice
1,614 Views

I wrote:
>I understand.

>And don't *FORGET*to look at both directions, before crossing a highway !

>But even if you say it Dmitriy, is it still safe at 100%? i mean is it
>possible to be sure at 100% that we will not *FORGET*
>to look at both directions ? :)

>Amine Moulay Ramdane.


But anyway,I am here and WEare here to REMEMBERourselves,
and one another,tonot*FORGET* to look at both directionsina highway,
rigth ?


Take care Dmitriy, you are such a nice guy...


Amine Moulay Ramdane.










0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,614 Views
Quoting aminer10

And don't *FORGET*to look at both directions, before crossing a highway !

But even if you say it Dmitriy, is it still safe at 100%? i mean is it
possible to be sure at 100% that we will not *FORGET*
to look at both directions ? :)

Well, taking into account that there are bugs in compilers, OS and even in hardware, it's impossible to be 100% safe. However at least one can make his best to ensure that there are no problems on *his* side. The good way to do this, for example, in C/C++ world is to stick to guarantees provided by ISO C/C++ and POSIX standards; then at least possible problems are on their side (for example, implementers of C/C++ language compiler).


0 Kudos
Reply