Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.
Announcements
FPGA community forums and blogs have moved to the Altera Community. Existing Intel Community members can sign in with their current credentials.

Is Thread Checker telling the truth?

richard_knight
Beginner
1,755 Views

Hi,

I have a function that's called a lot from within some muiltthreaded code that looks roughly like this (hopefully not too mangled by the web form to read!):

Thing* get_thing()

{

lock();

if (g_references == 0)

g_thing = new Thing();

g_references++;

unlock();

return g_thing;
}

There's also an equivalent release_thing function that decrements the reference count and deletes it on removing the last reference. The lock and unlock are proving to be a rather large overhead. So as an attempt to fix this I came up with this:

Thing* get_thing()

{

bool failed;

do

{

failed =false;

LONG refs =InterlockedCompareExchange(&g_thing_refs, 0, 0);

if (refs== 0)

{

lock();

if (InterlockedCompareExchange(&g_thing_refs, 0, 0) == 0)

g_thing = new Thing();

InterlockedIncrement(&g_things_refs);

unlock();

}

else

{

if (InterlockedCompareExchange(&g_thing_refs, refs + 1, refs) != refs)

failed = true; // Need to try again...

}

} while (failed);

return g_thing;

}

This appears to be working great, and has sped things up no end, however, when I run it thru Thread Checker it tells me there's a race condition between the new Thing() line and the return statement at the end. I can't for the life of me see why tho...

Am also not convinced I need the compare-exchange in:

refs = InterlockedCompareExchange(&g_thi ng_refs, ...)

just to read the value of the variable (reading it should be atomic I thought?), but if I change it to

refs = g_thing_refs;

then again Thread Checker starts complaining about it.

Do I have a genuine problem, or is Thread Checker just being over cautious?

Thanks!

Rich.

0 Kudos
5 Replies
jimdempseyatthecove
Honored Contributor III
1,755 Views

I suggest you use Google and search for a Lock-Free, Wait-Free allocation routine.

It might be better to keep the Thing in g_thing around once it is allocated.

Jim

0 Kudos
robert-reed
Valued Contributor II
1,755 Views

The second sample code is seriously broken. A simple question: where is g_thing_refs ever decremented?

Let's play computer and walk through the code. Presuming that g_thing_refs starts at 0, we enter get_thing the first time. The first InterlockedCompareExchange doesn't set a lock; it just gets the current value of g_thing_refs, which is 0. Since refs is 0, we fall into the lock and allocate of a new Thing, a pointer towhich is saved in g_thing. But failed is still false, so we whip around through the do ... while again. The first time through the loop we bumped g_things_refs with the InterlockedIncrement, so the second time, refs now gets set to 1 and we drop into the else clause. refs is equal to g_thing_refs, so failed gets set to true and we return the g_thing.

Second time: let's try to do another get_thing. We drop into the do, failed goes to false, the InterlockedCompareExchange sets refs to 1, we fall into the else clause, the second ICE tells us what we want to hear, that g_thing_refs is still equal to what it was just above, failed gets set to true, and we drop to the return, returning the same object that was returned on the first call, without ever allocating a new one!

Thread Checker is definitely telling the truth here. This code is not doing what you expect it to do.

--Robert

0 Kudos
richard_knight
Beginner
1,755 Views

Still think it's doing the right thing.Think you're reading the failed/!failed condition the wrong way around in the outer loop - or I've typed it into the web form wrong :) It should only redo the loop when the second ICE fails.

I'll have another hunt on Google tho - didn't find anything initially, but sounds like a problem someone else should have solved before...

Rich.

0 Kudos
robert-reed
Valued Contributor II
1,755 Views

You're right. I did interpret the failed variable incorrectly the first time I walked through the code. But that still doesn't change the fact that g_thing_refs only has a value of 0 the first time through, or that you only create one Thing.

So let me try walking through the code once more and see if we can identify a typo or what really happens in this function.

Enter get_thing

failed = false

refs = 0 (first ICE)

lock()

g_thing gets a new Thing because g_thing_refs is 0; I assume that the InterlockedIncrement is supposed to be bumping g_thing_refs, not g_things_refs because otherwise, g_thing_refs will always be 0, the else clause will never get executed, and you'll just have an elaborate version of your first example. Assuming that's atypo, g_thing_refs = 1

unlock()

failed is false so return g_thing

Enter get_thing (second request)

failed = false

refs = 1, the current value of g_thing_refs at the first ICE

refs != 0, drop to the else clause

g_thing_refs == refs (1), so g_thing_refs = refs + 1 (2), failed = true, and we loop.

failed = false

refs = ICE(g_thing_refs) or 2

refs != 0, drop to the else clause

g_thing_refs == refs (2), so g_thing_refs = refs +1 (3), failed = true, and we're stuck in the loop.

Enter get_thing (say a second thread comes along while the first one is spinning)

failed = false

refs = ICE(g_thing_refs), call it n

refs != 0, drop to the else clause

Now we have a race between thread 1 and thread 2 over the value of g_thing_refs. Let's say that thread 1 manages to increment g_thing_refs between the time that thread 2 sets its copy of refs to n and when it reaches the else clause ICE. Because g_thing_refs has been bumped, g_thing_refs != refs (thread 2), failed is false, the loop terminates, and thread 2 gets g_thing, which is still pointing at the only Thing allocated so far.

Now take the opposite tack: say thread 2 reaches the else clause ICE before thread 1 does. Thread 2 sees g_thing_refs == refs (n), and bumps g_thing_refs to n + 1; failed = true and thread 2 loops. Meanwhile, thread 1 comes along and now g_thing_refs is not equal to its value of refs. It doesn't set failed to true. It drops out of the loop, and it returns the g_thing without allocating a new Thing.

So with multiple threads, nothing gets permanently hung, but every return from get_thing returns the same Thing. Only one of them gets allocated because refs is derived from g_thing_refs, which is monotonically increasing. The only time you get a new Thing is when refs (g_thing_refs) equals 0. But that also means that you only enter that critical region once, so it runs real fast. I don't think this is the behavior you want. Once again I have to concludethat Intel Thread Checker is telling the truth.

p.s., the only ICE that really does anything here is the third one; the other two could just as easily be replaced with direct references to g_thing_refs with no loss of function.

--Robert

0 Kudos
richard_knight
Beginner
1,755 Views

Hi Robert,

There is also a release_thing function which I thought was irrelevant to the problem thread checker was seeing - turns out it's not and the race was in that function. I fixed that and thread checker says it's all ok. Given what you've said, and what I've read around on google, I'm still not entirely convinced it is tho, so will probably see if I can restructure things to avoid the issue altogether (this is just some cobbled together example code - the real thing is sadly a lot more complicated).

The reason for the first two ICEs in the original code is that without them, thread checker complains there too. I suspect to stop that I'd need to the use the thread checker API to tell it there's some synchronisation going on that it doesn't otherwise know about.

Thanks,

Rich.

0 Kudos
Reply