Re: DCL^HSI (DCSI-MBR/DCSI-TLS) and DCCI "patterns"

Intel_C_Intel · 05-24-2003

FYI,

-------- Original Message --------
Message-ID: <3ECFADCC.8CDF44A6@web.de>
Date: Sat, 24 May 2003 19:37:16 +0200
From: Alexander Terekhov
Newsgroups: comp.programming.threads
Subject: Re: Hyper-threading -- is it threading? Or is it just Hype?
References:
<3EC9F852.E554AA6@web.de>

Alexander Terekhov wrote:
>
> David wrote:
> [...]
> > Dave Solomon
> > Intel Threading Developer Center, Program Manager
>
> Dave, would you please tell your folks to pull off and "fix"
> the following document:
>
>
> (Developing Multithreaded Applications: A Platform Consistent Approach)
>
>
>
> Advice
>
> Use DCL to avoid ....

Date: Sat, 24 May 2003 19:21:46 +0200
Message-Id: <200305241721.h4OHLkQ12946@mailgate5.cinetic.de>
From: "Alexander Terekhov"
To: ", Henry" <@intel.com>
Subject: Re: Hyper-threading -- is it threading? Or is it just Hype?

", Henry" <@intel.com> wrote on 23.05.03 17:21:52:
>
> Hello Alexander,
> A colleague, Dave Solomon, forwarded me your comment:
>
> > Dave, would you please tell your folks to pull off and "fix" the
> > following document:
> > Developing Multithreaded Applications: A Platform Consistent Approach.
>
> I'm putting together an errata for version 1.1 of this document. Please
> let me know what specifically should be fixed. Thanks.

Please take a look at:

http://groups.google.com/groups?threadm=3ECBD751.8DE8AED5%40web.de
(Subject: Re: scoped static singleton question)

>
> Dave probably mentioned this already, but I would also encourage you to
> post your comments on Intel's threading forum
> (www.intel.com/ids/community/threading).
>
> Best regards,
>
> Henry
> Intel KAI Software Lab

regards,
alexander.


jseigh2 · 05-28-2003

It seems that Alexander left out a link. Strange, usually he doesn't do that. :) It's The "Double-Checked Locking is Broken" Declaration. It's a pretty good discussion of the pitfalls of trying to use/implement DCL correctly. Also, if you search the Google groups on this topic, you will see that 90% of the participants don't understand the issues, even after it's repeatedly pointed out to them. That alone should tell you how tricky DCL is.

The main problem with DCL is that there is no abstraction layer or api that programmers can use to do a correct implementation. You are pretty much at the mercy of the memory model on your particular platform. And even if you know what to do at some point, the memory models are always changing.

Part of the problem is this technique is not on hardware architects' radar scope. They know enough not to break mutexes but they're not told "don't break DCL".

Joe Seigh

bronx · 05-28-2003

> Broken" Declaration. It's a pretty good
> discussion of the pitfalls of trying to use/implement
> DCL correctly.

in this respect (you can't get rid of the synchronization), using the "GoF" Singleton pattern is not a very good idea for any multithreaded application where good performance matters. In most cases lazy creation of the unique instance is useless: you can create all singletons in a single master thread *before* launching the child threads that will access the "unique instance" ptr (often billions of times...) without any overhead
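
A minimal C++11 sketch of the eager approach described above, for illustration only. The names `config`, `g_config`, `init_singletons` and `run_workers` are hypothetical and do not come from this thread. It relies on the fact that the completion of the `std::thread` constructor synchronizes with the start of the new thread, so everything the master thread wrote beforehand is visible to the workers without further locking.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical singleton type, for illustration only.
struct config {
  int value;
};

// Plain pointer: written once by the master thread before any worker exists.
static const config * g_config = nullptr;

// Must be called from the master thread, before workers are launched.
void init_singletons() {
  static const config instance{42};
  g_config = &instance;
}

// Workers read the pointer with no lock and no barrier of their own.
// Thread creation orders the write above before every read below.
int run_workers(int n) {
  assert(g_config != nullptr);  // init_singletons() must have run already
  std::vector<int> seen(n, 0);
  std::vector<std::thread> workers;
  for (int i = 0; i < n; ++i)
    workers.emplace_back([&seen, i] { seen[i] = g_config->value; });
  for (auto & t : workers) t.join();
  int sum = 0;
  for (int v : seen) sum += v;
  return sum;
}
```

Usage would be `init_singletons();` followed by `run_workers(4);`. The trade-off is that everything is constructed up front, whether or not it is ever used.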

Intel_C_Intel · 05-28-2003

> It seems that Alexander left out a link. Strange,
> usually he doesn't do that. :)

Yeah. Note that the link to the declaration can be found in the article that I've referenced in my posting. ;-)

> It's
> The "Double-Checked Locking is
> Broken" Declaration. It's a pretty good
> discussion of the pitfalls of trying to use/implement
> DCL correctly. Also, if you search the Google groups
> on this topic, you will see that 90% of the
> participants don't understand the issues, even after
> it's repeatedly pointed out to them. That alone
> should tell you how tricky DCL is.

Forget stupid "DCL" term. Please use DCSI (and DCCI).

>
> The main problem with DCL is that there is no
> abstraction layer or api that programmers can use to
> do a correct implementation. You are pretty much at
> the mercy of the memory model on your particular
> platform. And even if you know what to do at some
> point, the memory models are always changing.

I don't see how they are "always changing". You either have a "fully" relaxed model or something close to SC (sequential consistency). atomic<> with hoist/sink barriers (in addition to conventional acquire/release stuff) would surely work ten years from now with no problems whatsoever. Right?

regards,
alexander.

bronx · 05-28-2003

> Forget stupid "DCL" term. Please use DCSI (and
> DCCI).

excuse my ignorance but are these acronyms a creation of your own ? or do you have a pointer to some formal definition of these "DCSI" and "DCCI" thingies ?

Intel_C_Intel · 05-28-2003

> > Forget stupid "DCL" term. Please use DCSI (and
> > DCCI).
>
> excuse my ignorance but are these acronyms a creation
> of your own ?

Yeah, sort of "my own". ;-)

> or do you have a pointer to some formal
> definition of these "DCSI" and "DCCI" thingies ?

Sure.

http://www.opengroup.org/sophocles/show_mail.tpl?source=L&listname=austin-group-l&id=4856

http://groups.google.com/groups?selm=3ECA3A27.C06CBB62%40web.de

http://groups.google.com/groups?selm=3ecb989d%40usenet01.boi.hp.com

regards,
alexander.

bronx · 05-28-2003

> Sure.
>
> http://www.opengroup.org/sophocles/show_mail.tpl?source=L&listname=austin-group-l&id=4856
>
> http://groups.google.com/groups?selm=3ECA3A27.C06CBB62%40web.de
>
> http://groups.google.com/groups?selm=3ecb989d%40usenet01.boi.hp.com
>

so at least two people on earth (you included) use them already ;-)

Intel_C_Intel · 05-28-2003

[... pedigree of "DCSI" and "DCCI" terms ...]

> so at least two people on earth (you included)

Nah, and my kids...

> use them already ;-)

Yeah, to tell the truth, my wife doesn't like them for some reason (as usual, she said: "bad taste"). ;-)

regards,
alexander.

ClayB · 05-28-2003

Alexander, now that we've cleared up the FLAs (four letter acronyms :-), can you give us a rundown on how the DCSI and DCCI codes work and what an improvement they are over the DCL (DCI)? I've poked around the URLs you gave and looked through the code, but all the dots and more dots make my eyes cross. I'm sure others would appreciate a few words of explanation, too.

Maybe you can get your kids to help. ;-)

-- clay

Henry_G_Intel · 05-29-2003

Hi Alexander,
Thanks for posting this issue here. After reading through the links that you provided, the main problem with "Use a Double-Check Pattern to Avoid Lock Acquisition for One-Time Events" is that the initialization variable must have a volatile qualifier in the code examples. The cautionary note about using DCL in Java still appears to be correct.

Henry

Intel_C_Intel · 05-30-2003

ClayB wrote:
>
> Alexander, now that we've cleared up the FLAs (four
> letter acronyms :-), can you give us a rundown on how
> the DCSI and DCCI codes work and what an improvement
> they are over the DCL (DCI)? I've poked around the
> URLs you gave and looked through the code, but all
> the dots and more dots make my eyes cross. I'm sure
> others would appreciate a few words of explanation,
> too.

DCSI/DCCI is "kinda useful" for thread-safe lazy inits
of immutable stuff (with a few exceptions... like Joe's
condvars with DCCI ;-) ). Mutable stuff that needs
locking can simply "eagerly" initialize a mutex (or a
read-write lock) and use it not only for access, but
also for thread-safe lazy init of associated stuff
(you'd need locking anyway for mutable stuff, "lock-
free" aside for a moment). Well, again, there are
exceptions here as well, like lazy init of things ala
TSD keys (or read-write locks that simply lack static
initializers) in libraries or components that don't
have "init/fini" calls, for example.
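
To illustrate the point about mutable stuff, here is a rough C++11 sketch in which the mutex is constructed eagerly with the object and the same lock covers both ordinary access and the lazy creation of the data it guards. The class `registry` and its members are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical mutable component: every access needs the lock anyway,
// so the lazy init simply happens under that same lock.
class registry {
public:
  void add(const std::string & name) {
    std::lock_guard<std::mutex> guard(mtx_);
    names().push_back(name);
  }
  std::size_t size() {
    std::lock_guard<std::mutex> guard(mtx_);
    return names().size();
  }
private:
  // Caller must hold mtx_. Creates the container on first use.
  std::vector<std::string> & names() {
    if (!names_) names_.reset(new std::vector<std::string>());
    return *names_;
  }
  std::mutex mtx_;                                   // eagerly initialized
  std::unique_ptr<std::vector<std::string>> names_;  // lazily created
};
```

No double check is needed here because there is no lock-free fast path to protect.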

DCSI-TLS is this:

  class thing {
    /* ... */
    mutex                                  stuff_mtx;
    stuff *                                stuff_shared_ptr;
    thread_specific_ptr<stuff, no_cleanup> stuff_thread_ptr;
    /* ... */
  };

  const stuff & thing::stuff_instance() { // "lazy" one
    stuff * ptr;
    if (0 == (ptr = stuff_thread_ptr.get())) { 
      { mutex::guard guard(stuff_mtx);
        if (0 == (ptr = stuff_shared_ptr))
          ptr = stuff_shared_ptr = new stuff(/*...*/);
      }
      stuff_thread_ptr.set(ptr);
    } 
    return *ptr;
  }

The idea is that each thread should perform
synchronization on first access of shared data which
guarantees proper memory sync./visibility. A possible
implementation of thread_specific_ptr<> can be found
here: <> (Subject: Re: OO
design: Is "errno" Exception?). Note that there's
a "defect" in POSIX with respect to "no_cleanup" TSD
keys. Please also note that Microsoft TLS is so
braindamaged that I was unsure whether it's even worth
mentioning. ;-)
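
For readers on a newer toolchain, the same idea can be approximated with C++11 `thread_local` and `std::mutex`. This is only a sketch under stated assumptions, not the thread_specific_ptr<> implementation referred to above: it assumes a single process-wide instance (a `thread_local` cannot be a non-static data member), and `construct_count` and `constructions_after` are hypothetical helpers added to make the behaviour observable.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

struct stuff { int value = 7; };  // hypothetical payload

namespace {
std::mutex stuff_mtx;
stuff *    stuff_shared_ptr = nullptr;  // guarded by stuff_mtx
int        construct_count  = 0;        // guarded by stuff_mtx
}

const stuff & stuff_instance() {
  // Per-thread cache: null until this thread has synchronized once.
  thread_local stuff * stuff_thread_ptr = nullptr;
  stuff * ptr = stuff_thread_ptr;
  if (ptr == nullptr) {
    {
      std::lock_guard<std::mutex> guard(stuff_mtx);
      ptr = stuff_shared_ptr;
      if (ptr == nullptr) {
        // Deliberately never deleted in this sketch.
        ptr = stuff_shared_ptr = new stuff();
        ++construct_count;
      }
    }
    stuff_thread_ptr = ptr;  // later calls on this thread skip the lock
  }
  assert(ptr != nullptr);
  return *ptr;
}

// Hypothetical helper: run n threads through stuff_instance() and
// report how many times the object was constructed.
int constructions_after(int n_threads) {
  std::vector<std::thread> workers;
  for (int i = 0; i < n_threads; ++i)
    workers.emplace_back([] { (void)stuff_instance(); });
  for (auto & w : workers) w.join();
  std::lock_guard<std::mutex> guard(stuff_mtx);
  return construct_count;
}
```

Each thread takes the mutex exactly once, on its first call, which is what provides the memory visibility guarantee. The shared pointer itself is never read outside the lock.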

Now, DCSI-MBR is this:

  class thing {
    /* ... */
    mutex           stuff_mtx;
    atomic<stuff *> stuff_ptr;
    /* ... */
  };

  const stuff & thing::stuff_instance() { // "lazy" one
    stuff * ptr;
    // hoist load barrier (with data dependency "hint")
    if (0 == (ptr = stuff_ptr.load_ddhlb())) { 
      mutex::guard guard(stuff_mtx);
      if (0 == (ptr = stuff_ptr.load())) { 
        ptr = new stuff(/*...*/);
        // sink store barrier
        stuff_ptr.store_ssb(ptr);
      }
    } 
    return *ptr;
  }

Here, the idea is that atomic<> should provide required
atomicity and memory sync./visibility via injecting
necessary memory access reordering constraints (for
both compiler and hardware). More info on this can be
found here: <> (Subject: Re:
Is this thread-safe on multi-processor Win32?). Also,
please note that the Microsoft interlocked stuff (including
the kinda revised "Server-2003-and-above" variants) is also
braindamaged beyond the limits, so to speak. No wink,
this time.
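
A rough equivalent using the `std::atomic` that C++11 later standardized might look like the sketch below. The mapping is an assumption made for this example, not a definition taken from the posts above: `load_ddhlb()` is approximated by an acquire load (which is stronger than a pure data-dependency hint), `store_ssb()` by a release store, and the load under the mutex is relaxed because the mutex already orders it. `all_threads_agree` is a hypothetical helper.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

struct stuff { int value = 7; };  // hypothetical payload

class thing {
public:
  ~thing() { delete stuff_ptr_.load(std::memory_order_acquire); }
  const stuff & stuff_instance();
private:
  std::mutex           stuff_mtx_;
  std::atomic<stuff *> stuff_ptr_{nullptr};
};

const stuff & thing::stuff_instance() {
  // Fast path: acquire pairs with the release store below.
  stuff * ptr = stuff_ptr_.load(std::memory_order_acquire);
  if (ptr == nullptr) {
    std::lock_guard<std::mutex> guard(stuff_mtx_);
    // Second check, serialized by the mutex.
    ptr = stuff_ptr_.load(std::memory_order_relaxed);
    if (ptr == nullptr) {
      ptr = new stuff();
      // Publish only after the object is fully constructed.
      stuff_ptr_.store(ptr, std::memory_order_release);
    }
  }
  return *ptr;
}

// Hypothetical helper: true if n threads all observed the same object.
bool all_threads_agree(int n) {
  assert(n > 0);
  thing t;
  std::vector<const stuff *> seen(n, nullptr);
  std::vector<std::thread> workers;
  for (int i = 0; i < n; ++i)
    workers.emplace_back([&t, &seen, i] { seen[i] = &t.stuff_instance(); });
  for (auto & w : workers) w.join();
  for (const stuff * p : seen)
    if (p == nullptr || p != seen[0]) return false;
  return true;
}
```

The release store is what keeps the constructor's writes from becoming visible after the pointer does, and the acquire load is what keeps a reader from using the object before it sees those writes.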

Now, DCCI doesn't serialize init. Multiple inits are
done concurrently but there's a single "winner". Here
it is:

  class thing {
    /* ... */
    atomic<stuff *> stuff_ptr;
    /* ... */
  };

  const stuff & thing::stuff_instance() { // "lazy" one
    stuff * ptr;
    // hoist load barrier (with data dependency "hint")
    if (0 == (ptr = stuff_ptr.load_ddhlb())) { 
      ptr = new stuff(/*...*/);
      // sink store barrier
      if (!stuff_ptr.attempt_update_ssb(ptr, 0)) { 
        delete ptr;
        // hoist load barrier (with data dependency "hint")
        if (0 == (ptr = stuff_ptr.load_ddhlb()))
          abort();
      }
    } 
    return *ptr;
  }
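
In terms of the C++11 `std::atomic` that was standardized later, the same "single winner" scheme could be sketched as follows. Again the mapping is an assumption for illustration: `attempt_update_ssb(ptr, 0)` is approximated by `compare_exchange_strong`, and instead of reloading and calling abort() the loser takes the winner's pointer from the value the failed exchange writes back. `all_threads_agree` is a hypothetical helper.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

struct stuff { int value = 7; };  // hypothetical payload

class thing {
public:
  ~thing() { delete stuff_ptr_.load(std::memory_order_acquire); }
  const stuff & stuff_instance();
private:
  std::atomic<stuff *> stuff_ptr_{nullptr};
};

const stuff & thing::stuff_instance() {
  stuff * ptr = stuff_ptr_.load(std::memory_order_acquire);
  if (ptr == nullptr) {
    stuff * fresh    = new stuff();  // may run concurrently in several threads
    stuff * expected = nullptr;
    if (stuff_ptr_.compare_exchange_strong(expected, fresh,
                                           std::memory_order_acq_rel,
                                           std::memory_order_acquire)) {
      ptr = fresh;      // this thread won the race
    } else {
      delete fresh;     // lost: discard our copy
      ptr = expected;   // a failed exchange stores the current value here
      assert(ptr != nullptr);
    }
  }
  return *ptr;
}

// Hypothetical helper: true if n threads all observed the same object.
bool all_threads_agree(int n) {
  assert(n > 0);
  thing t;
  std::vector<const stuff *> seen(n, nullptr);
  std::vector<std::thread> workers;
  for (int i = 0; i < n; ++i)
    workers.emplace_back([&t, &seen, i] { seen[i] = &t.stuff_instance(); });
  for (auto & w : workers) w.join();
  for (const stuff * p : seen)
    if (p == nullptr || p != seen[0]) return false;
  return true;
}
```

This only makes sense when constructing a throwaway `stuff` is cheap and free of side effects, since several threads may build one and all but one are deleted.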

Questions?

regards,
alexander.

P.S. C/C++ volatiles won't help you here (portably).
Revised Java volatiles WILL work... but they'll inject
kinda "way too many" constraints/barriers (for my taste);
I don't like them. I like the atomic<> template that,
hopefully, will be standardized by the C++ standards
committee and/or the "hypothetical" POSIX.1++ folks.