Intel® oneAPI Threading Building Blocks

Using concurrent_hash_map for caching

Sensei_S_
Beginner

Dear all,

I'd like to use tbb::concurrent_hash_map to implement a cache, and I'd appreciate some advice on optimizing performance.

Basically, my code needs to check whether a std::string has been seen before and return an associated std::size_t. The integer is derived from a possibly long computation over a vector:

std::size_t bestMatch(const std::string &s)
{
    std::size_t node_indx = 0;
    int         best_cost = O3optimizer(s, list_.at(0));
    int         curr_cost;
    
    // Check for all strings in the list
    for(std::size_t i = 1; i < list_.size(); i++)
    {
        curr_cost = O3optimizer(s, list_.at(i));
        
        if (curr_cost > best_cost)
        {
            best_cost = curr_cost;
            node_indx = i;
        }
    }
    
    return node_indx;
}

Now, since I need to cache the best matches, I am worried that the tbb::concurrent_hash_map::accessor could have a big performance impact. Of course, since I am a newbie with TBB, I might have misunderstood how accessors work...

Where should I put the accessor in order to minimize the impact? In particular, if I use an accessor to check whether an element is already cached, is the hash map locked? As far as I understand, not if I use a const accessor. So I am going with the following code:

std::size_t bestMatch(const std::string &s)
{
    tbb::concurrent_hash_map<std::string, std::size_t>::const_accessor access;

    // Cache find
    if (cache_.find(access, s)) return access->second;

    std::size_t node_indx = 0;
    int         best_cost = O3optimizer(s, list_.at(0));
    int         curr_cost;
    
    // Is the hash map locked here? This loop might take some time...
    for(std::size_t i = 1; i < list_.size(); i++)
    {
        curr_cost = O3optimizer(s, list_.at(i));
        
        if (curr_cost > best_cost)
        {
            best_cost = curr_cost;
            node_indx = i;
        }
    }
    
    // New item in the cache
    tbb::concurrent_hash_map<std::string, std::size_t>::accessor writer;
    cache_.insert(writer, s);
    writer->second = node_indx;

    return node_indx;
}

Do you see any possible correctness or performance issues?

Thanks & Cheers!

1 Solution
11 Replies
RafSchietekat
Valued Contributor III

Only individual elements are locked by an accessor.

Still, in this case it would be better to choose the insert() overload that takes a key-value pair but no accessor, as indicated in the Reference Manual.
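For illustration, a minimal sketch of that accessor-free overload applied to the cache from the original post (cache_, s and node_indx as defined there):

[cpp]
// the element is locked only for the duration of the insertion itself;
// the call returns false if the key was already present (existing value kept)
cache_.insert(std::make_pair(s, node_indx));
[/cpp]

For a cache where the value for a given key is deterministic, losing the race to another thread is harmless, so the return value can simply be ignored here.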

RafSchietekat
Valued Contributor III

You could do as I've suggested so far (and also limit the scope of "access", or rather "reader", though that would only be for clarity), but then several threads might still be racing to be the first to provide a value for a new key, which in the general case wastes work (at least by interfering with other threads running on the same core).

Why not straightforwardly start with an accessor, try to insert the key, return the value if insert() returned false, and otherwise compute and set a new value, keeping write access the whole time?
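A minimal sketch of that pattern, reusing cache_, list_ and O3optimizer() from the original post:

[cpp]
std::size_t bestMatch(const std::string &s)
{
    tbb::concurrent_hash_map<std::string, std::size_t>::accessor writer;

    if (!cache_.insert(writer, s))      // false: the key was already cached
        return writer->second;          // just read the cached value

    // the key is new; compute while holding the element's write lock,
    // so other threads looking up the same key wait instead of recomputing
    std::size_t node_indx = 0;
    int         best_cost = O3optimizer(s, list_.at(0));

    for (std::size_t i = 1; i < list_.size(); i++)
    {
        int curr_cost = O3optimizer(s, list_.at(i));
        if (curr_cost > best_cost)
        {
            best_cost = curr_cost;
            node_indx = i;
        }
    }

    writer->second = node_indx;         // publish the result, still under the write lock
    return node_indx;
}
[/cpp]

One caveat with this sketch: if the computation can throw, the default-constructed entry for the new key would remain in the cache.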

After that, you could still see whether performance improves by reintroducing the const_accessor. My guess would be that it depends on how many times a key is read, and that it might vary between worse (no reads) and better (many reads).

(2014-07-06 Edited) Removed a very silly suggestion that couldn't possibly work (luckily nobody had read it yet?). One would instead need a new "bool find_or_insert(const_accessor& result_found, accessor& result_inserted, const Key& key)" operation to avoid repeated navigation of the internal data structures.

RafSchietekat
Valued Contributor III

Any comment on that? We probably need some data on typical number of reads per write (the possible improvement will probably be negligible if there are many reads per write), and performance of find+insert vs. just insert (at least in that typical case if not over a range of ratios), before "anyone" makes the effort to add such an operation.

Sensei_S_
Beginner

Raf, thanks for your help. I've introduced both of your solutions in my two projects and compared timings.

So, project #1 has a very low cache hit rate, and there I've noticed that performance wasn't that different. I can stay with accessors.

However, in project #2 the hit rates are high. Mainly, I see an exponential decrease in cache misses, starting from ~200 per second and going down to zero. I think I've made some mistake, because sometimes I don't get a valid string from the hash table: the hash table contains fewer items than it should. But that could be a problem on my side.

I must investigate it further...

RafSchietekat
Valued Contributor III

Of course I couldn't resist having a go at it myself, anyway. :-) So what are we talking about? Have a look at some example code:

[cpp]
typedef tbb::concurrent_hash_map<int, std::string> map_t;

map_t m;
const int k = 42;
std::string v;

/* status quo */ {
    map_t::const_accessor a1;
    map_t::      accessor a2;
    if (m.find(a1, k)) {
        v = a1->second;
    } else if (!m.insert(a2, k)) { // unneeded rehash
        v = a2->second; // unneeded exclusive lock
    } else {
        a2->second = "inserted";
    }
}

/* avoiding clause duplication at the expense of some error-prone contortions */ {
    map_t::const_accessor a1;
    map_t::      accessor a2;
    bool from_insert = false;
    if (m.find(a1, k) || (from_insert = !m.insert(a2, k))) {
        v = (from_insert ? a2 : a1)->second;
    } else {
        a2->second = "inserted";
    }
}

/* with find_else_insert() */ {
    map_t::const_accessor a1;
    map_t::      accessor a2;
    if (m.find_else_insert(a1, a2, k)) {
        v = a1->second;
    } else {
        a2->second = "inserted";
    }
}
[/cpp]

First you see the status quo: if you want to avoid some overhead when there are a sufficient number of reads for each write, just do a find() first, and if you don't get the element that way, follow up with an insert(). The problem there is that the key has to be hashed again, which may add up if you're looking closely and the hash computation is non-trivial. Also, on the occasional false negative, the insert() acquires a write lock that it doesn't need. Plus, of course, there's the mere fact of calling two functions instead of one. Still, probably not the cost I first thought it would be.

What's maybe most annoying about this, though, is the duplication of the use/read clause, between the find() and the (failed) insert(). Can we do something about that? Well, yes, but it ain't pretty, because there's this thing with the separate accessors, and it requires knowledge that an accessor derives from a const_accessor, which might not necessarily remain so (right?). You can probably rewrite this somewhat, but only to tailor to your own taste, without being able to unravel the inherent problems.

So what we really want is something nice and clean. Short of a redesign of the accessor API, which has separate const_accessor and accessor types instead of separate states (even if that's really what's going on underneath), how about a single call that cleanly separates the read and write clauses? Of course it should also avoid recomputing the hash value or getting an unnecessarily exclusive lock. And you untangle yourself from depending on the inheritance relationship of the accessor types.

That's what I've implemented, but I want to go over the code once more before I post it here. Plus I want to build up some suspense... :-)

BTW, shouldn't downgrade_to_reader() just return void instead of always returning boolean true (if I'm not mistaken)? The only exceptions where the return value is used or even changed are test_mutex.cpp, test_mutex_v2.cpp, and test_mutex_native_threads.cpp, and I doubt that it would be needed for backward compatibility (for dynamic linking, not compiling), but I haven't studied it any further yet. This seems an obvious thing, like destructors never throwing, but I'd like to know if I'm mistaken. It's not good if the user gets used to ignoring a return value!

And also, shouldn't an accessor always be release()'d before you can use it again, to help catch errors? Would any code break by changing that at this time?

In response to #5, how low exactly is the "very low cache hit rate" in the first project? I'm not sure the outcome of the second project is useful if the code is not correct. Please fix the code and analyse again.

(Added) Oh yes, if you don't have a single function, you may also have to spend some time testing the code both without and with an initial find(), which is always annoying compared to a solution that is (hopefully) guaranteed to give you the best of both.

RafSchietekat
Valued Contributor III

Maybe some further cleanup could be done, but here is a tbb42_20140601oss/include/tbb/concurrent_hash_map.h replacement for your consideration. I would be interested in any comparative timing results.

(2014-07-18 Edited) Replaced with more recent contributed version.

(2014-07-25 Added) Please ignore the error in the month of the timestamp in the filename. Furthermore, there's a new version below.

Alexey-Kukanov
Employee

Raf,

I have discussed the idea with colleagues, including possible alternatives we could think of. We have not yet looked at the code, though.

While your suggestion quite naturally extends the existing API, there is one issue: the possibility of using an empty accessor (since one of the two will always be empty), which cannot be detected at compile time. With our current implementation of accessors, such a mistake will result in dereferencing a NULL pointer. So this API is sort of unsafe, or at least requires extra caution.
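To illustrate the concern with a sketch (reusing map_t, m, k and v from the example above; find_else_insert() is the proposed operation, not part of the shipped API): exactly one of the two accessors is attached after the call, and dereferencing the other one hits a NULL pointer.

[cpp]
map_t::const_accessor found;
map_t::      accessor inserted;
if (m.find_else_insert(found, inserted, k)) {
    v = found->second;              // OK: 'found' is attached
    // inserted->second here would dereference NULL, and nothing catches it at compile time
} else {
    inserted->second = "inserted";  // OK: 'inserted' is attached
    // found->second here would dereference NULL
}
[/cpp]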

Also, since the semantics of concurrent_hash_map::insert() are already "find, otherwise add", it seems better to provide yet another overload of insert() instead of a method with a fancy name. Or maybe dangerous_insert, as a reminder not to forget about the risks of using it :)

Commenting on some other points in the discussion:

this thing with the separate accessors, and it requires knowledge that an accessor derives from a const_accessor, which might not necessarily remain so (right?).

The derivation of accessor from const_accessor is officially documented, and we care about keeping backward compatibility, so that's unlikely to change.

shouldn't downgrade_to_reader() just return void instead of always returning boolean true (if I'm not mistaken)? ... It's not good if the user gets used to ignoring a return value!

It might have been an oversight during the design, but I am not willing to change it now. Also, in theory there might be implementations that return false - e.g., a wrapper over a third-party lock that does not support downgrading.

shouldn't an accessor always be release()'d before you can use it again, to help catch errors? Would any code break by changing that at this time?

There is an assert in the very first line of lookup() that checks that the accessor is not associated with any node; we think it should be sufficient:

    __TBB_ASSERT( !result || !result->my_node, NULL );

RafSchietekat
Valued Contributor III

Hi Alexey, thanks for your response!

Note that my proposal still needs review (both design and implementation) and validation (both correctness and performance: I haven't done any real testing yet, and I actually hope somebody else will do that), but I wanted to provide something real for discussion and investigation.

About the double accessor variables: I've tried to make it pretty straightforward (you would also have two in the status-quo example code), but if the user is intent on shooting himself in the foot, like dereferencing an accessor before it has been filled in by a call, or in this case using the wrong accessor, there's little you can do about that, I suppose. So I don't think that's a real issue.

The "fancy name" was an attempt to make this self-documenting, because I wanted the normal aka. "then" clause to handle the case of a found element, and the back-up aka. "else" clause to handle the insertion of a missing element, which is the opposite of what insert() would do. First I thought of find_insert(), then find_or_insert(), before coming up with find_else_insert(). Of course you're free to make it just another insert() overload... but in that case don't forget to un-invert the return value!

The existing assertion in lookup() does not actually test user code calling find(); it tests the implementation of find(). What I did was take all those conditional releases that were "distributed" across lookup()'s first-level uses and consolidate them inside lookup(). Of course the assertion was then no longer testing a precondition, so I removed it. The question now is whether the conditional release(), consolidated or not, should be there at all.

(2014-07-20 Added) In my proposal, I suggested __TBB_implies(x,y) as a function, but, even though it makes little difference here, it would be better to have a macro "#define __TBB_IMPLIES(x,y) (!( x )||/*short-circuit*/( y ))" that would allow lazy evaluation of y, or of course to just use the expanded version directly. I'm also not sure whether search_bucket() might return an invalid but non-NULL node pointer after the non-atomic bucket lock upgrade, so that might need some attention as well (see the assertion two lines below the downgrade).

RafSchietekat
Valued Contributor III

New version (as contributed, please ignore find_else_insert in theme): insert() overload instead of new operation find_else_insert(), and some cleanup.

RafSchietekat
Valued Contributor III

Here are a few more thoughts for discussion:

The cleanup I did was meant to make the proposal somewhat more palatable, but maybe it goes in the wrong direction, because I am now of the opinion that the API should perhaps be more orthogonal, with more freedom in how it is used. Maybe it does make sense to have an insert with a const_accessor for the inserted element and an accessor for the found one, who knows (that's also an invitation to give an example)? Furthermore, maybe there should be 4 levels, not 3: NULL, to allow early detection (where an element can be counted before it is actually inserted, in the sense that the inserting thread's accessor is released, or maybe even before the inserting thread gets to use the accessor at all, a strange property of the current count()); a new level to disallow such spookiness without the cost of a double CAS (if that is found to be beneficial); a const_accessor; and an accessor. At least the NULL and/or the new level seem appropriate additions, e.g., an accessor if inserted and no (const_)accessor if found.

There should be a warning about the danger that already exists today when passing an accessor cast down to a const_accessor: the map will only give it a read lock, yet the original accessor may still be used to change the element while other threads are reading it, causing a data race. Is there any redeeming feature of the current inheritance relationship? Maybe the (const_)accessor classes should have an operation to probe whether writing is intended, which would free the current implementation from having to find out through overloads, and it would also remove the danger related to inappropriate casting (by instead getting a write lock for an accessor even when it is passed as a const_accessor). Getting is_writer() from the underlying spin_rw_mutex does not work, because the mutex may have been configured incorrectly, but it could be repurposed to get its information from the actual type (polymorphically) and replace lookup()'s "write" parameter.
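A sketch of that casting hazard, reusing map_t, m and k from the earlier example:

[cpp]
map_t::accessor a;
// forcing the const_accessor overload of find() means the map only takes a read lock...
if (m.find(static_cast<map_t::const_accessor&>(a), k)) {
    // ...yet the original accessor still hands out a non-const reference,
    // so this write races with any other thread reading the same element
    a->second = "oops";
}
[/cpp]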

The double-accessor insert() should probably assert that the two accessors are not aliased (unless a suitable use case is identified). Unfortunately C++ does not yet have a "restrict" keyword (I haven't checked whether it is in the pipeline for a future version), but g++ at least offers a (non-portable) equivalent and can sometimes emit better code with it, so it seems reasonable to check whether it could also be useful in this case.
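For instance (a sketch, using the parameter names from the proposed overload), the non-aliasing precondition could be as simple as:

[cpp]
// compare as void* so the check does not depend on the accessor inheritance relationship
__TBB_ASSERT( static_cast<const void*>(&result_found) != static_cast<const void*>(&result_inserted),
              "the found and inserted accessors must not alias" );
[/cpp]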

At some point, exposing an aggregator implementation, if not an API, should probably be considered.

(Edited) Oops, of course an element cannot be found before it is inserted, only detected/counted. And it's a minor point; the emphasis is on also allowing NULL, by using pointers instead of references.

RafSchietekat
Valued Contributor III

Any comments about that?
