Solved: Using concurrent_hash_map for caching

Sensei_S_ · ‎07-04-2014

Dear all,

I'd like to use tbb::concurrent_hash_map to implement a cache, but I'd like some advice for optimizing the performances.

Basically, my code needs to check the existence of a std::string, returning an associated std::size_t. The integer is derived from a possibly long computation over a vector:

std::size_t bestMatch(const std::string &s)
{
    std::size_t node_indx = 0;
    int         best_cost = O3optimizer(s, list_.at(0));
    int         curr_cost;
    
    // Check for all strings in the list
    for(std::size_t i = 1; i < list_.size(); i++)
    {
        curr_cost = O3optimizer(s, list_.at(i));
        
        if (curr_cost > best_cost)
        {
            best_cost = curr_cost;
            node_indx = i;
        }
    }
    
    return node_indx;
}

Now, since I need to cache the best matches, I am worried that the tbb::concurrent_hash_map::accessor can have a huge impact on the code. Of course, since I am a newbie on TBB I might have understood how accessors work in a wrong way...

Where should I put the accessor in order to minimize the impact? In particular, if I use the accessor to find if an element is already cached, is the hash map is locked? As far as I understand, not if I use a constant accessor. So I am going with the following code:

std::size_t bestMatch(const std::string &s)
{
    tbb::concurrent_hash_map<std::string, std::size_t>::const_accessor access;

    // Cache find
    if (cache_.find(access, s)) return access->second;

    std::size_t node_indx = 0;
    int         best_cost = O3optimizer(s, list_.at(0));
    int         curr_cost;
    
    // Is the hashmap locked here? It might take some time this loop...
    for(std::size_t i = 1; i < list_.size(); i++)
    {
        curr_cost = O3optimizer(s, list_.at(i));
        
        if (curr_cost > best_cost)
        {
            best_cost = curr_cost;
            node_indx = i;
        }
    }
    
    // New item in the cache
    tbb::concurrent_hash_map<std::string, std::size_t>::accessor writer;
    cache_.insert(writer, s);
    writer->second = node_indx;

    return node_indx;
}

Do you see any possible harmful or performance-related issues?

Thanks & Cheers!

RafSchietekat · ‎07-16-2014

Of course I couldn't resist having a go at it myself, anyway. :-) So what are we talking about? Have a look at some example code:

[cpp]

typedef tbb::concurrent_hash_map<int, std::string> map_t;

map_t m;

const int k = 42;

std::string v;

/* status quo */ {

map_t::const_accessor a1;

map_t:: accessor a2;

if (m.find(a1, k)) {

v = a1->second;

} else if (!m.insert(a2, k)) { // unneeded rehash

v = a2->second; // unneeded exclusive lock

} else {

a2->second = "inserted";

}

/* avoiding clause duplication at the expense of some error-prone contortions */ {

map_t::const_accessor a1;

map_t:: accessor a2;

bool from_insert = false;

if (m.find(a1, k) || (from_insert = !m.insert(a2, k))) {

v = (from_insert ? a2 : a1)->second;

} else {

a2->second = "inserted";

}

/* with find_else_insert() */ {

map_t::const_accessor a1;

map_t:: accessor a2;

if (m.find_else_insert(a1, a2, k)) {

v = a1->second;

} else {

a2->second = "inserted";

}

[/cpp]

First you see the status quo: if you want to avoid some overhead when there are a sufficient number of reads for each write, just do a find() first, and if you don't get the element that way follow up with an insert(). The problem there is that the element has to be hashed again, which may add up if you're looking closely and if the computation is non-trivial. Also, on the occasional false negative, the insert acquires a write lock that it doesn't need. Plus of course the mere fact of calling two functions instead of one. Still, probably not the cost I first thought it would be.

What's maybe most annoying about this, though, is the duplication of the use/read clause, between the find() and the (failed) insert(). Can we do something about that? Well, yes, but it ain't pretty, because there's this thing with the separate accessors, and it requires knowledge that an accessor derives from a const_accessor, which might not necessarily remain so (right?). You can probably rewrite this somewhat, but only to tailor to your own taste, without being able to unravel the inherent problems.

So what we really want is something nice and clean. Short of a redesign of the accessor API, which has separate const_accessor and accessor types instead of separate states (even if that's really what's going on underneath), how about a single call that cleanly separates the read and write clauses? Of course it should also avoid recomputing the hash value or getting an unnecessarily exclusive lock. And you untangle yourself from depending on the inheritance relationship of the accessor types.

That's what I've implemented, but I want to go over the code once more before I post it here. Plus I want to build up some suspense... :-)

BTW, shouldn't downgrade_to_reader() just return void instead, instead of always boolean true (if I'm not mistaken)? The only exceptions where the return value is used or even changed are test_mutex.cpp, test_mutex_v2.cpp, and test_mutex_native_threads.cpp, and I doubt that it would be needed for backward compatibility (for dynamic linking, not compiling), but I haven't studied it any further yet. This seems an obvious thing, like destructors never throwing, but I'd like to know if I'm mistaken. It's not good if the user gets used to ignoring a return value!

And also, shouldn't an accessor always be release()'d before you can use it again, to help catch errors? Would any code break by changing that at this time?

In response to #5, what's "a very rate cache hit rate" in the first project? I'm not sure if the outcome of the second project is useful if the code is not correct. Please fix the code and analyse again.

(Added) Oh yes, if you don't have a single function, you also may have to spend some time testing the code without and with an initial find(), which is always annoying compared to a solution that is (hopefully) guaranteed to give you the best of both.

View solution in original post

RafSchietekat · ‎07-04-2014

Only individual elements are locked by an accessor.

Still, in this case it would be better to choose the insert() overload that takes a key-value pair but no accessor, as indicated in the Reference Manual.

RafSchietekat · ‎07-04-2014

You could do as I've suggested so far (and also limit the scope of "access" or rather "reader", though that would only be for clarity), but then several threads might still be trying to be the first to provide a new value for a new key, which in the general case wastes performance (at least by interfering with other threads running on the same core).

Why not straightforwardly start with an accessor, try to insert the key, return the value if insert() returned false, and otherwise compute and set a new value, keeping write access the whole time?

After that, you could still see whether performance improves by reintroducing the const_accessor. My guess would be that it depends on how many times a key is read, and that it might vary between worse (no reads) and better (many reads).

(2014-07-06 Edited) Removed a very silly suggestion that couldn't possibly work (luckily nobody had read it yet?). One would instead need a new "bool find_or_insert(const_accessor& result_found, accessor& result_inserted, const Key& key)" operation to avoid repeated navigation of the internal data structures.

RafSchietekat · ‎07-07-2014

Any comment on that? We probably need some data on typical number of reads per write (the possible improvement will probably be negligible if there are many reads per write), and performance of find+insert vs. just insert (at least in that typical case if not over a range of ratios), before "anyone" makes the effort to add such an operation.

Sensei_S_ · ‎07-16-2014

Raf, thanks for your help, I've introduced both your solutions in my two projects and compared timings.

So now, I have project #1 with a very rate cache hit rate, so I've noticed that performances weren't that different. I can stay with accessors.

However, in project #2 hit rates are high. Mainly I have an exponential decrease in cache misses, starting from ~200 per second, down to zero. I think I've made some mistakes because sometimes I don't get a valid string from the hash table: hash table contains less items that it should be. But that could be a problem on my side.

I must investigate it through...

RafSchietekat · ‎07-16-2014