Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.
2477 Discussions

using parallel_for for iterating over a concurrent_hash_map

itik
Beginner
1,627 Views
can anyone please explain or give a code sample of how to use parallel_for in order to iterate over a concurrent_hash_map?
0 Kudos
12 Replies
Victoria_G_Intel
Employee
1,627 Views

#inclide

#include

"tbbconcurrent_hash_map.h"

#include

"tbb ask_scheduler_init.h"

#include

"tbbparallel_for.h"

typedef tbb::concurrent_hash_map<int, std::string, MyHashCompare> MyHashMap;

MyHashMap MyHash;

void

DoSmthToStr (std::string& str) {

//Process "str" - value from hash map

}

// parallel_for Body

class

HashParallelIteration {

MyHashMap &data;

public

:

HashParallelIteration (MyHashMap& _data): data(_data) {}

void operator() (MyHashMap::range_type& r) const {

for (MyHashMap::iterator i = r.begin(); i != r.end(); ++i) {

// i->first - key; i->second - value;

std::string& str = i->second;

DoSmthToStr (str);

}

}

};

int

main ()

{

const int GrainSize = ;

tbb::task_scheduler_init tbb_init;

//"r" is a range object for MyHash

MyHashMap::range_type r = MyHash.range (GrainSize);

tbb::parallel_for (r, HashParallelIteration (MyHash), tbb::auto_partitioner());

return 0;

}

0 Kudos
Ole_Kristian_Ekseth
1,627 Views
Thanks for the example: I'm working with genomic data for a bio-informatic-research-group at NTNU, Norway: At my disposition I have 25 CPUs on my core system, and appr. 150 CPU's at the secondary. Therefore, in order to utilize this capacity, I have started to implement the algorithms (some takes appr 12 days in a serial version) in TBB

Due to the fact that my datas are nested, I use two hashes to retrive the key in the hird hash: I'm therefore interested in a blocked_range_2d and blocked_range_3d for hashes, but has not been able to implement it:
-- If anyone has the opportunity to give me an example of a blocked_range_2d implementation of a concurrent_hash_map, I would be thankful.


0 Kudos
RafSchietekat
Valued Contributor III
1,627 Views
Nice toys you have there! I think that with hash tables, where vicinity is meaningless, you should not need more than nested loops over ordinary blocked_range: higher-dimensional ranges are meant to minimise block boundary or surface relative to surface or volume.

0 Kudos
robert_jay_gould
Beginner
1,627 Views
Thanks for the example: I'm working with genomic data for a bio-informatic-research-group at NTNU, Norway: At my disposition I have 25 CPUs on my core system, and appr. 150 CPU's at the secondary.

**drool**

Umm, as Raph mentions you don't really need 3 hashes, unless the key-data is different...

Are you using something like numeric-key & nucleotide-key& aminoacid-key (protein folding stuff), or do all keys share the same type (DNA reconstruction)?

Either-way I'm not sure a 2d or 3d iteration space will really benefit you there. Normally this kind of thing goes through iterations (epochs), and you work on one set storing data into the others then work on a different set, I'm not sure if you can realistically work on all three data sets at the same time, unless you are matching DNA of three different sources.

0 Kudos
Ole_Kristian_Ekseth
1,627 Views

**drool**

Umm, as Raph mentions you don't really need 3 hashes, unless the key-data is different...

Are you using something like numeric-key & nucleotide-key & aminoacid-key (protein folding stuff), or do all keys share the same type (DNA reconstruction)?

Either-way I'm not sure a 2d or 3d iteration space will really benefit you there. Normally this kind of thing goes through iterations (epochs), and you work on one set storing data into the others then work on a different set, I'm not sure if you can realistically work on all three data sets at the same time, unless you are matching DNA of three different sources.


Hi, thanks for the answers: since my post, I have been working with the implementation of the above metioned solution*, and after your replies I have realised that my knowledge of the TBB package is limited. I have therefore spent more time than first expected to analyse the documentation.

My challange is, as you both probably are aware of, that the non deterministic behaviour of the parallel system produces different results for every run. -- In my particular case, this causes an segmentation error to be thrown approx. every 7. run. The code, located below, iterates in parallel over approx. 130 000 elements, divided into ranges: If you do have the opportunity, I would be thanfull if you could inform me about what the segmentation error is caused by.

Regards. Ole Kristian Ekseth, NTNU
* If the implementation results in the results we, the research group, expects, they will be published in a paper for bio informatics

[cpp]//(included all necessary libraries)
struct Protein {
  string name;
  string type;
};

typedef concurrent_hash_map Link;

Link mapProtein; 

class map_limit_parallel {
public: map_limit_parallel() {}
  void operator() (Link::range_type& range) const {
    if (!range.empty()) {
      Link::iterator i = range.begin();
      string name = i->second.name;
    }
  }
};

int main() {
  int gsize = 4000;
  //mapProtein contains 130 000 elements
  Link::range_type r = mapProtein.range(gsize);
  /*Executing*/
  parallel_for(r, map_limit_parallel(), auto_partitioner());
  return 0;
}[/cpp]

0 Kudos
RafSchietekat
Valued Contributor III
1,627 Views

Just to be sure,did allother processing on the map "happen before" executing parallel_for? I don't see anything obvious, but you have not included all the code, so my best (wild) guess at this point relates to the requirement that other processing must have stopped, and that all tasks/threads involved in that processing must have synchronised their results somehow.

0 Kudos
Ole_Kristian_Ekseth
1,627 Views
Hi, the code above is a narrowed version: in order to get a runnable script, make a call to the function included at the bottom of this post, and add a "task_scheduler_init_call" at the first line of the main function: this stripped-down-version produced the same error as described in my previous post: if you do have the opportunity, I would be thankful if you could run the script on your system, and give me a word about the results.

Regards, Ole Kristian

[cpp]/*
  To be inserted before the parallel_for execution"
  
  RMK: In order to get the parallel_for to work properly, insert the code " task_scheduler_init init_parallel;" in the start of the main body
*/
void fill_test_data() {
  for (int i = 0;i<130000;i++) {
    char temp[70];
    sprintf(temp, "%d", i);
    Link::accessor p;
    mapProtein.insert(p, temp);
    p->second.name = "King Harald";
    p->second.type="Human";
    p.release();
  }
}
[/cpp]



0 Kudos
RafSchietekat
Valued Contributor III
1,627 Views

Could you perhapsprovide a minimal but fully self-contained code file, including MyHashCompare implementation? Perhaps the TBB version might be relevant as well.

0 Kudos
Ole_Kristian_Ekseth
1,627 Views
Hi, thanks for your intereset in giving me a helping hand:
- The code, together with its Makefile* and an executable is uploaded to the attachement: If you do get any problems reading the file, give me a word!
- The system, a Linux Redhat, uses a 2.6.18-release (the long line is included in the Makefile)
* The makefile is hardcoded: to get it working on your system, update the variables to their appropriate values

Regards,

0 Kudos
Alexey-Kukanov
Employee
1,627 Views
According to your makefile, you use somewhat old release of TBB. I recommend you to switch to a newer version, e.g. the most recent commercially-aligned release. I was unable to reproduce the problem with your test using that version.
0 Kudos
RafSchietekat
Valued Contributor III
1,627 Views

I couldn't do anything with the file (size 88116, MD5=77d783..., "gunzip: stdin: not in gzip format"), but I'll try again later on another system.

0 Kudos
Ole_Kristian_Ekseth
1,627 Views
Thanks for your comment: your problem opening the file was caused by a misspelling from my point of view regarding the file extension:
-- The file is of the bz2-type, and therefore if you enter tar -xjf iterate.gzip on your command line*, the content of the compressed file will appear

After a tip from Alexey, I replaced the newest stable release (tbb21_20080605oss_lin) with the newest commercial release (tbb21_017oss_src); the results, after my first few tests, is that it solved the problem.

During the next days I will continue the testing, and see if I manage to produce the bugs who I, as described, previously had.

-- If I have the opportunity to meet you at a conference in the future, give me word, and I'll owe you a beer ;)

Regards,
---
*This might not be true on your system: check, as always :), the manual
0 Kudos
Reply