Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.
2465 Discussions

concurrent_hash_map() -- sorted insertion, sorting and thread-safe iteration

levicki
Valued Contributor I
981 Views
In addition to not having a sort() method it seems that concurrent_hash_map() doesn't support sorted insertion?

Can anyone from Intel confirm this, and give a valid reason why such a feature existing in standard hash_map) would be left out in TBB?

Furthermore, looking at the source code, accessors are thread-safe, but what about iterators?

For example, how to advance through a sorted list of items in one thread which is controlled by user input (navigation keys) and add/remove items in sorted order from another thread while keeping everything thread-safe?

I am aware that I am not doing any parallel computation using concurrent_hash_map(), and therefore I am perhaps not using it as intended, but I would still like to have a thread-safe hash_map().

0 Kudos
7 Replies
RafSchietekat
Valued Contributor III
981 Views
"In addition to not having a sort() method it seems that concurrent_hash_map() doesn't support sorted insertion?"
C++'s map is essentially an ordered tree, TBB's concurrent_hash_map is essentially a hash table. std::map doesn't support hashing, and access has logarithmic complexity, so it's not a one-sided deficiency: you just can't have it both ways at the same time, I guess.

"Can anyone from Intel confirm this, and give a valid reason why such a feature existing in standard hash_map) would be left out in TBB?"
If you don't mind my giving this a try: C++ is getting an "unordered_map" (say what? OMG!), and maybe a concurrent skip list could be considered in TBB's future (but you should check for yourself what has been written about this before, or maybe somebody could remind us).

"Furthermore, looking at the source code, accessors are thread-safe, but what about iterators?"
Do you have an example that makes sense with a hash map? And even so, complexity and performance considerations may require separate implementations without or with concurrent iterators, i.e., concurrent iterators don't come cheap.

"For example, how to advance through a sorted list of items in one thread which is controlled by user input (navigation keys) and add/remove items in sorted order from another thread while keeping everything thread-safe?"
In this situation a conventional locked std::map would do fine: users are so slow you might as well go to sleep waiting for them to do anything, you don't need the performance of a concurrent_hash_map(), let alone anything even faster, all with their own trade-offs between user features and performance.

"I am aware that I am not doing any parallel computation using concurrent_hash_map(), and therefore I am perhaps not using it as intended, but I would still like to have a thread-safe hash_map()."
It looks like you should write your own wrapper around std::map to hide the lock.
0 Kudos
levicki
Valued Contributor I
981 Views
Hello, thanks for taking the time to reply.

However, I am not sure I understand what you mean.

1. Why are you comparing C++ map to the TBB hash_map when C++ also has a hash_map?

2. As far as I know C++ hash_map can be kept sorted or at least I have managed to do it with Microsoft implementation when I was experimenting with it. Perhaps that was an error or a pure luck?

As for the example, lets say we have a key container:

[cpp]typedef struct _Key {
	wchar_t		FileName[MAX_PATH];
	DWORD		NameLength;
} Key;
[/cpp]

And a value container:

[cpp]typedef struct _Value {
	ULONGLONG	FileSize;
	FILETIME	FileTime;
	BOOL		Selected;
} Value;[/cpp]

Now wrap those in classes:

[cpp]class	CKey
{
public:
	CKey(const wchar_t *FileName)
	{
		wcscpy_s(_Data.FileName, MAX_PATH, FileName);
		_Data.NameLength = wcslen(_Data.FileName);
	}

	~CKey(void)
	{
	}

	// operator < supporting natural sort order
	bool operator<(const CKey& x) const
	{
		return (StrCmpLogicalW(_Data.FileName, x._Data.FileName) < 0);
	}

	friend wostream& operator<<(wostream& o, const CKey& x)
	{
		o << L"FileName     : " << x._Data.FileName << endl;
		return o;
	}

	Key	_Data;
};
[/cpp]
And the value as well:

[cpp]class	CValue
{
public:
	CValue(void)
	{
	}

	~CValue(void)
	{
	}

	friend wostream& operator<<(wostream& o, const CValue& x)
{
FILETIME lt;
SYSTEMTIME st;

FileTimeToLocalFileTime(&x._Data.FileTime, <);
FileTimeToSystemTime(<, &st);

o << L"FileTime : " << st.wYear << L"-" << st.wMonth << L"-" << st.wDay << L" " << st.wHour << L":" << st.wMinute << L":" << st.wSecond << L"." << st.wMilliseconds << endl;
o << L"FileSize : " << x._Data.FileSize << endl;
o << L"Selected : ";
if (x._Data.Selected) {
o << L"TRUE";
} else {
o << L"FALSE";
}
o << endl;
return o;
}

private: Value _Data; }; [/cpp]
I have left out CValue methods for the sake of brevity. Now we define types for the list:

[cpp]typedef hash_map TList;
typedef hash_map::iterator TIter;
typedef pair TPair;
typedef pair TInsert;
[/cpp]
And extend stdext with our custom hash function:

[cpp]namespace stdext {
	// (One-at-a-Time Hash method by Bob Jenkins)
	size_t hash_value(const CKey& x)
	{
		size_t h = 0;

		for (size_t i = 0; i < x._Data.NameLength; i++) {
			h += x._Data.FileName;
			h += (h << 10);
			h ^= (h >> 6);
		}

		h += (h << 3);
		h ^= (h >> 11);
		h += (h << 15);

		return h;
	}
};
[/cpp]
Lets say that we have a list class:

[cpp]class CImageList
{
public:
	CImageList(void)
	{
	}

	~CImageList(void)
	{
	}

	BOOL Add(const wchar_t *FileName)
	{
		CKey	key(FileName);
		CValue	val;
		TInsert	RetVal;

		val.SetSelected(TRUE);
		// insert does not invalidate iterators!
		RetVal = _List.insert(TPair(key, val));

		return RetVal.second;
	}

	void Dump(void)
	{
		ENTER_CS;
		wcout << L"BEGIN" << endl;
		for (TIter i = _List.begin(); i != _List.end(); i++) {
			wcout << i->first;
			wcout << i->second;
		}
		wcout << L"END" << endl;
		LEAVE_CS;
	}
private:
	TList			_List;
};
[/cpp]
Again, I left out some of the methods, we just need add to check. Then we use the list class:

[cpp]	CImageList	all;

	all.Add(L"z100.jpg");
	all.Add(L"z2.jpg");
	all.Add(L"z1.jpg");
	all.Dump();
[/cpp]
This worked for me -- it somehow magically sorted the elements on insertion so I got the output:

z1
z2
z100

when iterating.

Now my questions:

1. Do you have any ideas if what I did was by accident, or it really works that way?
2. How to make this code thread-safe except by rolling my own locking?

Finally, what you said about users being slow is true -- what is fast is the file adding/removing part which is done in another thread by watching a folder which being browsed by the user. Files can be added by a download manager, archiver (on unpacking) and removed by user (program allows to select and delete multiple files from the list). Lets say that I need fast add/remove for up to 10,000 elements, naturally sorted list and sequential iterators and all of it should be thread-safe. If there is a better way to accomplish this than the hash_map() I would be very glad to learn how.

0 Kudos
RafSchietekat
Valued Contributor III
981 Views
"Why are you comparing C++ map to the TBB hash_map when C++ also has a hash_map?"
Not officially yet, but it does seem widely supported, so my answer turned out to be somewhat pedantic, sorry.

"As far as I know C++ hash_map can be kept sorted or at least I have managed to do it with Microsoft implementation when I was experimenting with it. Perhaps that was an error or a pure luck?"
If this is not luck (a 17% chance with 3 entries), it still seems suboptimal and probably nonportable.

Intuitively I would presume that a locked std::map is up to the task of tracking file system directory events as well as user events, so I would start with that and then see/profile what happens, unless someone has a better idea.
0 Kudos
levicki
Valued Contributor I
981 Views
Quoting - Raf Schietekat
1. Not officially yet, but it does seem widely supported, so my answer turned out to be somewhat pedantic, sorry.

2. If this is not luck (a 17% chance with 3 entries), it still seems suboptimal and probably nonportable.

3. Intuitively I would presume that a locked std::map is up to the task of tracking file system directory events as well as user events, so I would start with that and then see/profile what happens, unless someone has a better idea.

1. No problem, I know that it is not part of the C++ standard, that is why Microsoft has moved it to stdext but I don't care as long as it works.

2. Portability is a non-issue for me, this is a Windows application we are talking about which has other dependencies much more serious than a simple C++ class. As for suboptimal -- perhaps, but perhaps still optimal enough for my needs?

3. I see... but my key is the filename -- wouldn't map() be slower on add/remove/find operations than a hash_map()?

I was hoping that Intel's concurrent_hash_map() will be just a concurrent implementation of MSVC hash_map() (i.e. that it will have this sorting functionality I need) so I get thread safety by using it without having to reinvent the wheel. I Guess I was wrong, it seems I will have to roll my own after all. Too bad that I will have to waste time on this part of the code instead of working on the application features.

0 Kudos
RafSchietekat
Valued Contributor III
981 Views
"I see... but my key is the filename -- wouldn't map() be slower on add/remove/find operations than a hash_map()?"
Complexity is logarithmic resp. constant, so from a certain size a map would become slower, but I don't have any numbers for that.
0 Kudos
levicki
Valued Contributor I
981 Views
Quoting - Raf Schietekat
"I see... but my key is the filename -- wouldn't map() be slower on add/remove/find operations than a hash_map()?"
Complexity is logarithmic resp. constant, so from a certain size a map would become slower, but I don't have any numbers for that.

Guess I will have to stick with hash_map() then, too bad that I will have to implement locking myself.

My other question regarding concurrent_hash_map() not being sorted still stands.
0 Kudos
RafSchietekat
Valued Contributor III
981 Views
"My other question regarding concurrent_hash_map() not being sorted still stands."
And I'm curious about the exact cost of this hybrid map vs. purely ordering or purely hashing (stdext::hash_map::insert() is documented as O(log N) (just like std::map), vs. C++0x's std::unordered_map::insert()'s O(1), not considering any constant factors), and how that works out for different use cases. It also seems potentially annoying, if not blocking, to have to come up with a total order on the elements if no natural total order exists. But I grant that it may be useful sometimes (as a third, separate option), like it may be useful to make your own poor man's hybrid of a locked std::map and tbb::concurrent_hash_map.
0 Kudos
Reply