- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
When I was working out details of distributed reference counting scheme (http://groups.google.com/group/lock-free/browse_frm/thread/46d00a0f543a758e) I needed a component which is basically similar to tbb::concurrent_vector, i.e. concurrent subscripting and growth. Self-established goal was to achieve lowest possible overhead and maximum possibility of inlining of subscripting, because subscripting is invocked on every reference counting operation.
The variant I came out is:
While tbb::concurrent_vector implementation is:
According compiler code is:
As you may see, my variant cuts shift, bitwise and and substraction instructions, as well increases possibility of inlining, plus a bit smaller preassure on register file.
The trick is to direct segment pointer N elements before actual memory block. This way one does not need to do that "(1 << high_bit) & ~1" offset every time. Each segment pointer points to where array would begin if all the data would situated in single continous memory block. I.e. if first N segments allocated within single memory block, first N segment pointers will be equal.
Are you interested in incorporating this into TBB? However, I suspect there may be some problems with binary interface compatibility, right?
The variant I came out is:
[bash]int subscript1(int** root, size_t idx) { unsigned long high_bit; _BitScanReverse(&high_bit, idx); return root[high_bit][idx]; }[/bash]
While tbb::concurrent_vector implementation is:
[bash]int subscript2(int** root, size_t idx) { unsigned long high_bit; _BitScanReverse(&high_bit, idx); idx -= (1 << high_bit) & ~1; return root[high_bit][idx]; }[/bash]
According compiler code is:
[bash] while (x) x += subscript1(root, y); 00401014 0F BD D0 bsr edx,eax 00401017 89 54 24 08 mov dword ptr [esp+8],edx 0040101B 8B 14 96 mov edx,dword ptr [esi+edx*4] 0040101E 03 0C 82 add ecx,dword ptr [edx+eax*4] 00401021 75 F1 jne main+14h (401014h) [/bash]
[bash] while (x) x += subscript2(root, y); 00401015 0F BD C8 bsr ecx,eax 00401018 BF 01 00 00 00 mov edi,1 0040101D D3 E7 shl edi,cl 0040101F 8B D8 mov ebx,eax 00401021 89 4C 24 0C mov dword ptr [esp+0Ch],ecx 00401025 8B 0C 8E mov ecx,dword ptr [esi+ecx*4] 00401028 83 E7 FE and edi,0FFFFFFFEh 0040102B 2B DF sub ebx,edi 0040102D 03 14 99 add edx,dword ptr [ecx+ebx*4] 00401030 75 E3 jne main+15h (401015h)[/bash]
As you may see, my variant cuts shift, bitwise and and substraction instructions, as well increases possibility of inlining, plus a bit smaller preassure on register file.
The trick is to direct segment pointer N elements before actual memory block. This way one does not need to do that "(1 << high_bit) & ~1" offset every time. Each segment pointer points to where array would begin if all the data would situated in single continous memory block. I.e. if first N segments allocated within single memory block, first N segment pointers will be equal.
Are you interested in incorporating this into TBB? However, I suspect there may be some problems with binary interface compatibility, right?
Link Copied
3 Replies
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Quoting Dmitriy Vyukov
The trick is to direct segment pointer N elements before actual memory block. This way one does not need to do that "(1 << high_bit) & ~1" offset every time. Each segment pointer points to where array would begin if all the data would situated in single continous memory block. I.e. if first N segments allocated within single memory block, first N segment pointers will be equal.
Are you interested in incorporating this into TBB? However, I suspect there may be some problems with binary interface compatibility, right?
Are you interested in incorporating this into TBB? However, I suspect there may be some problems with binary interface compatibility, right?
Yes, it is a good idea, thank you! And yes, binary compatibility may be a problem but we know how to solve it :) Probably, we could incorporate it some time after 3.0.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Nice trick! As Anton noted, we won't have time to get it into our next release but we're surely look into adding it later.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I am happy you like it. ConcRT guys just rejected my comment with the suggestion. They copied concurrent_vector from TBB, right? Ok, it's their problem. I think it's nice to have primary method of primary data structure in fundamental library as fast as possible, even if we are talking about a handful of instructions.
Reply
Topic Options
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page