- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Tags:
- Intel® Advanced Vector Extensions (Intel® AVX)
- Intel® Streaming SIMD Extensions
- Parallel Computing
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Consult the Intel Intrinsics Guide https://software.intel.com/sites/landingpage/IntrinsicsGuide/#=undefined&cats=Compare&expand=651
Check the Compare box. You have vector compares, returning a mask, for 8, 16, 32 and 64 bit lanes.
With those, you can write your own vmemcmp-like int compar function.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
jimdempseyatthecove wrote:
Consult the Intel Intrinsics Guide https://software.intel.com/sites/landingpage/IntrinsicsGuide/#=undefined&cats=Compare&expand=651
Check the Compare box. You have vector compares, returning a mask, for 8, 16, 32 and 64 bit lanes.
With those, you can write your own vmemcmp-like int compar function.
Jim Dempsey
Thanks for reply. Function mentioned by me is used by our custom hashtable implementation, so vector instruction would not help there. We probably would have to replace our hashtables with something else to take advantage of them. With current implementation dedicated non-vector compare instruction would be better.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
for int64 we have to use expression "a < b ? -1 : a > b"
Why can't you use subtraction for int64?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
andysem wrote:
for int64 we have to use expression "a < b ? -1 : a > b"
Why can't you use subtraction for int64?
It may cause int range overflow, e.g. INT64_MIN - 1 == INT64_MAX.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Why can't you use subtraction for int64?
I think he is after some way of doing this without using branches (which will be badly predicted). It should be relatively simple to use cmov, though, something like
static inline int compare(int a, int b) { int one = 1; int minusone = -1; int res; __asm__ volatile ("cmpl %1,%2\n" "movl $0,%0\n" "cmovll %3,%0\n" "cmovgl %4,%0": "=r"(res): "r"(a),"r"(b),"r"(one),"r"(minusone) ); return res; }
Whether that is better than the code the compiler generates by default is unclear, of course, and I haven't looked at IACA to investigate.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Cownie, James H (Intel) wrote:
Why can't you use subtraction for int64?
I think he is after some way of doing this without using branches (which will be badly predicted). It should be relatively simple to use cmov, though, something like
static inline int compare(int a, int b) { int one = 1; int minusone = -1; int res; __asm__ volatile ("cmpl %1,%2\n" "movl $0,%0\n" "cmovll %3,%0\n" "cmovgl %4,%0": "=r"(res): "r"(a),"r"(b),"r"(one),"r"(minusone) ); return res; }Whether that is better than the code the compiler generates by default is unclear, of course, and I haven't looked at IACA to investigate.
Yes, branchless code is one of my goals, and gcc actually generates such code (see below). However dedicated instruction would be faster here. Additionally it could allow compilers to perform some additional optimizations - ability to get -1 as a result of comparison may be handy in some cases.
cmp(long, long): xor eax, eax cmp rdi, rsi mov edx, -1 setg al cmovl eax, edx ret
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
>>Function mentioned by me is used by our custom hashtable implementation, so vector instruction would not help there.
Why do you say that.
Your original thread post requests a memcmp-like integer compare. This implies you have a list of integers that you wish to compare. Hash tables, other than for collision lists do not have "lists" in need of searching. So, we need to ask: what is your real requirement (be specific).
If your problem is that you have a list of hash keys that are to be searched for a match (as opposed to having a key being used to construct an index), then the vector compare, producing a bitmask, is the correct choice to use.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
jimdempseyatthecove wrote:
>>Function mentioned by me is used by our custom hashtable implementation, so vector instruction would not help there.
Why do you say that.
Your original thread post requests a memcmp-like integer compare. This implies you have a list of integers that you wish to compare. Hash tables, other than for collision lists do not have "lists" in need of searching. So, we need to ask: what is your real requirement (be specific).
If your problem is that you have a list of hash keys that are to be searched for a match (as opposed to having a key being used to construct an index), then the vector compare, producing a bitmask, is the correct choice to use.
Jim Dempsey
Part of our hashtable implementation are AVL trees, and this compare function is used to traverse them.
Everything is written in C so we cannot use templates, code uses callback functions.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I must be misunderstanding something. From my experience, a hash table is used to avoid trees and traversals. You perform a hash from an item being stored (e.g. name), and that hash is the location of the data (or an indication of a collision). IOW there is no searching.
If you are using a binary tree structure (and using the hash code as "plain key" for storage), then for each node (bucket) of keys you would do a binary search or sequential search depending on if the data (keys) stored in the node/bucket were sorted or not (design decision).
If your design uses a binary search in the node/bucket, then you do not need a memcmp-like instruction/function.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Our implementation is a mix of these two approaches - our "hashtable" is a set of AVL trees. Code first uses hash function to select one of trees, then it traverses it with help of compare function.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
jimdempseyatthecove wrote:
I must be misunderstanding something. From my experience, a hash table is used to avoid trees and traversals. You perform a hash from an item being stored (e.g. name), and that hash is the location of the data (or an indication of a collision). IOW there is no searching.
A typical hash table-based container consists of an array of buckets, and each of the buckets is also a data structure, such as a list or a tree. You first map the hash value onto the array of buckets (which is fast) and then you search for the element in the bucket (which can be slow if you have lots of collisions). Having a bucket that simply stores one element is possible, but only works if you don't have collisions at all, which is hard to guarantee most of the time.
I believe, Daniel says that in his case buckets are implemented as AVL trees.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Daniel F. wrote:
Quote:
andysem wrote:
for int64 we have to use expression "a < b ? -1 : a > b"
Why can't you use subtraction for int64?
It may cause int range overflow, e.g. INT64_MIN - 1 == INT64_MAX.
1. While signed integer overflow is UB in C, it is not in x86, so you can use assembler or cast the values to unsigned integers. Overflows still need to be handled though to produce the correct result. It should be possible with some bit twiddling but I'm not sure the code will be faster than the naive approach.
2. memcmp operates on unsigned bytes, so it doesn't apply to your case. An instruction would have to provide at least two variants - for signed and unsigned comparison.
3. Although I admit that three way comparison (https://en.wikipedia.org/wiki/Three-way_comparison) is fairly common in high level programming languages, I'm not sure the hardware implementation would be beneficial in terms of performance or power consumption. Note that with the branch instruction the CPU is able to exploit branch prediction to hide the comparison latency while with a dedicated instruction the following instructions will just form a dependency chain. I don't know how cmov instructions are implemented, but there is opinion (http://yarchive.net/comp/linux/cmov.html) that these instructions are not very useful because of this.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Yes, this is my structure - buckets implemented as AVL trees.
Ad.2. In my case I need to compare two ints in order to determine place for them in AVL tree, so I can use either signed or unsigned comparison or any other strict ordering. But in other cases this may be important. Good catch.
Ad.3. In my case code calls comparison function as a callback, so in fact there are two comparison: first one to determine ration between two numbers (inside compare function), and then calling code performs 2nd one to interpret result of 1st one. Unfortunately C language is less inline-friendly, C++ is far more flexible there. I do not know how much this affects branch prediction.
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page