- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Tags:
- Intel® Advanced Vector Extensions (Intel® AVX)
- Intel® Streaming SIMD Extensions
- Parallel Computing
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Ravi,
You're actually measuring the time of sprintf() and rand()
since they are *much* slower than my_cmp64() or memcmp().
In case of memcmp() compiler does not call it at all -- it knows its result in advance.
That's why memcmp() is "faster".
I've tested my_cmp64() with the code below. It is about 10 time as fast as memcmp().
Please note the line:
char * volatile src = _src;
volatile makes compiler to really call my_cmp64() or memcmp() and not to optimize them out.
Call with, say, (src + 1) to test unaligned access.
===================================================
__inline int my_cmp64(const void* src_1, const void* src_2)
{
const __m256i* src1 = (const __m256i*)src_1;
const __m256i* src2 = (const __m256i*)src_2;
__m256i mm11 = _mm256_lddqu_si256(src1);
__m256i mm12 = _mm256_lddqu_si256(src1 + 1);
__m256i mm21 = _mm256_lddqu_si256(src2);
__m256i mm22 = _mm256_lddqu_si256(src2 + 1);
__m256i mm1 = _mm256_xor_si256(mm11, mm21);
__m256i mm2 = _mm256_xor_si256(mm12, mm22);
__m256i mm = _mm256_or_si256(mm1, mm2);
return !_mm256_testz_si256(mm, mm);
}
void test_cmp()
{
__declspec(align(64)) char _src[200];
char * volatile src = _src;
double time;
int i, result = 0;
for (i = 0; i < sizeof(_src); i++)
_src = i;
time = vx_time();
for (i = 0; i < 100 * 1000 * 1000; i++)
result += my_cmp64(src, src);
// result += memcmp(src, src, 64);
pl("time: %.3f, %d", vx_time(time), result);
}
Link Copied
- « Previous
- Next »
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Ravi,
(v0 != v1)
could be:
(_mm_movemask_epi8(_mm_cmpeq_epi8(v0, v1)) != 0xFFFF)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have finally run some benchmarks to compare my SIMD-based solution vs. bswap-based one.
TL;DR: bswap-based can be faster than SIMD but only when compiler has a good optimizer and you hit the fast path. Otherwise SIMD is faster.
I was experimenting on 16-byte data, which is what `boost::uuid` is. The following are the test subjects:
mem_equal, mem_less - The reference functions that use std::memcmp to compare the two UUIDs for equality and less.
simd_equal, simd_less - The functions that use SSE2 for comparing the UUIDs. This is the current implementation on Boost.UUID (and simd_less is equivalent to the one I provided in the comment #2).
reg_equal - The function that compares the UUIDs for equality by loading and comparing 2 pairs of uint64_t comrising the UUIDs data. The compilers use general redisters to implement that (I checked the disassembler).
reg_less, reg32_less - Use bswap instruction to swap the loaded words and compare for less order. reg_less uses 64-bit loads and bswap, reg32_less uses 32-bit ones (and consequently does twice as many comparisons). Basically, this is Vladimir's version.
All above functions receive two UUIDs and return bool as the result. Each test subject was tested 4 times:
- With 2 different UUIDs placed on the stack (and likely close to each other)
- With 2 different UUIDs placed on the heap
- With 2 different but equal UUIDs placed on the stack
- With 2 different but equal UUIDs placed on the heap
For each test, duration of 500000000 comparisons was measured, as well as an approximate number of ticks per one call (this I don't consider accurate). The hardware is a Sandy Bridge CPU. You can find the test code in attach.
Here are the results:
gcc 4.9.2, Linux, 64-bit target (g++ -g2 -O3 -I. -o uuid_operators uuid_operators.cpp)
Values placed on stack: mem_equal duration: 2909054936 ns, clocks per iteration: 54 simd_equal duration: 703210732 ns, clocks per iteration: 30 reg_equal duration: 701833340 ns, clocks per iteration: 30 mem_less duration: 2805026359 ns, clocks per iteration: 57 simd_less duration: 924337219 ns, clocks per iteration: 30 reg_less duration: 935014123 ns, clocks per iteration: 30 reg32_less duration: 704345068 ns, clocks per iteration: 27 Values placed on heap: mem_equal duration: 2803484224 ns, clocks per iteration: 64 simd_equal duration: 702860158 ns, clocks per iteration: 37 reg_equal duration: 700742185 ns, clocks per iteration: 30 mem_less duration: 2806557934 ns, clocks per iteration: 64 simd_less duration: 920703610 ns, clocks per iteration: 30 reg_less duration: 935851585 ns, clocks per iteration: 30 reg32_less duration: 704654313 ns, clocks per iteration: 30 Same values placed on stack: mem_equal duration: 1877397010 ns, clocks per iteration: 51 simd_equal duration: 700662250 ns, clocks per iteration: 27 reg_equal duration: 932869196 ns, clocks per iteration: 37 mem_less duration: 1882268763 ns, clocks per iteration: 51 simd_less duration: 925608825 ns, clocks per iteration: 37 reg_less duration: 822508496 ns, clocks per iteration: 30 reg32_less duration: 1289300793 ns, clocks per iteration: 33 Same values placed on heap: mem_equal duration: 1875647450 ns, clocks per iteration: 73 simd_equal duration: 700736792 ns, clocks per iteration: 30 reg_equal duration: 932544893 ns, clocks per iteration: 37 mem_less duration: 1889143565 ns, clocks per iteration: 58 simd_less duration: 922657593 ns, clocks per iteration: 37 reg_less duration: 825816472 ns, clocks per iteration: 30 reg32_less duration: 1292292925 ns, clocks per iteration: 55
gcc 4.9.2, Linux, 32-bit target (g++ -m32 -msse2 -g2 -O3 -I. -o uuid_operators32 uuid_operators.cpp)
Values placed on stack: mem_equal duration: 3003129535 ns, clocks per iteration: 55 simd_equal duration: 815852105 ns, clocks per iteration: 40 reg_equal duration: 957879657 ns, clocks per iteration: 33 mem_less duration: 2909670077 ns, clocks per iteration: 45 simd_less duration: 1351182395 ns, clocks per iteration: 36 reg_less duration: 1702336738 ns, clocks per iteration: 49 reg32_less duration: 816181695 ns, clocks per iteration: 33 Values placed on heap: mem_equal duration: 2910065687 ns, clocks per iteration: 55 simd_equal duration: 816033224 ns, clocks per iteration: 33 reg_equal duration: 958156388 ns, clocks per iteration: 33 mem_less duration: 2909768581 ns, clocks per iteration: 45 simd_less duration: 1351213470 ns, clocks per iteration: 36 reg_less duration: 1702484718 ns, clocks per iteration: 45 reg32_less duration: 816183720 ns, clocks per iteration: 33 Same values placed on stack: mem_equal duration: 2544227736 ns, clocks per iteration: 48 simd_equal duration: 815765735 ns, clocks per iteration: 40 reg_equal duration: 1381228041 ns, clocks per iteration: 33 mem_less duration: 2472541973 ns, clocks per iteration: 55 simd_less duration: 1351399333 ns, clocks per iteration: 36 reg_less duration: 2419782306 ns, clocks per iteration: 48 reg32_less duration: 1456508305 ns, clocks per iteration: 39 Same values placed on heap: mem_equal duration: 2536636251 ns, clocks per iteration: 48 simd_equal duration: 815715163 ns, clocks per iteration: 37 reg_equal duration: 1381799951 ns, clocks per iteration: 33 mem_less duration: 2474383405 ns, clocks per iteration: 55 simd_less duration: 1351475715 ns, clocks per iteration: 36 reg_less duration: 2419964420 ns, clocks per iteration: 48 reg32_less duration: 1451296170 ns, clocks per iteration: 36
gcc 4.4.7, Linux, 64-bit target (g++-4.4 -g2 -O3 -I. -o uuid_operators-4.4 uuid_operators.cpp)
Values placed on stack: mem_equal duration: 6495902852 ns, clocks per iteration: 63 simd_equal duration: 701597240 ns, clocks per iteration: 27 reg_equal duration: 704913667 ns, clocks per iteration: 27 mem_less duration: 6490310490 ns, clocks per iteration: 63 simd_less duration: 941798322 ns, clocks per iteration: 37 reg_less duration: 700693695 ns, clocks per iteration: 30 reg32_less duration: 703242622 ns, clocks per iteration: 178 Values placed on heap: mem_equal duration: 6377764566 ns, clocks per iteration: 70 simd_equal duration: 703149156 ns, clocks per iteration: 27 reg_equal duration: 702795927 ns, clocks per iteration: 34 mem_less duration: 6505660554 ns, clocks per iteration: 63 simd_less duration: 938937460 ns, clocks per iteration: 30 reg_less duration: 701984104 ns, clocks per iteration: 30 reg32_less duration: 701489491 ns, clocks per iteration: 21 Same values placed on stack: mem_equal duration: 10366218787 ns, clocks per iteration: 97 simd_equal duration: 702945481 ns, clocks per iteration: 30 reg_equal duration: 932854987 ns, clocks per iteration: 37 mem_less duration: 10573932471 ns, clocks per iteration: 97 simd_less duration: 942492857 ns, clocks per iteration: 30 reg_less duration: 1051130774 ns, clocks per iteration: 36 reg32_less duration: 1058324348 ns, clocks per iteration: 42 Same values placed on heap: mem_equal duration: 10366752789 ns, clocks per iteration: 97 simd_equal duration: 700994832 ns, clocks per iteration: 30 reg_equal duration: 932596190 ns, clocks per iteration: 37 mem_less duration: 10558350284 ns, clocks per iteration: 90 simd_less duration: 939857638 ns, clocks per iteration: 30 reg_less duration: 1059823697 ns, clocks per iteration: 40 reg32_less duration: 1064250646 ns, clocks per iteration: 39
gcc 4.4.7, Linux, 32-bit target (g++-4.4 -m32 -msse2 -g2 -O3 -I. -o uuid_operators32-4.4 uuid_operators.cpp)
Values placed on stack: mem_equal duration: 7035034313 ns, clocks per iteration: 54 simd_equal duration: 814800509 ns, clocks per iteration: 30 reg_equal duration: 1051117502 ns, clocks per iteration: 28 mem_less duration: 7372365518 ns, clocks per iteration: 72 simd_less duration: 1270579133 ns, clocks per iteration: 36 reg_less duration: 1648952515 ns, clocks per iteration: 36 reg32_less duration: 787097322 ns, clocks per iteration: 30 Values placed on heap: mem_equal duration: 6922481102 ns, clocks per iteration: 69 simd_equal duration: 814379042 ns, clocks per iteration: 30 reg_equal duration: 1050458651 ns, clocks per iteration: 30 mem_less duration: 7271476652 ns, clocks per iteration: 79 simd_less duration: 1269998969 ns, clocks per iteration: 30 reg_less duration: 1634621334 ns, clocks per iteration: 43 reg32_less duration: 788067392 ns, clocks per iteration: 30 Same values placed on stack: mem_equal duration: 10531383059 ns, clocks per iteration: 100 simd_equal duration: 814549940 ns, clocks per iteration: 30 reg_equal duration: 1507881322 ns, clocks per iteration: 30 mem_less duration: 10893784524 ns, clocks per iteration: 106 simd_less duration: 1270398106 ns, clocks per iteration: 30 reg_less duration: 2155156482 ns, clocks per iteration: 60 reg32_less duration: 1477907964 ns, clocks per iteration: 40 Same values placed on heap: mem_equal duration: 10530000955 ns, clocks per iteration: 97 simd_equal duration: 814384822 ns, clocks per iteration: 37 reg_equal duration: 1582379702 ns, clocks per iteration: 61 mem_less duration: 10892357714 ns, clocks per iteration: 96 simd_less duration: 1269953983 ns, clocks per iteration: 30 reg_less duration: 2155506224 ns, clocks per iteration: 60 reg32_less duration: 1478063429 ns, clocks per iteration: 33
clang 3.6.0, Linux, 64-bit target (clang -stdlib=libstdc++ -g2 -O3 -I. -o uuid_operators-clang uuid_operators.cpp -lstdc++)
Values placed on stack: mem_equal duration: 2915831378 ns, clocks per iteration: 57 simd_equal duration: 702978963 ns, clocks per iteration: 27 reg_equal duration: 932846599 ns, clocks per iteration: 30 mem_less duration: 2801806046 ns, clocks per iteration: 64 simd_less duration: 903058419 ns, clocks per iteration: 30 reg_less duration: 935835507 ns, clocks per iteration: 33 reg32_less duration: 931503301 ns, clocks per iteration: 33 Values placed on heap: mem_equal duration: 2801707798 ns, clocks per iteration: 54 simd_equal duration: 699887194 ns, clocks per iteration: 30 reg_equal duration: 931113078 ns, clocks per iteration: 30 mem_less duration: 2798621990 ns, clocks per iteration: 64 simd_less duration: 900708358 ns, clocks per iteration: 37 reg_less duration: 931509307 ns, clocks per iteration: 33 reg32_less duration: 932174809 ns, clocks per iteration: 40 Same values placed on stack: mem_equal duration: 1869707957 ns, clocks per iteration: 51 simd_equal duration: 732869633 ns, clocks per iteration: 27 reg_equal duration: 698069511 ns, clocks per iteration: 37 mem_less duration: 1861519606 ns, clocks per iteration: 48 simd_less duration: 894058684 ns, clocks per iteration: 37 reg_less duration: 831131467 ns, clocks per iteration: 30 reg32_less duration: 949342085 ns, clocks per iteration: 30 Same values placed on heap: mem_equal duration: 1861275241 ns, clocks per iteration: 58 simd_equal duration: 698240258 ns, clocks per iteration: 30 reg_equal duration: 700417340 ns, clocks per iteration: 30 mem_less duration: 1885464546 ns, clocks per iteration: 58 simd_less duration: 894038482 ns, clocks per iteration: 30 reg_less duration: 829288790 ns, clocks per iteration: 30 reg32_less duration: 930727650 ns, clocks per iteration: 30
The following were run on a virtual machine running on the same hardware Linux host. Do not compare timings to the above results.
MSVC14 (VS2015), Windows, 64-bit target (cl -EHsc -FAcs -I. -O2 uuid_operators.cpp)
Values placed on stack: mem_equal duration: 1132779724 ns, clocks per iteration: 91 simd_equal duration: 1051041682 ns, clocks per iteration: 24 reg_equal duration: 1324786530 ns, clocks per iteration: 91 mem_less duration: 2401681498 ns, clocks per iteration: 169 simd_less duration: 1137397350 ns, clocks per iteration: 115 reg_less duration: 1022363456 ns, clocks per iteration: 24 reg32_less duration: 823214402 ns, clocks per iteration: 85 Values placed on heap: mem_equal duration: 1279553406 ns, clocks per iteration: 160 simd_equal duration: 995784101 ns, clocks per iteration: 24 reg_equal duration: 1071507691 ns, clocks per iteration: 82 mem_less duration: 2537630620 ns, clocks per iteration: 166 simd_less duration: 1107802527 ns, clocks per iteration: 21 reg_less duration: 978329089 ns, clocks per iteration: 30 reg32_less duration: 924786809 ns, clocks per iteration: 87 Same values placed on stack: mem_equal duration: 922507748 ns, clocks per iteration: 91 simd_equal duration: 939009846 ns, clocks per iteration: 93 reg_equal duration: 969571272 ns, clocks per iteration: 117 mem_less duration: 2920631812 ns, clocks per iteration: 300 simd_less duration: 1396546767 ns, clocks per iteration: 33 reg_less duration: 1357923143 ns, clocks per iteration: 90 reg32_less duration: 1421889653 ns, clocks per iteration: 196 Same values placed on heap: mem_equal duration: 868460935 ns, clocks per iteration: 133 simd_equal duration: 973166142 ns, clocks per iteration: 67 reg_equal duration: 1060500426 ns, clocks per iteration: 27 mem_less duration: 2873302053 ns, clocks per iteration: 360 simd_less duration: 1114337995 ns, clocks per iteration: 106 reg_less duration: 1426126784 ns, clocks per iteration: 27 reg32_less duration: 1326662187 ns, clocks per iteration: 121
MSVC14 (VS2015), Windows, 32-bit target (cl -EHsc -FAcs -O2 -I. uuid_operators.cpp)
Values placed on stack: mem_equal duration: 1097685040 ns, clocks per iteration: 139 simd_equal duration: 899783352 ns, clocks per iteration: 52 reg_equal duration: 948021047 ns, clocks per iteration: 96 mem_less duration: 1709203544 ns, clocks per iteration: 77 simd_less duration: 1676794955 ns, clocks per iteration: 76 reg_less duration: 1515213805 ns, clocks per iteration: 130 reg32_less duration: 1186983541 ns, clocks per iteration: 66 Values placed on heap: mem_equal duration: 1024332142 ns, clocks per iteration: 90 simd_equal duration: 1310634172 ns, clocks per iteration: 61 reg_equal duration: 957639308 ns, clocks per iteration: 228 mem_less duration: 1338495255 ns, clocks per iteration: 85 simd_less duration: 1698868990 ns, clocks per iteration: 55 reg_less duration: 1337584804 ns, clocks per iteration: 91 reg32_less duration: 1092318437 ns, clocks per iteration: 91 Same values placed on stack: mem_equal duration: 2074361685 ns, clocks per iteration: 87 simd_equal duration: 930216829 ns, clocks per iteration: 45 reg_equal duration: 1460527245 ns, clocks per iteration: 136 mem_less duration: 1961324972 ns, clocks per iteration: 87 simd_less duration: 1751332920 ns, clocks per iteration: 46 reg_less duration: 1815786363 ns, clocks per iteration: 108 reg32_less duration: 1760446090 ns, clocks per iteration: 152 Same values placed on heap: mem_equal duration: 1880232543 ns, clocks per iteration: 84 simd_equal duration: 851140577 ns, clocks per iteration: 91 reg_equal duration: 1362192401 ns, clocks per iteration: 130 mem_less duration: 2043999726 ns, clocks per iteration: 224 simd_less duration: 1761549303 ns, clocks per iteration: 64 reg_less duration: 1924742111 ns, clocks per iteration: 136 reg32_less duration: 1985236671 ns, clocks per iteration: 151
I have also run a few tests on a Nehalem machine but the general disposition of the results was similar.
Analysis:
- reg_equal does not perform faster than simd_equal, and particularly in 32-bit tests it performs worse.
- The 64-bit bswap has greater latency than 32-bit, and on some CPUs also the worse throughput. I believe that's the reason why in most 64-bit test results reg_less performed slower than reg32_less.
- reg_less and reg32_less performance largely depends on the compiler. gcc and MSVC can optimize reg_less to the point where it is close or slightly faster than simd_less, but only when the fast path is used (i.e. when the function returns after the first comparison). However that is not the case with clang (I suspect, this is because of inefficient code ordering - it generated a single ret at the end of the function, with conditional jumps to it from the middle, whereas gcc produced ret right after the first condition).
- reg32_less typically performs better than simd_less but only when you hit the fast path. Most compilers produced the code where the fast path was when the input UUIDs are different. When UUIDs are the same the performance drops below to that of simd_less. reg_less has the same property but to a lesser extent since the difference between the fast and slow paths is less pronounced.
- For 32-bit targets reg_less is generally slower than reg32_less because the compiler still loads 8 bytes of data from each UUID and performs 4 bswaps per comparison, even if the comparison fails. This reduces the effect of the fast path compared to reg32_less.
In general I'd say simd_less showed more consistent results across the board. As much as reg32_less can be faster, it can also be slower, so it's not a clear cut. reg_less has more stable performance but it is only ever beneficial on 64-bit targets.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Андрей,
I understand this suggestions are too tricky, nevertheless:
1. You can first compare only the first bytes. If they are different (with probability 255/256) in case of "equal" that's it.
2. You can set your own rule for "less" and don't swap bytes at all.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
1. You can first compare only the first bytes. If they are different (with probability 255/256) in case of "equal" that's it.
Yes, I tried that for the simd_less algorithm. Indeed, this fast path speeds up the case when the input UUIDs are different (in this case it outperforms reg_less and reg32_less and is the fastest). But the branch kills performance when the slow path is taken - it becomes slightly slower than reg_less (and much slower than the original simd_less as well). I think the additional branch is taking too much clocks to process, especially in case of mispredictions.
2. You can set your own rule for "less" and don't swap bytes at all.
Yes, that would be ideal, and I would have done that if I was not constrained with the legacy behavior which used std::lexicographical_compare. Enabling optimizations should not change the behavior.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
1. Average time with random UUIDs is:
time = (time_1*255 + time_not_1*1) / 256
so, time_not_1 is negligible.
2. There's an old method to cope with the problem: add "_" prefix to the names.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Ravi,
In re-reviewing this thread, and taking into consideration your response to my first reply ("100% of comparisons are for equality (i.e. check for "== 0") in our code").... there comes to mind some additional alternatives.
Assuming your key comes in as an unknown and your function is to see if the unknown key exists in a list of known keys. A possible technique is in knowing the characteristic pattern of the unknown input key and known stored keys is to determine an optimal number of bytes per key to compare such that multiple keys can be packed into a 256-bit entry of a search table. Example: assuming 4 bytes of the incoming keys vary the most, then you would pack 8 copies of the unknown key into each int32 lane of the ymm register used for search, then each read of the condensed table can perform 8 tests at once. Your first level table would have no duplicates. You would have to work something out for handling duplicates when a hit occurs and you go on to the second 4 bytes (or more bytes) for the second tier table. You have representative data, so you can write a program to determine the optimal packing (including ten 3-byte key sub-fields, or ten 3-byte plus one 2-byte).
This would necessitate NOT using something like cmp32, cmp64 etc, and replace it with find_entry(key_t* unknown) which returns -1 for not found or index of key when found. Note, if you want to put extra effort in, the class/struct could determine the optimal keytable sub-key selection (1 to 32 bytes, extended to 64 bytes for KNC/KNL/AVX512)
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Jim for your inputs. Thanks andysem/vladimir for reporting extensive test results on your implementation. My case is primarily for flow (layer-2 mac + layer-3 ipv4/ipv6 + layer-4 tcp/udp ports) and i think perf will benefit using simd.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Vladimir Sedach wrote:
1. Average time with random UUIDs is:
time = (time_1*255 + time_not_1*1) / 256
so, time_not_1 is negligible.
That is true, as long as you expect most of the UUIDs to be different. That's probably a fair expectation, but I'm not sure it's justified enough to code it into a generic component, such as 'uuid'.
I tested the modified version of simd_less with a fast path in a few of my projects. Adding a branch for the fast path often results in compiler splitting the ordering function into two parts - the fast path, which gets inlined, and the rest, which is not inlined. This makes the difference between the fast and slow paths even mode significant. The branchless version always gets inlined completely.
Vladimir Sedach wrote:
2. There's an old method to cope with the problem: add "_" prefix to the names.
Unfortunately, that's not an option for me. The operation is actually spelled as 'operator<' and is required to be equivalent to lexicographical_compare. Custom comparison functions are possible, of course, but my aim is to optimize the generic library implementation.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Andrey,
1. Could you use
#define BOOST_..._EXPECT_DIFFERENT
2. Perhaps also
#define BOOST_..._NON_LEXICOGRAPHICAL_COMPARE
before #include to allow the user to choose standard/nonstandard approach.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
No, config macros are generally evil as they can easily lead to ODR violations and other subtle issues.
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page
- « Previous
- Next »