Hi,
simple question: what constraint does one use for mask registers in inline-asm? "k" does not work and "r" would be wrong.
More specific question: how does one efficiently test whether a mask returned from a __m512d compare is all true? ICC generates quite a lot of code for either the _mm512_kortestc intrinsic or a simple compare to 0xff. That's why I wanted to wrap this in a function that does the right thing via inline asm, but without the constraint... (I could still make use of the constraint even if there is a good solution here that doesn't involve inline asm.)
Cheers,
Matthias
Are you saying that you cannot say:
kortest %k1, %k2
in your asm code? (or k3, k4, k5, k6, k7 but not k0)
On a related note, is it possible to post a complete assembler example, with asm and everything?
Thank you!
This is what I'm looking for:
[cpp]
bool isFull(__mmask8 k) {
    __mmask16 kk;
    asm("kmerge2l1l %[in],%[out]" : [out]"=k"(kk) : [in]"k"(k));
    return _mm512_kortestc(kk, kk);
}
[/cpp]
In inline asm, the constraints are the "k" or "=k" that you put in the operand list. Using %k1 explicitly in the asm string works just fine, of course.
Someone much more knowledgeable than I showed me how to do this:
[cpp]
bool isFull(__mmask8 k) {
    __mmask16 kk;
    asm("kmerge2l1l %[in],%[out]" : [out]"=k"(kk) : [in]"k"((__mmask16)k));
    return _mm512_kortestc(kk, kk);
}
[/cpp]
Ouch, all right, this actually compiles. But the result is even worse than using the intrinsic for kmerge2l1l. Here is what a __m512d compare followed by isFull compiles to:
vcmpeqpd (%rsi),%zmm0,%k0
kmov %k0,%edx
movzbl %dl,%edx
kmov %edx,%k1
kmerge2l1l %k1,%k2
kmov %k2,%ecx
kmov %ecx,%k3
kortest %k3,%k3
What I want to have is this:
vcmpeqpd (%rsi),%zmm0,%k0
kmerge2l1l %k0,%k1
kortest %k1,%k1
It appears ICC won't let me do this. The problem is that ICC inserts a move to a GPR and back to a mask register for any cast from __mmask8 to __mmask16. Now, since even inline asm doesn't grok __mmask8, there's nothing I can do...
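For readers following along without the hardware, the flag logic behind this thread can be modeled in portable C++. This is only a sketch of the semantics, not the ICC intrinsics: `Mask8`, `Mask16`, `kortestc_model`, `widen_model` and `isFull_model` are hypothetical names, and it assumes that an 8-lane compare leaves the upper eight bits of the 16-bit mask clear, so the carry test can only succeed once the upper byte has been filled in.

```cpp
#include <cstdint>

// Plain integers standing in for mask registers (assumption: these are
// illustrative types, not the real __mmask8 / __mmask16).
using Mask8 = std::uint8_t;
using Mask16 = std::uint16_t;

// Models the carry result of a 16-bit KORTEST:
// true only when the OR of both masks has all 16 bits set.
inline bool kortestc_model(Mask16 a, Mask16 b) {
    return static_cast<Mask16>(a | b) == 0xFFFFu;
}

// Hypothetical helper: copy the 8-bit mask into both bytes of a
// 16-bit mask so that a 16-bit carry test becomes meaningful.
inline Mask16 widen_model(Mask8 k) {
    return static_cast<Mask16>((static_cast<unsigned>(k) << 8) | k);
}

// "All eight lanes true" expressed through the 16-bit carry test.
inline bool isFull_model(Mask8 k) {
    const Mask16 kk = widen_model(k);
    return kortestc_model(kk, kk);
}
```

In this model `kortestc_model(0x00FF, 0x00FF)` is false even though all eight low lanes are set, which is why the mask has to be widened before the carry test is useful for __m512d compares. It says nothing about which instruction sequence the compiler will actually emit.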
