Intel® ISA Extensions

Bug in SDE emulation of AVX-512 _mm512_cmp_pd_mask?


Are the following intrinsics emulated incorrectly?

  • MM256_CMP_PD
  • MM256_CMP_PS


#include <immintrin.h>
#include <math.h>
#include <stdio.h>

int main() {
  double aaa[] = {-3.200000, 99.378500,  89.770000, 65.000000,
                  NAN,       -88.654000, NAN,       0.000000};
  double bbb[] = {NAN,       15.600000, -6.200000, 2.000000,
                  41.200000, 14.000000, NAN,       -88.654000};
  __m512d a = _mm512_loadu_pd(aaa);
  __m512d b = _mm512_loadu_pd(bbb);
  // _CMP_NLT_US: a lane is set when !(a < b), including when either operand is NaN.
  __mmask8 x = _mm512_cmp_pd_mask(a, b, _CMP_NLT_US);
  printf("%u\n", x);
  return 0;
}


This should print 223, but the SDE emulator prints 142.

(Compiled with the Intel icx compiler and run under Intel SDE emulating Tiger Lake.)


icx -march=tigerlake -o test_cmp test_cmp.c
sde64 -tgl -- ./test_cmp


Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  20
  On-line CPU(s) list:   0-19
Vendor ID:               GenuineIntel
  Model name:            12th Gen Intel(R) Core(TM) i7-12700H
    CPU family:          6
    Model:               154
    Thread(s) per core:  2
    Core(s) per socket:  10
    Socket(s):           1
    Stepping:            3
    BogoMIPS:            5376.00
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm
                         constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt
                         tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced
                         tpr_shadow vnmi ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec
                         xgetbv1 xsaves umip waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm serialize flush_l1d arch_capabilities
Virtualization features: 
  Virtualization:        VT-x
  Hypervisor vendor:     Microsoft
  Virtualization type:   full
Caches (sum of all):     
  L1d:                   480 KiB (10 instances)
  L1i:                   320 KiB (10 instances)
  L2:                    12.5 MiB (10 instances)
  L3:                    24 MiB (1 instance)
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
  Srbds:                 Not affected
  Tsx async abort:       Not affected

1 Reply

I compiled your test case with GCC (10.1) as:

    % gcc cmppd.c -o cmppd -march=tigerlake

Then I ran it with Intel SDE version 9.7 (the latest), and it printed 223.

