12-16-2021 10:07 AM
Can somebody from Intel comment on what I'm seeing here? https://github.com/zlib-ng/zlib-ng/pull/1065

It sure seems that all mobile SKUs suffer a large performance degradation with 512-bit-wide integer vectors, while all desktop-class parts get an appreciable benefit. I'm going crazy trying to figure out what's happening. The fast adler32 checksum code I've written is in that pull request, and it handily beats both ISA-L and IPP even without AVX512. But I'm seeing some really strange performance characteristics on all of the "client" class CPUs, which suggests that either power delivery limits are the issue (I tried adjusting PL1 and PL2 through RAPL, to no avail) or there's some weird bug in the microarchitecture.
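
For anyone not familiar with the checksum in question, here is a minimal scalar sketch of Adler-32. This is not the vectorized code from the pull request, just a plain reference of the kind a SIMD version could be checked against. The function name `adler32_scalar` and the macro names are made up for illustration; the modulus 65521 and the 5552-byte chunk size are the standard Adler-32 constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD  65521u /* largest prime below 2^16 */
#define ADLER_NMAX 5552u  /* most bytes that can be summed before a 32-bit sum could overflow */

/* Hypothetical reference routine: updates a running Adler-32 value.
 * Pass 1 as the initial value for a fresh checksum. */
uint32_t adler32_scalar(uint32_t adler, const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    uint32_t a = adler & 0xffffu;         /* low half: sum of bytes */
    uint32_t b = (adler >> 16) & 0xffffu; /* high half: sum of the running sums */

    while (len > 0) {
        /* Defer the expensive modulo to once per chunk. */
        size_t chunk = len < ADLER_NMAX ? len : ADLER_NMAX;
        len -= chunk;
        while (chunk--) {
            a += *buf++;
            b += a;
        }
        a %= ADLER_MOD;
        b %= ADLER_MOD;
    }
    return (b << 16) | a;
}
```

The two dependent running sums are what a vectorized version has to restructure, which is why the vector width matters so much for this routine.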