Solved: Question about CRC32 instruction polynomial

levicki · ‎05-27-2008

I was wondering if someone from the CPU design team could tell me why the upcoming SSE4.2 CRC32 instruction seems to support only the fixed Castagnoli polynomial.

Supporting at least the CRC-32 IEEE 802.3 polynomial (0x04C11DB7) as well would be advantageous (MPEG-2, PNG, V.42, etc.).

While we are at it, there is an error in document order #253666-026 on page 258, where the Castagnoli polynomial value has one extra "1" at the beginning, making it a 36-bit number.

SHIH_K_Intel · ‎05-28-2008

My understanding is that the CRC32 instruction implements CRC32-C in order to accelerate the software stack in the upper layers of data protocol traffic. I don't know whether CRC32-C or CRC-32 IEEE 802.3 applies to a broader swath of data traffic, but the 802.3 standards look more relevant to the physical layer.
I'm not privy to the discussions that decided our CRC32, or to whether 802.3 was a factor in them, but it seems intuitive to me that CPU folks would be more interested in adding CPU capability to accelerate software that manages the upper layers of the protocol traffic.

As to the 33-bit polynomial constant for CRC32: some folks refer only to the low 32 bits, since the MSB (bit 32) is always 1 and can be implied, but it is also common to refer to the full 33 bits of the polynomial constant.
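To make the two notations concrete, here is a small Python sketch. It assumes the 33-bit Castagnoli value is 0x11EDC6F41; the names `CASTAGNOLI_33`, `CASTAGNOLI_32` and `reflect32` are made up for this illustration and do not come from any Intel document.

```python
# Castagnoli (CRC32C) generator polynomial in its full 33-bit form,
# including the x^32 term. Assumed value; nine hex digits are needed to write it.
CASTAGNOLI_33 = 0x11EDC6F41

# Dropping the implied top bit gives the 32-bit constant usually seen in code.
CASTAGNOLI_32 = CASTAGNOLI_33 & 0xFFFFFFFF


def reflect32(value):
    """Reverse the bit order of a 32-bit value (hypothetical helper)."""
    result = 0
    for _ in range(32):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


print(CASTAGNOLI_33.bit_length())      # 33
print(hex(CASTAGNOLI_32))              # 0x1edc6f41
print(hex(reflect32(CASTAGNOLI_32)))   # 0x82f63b78
```

The bit-reversed constant is the one that LSB-first software implementations normally use, which is one more reason the same polynomial shows up under several different hex values.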

levicki · ‎05-28-2008

As for the additional bit, I haven't seen 33-bit constants defined in code so far; everyone uses 32-bit constants. That is why I asked. I also wrote 36-bit because I think in nibbles :-)

As for not supporting other common polynomials, my guess is that CRC32 tables for different polynomials would use too much real estate inside a CPU core, but I was hoping to hear an official answer (if there is one) as to why the instruction wasn't made more flexible.

It also wouldn't hurt if we got an MD5 instruction as well.
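For reference, this is roughly what "a table per polynomial" looks like in software. It is only a sketch of the usual byte-wise table method, not a statement about how the hardware is built; `make_table` and `crc32_with_table` are hypothetical names, and the constants are the bit-reversed forms of the two polynomials.

```python
import zlib


def make_table(reflected_poly):
    """Build the 256-entry lookup table for one (bit-reversed) polynomial."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ (reflected_poly if crc & 1 else 0)
        table.append(crc)
    return table


def crc32_with_table(table, data):
    """Byte-at-a-time CRC with the usual all-ones initial value and final inversion."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = (crc >> 8) ^ table[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF


IEEE_TABLE = make_table(0xEDB88320)        # CRC-32 IEEE 802.3 (0x04C11DB7 reversed)
CASTAGNOLI_TABLE = make_table(0x82F63B78)  # CRC32C (0x1EDC6F41 reversed)

msg = b"123456789"
print(hex(crc32_with_table(IEEE_TABLE, msg)))        # 0xcbf43926, same as zlib.crc32
print(hex(crc32_with_table(CASTAGNOLI_TABLE, msg)))  # 0xe3069283
```

Each extra polynomial costs another 256 x 32-bit table (1 KB) in this scheme, while the loop itself stays the same.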

Steven_K_Intel · ‎09-30-2008

I worked in the lab team that proposed the CRC32 instruction for Nehalem. CRC32 uses a fixed polynomial to make the instruction a stateless two-operand instruction, minimize instruction latency, and minimize gate count. Castagnoli showed that this polynomial has nice mathematical properties that make it optimal vs. other choices for data sets less than 64KB and very nearly optimal in other cases. This poly is used for some networking protocols such as RDMA and SCTP that are difficult to offload in hardware.

The 33-bit polynomial with the leading 1 is the mathematically correct form, though it's common to omit the implied upper '1' bit.

Hope this answers your question.
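A rough software model of the "stateless two-operand" idea described above: one function takes the running CRC and the next data byte and returns the new CRC, with the polynomial baked in. This is a sketch under the assumption that the byte form of the instruction behaves like the usual bit-reflected CRC32C update; `crc32c_step` and `crc32c` are illustrative names, not Intel's.

```python
CRC32C_REFLECTED = 0x82F63B78  # 0x1EDC6F41 with its 32 bits reversed


def crc32c_step(crc, byte):
    """Fold one data byte into a 32-bit accumulator (two operands in, one result out)."""
    crc ^= byte
    for _ in range(8):
        crc = (crc >> 1) ^ (CRC32C_REFLECTED if crc & 1 else 0)
    return crc


def crc32c(data):
    """Full CRC32C: the caller supplies the initial value and the final inversion."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = crc32c_step(crc, b)
    return crc ^ 0xFFFFFFFF


print(hex(crc32c(b"123456789")))  # 0xe3069283
```

Because the step keeps no hidden state and needs no polynomial operand, everything it depends on is in its two inputs, which is the property the fixed polynomial buys.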

levicki · ‎10-06-2008

Thanks for the detailed answer!

doug_mtview · ‎08-14-2009

CRC32C has been adopted by newer protocols that are attempting to be jumbo-frame compliant. The CRC-32 IEEE 802.3 polynomial has a smaller Hamming distance at longer message lengths, so its error detection rate drops off beyond the 12,176 bits (1522 bytes) of the maximum frame size set in the 802.3 standards. There is little need for software to replicate the CRC used in the Ethernet physical layer, since it is generally considered inadequate for jumbo frames. Protocols like SCTP, iSCSI, and RDMA use the CRC32C polynomial instead because of its improved Hamming distance. Intel offers a 1Gb NIC (82576) and a 10Gb NIC (X520) that offload SCTP checksum calculations, and Core i7 processors provide CRC32C through the CRC32 instruction in SSE4.2.

There is a serious problem with the simple checksums used by TCP and UDP. Errors caused by defective NIC bus drivers or memory modules tend to be repetitive, hitting the same bit again and again. Simple checksums allow such errors to cancel each other out, which leads to a failure-to-detect rate approaching 2%!
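The cancellation effect is easy to reproduce. The sketch below flips the same bit position in two different 16-bit words (1 to 0 in one, 0 to 1 in the other) and compares a ones' complement checksum in the style of RFC 1071 with a bitwise CRC32C. The function names and the sample bytes are made up for the illustration, and it says nothing about how often this happens in practice.

```python
def internet_checksum(data):
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def crc32c(data):
    """Bitwise CRC32C (bit-reversed polynomial 0x82F63B78)."""
    crc = 0xFFFFFFFF
    for b in data:
        crc ^= b
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


original = bytes([0x00, 0x10, 0x00, 0x00, 0x00, 0x30, 0x12, 0x34])
# Bit 4 of the first word goes 1 -> 0 and bit 4 of the second word goes 0 -> 1.
corrupted = bytes([0x00, 0x00, 0x00, 0x10, 0x00, 0x30, 0x12, 0x34])

print(internet_checksum(original) == internet_checksum(corrupted))  # True: error missed
print(crc32c(original) == crc32c(corrupted))                        # False: error caught
```

The two changes add and subtract the same amount from the sum, so the checksum cannot see them, while the CRC treats them as a short burst and flags the difference.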

Since much of the hardware used on the Internet is controlled and maintained by third parties, one defective device can significantly reduce data integrity in a way that is beyond the user's control. Protocols like SCTP can better handle jumbo frames and detect errors generated at various bus interfaces. For most software applications, CRC-32 IEEE should not be used; CRC32C should be used instead.