AI Tools from Intel
Find answers to your toolkit installation, configuration, and get-started questions.

TensorFlow numerical risk with OneDNN setting

shuoli
Beginner

Hi,

We are excited about the potential performance gain on TensorFlow with the oneDNN setting. However, we are trying to evaluate the risk of numerical differences when using oneDNN.

Does oneDNN guarantee deterministic results for matrix multiplication, convolution, pooling, and batch normalization on floating-point numbers, in both single-threaded and multi-threaded environments? We observed that some other libraries, such as Eigen, produce different matrix multiplication results on Intel Ice Lake CPUs versus earlier-generation CPUs because of differences in L1 cache size. Does TensorFlow with the oneDNN setting have the same problem?
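As background for why a multi-threaded reduction can differ from a single-threaded one, here is a small illustration in plain Python (not oneDNN itself): floating-point addition is not associative, so combining partial sums in a different order can change the last bits of the result, which is exactly what a threaded reduction does.

```python
import random

# Illustration: floating-point addition is not associative, so the order in
# which partial sums are combined can change the rounding of the result.
random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

seq = sum(xs)                          # single-thread, left-to-right order
half = len(xs) // 2
par = sum(xs[:half]) + sum(xs[half:])  # mimic a two-thread reduction

# The two sums agree to high relative precision but may not be bitwise equal.
print(abs(seq - par))
```

The difference is tiny in relative terms, but if your application compares results bitwise (e.g. for regression testing), any change in accumulation order shows up.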

Based on https://github.com/oneapi-src/oneDNN/issues/789, it seems oneDNN does not support features such as Conditional Numerical Reproducibility (CNR) in MKL. Is that still true?
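For context on what CNR provides in MKL (and which, per the issue above, oneDNN lacks): MKL's CNR is controlled via the `MKL_CBWR` environment variable, which pins the dispatched code path so compatible machines take the same branch. A minimal sketch, assuming an MKL-linked library (for example, an MKL build of NumPy) will be loaded afterward:

```python
import os

# MKL's Conditional Numerical Reproducibility (CNR): pin the dispatched code
# path so runs produce consistent results across compatible machines.
# Must be set before any MKL-linked library loads and reads it.
os.environ["MKL_CBWR"] = "COMPATIBLE"  # most portable path; ISA names like "AVX2" are also accepted
```

Pinning to `COMPATIBLE` trades peak performance for reproducibility, which is the usual tension with these controls.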

 

1 Solution
Ying_H_Intel
Moderator

Hi!

Thank you for sharing the good result about the potential performance gain on TensorFlow with the oneDNN setting.

Regarding the numerical consistency issue, you are right: it is related to the ISA, the instruction execution order, and the nature of floating-point computation. MKL has environment variables to enforce consistent results on the same machine with a fixed instruction execution order. oneDNN is JIT-based, so it does not yet have such a feature.


Only CPU Dispatcher Control (see "CPU Dispatcher Control — oneDNN v2.7.0 documentation" on oneapi-src.github.io) may help somewhat with aligning the instructions used across different generations of machines.
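The dispatcher control mentioned above is exposed at runtime through the `ONEDNN_MAX_CPU_ISA` environment variable (`DNNL_MAX_CPU_ISA` in older releases). A minimal sketch, assuming the variable is read when oneDNN initializes, so it must be set before TensorFlow is imported:

```python
import os

# Cap the oneDNN JIT at a common ISA so newer and older machines take the
# same code path. Set this before importing TensorFlow, since oneDNN reads
# the variable when it is loaded.
os.environ["ONEDNN_MAX_CPU_ISA"] = "AVX2"
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1"  # enable oneDNN in stock TensorFlow 2.x

# import tensorflow as tf  # import only after the variables are set
```

Capping the ISA at the lowest common denominator of your fleet aligns code paths across machines, at the cost of forgoing newer instructions (e.g. AVX-512) on machines that have them.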


Thanks

Ying H.

Intel AI Support



2 Replies
Rahila_T_Intel
Employee

Hi,


Thank you for posting in Intel Communities.


We are working on this internally and will share updates with you.


Thanks


