I ran a performance test on my embedded board, which has a 1.7 GHz Xeon D (8 cores, 12 MB cache) and 32 GB of DDR memory.
The test tool was the Intel Optimized LINPACK Benchmark from MKL 2018. Running runme_xeon64 gave a result of about 145 GFLOPS.
I expected about 400 GFLOPS or more. I would like to know whether this result is reasonable, and how to get the maximum performance out of the test tool.
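
For context, here is a rough sketch of how the theoretical double-precision peak can be estimated as cores x clock x FLOPs per cycle per core. The function names are hypothetical, and the FLOPs-per-cycle values (16 for AVX2 with two FMA units, 32 for AVX-512 with two FMA units) are assumptions, since I have not confirmed which vector units this Xeon D model has. The sustained clock under heavy vector load may also be lower than the nominal 1.7 GHz, so these numbers are upper bounds at best.

```python
# Rough estimate of theoretical peak double-precision GFLOPS.
# All FLOPs-per-cycle values below are assumptions about the CPU,
# not figures taken from the benchmark output.

def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak = cores * clock (GHz) * DP FLOPs per cycle per core."""
    return cores * clock_ghz * flops_per_cycle


def efficiency(measured_gflops, peak):
    """Fraction of the theoretical peak that the benchmark reached."""
    return measured_gflops / peak


if __name__ == "__main__":
    cores = 8          # from the board description
    clock_ghz = 1.7    # nominal clock; the sustained AVX clock may be lower
    measured = 145.0   # LINPACK result reported by runme_xeon64

    # Assumed vector capabilities (hypothetical until the exact model is known).
    scenarios = {
        "AVX2, 2 FMA units (16 FLOPs/cycle)": 16,
        "AVX-512, 2 FMA units (32 FLOPs/cycle)": 32,
    }

    for name, fpc in scenarios.items():
        peak = peak_gflops(cores, clock_ghz, fpc)
        print(f"{name}: peak {peak:.1f} GFLOPS, "
              f"measured is {efficiency(measured, peak):.0%} of peak")
```

If the 16 FLOPs/cycle assumption is the one that applies, the 400 GFLOPS expectation would be above the theoretical peak, and 145 GFLOPS would be a more plausible fraction of it. If the 32 FLOPs/cycle assumption applies, the measured result is well below peak and the run configuration would be worth checking.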