<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi Ying, thanks for the quick in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Tensorflow-performance-w-MKL/m-p/1182043#M29392</link>
    <description>&lt;P&gt;Hi Ying, thanks for the quick response, appreciate that.&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 13.008px;"&gt;I used one thread to run the models:&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;BR style="font-size: 13.008px;" /&gt;
	&lt;SPAN style="font-size: 13.008px;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; tensorflow::SessionOptions sess_opts;&lt;/SPAN&gt;&lt;BR style="font-size: 13.008px;" /&gt;
	&lt;SPAN style="font-size: 13.008px;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; sess_opts.config.set_intra_op_parallelism_threads(1);&lt;/SPAN&gt;&lt;BR style="font-size: 13.008px;" /&gt;
	&lt;SPAN style="font-size: 13.008px;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; sess_opts.config.set_inter_op_parallelism_threads(1);&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;I used GCC to compile tensorflow. I used tensorflow/contrib/makefile with some modifications, mainly adding these defines "&lt;SPAN style="color: rgb(3, 47, 98); font-family: SFMono-Regular, Consolas, &amp;quot;Liberation Mono&amp;quot;, Menlo, Courier, monospace; font-size: 12px; white-space: pre;"&gt;-DINTEL_MKL -DINTEL_MKL_ML -DEIGEN_USE_MKL_ALL -DMKL_DIRECT_CAL -DEIGEN_DONT_PARALLELIZE"&lt;/SPAN&gt;&lt;BR /&gt;
	&lt;SPAN style="font-size: 13.008px;"&gt;I didn't use bazel because I need a static library.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;I just realized that -DINTEL_MKL_ML causes the build to use the MKL implementation instead of the MKL-DNN one, which TensorFlow is supposed&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;to use. I removed it and got much worse performance on&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 12px;"&gt;Mobilenet_v2_1_4_224 (I think it's mainly caused by &lt;/SPAN&gt;_MklFusedBatchNorm; the MKL-DNN version is far too slow&lt;SPAN style="font-size: 12px;"&gt;).&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px;"&gt;Anyway, I ran TensorFlow's benchmark_model to get the logs for you:&lt;BR /&gt;
	-&amp;nbsp;&lt;/SPAN&gt;benchmark_model --graph=testdata/mobilenet_v1_1.0_224_quant_frozen.pb --show_flops --input_layer=input --input_layer_type=float --input_layer_shape=1,224,224,3 --output_layer=MobilenetV1/Predictions/Reshape_1 --num_threads=1&lt;BR /&gt;
	&lt;SPAN style="font-size: 12px;"&gt;-&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;benchmark_model --graph=testdata/mobilenet_v2_1.4_224_frozen.pb --show_flops --input_layer=input --input_layer_type=float --input_layer_shape=1,224,224,3 --output_layer=MobilenetV2/Predictions/Reshape_1 --num_threads=1&lt;/SPAN&gt;&lt;BR /&gt;
	- benchmark_model --graph=testdata/ssd_mobilenet_v2_coco_2018_03_29_frozen.pb --show_flops --input_layer=image_tensor --input_layer_type=uint8 --input_layer_shape=1,1920,1080,3 --output_layer=num_detections,detection_classes,detection_scores,detection_boxes --num_threads=1&lt;/P&gt;

&lt;P&gt;It looks to me like the main culprit is the Conv2D op (replaced by _MklConv2D and _MklConv2DWithBias when using MKL?):&lt;/P&gt;

&lt;TABLE&gt;
&lt;TR&gt;&lt;TH&gt;Model&lt;/TH&gt;&lt;TH&gt;Conv2D&lt;/TH&gt;&lt;TH&gt;_MklConv2D&lt;/TH&gt;&lt;TH&gt;_MklConv2DWithBias&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;mobilenet_v1_1.0_224_quant&lt;/TD&gt;&lt;TD&gt;19.303 ms&lt;/TD&gt;&lt;TD&gt;24.379 ms&lt;/TD&gt;&lt;TD&gt;7.905 ms&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;mobilenet_v2_1.4_224&lt;/TD&gt;&lt;TD&gt;24.969 ms&lt;/TD&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;41.942 ms&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;ssd_mobilenet_v2_coco&lt;/TD&gt;&lt;TD&gt;108.692 ms&lt;/TD&gt;&lt;TD&gt;48.872 ms&lt;/TD&gt;&lt;TD&gt;143.936 ms&lt;/TD&gt;&lt;/TR&gt;
&lt;/TABLE&gt;</description>
    <pubDate>Mon, 07 May 2018 23:42:16 GMT</pubDate>
    <dc:creator>Liu__Chao</dc:creator>
    <dc:date>2018-05-07T23:42:16Z</dc:date>
    <item>
      <title>Tensorflow performance w/ MKL</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Tensorflow-performance-w-MKL/m-p/1182041#M29390</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I am trying to use tensorflow-1.8.0 compiled with MKL-2018.2.199 enabled. I use it to run MobileNet image classification and object detection models. I compared the performance with and without MKL; in most cases, the MKL build is much slower. I am posting here to find out whether I did something wrong or whether this is what I should expect.&lt;/P&gt;

&lt;P&gt;All the following numbers were collected by running the corresponding inference models on an i7-5557U CPU. I also ran the tests on other CPUs and got similar results. NOTE: the times in 1-4 are per 16 frames; the times in 5-6 are per 320x180 frame.&lt;/P&gt;

&lt;P&gt;1. Mobilenet_v2_1_4_224&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;SPAN style="font-size: 1em;"&gt;w/ MKL&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 13.008px;"&gt;1463 ms&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;&amp;nbsp; w/o MKL&amp;nbsp;&amp;nbsp;2486&amp;nbsp;ms (this is good)&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;2.&amp;nbsp;Mobilenet_v2_1_0_96&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;w/ MKL&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN style="font-size: 13.008px;"&gt;481 ms&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;&amp;nbsp; &amp;nbsp; w/o MKL&amp;nbsp; 276 ms (~74% slower!)&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;3.&amp;nbsp;Mobilenet_v1_1_0_224_quant&amp;nbsp; &amp;nbsp; &amp;nbsp; w/ MKL&amp;nbsp; 903 ms&amp;nbsp; &amp;nbsp; w/o MKL&amp;nbsp; 664 ms (~36% slower)&lt;/P&gt;

&lt;P&gt;4. Mobilenet_v1_1_0_128_quant&amp;nbsp; &amp;nbsp; &amp;nbsp; w/ MKL&amp;nbsp; 469 ms&amp;nbsp; &amp;nbsp; w/o MKL&amp;nbsp; 233 ms&amp;nbsp; (~100% slower)&lt;/P&gt;

&lt;P&gt;5.&amp;nbsp;ssd_mobilenet_v1_coco&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; w/ MKL&amp;nbsp; 142 ms&amp;nbsp; &amp;nbsp; w/o MKL&amp;nbsp; 116 ms&lt;/P&gt;

&lt;P&gt;6.&amp;nbsp;ssd_mobilenet_v2_coco&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; w/ MKL&amp;nbsp; 212 ms&amp;nbsp; &amp;nbsp; &amp;nbsp;w/o MKL 130 ms&lt;/P&gt;
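
&lt;P&gt;To make the slowdowns in the list above easier to compare, here is a minimal Python sketch that computes the ratio of the w/ MKL time to the w/o MKL time from the posted numbers. The dictionary and function names are illustrative only and do not come from the benchmark code.&lt;/P&gt;

```python
# Timings copied from the list above: (ms with MKL, ms without MKL).
# Items 1-4 are per 16 frames; items 5-6 are per 320x180 frame.
TIMINGS_MS = {
    "Mobilenet_v2_1_4_224": (1463, 2486),
    "Mobilenet_v2_1_0_96": (481, 276),
    "Mobilenet_v1_1_0_224_quant": (903, 664),
    "Mobilenet_v1_1_0_128_quant": (469, 233),
    "ssd_mobilenet_v1_coco": (142, 116),
    "ssd_mobilenet_v2_coco": (212, 130),
}


def mkl_time_ratio(with_mkl_ms, without_mkl_ms):
    """Return time(w/ MKL) / time(w/o MKL); values above 1.0 mean MKL is slower."""
    return with_mkl_ms / without_mkl_ms


RATIOS = {name: mkl_time_ratio(*pair) for name, pair in TIMINGS_MS.items()}

if __name__ == "__main__":
    for name, ratio in RATIOS.items():
        print(f"{name:28s} {ratio:.2f}x")
```

&lt;P&gt;Only the first model comes out below 1.0; the other five take between roughly 1.2 and 2.0 times as long with MKL.&lt;/P&gt;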

&lt;P&gt;I used "-DINTEL_MKL -DINTEL_MKL_ML -DEIGEN_USE_MKL_ALL -DMKL_DIRECT_CALL -march=native -mtune=native" to compile TensorFlow.&lt;/P&gt;

&lt;P&gt;You can find the code &lt;A href="https://github.com/YijinLiu/tf-cpu"&gt;here&lt;/A&gt;. The benchmark data is &lt;A href="https://github.com/YijinLiu/tf-cpu/blob/master/benchmark/classify.cc#L405"&gt;here&lt;/A&gt; and &lt;A href="https://github.com/YijinLiu/tf-cpu/blob/master/benchmark/obj_detect.cc#L457"&gt;here&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Sun, 06 May 2018 06:58:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Tensorflow-performance-w-MKL/m-p/1182041#M29390</guid>
      <dc:creator>Liu__Chao</dc:creator>
      <dc:date>2018-05-06T06:58:58Z</dc:date>
    </item>
    <item>
      <title>Hi Yjl,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Tensorflow-performance-w-MKL/m-p/1182042#M29391</link>
      <description>&lt;P&gt;Hi Yjl,&lt;BR /&gt;
	&lt;BR /&gt;
	Thank you for the report. As I understand it, most of the tests show performance issues with MKL. Let's focus on three of them, for example 1, 3, and 6.&lt;/P&gt;

&lt;P&gt;1. Could you please run export MKL_VERBOSE=1, rerun the tests, and copy the output here?&lt;BR /&gt;
	2. Then run export MKLDNN_VERBOSE=1, rerun them, and copy that output here as well.&lt;/P&gt;
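
&lt;P&gt;A minimal Python sketch of the two runs requested above: it builds one environment per verbose variable and launches the same command under each, saving the output to a log file. MKL_VERBOSE and MKLDNN_VERBOSE are the variables named above; the helper names, the log file names, and the example command are hypothetical, and stderr is merged into the log in case the verbose lines are not all written to standard output.&lt;/P&gt;

```python
import os
import subprocess

# The two environment variables named in the request above.
VERBOSE_VARS = ("MKL_VERBOSE", "MKLDNN_VERBOSE")


def verbose_env(var_name, base_env=None):
    """Return a copy of the environment with one verbose variable set to "1"."""
    env = dict(os.environ if base_env is None else base_env)
    env[var_name] = "1"
    return env


def run_with_verbose(cmd, var_name, log_path):
    """Run cmd with var_name=1 and write stdout and stderr to log_path."""
    with open(log_path, "w") as log:
        result = subprocess.run(
            cmd,
            env=verbose_env(var_name),
            stdout=log,
            stderr=subprocess.STDOUT,
            check=False,
        )
    return result.returncode


if __name__ == "__main__":
    # Hypothetical command: replace with the real benchmark invocation.
    benchmark_cmd = ["./benchmark_model", "--graph=model.pb", "--num_threads=1"]
    for var in VERBOSE_VARS:
        run_with_verbose(benchmark_cmd, var, var.lower() + ".log")
```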

&lt;P&gt;Then let's look at the build process and configuration.&lt;/P&gt;

&lt;P&gt;1. How did you compile TensorFlow? Could you describe your build steps? For example, I suppose you are using the GNU GCC compiler, right?&lt;/P&gt;

&lt;P&gt;&lt;A href="https://software.intel.com/en-us/articles/intel-optimized-tensorflow-installation-guide" target="_blank"&gt;https://software.intel.com/en-us/articles/intel-optimized-tensorflow-installation-guide&lt;/A&gt;&lt;BR /&gt;
	&lt;A href="https://ai.intel.com/tensorflow-optimizations-intel-xeon-scalable-processor/" target="_blank"&gt;https://ai.intel.com/tensorflow-optimizations-intel-xeon-scalable-processor/&lt;/A&gt;&lt;BR /&gt;
	&lt;BR /&gt;
	2. How do you run the benchmark? For example, how do you configure threading?&lt;BR /&gt;
	&lt;A href="https://www.tensorflow.org/performance/performance_guide" target="_blank"&gt;https://www.tensorflow.org/performance/performance_guide&lt;/A&gt;&lt;BR /&gt;
	&lt;BR /&gt;
	Best Regards&lt;/P&gt;

&lt;P&gt;Ying&lt;/P&gt;</description>
      <pubDate>Mon, 07 May 2018 07:03:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Tensorflow-performance-w-MKL/m-p/1182042#M29391</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2018-05-07T07:03:35Z</dc:date>
    </item>
    <item>
      <title>Hi Ying, thanks for the quick</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Tensorflow-performance-w-MKL/m-p/1182043#M29392</link>
      <description>&lt;P&gt;Hi Ying, thanks for the quick response, appreciate that.&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 13.008px;"&gt;I used one thread to run the models:&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;BR style="font-size: 13.008px;" /&gt;
	&lt;SPAN style="font-size: 13.008px;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; tensorflow::SessionOptions sess_opts;&lt;/SPAN&gt;&lt;BR style="font-size: 13.008px;" /&gt;
	&lt;SPAN style="font-size: 13.008px;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; sess_opts.config.set_intra_op_parallelism_threads(1);&lt;/SPAN&gt;&lt;BR style="font-size: 13.008px;" /&gt;
	&lt;SPAN style="font-size: 13.008px;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; sess_opts.config.set_inter_op_parallelism_threads(1);&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;I used GCC to compile tensorflow. I used tensorflow/contrib/makefile with some modifications, mainly adding these defines "&lt;SPAN style="color: rgb(3, 47, 98); font-family: SFMono-Regular, Consolas, &amp;quot;Liberation Mono&amp;quot;, Menlo, Courier, monospace; font-size: 12px; white-space: pre;"&gt;-DINTEL_MKL -DINTEL_MKL_ML -DEIGEN_USE_MKL_ALL -DMKL_DIRECT_CAL -DEIGEN_DONT_PARALLELIZE"&lt;/SPAN&gt;&lt;BR /&gt;
	&lt;SPAN style="font-size: 13.008px;"&gt;I didn't use bazel because I need a static library.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;I just realized that -DINTEL_MKL_ML causes the build to use the MKL implementation instead of the MKL-DNN one, which TensorFlow is supposed&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;to use. I removed it and got much worse performance on&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 12px;"&gt;Mobilenet_v2_1_4_224 (I think it's mainly caused by &lt;/SPAN&gt;_MklFusedBatchNorm; the MKL-DNN version is far too slow&lt;SPAN style="font-size: 12px;"&gt;).&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px;"&gt;Anyway, I ran TensorFlow's benchmark_model to get the logs for you:&lt;BR /&gt;
	-&amp;nbsp;&lt;/SPAN&gt;benchmark_model --graph=testdata/mobilenet_v1_1.0_224_quant_frozen.pb --show_flops --input_layer=input --input_layer_type=float --input_layer_shape=1,224,224,3 --output_layer=MobilenetV1/Predictions/Reshape_1 --num_threads=1&lt;BR /&gt;
	&lt;SPAN style="font-size: 12px;"&gt;-&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;benchmark_model --graph=testdata/mobilenet_v2_1.4_224_frozen.pb --show_flops --input_layer=input --input_layer_type=float --input_layer_shape=1,224,224,3 --output_layer=MobilenetV2/Predictions/Reshape_1 --num_threads=1&lt;/SPAN&gt;&lt;BR /&gt;
	- benchmark_model --graph=testdata/ssd_mobilenet_v2_coco_2018_03_29_frozen.pb --show_flops --input_layer=image_tensor --input_layer_type=uint8 --input_layer_shape=1,1920,1080,3 --output_layer=num_detections,detection_classes,detection_scores,detection_boxes --num_threads=1&lt;/P&gt;

&lt;P&gt;It looks to me like the main culprit is the Conv2D op (replaced by _MklConv2D and _MklConv2DWithBias when using MKL?):&lt;/P&gt;

&lt;TABLE&gt;
&lt;TR&gt;&lt;TH&gt;Model&lt;/TH&gt;&lt;TH&gt;Conv2D&lt;/TH&gt;&lt;TH&gt;_MklConv2D&lt;/TH&gt;&lt;TH&gt;_MklConv2DWithBias&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;mobilenet_v1_1.0_224_quant&lt;/TD&gt;&lt;TD&gt;19.303 ms&lt;/TD&gt;&lt;TD&gt;24.379 ms&lt;/TD&gt;&lt;TD&gt;7.905 ms&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;mobilenet_v2_1.4_224&lt;/TD&gt;&lt;TD&gt;24.969 ms&lt;/TD&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;41.942 ms&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;ssd_mobilenet_v2_coco&lt;/TD&gt;&lt;TD&gt;108.692 ms&lt;/TD&gt;&lt;TD&gt;48.872 ms&lt;/TD&gt;&lt;TD&gt;143.936 ms&lt;/TD&gt;&lt;/TR&gt;
&lt;/TABLE&gt;</description>
      <pubDate>Mon, 07 May 2018 23:42:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Tensorflow-performance-w-MKL/m-p/1182043#M29392</guid>
      <dc:creator>Liu__Chao</dc:creator>
      <dc:date>2018-05-07T23:42:16Z</dc:date>
    </item>
    <item>
      <title>More information: </title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Tensorflow-performance-w-MKL/m-p/1182044#M29393</link>
      <description>&lt;P&gt;More information:&amp;nbsp;&lt;BR /&gt;
	I don't think it's related to how I built TensorFlow. I ran&lt;BR /&gt;
	&lt;EM&gt;&lt;SPAN style="font-size: 1em;"&gt;bazel run --config=mkl --config=opt --config=monolithic&amp;nbsp;&lt;/SPAN&gt;&lt;/EM&gt;&lt;SPAN style="font-size: 1em;"&gt;&lt;EM&gt;//tensorflow/tools/benchmark:benchmark_model&lt;/EM&gt;&lt;BR /&gt;
	and got similar results. The interesting thing is that&lt;/SPAN&gt;&lt;BR /&gt;
	&lt;EM&gt;OMP_NUM_THREADS=1 bazel-bin/tensorflow/tools/benchmark/benchmark_model&lt;/EM&gt;&lt;BR /&gt;
	is two times faster than&lt;BR /&gt;
	&lt;EM&gt;&lt;SPAN style="font-size: 13.008px;"&gt;bazel-bin/tensorflow/tools/benchmark/benchmark_model&lt;/SPAN&gt;&lt;/EM&gt;&lt;/P&gt;
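
&lt;P&gt;The comparison above can be reproduced with a small timing harness. The Python sketch below runs a command with and without OMP_NUM_THREADS=1 and reports the wall-clock time of each run. The function name is a placeholder, and wall-clock time includes process start-up and graph loading, so treat it as a rough check next to the numbers benchmark_model reports itself.&lt;/P&gt;

```python
import os
import subprocess
import time


def time_command(cmd, extra_env=None):
    """Run cmd once and return its wall-clock duration in seconds."""
    env = dict(os.environ)
    env.update(extra_env or {})
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Path taken from the post; add the model flags you normally pass.
    cmd = ["bazel-bin/tensorflow/tools/benchmark/benchmark_model"]
    default_s = time_command(cmd)
    single_s = time_command(cmd, {"OMP_NUM_THREADS": "1"})
    print(f"default env:       {default_s:.2f} s")
    print(f"OMP_NUM_THREADS=1: {single_s:.2f} s")
```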

&lt;P&gt;Again, all these tests were run on an i7-5557U CPU.&lt;/P&gt;</description>
      <pubDate>Tue, 08 May 2018 19:10:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Tensorflow-performance-w-MKL/m-p/1182044#M29393</guid>
      <dc:creator>Liu__Chao</dc:creator>
      <dc:date>2018-05-08T19:10:55Z</dc:date>
    </item>
    <item>
      <title>Hi Yjl,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Tensorflow-performance-w-MKL/m-p/1182045#M29394</link>
      <description>&lt;P&gt;Hi Yjl,&lt;BR /&gt;
	&lt;BR /&gt;
	Thank you for the details. From a quick review, it seems the build mainly had MKL-DNN enabled and used 1 thread. We will look into it.&lt;/P&gt;

&lt;P&gt;Could you please also submit your issue to &lt;A href="https://github.com/intel/mkl-dnn/issues" target="_blank"&gt;https://github.com/intel/mkl-dnn/issues&lt;/A&gt;, where our developers can check the problem directly in a ready environment?&lt;/P&gt;

&lt;P&gt;Best Regards,&lt;BR /&gt;
	Ying&lt;/P&gt;</description>
      <pubDate>Wed, 09 May 2018 01:32:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Tensorflow-performance-w-MKL/m-p/1182045#M29394</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2018-05-09T01:32:48Z</dc:date>
    </item>
    <item>
      <title>Filed https://github.com</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Tensorflow-performance-w-MKL/m-p/1182046#M29395</link>
      <description>&lt;P&gt;Filed &lt;A href="https://github.com/intel/mkl-dnn/issues/234&amp;nbsp;" target="_blank"&gt;https://github.com/intel/mkl-dnn/issues/234&amp;nbsp;&lt;/A&gt;; ..&lt;/P&gt;

&lt;P&gt;More findings:&lt;/P&gt;

&lt;P&gt;1.&amp;nbsp; _MklFusedBatchNorm is slower than FusedBatchNorm: 102 ms vs. 88 ms&lt;/P&gt;

&lt;P&gt;2.&amp;nbsp;_MklConv2DWithBias is slower than&amp;nbsp;&lt;SPAN style="font-size: 13.008px;"&gt;Conv2D + BiasAdd: 42 ms vs. 25 + 3 ms&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;3. MKL introduces several extra operations that are quite expensive, such as _MklInputConversion and&amp;nbsp;_MklToTf.&lt;/P&gt;</description>
      <pubDate>Wed, 09 May 2018 23:00:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Tensorflow-performance-w-MKL/m-p/1182046#M29395</guid>
      <dc:creator>Liu__Chao</dc:creator>
      <dc:date>2018-05-09T23:00:28Z</dc:date>
    </item>
  </channel>
</rss>

