<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Thanks for the additional in Intel® Distribution of OpenVINO™ Toolkit</title>
    <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Inference-Engine-s-classification-sample-batch-performance/m-p/1121903#M7367</link>
    <description>&lt;P&gt;Thanks for the additional data. Fortunately, the dev team has already recognized this behavior from internal testing.&lt;/P&gt;

&lt;P&gt;So, to go back to your original post, the issue is in the Inference Engine (your option #3 above), but we should expect improvements in the next release.&lt;/P&gt;

&lt;P&gt;Will it work for you to proceed with the current performance limitations until the next release is available?&lt;/P&gt;</description>
    <pubDate>Mon, 12 Jun 2017 19:51:23 GMT</pubDate>
    <dc:creator>Jeffrey_M_Intel1</dc:creator>
    <dc:date>2017-06-12T19:51:23Z</dc:date>
    <item>
      <title>Inference Engine's classification sample batch performance</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Inference-Engine-s-classification-sample-batch-performance/m-p/1121899#M7363</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I'd like to ask a question about the classification sample of the Inference Engine. For fp16 precision and batch processing, the average running time looks pretty bad. Why is that?&lt;/P&gt;

&lt;P&gt;So far I have only tested with AlexNet. I first ran the Model Optimizer (MO) to set the precision and batch size, and then I ran the classification sample with the xml and weights generated by MO. See below for details.&lt;/P&gt;

&lt;P&gt;&lt;EM style="font-size: 13.008px;"&gt;&lt;STRONG&gt;Test 1, precision = fp32, batch size = 1.&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;

&lt;P&gt;ModelOptimizer -w ./bvlc_alexnet.caffemodel -p FP32 -d ./deploy_alexnet.prototxt -f 1 -b 1 --target APLK -i&lt;BR /&gt;
	Start working...&lt;BR /&gt;
	Framework plugin: CAFFE&lt;BR /&gt;
	Target type: APLK&lt;BR /&gt;
	Network type: CLASSIFICATION&lt;BR /&gt;
	Batch size: 1&lt;BR /&gt;
	Precision: FP32&lt;BR /&gt;
	Layer fusion: true&lt;BR /&gt;
	Output directory: Artifacts&lt;BR /&gt;
	Custom kernels directory:&amp;nbsp;&lt;BR /&gt;
	Network input normalization: 1&lt;BR /&gt;
	Writing binary data to: Artifacts/AlexNet/AlexNet.bin&lt;/P&gt;

&lt;P&gt;./classification_sample -i ice-creams-227x227.bmp -m ./Artifacts/AlexNet/AlexNet.xml -l ./synset_words.txt -d GPU&lt;BR /&gt;
	InferenceEngine:&amp;nbsp;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;API version ............ 1.0&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;Build .................. 2778&lt;BR /&gt;
	****&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;API version ............ 0.1&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;Build .................. manual-01121&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;Description ....... clDNNPlugin&lt;BR /&gt;
	Average running time of one iteration: 14 ms&lt;/P&gt;

&lt;P style="font-size: 13.008px;"&gt;&lt;EM&gt;&lt;STRONG&gt;&lt;SPAN style="font-weight: 700;"&gt;Test 2, precision = fp16, batch size = 1.&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;

&lt;DIV&gt;
	&lt;P&gt;ModelOptimizer -w ./bvlc_alexnet.caffemodel -p FP16 -d ./deploy_alexnet.prototxt -f 1 -b 1 --target APLK -i&lt;BR /&gt;
		Start working...&lt;BR /&gt;
		Framework plugin: CAFFE&lt;BR /&gt;
		Target type: APLK&lt;BR /&gt;
		Network type: CLASSIFICATION&lt;BR /&gt;
		Batch size: 1&lt;BR /&gt;
		Precision: FP16&lt;BR /&gt;
		Layer fusion: true&lt;BR /&gt;
		Output directory: Artifacts&lt;BR /&gt;
		Custom kernels directory:&amp;nbsp;&lt;BR /&gt;
		Network input normalization: 1&lt;BR /&gt;
		Writing binary data to: Artifacts/AlexNet/AlexNet.bin&lt;/P&gt;

	&lt;P&gt;./classification_sample -i ice-creams-227x227.bmp -m ./Artifacts/AlexNet/AlexNet.xml -l ./synset_words.txt -d GPU&lt;BR /&gt;
		InferenceEngine:&amp;nbsp;&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;API version ............ 1.0&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;Build .................. 2778&lt;BR /&gt;
		****&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;API version ............ 0.1&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;Build .................. manual-01121&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;Description ....... clDNNPlugin&lt;BR /&gt;
		Average running time of one iteration: 9 ms&lt;/P&gt;
&lt;/DIV&gt;

&lt;P style="font-size: 13.008px;"&gt;&lt;EM&gt;&lt;STRONG&gt;&lt;SPAN style="font-weight: 700;"&gt;Test 3, precision = fp32, batch size = 8.&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;

&lt;DIV&gt;
	&lt;P&gt;ModelOptimizer -w ./bvlc_alexnet.caffemodel -p FP32 -d ./deploy_alexnet.prototxt -f 1 -b 8 --target APLK -i&lt;BR /&gt;
		Start working...&lt;BR /&gt;
		Framework plugin: CAFFE&lt;BR /&gt;
		Target type: APLK&lt;BR /&gt;
		Network type: CLASSIFICATION&lt;BR /&gt;
		Batch size: 8&lt;BR /&gt;
		Precision: FP32&lt;BR /&gt;
		Layer fusion: true&lt;BR /&gt;
		Output directory: Artifacts&lt;BR /&gt;
		Custom kernels directory:&amp;nbsp;&lt;BR /&gt;
		Network input normalization: 1&lt;BR /&gt;
		Writing binary data to: Artifacts/AlexNet/AlexNet.bin&lt;/P&gt;

	&lt;P&gt;./classification_sample -i ice-creams-227x227.bmp -i tiger-eyes-227x227.bmp -i cat.bmp -i tiger-227x227.bmp -m ./Artifacts/AlexNet/AlexNet.xml -l ./synset_words.txt -d GPU&lt;BR /&gt;
		InferenceEngine:&amp;nbsp;&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;API version ............ 1.0&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;Build .................. 2778&lt;BR /&gt;
		****&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;API version ............ 0.1&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;Build .................. manual-01121&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp; &amp;nbsp;Description ....... clDNNPlugin&lt;BR /&gt;
		Average running time of one iteration: 52 ms&lt;/P&gt;

	&lt;P&gt;&lt;STRONG&gt;&lt;EM style="font-size: 13.008px;"&gt;&lt;SPAN style="font-weight: 700;"&gt;Test 4, precision = fp16, batch size = 8.&lt;/SPAN&gt;&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;

	&lt;DIV&gt;
		&lt;P&gt;ModelOptimizer -w ./bvlc_alexnet.caffemodel -p FP16 -d ./deploy_alexnet.prototxt -f 1 -b 8 --target APLK -i&lt;BR /&gt;
			Start working...&lt;BR /&gt;
			Framework plugin: CAFFE&lt;BR /&gt;
			Target type: APLK&lt;BR /&gt;
			Network type: CLASSIFICATION&lt;BR /&gt;
			Batch size: 8&lt;BR /&gt;
			Precision: FP16&lt;BR /&gt;
			Layer fusion: true&lt;BR /&gt;
			Output directory: Artifacts&lt;BR /&gt;
			Custom kernels directory:&amp;nbsp;&lt;BR /&gt;
			Network input normalization: 1&lt;BR /&gt;
			Writing binary data to: Artifacts/AlexNet/AlexNet.bin&lt;/P&gt;

		&lt;P&gt;./classification_sample -i ice-creams-227x227.bmp -i tiger-eyes-227x227.bmp -i cat.bmp -i tiger-227x227.bmp -m ./Artifacts/AlexNet/AlexNet.xml -l ./synset_words.txt -d GPU&lt;BR /&gt;
			InferenceEngine:&amp;nbsp;&lt;BR /&gt;
			&amp;nbsp;&amp;nbsp; &amp;nbsp;API version ............ 1.0&lt;BR /&gt;
			&amp;nbsp;&amp;nbsp; &amp;nbsp;Build .................. 2778&lt;BR /&gt;
			****&lt;BR /&gt;
			&amp;nbsp;&amp;nbsp; &amp;nbsp;API version ............ 0.1&lt;BR /&gt;
			&amp;nbsp;&amp;nbsp; &amp;nbsp;Build .................. manual-01121&lt;BR /&gt;
			&amp;nbsp;&amp;nbsp; &amp;nbsp;Description ....... clDNNPlugin&lt;BR /&gt;
			Average running time of one iteration: &lt;STRONG&gt;321&lt;/STRONG&gt; ms&lt;/P&gt;

		&lt;P style="font-size: 13.008px;"&gt;The classification top 10 results were consistent and seemingly correct, so I snipped them for clarity. At precision fp32, the average running time looked normal. However, at fp16, the average running time for batch size 8 (321 ms) is about 36 times that for batch size 1 (9 ms), much worse than running batch size 1 eight times (about 72 ms).&lt;/P&gt;
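
		&lt;P&gt;As a rough check of the arithmetic above, here is a minimal Python sketch that uses only the four averages reported in Tests 1 to 4. The names avg_ms and batch_slowdown are made up for illustration and are not part of the toolkit.&lt;/P&gt;

```python
# Average running time per iteration (ms) from Tests 1-4 above,
# keyed by (precision, batch size).
avg_ms = {
    ("FP32", 1): 14,
    ("FP32", 8): 52,
    ("FP16", 1): 9,
    ("FP16", 8): 321,
}


def batch_slowdown(precision, batch):
    """Ratio of the batched iteration time to the batch-1 iteration time."""
    return avg_ms[(precision, batch)] / avg_ms[(precision, 1)]


for precision in ("FP32", "FP16"):
    ratio = batch_slowdown(precision, 8)
    unbatched = 8 * avg_ms[(precision, 1)]  # cost of 8 separate batch-1 runs
    print(f"{precision}: batch 8 is {ratio:.1f}x batch 1; "
          f"8 unbatched runs would take {unbatched} ms")
```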

		&lt;P&gt;I repeated my tests with different batch sizes, and saw a similar performance trend. For fp16, the batch performance looked really bad.&lt;/P&gt;

		&lt;P&gt;I can think of three possible causes for this behavior:&lt;/P&gt;

		&lt;P&gt;1. I was doing something wrong.&lt;/P&gt;

		&lt;P&gt;2. There was something wrong with the classification sample.&lt;/P&gt;

		&lt;P&gt;3. There was something wrong with the Inference Engine.&lt;/P&gt;

		&lt;P&gt;Could you look into this issue?&lt;/P&gt;

		&lt;P&gt;Thanks,&lt;/P&gt;

		&lt;P&gt;-Robby&amp;nbsp;&lt;/P&gt;
	&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Wed, 07 Jun 2017 22:00:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Inference-Engine-s-classification-sample-batch-performance/m-p/1121899#M7363</guid>
      <dc:creator>RSun9</dc:creator>
      <dc:date>2017-06-07T22:00:01Z</dc:date>
    </item>
    <item>
      <title>Hi Robby,</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Inference-Engine-s-classification-sample-batch-performance/m-p/1121900#M7364</link>
      <description>&lt;P&gt;Hi Robby,&lt;/P&gt;

&lt;P&gt;I've replicated the higher-than-expected execution time for batch size 8 with FP16 for AlexNet classification, and we are investigating.&lt;/P&gt;

&lt;P&gt;However, with a few more data points, you should be able to see that, even with the strange behavior for that single combination, the expected general pattern is there:&lt;/P&gt;

&lt;UL&gt;
	&lt;LI&gt;Higher batch sizes provide better throughput (images per second).&lt;/LI&gt;
	&lt;LI&gt;Lower batch sizes allow you to trade throughput for lower latency.&lt;/LI&gt;
	&lt;LI&gt;FP16 on GPU is roughly 2x the performance of FP32 at the largest batch sizes.&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;For reference, I gathered a snapshot (NOT an official benchmark!) of AlexNet classification rates by running the classification sample on my test machine, which has the CVSDK beta installed. It has an i7-6770HQ processor with Iris Pro Graphics 580.&lt;/P&gt;

&lt;P&gt;Do you see similar patterns if you test more batch sizes?&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;FP16&amp;nbsp;&amp;nbsp;&amp;nbsp; Avg ms&amp;nbsp; ms/img&amp;nbsp; imgs/sec
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 42&amp;nbsp;&amp;nbsp;&amp;nbsp; 42.0&amp;nbsp;&amp;nbsp;&amp;nbsp; 23.8
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 42&amp;nbsp;&amp;nbsp;&amp;nbsp; 21.0&amp;nbsp;&amp;nbsp;&amp;nbsp; 47.6
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 4&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 80&amp;nbsp;&amp;nbsp;&amp;nbsp; 20.0&amp;nbsp;&amp;nbsp;&amp;nbsp; 50.0
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 8&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 144&amp;nbsp;&amp;nbsp;&amp;nbsp; 18.0&amp;nbsp;&amp;nbsp;&amp;nbsp; 55.6
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 16&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 60&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3.8&amp;nbsp;&amp;nbsp; 266.7
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 32&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 65&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2.0&amp;nbsp;&amp;nbsp; 492.3
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 64&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 105&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1.6&amp;nbsp;&amp;nbsp; 609.5
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 128&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 203&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1.6&amp;nbsp;&amp;nbsp; 630.5


FP32&amp;nbsp;&amp;nbsp;&amp;nbsp; Avg ms&amp;nbsp; ms/img&amp;nbsp; imgs/sec
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 18&amp;nbsp;&amp;nbsp;&amp;nbsp; 18.0&amp;nbsp;&amp;nbsp;&amp;nbsp; 55.6
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 18&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 9.0&amp;nbsp;&amp;nbsp; 111.1
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 4&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 25&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 6.3&amp;nbsp;&amp;nbsp; 160.0
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 8&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 32&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 4.0&amp;nbsp;&amp;nbsp; 250.0
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 16&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 51&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3.2&amp;nbsp;&amp;nbsp; 313.7
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 32&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 103&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3.2&amp;nbsp;&amp;nbsp; 310.7
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 64&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 198&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3.1&amp;nbsp;&amp;nbsp; 323.2
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 128&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 397&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3.1&amp;nbsp;&amp;nbsp; 322.4


CPU FP32 Avg ms&amp;nbsp; ms/img&amp;nbsp; imgs/sec
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 29&amp;nbsp;&amp;nbsp;&amp;nbsp; 29.0&amp;nbsp;&amp;nbsp;&amp;nbsp; 34.5
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 28&amp;nbsp;&amp;nbsp;&amp;nbsp; 14.0&amp;nbsp;&amp;nbsp;&amp;nbsp; 71.4
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 4&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 37&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 9.3&amp;nbsp;&amp;nbsp; 108.1
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 8&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 60&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 7.5&amp;nbsp;&amp;nbsp; 133.3
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 16&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 101&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 6.3&amp;nbsp;&amp;nbsp; 158.4
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 32&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 167&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 5.2&amp;nbsp;&amp;nbsp; 191.6
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 64&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 350&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 5.5&amp;nbsp;&amp;nbsp; 182.9
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 128&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 693&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 5.4&amp;nbsp;&amp;nbsp; 184.7
&lt;/PRE&gt;
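
&lt;P&gt;The ms/img and imgs/sec columns in the tables above follow directly from the batch size and the average time per iteration. A minimal Python sketch of that calculation is shown here; the function name per_image_stats is hypothetical, and the sample rows are copied from the FP16 table.&lt;/P&gt;

```python
def per_image_stats(batch_size, avg_ms):
    """Return (ms per image, images per second) for one batched iteration."""
    ms_per_img = avg_ms / batch_size
    imgs_per_sec = batch_size * 1000.0 / avg_ms
    return ms_per_img, imgs_per_sec


# A few (batch size, average ms) pairs taken from the FP16 table above.
for batch_size, avg_ms in [(1, 42), (8, 144), (16, 60), (128, 203)]:
    ms_per_img, imgs_per_sec = per_image_stats(batch_size, avg_ms)
    print(f"{batch_size:4d} {avg_ms:5d} {ms_per_img:6.1f} {imgs_per_sec:7.1f}")
```

&lt;P&gt;Latency for a single request is the full average time per iteration, so a larger batch can raise imgs/sec while also raising the time each image waits for its result.&lt;/P&gt;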

</description>
      <pubDate>Thu, 08 Jun 2017 01:13:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Inference-Engine-s-classification-sample-batch-performance/m-p/1121900#M7364</guid>
      <dc:creator>Jeffrey_M_Intel1</dc:creator>
      <dc:date>2017-06-08T01:13:00Z</dc:date>
    </item>
    <item>
      <title>Hi Jeffrey, thanks for the</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Inference-Engine-s-classification-sample-batch-performance/m-p/1121901#M7365</link>
      <description>&lt;P&gt;Hi Jeffrey, thanks for the confirmation.&lt;/P&gt;

&lt;P&gt;I have only tested a few other batch sizes. In my tests, fp16 seemed to underperform fp32 at other batch sizes too, but my data points were limited. I'll see if I have time to run more tests.&lt;/P&gt;

&lt;P&gt;My test platform has a Core i7-6700 (3.4GHz) with an integrated HD Graphics 530.&lt;/P&gt;

&lt;P&gt;-Robby&lt;/P&gt;</description>
      <pubDate>Thu, 08 Jun 2017 17:49:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Inference-Engine-s-classification-sample-batch-performance/m-p/1121901#M7365</guid>
      <dc:creator>RSun9</dc:creator>
      <dc:date>2017-06-08T17:49:52Z</dc:date>
    </item>
    <item>
      <title>I managed to run tests at the</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Inference-Engine-s-classification-sample-batch-performance/m-p/1121902#M7366</link>
      <description>&lt;P&gt;I managed to run tests at the same data points. I'll just post my results, and let you draw the conclusion ;-)&lt;/P&gt;

&lt;P&gt;Again, my test platform has a Core i7-6700 (3.4GHz) with an integrated HD Graphics 530.&lt;/P&gt;

&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;GPU, FP16&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;

&lt;PRE&gt;batch-size  average-ms  ms/image  images/sec
         1           9       9.0       111.1
         2          84      42.0        23.8
         4         163      40.8        24.5
         8         322      40.3        24.8
        16         118       7.4       135.6
        32         126       3.9       254.0
        64         217       3.4       294.9
       128         431       3.4       297.0
&lt;/PRE&gt;

&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;GPU, FP32&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;

&lt;PRE&gt;batch-size  average-ms  ms/image  images/sec
         1          15      15.0        66.7
         2          24      12.0        83.3
         4          41      10.3        97.6
         8          52       6.5       153.8
        16          93       5.8       172.0
        32         178       5.6       179.8
        64         351       5.5       182.3
       128         700       5.5       182.9
&lt;/PRE&gt;

&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;CPU, FP32&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;

&lt;PRE&gt;batch-size  average-ms  ms/image  images/sec
         1          13      13.0        76.9
         2          28      14.0        71.4
         4          35       8.8       114.3
         8          50       6.3       160.0
        16          82       5.1       195.1
        32         143       4.5       223.8
        64         260       4.1       246.2
       128         506       4.0       253.0
&lt;/PRE&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;
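
&lt;P&gt;One way to compare the two GPU tables above is to divide the FP32 time by the FP16 time at each batch size. The Python sketch below does only that with the average-ms values from this post; the helper name fp16_speedup is an assumption for illustration.&lt;/P&gt;

```python
batch_sizes = [1, 2, 4, 8, 16, 32, 64, 128]
fp16_ms = [9, 84, 163, 322, 118, 126, 217, 431]  # GPU, FP16 average-ms
fp32_ms = [15, 24, 41, 52, 93, 178, 351, 700]    # GPU, FP32 average-ms


def fp16_speedup(fp32, fp16):
    """FP32 time divided by FP16 time; above 1.0 means FP16 is faster."""
    return fp32 / fp16


speedups = {
    b: fp16_speedup(s, h)
    for b, s, h in zip(batch_sizes, fp32_ms, fp16_ms)
}
for b, ratio in speedups.items():
    print(f"batch {b:3d}: FP16 is {ratio:.2f}x the speed of FP32")
```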

&lt;P&gt;-Robby&lt;/P&gt;</description>
      <pubDate>Fri, 09 Jun 2017 00:50:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Inference-Engine-s-classification-sample-batch-performance/m-p/1121902#M7366</guid>
      <dc:creator>RSun9</dc:creator>
      <dc:date>2017-06-09T00:50:00Z</dc:date>
    </item>
    <item>
      <title>Thanks for the additional</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Inference-Engine-s-classification-sample-batch-performance/m-p/1121903#M7367</link>
      <description>&lt;P&gt;Thanks for the additional data. Fortunately, the dev team has already recognized this behavior from internal testing.&lt;/P&gt;

&lt;P&gt;So, to go back to your original post, the issue is in the Inference Engine (your option #3 above), but we should expect improvements in the next release.&lt;/P&gt;

&lt;P&gt;Will it work for you to proceed with the current performance limitations until the next release is available?&lt;/P&gt;</description>
      <pubDate>Mon, 12 Jun 2017 19:51:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Inference-Engine-s-classification-sample-batch-performance/m-p/1121903#M7367</guid>
      <dc:creator>Jeffrey_M_Intel1</dc:creator>
      <dc:date>2017-06-12T19:51:23Z</dc:date>
    </item>
    <item>
      <title>Hi Jeffrey, thanks for the</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Inference-Engine-s-classification-sample-batch-performance/m-p/1121904#M7368</link>
      <description>&lt;P&gt;Hi Jeffrey, thanks for the update. I am glad your team was able to find the real cause so fast.&lt;/P&gt;

&lt;P&gt;I can work with the current version for now, and wait for the next release.&lt;/P&gt;

&lt;P&gt;-Robby&lt;/P&gt;</description>
      <pubDate>Mon, 12 Jun 2017 20:14:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Inference-Engine-s-classification-sample-batch-performance/m-p/1121904#M7368</guid>
      <dc:creator>RSun9</dc:creator>
      <dc:date>2017-06-12T20:14:42Z</dc:date>
    </item>
  </channel>
</rss>

