<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi Michael, in Graphics</title>
    <link>https://community.intel.com/t5/Graphics/Radix-Sort-OpenGL-Compute-Shader/m-p/1074177#M87353</link>
    <description>&lt;P&gt;Hi Michael,&lt;/P&gt;

&lt;P&gt;Thanks for the response.&lt;BR /&gt;
	I understand that checking intermediate results can help, and I tried to dump them.&lt;BR /&gt;
	I can tell that the errors appear randomly in some blocks of data, and not necessarily in the first step of the algorithm.&lt;BR /&gt;
	Finally, I don't have adequate tools to debug GLSL code, and dumping large data sets is very slow.&lt;/P&gt;

&lt;P&gt;I can implement the algorithm in OpenCL or something else if that would help to find the root of the problem.&lt;BR /&gt;
	Can you suggest a better approach?&lt;/P&gt;

&lt;P&gt;Regards&lt;BR /&gt;
	Oleg Ageev&lt;BR /&gt;
	&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 15 Apr 2016 22:25:00 GMT</pubDate>
    <dc:creator>Oleg_A_1</dc:creator>
    <dc:date>2016-04-15T22:25:00Z</dc:date>
    <item>
      <title>Radix Sort - OpenGL Compute Shader</title>
      <link>https://community.intel.com/t5/Graphics/Radix-Sort-OpenGL-Compute-Shader/m-p/1074172#M87348</link>
      <description>&lt;P&gt;Hello, I have implemented a radix sort using OpenGL compute shaders:&lt;BR /&gt;
	&lt;A href="https://github.com/cNoNim/radix-sort" target="_blank"&gt;https://github.com/cNoNim/radix-sort&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;But I have some problems with Intel GPA, using the code from this branch:&lt;BR /&gt;
	&lt;A href="https://github.com/cNoNim/radix-sort/tree/simple" target="_blank"&gt;https://github.com/cNoNim/radix-sort/tree/simple&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;I have an AMD GPU and the algorithm works perfectly on it, but it fails on NVIDIA and Intel.&lt;BR /&gt;
	I have already posted the question on NVIDIA DevTalk:&lt;BR /&gt;
	&lt;A href="https://devtalk.nvidia.com/default/topic/916998/cuda-programming-and-performance/radix-sort-opengl-compute-shader/"&gt;https://devtalk.nvidia.com/default/topic/916998/cuda-programming-and-performance/radix-sort-opengl-compute-shader/&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;In the branch above I run a key-only sort on an increasing sequence of unsigned integers,&lt;BR /&gt;
	and I get:&lt;/P&gt;

&lt;PRE&gt;OpenGL 4.3.0 - Build 10.18.14.4332
        Intel
        Intel(R) HD Graphics 4600
count   67108864 elapsed   33442256 ticks 3.34422560 sec speed   20067086 per sec        - FAILED
count   33554432 elapsed   30997096 ticks 3.09970960 sec speed   10825024 per sec        - FAILED
count   16777216 elapsed   15963538 ticks 1.59635380 sec speed   10509710 per sec        - FAILED
count    8388608 elapsed    7868773 ticks 0.78687730 sec speed   10660630 per sec        - FAILED
count    4194304 elapsed    3936232 ticks 0.39362320 sec speed   10655632 per sec        - FAILED
count    2097152 elapsed    2028931 ticks 0.20289310 sec speed   10336241 per sec        - FAILED
count    1048576 elapsed    1044249 ticks 0.10442490 sec speed   10041436 per sec        - FAILED
count     524288 elapsed     536672 ticks 0.05366720 sec speed    9769244 per sec        - FAILED
count     262144 elapsed     282782 ticks 0.02827820 sec speed    9270179 per sec        - FAILED
count     131072 elapsed     157309 ticks 0.01573090 sec speed    8332136 per sec        - FAILED
count      65536 elapsed     101779 ticks 0.01017790 sec speed    6439049 per sec        - FAILED
count      32768 elapsed      74790 ticks 0.00747900 sec speed    4381334 per sec        - FAILED
count      16384 elapsed      61738 ticks 0.00617380 sec speed    2653795 per sec        - FAILED
count       8192 elapsed      64297 ticks 0.00642970 sec speed    1274087 per sec        - FAILED
count       4096 elapsed      57811 ticks 0.00578110 sec speed     708515 per sec        - FAILED
count       2048 elapsed      98717 ticks 0.00987170 sec speed     207461 per sec        - FAILED
count       1024 elapsed      52636 ticks 0.00526360 sec speed     194543 per sec        - FAILED&lt;/PRE&gt;

&lt;P&gt;Reference result from the AMD GPU:&lt;/P&gt;

&lt;TABLE&gt;
	&lt;TBODY&gt;
		&lt;TR&gt;
			&lt;TD&gt;
				&lt;DIV style="background: rgb(242, 242, 242); margin: 4px; padding: 5px;"&gt;
					&lt;PRE&gt;OpenGL 4.5.13399 Compatibility Profile Context 16.201.1151.1007
        ATI Technologies Inc.
        AMD Radeon HD 6700M Series
count   67108864 elapsed   49687237 ticks 4.96872370 sec speed   13506257 per sec        - PASSED
count   33554432 elapsed   29207774 ticks 2.92077740 sec speed   11488185 per sec        - PASSED
count   16777216 elapsed   14705172 ticks 1.47051720 sec speed   11409057 per sec        - PASSED
count    8388608 elapsed    7428293 ticks 0.74282930 sec speed   11292780 per sec        - PASSED
count    4194304 elapsed    3587719 ticks 0.35877190 sec speed   11690726 per sec        - PASSED
count    2097152 elapsed    1815771 ticks 0.18157710 sec speed   11549650 per sec        - PASSED
count    1048576 elapsed     934891 ticks 0.09348910 sec speed   11216024 per sec        - PASSED
count     524288 elapsed     631452 ticks 0.06314520 sec speed    8302895 per sec        - PASSED
count     262144 elapsed     266753 ticks 0.02667530 sec speed    9827218 per sec        - PASSED
count     131072 elapsed     142823 ticks 0.01428230 sec speed    9177233 per sec        - PASSED
count      65536 elapsed      92056 ticks 0.00920560 sec speed    7119144 per sec        - PASSED
count      32768 elapsed      66577 ticks 0.00665770 sec speed    4921819 per sec        - PASSED
count      16384 elapsed      51747 ticks 0.00517470 sec speed    3166173 per sec        - PASSED
count       8192 elapsed      47519 ticks 0.00475190 sec speed    1723942 per sec        - PASSED
count       4096 elapsed      42577 ticks 0.00425770 sec speed     962021 per sec        - PASSED
count       2048 elapsed      40735 ticks 0.00407350 sec speed     502761 per sec        - PASSED
count       1024 elapsed      41904 ticks 0.00419040 sec speed     244368 per sec        - PASSED
COMPLETE&lt;/PRE&gt;
				&lt;/DIV&gt;
			&lt;/TD&gt;
		&lt;/TR&gt;
	&lt;/TBODY&gt;
&lt;/TABLE&gt;

&lt;P&gt;Can somebody explain/help?&lt;/P&gt;

&lt;P&gt;Why do I get this behavior, and what is wrong in my code?&lt;BR /&gt;
	When I debug the algorithm on a simple case, an array of 1024 elements, I get a wrong intermediate result, but not in the first stage of the radix sort.&lt;BR /&gt;
	Sometimes I get a correct result and the test is PASSED for 1024 elements, but other array sizes fail, and if I run several tests in a row as above, I always get FAILED.&lt;/P&gt;</description>
      <pubDate>Tue, 16 Feb 2016 09:42:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Graphics/Radix-Sort-OpenGL-Compute-Shader/m-p/1074172#M87348</guid>
      <dc:creator>Oleg_A_1</dc:creator>
      <dc:date>2016-02-16T09:42:30Z</dc:date>
    </item>
    <item>
      <title>Hi Oleg, </title>
      <link>https://community.intel.com/t5/Graphics/Radix-Sort-OpenGL-Compute-Shader/m-p/1074173#M87349</link>
      <description>&lt;P&gt;Hi Oleg,&amp;nbsp;&lt;/P&gt;

&lt;P&gt;This is most likely a question for the Intel GPU driver team. &amp;nbsp;Let me move this thread to their forum. &amp;nbsp;&lt;/P&gt;

&lt;P&gt;When we get to the point where your shader is passing and you want to analyze it with GPA, I can help then.&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Best,&lt;/P&gt;

&lt;P&gt;Seth&lt;/P&gt;</description>
      <pubDate>Tue, 16 Feb 2016 21:09:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Graphics/Radix-Sort-OpenGL-Compute-Shader/m-p/1074173#M87349</guid>
      <dc:creator>Seth_S_Intel</dc:creator>
      <dc:date>2016-02-16T21:09:44Z</dc:date>
    </item>
    <item>
      <title>Hi Seth,</title>
      <link>https://community.intel.com/t5/Graphics/Radix-Sort-OpenGL-Compute-Shader/m-p/1074174#M87350</link>
      <description>&lt;P&gt;Hi Seth,&lt;/P&gt;

&lt;P&gt;Thanks for the response. I can explain parts of the code if needed.&lt;/P&gt;

&lt;P&gt;I use Visual Studio 2015 Community Edition to build the project, and the code uses some macros to embed the GLSL code.&lt;/P&gt;

&lt;P&gt;That may complicate debugging, but I can split the code up if needed.&lt;/P&gt;

&lt;P&gt;I also have another question. On the AMD GPU the algorithm works perfectly with BARRIER defined as&lt;/P&gt;

&lt;P&gt;#define BARRIER groupMemoryBarrier()&lt;/P&gt;

&lt;P&gt;But on NVIDIA this fails every time, so I tried defining BARRIER as&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 13.008px; line-height: 19.512px;"&gt;#define BARRIER groupMemoryBarrier(); barrier()&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;But I don't understand why the first definition is not enough.&lt;/P&gt;

&lt;P&gt;I also tried placing glMemoryBarrier between the compute shader dispatches, although on AMD the code works without it.&lt;/P&gt;

&lt;P&gt;If you fix or explain the driver's behavior, maybe you could also check these places.&lt;/P&gt;

&lt;P&gt;Regards,&lt;/P&gt;

&lt;P&gt;Oleg&lt;/P&gt;</description>
      <pubDate>Tue, 16 Feb 2016 23:10:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Graphics/Radix-Sort-OpenGL-Compute-Shader/m-p/1074174#M87350</guid>
      <dc:creator>Oleg_A_1</dc:creator>
      <dc:date>2016-02-16T23:10:00Z</dc:date>
    </item>
    <item>
      <title>I can tell that sometimes it</title>
      <link>https://community.intel.com/t5/Graphics/Radix-Sort-OpenGL-Compute-Shader/m-p/1074175#M87351</link>
      <description>&lt;P&gt;I can tell that it sometimes works on the master branch.&lt;BR /&gt;
	The master branch uses a key-value sort with signed integer keys.&lt;BR /&gt;
	In the master branch I also tried implementing the algorithm in C++ AMP for comparison.&lt;/P&gt;

&lt;PRE&gt;OpenGL 4.3.0 - Build 10.18.14.4332
        Intel
        Intel(R) HD Graphics 4600
count    1048576 elapsed    2304529 ticks  0.2304529 sec speed    4550066 per sec - FAILED
count     524288 elapsed     605071 ticks  0.0605071 sec speed    8664900 per sec - FAILED
count     262144 elapsed     335384 ticks  0.0335384 sec speed    7816234 per sec - FAILED
count     131072 elapsed     197828 ticks  0.0197828 sec speed    6625553 per sec - PASSED
count      65536 elapsed     130822 ticks  0.0130822 sec speed    5009554 per sec - PASSED
count      32768 elapsed      92115 ticks  0.0092115 sec speed    3557292 per sec - PASSED
count      16384 elapsed      75177 ticks  0.0075177 sec speed    2179389 per sec - PASSED
count       8192 elapsed      73296 ticks  0.0073296 sec speed    1117659 per sec - PASSED
count       4096 elapsed      66101 ticks  0.0066101 sec speed     619657 per sec - PASSED
count       2048 elapsed      56784 ticks  0.0056784 sec speed     360664 per sec - PASSED
count       1024 elapsed      52732 ticks  0.0052732 sec speed     194189 per sec - PASSED&lt;/PRE&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 18 Feb 2016 09:27:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Graphics/Radix-Sort-OpenGL-Compute-Shader/m-p/1074175#M87351</guid>
      <dc:creator>Oleg_A_1</dc:creator>
      <dc:date>2016-02-18T09:27:39Z</dc:date>
    </item>
    <item>
      <title>Hi Oleg,</title>
      <link>https://community.intel.com/t5/Graphics/Radix-Sort-OpenGL-Compute-Shader/m-p/1074176#M87352</link>
      <description>&lt;P&gt;Hi Oleg,&lt;/P&gt;

&lt;P&gt;We were able to reproduce the issue on 5th and 6th Generation Core processors with the latest 15.40 driver. We also saw the issue with NVIDIA's latest drivers (364.47).&lt;/P&gt;

&lt;P&gt;We tested a few options: compiler (IGC/USC), SIMD mode, disabling compiler optimizations, and disabling low precision. None of them had an effect on the issue; it still failed. We think it could be a synchronization problem: the compute program uses the groupMemoryBarrier() and barrier() functions for synchronization. You are using 4 compute programs, so checking intermediate results before going to the next dispatch may help.&lt;/P&gt;
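
&lt;P&gt;To illustrate the synchronization point, here is a minimal sketch of a shared-memory exchange inside one work group. It is not taken from the repository: the buffer name, the binding, and the work-group size of 256 are assumptions, and the dispatch is assumed to cover the buffer exactly. groupMemoryBarrier() only orders memory accesses within the work group and does not make invocations wait for each other, whereas barrier() synchronizes execution, so a BARRIER macro that protects shared-memory reads generally needs a memory barrier followed by barrier():&lt;/P&gt;

&lt;PRE&gt;
```glsl
#version 430
layout(local_size_x = 256) in;

// Hypothetical buffer for illustration; its length is assumed to be a multiple of 256.
layout(std430, binding = 0) buffer Data { uint data[]; };

shared uint scratch[256];

// Memory barrier for shared variables, then an execution barrier.
#define BARRIER memoryBarrierShared(); barrier()

void main() {
    uint lid = gl_LocalInvocationID.x;
    uint gid = gl_GlobalInvocationID.x;

    // Every invocation publishes one element to shared memory.
    scratch[lid] = data[gid];
    BARRIER;
    // Only after the barrier is it safe to read a slot written by another invocation.
    data[gid] = scratch[255u - lid];
}
```
&lt;/PRE&gt;

&lt;P&gt;With only a memory barrier in BARRIER, an invocation may reach the read before its neighbour has executed the write. Whether that race shows up depends on how the hardware schedules the work group, which may explain results that differ between vendors.&lt;/P&gt;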

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;-Michael&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 11 Mar 2016 17:38:37 GMT</pubDate>
      <guid>https://community.intel.com/t5/Graphics/Radix-Sort-OpenGL-Compute-Shader/m-p/1074176#M87352</guid>
      <dc:creator>Michael_C_Intel2</dc:creator>
      <dc:date>2016-03-11T17:38:37Z</dc:date>
    </item>
    <item>
      <title>Hi Michael,</title>
      <link>https://community.intel.com/t5/Graphics/Radix-Sort-OpenGL-Compute-Shader/m-p/1074177#M87353</link>
      <description>&lt;P&gt;Hi Michael,&lt;/P&gt;

&lt;P&gt;Thanks for the response.&lt;BR /&gt;
	I understand that checking intermediate results can help, and I tried to dump them.&lt;BR /&gt;
	I can tell that the errors appear randomly in some blocks of data, and not necessarily in the first step of the algorithm.&lt;BR /&gt;
	Finally, I don't have adequate tools to debug GLSL code, and dumping large data sets is very slow.&lt;/P&gt;

&lt;P&gt;I can implement the algorithm in OpenCL or something else if that would help to find the root of the problem.&lt;BR /&gt;
	Can you suggest a better approach?&lt;/P&gt;

&lt;P&gt;Regards&lt;BR /&gt;
	Oleg Ageev&lt;BR /&gt;
	&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 15 Apr 2016 22:25:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Graphics/Radix-Sort-OpenGL-Compute-Shader/m-p/1074177#M87353</guid>
      <dc:creator>Oleg_A_1</dc:creator>
      <dc:date>2016-04-15T22:25:00Z</dc:date>
    </item>
    <item>
      <title>Hi Oleg,</title>
      <link>https://community.intel.com/t5/Graphics/Radix-Sort-OpenGL-Compute-Shader/m-p/1074178#M87354</link>
      <description>&lt;P&gt;Hi Oleg,&lt;/P&gt;

&lt;P&gt;The observed problem is random errors in blocks of data.&lt;/P&gt;

&lt;P&gt;To root-cause the problem, you should divide it into smaller parts:&lt;/P&gt;

&lt;OL&gt;
	&lt;LI&gt;Test only the data flow (read/write) between compute shader dispatches. This can help identify synchronization issues or other memory-related problems (invalid layout, size, bindings, etc.) and isolate them from the sorting algorithm.&lt;/LI&gt;
	&lt;LI&gt;If the data flow is correct, check the sorting algorithm.&lt;/LI&gt;
&lt;/OL&gt;
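
&lt;P&gt;As a rough sketch of step 1, each sorting shader could be temporarily replaced by a pass-through that only moves data. The buffer names, bindings, and the count uniform below are assumptions for illustration, not the repository's code:&lt;/P&gt;

&lt;PRE&gt;
```glsl
#version 430
layout(local_size_x = 256) in;

// Hypothetical ping-pong buffers; the host swaps the two bindings between dispatches.
layout(std430, binding = 0) readonly buffer Src { uint src[]; };
layout(std430, binding = 1) writeonly buffer Dst { uint dst[]; };

// Number of valid elements, set by the host (assumed name).
uniform uint count;

void main() {
    uint i = gl_GlobalInvocationID.x;
    // Skip invocations past the end of the data.
    if (i >= count) return;
    // Copy and increment, so every completed pass leaves a visible trace.
    dst[i] = src[i] + 1u;
}
```
&lt;/PRE&gt;

&lt;P&gt;If this is dispatched N times with the two buffers swapped each time and glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT) issued between dispatches, every element should end up increased by exactly N. Any other value would point at the data flow (bindings, sizes, or missing barriers) rather than at the sorting logic.&lt;/P&gt;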

&lt;P&gt;-Michael&lt;/P&gt;</description>
      <pubDate>Thu, 21 Apr 2016 20:18:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Graphics/Radix-Sort-OpenGL-Compute-Shader/m-p/1074178#M87354</guid>
      <dc:creator>Michael_C_Intel2</dc:creator>
      <dc:date>2016-04-21T20:18:53Z</dc:date>
    </item>
  </channel>
</rss>

