<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: About Bus Transaction in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/About-Bus-Transaction/m-p/965826#M5400</link>
    <description>&lt;P&gt;Persepone -&lt;/P&gt;
&lt;P&gt;Are you running the same number of threads as before? They could be scheduled on the same physical processor (on a dual-processor HT platform), which is exactly what you are trying to avoid with your division of data. Even with four threads on a dual-processor HT-enabled system, two threads will be assigned to each physical processor, where they share or even split the cache and other processor resources.&lt;/P&gt;
&lt;P&gt;-- clay&lt;/P&gt;</description>
    <pubDate>Wed, 07 Apr 2004 22:30:13 GMT</pubDate>
    <dc:creator>ClayB</dc:creator>
    <dc:date>2004-04-07T22:30:13Z</dc:date>
    <item>
      <title>About Bus Transaction</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/About-Bus-Transaction/m-p/965825#M5399</link>
      <description>&lt;DIV&gt;To improve performance, I am trying to minimize bus transactions.&lt;BR /&gt;I read the passage below from "IA-32 Intel Architecture Optimization", Chapter 7,&lt;BR /&gt;and tried to measure its effect, but I am not sure how to demonstrate the performance difference.&lt;BR /&gt;When I turned on Hyper-Threading, performance went down.&lt;BR /&gt;Is anything in my code incorrect?&lt;BR /&gt;I also wonder which factors affect performance.&lt;BR /&gt;The test source below is my code. Please review it.&lt;/DIV&gt;
&lt;DIV&gt;System spec:&lt;BR /&gt;H/W: IBM xSeries 225&lt;BR /&gt;OS: Red Hat Linux 9&lt;BR /&gt;Compiler: icc 8.0&lt;/DIV&gt;
&lt;DIV&gt;From "IA-32 Intel Architecture Optimization", Chapter 7:&lt;/DIV&gt;
&lt;DIV&gt;Minimize Sharing of Data between Physical Processors&lt;BR /&gt;When two threads are executing on two physical processors and sharing&lt;BR /&gt;data, reading from or writing to shared data usually involves several bus&lt;BR /&gt;transactions (including snooping, request for ownership changes, and&lt;BR /&gt;sometimes fetching data across the bus). A thread accessing a large&lt;BR /&gt;amount of shared memory is not likely to scale with processor clock&lt;BR /&gt;rates.&lt;/DIV&gt;
&lt;DIV&gt;User/Source Coding Rule 31. (H impact, M generality) Minimize the&lt;BR /&gt;sharing of data between threads that execute on different physical processors&lt;BR /&gt;sharing a common bus.&lt;/DIV&gt;
&lt;DIV&gt;One technique to minimize sharing of data is to copy data to local stack&lt;BR /&gt;variables if it is to be accessed repeatedly over an extended period. If&lt;BR /&gt;necessary, results from multiple threads can be combined later by&lt;BR /&gt;writing them back to a shared memory location. This approach can also&lt;BR /&gt;minimize time spent to synchronize access to shared data.&lt;/DIV&gt;
&lt;DIV&gt;&lt;BR /&gt;Test Source&lt;BR /&gt;+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++&lt;BR /&gt;#include &amp;lt;stdio.h&amp;gt;&lt;BR /&gt;#include &amp;lt;stdlib.h&amp;gt;&lt;BR /&gt;#include &amp;lt;pthread.h&amp;gt;&lt;BR /&gt;#include &amp;lt;sys/time.h&amp;gt;&lt;/DIV&gt;
&lt;DIV&gt;// For Debug&lt;BR /&gt;#ifdef DEBUG&lt;BR /&gt;#define DPRINTF(arg) printf arg&lt;BR /&gt;#else&lt;BR /&gt;#define DPRINTF(arg)&lt;BR /&gt;#endif&lt;/DIV&gt;
&lt;DIV&gt;&lt;BR /&gt;#define NUM_PROC 4&lt;BR /&gt;#define MAXLEN 1024*1024&lt;/DIV&gt;
&lt;DIV&gt;pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;&lt;/DIV&gt;
&lt;DIV&gt;int A[MAXLEN];&lt;BR /&gt;int B[MAXLEN];&lt;BR /&gt;int C[MAXLEN];&lt;BR /&gt;int full_cnt = 1;&lt;/DIV&gt;
&lt;DIV&gt;void* thread_fn(void *arg) {&lt;BR /&gt;int *t1, *t2, *t3;&lt;BR /&gt;long count = 0;&lt;BR /&gt;int i;&lt;/DIV&gt;
&lt;DIV&gt;#ifdef NORMAL&lt;BR /&gt;for (i=0; i&amp;lt;full_cnt; i++) {&lt;BR /&gt;pthread_mutex_lock(&amp;amp;mutex);&lt;BR /&gt;for (count=0; count&amp;lt;MAXLEN; count++) {&lt;BR /&gt;C[count] += A[count] + B[count];&lt;BR /&gt;}&lt;BR /&gt;pthread_mutex_unlock(&amp;amp;mutex);&lt;BR /&gt;}&lt;BR /&gt;#endif&lt;/DIV&gt;
&lt;DIV&gt;#ifdef FAST &lt;BR /&gt;t1 = (int*)malloc(sizeof(int)*MAXLEN);&lt;BR /&gt;t2 = (int*)malloc(sizeof(int)*MAXLEN);&lt;BR /&gt;t3 = (int*)malloc(sizeof(int)*MAXLEN);&lt;/DIV&gt;
&lt;DIV&gt;for (count=0; count&amp;lt;MAXLEN; count++) {&lt;BR /&gt;t1[count] = A[count];&lt;BR /&gt;t2[count] = B[count];&lt;BR /&gt;t3[count] = t1[count] + t2[count];&lt;BR /&gt;}&lt;/DIV&gt;
&lt;DIV&gt;for (i=0; i&amp;lt;full_cnt; i++) {&lt;BR /&gt;pthread_mutex_lock(&amp;amp;mutex);&lt;BR /&gt;for (count=0; count&amp;lt;MAXLEN; count++) {&lt;BR /&gt;C[count] += t3[count];&lt;BR /&gt;}&lt;BR /&gt;pthread_mutex_unlock(&amp;amp;mutex);&lt;BR /&gt;}&lt;/DIV&gt;
&lt;DIV&gt;free(t1);&lt;BR /&gt;free(t2);&lt;BR /&gt;free(t3);&lt;BR /&gt;#endif&lt;BR /&gt;return NULL;&lt;BR /&gt;}&lt;/DIV&gt;
&lt;DIV&gt;int main(int argc, char *argv[])&lt;BR /&gt;{&lt;BR /&gt;pthread_t tid[NUM_PROC];&lt;/DIV&gt;
&lt;DIV&gt;struct timeval start, end, result;&lt;BR /&gt;long i;&lt;BR /&gt;long j;&lt;/DIV&gt;
&lt;DIV&gt;if (argc &amp;lt; 2) {&lt;BR /&gt;printf("usage: false_none count\n");&lt;BR /&gt;return 0;&lt;BR /&gt;}&lt;/DIV&gt;
&lt;DIV&gt;full_cnt = atoi(argv[1]);&lt;/DIV&gt;
&lt;DIV&gt;for (j=0; j&amp;lt; MAXLEN; j++) {&lt;BR /&gt;A[j] = 1;&lt;BR /&gt;B[j] = 1;&lt;BR /&gt;C[j] = 0;&lt;BR /&gt;}&lt;/DIV&gt;
&lt;DIV&gt;for (i=0; i&amp;lt;NUM_PROC; i++) {&lt;BR /&gt;pthread_create(&amp;amp;tid[i], NULL, thread_fn, NULL);&lt;BR /&gt;}&lt;/DIV&gt;
&lt;DIV&gt;&lt;BR /&gt;gettimeofday(&amp;amp;start, NULL);&lt;/DIV&gt;
&lt;DIV&gt;for (i=0; i&amp;lt;NUM_PROC; i++)&lt;BR /&gt;pthread_join(tid[i], NULL);&lt;/DIV&gt;
&lt;DIV&gt;gettimeofday(&amp;amp;end, NULL);&lt;/DIV&gt;
&lt;DIV&gt;timersub(&amp;amp;end, &amp;amp;start, &amp;amp;result);&lt;BR /&gt;printf("%ld sec, %ld usec\n", result.tv_sec, result.tv_usec);&lt;/DIV&gt;
&lt;DIV&gt;#ifdef DEBUG&lt;BR /&gt;for (i=255; i&amp;lt; 275; i++)&lt;BR /&gt;printf("%d",C[i]);&lt;/DIV&gt;
&lt;DIV&gt;printf("\n");&lt;BR /&gt;#endif&lt;/DIV&gt;
&lt;DIV&gt;return 0;&lt;BR /&gt;}&lt;BR /&gt;&lt;/DIV&gt;</description>
      <pubDate>Fri, 26 Mar 2004 21:17:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/About-Bus-Transaction/m-p/965825#M5399</guid>
      <dc:creator>icicle</dc:creator>
      <dc:date>2004-03-26T21:17:38Z</dc:date>
    </item>
    <item>
      <title>Re: About Bus Transaction</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/About-Bus-Transaction/m-p/965826#M5400</link>
      <description>&lt;P&gt;Persepone -&lt;/P&gt;
&lt;P&gt;Are you running the same number of threads as before? They could be scheduled on the same physical processor (on a dual-processor HT platform), which is exactly what you are trying to avoid with your division of data. Even with four threads on a dual-processor HT-enabled system, two threads will be assigned to each physical processor, where they share or even split the cache and other processor resources.&lt;/P&gt;
&lt;P&gt;-- clay&lt;/P&gt;</description>
      <pubDate>Wed, 07 Apr 2004 22:30:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/About-Bus-Transaction/m-p/965826#M5400</guid>
      <dc:creator>ClayB</dc:creator>
      <dc:date>2004-04-07T22:30:13Z</dc:date>
    </item>
  </channel>
</rss>

