<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Which thread on which processor - can it be controlled (sch in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Which-thread-on-which-processor-can-it-be-controlled-scheduled/m-p/888835#M3673</link>
    <description>&lt;BR /&gt;Hi there,&lt;BR /&gt;&lt;BR /&gt;Has anyone tried the code above and found that it takes twice as long on dual core as on single core? (On my side: gcc, Fedora and x86, as mentioned in the first post.)&lt;BR /&gt;&lt;BR /&gt;Wouldn't you expect it to take half the single-core time rather than double?&lt;BR /&gt;&lt;BR /&gt;Are there any settings to take care of (related to gcc, the machine BIOS, NUMA, etc.)?&lt;BR /&gt;&lt;BR /&gt;:-)&lt;BR /&gt;&lt;BR /&gt;-BJ_CW</description>
    <pubDate>Mon, 22 Jan 2007 14:31:16 GMT</pubDate>
    <dc:creator>bj_cw</dc:creator>
    <dc:date>2007-01-22T14:31:16Z</dc:date>
    <item>
      <title>Which thread on which processor - can it be controlled (scheduled)?</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Which-thread-on-which-processor-can-it-be-controlled-scheduled/m-p/888830#M3668</link>
      <description>I have an Intel Pentium D (dual core) running Fedora kernel 2.6.15-1.2054_FC5SMP.&lt;BR /&gt;&lt;BR /&gt;Is the following code good enough for checking which thread is running on which core?&lt;BR /&gt;Here, I see that all the threads run on only one core.&lt;BR /&gt;How can I set the affinity of 25 threads to one core and the other 25 threads to another core?&lt;BR /&gt;&lt;BR /&gt;The code is here:&lt;BR /&gt;/***************************************************/&lt;BR /&gt;#define _GNU_SOURCE /* needed for pthread_getaffinity_np() */&lt;BR /&gt;#include &amp;lt;stdio.h&amp;gt;&lt;BR /&gt;#include &amp;lt;pthread.h&amp;gt;&lt;BR /&gt;&lt;BR /&gt;pthread_mutex_t mutex_printf;&lt;BR /&gt;&lt;BR /&gt;void *fillmem(void *arg)&lt;BR /&gt;{&lt;BR /&gt; int i;&lt;BR /&gt; int j;&lt;BR /&gt; unsigned long mask;&lt;BR /&gt; for(i=0; i&amp;lt;200; i++)&lt;BR /&gt; {&lt;BR /&gt; pthread_mutex_lock (&amp;amp;mutex_printf);&lt;BR /&gt; pthread_getaffinity_np(pthread_self(), sizeof(mask), (cpu_set_t *) &amp;amp;mask);&lt;BR /&gt; printf("%lu ", mask); /* Print the affinity mask: the set of CPUs this thread MAY run on, not the CPU it is currently running on */&lt;BR /&gt; pthread_mutex_unlock (&amp;amp;mutex_printf);&lt;BR /&gt;&lt;BR /&gt; for(j=0; j&amp;lt;20000; j++); /* Delay */&lt;BR /&gt; }&lt;BR /&gt; return NULL;&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;int main(int argc, char *argv[])&lt;BR /&gt;{&lt;BR /&gt; int j;&lt;BR /&gt; pthread_t my_thread[50];&lt;BR /&gt; unsigned long mask = 1;&lt;BR /&gt;&lt;BR /&gt; pthread_mutex_init(&amp;amp;mutex_printf, NULL);&lt;BR /&gt;&lt;BR /&gt; for(j=0;j&amp;lt;50;j++) /* Loop to fire threads */&lt;BR /&gt; {&lt;BR /&gt; pthread_create(&amp;amp;my_thread[j], NULL, fillmem, NULL);&lt;BR /&gt; }&lt;BR /&gt;&lt;BR /&gt; for(j=0;j&amp;lt;50;j++)&lt;BR /&gt; pthread_join(my_thread[j], NULL);&lt;BR /&gt;&lt;BR /&gt; pthread_mutex_destroy(&amp;amp;mutex_printf);&lt;BR /&gt;&lt;BR /&gt; pthread_getaffinity_np(pthread_self(), sizeof(mask), (cpu_set_t *) &amp;amp;mask);&lt;BR /&gt; printf("%lu\n", mask);&lt;BR /&gt; return 0;&lt;BR /&gt;}&lt;BR /&gt;/*********************************************************/&lt;BR /&gt;&lt;BR /&gt;How can I set the affinity of 25 threads to one core and the other 25 threads to another core?&lt;BR /&gt;(I tried &lt;I&gt;&lt;B&gt;pthread_setaffinity_np()&lt;/B&gt;&lt;/I&gt;, &lt;I&gt;&lt;B&gt;sched_setaffinity()&lt;/B&gt;&lt;/I&gt; ...)&lt;BR /&gt;Can you give me some sample code?&lt;BR /&gt;&lt;BR /&gt;Thanks. :-)&lt;BR /&gt;BJ_CW</description>
      <pubDate>Wed, 17 Jan 2007 18:40:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Which-thread-on-which-processor-can-it-be-controlled-scheduled/m-p/888830#M3668</guid>
      <dc:creator>bj_cw</dc:creator>
      <dc:date>2007-01-17T18:40:59Z</dc:date>
    </item>
    <item>
      <title>Re: Which thread on which processor - can it be controlled (sch</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Which-thread-on-which-processor-can-it-be-controlled-scheduled/m-p/888831#M3669</link>
      <description>&lt;P&gt;Hi there,&lt;/P&gt;
&lt;P&gt;Well, I'm not an expert, but I'd suggest having a look at the "&lt;A href="http://www.intel.com/cd/ids/developer/asmo-na/eng/dc/code/275339.htm"&gt;Detecting Multi-Core Processor Topology in an IA-32 Platform&lt;/A&gt;" document. Their sample shows how to retrieve the number of cores, and it also gets the thread affinity mask associated with each core. Then it's just a matter of using the proper affinity mask for each group of threads.&lt;/P&gt;
&lt;P&gt;/david&lt;/P&gt;</description>
      <pubDate>Thu, 18 Jan 2007 02:40:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Which-thread-on-which-processor-can-it-be-controlled-scheduled/m-p/888831#M3669</guid>
      <dc:creator>dpotages</dc:creator>
      <dc:date>2007-01-18T02:40:14Z</dc:date>
    </item>
    <item>
      <title>Re: Which thread on which processor - can it be controlled (sch</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Which-thread-on-which-processor-can-it-be-controlled-scheduled/m-p/888832#M3670</link>
      <description>&lt;P&gt;(Thanks David)&lt;/P&gt;
&lt;P&gt;But that code uses native assembly instructions, and I'm developing a portable app. Please consider the following code:&lt;/P&gt;
&lt;P&gt;/******************/&lt;BR /&gt;#include &amp;lt;stdio.h&amp;gt;&lt;BR /&gt;#include &amp;lt;time.h&amp;gt;&lt;BR /&gt;#include &amp;lt;pthread.h&amp;gt;&lt;/P&gt;
&lt;P&gt;void *th1(void *arg)&lt;BR /&gt;{&lt;BR /&gt;int i;&lt;BR /&gt;int j;&lt;BR /&gt;int k;&lt;BR /&gt;for(i=0; i&amp;lt;10000;i++)&lt;BR /&gt; for(j=0; j&amp;lt;10000;j++)&lt;BR /&gt; for(k=0; k&amp;lt;100;k++);&lt;BR /&gt;return NULL;&lt;BR /&gt;}&lt;/P&gt;
&lt;P&gt;void *th2(void *arg)&lt;BR /&gt;{&lt;BR /&gt;int i;&lt;BR /&gt;int j;&lt;BR /&gt;int k;&lt;BR /&gt;for(i=0; i&amp;lt;10000;i++)&lt;BR /&gt; for(j=0; j&amp;lt;10000;j++)&lt;BR /&gt; for(k=0; k&amp;lt;100;k++);&lt;BR /&gt;return NULL;&lt;BR /&gt;}&lt;/P&gt;
&lt;P&gt;int main()&lt;BR /&gt;{&lt;BR /&gt; clock_t initial, final, seconds;&lt;BR /&gt; pthread_t my_thread1, my_thread2;&lt;/P&gt;
&lt;P&gt; initial = clock ();&lt;/P&gt;
&lt;P&gt; pthread_create(&amp;amp;my_thread1, NULL, th1, NULL);&lt;/P&gt;
&lt;P&gt; pthread_create(&amp;amp;my_thread2, NULL, th2, NULL);&lt;/P&gt;
&lt;P&gt; pthread_join(my_thread2, NULL);&lt;BR /&gt; pthread_join(my_thread1, NULL);&lt;/P&gt;
&lt;P&gt; final = clock();&lt;BR /&gt; printf("time = %lf\n", (final-initial)/(double)CLOCKS_PER_SEC );&lt;BR /&gt;}&lt;BR /&gt;/******************/&lt;/P&gt;
&lt;P&gt;This code prints a time of about 46 seconds on the single-core machine and about 96 seconds on the dual core. (Sometimes on the dual core it prints about 46 seconds.) Why doesn't it execute in about 23 seconds? Or how can it be tweaked to run in 23 seconds?&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Thanks, :-)&lt;/P&gt;
&lt;P&gt;BJ_CW&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 18 Jan 2007 19:00:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Which-thread-on-which-processor-can-it-be-controlled-scheduled/m-p/888832#M3670</guid>
      <dc:creator>bj_cw</dc:creator>
      <dc:date>2007-01-18T19:00:04Z</dc:date>
    </item>
    <item>
      <title>Re: Which thread on which processor - can it be controlled (sch</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Which-thread-on-which-processor-can-it-be-controlled-scheduled/m-p/888833#M3671</link>
      <description>If those threads run so fast, it indicates your compiler has optimized away those loops. You would need some operations in the loops that the compiler doesn't recognize as do-nothing. In that case, you are measuring only the time required to set up the threads, which necessarily increases with the number of threads.</description>
      <pubDate>Thu, 18 Jan 2007 19:12:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Which-thread-on-which-processor-can-it-be-controlled-scheduled/m-p/888833#M3671</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2007-01-18T19:12:43Z</dc:date>
    </item>
    <item>
      <title>Re: Which thread on which processor - can it be controlled (sch</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Which-thread-on-which-processor-can-it-be-controlled-scheduled/m-p/888834#M3672</link>
      <description>&lt;P&gt;(Hi tim18)&lt;/P&gt;
&lt;P&gt;I assumed that the clock() calls used here do not interfere with the thread create, execute and join phases. Is this assumption wrong?&lt;/P&gt;
&lt;P&gt;Are both threads (th1 and th2 in the code above) executing on two different cores of the Pentium D? If so, what is the best way to measure the execution time?&lt;/P&gt;
&lt;P&gt;Or, how can I confirm that th1 and th2 are running on two different cores?&lt;/P&gt;
&lt;P&gt;(The for loops should be good enough not to be recognized as do-nothing, since no optimization level is set.)&lt;/P&gt;
&lt;P&gt;-BJ_CW.&lt;/P&gt;</description>
      <pubDate>Thu, 18 Jan 2007 21:04:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Which-thread-on-which-processor-can-it-be-controlled-scheduled/m-p/888834#M3672</guid>
      <dc:creator>bj_cw</dc:creator>
      <dc:date>2007-01-18T21:04:41Z</dc:date>
    </item>
    <item>
      <title>Re: Which thread on which processor - can it be controlled (sch</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Which-thread-on-which-processor-can-it-be-controlled-scheduled/m-p/888835#M3673</link>
      <description>&lt;BR /&gt;Hi there,&lt;BR /&gt;&lt;BR /&gt;Has anyone tried the code above and found that it takes twice as long on dual core as on single core? (On my side: gcc, Fedora and x86, as mentioned in the first post.)&lt;BR /&gt;&lt;BR /&gt;Wouldn't you expect it to take half the single-core time rather than double?&lt;BR /&gt;&lt;BR /&gt;Are there any settings to take care of (related to gcc, the machine BIOS, NUMA, etc.)?&lt;BR /&gt;&lt;BR /&gt;:-)&lt;BR /&gt;&lt;BR /&gt;-BJ_CW</description>
      <pubDate>Mon, 22 Jan 2007 14:31:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Which-thread-on-which-processor-can-it-be-controlled-scheduled/m-p/888835#M3673</guid>
      <dc:creator>bj_cw</dc:creator>
      <dc:date>2007-01-22T14:31:16Z</dc:date>
    </item>
    <item>
      <title>Re: Which thread on which processor - can it be controlled (sch</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Which-thread-on-which-processor-can-it-be-controlled-scheduled/m-p/888836#M3674</link>
      <description>&lt;P&gt;Hi,&lt;BR /&gt;&lt;BR /&gt;One more point to add:&lt;BR /&gt;&lt;BR /&gt;The single-core and dual-core machines I'm using have exactly the same hardware. How? They are the same Intel Pentium D (dual core) machine, running&lt;BR /&gt;Fedora kernel 2.6.15-1.2054_FC5 for single core, and&lt;BR /&gt;Fedora kernel 2.6.15-1.2054_FC5SMP for dual core.&lt;/P&gt;&lt;P&gt;Now, when I run one of the programs above on the single-core setup it takes "x" seconds, and on the dual-core setup it takes "2x" seconds. (Shouldn't it take "x/2" seconds?)&lt;BR /&gt;Why this performance degradation?&lt;BR /&gt;Do you get a similar result on your machine?&lt;BR /&gt;&lt;BR /&gt;What do you suggest to get "x/2" performance, if "x/2" is the right thing to expect?&lt;BR /&gt;&lt;BR /&gt;-BJ_CW&lt;/P&gt;</description>
      <pubDate>Wed, 24 Jan 2007 16:51:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Which-thread-on-which-processor-can-it-be-controlled-scheduled/m-p/888836#M3674</guid>
      <dc:creator>bj_cw</dc:creator>
      <dc:date>2007-01-24T16:51:05Z</dc:date>
    </item>
    <item>
      <title>Re: Which thread on which processor - can it be controlled (sch</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Which-thread-on-which-processor-can-it-be-controlled-scheduled/m-p/888837#M3675</link>
      <description>&lt;P&gt;BJ_CW,&lt;/P&gt;
&lt;P&gt;I took the liberty of modifying your source to use OpenMP in lieu of pthreads, and ran it on my Windows Server 2003 system with 4 processors. Your pthread results should be similar, assuming your threads are actually starting as you intend them to.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Courier New" size="2"&gt;#include &lt;STDIO.H&gt;&lt;BR /&gt;#include &lt;TIME.H&gt;&lt;BR /&gt;#include &lt;OMP.H&gt;&lt;/OMP.H&gt;&lt;/TIME.H&gt;&lt;/STDIO.H&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;PRE&gt;void th1()&lt;BR /&gt;{&lt;BR /&gt;int i;&lt;BR /&gt;int j;&lt;BR /&gt;int k;&lt;BR /&gt;for(i=0; i&amp;lt;10000;i++)&lt;BR /&gt; for(j=0; j&amp;lt;10000;j++)&lt;BR /&gt; for(k=0; k&amp;lt;100;k++);&lt;BR /&gt;}&lt;/PRE&gt;&lt;PRE&gt;void th2()&lt;BR /&gt;{&lt;BR /&gt;int i;&lt;BR /&gt;int j;&lt;BR /&gt;int k;&lt;BR /&gt;for(i=0; i&amp;lt;10000;i++)&lt;BR /&gt; for(j=0; j&amp;lt;10000;j++)&lt;BR /&gt; for(k=0; k&amp;lt;100;k++);&lt;BR /&gt;}&lt;/PRE&gt;&lt;PRE&gt;void th1P4();// forward reference to variant on th1&lt;BR /&gt;void th2P4();// forward reference to variant on th2&lt;/PRE&gt;&lt;PRE&gt;int main()&lt;BR /&gt;{&lt;BR /&gt;clock_t initial, final, seconds;&lt;BR /&gt;// Single Thread&lt;BR /&gt;printf("Begin single thread test...
");&lt;BR /&gt;initial = clock ();&lt;BR /&gt;th1();&lt;BR /&gt;th2();&lt;BR /&gt;final = clock();&lt;BR /&gt;printf("time = %lf

", (final-initial)/(double)CLOCKS_PER_SEC );&lt;BR /&gt;&lt;BR /&gt;printf("Begin two thread test...
");&lt;BR /&gt;initial = clock ();&lt;BR /&gt;#pragma omp parallel sections num_threads(2)&lt;BR /&gt;{&lt;BR /&gt; #pragma omp section&lt;BR /&gt; {&lt;BR /&gt; th1();&lt;BR /&gt; }&lt;BR /&gt; #pragma omp section&lt;BR /&gt; {&lt;BR /&gt; th2();&lt;BR /&gt; }&lt;BR /&gt;}&lt;BR /&gt;final = clock();&lt;BR /&gt;printf("time = %lf

", (final-initial)/(double)CLOCKS_PER_SEC );&lt;BR /&gt;&lt;BR /&gt;printf("Begin Four thread test...
");&lt;BR /&gt;initial = clock ();&lt;BR /&gt;th1P4();&lt;BR /&gt;th2P4();&lt;BR /&gt;final = clock();&lt;BR /&gt;printf("time = %lf

", (final-initial)/(double)CLOCKS_PER_SEC );&lt;BR /&gt;}&lt;/PRE&gt;&lt;PRE&gt;void th1P4()&lt;BR /&gt;{&lt;BR /&gt;int i;&lt;BR /&gt;int j;&lt;BR /&gt;int k;&lt;BR /&gt;#pragma omp parallel for num_threads(4) private(i, j, k)&lt;BR /&gt;for(i=0; i&amp;lt;10000;i++)&lt;BR /&gt; for(j=0; j&amp;lt;10000;j++)&lt;BR /&gt; for(k=0; k&amp;lt;100;k++);&lt;BR /&gt;}&lt;/PRE&gt;&lt;PRE&gt;void th2P4()&lt;BR /&gt;{&lt;BR /&gt;int i;&lt;BR /&gt;int j;&lt;BR /&gt;int k;&lt;BR /&gt;#pragma omp parallel for num_threads(4) private(i, j, k)&lt;BR /&gt;for(i=0; i&amp;lt;10000;i++)&lt;BR /&gt; for(j=0; j&amp;lt;10000;j++)&lt;BR /&gt; for(k=0; k&amp;lt;100;k++);&lt;BR /&gt;}&lt;/PRE&gt;&lt;PRE&gt;------------- Output --------------------&lt;/PRE&gt;&lt;PRE&gt;Begin single thread test...&lt;BR /&gt;time = 53.343000&lt;/PRE&gt;&lt;PRE&gt;Begin two thread test...&lt;BR /&gt;time = 26.703000&lt;/PRE&gt;&lt;PRE&gt;Begin Four thread test...&lt;BR /&gt;time = 13.375000&lt;/PRE&gt;&lt;PRE&gt;Jim Dempsey&lt;/PRE&gt;&lt;PRE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 24 Jan 2007 23:48:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Which-thread-on-which-processor-can-it-be-controlled-scheduled/m-p/888837#M3675</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2007-01-24T23:48:12Z</dc:date>
    </item>
    <item>
      <title>Re: Which thread on which processor - can it be controlled (sch</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Which-thread-on-which-processor-can-it-be-controlled-scheduled/m-p/888838#M3676</link>
      <description>&lt;P&gt;Use the "time" command and check the "user" and "real" times reported. On 2.6.x Redhat Linux kernels, clock() reports "user" time. This is the sum of CPU time used by all threads and can be greater than "real" time -- which is the wallclock time. &lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;On 2.4.x Redhat Linux kernels, clock() reports "real" time -- i.e., true wallclock time. &lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Here's an example that should allow you to reclaim your sanity. "Matmul kernel wall clock time" is just the delta from reading clock() before and after entering the parallel region.&lt;/P&gt;
&lt;P&gt;On my Pentium D box (2 threads) running a Redhat 2.6 kernel, notice that what clock() reports is very close to the "user" value reported by "time":&lt;/P&gt;
&lt;P&gt;$ cat /proc/version&lt;BR /&gt;Linux version 2.6.9-11.ELsmp (&lt;A href="mailto:bhcompile@crowe.devel.redhat.com"&gt;bhcompile@crowe.devel.redhat.com&lt;/A&gt;) (gcc version 3.4.3 20050227 (Red Hat 3.4.3-22)) #1 SMP Fri May 20 18:25:30 EDT 2005&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;$ icc -V&lt;BR /&gt;Intel C Compiler for Intel EM64T-based applications, Version 9.1 Build 20070109 Package ID: l_cc_c_9.1.046&lt;BR /&gt;Copyright (C) 1985-2007 Intel Corporation. All rights reserved.&lt;/P&gt;
&lt;P&gt;$ icc -openmp matmul_clock.cpp &amp;amp;&amp;amp; time ./a.out&lt;/P&gt;
&lt;P&gt;Using clock() for wall clock time&lt;BR /&gt;Problem size: c(900,3600) = a(900,1800)*b(1800,3600)&lt;BR /&gt;Calculating product 5 time(s)&lt;BR /&gt;We are using 2 thread(s)...&lt;/P&gt;
&lt;P&gt;Matmul kernel wall clock time = 30.45 sec&lt;BR /&gt;Wall clock time/thread = 15.225 sec&lt;BR /&gt;Expected value for each matrix element is 1620900&lt;BR /&gt;Checking that all 3240000 elements of c[i][j] = 1620900...done&lt;/P&gt;
&lt;P&gt;===&amp;gt;&amp;gt;&amp;gt; Solution Validates &amp;lt;&amp;lt;&amp;lt;===&lt;/P&gt;
&lt;P&gt;real 0m15.346s&lt;BR /&gt;user 0m30.454s&lt;BR /&gt;sys 0m0.055s&lt;BR /&gt;$&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;Now, compare that to my hyperthreaded DP Xeon server (4 threads), running a Redhat 2.4 kernel -- you will see that clock() (Matmul kernel wall clock time) is very close to time's "real" time -- and that the "user" time is about 4x the "real" time:&lt;/P&gt;
&lt;P&gt;$ cat /proc/version&lt;BR /&gt;Linux version 2.4.21-20.EL (&lt;A href="mailto:bhcompile@dolly.build.redhat.com"&gt;bhcompile@dolly.build.redhat.com&lt;/A&gt;) (gcc version 3.2.3 20030502 (Red Hat Linux 3.2.3-42)) #1 SMP Wed Aug 18 20:34:58 EDT 2004&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;$ icc -V&lt;BR /&gt;Intel C Compiler for Intel EM64T-based applications, Version 9.1 Build 20070109 Package ID: l_cc_c_9.1.046&lt;BR /&gt;Copyright (C) 1985-2007 Intel Corporation. All rights reserved.&lt;/P&gt;
&lt;P&gt;$ icc -openmp matmul_clock.cpp &amp;amp;&amp;amp; time ./a.out&lt;/P&gt;
&lt;P&gt;Using clock() for wall clock time&lt;BR /&gt;Problem size: c(900,3600) = a(900,1800)*b(1800,3600)&lt;BR /&gt;Calculating product 5 time(s)&lt;BR /&gt;We are using 4 thread(s)...&lt;/P&gt;
&lt;P&gt;Matmul kernel wall clock time = 17.64 sec&lt;BR /&gt;Wall clock time/thread = 4.41 sec&lt;BR /&gt;Expected value for each matrix element is 1620900&lt;BR /&gt;Checking that all 3240000 elements of c[i][j] = 1620900...done&lt;/P&gt;
&lt;P&gt;===&amp;gt;&amp;gt;&amp;gt; Solution Validates &amp;lt;&amp;lt;&amp;lt;===&lt;/P&gt;
&lt;P&gt;real 0m18.089s&lt;BR /&gt;user 1m10.160s&lt;BR /&gt;sys 0m0.090s&lt;BR /&gt;$&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;Best Regards,&lt;/P&gt;
&lt;P&gt;Patrick Kennedy&lt;/P&gt;
&lt;P&gt;Intel Developer Support&lt;/P&gt;</description>
      <pubDate>Thu, 25 Jan 2007 06:56:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Which-thread-on-which-processor-can-it-be-controlled-scheduled/m-p/888838#M3676</guid>
      <dc:creator>pbkenned1</dc:creator>
      <dc:date>2007-01-25T06:56:57Z</dc:date>
    </item>
    <item>
      <title>Re: Which thread on which processor - can it be controlled (sch</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Which-thread-on-which-processor-can-it-be-controlled-scheduled/m-p/888839#M3677</link>
      <description>&lt;BR /&gt;(Thanks Patrick Kennedy)&lt;BR /&gt;&lt;BR /&gt;Please consider the following source:&lt;BR /&gt;/* &lt;A href="http://www.csce.uark.edu/~aapon/courses/os/examples/another.c" target="_blank"&gt;http://www.csce.uark.edu/~aapon/courses/os/examples/another.c&lt;/A&gt; */&lt;BR /&gt;/* I've modified it slightly - BJ_CW */&lt;BR /&gt;&lt;BR /&gt;/**************************************************/&lt;BR /&gt;/* Another thread example. This one shows that */&lt;BR /&gt;/* pthreads in Linux can use both processors in */&lt;BR /&gt;/* a dual-processor Pentium. */&lt;BR /&gt;/* */&lt;BR /&gt;/* Usage: a.out &amp;lt;num threads&amp;gt; */&lt;BR /&gt;/* */&lt;BR /&gt;/* To compile me in Linux type: */&lt;BR /&gt;/* gcc -o another another.c -lpthread */&lt;BR /&gt;/**************************************************/&lt;BR /&gt;&lt;BR /&gt;#include &amp;lt;pthread.h&amp;gt;&lt;BR /&gt;#include &amp;lt;stdio.h&amp;gt;&lt;BR /&gt;#include &amp;lt;stdlib.h&amp;gt;&lt;BR /&gt;&lt;BR /&gt;#define MAX_THREADS 10&lt;BR /&gt;&lt;BR /&gt;int sum; /* this data is shared by the thread(s) */&lt;BR /&gt;void *runner(void * param);&lt;BR /&gt;&lt;BR /&gt;int main(int argc, char *argv[])&lt;BR /&gt;{&lt;BR /&gt; int num_threads, i;&lt;BR /&gt; pthread_t tid[MAX_THREADS]; /* the thread identifiers */&lt;BR /&gt; pthread_attr_t attr; /* set of thread attributes */&lt;BR /&gt;&lt;BR /&gt; if (argc != 2) {&lt;BR /&gt; fprintf(stderr, "usage: a.out &amp;lt;integer value&amp;gt;\n");&lt;BR /&gt; exit(3);&lt;BR /&gt; }&lt;BR /&gt;&lt;BR /&gt; if (atoi(argv[1]) &amp;lt;= 0) {&lt;BR /&gt; fprintf(stderr,"%d must be &amp;gt; 0\n", atoi(argv[1]));&lt;BR /&gt; exit(1);&lt;BR /&gt; }&lt;BR /&gt;&lt;BR /&gt; if (atoi(argv[1]) &amp;gt; MAX_THREADS) {&lt;BR /&gt; fprintf(stderr,"%d must be &amp;lt;= %d\n", atoi(argv[1]), MAX_THREADS);&lt;BR /&gt; exit(2);&lt;BR /&gt; }&lt;BR /&gt;&lt;BR /&gt; num_threads = atoi(argv[1]);&lt;BR /&gt; printf("The number of threads is %d\n", num_threads);&lt;BR /&gt;&lt;BR /&gt; /* get the default attributes */&lt;BR /&gt; pthread_attr_init(&amp;amp;attr);&lt;BR /&gt;&lt;BR /&gt; /* create the threads */&lt;BR /&gt; for (i=0; i&amp;lt;num_threads; i++) {&lt;BR /&gt; pthread_create(&amp;amp;tid[i], &amp;amp;attr, runner, (void *)(long) i);&lt;BR /&gt; printf("Creating thread number %d, tid=%lu\n", i, (unsigned long) tid[i]);&lt;BR /&gt; }&lt;BR /&gt;&lt;BR /&gt; /* now wait for the threads to exit */&lt;BR /&gt; for (i=0; i&amp;lt;num_threads; i++) {&lt;BR /&gt; pthread_join(tid[i], NULL);&lt;BR /&gt; }&lt;BR /&gt; return 0;&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;/* The thread will begin control in this function */&lt;BR /&gt;void *runner(void * param)&lt;BR /&gt;{&lt;BR /&gt; int i;&lt;BR /&gt; int threadnumber = (int)(long) param;&lt;BR /&gt; for (i=0; i&amp;lt;1000; i++) printf("Thread number=%d, i=%d\n", threadnumber, i);&lt;BR /&gt; pthread_exit(0);&lt;BR /&gt;}&lt;BR /&gt;/*************************************/&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;When I used the command:&lt;BR /&gt;$ time ./a.out 10 &amp;gt; a.txt&lt;BR /&gt;&lt;BR /&gt;the result was worse for dual core than for single core.&lt;BR /&gt;Single core result:&lt;BR /&gt;&lt;BR /&gt;real 0m0.008s&lt;BR /&gt;user 0m0.004s&lt;BR /&gt;sys 0m0.004s&lt;BR /&gt;&lt;BR /&gt;or&lt;BR /&gt;&lt;BR /&gt;real 0m0.008s&lt;BR /&gt;user 0m0.008s&lt;BR /&gt;sys 0m0.000s&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Dual core result:&lt;BR /&gt;&lt;BR /&gt;real 0m0.016s&lt;BR /&gt;user 0m0.008s&lt;BR /&gt;sys 0m0.020s&lt;BR /&gt;&lt;BR /&gt;or&lt;BR /&gt;&lt;BR /&gt;real 0m0.009s&lt;BR /&gt;user 0m0.008s&lt;BR /&gt;sys 0m0.000s&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;The 'real' time for dual core is longer than for single core.&lt;BR /&gt;Performance seems to degrade with dual core instead of improving.&lt;BR /&gt;Is there something wrong?&lt;BR /&gt;Or is something missing?&lt;BR /&gt;Do you get a similar result on your machine?&lt;BR /&gt;Can you explain the reason?&lt;BR /&gt;&lt;BR /&gt;(The configuration of the system is the same as above:&lt;BR /&gt;Machine: Intel Pentium D (dual core).&lt;BR /&gt;Fedora kernel: 2.6.15-1.2054_FC5 for single core.&lt;BR /&gt;Fedora kernel: 2.6.15-1.2054_FC5SMP for dual core.&lt;BR /&gt;)&lt;BR /&gt;&lt;BR /&gt;-BJ_CW</description>
      <pubDate>Tue, 30 Jan 2007 14:33:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Which-thread-on-which-processor-can-it-be-controlled-scheduled/m-p/888839#M3677</guid>
      <dc:creator>bj_cw</dc:creator>
      <dc:date>2007-01-30T14:33:03Z</dc:date>
    </item>
    <item>
      <title>Re: Which thread on which processor - can it be controlled (sch</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Which-thread-on-which-processor-can-it-be-controlled-scheduled/m-p/888840#M3678</link>
      <description>&lt;P&gt;printf is a serialized function (on MP it performs an enter criticla section, prints, exit critical section). And display output rate isn't infinite. With more threads butting into each other the code will take longer to run.&lt;/P&gt;
&lt;P&gt;Insert some compute-only code in your runner function:&lt;/P&gt;&lt;PRE&gt;void *runner(void * param)&lt;BR /&gt;{&lt;BR /&gt; int i;&lt;BR /&gt; int threadnumber = (int)(long) param;&lt;BR /&gt; printf("Begin Thread number=%d\n", threadnumber);&lt;BR /&gt; for (i=0; i&amp;lt;1000000; i++) /* compute only, no I/O inside the loop */&lt;BR /&gt; if((double)i == 0.5) break;&lt;BR /&gt; printf("End Thread number=%d\n", threadnumber);&lt;BR /&gt; pthread_exit(0);&lt;BR /&gt;}&lt;BR /&gt;&lt;/PRE&gt;&lt;PRE&gt;Jim Dempsey&lt;/PRE&gt;</description>
      <pubDate>Wed, 31 Jan 2007 01:45:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Which-thread-on-which-processor-can-it-be-controlled-scheduled/m-p/888840#M3678</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2007-01-31T01:45:31Z</dc:date>
    </item>
  </channel>
</rss>

