Hyperthreading and affinity

alef_dos · ‎07-25-2009

I run a program and measure its execution time. I have an Intel Core i7 (4 cores, 2-way Hyper-Threading).

I use affinity. The time when I run the program with 2 threads on 1 core (so it is using Hyper-Threading) is the same as when I run it with 1 thread on each of 2 cores.

Is this possible, or am I doing something wrong? I thought that on 1 core the time would be higher, because the core has to switch between the 2 threads.

aazue · ‎07-25-2009

Hi,
You asked whether this is possible or whether you are doing something wrong.
I think you are not doing anything wrong.

Is your operating system Linux or Microsoft Windows?

I think you would observe a real difference if you started several processes at the same time; the percentage drop in performance can serve as a reference. Personally, I use this check to qualify specific machines (several cores and processors) for engineering quality control when required.
Several processes can be started as input, but the output is generally a single result, so using 1 or 12 processors does not show a significant benefit in a test with only one process. Also, your program does not have top priority, so the result is not exact while all the other system processes are running at the same time.
If you do not observe a difference, do not conclude too easily that there is no benefit; measuring this is a very hard task.
Best regards

alef_dos · ‎07-25-2009

Thanks for the answer.
My operating system is Windows.

I use a low-level call to set the affinity: kmp_set_affinity_mask_proc(i, &mask).

Does anybody know whether this call works on Linux, or which low-level calls set the affinity on Linux?

TimP · ‎07-25-2009

The kmp calls would be the same for the corresponding Intel Linux library.
It would be unusual to find 2 threads running as fast on 2 logical processors of the same core as on 2 cores when the threads compete for shared resources. If it's a question of the speed of switching between threads, the single-core case could have an advantage.

aazue · ‎07-25-2009

Hi,
Tim wrote that the kmp calls would be the same for the corresponding Intel Linux library.
Yes and no: the semaphore rules enforced by the kernel are not the same on Linux and Microsoft Windows when you control affinity to physical processors or threads.
You can get better results with either operating system, depending on the task programmed.
The advantage of Linux is that you have the source of most libraries, so you can understand how to operate with fewer potential conflicts.
Until both theory and practice can be shown, I agree with Tim: generally the same potential result.

Some time ago I made this test with Linux and GCC (I suppose it is the same with ICC?).

alef_dos,
Can you do the same with the appropriate Microsoft syntax, to see whether affinity also locks OpenMP there?

Thanks,
Best regards
(I cut and pasted part of an older exchange.)

A minimal example
/* g++-4.3 -Wall -fopenmp -lm -O2 -m32 -Wno-write-strings -ftree-vrp -ftracer -fpredictive-commoning -fivopts -ftree-vectorize -march=pentiumpro -mtune=pentiumpro -fomit-frame-pointer -pipe omptest1.cc -o omptest1 */

/* For C-style code (for interested users):
unsigned long mask = <mask of the chosen processor>;
if (sched_setaffinity(0, sizeof(mask), &mask) < 0) ...
*/

#include <cstdio>    // perror
#include <iostream>  // std::cout
#include <sched.h>   // cpu_set_t, CPU_ZERO, CPU_SET, sched_setaffinity
#include "omp.h"

int main()
{
    // Restrict the whole process (pid 0 = calling process) to logical CPU 3.
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(3,&mask); // PROCESSOR 4 RESERVED
    if (sched_setaffinity(0, sizeof(cpu_set_t), &mask) < 0)
    {
        perror("sched_setaffinity");
    }

    // Each iteration prints its index, then spins in a placeholder work loop.
    #pragma omp parallel for schedule(guided,2)
    for (int i = 0; i <= 10; i++)
    {
        std::cout << i << std::endl;
        for (int j = 0; j <= 10000000; j++)
        {
            // ...............
        }
    }
}

Result with sched_setaffinity disabled:
debian:/# ./omptest1
48
105

2
3
6
7

9
0
1

Result with sched_setaffinity enabled:
debian:/# ./omptest1
0
1
2
3
4
5
6
7
8
9
10

Do threads not work with a simple sched_setaffinity?

jimdempseyatthecove · ‎07-26-2009

CPU_ZERO(&mask);
CPU_SET(3,&mask); // PROCESSOR 4 RESERVED
^^^^
if (sched_setaffinity(0, sizeof(cpu_set_t), &mask) < 0)

This runs only on CPU 3 (CPUs 0, 1, and 2 are reserved).

Also, "CPU" in this context means a hardware thread (logical processor).

To run on CPUs 0, 1, and 2 and reserve CPU 3 instead, set those bits:

CPU_ZERO(&mask);
CPU_SET(0,&mask); // USE PROCESSOR 0
CPU_SET(1,&mask); // USE PROCESSOR 1
CPU_SET(2,&mask); // USE PROCESSOR 2
// CPU 3 (and higher) RESERVED
if (sched_setaffinity(0, sizeof(cpu_set_t), &mask) < 0)

Jim Dempsey

alef_dos · ‎07-26-2009

Thanks for the code. I will test it on Linux.

On Windows I set the affinity like this (for example):

#include <omp.h>  // Intel's omp.h also declares the kmp_* affinity extensions

#define N_THREADS 2

int main ()
{
    omp_set_num_threads(N_THREADS);

    #pragma omp parallel
    {
        int tnum;

        kmp_affinity_mask_t mask;
        kmp_create_affinity_mask(&mask);

        tnum = omp_get_thread_num();

        if (tnum == 0)
        {
            kmp_set_affinity_mask_proc(0, &mask); // thread 0: select processor 0
        }

        if (tnum == 1)
        {
            kmp_set_affinity_mask_proc(7, &mask); // thread 1: select processor 7
        }

        // kmp_set_affinity_mask_proc only sets a bit in the mask;
        // kmp_set_affinity binds the calling thread to it.
        kmp_set_affinity(&mask);
    }
    .......
    .......
    return 0;
}

I used this code to set the affinity in a simple matrix multiplication program and it works correctly, but in another, more complex program it does not work.

aazue · ‎07-26-2009

Hi Jim and alef_dos,

Sorry for my poor command of your language; my //comment was worded in a way that led to a different understanding.
Do not despair of my small head: I will need at least 5 years on the forum before I show progress.
I also hope that the Google or Bing translators make progress.

Jim,
int main()
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(3,&mask); // index 3 is physical processor 4 (counting from #0); it is reserved for this task
    // I want only processor 4 to be used here, not the other physical processors 1 2 3 5 6 7 8 (those are reserved)

Choosing any other single specific processor from the set gives the same problem.

    if (sched_setaffinity(0, sizeof(cpu_set_t), &mask) < 0) // -1 means physical processor 4 was not accepted
    {
        perror("sched_setaffinity");
    }
An ordered result list means OpenMP is not working.
An unordered result list means OpenMP is working.

The question is simple: can manually programmed control of physical processor affinity coexist with OpenMP?

alef_dos (thanks for the sample source),

The problem is with old source code in which I must make some changes. Part of the original source already calls affinity-control functions that have no relation to OpenMP.
It is solved with old-school manual thread control and the function pthread_attr_setaffinity_np, but that is long, and it could be better and easier with OpenMP.
The problem is compounded when you are not authorized to modify existing parts of the source that you did not write.

In my changes I want to use OpenMP only in the extended and changed parts, so there would be two independent kinds of processor-affinity calls.
For a test on Microsoft Windows (if you want to and have time; it is not important), you could set the processor affinity separately rather than through OpenMP, to evaluate whether the two can coexist.

Thanks to you both,
Best regards

jimdempseyatthecove · ‎07-26-2009

When you run a multi-threaded application (process) and the process is restricted to one processor (e.g. logical processor 3), the threads run interleaved on that one logical processor. The runtime will be at least as long as when running serially (plus overhead for context switching).

Your description of what you wanted to do was to run 2 threads as an HT pair, on the two HT threads within one physical core (inside one physical processor).

Therefore you must set 2 affinity bits that correspond to HT siblings. The affinity bits select hardware threads (logical processors), not physical CPUs or physical cores.

"CPU 0 and 1" may be (but are not assured to be) adjacent hardware threads, and on an HT system may (but not necessarily) be the HT siblings within a single core (sometimes the HT siblings are 0/2, 1/3; other times 0/1, 2/3). The system BIOS, when generating APIC IDs, will generally determine the 0/1 or 0/2 (or other) sequencing.

Now, with 1 hardware thread available to run two software threads,
and with schedule(dynamic,2),
and when the time slice of a thread exceeds the time to process all iterations of the loop,

then the first thread to run will run all iterations of the loop.

Now, with 1 hardware thread available to run two software threads,
and with schedule(static,2),
and when the time slice of a thread exceeds the time to process all iterations of the loop,

then each thread will process half the loop (but not at the same time). In this case you will likely see
0,1,4,5,8,9 (thread team member 0)
2,3,6,7,10 (thread team member 1)

The above order may be swapped when thread team member number 1 runs first.

Also note: if the OpenMP block time is set to, say, 200 ms, then in the last scenario (static) you will likely see computational dwell time _after_ the first thread completes its half of the loop and before the second thread starts.

Jim Dempsey

TimP · ‎07-26-2009

Yes, it is good to stay with OpenMP and kmp extensions, not mixing in other functions which are likely to have conflicting effect.
If you are on Windows, mixing pthreads with OpenMP isn't supported, as the latter uses Windows threading. Pthreads aren't supported by any Intel or Microsoft provided Windows libraries, unless possibly SFU.

aazue · ‎07-26-2009

Hi,
Thank you, Jim and Tim, for your two instructive and interesting answers.
Best regards