I am timing a program on an Intel Core i7 (4 cores, each with 2-way Hyper-Threading).
I set thread affinity. The run time with 2 threads on 1 core (so using Hyper-Threading) is the same as with 1 thread on each of 2 cores.
Is this possible, or am I doing something wrong? I expected the single-core run to be slower, because the core has to switch between the 2 threads.
10 Replies
Quoting - alef_dos
I am timing a program on an Intel Core i7 (4 cores, each with 2-way Hyper-Threading).
I set thread affinity. The run time with 2 threads on 1 core (so using Hyper-Threading) is the same as with 1 thread on each of 2 cores.
Is this possible, or am I doing something wrong? I expected the single-core run to be slower, because the core has to switch between the 2 threads.
Hi
"Is this possible, or am I doing something wrong?"
I don't think you are doing anything wrong.
Is your operating system Linux or Microsoft Windows?
I think you will only see a real difference if you launch several processes at the same time; the percentage drop in performance can then serve as a reference. Personally, I use this check to qualify a specific machine (several cores and processors) for engineering quality control when required.
Several processes may be started as input, but there is generally a single result as output, so using 1 or 12 processors does not show a significant benefit in a test with only one process. Also, your program does not have top priority, so the result is not exact while all the other system processes are running at the same time.
If you do not observe a difference, do not conclude too quickly that there is no benefit; measuring this is a very hard task.
Best regards
Quoting - bustaf
Is your operating system Linux or Microsoft Windows?
Thanks for the answer.
My operating system is Windows.
I use a low-level call to set the affinity: kmp_set_affinity_mask_proc(i, &mask).
Does anybody know whether this call also works on Linux, or which low-level calls set the affinity on Linux?
The kmp calls are the same in the corresponding Intel library for Linux.
It would be unusual for 2 threads to run as fast on the 2 logical processors of one core as on 2 separate cores when the threads compete for shared resources. If the limiting factor is the speed of switching between threads, the single-core case could have an advantage.
Quoting - tim18
The kmp calls are the same in the corresponding Intel library for Linux.
It would be unusual for 2 threads to run as fast on the 2 logical processors of one core as on 2 separate cores when the threads compete for shared resources. If the limiting factor is the speed of switching between threads, the single-core case could have an advantage.
Hi
"The kmp calls are the same in the corresponding Intel library for Linux."
Yes and no: the kernel's semaphore rules are not the same on Linux and Windows when you control the affinity of physical processors or threads.
You can get the better result on either operating system, depending on the task being programmed.
An advantage of Linux is that you have the source of most libraries, so you can work out how to operate with fewer potential conflicts.
Although I cannot demonstrate it in theory or in practice, I agree with Tim: the potential result is generally the same.
Some time ago I ran this test with Linux and GCC (I assume it behaves the same with ICC?).
alef_dos,
Can you run the same test with the appropriate Microsoft syntax, to see whether affinity also locks OpenMP there?
Thanks
Best Regards
(I cut and pasted part of an older exchange.)
A minimal example:
/* g++-4.3 -Wall -fopenmp -lm -O2 -m32 -Wno-write-strings -ftree-vrp -ftracer -fpredictive-commoning -fivopts -ftree-vectorize -march=pentiumpro -mtune=pentiumpro -fomit-frame-pointer -pipe omptest1.cc -o omptest1 */
/* For C (for interested users):
   unsigned long mask = ...;  // bit mask of the chosen processor
   if (sched_setaffinity(0, sizeof(mask), &mask) < 0) perror("sched_setaffinity");
*/
// The header names were lost from the original post; these are the ones this code needs.
#include <cstdio>    // perror
#include <iostream>  // std::cout
#include <sched.h>   // cpu_set_t, CPU_ZERO, CPU_SET, sched_setaffinity
#include "omp.h"
int main()
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(3, &mask); // PROCESSOR 4 RESERVED
    // Restrict the whole process (pid 0 = caller) to the CPUs in the mask.
    if (sched_setaffinity(0, sizeof(cpu_set_t), &mask) < 0)
    {
        perror("sched_setaffinity");
    }
    #pragma omp parallel for schedule(guided,2)
    for (int i = 0; i <= 10; i++)
    {
        std::cout << i << std::endl;
        for (int j = 0; j <= 10000000; j++)
        {
            // ...............
        }
    }
}
Result with sched_setaffinity disabled:
debian:/# ./omptest1
48
105
2
3
6
7
9
0
1
Result with sched_setaffinity enabled:
debian:/# ./omptest1
0
1
2
3
4
5
6
7
8
9
10
Do the threads not work with a simple sched_setaffinity call?
CPU_ZERO(&mask);
CPU_SET(3,&mask); // PROCESSOR 4 RESERVED
^^^^
if (sched_setaffinity(0, sizeof(cpu_set_t), &mask) < 0)
This mask runs the process only on CPU 3 (CPUs 0, 1, and 2 are the ones left reserved).
Also, "CPU" in this context means a hardware thread (logical processor).
CPU_ZERO(&mask);
CPU_SET(0,&mask); // USE PROCESSOR 0
CPU_SET(1,&mask); // USE PROCESSOR 1
CPU_SET(2,&mask); // USE PROCESSOR 2
// CPU 3 (and higher) RESERVED
if (sched_setaffinity(0, sizeof(cpu_set_t), &mask) < 0)
Jim Dempsey
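As a rough illustration of the mask logic above, here is a minimal sketch assuming Linux with glibc, where `<sched.h>` provides `cpu_set_t` and the `CPU_*` macros. The helper names `make_mask` and `apply_mask` are hypothetical, chosen only for this example; each bit set in the mask names a logical processor the process is allowed to run on.

```cpp
#include <sched.h>            // cpu_set_t, CPU_* macros, sched_setaffinity (Linux/glibc)
#include <cassert>
#include <initializer_list>

// Hypothetical helper: build a mask that allows exactly the listed logical CPUs.
cpu_set_t make_mask(std::initializer_list<int> cpus) {
    cpu_set_t mask;
    CPU_ZERO(&mask);              // start with no CPU allowed
    for (int cpu : cpus) {
        CPU_SET(cpu, &mask);      // allow this logical CPU
    }
    return mask;
}

// Hypothetical helper: apply the mask to the caller (pid 0).
// Returns true on success; on failure errno is set by sched_setaffinity.
bool apply_mask(const cpu_set_t& mask) {
    return sched_setaffinity(0, sizeof(cpu_set_t), &mask) == 0;
}
```

With these helpers, `make_mask({3})` corresponds to the single-CPU case from the earlier post and `make_mask({0, 1, 2})` to the three-CPU case shown above. Threads created after `apply_mask` normally inherit the mask, which is why setting it before the OpenMP region affects the whole team.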
Quoting - bustaf
alef_dos,
Can you run the same test with the appropriate Microsoft syntax, to see whether affinity also locks OpenMP there?
Thanks for the code. I will test it on Linux.
On Windows I set the affinity like this (for example):
// The header names were lost from the original post; with the Intel compiler, omp.h declares the omp_* and kmp_* calls used here.
#include <omp.h>
#define N_THREADS 2
int main ()
{
    omp_set_num_threads (N_THREADS);
    #pragma omp parallel
    {
        int tnum;
        kmp_affinity_mask_t mask;
        kmp_create_affinity_mask(&mask);
        tnum = omp_get_thread_num();
        if(tnum==0)
        {
            kmp_set_affinity_mask_proc(0, &mask); //thread 0 bind to processor 0
        }
        if(tnum==1)
        {
            kmp_set_affinity_mask_proc(7, &mask); //thread 1 bind to processor 7
        }
        kmp_set_affinity(&mask); // apply the mask; kmp_set_affinity_mask_proc only sets a bit in it
    }
    .......
    .......
    return 0;
}
I used this code to set the affinity in a simple matrix multiplication program and it works correctly, but in another, more complex program it does not work.
Quoting - alef_dos
I used this code to set the affinity in a simple matrix multiplication program and it works correctly, but in another, more complex program it does not work.
Hi Jim and alef_dos,
Sorry for my poor command of your language; my personal // comment was understood differently from what I meant.
Do not despair of my small head: I will need at least 5 years on this forum before I show progress.
I also hope that Google or Bing translation will improve.....
Jim
int main()
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(3, &mask); // index 3 corresponds to physical processor 4 (the count includes #0), which is reserved for this task
    // I want only processor 4 to be used here, not the other physical processors 1 2 3 5 6 7 8 (they are reserved)
    // Choosing any other single specific processor from the set gives the same problem.
    if (sched_setaffinity(0, sizeof(cpu_set_t), &mask) < 0) // if -1, physical processor 4 was not accepted
    {
        perror("sched_setaffinity");
    }
An ordered result list means OpenMP is not working.
An unordered result list means OpenMP is working.
The question is simple: can manually programmed control of physical processor affinity coexist with OpenMP?
alef_dos (thanks for the sample source),
The problem is with old source code in which I must make some changes; part of the original source already contains an affinity control function unrelated to OpenMP.
I solved it with old-school manual thread control and the function pthread_attr_setaffinity_np, but that is long-winded, and it could be better and easier with OpenMP.
The problem gets worse when you are not authorized to modify existing parts of the source that you did not write...
I want to use OpenMP only in my extensions and changes.
That gives two independent kinds of processor affinity calls.
For the Microsoft test (only if you want to and have the time; it is not important), I think you could set the processor affinity separately rather than through OpenMP, to evaluate whether the two can coexist.
Thanks to you both
Best regards
When you run a multi-threaded application (process) and the process is restricted to one processor (e.g. logical processor 3), the threads run interleaved on that one logical processor. The run time will be at least that of running serially (plus overhead for context switching).
What you described wanting to do was to run 2 threads as an HT pair, on the two hardware threads within one physical core (inside one physical processor).
Therefore you must set the 2 affinity bits that correspond to HT siblings. The affinity bits select hardware threads (logical processors), not physical CPUs or physical cores.
"CPU 0 and 1" may be (but are not assured to be) adjacent hardware threads, and on an HT system may (but need not) be the HT siblings within a single core (sometimes HT siblings are 0/2, 1/3; other times 0/1, 2/3). The system BIOS, when generating APIC IDs, will generally determine the 0/1 or 0/2 (or other) sequencing.
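Because the sibling numbering varies between systems, it can help to ask the OS rather than guess. The following is a minimal sketch for Linux only, assuming the kernel exposes the sysfs file `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`, whose content is a CPU list such as `0,4` or `0-1`. The function names `parse_cpu_list` and `ht_siblings` are hypothetical, chosen for this example.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Parse a kernel CPU list such as "0,4" or "0-1" into logical CPU numbers.
std::vector<int> parse_cpu_list(const std::string& text) {
    std::vector<int> cpus;
    std::stringstream ss(text);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (item.empty()) continue;
        std::size_t dash = item.find('-');
        if (dash == std::string::npos) {
            cpus.push_back(std::stoi(item));              // single CPU, e.g. "4"
        } else {
            int first = std::stoi(item.substr(0, dash));  // range, e.g. "0-1"
            int last = std::stoi(item.substr(dash + 1));
            for (int c = first; c <= last; ++c) cpus.push_back(c);
        }
    }
    return cpus;
}

// Return the logical CPUs that share a physical core with `cpu`
// (including `cpu` itself), or an empty vector if the file cannot be read.
std::vector<int> ht_siblings(int cpu) {
    std::ifstream in("/sys/devices/system/cpu/cpu" + std::to_string(cpu) +
                     "/topology/thread_siblings_list");
    std::string line;
    if (!std::getline(in, line)) return {};
    return parse_cpu_list(line);
}
```

On a machine where `ht_siblings(0)` returns 0 and 4, those are the two affinity bits to set for the "2 threads on 1 core" measurement; on Windows a different mechanism would be needed.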
Now with 1 hardware thread available to run two software threads,
and with schedule(dynamic,2),
and when the time slice of a thread exceeds the time to process all iterations of the loop,
then the first thread to run will run all iterations of the loop.
Now with 1 hardware thread available to run two software threads,
and with schedule(static,2),
and when the time slice of a thread exceeds the time to process all iterations of the loop,
then each thread will process half the loop (but not at the same time). In this case you will likely see
0,1,4,5,8,9 (thread team member 0)
2,3,6,7,10 (thread team member 1)
The above order may be swapped when thread team member number 1 runs first.
Also note: if the OpenMP block time is set to, say, 200 ms, then in the last scenario (static) you will likely see a computational dwell time _after_ the first thread completes its half of the loop and before the second thread starts.
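The static split listed above can be checked without running OpenMP at all. With schedule(static, chunk), chunks of consecutive iterations are dealt out round-robin in thread-number order, so the owner of iteration i is (i / chunk) % num_threads. A small sketch of that arithmetic follows; `static_chunks` is a hypothetical helper name, not an OpenMP call.

```cpp
#include <cassert>
#include <vector>

// For a loop over 0..last (inclusive), return the iterations each thread
// receives under schedule(static, chunk): chunks of `chunk` consecutive
// iterations are assigned round-robin in thread-number order.
std::vector<std::vector<int>> static_chunks(int last, int chunk, int num_threads) {
    std::vector<std::vector<int>> per_thread(num_threads);
    for (int i = 0; i <= last; ++i) {
        int owner = (i / chunk) % num_threads;  // chunk index modulo team size
        per_thread[owner].push_back(i);
    }
    return per_thread;
}
```

Calling `static_chunks(10, 2, 2)` reproduces the two lists above for the loop `i = 0..10`. This models only which thread owns which iteration, not the order in which the threads actually get to run.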
Jim Dempsey
Quoting - bustaf
The problem is with old source code in which I must make some changes; part of the original source already contains an affinity control function unrelated to OpenMP.
I solved it with old-school manual thread control and the function pthread_attr_setaffinity_np, but that is long-winded, and it could be better and easier with OpenMP.
The problem gets worse when you are not authorized to modify existing parts of the source that you did not write...
I want to use OpenMP only in my extensions and changes.
That gives two independent kinds of processor affinity calls.
For the Microsoft test (only if you want to and have the time; it is not important), I think you could set the processor affinity separately rather than through OpenMP, to evaluate whether the two can coexist.
If you are on Windows, mixing pthreads with OpenMP isn't supported, as the latter uses Windows threading. Pthreads aren't supported by any Windows libraries provided by Intel or Microsoft, except possibly SFU (Services for UNIX).
Quoting - tim18
If you are on Windows, mixing pthreads with OpenMP isn't supported, as the latter uses Windows threading. Pthreads aren't supported by any Windows libraries provided by Intel or Microsoft, except possibly SFU (Services for UNIX).
Hi
Thank you, Jim and Tim, for your two instructive and interesting answers.
Best regards