Hi,
I have written code whose skeleton looks like this:
    #define CPU_THREADS 4
    #define INPUTSIZE 4

    #pragma omp parallel num_threads(CPU_THREADS)
    {
        #pragma omp for
        for(i = 0; i < INPUTSIZE; i++)
        {
            ... some code ...
            for(j = 0; j < 100; j++)
            {
                ... some code ...
                #pragma offload target(mic) in(a[0:size] alloc_if(0) free_if(0)) out()
                {
                    #pragma omp parallel num_threads(60)
                    {
                        #pragma omp for
                        for(i = 0; i < 240; i++)
                        {
                            ... some code ...
                        }
                    }
                }
            }
        }
    }
So here each CPU thread gets 60 MIC threads. I want to set the affinity so that the 1st CPU thread uses the first 15 cores of the Xeon Phi (4 threads per core), the 2nd CPU thread uses cores 15-30, and similarly the 3rd and 4th CPU threads use cores 30-45 and 45-60.
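The mapping described above comes down to splitting the coprocessor cores into one contiguous block per host thread. A minimal sketch of that arithmetic follows. The type `core_range` and the function `cores_for_host_thread` are hypothetical names used only for illustration; they are not part of the offload runtime or of any Intel API. Ranges are treated as half-open, so "15-30" means cores 15 through 29.

```c
#include <assert.h>

/* Hypothetical helper (not an Intel API): the contiguous block of
   coprocessor cores assigned to one host thread. */
typedef struct {
    int first;  /* index of the first core in the block */
    int count;  /* number of cores in the block */
} core_range;

/* Split total_cores into host_threads contiguous blocks. When the division
   is not exact, the first (total_cores % host_threads) blocks get one
   extra core each. */
core_range cores_for_host_thread(int host_tid, int host_threads, int total_cores)
{
    assert(host_threads > 0 && total_cores >= host_threads);
    assert(host_tid >= 0 && host_tid < host_threads);

    int base  = total_cores / host_threads;
    int extra = total_cores % host_threads;

    core_range r;
    r.count = base + (host_tid < extra ? 1 : 0);
    r.first = host_tid * base + (host_tid < extra ? host_tid : extra);
    return r;
}
```

With 4 host threads and 60 cores this gives blocks starting at cores 0, 15, 30 and 45, each 15 cores wide. This only computes the ranges; it does not pin anything by itself.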
What I observed is that if I set KMP_AFFINITY=compact, only 15 cores are used. I think MIC threads 0-4 of each CPU thread are all getting mapped to core 0. Is there any way I can set the affinity based on the CPU thread number?
Please help me with this, and ask if you need any further clarification.
Also, I noticed that my program sometimes hangs at the offload call. It hangs only if I use multiple threads; in single-thread mode it works fine. What could be the reason? Could the memory allocations that happen during the offload call be the cause?
Thanks
sivaramakrishna
KMP_AFFINITY presumably applies on the host side only.
How are you determining which cores are active on the MIC side? micsmc-gui should give a quick picture.
It seems likely that setting an identical affinity (same origin, as in MIC_KMP_PLACE_THREADS=15c,0o) for each of the offloads would cause each to use the same group of cores.
I haven't seen any discussion of whether this would constitute a case of OMP_NESTED.
Hi Tim,
I have launched 7 CPU threads. Each CPU thread makes an offload call, and each CPU thread has 236/7 = 33 MIC threads. I tried all the affinity schemes:
export MIC_KMP_AFFINITY=SCATTER
In SCATTER mode the threads are scattered across the cores, but most of them land in 10-40 (I don't know why).
export MIC_KMP_AFFINITY=BALANCED
In BALANCED mode all the threads get mapped to 0-32 (as each CPU thread has 33 MIC threads).
export MIC_KMP_AFFINITY=COMPACT
I didn't understand why there is a gap between 10 and 30.
export MIC_KMP_AFFINITY=NONE
With NONE, threads are placed based on resource availability, so this is the only setting that uses all the cores. But in this case, if consecutive threads work on the same data, I cannot exploit data locality (they might be scheduled on different cores).
So I want to set the affinity in such a way that all the cores are used and each CPU thread launches its MIC threads on specific cores: CPU thread 1 should use MIC cores 0-7, CPU thread 2 should use MIC cores 8-14, and so on. How do I set the affinity to make this happen? Is it possible to set MIC_KMP_AFFINITY depending on the CPU thread number while the program is running? Please let me know.
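To pin threads to a block of cores like the ones asked about above, the core indices first have to be translated into logical CPU numbers on the coprocessor. The sketch below does only that translation. The function name `logical_cpus_for_cores` is hypothetical, and the numbering it uses is an assumption that nobody in this thread has confirmed: 4 hardware threads per core, core c owning logical CPUs 4*c+1 to 4*c+4, and the last core wrapping around to logical CPU 0. Check it against /proc/cpuinfo on the card or the output of MIC_KMP_AFFINITY=verbose before relying on it.

```c
#include <assert.h>
#include <stddef.h>

#define HW_THREADS_PER_CORE 4  /* ASSUMPTION: 4 hardware threads per core */

/* Hypothetical helper (not an Intel API): write the logical CPU ids that
   belong to cores [first_core, first_core + ncores) into out[] and return
   how many ids were written.
   ASSUMPTION: core c owns logical CPUs 4*c+1 .. 4*c+4, and the ids wrap
   modulo the total so the last core ends on logical CPU 0. */
int logical_cpus_for_cores(int first_core, int ncores, int total_cores, int *out)
{
    assert(out != NULL);
    assert(first_core >= 0 && ncores > 0);
    assert(first_core + ncores <= total_cores);

    int total_cpus = HW_THREADS_PER_CORE * total_cores;
    int n = 0;
    for (int c = first_core; c < first_core + ncores; c++) {
        for (int t = 0; t < HW_THREADS_PER_CORE; t++) {
            out[n++] = (HW_THREADS_PER_CORE * c + 1 + t) % total_cpus;
        }
    }
    return n;
}
```

One possible use, untested here, is to build an affinity mask from these ids inside the offloaded parallel region and apply it per thread, for example with the Intel runtime's low-level affinity calls (`kmp_create_affinity_mask`, `kmp_set_affinity_mask_proc`, `kmp_set_affinity`). Whether that coexists cleanly with a MIC_KMP_AFFINITY setting would need to be verified on the actual system.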
Thanks
sivaramakrishna
