Hi,
We have a small Linux cluster running OSCAR/CentOS 5.5: a master and 4 nodes. We compute only on the nodes. The nodes are identical: 2 hexa-core X5650 CPUs (so 2*6 = 12 cores per node) and 24 GB of RAM per node. The whole cluster is connected with InfiniBand, using the OpenFabrics driver. The Intel Cluster Toolkit is installed on the master and on all the nodes.
The problem:
- with the Intel Cluster Toolkit (Intel MPI):
-- 2 jobs of 2x8 are running, so all the nodes are busy, but there are 4 free cores per node and enough free RAM on every node.
-- I start a 2x2 job (a CFD program) and note the duration of a time step: ~2.0x10^-1 s (very good!)
-- I start the same job as 1x4 instead of 2x2. The duration of a time step is ~1.2 s (very, very bad! 6x slower)
- with OpenMPI:
-- the same 2x8 jobs are running.
-- I start the same 2x2 job: the duration of a time step is ~3.0x10^-1 s (good, but Intel MPI does better)
-- I start the same 1x4 job: the duration of a time step is ~3.0x10^-1 s (so much better than Intel MPI)
I have probably made a configuration error, but I can't find it. Does anyone have an idea? Where should I start looking?
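For clarity, NxM here means N nodes x M processes per node. The runs above are launched roughly as follows (the machinefile contents and the solver binary name are placeholders):
$ mpirun -machinefile ./two_nodes -perhost 2 -np 4 ./cfd_solver   # 2x2: 2 nodes, 2 processes each
$ mpirun -machinefile ./one_node -np 4 ./cfd_solver               # 1x4: 4 processes on one node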
Thanks a lot,
Best regards
I have tested with -env I_MPI_DEBUG 5:
For the case 1x4 it tells me:
[0] MPI startup(): shm data transfer mode
[1] MPI startup(): shm data transfer mode
[2] MPI startup(): shm data transfer mode
[0] MPI startup(): I_MPI_DEBUG=5
[1] MPI startup(): set domain to {4,5,6,7} on node n04
[2] MPI startup(): set domain to {8,9,10,11} on node n04
[0] MPI startup(): set domain to {0,1,2,3} on node n04
[0] Rank Pid Node name Pin cpu
[0] 0 23415 n04 {0,1,2,3}
[0] 1 23413 n04 {4,5,6,7}
[0] 2 23414 n04 {8,9,10,11}
So it is using shared memory, and that's good.
For the case 2x2 it reports shm and ofa data transfer, so that is good too (I set I_MPI_FABRICS=shm:ofa).
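For completeness, such a run is launched roughly like this (the solver binary name is a placeholder):
$ mpirun -genv I_MPI_FABRICS shm:ofa -env I_MPI_DEBUG 5 -np 4 ./cfd_solver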
I have tested with another program (the IMB-MPI1 benchmark provided with the Intel Cluster Toolkit).
The results are quite surprising (I do not copy/paste the whole log):
-- for the job 1x4:
#----------------------------------------------------------------
# Benchmarking Scatterv
# #processes = 4
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.11 0.12 0.11
1 1000 447.40 447.73 447.64
2 1000 0.98 0.98 0.98
4 1000 0.97 0.97 0.97
8 1000 1.08 1.08 1.08
16 1000 0.97 0.97 0.97
32 1000 0.97 0.97 0.97
64 1000 44.12 44.12 44.12
128 1000 1.09 1.09 1.09
256 1000 1.19 1.19 1.19
512 1000 1.24 1.24 1.24
1024 1000 40.89 41.59 41.42
2048 1000 1.80 1.80 1.80
4096 1000 326.15 326.97 326.76
8192 1000 1393.64 1393.65 1393.64
16384 1000 1064.20 1162.20 1100.94
32768 1000 1434.28 1443.94 1441.52
65536 640 10194.96 10227.67 10210.60
131072 320 7958.28 8018.32 7973.31
262144 160 10653.16 10797.76 10750.83
524288 80 19088.16 19260.49 19199.91
1048576 40 15891.75 16334.65 16115.53
2097152 20 29414.09 29636.35 29540.07
4194304 10 96673.39 104203.61 101227.36
-- for the 2x2 job:
#----------------------------------------------------------------
# Benchmarking Scatterv
# #processes = 4
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.14 0.14 0.14
1 1000 1.91 1.91 1.91
2 1000 1.92 1.92 1.92
4 1000 1.88 1.88 1.88
8 1000 1.91 1.91 1.91
16 1000 1.91 1.91 1.91
32 1000 1.97 1.97 1.97
64 1000 1.99 1.99 1.99
128 1000 2.21 2.21 2.21
256 1000 2.89 2.90 2.90
512 1000 3.17 3.18 3.18
1024 1000 3.79 3.79 3.79
2048 1000 5.04 5.05 5.05
4096 1000 7.42 7.44 7.43
8192 1000 14.30 14.32 14.31
16384 1000 27.08 27.11 27.09
32768 1000 57.79 57.91 57.85
65536 640 110.94 111.05 111.00
131072 320 173.11 173.54 173.37
262144 160 471.64 473.28 472.65
524288 80 885.68 893.69 890.77
1048576 40 1809.90 1836.95 1826.64
2097152 20 3471.26 3581.45 3539.99
4194304 10 6918.19 7352.30 7189.42
So we see that in the 1x4 (shm) case the timings are totally unstable and slow. I only pasted the Scatterv benchmark, but it is the same with the other tests.
The previous IMB-MPI1 tests were started in a batch job with TORQUE/MAUI, but I get exactly the same behaviour when I start mpirun directly on the node without TORQUE/MAUI.
Any ideas?
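For reference, the direct (non-batch) 1x4 run on the node is roughly this (the path to the benchmark binary is a placeholder; IMB-MPI1 takes the benchmark name as an argument, so only Scatterv is run):
$ mpirun -np 4 ./IMB-MPI1 Scatterv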
Best regards
I'm not familiar with the means for setting up OpenMPI to run in this fashion (with cores on each node busy with other jobs). However, one of the big differences in defaults between OpenMPI and Intel MPI is that the former doesn't attempt any affinity settings unless you ask for them, while the latter sets affinity by default (on Intel CPUs), as you quoted, without regard to what might already be running. Your quoted pin assignments don't agree with your assertion that the jobs are being submitted 1 per node (-perhost 1, or maybe the more recent scatter option), which also conflicts with your statement that you want shared memory. For any chance of useful results, you would have to disable affinity or, preferably, restrict each job to its own set of CPUs (better yet, to its own nodes, and let it use the standard affinity scheme).
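For example, the small job could be confined to the otherwise idle cores with something like the following (the core numbers and the application name are only an illustration; they depend on the core numbering of your nodes and on which cores the other job already occupies):
$ mpirun -genv I_MPI_PIN_PROCESSOR_LIST 8-11 -np 4 ./your_application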
Hi,
Thanks for your answer. I made a mistake when I copied/pasted the I_MPI_DEBUG 5 report: it was for a 1x3 job (3 processes on one node). I will try to disable the affinity in order to compare.
Regards
OK, I misunderstood what you meant by 2x2 and 1x4, but it does seem likely that performance suffered from multiple jobs being pinned to the same cores.
Your point about the usefulness of shared memory is well taken; as more powerful multi-core nodes are introduced without major improvements in inter-node communication, it becomes more important.
Hi,
I have installed "htop" on the master and the nodes. With htop we can see which cores are in use and which are not:
-- The 2x8 jobs are running, so on each node I have 4 free cores.
-- I start a 1x4 IMB-MPI1 job on one node (so 4 processes on one node).
-- I open htop and see that 8 cores are in use by the "big" job and only 2 cores by IMB-MPI1. So the 4 IMB-MPI1 processes are running on just 2 cores, and I have 2 cores doing nothing. I think that is the problem. Any idea how to solve it?
Thanks for your help!
It's really strange:
Let's say I have 2 nodes that are totally free, so 12 free cores on each node. I start a 1x4 job, i.e. 4 processes on just one node (mpirun -np 4 IMB-MPI1). When I open htop, I see that 4 cores are busy, one per process. So no problem! I stop the job.
Now I start the "big" 2x8 job on the nodes. When I open htop, I see that 8 cores are busy on each node. So that's correct: the big job is running and I have 4 free cores on each node. Then I decide to start the 1x4 IMB-MPI1 job, so I log onto the node and type mpirun -np 4 IMB-MPI1. When I open htop, I see that only 2 cores are busy and 2 are free. So that's not correct... strange...
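To double-check where each process is actually allowed to run, the affinity mask of a rank can also be inspected directly (<pid> stands for the process id of one of the MPI ranks):
$ taskset -p <pid>
This prints the current CPU affinity mask of that process.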
When you write "disable the affinity", do you mean I_MPI_PIN=0?
When I disable process pinning and start my 1x4 job, I get 4 busy cores, so the correct behaviour:
mpirun -genv I_MPI_PIN 0 -np 4 IMB-MPI1
Is it really risk-free to disable process pinning?
Or is there a better solution?
I tried I_MPI_PIN_DOMAIN=auto but it isn't optimal either... mpirun -genv I_MPI_PIN_DOMAIN auto -np 4 IMB-MPI1 starts on 3 cores...
But mpirun -genv I_MPI_PIN_DOMAIN node -np 4 IMB-MPI1 starts on 4 cores... so it seems good. Is there any problem with using I_MPI_PIN_DOMAIN=node?
Regards
The default (auto) options of I_MPI_PIN_DOMAIN and I_MPI_PIN_PROCS assume that all visible cores are available to your job, so they aren't suitable when running multiple jobs on the same nodes, as you found. The default action of the OS scheduler may be better, if you can't take care to set the affinity of each job to a non-overlapping group of cores.
You already take risks by running multiple MPI jobs on the same nodes; I don't see that you increase them by improving the scheduling.
Hi Guillaume,
>Is it really risk-free to disable process pinning?
Yes, it is. You can disable pinning and the operating system will place the processes on different cores. Setting I_MPI_PIN_DOMAIN to 'node' means the same thing - pinning is disabled.
Unfortunately, there is no better solution for now if you are going to run several applications at the same time on a node.
Regards!
Dmitry
OK. I played a little bit with I_MPI_PIN_DOMAIN but didn't find a better solution... so pinning is now disabled. Too bad :'( - with pinning the calculations are faster :'(
Regards,
Guillaume
You must pin all the jobs running on the same nodes so as to minimize mutual interference. It's hardly worth the trouble until platforms like Xeon EX become cost effective for cluster computing.
Wouldn't running each job on its own nodes be a better solution?
Hi,
We don't have a lot of nodes (just 4) at the moment, so when there are free cores on our nodes, we would like to use them...
I don't clearly understand the pinning at the moment:
For example, I set I_MPI_PIN_DOMAIN=core and start a 1x4 job (1 node, 4 cores). The first cores are selected and the processes are pinned to them (0,1,2,3). Just as a test, I start another 1x4 job on the same node with the same I_MPI_PIN_DOMAIN=core, and it selects cores (0,1,2,3) too... so I have 2 jobs running on the same first 4 cores while the other 8 are totally free. Why doesn't mpirun see that the first 4 cores are busy and select the next 4 instead?
Regards
Hi Guillaume,
I_MPI_PIN_DOMAIN is mainly used for hybrid applications (MPI + OpenMP), where each MPI process may create several threads. Setting I_MPI_PIN_DOMAIN to 'core' means that you create a 'domain' consisting of a single core, and the OpenMP threads of a process will be executed on that core.
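As an illustration only (the binary name and thread count are placeholders), a hybrid job on one node could be launched so that each MPI rank gets a whole socket for its OpenMP threads:
$ mpirun -genv I_MPI_PIN_DOMAIN socket -genv OMP_NUM_THREADS 6 -np 2 ./hybrid_app
Each of the 2 ranks is then pinned to one socket, and its 6 OpenMP threads inherit that mask and stay within the socket.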
It seems to me that the explanation in the Reference Manual is good enough.
The Intel MPI Library was designed to run in exclusive mode and cannot check the workload of a system. There is a feature request to add such functionality, but it has not been implemented yet.
The easiest way for you is to disable pinning. I think you'll benefit from running two (1x4) applications on the same node, which will then run in parallel. What is the performance degradation with pinning disabled (one 1x4 application)?
Regards!
Dmitry
Hi,
Yeah, I have read the Reference Manual. I don't know if I understood it correctly... but that's another point ;)
OK. I didn't know that "the Intel MPI Library was designed to run in exclusive mode and cannot check the workload of a system".
The performance degradation without pinning is:
-- with pinning, a time step of our fluid solver takes ~2x10^-1 s
-- without pinning, a time step takes ~3x10^-1 s.
So it is relatively acceptable. But with htop I see that the processes jump from one core to another. I will take a look at the taskset command... perhaps I can do the pinning myself :)
Best Regards,
Guillaume
Guillaume,
The Intel MPI Library was designed to run one MPI application at a time.
taskset will only work if pinning is disabled. You should be very careful, because the CPU numbers used by this utility follow the BIOS ordering and may differ between clusters.
I think you can run the taskset command explicitly. For instance:
$ mpiexec -perhost 4 -n 4 -env I_MPI_PIN disable taskset -c 0-3 application_name1
$ mpiexec -perhost 4 -n 4 -env I_MPI_PIN disable taskset -c 4-7 application_name2
Give it a try.
Regards!
Dmitry
Thanks! I will try it.
If I start the command explicitly, I can use I_MPI_PIN=enable with I_MPI_PIN_PROCESSOR_LIST set to the right cores, can't I?
I found an interesting link on the web:
autopin - Automated Optimization of Thread-to-Core Pinning on Multicore Systems
http://www.hipeac.net/system/files?file=carsten.pdf
I don't know where I can find this tool... but it seems interesting.
>If I start the command explicitly, I can use I_MPI_PIN=enable with I_MPI_PIN_PROCESSOR_LIST set to the right cores, can't I?
Yes, of course you can.
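For instance, by analogy with the taskset example above (the core ranges are again only an example):
$ mpiexec -perhost 4 -n 4 -env I_MPI_PIN_PROCESSOR_LIST 0-3 application_name1
$ mpiexec -perhost 4 -n 4 -env I_MPI_PIN_PROCESSOR_LIST 4-7 application_name2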
>I don't know where I can find this tool... but it seems interesting.
Yeah, the tool looks quite interesting. You probably need to get in contact with the author; I'm not sure it is available as a product.
Best wishes,
Dmitry