Hi All,
On Xeon Phi, if I run an application with 32 threads on 64 active cores (up to 64*4 = 256 hardware threads), will the system allocate:
1) 1 thread to each core
2) 4 threads to each core
3) Or do I need to map the threads myself, either in the code using affinity calls or with a variable like KMP_AFFINITY?
Thanks.
The answer is definitely (3).
The tools available for binding depend on the software you are using. Binding controls are available from Linux (taskset & numactl for external use, and sched_setaffinity() for inline use), from the OpenMP runtime library (using GNU, OpenMP, or Intel-specific environment variables), and from the MPI library (again, different options for different implementations).
It is not always easy to understand the interactions of these mechanisms, and it is sometimes difficult to find all of the relevant environment variables for each of the software layers. Fortunately, the Intel software usually makes it easy to see what binding is actually being applied, so you can keep fiddling until you get the desired behavior. For codes using the Intel MPI library, I use "I_MPI_DEBUG=4" (or any higher value), and for codes using the Intel Fortran and/or C compilers (including MPI codes), I add the "verbose" option to the KMP_AFFINITY variable.
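As a rough illustration of the "inline" route mentioned above, here is a minimal sketch using Python's wrappers around the Linux affinity calls (os.sched_setaffinity and os.sched_getaffinity, Linux only). The helper name pin_to_cpu is made up for this example, and reading the mask back is just one way to check what binding was actually applied.

```python
import os

def pin_to_cpu(cpu):
    """Bind the calling process to one logical CPU and return the resulting mask."""
    os.sched_setaffinity(0, {cpu})   # pid 0 means "the caller"
    return os.sched_getaffinity(0)   # read the mask back to confirm the binding

if __name__ == "__main__":
    allowed = sorted(os.sched_getaffinity(0))  # logical CPUs we may currently run on
    print("allowed before:", allowed)
    print("allowed after: ", sorted(pin_to_cpu(allowed[0])))
```

A C code would call sched_setaffinity() directly in the same way; the point is only that the binding is requested explicitly and then verified.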
If you don't tell the OS what to do, it will do something :-). This is no different from running a two-thread code on a machine that can support more than two logical CPUs.
If you want to control what happens and force a specific binding, then you need to enforce it. If you're using Intel (or LLVM) OpenMP you can force that by using the KMP_HW_SUBSET environment variable. If not, then taskset may be your friend.
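A minimal sketch of setting such a variable for a single run, assuming you launch the program from a Python wrapper. The value "32c,1t" (32 cores, 1 hardware thread each) is my assumption for the 32-thread case in the question, and the child process here is only a stand-in that echoes the variable; replace it with the real OpenMP binary. The helper run_with_env is hypothetical.

```python
import os
import subprocess
import sys

def run_with_env(cmd, extra_env):
    """Run cmd with extra_env layered on top of the current environment."""
    env = dict(os.environ)
    env.update(extra_env)
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)

# Assumed settings: restrict the runtime to 32 cores with 1 hardware thread each.
settings = {"KMP_HW_SUBSET": "32c,1t", "OMP_NUM_THREADS": "32"}

# Stand-in child that only echoes the variable it received.
child = [sys.executable, "-c", "import os; print(os.environ['KMP_HW_SUBSET'])"]
result = run_with_env(child, settings)
print(result.stdout.strip())
```

Setting the variable per run like this keeps the binding choice next to the command being measured instead of in the login environment.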
You may wish to use KMP_AFFINITY=scatter
*** However, this will distribute the software threads amongst cores (or sockets on a multi-processor system) starting at the base logical processor. IOW, if you have two (or more) processes (programs), each may be assigned to the same scattered hardware threads.
In addition to KMP_HW_SUBSET (or taskset) you have OMP_PLACES and OMP_PROC_BIND.
See: http://pages.tacc.utexas.edu/~eijkhout/pcse/html/omp-affinity.html
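To make the difference concrete for the original question (32 threads, 64 cores, 4 hardware threads per core), here is a deliberately simplified model of the two placements. It is a sketch, not the runtime's actual algorithm: the real OpenMP runtime also takes granularity, offsets, and the detected topology into account, and the function names are made up for this example.

```python
from collections import Counter

CORES = 64            # active cores, from the question
THREADS_PER_CORE = 4  # hardware threads per core on Xeon Phi

def scatter(num_threads):
    """Simplified scatter: hand threads to cores round-robin."""
    return [t % CORES for t in range(num_threads)]

def compact(num_threads):
    """Simplified compact: fill every hardware thread of a core before moving on."""
    return [(t // THREADS_PER_CORE) % CORES for t in range(num_threads)]

def summarize(placement):
    """Return (number of cores used, most threads on any one core)."""
    per_core = Counter(placement)
    return len(per_core), max(per_core.values())

print("scatter:", summarize(scatter(32)))
print("compact:", summarize(compact(32)))
```

Under this model, scatter puts the 32 threads on 32 different cores, while compact packs them four per core onto 8 cores, which is roughly the choice between options (1) and (2) in the question.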
Jim Dempsey
Hi James
Cownie, James H (Intel) wrote:
If you don't tell the OS what to do, it will do something :-). This is no different from running a two-thread code on a machine that can support more than two logical CPUs.
System Details:
- Operating System: CentOS Linux 7 (Core)
- Kernel: Linux 3.10.0-514.10.2.el7.x86_64
- Architecture: x86-64 Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz
Hi John
McCalpin, John wrote:
It is not always easy to understand the interactions of these mechanisms, and it is sometimes difficult to find all of the relevant environment variables for each of the software layers. Fortunately, the Intel software usually makes it easy to see what binding is actually being applied, so you can keep fiddling until you get the desired behavior. For codes using the Intel MPI library, I use "I_MPI_DEBUG=4" (or any higher value), and for codes using the Intel Fortran and/or C compilers (including MPI codes), I add the "verbose" option to the KMP_AFFINITY variable.
What if I am using the standard benchmarks that Intel suggests running with micprun in this document (page 9)?
Also, I need to validate that they are running correctly on the cores, hence my question about core utilization, which I assume will help me confirm where the threads are running.
Hi James,
Cownie, James H (Intel) wrote:
If you want to control what happens and force a specific binding, then you need to enforce it. If you're using Intel (or LLVM) OpenMP you can force that by using the KMP_HW_SUBSET environment variable. If not, then taskset may be your friend.
I did a small experiment:
Benchmark: micprun -k linpack -p "--omp_num_threads 192 --matrix_size 40960 --num_rep 1"
Validation: turbostat
Using the above approach, it seems the 192 threads were distributed evenly across all the available cores and their hardware threads. Each core got 3 threads. I am basing this on the "%Busy" column in the log below. The same observation holds for 16/32/64/128/256 threads.
Is my intuition correct that threads are distributed equally among all the available resources? Maybe the linpack benchmark has a flag that takes care of this, but I don't have the source used by micprun to check how it is coded.
Log:
CPU Avg_MHz %Busy Bzy_MHz TSC_MHz SMI CPU%c1 CPU%c6 CoreTmp PkgTmp Pkg%pc3 Pkg%pc6 PkgWatt RAMWatt PKG_% RAM_%
- 975 75.01 1300 1300 0 24.99 0.00 34 36 0.00 0.00 119.19 10.29 0.00 0.00
0 1299 99.90 1300 1300 0 0.10 0.00 33 36 0.00 0.00 119.19 10.29 0.00 0.00
64 1299 99.90 1300 1300 0 0.10
128 1299 99.90 1300 1300 0 0.10
192 199 15.34 1300 1300 0 84.66
1 1299 99.90 1300 1300 0 0.10 0.00 33
65 1299 99.90 1300 1300 0 0.10
129 1299 99.90 1300 1300 0 0.10
193 1 0.08 1301 1300 0 99.92
2 1299 99.90 1300 1300 0 0.10 0.00 32
66 1299 99.90 1300 1300 0 0.10
130 1299 99.90 1300 1300 0 0.10
194 1 0.08 1334 1300 0 99.92
3 1299 99.90 1300 1300 0 0.10 0.00 32
67 1299 99.90 1300 1300 0 0.10
131 1299 99.90 1300 1300 0 0.10
195 2 0.15 1301 1300 0 99.85
4 1299 99.90 1300 1300 0 0.10 0.00 31
68 1299 99.90 1300 1300 0 0.10
132 1299 99.90 1300 1300 0 0.10
196 1 0.04 1303 1300 0 99.96
5 1299 99.90 1300 1300 0 0.10 0.00 31
69 1299 99.90 1300 1300 0 0.10
133 1299 99.90 1300 1300 0 0.10
197 1 0.06 1279 1300 0 99.94
6 1299 99.90 1300 1300 0 0.10 0.00 32
70 1299 99.90 1300 1300 0 0.10
134 1299 99.90 1300 1300 0 0.10
198 1 0.05 1298 1300 0 99.95
7 1299 99.90 1300 1300 0 0.10 0.00 32
71 1299 99.90 1300 1300 0 0.10
135 1299 99.90 1300 1300 0 0.10
199 6 0.47 1300 1300 0 99.53
8 1299 99.90 1300 1300 0 0.10 0.00 33
72 1299 99.90 1300 1300 0 0.10
136 1299 99.90 1300 1300 0 0.10
200 1 0.07 1297 1300 0 99.93
9 1299 99.90 1300 1300 0 0.10 0.00 34
73 1299 99.90 1300 1300 0 0.10
137 1299 99.90 1300 1300 0 0.10
201 1 0.06 1298 1300 0 99.94
10 1299 99.90 1300 1300 0 0.10 0.00 33
74 1299 99.90 1300 1300 0 0.10
138 1299 99.90 1300 1300 0 0.10
202 1 0.05 1303 1300 0 99.95
11 1299 99.90 1300 1300 0 0.10 0.00 33
75 1299 99.90 1300 1300 0 0.10
139 1299 99.90 1300 1300 0 0.10
203 0 0.04 1301 1300 0 99.96
12 1298 99.90 1300 1300 0 0.10 0.00 31
76 1298 99.90 1300 1300 0 0.10
140 1298 99.90 1300 1300 0 0.10
204 1 0.07 1303 1300 0 99.93
13 1298 99.90 1300 1300 0 0.10 0.00 31
77 1298 99.90 1300 1300 0 0.10
141 1298 99.90 1300 1300 0 0.10
205 1 0.05 1301 1300 0 99.95
14 1298 99.90 1300 1300 0 0.10 0.00 33
78 1298 99.90 1300 1300 0 0.10
142 1298 99.90 1300 1300 0 0.10
206 1 0.05 1300 1300 0 99.95
15 1299 99.90 1300 1300 0 0.10 0.00 33
79 1299 99.90 1300 1300 0 0.10
143 1299 99.90 1300 1300 0 0.10
207 1 0.05 1298 1300 0 99.95
16 1299 99.90 1300 1300 0 0.10 0.00 32
80 1299 99.90 1300 1300 0 0.10
144 1299 99.90 1300 1300 0 0.10
208 1 0.08 1301 1300 0 99.92
17 1299 99.90 1300 1300 0 0.10 0.00 32
81 1299 99.90 1300 1300 0 0.10
145 1299 99.90 1300 1300 0 0.10
209 1 0.06 1301 1300 0 99.94
18 1299 99.90 1300 1300 0 0.10 0.00 31
82 1299 99.90 1300 1300 0 0.10
146 1299 99.90 1300 1300 0 0.10
210 1 0.04 1302 1300 0 99.96
19 1299 99.90 1300 1300 0 0.10 0.00 31
83 1299 99.90 1300 1300 0 0.10
147 1299 99.90 1300 1300 0 0.10
211 1 0.06 1300 1300 0 99.94
20 1299 99.90 1300 1300 0 0.10 0.00 32
84 1299 99.90 1300 1300 0 0.10
148 1299 99.90 1300 1300 0 0.10
212 1 0.06 1298 1300 0 99.94
21 1299 99.90 1300 1300 0 0.10 0.00 32
85 1299 99.90 1300 1300 0 0.10
149 1299 99.90 1300 1300 0 0.10
213 1 0.04 1300 1300 0 99.96
22 1299 99.90 1300 1300 0 0.10 0.00 31
86 1299 99.90 1300 1300 0 0.10
150 1299 99.90 1300 1300 0 0.10
214 1 0.05 1301 1300 0 99.95
23 1299 99.90 1300 1300 0 0.10 0.00 32
87 1299 99.90 1300 1300 0 0.10
151 1299 99.90 1300 1300 0 0.10
215 1 0.05 1299 1300 0 99.95
24 1299 99.90 1300 1300 0 0.10 0.00 32
88 1299 99.90 1300 1300 0 0.10
152 1299 99.90 1300 1300 0 0.10
216 1 0.05 1300 1300 0 99.95
25 1299 99.90 1300 1300 0 0.10 0.00 32
89 1299 99.90 1300 1300 0 0.10
153 1299 99.90 1300 1300 0 0.10
217 1 0.06 1298 1300 0 99.94
26 1299 99.90 1300 1300 0 0.10 0.00 32
90 1299 99.90 1300 1300 0 0.10
154 1299 99.90 1300 1300 0 0.10
218 1 0.05 1299 1300 0 99.95
27 1299 99.90 1300 1300 0 0.10 0.00 32
91 1299 99.90 1300 1300 0 0.10
155 1299 99.90 1300 1300 0 0.10
219 1 0.05 1299 1300 0 99.95
28 1299 99.90 1300 1300 0 0.10 0.00 32
92 1299 99.90 1300 1300 0 0.10
156 1299 99.90 1300 1300 0 0.10
220 1 0.04 1301 1300 0 99.96
29 1299 99.90 1300 1300 0 0.10 0.00 32
93 1299 99.90 1300 1300 0 0.10
157 1299 99.90 1300 1300 0 0.10
221 1 0.04 1299 1300 0 99.96
30 1299 99.90 1300 1300 0 0.10 0.00 33
94 1299 99.90 1300 1300 0 0.10
158 1299 99.90 1300 1300 0 0.10
222 1 0.05 1302 1300 0 99.95
31 1299 99.90 1300 1300 0 0.10 0.00 33
95 1299 99.90 1300 1300 0 0.10
159 1299 99.90 1300 1300 0 0.10
223 1 0.04 1155 1300 0 99.96
32 1299 99.90 1300 1300 0 0.10 0.00 32
96 1299 99.90 1300 1300 0 0.10
160 1299 99.90 1300 1300 0 0.10
224 1 0.04 1302 1300 0 99.96
33 1299 99.90 1300 1300 0 0.10 0.00 32
97 1299 99.90 1300 1300 0 0.10
161 1299 99.90 1300 1300 0 0.10
225 1 0.04 1302 1300 0 99.96
34 1299 99.90 1300 1300 0 0.10 0.00 32
98 1299 99.90 1300 1300 0 0.10
162 1299 99.90 1300 1300 0 0.10
226 1 0.04 1298 1300 0 99.96
35 1299 99.90 1300 1300 0 0.10 0.00 32
99 1299 99.90 1300 1300 0 0.10
163 1299 99.90 1300 1300 0 0.10
227 1 0.04 1296 1300 0 99.96
36 1299 99.90 1300 1300 0 0.10 0.00 31
100 1299 99.90 1300 1300 0 0.10
164 1299 99.90 1300 1300 0 0.10
228 1 0.05 1302 1300 0 99.95
37 1299 99.90 1300 1300 0 0.10 0.00 31
101 1299 99.90 1300 1300 0 0.10
165 1299 99.90 1300 1300 0 0.10
229 1 0.10 1300 1300 0 99.90
38 1299 99.90 1300 1300 0 0.10 0.00 32
102 1299 99.90 1300 1300 0 0.10
166 1299 99.90 1300 1300 0 0.10
230 1 0.05 1300 1300 0 99.95
39 1299 99.90 1300 1300 0 0.10 0.00 32
103 1299 99.90 1300 1300 0 0.10
167 1299 99.90 1300 1300 0 0.10
231 1 0.05 1301 1300 0 99.95
40 1299 99.90 1300 1300 0 0.10 0.00 32
104 1299 99.90 1300 1300 0 0.10
168 1299 99.90 1300 1300 0 0.10
232 1 0.05 1302 1300 0 99.95
41 1299 99.90 1300 1300 0 0.10 0.00 32
105 1299 99.90 1300 1300 0 0.10
169 1299 99.90 1300 1300 0 0.10
233 1 0.04 1296 1300 0 99.96
42 1299 99.90 1300 1300 0 0.10 0.00 32
106 1299 99.90 1300 1300 0 0.10
170 1299 99.90 1300 1300 0 0.10
234 1 0.05 1299 1300 0 99.95
43 1299 99.90 1300 1300 0 0.10 0.00 32
107 1299 99.90 1300 1300 0 0.10
171 1299 99.90 1300 1300 0 0.10
235 1 0.05 1298 1300 0 99.95
44 1299 99.90 1300 1300 0 0.10 0.00 31
108 1299 99.90 1300 1300 0 0.10
172 1299 99.90 1300 1300 0 0.10
236 1 0.08 1300 1300 0 99.92
45 1299 99.90 1300 1300 0 0.10 0.00 31
109 1299 99.90 1300 1300 0 0.10
173 1299 99.90 1300 1300 0 0.10
237 1 0.05 1303 1300 0 99.95
46 1299 99.90 1300 1300 0 0.10 0.00 31
110 1299 99.90 1300 1300 0 0.10
174 1299 99.90 1300 1300 0 0.10
238 1 0.04 1304 1300 0 99.96
47 1299 99.90 1300 1300 0 0.10 0.00 31
111 1299 99.90 1300 1300 0 0.10
175 1299 99.90 1300 1300 0 0.10
239 1 0.05 1390 1300 0 99.95
48 1299 99.90 1300 1300 0 0.10 0.00 32
112 1299 99.90 1300 1300 0 0.10
176 1299 99.90 1300 1300 0 0.10
240 1 0.05 1302 1300 0 99.95
49 1299 99.90 1300 1300 0 0.10 0.00 32
113 1299 99.90 1300 1300 0 0.10
177 1299 99.90 1300 1300 0 0.10
241 1 0.05 1299 1300 0 99.95
50 1299 99.90 1300 1300 0 0.10 0.00 30
114 1299 99.90 1300 1300 0 0.10
178 1299 99.90 1300 1300 0 0.10
242 1 0.06 1304 1300 0 99.94
51 1299 99.90 1300 1300 0 0.10 0.00 30
115 1299 99.90 1300 1300 0 0.10
179 1299 99.90 1300 1300 0 0.10
243 1 0.05 1303 1300 0 99.95
52 1299 99.90 1300 1300 0 0.10 0.00 31
116 1299 99.90 1300 1300 0 0.10
180 1299 99.90 1300 1300 0 0.10
244 1 0.06 1182 1300 0 99.94
53 1299 99.90 1300 1300 0 0.10 0.00 32
117 1299 99.90 1300 1300 0 0.10
181 1299 99.90 1300 1300 0 0.10
245 1 0.05 1304 1300 0 99.95
54 1299 99.90 1300 1300 0 0.10 0.00 31
118 1299 99.90 1300 1300 0 0.10
182 1299 99.90 1300 1300 0 0.10
246 1 0.06 1304 1300 0 99.94
55 1299 99.90 1300 1300 0 0.10 0.00 31
119 1299 99.90 1300 1300 0 0.10
183 1299 99.90 1300 1300 0 0.10
247 1 0.05 1302 1300 0 99.95
56 1299 99.90 1300 1300 0 0.10 0.00 33
120 1299 99.90 1300 1300 0 0.10
184 1299 99.90 1300 1300 0 0.10
248 1 0.05 1297 1300 0 99.95
57 1299 99.90 1300 1300 0 0.10 0.00 33
121 1299 99.90 1300 1300 0 0.10
185 1299 99.90 1300 1300 0 0.10
249 6 0.45 1300 1300 0 99.55
58 1299 99.90 1300 1300 0 0.10 0.00 32
122 1299 99.90 1300 1300 0 0.10
186 1299 99.90 1300 1300 0 0.10
250 1 0.05 1299 1300 0 99.95
59 1299 99.90 1300 1300 0 0.10 0.00 32
123 1299 99.90 1300 1300 0 0.10
187 1299 99.90 1300 1300 0 0.10
251 1 0.06 1303 1300 0 99.94
60 1299 99.90 1300 1300 0 0.10 0.00 32
124 1299 99.90 1300 1300 0 0.10
188 1299 99.90 1300 1300 0 0.10
252 1 0.11 1301 1300 0 99.89
61 1299 99.90 1300 1300 0 0.10 0.00 32
125 1299 99.90 1300 1300 0 0.10
189 1299 99.90 1300 1300 0 0.10
253 3 0.24 1300 1300 0 99.76
62 1299 99.90 1300 1300 0 0.10 0.00 33
126 1299 99.90 1300 1300 0 0.10
190 1299 99.90 1300 1300 0 0.10
254 1 0.08 1298 1300 0 99.92
63 1299 99.90 1300 1300 0 0.10 0.00 33
127 1299 99.90 1300 1300 0 0.10
191 1299 99.90 1300 1300 0 0.10
255 9 0.69 1300 1300 0 99.31
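Rather than reading the "%Busy" column by eye, the per-core count can be tallied with a short script. This is only a sketch: it assumes logical CPU n sits on core n % 64 (which is what the grouping of rows 0/64/128/192 in the log suggests), it uses an arbitrary 50% threshold for "busy", and the sample below is just the first eight CPU rows of the log, not the whole output.

```python
from collections import defaultdict

NUM_CORES = 64  # assumption: logical CPU n belongs to core n % 64

# First eight per-CPU rows from the turbostat log: CPU, Avg_MHz, %Busy, ...
SAMPLE = """\
0 1299 99.90 1300 1300 0 0.10 0.00 33 36 0.00 0.00 119.19 10.29 0.00 0.00
64 1299 99.90 1300 1300 0 0.10
128 1299 99.90 1300 1300 0 0.10
192 199 15.34 1300 1300 0 84.66
1 1299 99.90 1300 1300 0 0.10 0.00 33
65 1299 99.90 1300 1300 0 0.10
129 1299 99.90 1300 1300 0 0.10
193 1 0.08 1301 1300 0 99.92
"""

def busy_cpus_per_core(text, threshold=50.0):
    """Count, per core, the logical CPUs whose %Busy (third column) exceeds threshold."""
    counts = defaultdict(int)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # skip the header line and the "-" summary row
        cpu, busy = int(fields[0]), float(fields[2])
        if busy > threshold:
            counts[cpu % NUM_CORES] += 1
    return dict(counts)

print(busy_cpus_per_core(SAMPLE))
```

Feeding the full log through the same function would give one entry per core, which makes an uneven distribution easy to spot.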
Hi
Maybe the linpack benchmark has a flag that takes care of this, but I don't have the source used by micprun to check how it is coded.
The micperf package provides a useful help command:
micprun -k linpack -p help
You can dig even deeper, since micprun is written in Python. I don't recommend it, but you can read the code with:
vim `which micprun`
In the case of linpack, micprun just wraps the executable provided by MKL.
The KMP_AFFINITY set by micprun is KMP_AFFINITY=compact,1,0
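As I understand it, compact with a permute value of 1 places consecutive threads on different cores first and only then moves on to the next hardware thread of each core. Below is a simplified model of that behaviour, not the runtime's real algorithm. It assumes the OS numbering seen in the turbostat log (hardware thread k of core c is logical CPU c + 64*k), and the function name is made up for this example.

```python
CORES = 64
THREADS_PER_CORE = 4

def compact_permute1(num_threads):
    """Simplified compact,1,0: one thread per core first, then the next hardware thread."""
    placement = []
    for t in range(num_threads):
        core = t % CORES
        hw_thread = (t // CORES) % THREADS_PER_CORE
        # Assumed OS numbering: hardware thread k of core c is logical CPU c + 64*k.
        placement.append(core + CORES * hw_thread)
    return placement

cpus = compact_permute1(192)
print(min(cpus), max(cpus), len(set(cpus)))
```

With 192 threads this model fills logical CPUs 0 to 191 and leaves 192 to 255 idle, which is consistent with the "%Busy" column in the log posted earlier in the thread.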
Regards
Sebastian