Hi,
I found some VNNI sample code provided by Intel.
It uses a data type, __m512i, which maps to registers in the CPU.
As far as I know, the number of registers in a CPU is limited.
Here is the information for the CPU I ran it on:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
Stepping: 7
CPU MHz: 1337.399
BogoMIPS: 5000.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 36608K
NUMA node0 CPU(s): 0-47
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni
I have some questions:
1. How can I find out how many registers can be allocated in a program?
2. If a program uses more registers than the CPU provides, is there an error message?
3. On a multi-core / multi-thread CPU, are the registers shared or independent?
4. Regarding question 3, if they are independent, how can I find out how many registers there are per core / per thread?
Many thanks,
chiungliang
I have some questions:
1. How can I find out how many registers can be allocated in a program?
Architectural general-purpose registers: 16 (64-bit mode).
Architectural SIMD registers: 32 (AVX-512 ISA).
There are additional registers such as debug registers, floating-point state, and model-specific registers.
2. If a program uses more registers than the CPU provides, is there an error message?
Architectural registers are mapped to physical integer and floating-point register files (PRFs), which are hidden from software. There are usually two files: one integer PRF and one floating-point PRF. The open question is how large the floating-point PRF is on the Skylake microarchitecture. Each entry could be 512 bits wide, divided into four 128-bit lanes, each with its own power up/down circuitry, plus wires connecting the lanes.
3. On a multi-core / multi-thread CPU, are the registers shared or independent?
Each core has its own set of physical registers, which on an HT (Hyper-Threading) machine could be shared fairly between the threads.
4. Regarding question 3, if they are independent, how can I find out how many registers there are per core / per thread?
Hi,
Since this query is more about architecture fundamentals, we are moving it to the appropriate forum.
OK, thanks a lot.
For all Intel processors, the number of register names is determined by the processor's mode of operation (32-bit or 64-bit) and the instruction set in use. The available registers for each mode are described in section 3.2 of Volume 1 of the Intel Architectures Software Developer's Manual (document 253665), available at https://software.intel.com/en-us/articles/intel-sdm
Attempting to use a register name that does not exist would trigger an error in the assembler.
Most (all?) Intel processors use "register renaming" to minimize stalls due to false dependencies. The number of "physical registers" onto which the named registers are mapped is sometimes mentioned in technical publications, but is not part of the implementation that is visible through mechanisms like CPUID.
There are at least enough "physical registers" for each "logical processor" to have all "named registers" mapped at the same time, so HyperThreading does not reduce the number of register names available.
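As a rough illustration of how the number of SIMD register names follows from the operating mode and the instruction set, here is a minimal sketch in C. The function names are made up for this example. It assumes GCC or Clang on x86 (for `__builtin_cpu_init` and `__builtin_cpu_supports`), and it uses the pointer size as a stand-in for "32-bit vs. 64-bit mode", which is only an approximation. It reports register names only; as noted above, the number of physical registers is not visible this way.

```c
/* Sketch only: maps (mode, AVX-512F availability) to the number of
   SIMD register names. Function names are hypothetical. */

/* 32-bit mode exposes 8 SIMD register names, 64-bit mode exposes 16,
   and 64-bit mode with AVX-512F exposes 32 (zmm0..zmm31). */
int simd_register_names(int is_64bit, int has_avx512f)
{
    if (!is_64bit)
        return 8;
    return has_avx512f ? 32 : 16;
}

/* Queries the running CPU through the GCC/Clang builtins.
   Assumption: pointer size is used as a proxy for the operating mode. */
int detect_simd_register_names(void)
{
    int is_64bit = (sizeof(void *) == 8);
    int has_avx512f;

    __builtin_cpu_init();
    has_avx512f = __builtin_cpu_supports("avx512f") ? 1 : 0;
    return simd_register_names(is_64bit, has_avx512f);
}
```

Calling `detect_simd_register_names()` from `main` and printing the result should give 32 on a 64-bit build running on a CPU like the one in the original post, since its flags include avx512f.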
Hi,
Thank you both.
Since there are 32 ZMM registers,
I thought that trying to allocate more than 32 512-bit registers
might cause some problems.
So I tried to allocate more than 32 512-bit registers in a subroutine:
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

int main()
{
    const int NUM = 1024;
    const int NUM_MUL64 = NUM*64;
    __m512i a[NUM];
    __m512i b[NUM];
    int8_t int_a[NUM_MUL64];
    int8_t int_b[NUM_MUL64];
    int8_t int_c[NUM_MUL64];
    int8_t *p_a = int_a;
    int8_t *p_b = int_b;
    int8_t *p_c = int_c;
    //for(int i=0; i<NUM_MUL64; i++)
    //{
    //    int_a[i] = (int8_t)i;
    //    int_b[i] = (int8_t)i;
    //}
    memset(int_a, 0, sizeof(int8_t)*NUM_MUL64);
    memset(int_b, 0, sizeof(int8_t)*NUM_MUL64);
    for(int i=0; i<NUM; i++)
    {
        // the [i] indices were lost in the original post
        a[i] = _mm512_loadu_si512(p_a);
        b[i] = _mm512_loadu_si512(p_b);
        a[i] = _mm512_add_epi8(a[i], b[i]);
        _mm512_storeu_si512((void*)p_c, a[i]);
        p_a += 64;
        p_b += 64;
        p_c += 64;
    }
    return 0;
}
There is no error message.
Does anyone know how this works?
Many thanks,
chiungliang
When you are using intrinsics, the compiler takes care of register assignment. In other words, your code is not "allocating registers" but using variables that may be mapped to registers or to memory. You can see this if you disassemble the binary. This is no different from normal C/C++ code, where the compiler can choose to keep a variable in a register or in memory (and then copy it to a register for computation).
If you want to have full control of what registers are used, you will have to write in assembly. However, it is one of the big advantages of using intrinsics in C/C++ that you don't need to deal with register assignment, because the compiler usually is good at it.
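To make this concrete, here is a small sketch of my own (not from the Intel sample). So that it runs on any x86-64 machine, it uses 128-bit SSE2 vectors, which have 16 XMM register names in 64-bit mode, instead of `__m512i`; the same reasoning applies to the 32 ZMM names. The function name and `NUM_VECS` are made up for the example. It declares 40 vector variables, more than there are register names, and still compiles and runs without any error, because the compiler decides which values sit in registers and which go to memory.

```c
#include <emmintrin.h>  /* SSE2 intrinsics, baseline on x86-64 */
#include <stdint.h>

/* Hypothetical constant: more vectors than the 16 XMM register names. */
#define NUM_VECS 40

/* Fills NUM_VECS vector variables (vector k holds k in every 32-bit lane),
   adds them all together, and returns the lowest lane of the sum. */
int32_t sum_many_vectors(void)
{
    __m128i v[NUM_VECS];
    __m128i acc = _mm_setzero_si128();

    for (int k = 0; k < NUM_VECS; k++)
        v[k] = _mm_set1_epi32(k);

    for (int k = 0; k < NUM_VECS; k++)
        acc = _mm_add_epi32(acc, v[k]);

    return _mm_cvtsi128_si32(acc);  /* 0 + 1 + ... + 39 = 780 */
}
```

Compiling with something like `gcc -O2 -S` and reading the generated assembly shows which values the compiler kept in registers and which it placed on the stack; an optimizing compiler may also simplify the loops away entirely.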