Assuming an instruction accesses memory and stores a register's content into it, on x86_64 architecture. For Example,
bndcl rax, [rip + offset1] mov rax, [rip + offset1]
My question is, if the segmentation is enabled, and the access to same memory address was done with segment register, will it incur more overhead(latency or throughput)?For Example,
bndcl rax, gs:[rip + offset2] mov rax, gs:[rip + offset2]
- Intel® Advanced Vector Extensions (Intel® AVX)
- Intel® Streaming SIMD Extensions
- Parallel Computing
When explicit segmentation is specified, the instruction byte sequence has an additional segment override prefix byte. This may add a small amount of overhead (more instruction bytes read), but in some cases when it causes a tight instruction loop to expand to one more cache line, it might be more noticeable.
The ISA always uses segment/selector registers implicitly. In virtual flat model, they ostensibly have the same: base, size and granularity. Also note, some TLS (Thread Local Storage) implementations may use fs or gs to point to the TLS for the thread.