I do not think x86 has constant registers. When I define a const array, the generated code reads those constants from memory rather than encoding them as immediates in the instruction. Is there any instruction that can assign a 128-bit or 256-bit constant directly to an SSE/AVX register?
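To make the question concrete, here is a minimal sketch using SSE2 intrinsics. An arbitrary 128-bit constant is typically materialized by a load from read-only memory, while a few special values (all zeros, all ones) can be produced from registers alone. The function names are invented for this illustration, and the instructions named in the comments are what compilers commonly emit, not a guarantee.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Arbitrary constant: compilers usually place the 16 bytes in a
   read-only data section and emit a load such as movdqa/movaps. */
static inline __m128i const_from_memory(void) {
    return _mm_set_epi32(4, 3, 2, 1);  /* lane 0 = 1, lane 3 = 4 */
}

/* All zeros: usually a register-only idiom (pxor xmm, xmm). */
static inline __m128i const_zero(void) {
    return _mm_setzero_si128();
}

/* All ones: compare a register with itself (pcmpeqd xmm, xmm). */
static inline __m128i const_all_ones(void) {
    __m128i z = _mm_setzero_si128();
    return _mm_cmpeq_epi32(z, z);
}

/* Hypothetical helper: copy the four 32-bit lanes to a plain array. */
static inline void store_lanes(__m128i v, int32_t out[4]) {
    assert(out != NULL);
    _mm_storeu_si128((__m128i *)out, v);
}
```

Inspecting the compiler's assembly output for these functions (for example with `-S`) is the simplest way to see which form a given compiler actually chooses.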
Chang-li,
I understand what you are trying to achieve, but I fear that you would not gain much. Assuming there were a load instruction for YMM registers with an immediate, the encoding would be longer than 32 bytes, since the immediate alone takes 32 bytes. This would cause some major hiccups in the core. For example, the loop-stream detector processes instructions in 32-byte chunks. Therefore, your instruction wouldn't even fit in one chunk!
On the other hand, you have two load ports and can do up to two loads per cycle. Reading a constant from memory can be pipelined nicely with other loads, as there are no dependencies. Only when you are absolutely limited by the number of loads might keeping at least some of the constants in registers help, as a last resort.
Kind regards
Thomas
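Thomas's last-resort suggestion of keeping a constant in a register can be sketched as follows: the constant vector is built once before the loop and reused in every iteration, so the loop body itself issues no extra load for it. The function name `add_bias` and its parameters are assumptions made for this example, and an optimizing compiler will often perform this hoisting on its own.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Adds `bias` to every element of `data` (hypothetical example function). */
void add_bias(int32_t *data, size_t n, int32_t bias) {
    assert(data != NULL || n == 0);

    /* Built once, outside the loop, and kept in an XMM register. */
    const __m128i vbias = _mm_set1_epi32(bias);

    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        /* Only the data itself is loaded inside the loop. */
        __m128i v = _mm_loadu_si128((const __m128i *)(data + i));
        v = _mm_add_epi32(v, vbias);
        _mm_storeu_si128((__m128i *)(data + i), v);
    }

    /* Scalar tail for the remaining 0..3 elements. */
    for (; i < n; ++i) {
        data[i] += bias;
    }
}
```

Whether this beats reloading the constant depends on register pressure: with only 16 vector registers, every pinned constant leaves one fewer register for data.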
>>>On the other hand you have two load ports and can do up to two loads per cycle.>>>
Will this stay the same on the Haswell architecture? I mean the number of load/store ports.
Thomas Willhalm (Intel) wrote:
Chang-li,
I understand what you are trying to achieve, but I fear that you would not gain much. Assuming there were a load instruction for YMM registers with an immediate, the encoding would be longer than 32 bytes. This would cause some major hiccups in the core. For example, the loop-stream detector processes instructions in 32-byte chunks. Therefore, your instruction wouldn't even fit in one chunk!
On the other hand, you have two load ports and can do up to two loads per cycle. Reading a constant from memory can be pipelined nicely with other loads, as there are no dependencies. Only when you are absolutely limited by the number of loads might keeping at least some of the constants in registers help, as a last resort.
Kind regards
Thomas
That is true for YMM*, which is 256 bits (32 bytes). But XMM* is only 128 bits (16 bytes), so a direct-constant instruction could fit in one chunk.
Chang
Sergey Kostrov wrote:
>>...But XMM* is 128-bit (16 bytes) that a direct constant instruction can be in one chunk...
What about instruction throughput? For example, a general-purpose MOV instruction has a throughput of 3 instructions per clock cycle. Take a look at the Intel Optimization Reference Manual for more information.
There is no XMM* direct constant assign instruction yet.
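No instruction takes a 128-bit immediate (x86 instructions are limited to 15 bytes, so a 16-byte immediate could not fit in any case). A constant that repeats a single 32-bit pattern can still be built without a memory operand, though: move an immediate into a general-purpose register, transfer it to an XMM register, and broadcast it. The sketch below shows that idea with SSE2 intrinsics. The function names are hypothetical, and when the value is known at compile time a compiler may well fold this into an ordinary constant load from memory.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Broadcast a 32-bit value to all four lanes without a memory operand.
   Conceptually: mov r32, imm32 ; movd xmm, r32 ; pshufd xmm, xmm, 0 */
static inline __m128i broadcast_imm32(int32_t x) {
    __m128i v = _mm_cvtsi32_si128(x);  /* value in lane 0, other lanes zero */
    return _mm_shuffle_epi32(v, 0);    /* copy lane 0 into every lane */
}

/* Hypothetical helper: broadcast and write the lanes to a plain array. */
static inline void broadcast_to_lanes(int32_t x, int32_t out[4]) {
    assert(out != NULL);
    _mm_storeu_si128((__m128i *)out, broadcast_imm32(x));
}
```

This costs two or three instructions instead of one load, so it is only worth considering in the load-bound situation described earlier in the thread.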