Is there a way to use only part of a 128-bit register (__m128i, __m128), say only the upper 64 or only the lower 64 bits, if using the older 64-bit registers is not desirable?
Does this affect performance (since only part of the register is used while the rest is still being processed)?
Deepak
3 Replies
Hello,
We are in the process of consolidating the 64-bit Programming forum into the Intel AVX and CPU Instructions forum and Intel C++ Compiler forum, so I am moving this thread to the C++ Compiler forum, as it is related to assembler programming.
Thanks for your patience.
==
Aubrey W.
Intel Software Network Support
Deepak,
For floating-point numbers, there are instructions that process only one value, e.g. mulss for single-precision multiplication and mulsd for double-precision multiplication. However, they have the same latency and throughput as multiplying the whole register.
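As a rough illustration of the scalar versus packed multiply, here is a minimal sketch using the SSE/SSE2 intrinsics that normally compile to mulss, mulsd and mulps. The helper names (mul_low_float, mul_low_double, mul_four_floats) are made up for this example, and it assumes an x86 compiler with SSE2 enabled.

```c
#include <xmmintrin.h>  /* SSE:  __m128,  _mm_mul_ss, _mm_mul_ps */
#include <emmintrin.h>  /* SSE2: __m128d, _mm_mul_sd */

/* Hypothetical helper: multiply only the lowest float lane (mulss). */
static float mul_low_float(float a, float b)
{
    __m128 va = _mm_set_ss(a);   /* a in lane 0, upper lanes zeroed */
    __m128 vb = _mm_set_ss(b);
    return _mm_cvtss_f32(_mm_mul_ss(va, vb));
}

/* Hypothetical helper: multiply only the lowest double lane (mulsd). */
static double mul_low_double(double a, double b)
{
    __m128d va = _mm_set_sd(a);  /* a in lane 0, upper lane zeroed */
    __m128d vb = _mm_set_sd(b);
    return _mm_cvtsd_f64(_mm_mul_sd(va, vb));
}

/* Hypothetical helper: multiply all four float lanes at once (mulps). */
static void mul_four_floats(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a); /* unaligned load of 4 floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_mul_ps(va, vb));
}
```

The scalar forms are mainly useful when you only have one value to process; they are not a way to make the multiply itself cheaper.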
For integer numbers, such instructions do not exist. Nevertheless, you can still use only the lower 64 bits and ignore the upper half. Obviously, the performance will be the same as for processing the whole register.
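For the integer case, a minimal sketch of "use the lower 64 bits and ignore the rest" could look like the following. It relies on the SSE2 intrinsics _mm_loadl_epi64 and _mm_storel_epi64, which move only 64 bits to and from memory, while the add itself still runs on the full register. The helper names (add_two_int32, store_high_int32) are made up for this example.

```c
#include <emmintrin.h>  /* SSE2: __m128i and the integer intrinsics */
#include <stdint.h>

/* Hypothetical helper: add two pairs of 32-bit integers, touching only
   64 bits of memory per operand. */
static void add_two_int32(const int32_t *a, const int32_t *b, int32_t *out)
{
    /* Load 64 bits into the low half; the upper half is set to zero. */
    __m128i va = _mm_loadl_epi64((const __m128i *)a);
    __m128i vb = _mm_loadl_epi64((const __m128i *)b);

    /* paddd works on all four 32-bit lanes; the upper two are just 0 + 0. */
    __m128i sum = _mm_add_epi32(va, vb);

    /* Store only the low 64 bits of the result. */
    _mm_storel_epi64((__m128i *)out, sum);
}

/* Hypothetical helper: write the upper 64 bits of v to out by first
   copying the high half into the low half. */
static void store_high_int32(__m128i v, int32_t *out)
{
    _mm_storel_epi64((__m128i *)out, _mm_unpackhi_epi64(v, v));
}
```

If the data really lives in the upper half, one extra shuffle such as _mm_unpackhi_epi64 brings it down to the lower half, as in the second helper.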
Kind regards
Thomas
P.S.: You can find the instruction latencies in Appendix C of the Intel 64 and IA-32 Architectures Optimization Reference Manual.
Thanks for the input Thomas. I looked at this issue and that's what I gathered also....
-regards,
Kittur