ICC cannot compile the following:
__m128d x = {0.7, 1.1};
__m128 y = {0.7f, 1.1f, 2.7f, 3.1f};
These initializers are unambiguous, unlike those for __m128i, where the element width can vary, so why not allow them? Incidentally, VC6 does allow this, although it preserves the memory order.
Also, __m128 is not compatible with __m128i or __m128d, e.g.
__m128 a = _mm_set_ps(0.7f, 1.1f, 2.7f, 3.1f);
__m128 b = _mm_shuffle_epi32(a, 0xB0); // error
Please note that I'm aware of conversions (_mm_cvt*), pointer casting (*(x*)&y) and unions. However, the first method changes the value and issues additional instructions. Both casting and unions seem to require an unnecessary memory access, at least in ICC7; VC optimizes away a load that follows a store to the same location.
Is the reason for this lack of compatibility (at the intrinsic level):
- conversion-proof code? (so why not a warning instead of an error?)
- future compatibility/performance issues? (so why does VC allow it?)
- cosmetic? (in VC this may be cosmetic, in ICC it is not)
- any other reason?
Best regards,
Anna Niedzicka
1 Reply
Anna,
We made a conscious decision not to allow these initializers, because doing so would have undesirable side effects. For example, you can write the following using VC6.
__m128 y = {0.7f, 1.1f, 2.7f, 3.1f};
float f = y.m128_f32[2];
Accessing the individual elements of y in this manner can lead to undesirable and possibly unexpected performance problems. This sequence is effectively the same as using
float f = ((float*)&y)[2];
You've already observed the negative consequences of casting, namely memory accesses that are otherwise unnecessary.
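To make the contrast concrete, here is a minimal sketch (not part of the original reply) that reads element 2 of a vector in two ways: once with intrinsics only, and once by going through memory as the cast does. The function names are hypothetical, and whether the first version really stays in registers depends on the compiler and optimization level.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_shuffle_ps, _mm_cvtss_f32 */

/* Hypothetical helper: extract element 2 using intrinsics only. */
float element2_intrinsic(__m128 v)
{
    /* Broadcast element 2 to every position, then read the low element. */
    __m128 t = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2));
    return _mm_cvtss_f32(t);
}

/* Hypothetical helper: extract element 2 by storing the vector to memory
   and loading one float back, which is what the cast form amounts to. */
float element2_memory(__m128 v)
{
    float tmp[4];
    _mm_storeu_ps(tmp, v);
    return tmp[2];
}
```

Both functions return the same value; the difference is only in the instructions a compiler is likely to generate.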
The recommended alternative method of initialization is to use intrinsics. For example,
__m128 y = _mm_set_ps(3.1f, 2.7f, 1.1f, 0.7f);
This method always works in C++, and it works for local non-static variables in C. For global and static variables in C, you either need to initialize in code using intrinsics, or you need to use a union as follows.
union {
float f[4];
__m128 m;
} y = {0.7f, 1.1f, 2.7f, 3.1f};
There should be no performance drawbacks to this union provided that all other references to it are through y.m.
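As a rough illustration of the two initialization styles above, the following sketch checks that the statically initialized union and the _mm_set_ps form produce the same memory layout. The type name vec4 and the function names are assumptions made for this example; note that _mm_set_ps takes the highest element first, so its arguments appear in reverse of memory order.

```c
#include <assert.h>
#include <string.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_set_ps, _mm_storeu_ps */

/* Illustrative named version of the union from the reply. */
typedef union {
    float  f[4];
    __m128 m;
} vec4;

/* Static initialization goes through the first member, f, in memory order. */
static vec4 y_static = { {0.7f, 1.1f, 2.7f, 3.1f} };

/* Copy the static vector to out[], referencing it only through .m. */
void store_static(float out[4])
{
    _mm_storeu_ps(out, y_static.m);
}

/* Returns 1 if _mm_set_ps with reversed arguments gives the same layout. */
int same_layout(void)
{
    float out[4];
    __m128 y_set = _mm_set_ps(3.1f, 2.7f, 1.1f, 0.7f);

    assert(sizeof(vec4) == sizeof(__m128));  /* the union adds no padding */
    _mm_storeu_ps(out, y_set);
    return memcmp(out, y_static.f, sizeof out) == 0;
}
```

Calling same_layout() should return 1, and store_static() should write 0.7, 1.1, 2.7, 3.1 in that order.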
We also made a conscious decision to use strict typing for the XMM data types. This avoids potential performance problems with future processor generations. You can freely mix types on a Pentium 4 processor without penalty, but that might not be true for future processors.
For the specific case you raise, you could use the following equivalent code that doesn't mix types.
__m128 a = _mm_set_ps(0.7f, 1.1f, 2.7f, 3.1f);
__m128 b = _mm_shuffle_ps(a, a, 0xB0);
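The equivalence can be checked with a small sketch. It assumes a compiler whose emmintrin.h provides the cast intrinsics _mm_castps_si128 and _mm_castsi128_ps, which were added to later compilers and are not available in the ICC7-era headers discussed in this thread; they reinterpret the bits without generating a conversion. The function names are hypothetical.

```c
#include <string.h>
#include <emmintrin.h>  /* SSE2: _mm_shuffle_epi32 and the cast intrinsics */

/* Float-typed shuffle, as suggested in the reply. */
void shuffle_float(float out[4])
{
    __m128 a = _mm_set_ps(0.7f, 1.1f, 2.7f, 3.1f);
    __m128 b = _mm_shuffle_ps(a, a, 0xB0);
    _mm_storeu_ps(out, b);
}

/* Integer-typed shuffle on the same bits, using explicit casts
   (assumes a compiler that provides the _mm_cast* intrinsics). */
void shuffle_int(float out[4])
{
    __m128  a  = _mm_set_ps(0.7f, 1.1f, 2.7f, 3.1f);
    __m128i bi = _mm_shuffle_epi32(_mm_castps_si128(a), 0xB0);
    _mm_storeu_ps(out, _mm_castsi128_ps(bi));
}

/* Returns 1 if both shuffles produce identical results. */
int shuffles_agree(void)
{
    float p[4], q[4];
    shuffle_float(p);
    shuffle_int(q);
    return memcmp(p, q, sizeof p) == 0;
}
```

With the selector 0xB0, both versions pick source elements 0, 0, 3, 2, so the result in memory order is 3.1, 3.1, 0.7, 1.1.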
David Kreitzer
IA32 Code Generation Group