short Array1 [100];
short Array2 [100];
short Val = 34;
for (int i = 0; i < 100; ++i)
{
    if (Array1[i] > Val)
    {
        Array2[i] = 3;
    }
}

unsigned short Array1 [100];
unsigned short Array2 [100];
unsigned short Val = 34;
for (int i = 0; i < 100; ++i)
{
    if (Array1[i] > Val)
    {
        Array2[i] = 3;
    }
}
#pragma vector always
on the line preceding the for(), or, if you wish to assert in addition that the operands are aligned,
#pragma vector aligned
In an example such as you quote, it looks impossible that an exception could be raised by speculative execution, but the compiler appears not to be able to make the distinction.
I'm not familiar enough with the parallel 16-bit operations to be certain, but I assume "condition is too complex" would refer to lack of a 16-bit parallel unsigned compare instruction.
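A minimal sketch of where the pragma would sit, assuming the loop from the original question. The function name threshold_mark and the macro N are made up for illustration, and the pragma is guarded so that compilers other than Intel's simply skip it:

```c
#include <assert.h>
#include <stddef.h>

#define N 100  /* array length taken from the example in this thread */

/* Hypothetical helper wrapping the loop from the question. */
void threshold_mark(const short *array1, short *array2, short val)
{
    int i;

    assert(array1 != NULL && array2 != NULL);

    /* The pragma goes on the line immediately preceding the for().
       It is Intel-specific, so it is only emitted for that compiler. */
#if defined(__INTEL_COMPILER)
#pragma vector always
#endif
    for (i = 0; i < N; ++i) {
        if (array1[i] > val) {
            array2[i] = 3;
        }
    }
}
```

Whether the loop is then actually vectorized still depends on the compiler version, so the vectorization remarks are worth checking after adding the pragma.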
Message Edited by tim18 on 08-18-2004 07:45 PM
The first loop is vectorized with 8-way SIMD parallelism on packed words but the second is not, because there is no support for a packed unsigned > comparison (viz. the instruction pcmpgtw supports > comparisons on packed signed words, but not on packed unsigned words). In contrast, a == or != comparison will vectorize in both cases, because the instruction pcmpeqw (possibly negated) works for both packed signed and unsigned words (provided that both operands are consistently either sign extended or zero extended, as in your example, so that the comparison can be done in the narrower 16-bit precision to allow for maximum parallelism).
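For readers who want to hand-code the unsigned case, here is a hedged sketch using SSE2 intrinsics. It is not something the compiler does according to this post; it relies on a common workaround, namely flipping the sign bit of both operands so that the signed pcmpgtw (_mm_cmpgt_epi16) orders them as if they were unsigned. The function name mark_greater_u16 and the requirement that n be a multiple of 8 are assumptions made for brevity:

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: array2[i] = 3 wherever array1[i] > val,
   comparing the 16-bit values as UNSIGNED. n must be a multiple of 8. */
void mark_greater_u16(const uint16_t *array1, uint16_t *array2,
                      uint16_t val, int n)
{
    assert(n >= 0 && n % 8 == 0);

    /* XOR with 0x8000 maps unsigned order onto signed order. */
    const __m128i bias  = _mm_set1_epi16((short)-32768);
    const __m128i vval  = _mm_xor_si128(_mm_set1_epi16((short)val), bias);
    const __m128i three = _mm_set1_epi16(3);

    for (int i = 0; i < n; i += 8) {
        __m128i a   = _mm_loadu_si128((const __m128i *)(array1 + i));
        __m128i old = _mm_loadu_si128((const __m128i *)(array2 + i));

        /* Signed compare on biased values: all-ones lanes where a > val. */
        __m128i mask = _mm_cmpgt_epi16(_mm_xor_si128(a, bias), vval);

        /* Select 3 where the mask is set, keep the old value elsewhere. */
        __m128i res = _mm_or_si128(_mm_and_si128(mask, three),
                                   _mm_andnot_si128(mask, old));
        _mm_storeu_si128((__m128i *)(array2 + i), res);
    }
}
```

The two extra XORs are the cost of the missing unsigned compare; the rest is the same as the signed version would be.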
The "condition may protect exception" remark appears for Max because the 8.x compilers are more careful than the 7.x compilers about vectorizing a condition with bit-masking (where conditionally executed code is moved into the always-taken path). Suppose the trip count were not known for this loop and Array1 were much longer than Array2. Then the condition Array1[i] > Val could be used in the prefix to indicate that Array2 may still be accessed, while all remaining elements of Array1 are set to a value <= Val to signal that an access to Array2 would be out of bounds. This may seem contrived for this example, but, in general, compilers must make conservative assumptions. The 8.x compilers use only some simple symbolic manipulations to rule out that a condition guards an exception before moving the guarded code into the always-taken path (and do not yet use the actual range of subscripts), which explains the rather conservative rejection of the example where the trip count and array sizes are statically known. As Tim said, any vectorization-enabling pragma will tell the compiler that it is okay to skip the analysis of whether conditions guard exceptions. I will work on further improving the analysis in the 9.x products.
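To make the bit-masking point concrete, here is a scalar sketch of what the masked form of the loop body amounts to. It is an illustration only, not the compiler's actual output, and the function name mark_greater_masked is hypothetical. Note that Array2 is read and written on every iteration, whether or not the condition holds, which is why a condition that guards an out-of-bounds access cannot safely be converted this way:

```c
#include <assert.h>

/* Hypothetical helper: branch-free (masked) form of the conditional store. */
void mark_greater_masked(const short *array1, short *array2, short val, int n)
{
    assert(n >= 0);

    for (int i = 0; i < n; ++i) {
        /* All ones (-1) when the condition holds, zero otherwise. */
        short mask = (short)-(array1[i] > val);

        /* Unconditional load and store of array2[i]: the guarded code
           has moved into the always-taken path. */
        array2[i] = (short)((mask & 3) | (~mask & array2[i]));
    }
}
```

In the original branching loop, array2[i] is touched only when array1[i] > val, so the two forms differ exactly in the accesses the condition might have been protecting.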
For these and many other details on vectorization in the Intel compiler, please refer to the recently published book:
Aart J.C. Bik. The Software Vectorization Handbook: Applying Multimedia Extensions for Maximum Performance. Intel Press, June 2004, http://www.intel.com/intelpress/sum_vmmx.htm
Hope this was useful.
Aart Bik
short Array1 [100];
short Array2 [100];
short Val = 34;
int i;
for (i = 0; i < 100; ++i)
{
    if (Array1[i] > Val)
    {
        Array2[i] = 3;
    }
}
s.c(9) : (col. 4) remark: loop was not vectorized: condition may protect exception.
s.c(8) : (col. 1) remark: LOOP WAS VECTORIZED.
short Array1 [100];
short Array2 [100];
short Val = 34;
for (int i = 0; i < 100; ++i)
{
    if (Array1[i] > Val)
    {
        Array2[i] = 3;
    }
}

unsigned short Array1 [100];
unsigned short Array2 [100];
unsigned short Val = 34;
for (int i = 0; i < 100; ++i)
{
    if (Array1[i] > Val)
    {
        Array2[i] = 3;
    }
}
The top one (with signed integers) is vectorized:
--
]$ icpc test1.cc
test1.cc(9): (col. 1) remark: LOOP WAS VECTORIZED.
--
but with unsigned integers it doesn't vectorize.
The Intel reference document (Intel 64 and IA-32 Architectures Software Developer's Manual, Vol. 2B, p. 4-91, Order# 253667-025US) only speaks about signed byte/word/doubleword integers.
~BR
BR,
Although SSE3 does not support unsigned integer math, I think you can roll your own code to use SSE3:
unsigned short SignBit = 0x8000;
unsigned short Array1[100];
unsigned short Array2[100];
...
if(
    ( ((short)Array1[i] - (short)Val) & (short)SignBit)
    !=
    ((short)Array1[i] & (short)SignBit)
  ) Array2[i] = 3;
...
The following may or may not work (too lazy to try), as I do not know how SSE3 treats the XOR of a register loaded with short integers with another register of short integers when it is issued as XORPD (used here as an XOR of shorts):
if(
    (
        (
            ((short)Array1[i] - (short)Val)
            ^
            ((short)Array1[i])
        )
        &
        (
            (short)SignBit
        )
    )
  ) Array2[i] = 3;
I hope the above reads ok.
Basically, you are performing signed math on the unsigned values (subtraction), then testing to see whether the sign bit flipped.
*** Val will have to be set such that the unvectorized loop uses Array1 >= Val and not >.
Jim Dempsey
SSE should have an XORPI (xor packed integers) for registers loaded as integers.
Jim Dempsey
What is being done here is basically an array clipping operation. I wrote an article for the ISN some time ago covering the subject:
http://software.intel.com/en-us/articles/array-clipping/
You might want to take a look at it. Too bad that even the ICC 11.1 beta, five years of compiler development later, still doesn't use that trick to vectorize such code.
I think forums are meant for Intel users to learn and share; if issues arise, they should be reported to Intel developers to investigate and fix if needed, for the betterment of Intel tools.
~BR
No, Aart Bik said that he (as in personally) would work to better support such cases in 9.x, not fix or resolve them, simply because such issues aren't easily fixable or resolvable, which he explained in his rather long and high-quality post.
That was, however, in August 2004, and Aart Bik has not worked for Intel since May 2007, so to me it seems a bit unrealistic to expect something from bringing this subject up.
As for the purpose of the forums: if I understand it correctly, they are intended mostly for self-help, although Intel engineers do watch them carefully and sometimes even jump in.
With that said, if you have a specific issue with code that doesn't vectorize when it should, I suggest you file a bug report on Premier Support. Also, if you believe that some code should be vectorizable, feel free to file a feature request on Premier Support instead of bringing up old threads.
Finally, feel free to adapt my sample code; you can also ask me for help if you have problems understanding it. After all, I was just trying to be helpful.
Consider the multiplication of two unsigned 64-bit numbers.
So this multiplication can't be vectorized by any version of ICC, or otherwise optimized to give better performance than the 32-bit case using vectorization?
~BR
The problem with 64-bit multiplication is that there is no hardware support for it, be it signed or unsigned.
Therefore, the compiler cannot utilize something that doesn't exist. The code I wrote is a workaround for smaller data types, because for those data types instructions do exist (except that they deal only with signed numbers).
So the answer is NO.
If you are looking for optimized code for large-number multiplication, you should consider specialized libraries.
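As an illustration of what stands in for the missing instruction, here is a scalar sketch that builds the low 64 bits of a 64-bit product out of 32x32->64 multiplies. It assumes the remark above refers to packed (SIMD) multiplies; the function names mul64_from_32 and mul64_self_check are hypothetical. Three multiplies, a shift and two adds are needed for a single element, which gives a feel for why a vectorized version brings little benefit:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: low 64 bits of a * b using only 32x32->64 products. */
uint64_t mul64_from_32(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    /* Low halves contribute the full 64-bit partial product. */
    uint64_t lo = a_lo * b_lo;

    /* Cross terms are shifted up by 32 bits, so only their low 32 bits
       survive; a_hi * b_hi falls entirely above bit 63 and is dropped. */
    uint64_t cross = a_lo * b_hi + a_hi * b_lo;

    return lo + (cross << 32);
}

/* Small self-check against the native 64-bit multiply. */
void mul64_self_check(void)
{
    uint64_t a = 0x0123456789ABCDEFULL;
    uint64_t b = 0xFEDCBA9876543210ULL;
    assert(mul64_from_32(a, b) == a * b);
}
```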
It could be that some other processor supports it.
~BR
Not that I am aware of. The only advice I can give you for big-number multiplication is to use an FFT.