- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The following code is taken from https://software.intel.com/en-us/code-samples/intel-c-compiler/application-domains/finance/averaging-filter
here is the serial code which is not hard to explain
Linear Version of the code
for(int i = 1; i < (h+1); i++)
{
int x = ((resized_width * i) + 1);
for(int j = x; j < (x + w); j++)
{
unsigned int red = 0, green = 0, blue = 0;
for(int k1 = (-1); k1 <= 1; k1++)
{
int pos = j + (k1 * resized_width);
for(int k2 = (-1); k2 <= 1; k2++)
{
red += resized_indataset[(pos + k2)].red;
green += resized_indataset[(pos + k2)].green;
blue += resized_indataset[(pos + k2)].blue;
}
}
resized_outdataset[j].red = red/9;
resized_outdataset[j].green = green/9;
resized_outdataset[j].blue = blue/9;
}
}
Array Notation
for(int i = 1; i < (h+1); i++)
{
int x = ((resized_width * i) + 1);
for(int j = x; j < (x + w); j+=8)
{
int row2index = 3*j - 3;
int jump = (resized_width * 3);
int row1index = row2index - jump;
int row3index = row2index + jump;
out[(row2index + 3):24] = (in[row1index:24] + in[row1index+3:24] + in[row1index+6:24] + in[row2index:24] + in[row2index+3:24] + in[row2index+6:24] + in[row3index:24] + in[row2index+3:24] + in[row2index+6:24])/9;
}
}
I'm trying to understand the code but I do not understand that basics assumption here.
In the following code
for(int j = x; j < (x + w); j+=8)
{
int row2index = 3*j - 3;
int jump = (resized_width * 3);
int row1index = row2index - jump;
int row3index = row2index + jump;
out[(row2index + 3):24] = (in[row1index:24] + in[row1index+3:24] + in[row1index+6:24] + in[row2index:24] + in[row2index+3:24] + in[row2index+6:24] + in[row3index:24] + in[row3index+3:24] + in[row3index+6:24])/9;
}
a.how do you decide that j needs to jump to j+=8 ? why 8?
b.why does jump = (resized_width * 3)?
c.why do we have index +0, index+3 ,index +6 ??
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Your question is a good one. This is the first time I've seen this code, and did not see any comments explaining it, so I was puzzled too.
Answer to a: The code appears to be trying to use array notation to seduce the compiler into generating vector code, by type-punning slices of 8 {r,g,b} structures onto vectors of length 24. Why 24? The vector length needs to be a multiple of 3 because the r,g,b structure has 3 elements. The widest type in each calculation is a 32-bit integer (because of C++ promotion rules). An AVX instruction can do 8 32-bit operations at once. 24 is the least common multiple of 8 and 3, and so works out well for AVX; i.e. three AVX instructions can operate on a 24-element slice. If the target machine had only SSE instructions, which can do only 4 32-bit operations at once, then maybe 12 would be a better value. 24 would still be workable, but might raise the register pressure badly, or might help by improving instruction-level parallelism. The only way to be sure is to time the variations -- modern hardware is too complex to predict. That's why when I write this sort of code, I make the 24 a symbolic constant and time the variations.
Answer to b: The jump is in units of the type-punned pointer to a color component (one of r, g, or b), not the original pointer to a struct containing a triple {r, g, b}.
Answer to c: See answer to be for why the offsets are multiples of 3. Though a more literal translation of the serial code's offsets -1, 0, 1, would rescale them to -3, 0, 3. For reasons not apparent to me, the code shifts the offsets upwards by 3, and compensates by burying a -3 in this line:
int row2index = 3*j - 3;
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Your question is a good one. This is the first time I've seen this code, and did not see any comments explaining it, so I was puzzled too.
Answer to a: The code appears to be trying to use array notation to seduce the compiler into generating vector code, by type-punning slices of 8 {r,g,b} structures onto vectors of length 24. Why 24? The vector length needs to be a multiple of 3 because the r,g,b structure has 3 elements. The widest type in each calculation is a 32-bit integer (because of C++ promotion rules). An AVX instruction can do 8 32-bit operations at once. 24 is the least common multiple of 8 and 3, and so works out well for AVX; i.e. three AVX instructions can operate on a 24-element slice. If the target machine had only SSE instructions, which can do only 4 32-bit operations at once, then maybe 12 would be a better value. 24 would still be workable, but might raise the register pressure badly, or might help by improving instruction-level parallelism. The only way to be sure is to time the variations -- modern hardware is too complex to predict. That's why when I write this sort of code, I make the 24 a symbolic constant and time the variations.
Answer to b: The jump is in units of the type-punned pointer to a color component (one of r, g, or b), not the original pointer to a struct containing a triple {r, g, b}.
Answer to c: See answer to be for why the offsets are multiples of 3. Though a more literal translation of the serial code's offsets -1, 0, 1, would rescale them to -3, 0, 3. For reasons not apparent to me, the code shifts the offsets upwards by 3, and compensates by burying a -3 in this line:
int row2index = 3*j - 3;
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for the quick answer but i'm still a little puzzled.
so if you run the linear version I get in pos+k2
0 1 2
3266 3267 3268
6532 6533 6534
1 2 3
3267 3268 3269
6533 6534 6535
2 3 4
3268 3269 3270
6534 6535 6536
3 4 5
3269 3270 3271
6535 6536 6537
and so on for the next 8 runs - we said 8 because of 3*8 =24 for the AVX
each position has RGB channels we are calculating resized_outdataset
j = the middle 2,2 index starting from 3267, 3268 for the next 3x3
so we iterate 8 times over 3 char* which are the RGB channels for each position
when I use the AN version I get totally different indices
row1index = 0
row2index = 9798
row3index = 19596
Can you please explain this? even the assignment in the end of the loop is into row2Index which is not the same index as j.
I would expect the syntax here to say
(in[row1index:24] + in[row1index+3:24] + in[row1index+6:24]
iterating over 8*3 chars
the first index was 0 1 2 for the first iteration for pos+k2 so 0 == row1index when pos+k2 =1 this means you need to skip 3 chars because of the RGB channels so we get to row1index+3 and so on for index+6 when pos+k2=2
but when pos+k2 = 3266 , 3267, 3268 for the first iteration the AN version syntax says
in[row2index:24] + in[row2index+3:24] + in[row2index+6:24]
and row2index starts from 9798.... so i'm lost here
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The index j in the serial code is an index into an array of struct, where there are 3 bytes per struct. The row...index variable is indexing on the scale of bytes. So there's a 3x ratio, i.e. 9798/3 = 3266.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks very much, now I understand the whole code. what a weird way of programming....
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page