- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello guys,
I am trying to write a fast version ofthe followingmatrix operation:
for all row
for all col
if(matrix(row, col) < 0.04045){
matrix(row, col) /= 12.92;
}
else{matrix(row, col) = pow(((matrix(row, col)+0.055)/1.055), 2.4);
} // if} // for col
} // for row
Can this be vectorized?? Thanks in advance for any suggestions.
Alex
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
To speed it up further, multiply with the reciprocal instead of dividing.
You may also do a lot better to use an array insead of the matrix.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
wordy79@hotmail.com:Not with that else block in there. If you can do away with it,
unfortunately no, that's my problem.
wordy79@hotmail.com:them depending on what matrix() is, you may get it to vectorize.
matrix is a 2D array of floating-point data.
wordy79@hotmail.com:You may also do a lot better to use an array insead of the matrix.
You mean a 1D array vsa 2D matrix? Why would that make a difference?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
It may be that the library team didn't find a generally satisfactory way to write a vector pow() (double) function. There is an exp2() (not masked), and log2() is present in both masked and unmasked versions. If you can accept the limitations of substituting those functions for pow(), you may be able to approximate the result you requested.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
You mean a 1D array vsa 2D matrix? Why would that make a difference?I mean a 2D array (e.g. double foo[10][10]) instead of an abstract data type.
Either way, it's academic if you can't re-design to lose the else part. :-(
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
> As there is no double counterpart (masked or not), it
> looks like the support you would need for double is not
> there.
The data I process are actually float (not double). Can it be vectorized in this case? I thought the presence of the if-else made the vectorization almost impossible? Is that right?
Also, what is the difference between a masked and a unmasked function?!
Thanks a lot for the informative answer.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
wordy79@hotmail.com:You mean a 1D array vsa 2D matrix? Why would that make a difference?I mean a 2D array (e.g. double foo[10][10]) instead of an abstract data type.
When I write foo(row, col), foo is actually an instance of an Image class, and the operator() is simply an inlined "foo.data[row*width + col]". It's not a static array though, it can't be.
wordy79@hotmail.com:
Either way, it's academic if you can't re-design to lose the else part. :-(
I don't quite understand what you mean by "academic"? (English is not my first language)
Thanks for the answer.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
tim18:icc 10.0 has added a masked powf() function in svml. As there is no double counterpart (masked or not), it looks like the support you would need for double is not there.
It's interesting you assumed the algo was working on double, and not float. What made you think that?
Collorary question: Say I've got a template as follow
template
void ExpMultiply(Matrix mat)
// Matrix contains a 2D array of type T
{
for(all rows){
for(all cols){
mat(row, col) = exp(mat(row, col) * 1.0);
}
}
}
and say I compile the app with parameters for fastest speed. Which svml version of exp() will be called? The one for double or the one for float?
Whay about the initial function of my first post? say the matrix data are float? will the computation be made in double or float?
Thanks again
Alex
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Alex,
For operations on the entire matrix add an overload for single cell reference. i.e. permit access via either matrix(row, col) or via matrix(cellNumber).
// if a cell is < 0.04045 make it even smaller
for all cellNumber
if(matrix(cellNumber) < 0.04045){
matrix(cellNumber) /= 12.92;
} // if
} // for cellNumber
// Now catch the cells >= 0.04045
// Avoid repeating binary fraction problem
// Note, any cell that was < 0.04045 is now < 0.04045/12.92
// And therefore is also < 0.04045/10.
// Therefore 0.004045 is known to be > any modified cell above
// as well as known to be < any unmodified cell above
for all cellNumber
if(matrix(cellNumber) > 0.004045){
matrix(cellNumber) = pow(((matrix(cellNumber)+0.055)/1.055), 2.4);
} // if
} // for cellNumber
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The float version of exp() is expf(), and the float version of 1.0 is 1.0f. The compiler will not demote your expressions to float. If you are expecting the compiler to change your double code to float in order to vectorize, you will be frustrated.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for the answer.
JimDempseyAtTheCove:For operations on the entire matrix add an overload for single cell reference. i.e. permit access via either matrix(row, col) or via matrix(cellNumber).
Is is faster to do that?
The reason I am not doing it is because each row is 16-bytes aligned, hence there can be up to 15 extra pixels per row. That is not much because most of the matrices I am dealing with are 3000x2000, but still. Do you think I should reconsider? I'd be glad to hear any recommendations on this issue.
JimDempseyAtTheCove:// if a cell is < 0.04045 make it even smaller
for all cellNumber
if(matrix(cellNumber) < 0.04045){
matrix(cellNumber) /= 12.92;
} // if
} // for cellNumber// Now catch the cells >= 0.04045
// Avoid repeating binary fraction problem
// Note, any cell that was < 0.04045 is now < 0.04045/12.92
// And therefore is also < 0.04045/10.
// Therefore 0.004045 is known to be > any modified cell above
// as well as known to be < any unmodified cell abovefor all cellNumber
if(matrix(cellNumber) > 0.004045){
matrix(cellNumber) = pow(((matrix(cellNumber)+0.055)/1.055), 2.4);
} // if
} // for cellNumberJim Dempsey
This is very interesting. What is the benefit of doing that? Or more specifically, is this vectorizable when written your way?
Thanks again, and looking forward to your answer.
Alex
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
tim18:The float version of exp() is expf(), and the float version of 1.0 is 1.0f. The compiler will not demote your expressions to float. If you are expecting the compiler to change your double code to float in order to vectorize, you will be frustrated.
Soit looks like I completely misunderstood then.
So, just to make sure, if I call exp() on a float variable, the variable will first be casted from float to double, and then the computation will be done in double, and the output will also be double? If I want to have a float-only pipeline, I have to call expf() instead of exp()?
Thanks a lot for the help!
Alex
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Alex,
The array is the same size. In the matrix(row,col) you reduce this to a single index using a [row*rowSize+col]. The alternate matrix(cellNumber access the array using [cellNumber]. My suggestion was to eliminate the computation for the indexing.
As for the two loops vs one loop.
The two loops eliminate the else clause (branch inside loop). So each loop will run faster and may vectorize (atleast the first loop). As to if the sum of the two loop times is less than the single loop - give it a try.
Your matrix size may exceed your cache size. Therefor recode the two loops to do part of the matrix then enclose the two loops in a major loop to advance through the parts of the matrix. e.g. inner loops work on col, outer loop advances row.
Jim
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
tim18:You could test with #includewith -std=c99 option, to see whether that makes exp() change to expf() when only float arguments are present. You never said anything about tgmath, and your expressions are promoted to double anyway by the use of double constants.
I did not know about
As for the use of double constants, that was not my intention.I am actually trying to write generic code that can work on both float and double. Does someone know a way to write a floating-point constant so that the compiler will interpret it as float when the template argument is float, and double with double argument? To illustrate, it should be as follow
template [typename T]
T func(T value){
return exp(value * 8.0);
}
If T=float, the expression should become expf(value * 8.0f);
If T=double, the expression should become exp(value * 8.0);
Any ideas/hints/suggestions _more_ than welcome.
Alex
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
In the example you give, where the constant is represented exactly in either float or double, 8.0f gives no problem. The compiler can decide whether it is more efficient to promote to double at compile or run time.
More generally, it is a bit awkward. You might define each of your constants with a name, using a T typedef, e.g. const T eight = 8; or put (T) casts on all your constants.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
tim18:The compiler can decide whether it is more efficient to promote to double at compile or run time.
My only worry is speed, hence I want to make sure that all computations are donewith the correct datatype, and with no cast operations.Worst case would be for an expression like "exp(v
Is this issue discussed somewhere in Intel's reference manuals?
tim18:More generally, it is a bit awkward. You might define each of your constants with a name, using a T typedef, e.g. const T eight = 8; or put (T) casts on all your constants.
any ideas for LUTs? They are not defined in functions' bodies, hence T is not known.
Anyway, thanks a lot for the terrific answers.
Alex
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
My point is that multiplying a double array by 8.0f is no problem, as the compiler should promote the constant to double at compile time, at least when that will improve vectorization. In that case, there is no reason to worry about changing the constant data type when you change the array from float to double.
Forcing calculations into double unnecessarily must be avoided, as you have pointed out. We have seen large applications which were designed for x87, which did not vectorize effectively until the problem with double constants mixed with float variables was corrected. This could prevent vectorization, or it could involve splitting up the loop ("distribution"), which you would see by PARTIAL LOOP VECTORIZED reports. If the loop is big enough that the data don't reside in L1 cache between the split loops, the purpose of vectorization may be defeated, even when all the bits are vectorized.
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page