Hello,

Anirudh · ‎06-25-2013

I am trying to do polynomial evaluation using MKL or IPP. Can you guide me as to which functions should I be looking at ? my polynomials are going to be < 4 th order.

TimP · ‎06-25-2013

I don't think you gave enough information to indicate why you would consider MKL or IPP for this. A macro expression such as

a[0] + x * (a[1] +x * a[2] + x*x * (a[3] + x * a[4]))

should be sufficient.

Gennady_F_Intel · ‎06-25-2013

there are no such functions neither in MKL or IPP which make polynomials calculations. Tim gave you good example how you can compute it.

in the case if the inputs would be vectors, then you can reuse some of VML functions for that.

Anirudh · ‎06-26-2013

Thanks for the insight. I was looking for polynomial evaluation, because i was trying to implement 2D interpolation, such that i can interpolate using Cubic splines. There i would have to evaluate cubic polynomials. I could not use functions in image processing library (I believe) because they offer me evaluating the 2D at specific intervals. My new coordinates which i want to interpolate at are randomly placed in the original xy grid.

Do you think there is some function that can do the 2D interpolation for such a case ?

Andrey_N_Intel · ‎06-26-2013

Hello,

Present version of Intel(R) MKL supports 1D spline based interpolation only. Data Fitting component of the library provides routines for spline construction and relevant computations such as interpolation, search and integration. We support linear, quadratic, cubic and user-defined splines.

Are you interested entirely in 2D interpolation, or 1D is also of your use?

Also, can you please provide, if possible, additional details behind your usage model of 2D interpolation such as problem dimension, type of spline (only cubic or other types are possible as well), number of interpolation sites, data types (integer, floating-point), etc? Is 2D interpolation is a compute intensive part of your application?

Andrey

Anirudh · ‎06-26-2013

We did try using the 1D spline from MKL and found that the speed up was about 2x from our implementation, for a case where we were evaluating 10k different points on the same spline. I think MKL is faster in the case where many points are to be evaluated on the same spline. If you evaluate just one point on each of the hundreds of splines then i dont get any advantage.

Right now in my application we have to evaluate 900 splines, 1 point at a time, in which case MKL was slower than our implementation. I cannot create a vector of input points to the splines, as I only get the points to be evaluated on the spline, after some other processing.

I my application splines evaluation are taking 40% of the time (analyzed via Parallel Studio analyzer). I have a 2D matrix of 900x11 elements. I create 900 splines with 11 knots. Every loop i need to evaluate one point on each spline.

I was thinking if I make it 2D interpolation (900x11 original grid) instead, I can evaluate all the 900 points with the same function call and get a speed up.

Andrey_N_Intel · ‎06-27-2013

Thank you much for the addtional details. While your request about support of 2D case is valid for the further analysis, let me provide additional details about 1D case.

As you mention, performance advantages of Intel(R) MKL Data Fitting routines are observed in the vector case when number of interpolation sites is, say, at least several hundreeds.

We are also aware about importance of scalar case for applications, e.g., when number of interpolation sites is 1 and number of functions is 1.
For those reasons in Intel(R) MKL 11.0. Update 2 we improved performance of interpolation routines for a scalar case, in particular speed of search and polynomial computations. As the interface overheads related to testing of function parameters are visible on small problem dimensions, we also added support of parameter DF_CHECK_FLAG in editor dfiEditVal. When disabled with the editor this flag helps to avoid extra parameter checks; by default this flag is enabled. However, please use this flag carefully, after you complete development/debuging of your code. See additional details in Release Notes http://software.intel.com/en-us/articles/intel-mkl-110-release-notes/ and Data Fitting Chapter of Intel(R) MKL Manual. But still, in the scalar case interface overheads may be visible.

Did you try the latest Intel(R) MKL 11.0 to test performance of the interpolation functions in the scalar case? If yes, did it help you in the scalar case?

API of Data Fitting component supports different use cases, including parallelization on level of the user's application.
As your application uses 900 independent splines I wounder if you apply parallelization in your interpolation code?

Something like this:

double breaks[900*11];
double sites[900];

#pragma omp parallel for
for ( i = 0; i < 900; i++ )
{
DFTaskPtr task;

     status = dfdNewTask1D( &task, 11, &breaks[i*11], xhint, ny, y, yhint );
     status = dfdEditPPSpline1D(...);
     status = dfiEditVal(task, DF_CHECK_FLAG, DF_DISABLE_CHECK_FLAG );
     status = dfdConstruct1D( task, DF_PP_SPLINE, DF_METHOD_STD );
     status = dfdInterpolate1D( task, DF_INTERP, DF_METHOD_PP,1,&sites, ...);
     status = dfDeleteTask( &task );
}

If this loop is done inside of the other loop, the construction (and de-struction) of the Data Fitting task can be done once, outside of the inner loop. In the inner loop you either provide pointers to the new input parameters (e.g., breaks or/and function values) using relevant Data Fitting editors or copy the new values of input parameters into the same memory as before.

Please, let me know if this helps.
Andrey

Bernard · ‎06-29-2013

TimP (Intel) wrote:

I don't think you gave enough information to indicate why you would consider MKL or IPP for this. A macro expression such as

a[0] + x * (a[1] +x * a[2] + x*x * (a[3] + x * a[4]))

should be sufficient.

It could be also vectorized when the input are scalar compnents of some vector.

SergeyKostrov · ‎06-30-2013

Take a look at ippr.h header file: ... /* // Name: ipprResize_"mode" // Purpose: Performs RESIZE transform of the source volume // by xFactor, yFactor, zFactor and xShift, yShift, zShift // |X'| |xFactor 0 0 | |X| |xShift| // |Y'| = | yFactor 0 | * |Y| + |yShift| // |Z'| | 0 0 zFactor| |Z| |zShift| // Parameters: // pSrc pointer to source volume data (8u_C1V, 16u_C1V, 32f_C1V modes) // or array of pointers to planes in source volume data // srcVolume size of source volume // srcStep step in every plane of source volume // srcPlaneStep step between planes of source volume (8u_C1V, 16u_C1V, 32f_C1V modes) // srcVOI volume of interest of source volume // pDst pointer to destination volume data (8u_C1V and 16u_C1V modes) // or array of pointers to planes in destination volume data // dstStep step in every plane of destination volume // dstPlaneStep step between planes of destination volume (8u_C1V, 16u_C1V, 32f_C1V modes) // dstVOI volume of interest of destination volume // xFactor they specify fraction of resizing in X direction // yFactor they specify fraction of resizing in Y direction // zFactor they specify fraction of resizing in Z direction // xShift they specify shifts of resizing in X direction // yShift they specify shifts of resizing in Y direction // zShift they specify shifts of resizing in Z direction // interpolation type of interpolation to perform for resizing the input volume: // IPPI_INTER_NN nearest neighbor interpolation // IPPI_INTER_LINEAR trilinear interpolation // IPPI_INTER_CUBIC tricubic polynomial interpolation // including two-parameter cubic filters: // IPPI_INTER_CUBIC2P_BSPLINE B-spline filter (1, 0) // IPPI_INTER_CUBIC2P_CATMULLROM Catmull-Rom filter (0, 1/2) // IPPI_INTER_CUBIC2P_B05C03 special filter with parameters (1/2, 3/10) // pBuffer pointer to work buffer // Returns: // ippStsNoErr no errors // ippStsNullPtrErr pSrc == NULL or pDst == NULL or pBuffer == NULL // ippStsSizeErr width or height or depth of volumes is less or equal zero // ippStsWrongIntersectVOI VOI hasn't an intersection with the source or destination volume // ippStsResizeFactorErr xFactor or yFactor or zFactor is less or equal zero // ippStsInterpolationErr (interpolation != IPPI_INTER_NN) && // (interpolation != IPPI_INTER_LINEAR) && // (interpolation != IPPI_INTER_CUBIC) && // (interpolation != IPPI_INTER_CUBIC2P_BSPLINE) && // (interpolation != IPPI_INTER_CUBIC2P_CATMULLROM) && // (interpolation != IPPI_INTER_CUBIC2P_B05C03) // Notes: // "mode" are 8u_C1V or 16u_C1V or 32f_C1V or 8u_C1PV or 16u_C1PV or 32f_C1PV */ ...