data fitting df?Interpolate1D

Petros_Mamales · ‎04-18-2013

Hi,

I am new in the data fitting land of mkl and have a couple of questions.

a) the main interpolate function, is the cell input used if provided or is recalculated ( assuming that the pointer is not NULL)?

The reason I am asking is that, in the case of calibration of a parametric function the points over which the updated function is

recalculated are given and therefore the search for their cell position redundant. My guess is no, since I don't see a flag that would

signify that the cell variable pointed memory is valid (i.e. not available for writting only).

b) the are two flags for the type input. If one uses the DF_CELL, is this different from a call to

df?SearchCells1D, other than the fact that the latter can be done on a task that has no function information ?

Thank you very much in advance for your help,

Petros

ps: Suggestion: the readability of the C-API of the data fitting routines would greatly benefit from constness wualifiers on pointers and data.

Andrey_N_Intel · ‎04-18-2013

Hi Petros,

API of Intel(R) MKL df?interpolate1d() routine allows you to get cell search results, in addition to interpolation results. To do this, provide pointer to memory for search results, parameter "cells" and specify type computations DF_INTERP | DF_CELL. The function recalculates cell indices for the same partition and same array of sites each time you call it.

If you provide the same partition/sites to Intel(R) MKL df?searchcells1d() function you would get search results identical to the results returned by the interpolation function given DF_CELL type of the computations and the same input parameters.

If you need to re-use results of cell search in the subsequent calls to inteporlation routine, please use extended version of df?interpolate1d() routine, df?interpolateex1d(). Specify user-defined call back function search_cb which returns cells indices computed earlier.

Please let me know, if this addresses your questions, or you need additional details.

Regarding your suggestion on qualifier to pointers/parameters, can you please clarify in more details what you expect? Say, add "const" qualifier to API fo spline functions or something different?

Andrey

Petros_Mamales · ‎04-19-2013

Hi Andrey,

Thank you for your response and suggestion. I think I have something to work with now.

As far as const qualifiers ( apologies for the typo before), yes this is what I mean. For example if instead of the argument

MKL_INT* cell,

there was

- MKL_INT const * const cell

or even

- MKL_INT const * cell

a lot would be possible to infer just by the function signature.

Thank you again,

Petros

Andrey_N_Intel · ‎04-19-2013

Hi Petros, thanks for clarificaton. Const qualifiers in spline function API should already be available in the latest version of Intel(R) MKL. Andrey

Petros_Mamales · ‎04-20-2013

Hi Andrey,

Thank you for the heads up. One more clarification: if instead of the callback function, I exploit the datahint input, should not I be able to accomplish the same. Waht I am thinking, schematically, would be something like this :

double *pDataHint = new double [ nsites + 4];

MKL_INT * cell = new MKL_INT[nsites] ;

for( size_t i = 0; i != nsites; ++i ) pDataHint = double( cell[i+4] ) ;

pDataHint[1] = DF_APRIORI_MOST_LIKELY_CELL ;

(assuming that an initial search call populating cell has been made ).

Would this provide me with an equally efficient alternative to the callback writting?

TYVMIA,

Petros

PS: the sites are assumed to be ordered.

Andrey_N_Intel · ‎04-20-2013

Hi Petros,

I believe that "most likely cell" approach is intended to cover the case when all of your interpolation sites populate neighbourhood of the cell whose index is provided in pDataHint array. In other words, the present version of the library does not support the use scenario you describe above. Currently, you can re-use results of the initial search with callbacks. Please, let me know, if you need additional details on how to use them.

Thanks for your questions,

Andrey

Petros_Mamales · ‎04-20-2013

Hi Andrey,

Thank you very much for clarifying. I was hoping for a different answer ;-).

(Callbacks are fine, my biggest concern is templatizing the C-API - getting a bit tired of keeping track of types and variable names ; -) )

Maybe in the future you would like to consider a flag like: DF_APRIORI_EXACT_CELL ?

On another note, in the integration routine for a scalar function ( please. let me know if I should open another case ) :

- is the routine smart enough so that if I ask for integrals with a single common start point, to not recalculate the overlapping integrals or should I take care of it on the outside ?

- how are the results ordered ? The input is a 2d set ( left and right limits ). As far as I saw in the documentation, there is only mention of the order with respect to the function coordinates (for when this is a vector-valued function).

Thank you for all your help (especially on a Saturday - please, no reason for you to respond on your weekend ).

Petros

Update: the second I understand now to be the order of the index of the vectors . So please ignore it.

Andrey_N_Intel · ‎04-20-2013

Hi Petros,

Returning back to your original question. If I correctly understand you fix partition and interpolation sites. For tuning you vary function values only in points of the partition. If this understanding is correct, I wonder if vector case would work for you? That is, you define vector valued function on the same partition, each coordinate of the function - modification of your original function. During interpolation search results would be re-used for all coordinates of the vector function. Please, note that current version of Intel(R) MKL Data fitting support the same boundary conditions for all coordinates of the function.

Regarding to your DF_APRIORI_EXACT_CELL request: can you please submit feature request via Intel Premier support system? We would make analysis of the request to understand if it can be addressed.

For each coordinate of the function integration results are packed in natural order: 1st integral - for 1st pair of integration limits, 2nd integral for the 2nd pair of integration limits, etc.

For performance reasons API of dfIntegrate1D() function supports hints which describe structure of integration limits such as "sorted" or "uniform". It helps to speed-up computations of integrals by re-using overlapping integrals when possible. So, if possible, please sort your right integration limits before the call to the integration routine and provide proper hint to the library (I assume that all your left integration limits are the same).

Thanks,

Andrey

Petros_Mamales · ‎04-20-2013

Hi Andrey,

Thank you for the recommendation. Actually, I need this feaure only for a small subset of splines- the mos trivial ones where no boundary conditions are needed. What I really need is the expandability of the partition set on demand - as in a bootstrapping operation - where the new parameters corresponding to the additioal interval would be tuned so that some condition is satisfied. I can do this by using the editing capability of the library.(Actually, I meant to ask, how come there is no querying functionality for the task structure? this necessitates that one has to carry around the setting values for their own referencing ).Typically there would be ~100 points (partition) and that, if I understand your suggestion, would send me to 100-dimensional objects, something that would be an overkill for my needs but very useful idea in case I wanted to calculate Jacobians on entire curve (thank you!).

Regarding the request, I will folow up ( although I am not sure how right now, but I will )..

{ You see, there are a lot of scenaria where unbundling the interpolation into a step for setting the sites and one for the calculation can be beneficiary, An example is MonteCarlo. Yes I realize, although I have not seen a concrete example of it yet, that there is the vfector-function capability and mkl has provided with a solution there. However, if one talks about large samples, then memory requirement can become very expensive ( the coefficients for example ).. One would think then to break-up the sample into manageable blocks and go from there. Since the cell parameter would be read-only, there would be no problem in sharing it in threads. }

Finally, for the integration question, this is excellent news ! very often I find my self needing integrals on a range of sorted upper limits (with the

same lower one). It is great that mkl does the bookkeeping for me.

Thank you very much.

Have a nice weekend,

Petros

Gennady_F_Intel · ‎04-21-2013

Petros. the FR (DF_APRIORI_EXACT_CELL) has been submitted. here is the number of this request for your reference: DPD200335069

Petros_Mamales · ‎04-21-2013

Thank you for doing this Gennady,

Petros

Petros_Mamales · ‎04-21-2013

Hi Andrey,

So now I wrote the callbacks, but have a simple question:

In the case of integrating with search callbacks, what does the callback do?

The difference being that, now we have pair of points to specify the position.

One simple answer would be the 2*nInterval limits. Is this what to do? If yes, as pairs of lef-right limits or as consecutive arrays ?

If no, what is expected/happenning ?

TIA,

Petros

Andrey_N_Intel · ‎04-23-2013

Hi Petros,

Just to double-check on your question "how come there is no querying functionality for the task structure?" Do you mean routines which return values of Data Fitting Structure? For example, something like GetPartitionSize()? If this is the case, the reason behind absense of such routines - Data Fitting design decisions. Parameters such as break points or interpolation sites are held on the side of the user's application, and the Data Fitting task descriptor holds only pointers to the data (no copying of the arrays into internal structures of the library is done). In particular, it also means that you should not de-allocate the arrays passed into the library over the whole life of the task. At the same time, task descriptor holds scalar values such as number of breakpoints; those values are naturally in user's application. From perspective of performance, it does not make sense to call a library routine to read, say, size of partition.

Answering your question on callback for integration routine. From perspective of API of search callback there is no difference between interpolatation and integration functions. Purpose of the search callback - to return indices of cells which contains given points, interpolation sites or left/right integration limits. So, during computation of the integrals the integration function makes call to your search callback function to get indices of left limits and then indices of right limits (and this order of calls to the search callback is not important). Internally, the library will properly process the results of the search callback. Also, please keep in mind, that generally the library can require the callback to return vector of indices, that is first parameter n of the callback can be greater than one.

Please, let me know, if you need more details/clarifications.

Andrey

Petros_Mamales · ‎04-23-2013

Hi Andrey,

Per the first question, you are absolutely right. It might happen though that you want to have access at scopes other than the ones where the task is created. Since very likely the arrays will live on the heap this is entirely possible. Now say I want to use the same task accross threads.I will have to copy around the references to breakpoints, function values while the info is there in the task pointer!. I know I can encapsulate all this and the task pointer and "distribute" the encapsulated struct, but why if there was querying functionality. I guess some of it is programming style and habits.

On the integration usage of the callback routine, I understand that you are saying that this callback will be called twice, one for the left and once for the right integration limits. But then, since I only have one search callback how will I know to point it it to a buffer of the cells corresponding to either of them ? It would seem to me that I need either two callbacks or a flag. And me putting the latter in the parameters is not going to be picked up by the mkl code. Am I missing something ?

Than you very much for your help,

Petros

Andrey_N_Intel · ‎04-24-2013

Hi Petros,

thanks for your additional inputs related to querying functionality. We would do additional analysis of this request. Currently "parameters" is intended for convenience of the user and his/her application and is not read/modified by the library. Getting flag/info about left/right integration limit from the library would help you to implement this callback in more effective way. Let's analyze this request as well. Currently, as a workaround (as far as I understand you plan to sort right limits) checking limits against some boundaries in the callback would help to indentify type of the integration limit, left or right. Would this work-around work for you?

Thanks,

Andrey

Petros_Mamales · ‎04-24-2013

Hi Andrey,

I don't understand how this would work. I prefer that I used the non-callback formula for now, until this is fixed at a later mkl release (It seems to me that there is an interface problem here, i.e. there should be two callback inputs for the integation and at some point will have to get fixed) .

Thank you for your help,

Petros

Andrey_N_Intel · ‎04-24-2013

Hi Petros,

let's consider general case - I need to compute n integrals over the interval [ llim, rlim ), i=1,...,n. Limits are stored as arrays say of double type: double llim={}, rlim[]={}; I hold indices of cells which contain the limits in the arrays MKL_INT llim_idx[]={}, rlim_idx={}. The library passes array of m integration limits lim[] into callback. In the callback I should find if the limit provided by the library is right or left and return proper index of the cell - something like this:

for ( i = 0; i < m; i++ ) {

for( j = 0; j < n; j++ ) {

if ( lim == llim ) {lim_type = LEFT;break;}

if ( lim == rlim ) {lim_type = RIGHT;break;}

}

cell = ( lim_type == LEFT ) ? llim_idx : rlim_idx;

}

status = DF_STATUS_EXACT_RESULT;

return status;

This would be done in more effective way, if the library returned type of integration limit. Also, info about structure of limits (say, they are sorted) can be used. Anyway, we would analyze what can be done on the side of the library to provide info about type of integration limit into callback.

Andrey

Petros_Mamales · ‎04-24-2013

Hi Andrey,

So effectively ( I got a bit confused with the m's and the n's - there might be a small glitz in your pseudo-code, maybe ? ) you are saying to check the sites vector against the stored site vector and identify it as a left or right limit. This already has the benefit that it is an operation over the number of integration intervals rather than the nuber of knots and can even be done -very likely-with a memcmp. I guess this would work - although not terribly clean ;-))

Thank you very much for the suggestion and for your help.

Petros

Petros_Mamales · ‎04-25-2013

There is another issue with the search callback in that it exects the cells to be (long long) while the cell varable in the interpolation routine is MKL_INT ( simple int).

Actually, the cells don't show up in the integration routines except for the callback (as a matter of fact, all callbacks have been defined for MKL_INT64, which is the long long of the documentation (I guess?) !! ).

Is this the intended behavior ? The user is to maintain the type integrity between the MKL_INT and MKL_INT64 pointers ?

Suppose, I maintain the cells from the search routine. These come as MKL_INT. Then I have to turn them into MKL_INT64 in order to pass them to the callback. I will not care for cell output of the interpolation routine. Can I assume things will work properly, or there is an oversight here ?

TIA,

Petros

Andrey_N_Intel · ‎04-26-2013

Hi Petros,

Array of cell indices, parameter "cell" in API of callback function for search is 64-bit integer. Basically, this is the pointer to the buffer of size n internally allocated by the library for search results that come from a user. Array of cell search results returned by interpolation routine is MKL_INT which is either 32-bit or 64-bit integer depending on your enviornment/settings.

In your callback function you copy MKL_INT indices into MKL_INT64 (wider data type) indices, and this operation should be safe. Below I use part of the code above:

MKL_INT64 cell[]; // pointer to internal library buffer

MKL_INT llim_idx[], rlim_idx[]; // pointers to integration limits stored on side of the user

...

cell = ( lim_type == LEFT ) ? llim_idx : rlim_idx;

and no additional efforts from you are necessary, as a compiler is expected to properly work with the data types.

Andrey

Petros_Mamales · ‎04-26-2013

Actually there is more,

The result of the search call ( the first call that needs to be made to identify the cells of the sites )will return MKL_INT. These will have to be translated to MKL_INT64 to conform to the callback signature.

All this is a bit strange, hence I asked for the clarification.

Thank you for your help,

Petros

Note : There is a difference in the signature style between :

_Mkl_Api(

int,dfdIntegrate1d,(DFTaskPtr , MKL_INT , MKL_INT ,double[], MKL_INT ,double[], MKL_INT ,double[],double[],double[], MKL_INT ))

and

_Mkl_Api(

int,dfsIntegrate1D,(DFTaskPtr , MKL_INT , MKL_INT ,float [], MKL_INT ,float [], MKL_INT ,float [],float [],float [], MKL_INT ))

i.e. upper vs lower case function name suffix ('d') and the usage of pointers for the int types in the signature (mkl 11)