I'd like to suggest that all "int len" parameters of IPP functions should be changed to "size_t len", especially in the ipps* (signal processing) functions, which work with large arrays. This is because int is not enough when working with very large arrays on the Intel 64 architecture. On the other hand, size_t is defined as 32-bit when compiled for IA-32 and 64-bit when compiled for x64, so it would correctly refer to the maximum lengths supported by each architecture without breaking compatibility.
This is holding me back when using IPP with 64-bit applications that work with large memory buffers.
Which particular functions do you think would benefit from the ability to process buffers larger than 2GB?
BTW, it is usually possible to split processing into chunks of data, which will increase data locality and minimize the amount of memory required by the application.
Let us consider an example of md5 checksum operation.
I want to write an application that will perform md5 checksum of a large .ISO file (say 8GB).
I can do that in several ways:
1. Most popular method is to allocate a buffer and then read the file chunk by chunk until whole file has been processed. That is convenient for interactive applications where user wants to see the progress and be able to cancel the operation.
2. Another way is to use memory mapped I/O. That means I can map the whole file into application's virtual address space, and let the operating system worry about the rest. I can then call the function just once and let it run in the background until it is done (MD5 checksum is an example of this, I don't care how long it takes, I just want the result, canceling is not an option).
With current IPP, method #2 is not possible because of length restriction.
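Method #2 can be sketched as follows. This is a minimal POSIX example, assuming Linux-style mmap; a simple byte-sum stands in for the IPP MD5 primitive, since the whole point is that a one-shot call over the mapped region needs a size_t (64-bit) length parameter, which the current int-length IPP API cannot express:

```c
/* Sketch of method #2: checksum a whole file via memory-mapped I/O.
 * checksum() is a hypothetical stand-in for a one-shot IPP-style hash
 * call taking a size_t length; it is NOT a real IPP function. */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

/* Stand-in for a single IPP-style call with a size_t length. */
static unsigned long checksum(const unsigned char *buf, size_t len)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

unsigned long checksum_file(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return 0;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return 0;
    }
    /* Map the whole file; the OS pages data in as the checksum walks it,
     * so no intermediate buffers or chunking logic are needed. */
    unsigned char *map = mmap(NULL, (size_t)st.st_size, PROT_READ,
                              MAP_PRIVATE, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return 0;
    unsigned long sum = checksum(map, (size_t)st.st_size);
    munmap(map, (size_t)st.st_size);
    return sum;
}
```

With an 8GB .ISO, st.st_size exceeds INT_MAX, so only a size_t-length API could consume the mapping in one call.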
I also dislike IPP functions for having so many parameters. In my opinion, it would be much nicer if function parameters were packed into structures and if we passed them into IPP functions via structure pointers.
This is off topic, but I like all the parameters. Back in the IPL days I was either locked into the IPL image structure, or would have to take my parameters and copy them into an IPL structure. I like being able to wrap up image parameters in my own class and then just call the IPP routines without having to go through another image structure layer.
It seems that what Peter says about IPP parameters is quite reasonable. And in fact such a goal was behind the IPP design. If you like high-level functionality with minimal parameters, what stops you from developing your own high-level layer on top of IPP?
I agree with using size_t instead of int, simply because that is better for 64-bit, and we are all moving to that.
As for replacing a bunch of parameters with a struct, I disagree. Parameters work fine, and they are the fastest method. It is also cleaner, as you do not need to declare, fill, and use the struct.
As Igor pointed out, memory mapped files are a common way to benefit from more than 2GB of address space, and also large memory buffers such as those of a big database or dataset, for example.
Splitting it into several calls of 2GB chunks may not be beneficial if the buffer is accessed sequentially. In fact, the performance may decrease due to the overhead of splitting data and doing more calls.
For instance, consider a memset function implemented withippsSet_8u. Instead of a single call, you would have to check if the size of the array is greater than 2GB, and then call it many times until the whole array is covered. This adds branching and arithmetic that would be avoided with a single call. I believe all IPP functions that handle arrays would benefit from this.
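The chunking a caller must write today looks something like the sketch below. memset stands in for ippsSet_8u (whose real signature takes an int length); the loop, branch, and pointer arithmetic are exactly what a single size_t-length call would eliminate:

```c
/* Sketch: cover a buffer whose length may exceed INT_MAX using calls
 * that only accept an int length. memset is a stand-in for an IPP
 * primitive like ippsSet_8u; set_8u_large is a hypothetical wrapper. */
#include <limits.h>
#include <stddef.h>
#include <string.h>

void set_8u_large(unsigned char val, unsigned char *dst, size_t len)
{
    while (len > 0) {
        /* Largest piece a 32-bit signed length parameter can express. */
        int chunk = (len > (size_t)INT_MAX) ? INT_MAX : (int)len;
        memset(dst, val, (size_t)chunk);  /* stand-in for ippsSet_8u */
        dst += chunk;
        len -= (size_t)chunk;
    }
}
```

Every caller working with large mappings ends up duplicating this wrapper; a size_t-length variant would make it a single call.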
Regarding size_t, it is defined by the C standard, so it must be supported by all compilers. All C runtime functions that handle memory buffers (memcpy, memset, strlen, etc.) use it, because it conveniently matches the maximum size that your architecture can handle. But, of course, there could be a custom IPP type with the same behavior, just to be consistent.
Also, it would be transparent to current users, since the new parameter would have at least the size of the current int parameter, except maybe a warning due to a conversion from signed to unsigned, although this is not a big deal -- no length can be less than zero anyway.
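The transparency claim can be illustrated with a small sketch. sum_8u here is a hypothetical stand-in for an IPP-style function whose length parameter was widened from int to size_t; existing callers that still pass an int keep compiling and working, because the int argument converts implicitly (at most triggering a signed/unsigned warning under strict conversion flags):

```c
/* Sketch: a hypothetical IPP-style function after widening its length
 * parameter from int to size_t. Old call sites passing int still work. */
#include <stddef.h>

unsigned int sum_8u(const unsigned char *src, size_t len)
{
    unsigned int s = 0;
    for (size_t i = 0; i < len; i++)
        s += src[i];
    return s;
}
```

An old-style call site would simply be `int len = 3; sum_8u(buf, len);` -- the int is promoted to size_t at the call, so no source changes are required.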
I was not discussing number of parameters alone.
My issues with functions which take a lot of parameters are as follows:
1. Function call is harder to write -- you lose track of which parameter you are entering because there are so many of them.
2. Function call is harder to read -- you have to count parameters to find the one you want to look up in the documentation.
3. You have to define and initialize all parameters explicitly:
int a, b, c, d, e, f, g;
a = 3;
b = c = d = e = f = 0;
g = 1;
foo(a, b, &c, &d, &e, f, g);
While if it were a structure:
memset(&s, 0, sizeof(s));
s.a = 3;
s.g = 1;
4. Passing a single pointer to a structure is faster than passing dozens of parameters on stack (less important with 64-bit mode but still relevant).
5. You can copy a structure with parameters and change only some of those parameters easily:
struct params d = s;
d.g = 2;
While with separate parameters you have to define and initialize another set of variables unless you have a habit of calling all your variables tmp:
int a1, b1, c1, d1, e1, f1, g1;
a1 = 3;
b1 = c1 = d1 = e1 = f1 = 0;
g1 = 2;
foo(a1, b1, &c1, &d1, &e1, f1, g1);
6. With structures you can order parameters such that their memory usage is optimal.
7. With structures you can quickly dump them to a file for debugging purposes.
FILE *fp = fopen("dump.bin", "wb");
fwrite(&s, 1, sizeof(s), fp);
fclose(fp);
I personally tend to agree with the Linux kernel coding style -- if there are more than 5-10 local variables in a single function, then something is amiss. Unfortunately, when calling IPP functions, we almost always have to use more.
Finally, regarding 64-bit parameters, I requested that in one of my issues a long time ago.
The design goal of IPP was and still is to provide developers with primitive building blocks (optimized kernels) which are used to construct more complicated algorithms. It might be more complex to use compared with high-level API libraries, but it is a performance-oriented product. You may think of this parameters vs structures dilemma as ASM vs C# programming. If programming in ASM is difficult, one can use higher-level languages. Looking at the Intel Performance Libraries' past -- it is possible to implement IJL (the old Intel JPEG Library) and IPL (the old Intel Image Processing Library) with IPP and enjoy the ease of use of a higher-level API. But it is not possible to access low-level kernels from the IJL or IPL libraries in case you need the maximum performance for some particular needs.
Regarding 64-bit parameter types across the whole IPP library, I do not think it is the right time to make that major change yet. Most existing applications today still widely use 32-bit integers even in their 64-bit builds. Such a global change would negatively impact the majority of IPP customers. And to be honest, the examples of memset or MD5 mentioned in the thread above do not provide compelling enough reasons for such a change. The main problem IPP is intended to solve for customers is extracting close-to-metal performance from modern Intel platforms. Workloads processing more than 4 GB of data at once are usually limited by memory bandwidth rather than computation. That is why splitting data and processing by smaller chunks can provide more performance even with some loop overhead.
Regarding low-level .vs. high-level -- our views obviously differ considerably so there is no point in discussing it further, at least not in this thread.
"That is why splitting data and processing by smaller chunks can provide more performance even with some loop overhead."
You are only looking at one side of an equation.
- Primary goal of using a performance library is to get better performance.
- Secondary goal of using a library is to be able to write and maintain less code.
If we have to allocate and manage buffers, to do pointer arithmetic, and to call multiple functions to accomplish a single and simple task (such as calculating MD5 checksum or encrypting/decrypting a file with AES) then we cannot accomplish this secondary goal by using your library.
Please note that I am not saying that you should remove existing APIs which use 32-bit lengths -- what I am asking for is a way to utilize memory mapped file I/O when using certain IPP functions which I mentioned earlier. Unfortunately, without 64-bit lengths that is not possible.
Memory mapped file I/O is preferred for large amounts of data for several reasons:
- It abstracts memory allocation and pointer arithmetic -- you only have one pointer, there are no additional buffers and no need for data splitting. The operating system manages the paging of data according to the amount of memory present in the target system, so there is also no need to know the amount of available memory.
- It significantly reduces the cost of file operations -- there are no reads and writes to intermediary buffers, there is no needless caching on several API levels, there are no multiple system calls per operation and finally no data copying between kernel and user space.
I really don't see how adding APIs with 64-bit lengths could negatively affect other IPP customers?
Your second goal can be accomplished by developing some high-level layer on top of the IPP library to abstract away IPP's low-level details and hide these memory allocations and pointer arithmetic under some nice and easy-to-use high-level API. That is how you can still use IPP to obtain better performance and write/maintain less code in the future.
Writing additional glue/wrapper code in order to be able to use a library goes against the principle of having and using a library in the first place. Library serves to minimize the need for developing your own code -- library should not require you to develop additional code before its use becomes convenient.
Hiding allocation and pointer arithmetic (as opposed to eliminating them by using memory mapped file I/O) will not improve performance.
The architecture and design of the IPP library is pretty well aligned with the principles which were put in place for this product. I think I've already mentioned these principles, so I do not see a reason to repeat them again.
Hiding allocation and other stuff in high-level code by itself will not and should not improve performance. The point is that you can build a higher level on top of a low-level library. If this higher-level layer is designed well, then you hopefully will not lose the performance provided by the optimized low-level library and will have the benefit of an easier-to-use high-level API. That is how complex software stacks are usually built in today's environment.
Thanks for your suggestion on 64-bit length support. We have noticed some other feedback on that, and may consider it in a future release. To understand which functions are important in customer applications, could you provide the list of functions used in your application? You can use the following tool to find the IPP APIs your application uses:
If you do not want to publish it, you can submit it into our premier support website.
Personally, I would consider all IPP functions that could process data directly from a file, or whose output could be written directly to a file. Examples of such functions would be all compression, cryptography and checksum-calculating functions, but there may be functions from other domains as well.