I'm working on a DFT filter for images using a Real filter.
I forward transform the image using ippiDFTFwd_RToPack_32f_C1R(). The result is packed complex.
I then want to multiply my filter into the transform, but there is a problem: I don't know how to create a packed filter, and there is no variant of ippiMulPack_32f that takes an non-packed filter as source.
Of course, I could create a complex filter, then convert that to packed complex, but that extra step is costly.
Is there an ippiMulPack that takes unpacked real input?
Does anybody know how to write a packed complex image?
(w*h 32fc image array)
I would not say "well described" for Intells explanation/documentation of their RCPack2D format.
The PDF indicates shaded colum only for Even, but there is no shaded column...
Also, I was looking for a quick solution, on how to take an Ipp32f image (data is Real, Im=0), and shuffle that into a packed Ipp32f.
Normally, I'd look elsewhere, but since this format is from Intel, I'm hoping they have a quick solution, and of cource, all in the light of speed/high performance.
No, I mean, instead of Create Real Filter Cplx, followed by ippiCplxExtendToPack, followed by ippiMulPack_32f, I'm looking to see if there is a way to skip one step, doing Create Real Filter 32f, followed by some unknown function ippiMulRToPack_32f.
This way, the filter uses less memory, ippiCplxExtendToPack does not have to be called == more performance.
Alternatively, I there is a function to pack a Real image, or if I could write my own function (I can't, because documentation is too low), then I'd just use the packed functions directly, also with high performance.
Your 90 lines of code are what prevents me to start recoding that myself!
I still feel its a pity Intel does not supply this for us. Looking at UIC, Intel hands out tons of code to implement image compression. For Fourier Transforms, Intel does not really have a good sample. Of course there is some sample code, but that is more for the beginning, on how to prepare the buffers and call the functions.
I really miss a good DFT/FFT sample, one that does some optimized image filtering, for instance, a Butterworth sharpening filter. One that includes all border management (still a problem for many), and a useful real to packed function.
Anyway, you've told me about your copyrighted code, so what now? Can you give a hint with some pseudo code? :))
We've developed (a collegue) the transformation based on the information of RCPack2D format described in ippiman.pdf (March 2009) on page 718-719. There is a table on how the data is located in memory. I don't think that there is a better way to describe the format than with a table which is good to imagine.
The 90 lines of code include also
Maybe you need none of these points.
I only need the first point.
In the PDF, "Table 10-5, RCPack2D Storage for Even Number of Rows", I see that the first row is boxed, why is that?
If I ignore the box, I see an incomplete pattern:
First row (odd rows):
Re A(0,0) Re A(0,1) Im A(0,1) ... Re A(0,(N-1)/2) Im A(0,(N-1)/2) Re A(0,N/2)
How am I supposed to know what is in the "..." ?
If it is A(I,J), I can see that I is 0 for the row.
I can see that J goes 0,1,1, but what comes after that? I don't like guesswork in my programming!
Could it be (j+1)/2, where j=0..N/2 ?
Then the selection of Re/Im components. I seen Re, Re, Im, ..., how am I supposed to known what is in "..."?
Is the next Im or Re?
Is the full pattern Re, Re, Im, Im, Re, Re, Im, Im etc, or is it Re, Re, Im, Re, Re, Im, etc?
Any help highly appreciated!
You think completely in the wrong direction.
Let's look at first row: Re Re Im .... Re Im Re -> the "..." here means pairs of Re/Im for each row index, only the first and the last element are real parts.
Now look at the columns from row 2 on, the first and last column are pairs of Re/Im while the columns between are completely real or imaginary parts.
And now the last point for the even # of rows the last row is symmetric to the 1. one.
I guess you mean "means pairs of Re/Im for each column index...", no?
Lets for arguments sake say that N=32, therefore there are 16 columns in the packed array (even width). Is the first row then this (according to you, not according to the PDF)? :
Re(0,0), Re(0,1), Im(0,1), Re(0,2), Im(0,2), Re(0,3), Im(0,3), Re(0,4), Im(0,4), Re(0,5), Im(0,5), Re(0,6), Im(0,6), Re(0,7), Im(0,7), Re(0,8)
The PDF does not say this.
>> "means pairs of Re/Im for each column index..." yes of course
>> Lets for arguments sake say that N=32, therefore there are 16 columns in the packed array (even width).
No, if N==32, then 30 columns are in the packed array, just Im(0,0) and Im(0,N/2) are removed
Ahh, yes, source items are Re+Im.
So, for N=32, there are 32 source items (with Re+Im) in a row, and the packed array will have 32 float items (packed array has same number of floats as source complex array).
I guess I'll have to decide speed from between writing my filter directly to a packed array versus writing a simple float array, and then calling ippiCplxExtendToPack. Since I only do one single image when filtering, the computation always includes generating the filter.
- Generate Real filter
- Generate packed filter
(C) future, whished for, IPP function
- Generate Real filter
Which one would be fastest?
>>Which one would be fastest?
Of course it will be (A) or (B)
I need to generate the filter for every image since each image has another (unknown) dimension.
I guess I should decide if
- Generate Real filter Cplx
is slower/faster than
- Generate packed filter
Since ippiDFTFwd_RToPack+ippiMulPack_32f+ippiDFTInv_PackToR is rather time consuming anyway, I guess I might as well use (A), and drop my ideas of generating a packed filter.
I do not immediately agree that 4 multiplications can be done at once in SSE because complex mulitiplication is not just using multiply, but also add.
In fact, since multiplying a real number to a complex number only involves multiplying the real number into each of Re, Im, that would be very fast if it was supported by IPP.
Of course complex multiplication can be fast (see entry "Complex multiply by conjugate" within this forum where I posted some SSE3 code doing this.
>>In fact, since multiplying a real number to a complex number only involves multiplying the real number into each of Re, Im, that would be very fast if it was supported by IPP.
No because your real data buffer has a completely different data aligning compared with the packed data. So you need a loop with jumps or an aligning correction which will be slower than just streaming data through the SSE unit.
The slowest step in your code is the DFT. What you need is a FFT for arbitrary sizes, which isn't supported in IPP. Maybe you should take a look at FFTW.
>What you need is a FFT for arbitrary sizes, which isn't supported in IPP. Maybe you should take a look at FFTW.
Maybe you can let us know some details about the limitation of DFT functions:
ippiFFTFwd_/ippiFFTInv_ functions needs the data lengths must be integer powers of 2.
ippiDFTFwd_/ippiDFTInv_ functions, the N and M can take on arbitrary integer non-negative values.
The limitation of the DFT is its complexity O(N^2), while FFT has a complexity of O(N log N). For more details about arbitrary length FFT's, you can take a look at wikipedia (prime-factor fft algorithm)
Since I know that IPP convolution filters will internally use Fourier transforms if it determines it would be faster, I'd like to know if IPP DFT internally uses FFT for the same reasons?
Those internal IPP decisions are also a bit less documented. Is there a specific location for more docs?
The name "Convolution" does not reason to the internal algorithm which is used. The name "DFT" of course. I wouldn't name it DFT if it is a FFT in some cases :)
But will be easy to find out, just take an image with a power of 2 for width/height and test the DFT/FFT
With convolution I mean ippiFilter32f. If I'm informed well, it uses DFT/FFT internally, if its size is larger than X.
I did take a look at FFTW, but it has a price tag for non-GPL applications.