ippiTranspose_32f_C1R: Wrong output

ZVere · 07-09-2019

Hello,

Based on Intel's example, I used the following code:

Ipp32f src[8*4] = {1, 2, 3, 4, 8, 8, 8, 8,
                   1, 2, 3, 4, 8, 8, 8, 8,
                   1, 2, 3, 4, 8, 8, 8, 8};
Ipp32f dst[4*4];
IppiSize srcRoi = { 4, 4 };
ippiTranspose_32f_C1R ( src, 8, dst, 4, srcRoi );

The output is:

{ 1, 2, 3, 4,

8, 8, 2, 0xCCCCCCCC,

0xCCCCCCC, 0xCCCCCC, 0xCCCCCCCC, 0xCCCCCCCC,

0xCCCCCCC, 0xCCCCCC, 0xCCCCCCCC, 0xCCCCCCCC}

Can you please explain what is wrong with my code?

The original example uses Ipp8u.

Thank you,

Zvika

Ivan_G_Intel1 · 07-10-2019

Hello, Zvika!

The "srcStep" and "dstStep" parameters are given in bytes, not in elements. An Ipp8u element is 8 bits (1 byte) long, so in the original example the step in elements and the step in bytes happen to be the same number, and the code works as intended. An Ipp32f element is 32 bits (4 bytes) long, so you have to multiply the row length in elements by sizeof(Ipp32f).

Here is an example:

Ipp32f src[8 * 4] = { 1, 2, 3, 4, 8, 8, 8, 8,
                      1, 2, 3, 4, 8, 8, 8, 8,
                      1, 2, 3, 4, 8, 8, 8, 8,
                      1, 2, 3, 4, 8, 8, 8, 8 };
Ipp32f dst[4 * 4];
IppiSize srcRoi = { 4, 4 };
ippiTranspose_32f_C1R(src, 8*sizeof(Ipp32f), dst, 4*sizeof(Ipp32f), srcRoi);
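To make the "step in bytes" point concrete, here is a minimal plain-C sketch of what a transpose with byte steps does. It is only an illustration: transpose_32f_ref and demo_step_in_bytes are hypothetical helpers written for this thread, not IPP functions, and the sketch makes no claim about how IPP implements the operation internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper (NOT an IPP function): transposes a
   width x height region of floats. Both steps are given in BYTES. */
static void transpose_32f_ref(const float *src, size_t srcStepBytes,
                              float *dst, size_t dstStepBytes,
                              int width, int height)
{
    /* A byte step must cover a whole number of float elements. */
    assert(srcStepBytes % sizeof(float) == 0);
    assert(dstStepBytes % sizeof(float) == 0);

    for (int y = 0; y < height; ++y) {
        /* Row y of the source starts y * srcStepBytes bytes after src. */
        const float *srcRow =
            (const float *)((const unsigned char *)src + (size_t)y * srcStepBytes);
        for (int x = 0; x < width; ++x) {
            /* Source column x becomes destination row x. */
            float *dstRow =
                (float *)((unsigned char *)dst + (size_t)x * dstStepBytes);
            dstRow[y] = srcRow[x];
        }
    }
}

/* Runs the 4x4 ROI case from this thread and returns dst[row][col]. */
static float demo_step_in_bytes(int row, int col)
{
    float src[8 * 4] = { 1, 2, 3, 4, 8, 8, 8, 8,
                         1, 2, 3, 4, 8, 8, 8, 8,
                         1, 2, 3, 4, 8, 8, 8, 8,
                         1, 2, 3, 4, 8, 8, 8, 8 };
    float dst[4 * 4];
    /* The source rows are 8 floats apart, the destination rows 4 floats apart. */
    transpose_32f_ref(src, 8 * sizeof(float), dst, 4 * sizeof(float), 4, 4);
    return dst[row * 4 + col];
}
```

With the steps multiplied by sizeof(float), destination row 0 holds the first source column (all 1s) and destination row 3 holds the fourth source column (all 4s).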

Always glad to help.

Best regards,

Ivan Galanin.

ZVere · 07-10-2019

Hi Ivan,

Thank you very much. It works great!

If I need an in-place transpose with ippiTranspose_32f_C1IR, can I only use src [4 * 4] with roi [4, 4]?

Am I right?

Best regards,

Zvika

Ivan_G_Intel1 · 07-10-2019

Zvika,

For in-place operations, roiSize.width must be equal to roiSize.height (that is, the ROI must be square). The source image itself does not have to be square.

So, even for an in-place operation, src [8 * 4] is fine, but an ROI of 8 x 4 is not.
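As a rough illustration of why only the ROI needs to be square, here is a plain-C sketch that transposes a square region in place inside a wider buffer. The names transpose_inplace_32f_ref and demo_inplace are hypothetical helpers for this thread, not IPP functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper (NOT an IPP function): transposes, in
   place, the n x n region at the top-left of a buffer whose rows are
   stepBytes apart. The buffer itself may be wider than n. */
static void transpose_inplace_32f_ref(float *buf, size_t stepBytes, int n)
{
    assert(stepBytes % sizeof(float) == 0);
    size_t stride = stepBytes / sizeof(float);   /* row length in elements */
    assert(stride >= (size_t)n);

    /* Swap every element above the diagonal with its mirror below it. */
    for (int y = 0; y < n; ++y) {
        for (int x = y + 1; x < n; ++x) {
            float tmp = buf[(size_t)y * stride + x];
            buf[(size_t)y * stride + x] = buf[(size_t)x * stride + y];
            buf[(size_t)x * stride + y] = tmp;
        }
    }
}

/* Fills an 8-column x 4-row buffer with 0..31, transposes its left
   4x4 block in place, and returns buf[row][col]. */
static float demo_inplace(int row, int col)
{
    float buf[8 * 4];
    for (int i = 0; i < 8 * 4; ++i)
        buf[i] = (float)i;
    transpose_inplace_32f_ref(buf, 8 * sizeof(float), 4);
    return buf[row * 8 + col];
}
```

The swap pairs element (y, x) with element (x, y), which only works when both positions lie inside the region, hence the square ROI. Columns 4 to 7 of the buffer are left untouched.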

Always glad to help.

Best regards,
Ivan Galanin.

ZVere · 07-10-2019

Hi Ivan,

My src has 512 rows x 4096 columns of Ipp32f values. Each row is contiguous in RAM.

I want to transpose the upper-left corner of this matrix, which is 379 rows x 2400 columns.

Ipp32f src [4096 * 512];
Ipp32f dst[4096 * 512];
IppiSize srcRoi = { 2400, 379 };
ippiTranspose_32f_C1R(src, 4096*sizeof(Ipp32f), dst, 4096*sizeof(Ipp32f), srcRoi);

I get an exception in ippiTranspose_32f_C1R.

Can you please tell me what is wrong with my code?

Thank you,

Zvika

ZVere · 07-10-2019

Hi Ivan, All,

I think I found my mistake. With:

#define COLS 2400
#define ROWS 379

the call should be:

ippiTranspose_32f_C1R(src, COLS * sizeof(Ipp32f), dst, ROWS * sizeof(Ipp32f), srcRoi);

Thank you,

Zvika

Ivan_G_Intel1 · 07-12-2019

Hello, Zvika,

The parameters srcStep and dstStep are the distances, in bytes, between the starting points of consecutive lines in the source and destination images/matrices respectively (i.e. the number of columns multiplied by the size of the data type). In the previous message your code was:

ippiTranspose_32f_C1R(src,  4096*sizeof(Ipp32f), dst, 4096*sizeof(Ipp32f), srcRoi);

This is correct if both the source and destination images/matrices have 4096 columns.
The ROI can be smaller than the whole image/matrix, but you still have to use the full image/matrix width when calculating srcStep and dstStep.

The problem is that your destination height is smaller than the ROI width. It is not about how much memory you allocate, but about how the steps tell the function to lay that memory out. The function writes srcRoi.width rows into the destination (2400 rows here), but with dstStep = 4096*sizeof(Ipp32f) your destination buffer holds only 512 rows.
The function therefore writes past the end of the buffer, so the result is unpredictable and can end in an error (as it did).

Here is an example that works:

	// ippsMalloc_32f takes the length in elements, not in bytes
	Ipp32f* src = ippsMalloc_32f(4096 * 512);
	// same amount of memory for the destination
	Ipp32f* dst = ippsMalloc_32f(4096 * 512);
	IppiSize srcRoi = { 2400, 379 };

	// treat the destination memory as an image with 512 columns and 4096 rows
	ippiTranspose_32f_C1R(src, 4096 * sizeof(Ipp32f), dst, 512 * sizeof(Ipp32f), srcRoi);

	ippsFree(src);
	ippsFree(dst);

Here is a more optimized version that may also be more illustrative:

	int srcStep, dstStep;
	// The values of srcStep and dstStep are calculated automatically
	// by the ippiMalloc functions (which may pad rows for alignment).
	// If you check them for these sizes, they should be
	// 4096 * sizeof(Ipp32f) and
	// 512 * sizeof(Ipp32f) respectively
	Ipp32f* src = ippiMalloc_32f_C1(4096, 512, &srcStep);
	Ipp32f* dst = ippiMalloc_32f_C1(512, 4096, &dstStep);
	IppiSize srcRoi = { 2400, 379 };

	ippiTranspose_32f_C1R(src, srcStep, dst, dstStep, srcRoi);

	ippiFree(src);
	ippiFree(dst);

So, the destination height must be greater than or equal to your ROI width.
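This rule can be checked before calling the transpose. The sketch below is one way to write such a check in plain C. Size2D, transpose_fits and check_thread_cases are hypothetical names introduced for illustration; Size2D only mirrors the { width, height } layout used with IppiSize in this thread.

```c
#include <assert.h>

/* Hypothetical stand-in for a { width, height } size, in elements. */
typedef struct { int width; int height; } Size2D;

/* Returns 1 when the ROI lies inside the source image and the
   transposed ROI fits inside the destination image, otherwise 0. */
static int transpose_fits(Size2D srcSize, Size2D dstSize, Size2D roi)
{
    return roi.width > 0 && roi.height > 0
        && roi.width  <= srcSize.width    /* ROI inside the source     */
        && roi.height <= srcSize.height
        && roi.height <= dstSize.width    /* ROI rows become dst cols  */
        && roi.width  <= dstSize.height;  /* ROI cols become dst rows  */
}

/* Checks the two destination layouts discussed in this thread. */
static void check_thread_cases(void)
{
    Size2D src     = { 4096, 512 };
    Size2D roi     = { 2400, 379 };
    Size2D dstWide = { 4096, 512 };   /* dstStep = 4096 * sizeof(Ipp32f) */
    Size2D dstTall = { 512, 4096 };   /* dstStep = 512 * sizeof(Ipp32f)  */

    assert(!transpose_fits(src, dstWide, roi));  /* 2400 rows > 512 rows */
    assert(transpose_fits(src, dstTall, roi));   /* 2400 rows <= 4096    */
}
```

The same 4096 * 512 elements of memory pass or fail the check depending only on which dstStep is used to interpret them.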

Always glad to help.

Best regards,
Ivan Galanin.

ZVere · 07-12-2019

Hi Ivan,

Thank you very much for the detailed explanation.

Currently, there is no "ippiTranspose_32f_C2R".

We need it to transpose a 2D complex float matrix. Our signal processing is heavily based on complex numbers.

I wrote such a transpose myself, but my code is "naive". I'm sure it can be optimized.

Could Intel consider developing an "ippiTranspose_32f_C2R"?

Or, if I could get the code of ippiTranspose_32f_C1R, maybe I could port it to ippiTranspose_32f_C2R.

Best regards,

Zvika

Igor_A_Intel · 07-15-2019

Hi Zvi,

In IPP notation your request would be "to develop ippiTranspose_32fc_C1R", which is the same operation as ippiTranspose_64f_C1R; we will consider it as a feature request. We deprecated and removed "complex" image support in IPP 9.0, so the flavor would be 64f_C1R, which can easily be cast to 32fc_C1R.
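The reason the two flavors are interchangeable is that a complex float (two packed 32-bit floats) and a 64-bit float are both 8-byte elements, and a transpose only moves elements without looking inside them. A minimal plain-C sketch of that idea follows. Complex32, transpose_8byte_ref and demo_complex are hypothetical names for illustration, not IPP types or functions, and the sketch assumes the complex type is laid out as two packed floats.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical complex type, assumed to be two packed floats (8 bytes). */
typedef struct { float re; float im; } Complex32;

/* Hypothetical reference helper (NOT an IPP function): transposes a
   width x height region of opaque 8-byte elements, steps in BYTES. */
static void transpose_8byte_ref(const void *src, size_t srcStepBytes,
                                void *dst, size_t dstStepBytes,
                                int width, int height)
{
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dst;

    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            /* Copy element (y, x) to (x, y) as raw bytes. */
            memcpy(d + (size_t)x * dstStepBytes + (size_t)y * 8,
                   s + (size_t)y * srcStepBytes + (size_t)x * 8, 8);
}

/* Transposes a 2-row x 3-column complex matrix and returns dst[row][col]. */
static Complex32 demo_complex(int row, int col)
{
    Complex32 src[2 * 3] = { {1, -1}, {2, -2}, {3, -3},
                             {4, -4}, {5, -5}, {6, -6} };
    Complex32 dst[3 * 2];

    assert(sizeof(Complex32) == 8);   /* the layout assumption above */
    transpose_8byte_ref(src, 3 * sizeof(Complex32),
                        dst, 2 * sizeof(Complex32), 3, 2);
    return dst[row * 2 + col];
}
```

The real and imaginary parts always travel together because each pair is moved as one 8-byte unit, which is the same thing a transpose over 64-bit elements would do.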

regards, Igor