
Intel Community › Software › Software Development SDKs and Libraries › Intel® Integrated Performance Primitives


Itzhak_B_

Beginner


08-29-2013 03:56 AM

Most Optimized way to make operations on column of matrix

Hi All.

I have a matrix m(r,c).

I need to perform operations on each column of the matrix: calculate the mean of each column, subtract it from the column, and compute an FFT on the column.

What is the most optimized way to do this using IPP?

Pseudocode of what I need to do:

{
    static const unsigned ROWS = 8192, COLUMNS = 4*4096*64;

    float m[ROWS][COLUMNS];
    float sum = 0, mean[COLUMNS];
    unsigned r, c;

    // calculate the mean of each column
    for (c = 0; c < COLUMNS; ++c) {
        sum = 0;
        for (r = 0; r < ROWS; ++r)
            sum += m[r][c];
        mean[c] = sum / ROWS;
    }

    // subtract the mean from each column
    for (c = 0; c < COLUMNS; ++c) {
        for (r = 0; r < ROWS; ++r)
            m[r][c] -= mean[c];
    }

    // calculate an FFT on each column of the matrix
    ...
}

It was simple to do all of this on a row of the matrix, because the IPP functions take their input as a contiguous array of floats.

So one way to implement the code above is simply to transpose the matrix and perform all the operations (mean, subtract, and FFT) on the transposed matrix.

But that seems like a heavy operation.

Are there IPP functions that can perform these operations (mean, subtract, and FFT) directly on a column of a matrix?

Thank you,

Itzhak


5 Replies

SergeyKostrov

Valued Contributor II


08-29-2013 07:22 AM

Itzhak_B_

Beginner


09-01-2013 01:14 AM

OK, thank you.

I will try to do it by transposing the matrix.

Igor_A_Intel

Employee


09-05-2013 12:59 AM

Hi Itzhak,

From a performance point of view, it is better to transpose 8 or 16 columns at a time into a temporary buffer and then execute all the required operations in that buffer; this way you guarantee data locality in the L1 cache. Whether it is 8 or 16 depends on your data (complex or real), so that you load 64 aligned bytes from each row, which is the L1 cache line width. The IPP implementation of the 2D FFT uses this technique internally.

Regards, Igor

SergeyKostrov

Valued Contributor II


09-05-2013 05:27 AM

Itzhak_B_

Beginner


09-11-2013 12:53 AM

Igor Astakhov (Intel) wrote:

Hi Itzhak,

From a performance point of view, it is better to transpose 8 or 16 columns at a time into a temporary buffer and then execute all the required operations in that buffer; this way you guarantee data locality in the L1 cache. Whether it is 8 or 16 depends on your data (complex or real), so that you load 64 aligned bytes from each row, which is the L1 cache line width. The IPP implementation of the 2D FFT uses this technique internally.

Regards, Igor

Igor, thank you.

It will not improve performance much, because I can use the temporary buffer only for the transpose and for calculating the mean of 16 columns.

In order to subtract the mean and calculate the FFT, I need to use all the columns.

Regards,

Itzhak


For more complete information about compiler optimizations, see our Optimization Notice.