Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Intel Community
- Software
- Software Development Topics
- Intel® Moderncode for Parallel Architectures
- matrix multiply parallel programming

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

zhangzhe65

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

04-16-2009
10:50 AM

100 Views

matrix multiply parallel programming

void matrixp(int a

{

int i,j,k,sum=0;

#pragma omp parallel private(i,j,k) // reduction(+:sum)

for(i=0;i

sum=0;

//c

#pragma omp for reduction(+:sum)

for(k=0;k

sum+=a

c

}

}

}

Link Copied

9 Replies

TimP

Black Belt

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

04-16-2009
11:51 AM

100 Views

You would want at least to attempt improvement on the code which you would get from icc -O3 -parallel operating on source code such as you would find in libgfortran matmul, perhaps looking at both float and int data types, noting that this code expects you to substitute a netlib threaded BLAS equivalent when the problem size exceeds a threshold such that threading would become useful.

TimP

Black Belt

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-16-2009
06:39 AM

100 Views

Quoting - tim18

You would want at least to attempt improvement on the code which you would get from icc -O3 -parallel operating on source code such as you would find in libgfortran matmul, perhaps looking at both float and int data types, noting that this code expects you to substitute a netlib threaded BLAS equivalent when the problem size exceeds a threshold such that threading would become useful.

tmpmat=transpose(vy(:25,:25))

c$omp parallel do private(j,i,tmp,tmp1)

do j= 1,n

i= 1

if(iand(25,1) == 1)then

px(1,j)= px(1,j)+dot_product(tmpmat(:,1),cx(:,j))

i= 2

endif

do i= i,24,2

tmp= 0

tmp1= 0

do k= 1,25

tmp= tmp+tmpmat(k,i)*cx(k,j)

tmp1= tmp1+tmpmat(k,i+1)*cx(k,j)

enddo

px(i,j)= px(i,j)+tmp

px(i+1,j)= px(i+1,j)+tmp1

enddo

enddo

jimdempseyatthecove

Black Belt

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-16-2009
03:09 PM

100 Views

Tim,

I think you have a cut and paste problem

is a constant expression that is always true

Jim Dempsey

jimdempseyatthecove

Black Belt

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-16-2009
03:20 PM

100 Views

Tim,

I think transposition of larger matricies will work well too provided you do not transpose the entire matrix at once.

i.e. transposing the number of columns that fit within the SSE registers may have merrit(2 for REAL(8) and 4 for REAL(4), and double that when AVX comes out). This would be true if you can pipeline the transpostion with the now vectorized multiplication. The ranspositioning can be done using the integer instruction set while the multiplication can use the SSE FP path. On HT systems this might show additional improvements.

Jim Dempsey

TimP

Black Belt

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-16-2009
04:43 PM

100 Views

is a constant expression that is always true"

Yes, I put this in just so I don't forget that this line is conditional on the size of the matrix.

Thanks,

Tim

Tudor

New Contributor I

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-17-2009
03:06 AM

100 Views

[cpp]void OpenMPMatrixMultiply() { int i, j, k; #pragma omp parallel for private(j, k) for (i = 0; i < size1; i++) { for (j = 0; j < size3; j++) { int partial = 0; for (k = 0; k < size2; k++) { partial += matrix1* matrix2 ; } result1 += partial; } } }[/cpp]

Just place your matrices as parameters.

TimP

Black Belt

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-22-2009
02:35 PM

100 Views

Quoting - tim18

do k= 1,25

tmp= tmp+tmpmat(k,i)*cx(k,j)

tmp1= tmp1+tmpmat(k,i+1)*cx(k,j)

enddo

tmp=dot_product(tmpmat(:,i),cx(:,j))

on Core 2. Register re-use apparently out-weighs lack of vectorization, with multi-threading.

Even on the Core i7, the vector code doesn't come out ahead when running 4 threads. On that platform, the compiler shouldn't be worrying about the alignments, yet it still reserves vectorization for the single dot product.

I wrote an SSE2 intrinsics version, but it's not worth the trouble.

jimdempseyatthecove

Black Belt

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-23-2009
11:05 AM

100 Views

Quoting - tim18

tmp=dot_product(tmpmat(:,i),cx(:,j))

on Core 2. Register re-use apparently out-weighs lack of vectorization, with multi-threading.

Even on the Core i7, the vector code doesn't come out ahead when running 4 threads. On that platform, the compiler shouldn't be worrying about the alignments, yet it still reserves vectorization for the single dot product.

I wrote an SSE2 intrinsics version, but it's not worth the trouble.

Tim,

When the first dimension is even (real*8) or multiple of 4 (real*4), and tmpmat aligned to cache line, is the SSE of the loop you posted faster/slower?

Jim Dempsey

mahmoudgalal1985

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-26-2009
05:35 AM

100 Views

Quoting - Tudor

[cpp]void OpenMPMatrixMultiply() { int i, j, k; #pragma omp parallel for private(j, k) for (i = 0; i < size1; i++) { for (j = 0; j < size3; j++) { int partial = 0; for (k = 0; k < size2; k++) { partial += matrix1* matrix2 ; } result1 += partial; } } }[/cpp]

Just place your matrices as parameters.

Thank you ,works well...

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

For more complete information about compiler optimizations, see our Optimization Notice.