Intel Community › Software › Software Development SDKs and Libraries › Intel® oneAPI Math Kernel Library


Lewis__Rubin

Novice


03-19-2016
05:29 AM


Is there a more efficient way for element-by-element multiplication for matrix?

Hi, everyone.

I am writing some Fortran code to compute the **dot product of two matrices**, which follows the same idea as the dot product of two vectors: first, compute the element-wise product of the two matrices; second, sum all the elements of the matrix produced in the first step. I have thought of two ways to do this.

In the first approach, I convert the two matrices into vectors and then call the BLAS (MKL) **vector dot routine** directly. For reasons specific to my problem, however, I prefer matrix operations to vector operations. So in the second, preferred approach, I directly compute the element-by-element product of the two matrices and then sum all the elements of the resulting product matrix using the SUM function.

However, I doubt whether either approach is the most efficient, since the matrices are extremely large.

Any suggestions?
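The two approaches described in the question can be sketched in plain Fortran as follows. This is a minimal sketch under stated assumptions: `vec_dot` is a hypothetical stand-in for MKL's `ddot` (so the code compiles without MKL), the module and function names are made up for illustration, and both arrays are assumed to have the same shape.

```fortran
! Sketch of the two approaches from the question, in plain Fortran.
! vec_dot is a hypothetical stand-in for a BLAS-style vector dot
! (it is NOT MKL's ddot); swap in ddot when linking against MKL.
module matdot_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  ! Dot product of two length-n vectors (explicit-shape dummies).
  pure function vec_dot(n, x, y) result(s)
    integer, intent(in) :: n
    real(dp), intent(in) :: x(n), y(n)
    real(dp) :: s
    integer :: i
    s = 0.0_dp
    do i = 1, n
      s = s + x(i) * y(i)
    end do
  end function vec_dot

  ! Approach 1: treat each 3-D array as one long vector.
  ! Passing the rank-3 array to the rank-1 explicit-shape dummies above is
  ! sequence association, so no RESHAPE is needed. If a or b is not
  ! contiguous, the compiler makes a temporary contiguous copy first.
  ! Assumes a and b have the same shape.
  function dot_as_vector(a, b) result(s)
    real(dp), intent(in) :: a(:,:,:), b(:,:,:)
    real(dp) :: s
    s = vec_dot(size(a), a, b)
  end function dot_as_vector

  ! Approach 2: element-wise product, then SUM over all elements.
  ! Depending on the compiler, a*b may be materialized as a temporary array.
  function dot_as_matrix(a, b) result(s)
    real(dp), intent(in) :: a(:,:,:), b(:,:,:)
    real(dp) :: s
    s = sum(a * b)
  end function dot_as_matrix

end module matdot_mod
```

Passing `a` straight to an explicit-shape vector argument is the same mechanism that lets a whole array be handed to `ddot`; both functions return the same value, so the choice between them comes down to measured speed.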


4 Replies

Ying_H_Intel

Employee


03-20-2016
06:58 PM


Hi,

Could you please provide some more information, such as:

How is the matrix stored, contiguously or not? And what is the matrix size?

Your OS, CPU, and compiler (Intel Fortran compiler?)

Do you use threaded MKL (mkl_intel_thread.x) or sequential MKL (mkl_sequential)?

In general, a contiguous matrix in Fortran is laid out in memory just like a vector, so you can use the MKL vector dot routine.

The MKL dot routines are also threaded, so they may run in parallel on a multi-core machine.
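To illustrate the point about layout: Fortran stores arrays in column-major order, so an array declared `a(n1,n2,n3)` occupies one contiguous block of n1*n2*n3 elements, and element `a(i,j,k)` sits at position `i + (j-1)*n1 + (k-1)*n1*n2` of that block. That is why a whole contiguous array can be passed to a vector routine such as `ddot` with `n = n1*n2*n3` and an increment of 1. The helper `linear_index` below is hypothetical, written only for this sketch.

```fortran
! Column-major layout: element a(i,j,k) of an array declared a(n1,n2,n3)
! is element number i + (j-1)*n1 + (k-1)*n1*n2 of the same storage viewed
! as a 1-D vector. linear_index is a hypothetical helper for illustration.
module layout_mod
  implicit none
contains
  pure integer function linear_index(i, j, k, n1, n2)
    integer, intent(in) :: i, j, k   ! subscripts into a(n1,n2,n3)
    integer, intent(in) :: n1, n2    ! first two extents (n3 is not needed)
    linear_index = i + (j - 1) * n1 + (k - 1) * n1 * n2
  end function linear_index
end module layout_mod
```

The intrinsic `reshape(a, [size(a)])` returns the elements in this same array element order, which gives an easy way to check the formula.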

Threaded BLAS Level1 and Level2 Routines

In the following list, ? stands for a precision prefix of each flavor of the respective routine and may have the value of s, d, c, or z.

The following routines are threaded with OpenMP* for Intel® Core™2 Duo and Intel® Core™ i7 processors:

• Level1 BLAS: ?axpy, ?copy, ?swap, ddot/sdot, cdotc, drot/srot

• Level2 BLAS: ?gemv, ?trmv, dsyr/ssyr, dsyr2/ssyr2, dsymv/ssymv

Regarding your second approach (computing the element-by-element product directly and then summing all elements of the product matrix with the SUM function): do you mean explicit loops, compiled with the Intel Fortran compiler, that form the element-by-element product, followed by a call to SUM?

The Intel Fortran compiler can optimize such code, whether it is written as explicit loops or in array notation (see https://software.intel.com/en-us/articles/explicit-vector-programming-in-fortran). Whatever your matrices look like, you can compare the two implementations and choose the one with better performance.
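As an illustration of the loop form, here is a minimal sketch (hypothetical module and function names) of a single fused loop that multiplies and accumulates in one pass, so no temporary array for `a*b` is needed. The inner loop runs over the first index to match column-major order, and the OpenMP `simd` directive with a `reduction` clause asks the compiler to vectorize it. The directive is honored only when OpenMP SIMD is enabled (for example `-qopenmp-simd` or `/Qopenmp-simd` for the Intel compiler, `-fopenmp-simd` for gfortran); otherwise it is treated as a comment.

```fortran
! Fused multiply-and-accumulate over two 3-D arrays of the same shape.
! No temporary is created for a*b. The !$omp simd line is only active when
! OpenMP SIMD support is enabled at compile time; otherwise it is a comment.
module fused_dot_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  function fused_dot(a, b) result(s)
    real(dp), intent(in) :: a(:,:,:), b(:,:,:)
    real(dp) :: s
    real(dp) :: acc
    integer :: i, j, k
    acc = 0.0_dp
    do k = 1, size(a, 3)
      do j = 1, size(a, 2)
        ! innermost loop over the fastest-varying (first) index
        !$omp simd reduction(+:acc)
        do i = 1, size(a, 1)
          acc = acc + a(i,j,k) * b(i,j,k)
        end do
      end do
    end do
    s = acc
  end function fused_dot
end module fused_dot_mod
```

A vectorized reduction adds the terms in a different order than the serial loop, so the result can differ from `sum(a*b)` in the last few bits; timing both versions on your own data remains the reliable way to choose.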

Best Regards,

Ying H.

Intel MKL Support

Lewis__Rubin

Novice


03-29-2016
07:57 PM


Thanks for your suggestions, and sorry for the late response.

Here is the additional information you requested.

Most of the matrices are stored non-contiguously. The matrix size might be up to, for example, 100*100*80, as in **REAL(8) :: A(100,100,80)**. Does that mean the loop optimization won't work?
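For scale, here is the arithmetic for the size quoted above, assuming REAL(8) occupies 8 bytes per element (true for the Intel and GNU Fortran compilers, though the kind number itself is compiler-specific):

$$
100 \times 100 \times 80 = 800\,000\ \text{elements}, \qquad
800\,000 \times 8\ \text{B} = 6\,400\,000\ \text{B} \approx 6.1\ \text{MiB per array}
$$

So the two operands together take about 12.8 MB, and if a compiler materializes the element-wise product `A*B` as a temporary array, that adds another 6.4 MB.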

At present I only use the **sequential MKL**, but in the future I might have to switch to the parallel MKL if performance is too low.

My program will run on Windows, and I built it with **Intel® Parallel Studio XE Cluster Edition for students** (https://software.intel.com/en-us/qualify-for-free-software/student).


Many thanks in advance.

Rubin.

Ying_H_Intel

Employee


03-30-2016
01:21 AM


Hi Rubin,

Thank you for the information. As I understand it, you want to compute a batch of products like **sum(A1*B1)** with **REAL(8) :: A1(100,100,80), B1(100,100,80)**, right? Then I suggest trying some of the MKL functions (see the examples in the MKL install directory), for example:

ddotx.f

ddot(100*100*80, A1(:,:,:), 1, B1(:,:,:), 1)
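For reference, a BLAS-style call `ddot(n, x, incx, y, incy)` returns the sum of `x(1+(i-1)*incx) * y(1+(i-1)*incy)` for `i = 1, ..., n`. The sketch below is a plain-Fortran stand-in, `ref_ddot` (hypothetical, not MKL's implementation, positive increments only), that makes those semantics concrete and runs without MKL. With increments of 1, a whole 3-D array is treated as one vector, as in the call above; with an increment equal to the leading dimension, a non-contiguous row of a matrix can be processed without copying it first.

```fortran
! Plain-Fortran reference for what ddot(n, x, incx, y, incy) computes:
!   sum over i = 1..n of x(1+(i-1)*incx) * y(1+(i-1)*incy)
! ref_ddot is a hypothetical stand-in, not MKL's ddot. It only handles
! positive increments (reference BLAS also defines negative ones).
module ref_ddot_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  pure function ref_ddot(n, x, incx, y, incy) result(s)
    integer, intent(in) :: n, incx, incy
    real(dp), intent(in) :: x(*), y(*)   ! assumed-size, as in BLAS
    real(dp) :: s
    integer :: i
    s = 0.0_dp                           ! n <= 0 gives 0, as in BLAS
    do i = 1, n
      s = s + x(1 + (i - 1) * incx) * y(1 + (i - 1) * incy)
    end do
  end function ref_ddot
end module ref_ddot_mod
```

In a real program you would call MKL's `ddot` with the same argument list (declared, for example, as `double precision, external :: ddot`) and link against MKL.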

If these operations are batched, you can consider combining the matrices so that a single dgemm call (or a batched dgemm) does all of them, for example:

dgemm_batchx.f

etc.
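One possible reading of the "one dgemm" suggestion (an interpretation, not spelled out in the reply): if you need the dot products of many pairs of equally sized arrays, flatten each array into a column of length n, put the A arrays in the columns of `x` (n by np) and the B arrays in the columns of `y` (n by nq). Then every entry of `transpose(x)` times `y` is one of the dot products. The sketch uses the intrinsics `matmul` and `transpose` in place of dgemm so it runs without MKL; `batch_dots` is a hypothetical name.

```fortran
! All pairwise dot products between two batches of flattened arrays as one
! matrix product: column p of x is A_p, column q of y is B_q, and
! c(p,q) = dot(A_p, B_q). MATMUL/TRANSPOSE stand in for dgemm here; with
! BLAS the equivalent call would be
!   call dgemm('T', 'N', np, nq, n, 1.0d0, x, n, y, n, 0.0d0, c, np)
module batch_dot_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  function batch_dots(x, y) result(c)
    real(dp), intent(in) :: x(:,:)   ! n-by-np, one flattened array per column
    real(dp), intent(in) :: y(:,:)   ! n-by-nq, one flattened array per column
    real(dp) :: c(size(x, 2), size(y, 2))
    c = matmul(transpose(x), y)
  end function batch_dots
end module batch_dot_mod
```

For a single pair of arrays this reduces to one dot product, so the approach is only worthwhile when there are many pairs.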

Loop optimization can still work, and both the sequential and the parallel MKL will work as well. If you run on a multi-core CPU, the parallel MKL may be more efficient.

Best Regards,

Ying

Lewis__Rubin

Novice


03-30-2016
04:58 PM


Ying,

Thanks. I will try this: **ddot(100*100*80, A1(:,:,:), 1, B1(:,:,:), 1)**.

Rubin.
