
simatcopy vs. somatcopy performance

JoaoAlves95

Good afternoon,

I've noticed that mkl_simatcopy outperforms mkl_somatcopy for n x n square matrices. Only the execution time of the simatcopy/somatcopy call itself was measured. The functions were called with the following parameters:

  mkl_simatcopy('R'  /* ordering: row-major */,
                'T'  /* trans: A will be transposed */,
                n    /* rows */,
                n    /* cols */,
                1.0f /* alpha: scales the input matrix */,
                src  /* AB: source matrix, overwritten in place */,
                n    /* lda: src_stride */,
                n    /* ldb: dst_stride */);

  mkl_somatcopy('R'  /* ordering: row-major */,
                'T'  /* trans: A will be transposed */,
                n    /* rows */,
                n    /* cols */,
                1.0f /* alpha: scales the input matrix */,
                src  /* A: source matrix */,
                n    /* lda: src_stride */,
                dst  /* B: destination matrix */,
                n    /* ldb: dst_stride */);

 

From what I understand, in-place matrix transposition should be less efficient than its out-of-place counterpart. Does this not hold for square matrices?
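
To make the comparison concrete, here is a minimal sketch of the two approaches written as plain C loops. The function names are hypothetical and this is only an illustration of the general technique, not how MKL implements either routine. For a square matrix the in-place variant reduces to swapping elements across the diagonal, so it works on a single n x n buffer and needs no scratch space, whereas the out-of-place variant reads one buffer and writes a second one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: in-place transpose of an n x n row-major matrix.
   Each element above the diagonal is swapped with its mirror below it,
   so no scratch buffer is required. */
void transpose_square_inplace(float *a, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = i + 1; j < n; ++j) {
            float tmp = a[i * n + j];
            a[i * n + j] = a[j * n + i];
            a[j * n + i] = tmp;
        }
    }
}

/* Hypothetical helper: out-of-place transpose of a rows x cols row-major
   matrix into a separate cols x rows row-major buffer. */
void transpose_out_of_place(const float *src, float *dst,
                            size_t rows, size_t cols)
{
    assert(src != dst); /* the two buffers must not alias */
    for (size_t i = 0; i < rows; ++i) {
        for (size_t j = 0; j < cols; ++j) {
            dst[j * rows + i] = src[i * cols + j];
        }
    }
}
```

In this naive form the in-place version touches n*n floats in total, while the out-of-place version touches 2*n*n floats across two buffers. The difficulty I had in mind for in-place transposition (following permutation cycles) only arises for non-square matrices.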

I would also appreciate any insight into the memory complexity of simatcopy and which optimization techniques were used in this function.

 

Best Regards,

João Alves

 
