Processors
Processors (Intel® Core™, Intel® Xeon®, etc); processor utilities and programs (Intel® Processor Identification Utility, Intel® Extreme Tuning Utility, Intel® Easy Streaming Wizard, etc.)
Announcements
Welcome to the Intel Community. If you get an answer you like, please mark it as an Accepted Solution to help others. Thank you!
11687 Discussions

Optimizing the backward solve for a sparse lower triangular linear system

watch_the_crab
Beginner
244 Views

I have a routine for the back solve as follows:

void backsolve(const int*__restrict__ Lp, const int*__restrict__ Li, const double*__restrict__ Lx, const int n, double*__restrict__ x) { for (int i=n-1; i>=0; --i) { for (int j=Lp[i]; j<Lp[i+1]; ++j) { x[i] -= Lx[j] * x[Li[j]]; } } }

compiling with gcc-8.3 -mfma -mavx -mavx512 results in

backsolve(int const*, int const*, double const*, int, double*): lea eax, [rcx-1] movsx r11, eax lea r9, [r8+r11*8] test eax, eax js .L9 .L5: movsx rax, DWORD PTR [rdi+r11*4] mov r10d, DWORD PTR [rdi+4+r11*4] cmp eax, r10d jge .L6 vmovsd xmm0, QWORD PTR [r9] .L7: movsx rcx, DWORD PTR [rsi+rax*4] vmovsd xmm1, QWORD PTR [rdx+rax*8] add rax, 1 vfnmadd231sd xmm0, xmm1, QWORD PTR [r8+rcx*8] vmovsd QWORD PTR [r9], xmm0 cmp r10d, eax jg .L7 .L6: sub r11, 1 sub r9, 8 test r11d, r11d jns .L5 ret .L9: ret

Vtune says the line

vmovsd QWORD PTR [r9], xmm0

is taking the bulk of the time here. I asked on stackoverflow (https://stackoverflow.com/questions/60232977/optimizing-the-backward-solve-for-a-sparse-lower-triang...), and the answers I got seem to suggest that there is not much I can do to speed up the function. Using MKL for the solve was also much slower.

 

System: Xeon Skylake

 

Appreciate any insight or suggestions!

0 Kudos
1 Solution
AlHill
Super User
123 Views

Wrong forum. You want to be [somewhere] here: https://software.intel.com/en-us/forum

 

Doc

 

View solution in original post

1 Reply
AlHill
Super User
124 Views

Wrong forum. You want to be [somewhere] here: https://software.intel.com/en-us/forum

 

Doc

 

View solution in original post

Reply