I stumbled upon an issue when solving a symmetric positive definite system using trsv. The matrix is already factorized, the issue is with the solution - trsv with a transposed factor followed by another trsv. The second trsv segfaults in a very specific situation.
See https://github.com/jakub-homola/onemkl_trsv_issue for an example of the code (the relevant part starts at line ~150 in `source.cpp`). To run it, simple `make run` should be enough. I am using the intel oneAPI devcloud, but experienced the exact same problem on our cluster as well.
I am using the buffer versions of the `oneapi::mkl::sparse::trsv` function. I create a CSR matrix from buffers and a right-hand-side vector. The buffer sent to `trsv` as the right-hand-side vector is a subbuffer of a rhs vector buffer. I call the first `trsv` with the subbuffer as the rhs and a temporary buffer as the solution vector. Then the second `trsv` is called with the temporary buffer as the rhs and the mentioned subbuffer as the solution vector. Before the first call to `trsv`, there is a simple sycl kernel that modifies the first value of the rhs subbuffer data (with a read_write accessor). This code segfaults when run (`./program.x 1 0`). Now the fun part:
- If I don't submit the one small kernel that modifies the rhs subbuffer, the code runs fine.(`./program.x 0 0`)
- If I don't use the temporary buffer, but use the subbuffer as both input and output argument of trsv, the code runs fine (`./program.x 1 1`)
- If I use a regular buffer instead of a subbuffer, the code runs fine (`./program.x 1 2`)
- If I don't call the first `trsv`, just the second one, the code runs fine (`./program.x 1 3`)
This behaviour seems very strange to me. Am I misunderstanding or missing some major detail, or is there a bug? I would be glad for any help with this.
If I use valgrind on it, the segfault is somewhere in the second call to trsv, I am not able to debug this.
I use a cpu selector, so the cpu is the device used. All integers used are 32-bit, I use and link with 32-bit-integer MKL. I use the 2023.0 version on the devcloud (but 2022.1.0 we have on our cluster shows the same behaviour).
Thanks for reaching out to us about your issue. I see you mentioned about linking to 32-bit oneMKL, and I also see it in your Makefile#L5 . Our SYCL functionality doesn't support linking to the 32-bit library (-lmkl_intel_lp64). As of now you must link to the 64-bit library (-lmkl_intel_ilp64); you can still use 32-bit integers through our SYCL APIs with that. Please use the exact compilation lines provided by the Intel Link Line Advisor to properly set the compiler flags when compiling and linking against oneMKL.
Can you please let us know if making this change fixes your issue?
Thank you for the response. I didn't know I cannot link with the 32-bit MKL. I updated the Makefile such that 64-bit MKL is used.
However, the problem still persists, in the exact same form as described in the original post.
I would be glad for additional help.
Looking deeper into your code, we see that your code block under "trsv_args_variant==1" would never give you correct results (even if it appears to run to completion in your case). What you are trying to do in that code block is pass the RHS and solution vector as the same buffer; that is not an acceptable use case for sparse::trsv(); The RHS and solution vectors must point to different memory locations. We are continuing to look into your code further, but meanwhile, following are a few more things you could look into on your end:
If oneapi::mkl::diag::nonunit is selected, all diagonal values must be present in the sparse matrix. This is not necessary for the oneapi::mkl::diag::unit
Could you please confirm here if you have all diagonal values in your sparse matrix, since you are using diag::nonunit? The trsv() documentation also says that transpose operation is currently not supported (but it may work in this particular case of using CPU device; we are looking into it and will get back to you about this.). Is TRSV with transpose operation an important part of your planned application?
Another thing you could try is calling oneapi::mkl::sparse::optimize_trsv() once before each of your first calls to TRSV for trans and non-trans operations and see if that helps.
Please know that the way you are using oneapi::mkl::sparse::set_csr_data and oneapi::mkl::sparse::release_matrix_handle APIs is now deprecated (although it should currently work just fine); please see the set_csr_data and release_matrix_handle documentation for the new usage. We would also recommend using USM APIs instead of sycl::buffer APIs, although either should work for now.
> pass the RHS and solution vector as the same buffer; that is not an acceptable use case for sparse::trsv()
Thanks for that information. I was actually looking for in the specification if I can do that or not, but could not find anything. Neither here I could not find it. Experimentally it worked and gave me correct results in another project. I actually asked a question on the oneapi spec github to find out if I can actually use trsv that way or not, with no response yet. Could you please direct me to where I can find the information that I cannot use trsv in this way?
> Could you please confirm here if you have all diagonal values in your sparse matrix?
I can confirm that. The upper-triangular factor I am using is the matrix U.txt also located in the repository, and it contains all diagonal values.
> The trsv() documentation also says that transpose operation is currently not supported
Yes, I noticed it now. I tested it on a GPU, and the program failed with an exception that transposition is not supported. But using a CPU it worked fine, so I kept using it.
> Is TRSV with transpose operation an important part of your planned application?
Well, this is one of the algorithms for direct solution of a sparse matrix system, as far as I know. You do a cholesky factorization of an SPD matrix, get the upper (or lower) triangular factor, and then solve the system using two calls to trsv. From A*x=b you get Ut*U*x=b (where Ut means U transposed). Solving this involves two calls to trsv, one using the transposed U, and the other one using the non-transposed U. I understand I can use pardiso to solve the whole system on a cpu, but I have a usecase where I need to perform the factorization in a separate library and then solve the system using onemkl. So yes, TRSV with transposed matrix is an important part of my planned application. (although I could probably just manually transpose it and call trsv with nontrans, but it just doesn't seem right)
> Another thing you could try is calling oneapi::mkl::sparse::optimize_trsv()
I tried that, and the segfault now actually occured in the optimize_trsv function before the second call to trsv.
> Please know that the way you are using oneapi::mkl::sparse::set_csr_data and oneapi::mkl::sparse::release_matrix_handle APIs is now deprecated
Thanks, I did not notice that yet. I will take that into consideration.
> We would also recommend using USM APIs instead of sycl::buffer APIs
I considered the two approaches equivalent, so I chose the buffer approach. I will find more info about the USM approach.
Thank you very much for your response, you have been incredibely helpful in making me a better sycl+onemkl programmer already.
PS: I am relatively a newbie in onemkl and sycl, so sorry for all the beginner mistakes
PS: this forum should support Markdown, as it is practically a standard today