Solved: Re: second call to oneapi::mkl::sparse::trsv segfaults in a specific case

Jakub_H · ‎01-07-2023

Hello,

I stumbled upon an issue when solving a symmetric positive definite system using trsv. The matrix is already factorized, the issue is with the solution - trsv with a transposed factor followed by another trsv. The second trsv segfaults in a very specific situation.

See https://github.com/jakub-homola/onemkl_trsv_issue for an example of the code (the relevant part starts at line ~150 in `source.cpp`). To run it, a simple `make run` should be enough. I am using the Intel oneAPI DevCloud, but I experienced the exact same problem on our cluster as well.

I am using the buffer versions of the `oneapi::mkl::sparse::trsv` function. I create a CSR matrix from buffers and a right-hand-side vector. The buffer sent to `trsv` as the right-hand-side vector is a subbuffer of a rhs vector buffer. I call the first `trsv` with the subbuffer as the rhs and a temporary buffer as the solution vector. Then the second `trsv` is called with the temporary buffer as the rhs and the mentioned subbuffer as the solution vector. Before the first call to `trsv`, there is a simple sycl kernel that modifies the first value of the rhs subbuffer data (with a read_write accessor). This code segfaults when run (`./program.x 1 0`). Now the fun part:

- If I don't submit the one small kernel that modifies the rhs subbuffer, the code runs fine (`./program.x 0 0`)

- If I don't use the temporary buffer, but use the subbuffer as both input and output argument of trsv, the code runs fine (`./program.x 1 1`)

- If I use a regular buffer instead of a subbuffer, the code runs fine (`./program.x 1 2`)

- If I don't call the first `trsv`, just the second one, the code runs fine (`./program.x 1 3`)

This behaviour seems very strange to me. Am I misunderstanding or missing some major detail, or is there a bug? I would be glad for any help with this.

If I use valgrind on it, the segfault is reported somewhere in the second call to trsv, but I am not able to debug it any further.

I use a cpu selector, so the CPU is the device used. All integers used are 32-bit, and I link with the 32-bit-integer MKL. I use the 2023.0 version on the DevCloud (but the 2022.1.0 version we have on our cluster shows the same behaviour).

Jakub

Gajanan_Choudhary · ‎01-09-2023

Hi @Jakub_H,

Thanks for reaching out to us about your issue. I see you mentioned linking to 32-bit oneMKL, and I also see it in your Makefile#L5. Our SYCL functionality doesn't support linking to the 32-bit-integer interface library (-lmkl_intel_lp64). As of now you must link to the 64-bit-integer interface library (-lmkl_intel_ilp64); you can still use 32-bit integers through our SYCL APIs with that. Please use the exact compilation lines provided by the Intel Link Line Advisor to properly set the compiler flags when compiling and linking against oneMKL.

Can you please let us know if making this change fixes your issue?

Regards,

Gajanan

Jakub_H · ‎01-09-2023

Hi,

Thank you for the response. I didn't know I cannot link with the 32-bit MKL. I updated the Makefile such that 64-bit MKL is used.

However, the problem still persists, in the exact same form as described in the original post.

I would be glad for additional help.

Thanks,

Jakub

Gajanan_Choudhary · ‎01-09-2023

Hi @Jakub_H,

Looking deeper into your code, we see that your code block under "trsv_args_variant==1" would never give you correct results (even if it appears to run to completion in your case). What you are trying to do in that code block is pass the RHS and solution vector as the same buffer; that is not an acceptable use case for sparse::trsv(). The RHS and solution vectors must point to different memory locations. We are continuing to look into your code, but meanwhile, here are a few more things you could look into on your end:

I see you are using the transpose operation with sparse::trsv() in your code at main/source.cpp#L181-L238 . The oneMKL documentation for sparse::trsv() has a note about diagonal values:

If oneapi::mkl::diag::nonunit is selected, all diagonal values must be present in the sparse matrix. This is not necessary for the oneapi::mkl::diag::unit case.

Could you please confirm here if you have all diagonal values in your sparse matrix, since you are using diag::nonunit? The trsv() documentation also says that transpose operation is currently not supported (although it may work in this particular case of using the CPU device; we are looking into it and will get back to you about this). Is TRSV with transpose operation an important part of your planned application?

Another thing you could try is calling oneapi::mkl::sparse::optimize_trsv() once before each of your first calls to TRSV for trans and non-trans operations and see if that helps.

Please know that the way you are using oneapi::mkl::sparse::set_csr_data and oneapi::mkl::sparse::release_matrix_handle APIs is now deprecated (although it should currently work just fine); please see the set_csr_data and release_matrix_handle documentation for the new usage. We would also recommend using USM APIs instead of sycl::buffer APIs, although either should work for now.

Regards,

Gajanan

Jakub_H · ‎01-09-2023

Hi,

> pass the RHS and solution vector as the same buffer; that is not an acceptable use case for sparse::trsv()

Thanks for that information. I was actually looking in the specification to find out whether I can do that, but could not find anything. I could not find it here either. Experimentally it worked and gave me correct results in another project. I also asked a question on the oneAPI spec GitHub to find out whether I can use trsv that way, with no response yet. Could you please direct me to where I can find the information that I cannot use trsv in this way?

> Could you please confirm here if you have all diagonal values in your sparse matrix?

I can confirm that. The upper-triangular factor I am using is the matrix U.txt also located in the repository, and it contains all diagonal values.
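For readers who want to verify the same property on their own factor, here is a minimal sketch of such a check in plain C++. It does not use oneMKL; `CsrMatrix` and `has_full_diagonal` are hypothetical names introduced only for this illustration, and 0-based indexing and a square matrix are assumed.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical minimal CSR container (0-based indices); not a oneMKL type.
struct CsrMatrix {
    std::int32_t n;                     // number of rows (square matrix assumed)
    std::vector<std::int32_t> row_ptr;  // size n + 1
    std::vector<std::int32_t> col_idx;  // size nnz
    std::vector<double> values;         // size nnz
};

// Returns true if every row i stores an explicit entry in column i,
// which is what the diag::nonunit note in the documentation asks for.
bool has_full_diagonal(const CsrMatrix& m) {
    for (std::int32_t i = 0; i < m.n; ++i) {
        bool found = false;
        for (std::int32_t k = m.row_ptr[i]; k < m.row_ptr[i + 1]; ++k) {
            if (m.col_idx[k] == i) {
                found = true;
                break;
            }
        }
        if (!found) {
            return false;  // row i has no stored diagonal entry
        }
    }
    return true;
}
```

The check only looks at the sparsity pattern; a stored diagonal entry that happens to be zero would still pass it.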

> The trsv() documentation also says that transpose operation is currently not supported

Yes, I noticed it now. I tested it on a GPU, and the program failed with an exception that transposition is not supported. But using a CPU it worked fine, so I kept using it.
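On a device that rejects the transposed operation, one option is to transpose the factor explicitly and then call trsv with nontrans. A minimal sketch of a CSR transpose in plain C++ follows. It is an illustration only, not oneMKL code: `CsrMatrix` and `transpose_csr` are hypothetical names, indices are 0-based, and the matrix is assumed to be square.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal CSR container (0-based indices); not a oneMKL type.
struct CsrMatrix {
    std::int32_t n;                     // number of rows (square matrix assumed)
    std::vector<std::int32_t> row_ptr;  // size n + 1
    std::vector<std::int32_t> col_idx;  // size nnz
    std::vector<double> values;         // size nnz
};

// Builds the transpose of a square CSR matrix with a counting pass
// followed by a scatter pass.
CsrMatrix transpose_csr(const CsrMatrix& a) {
    CsrMatrix t;
    t.n = a.n;
    t.row_ptr.assign(static_cast<std::size_t>(a.n) + 1, 0);
    t.col_idx.resize(a.col_idx.size());
    t.values.resize(a.values.size());

    // Count the entries in each column of a (each row of the transpose).
    for (std::int32_t c : a.col_idx) {
        ++t.row_ptr[c + 1];
    }
    // A prefix sum turns the counts into row start offsets.
    for (std::int32_t i = 0; i < a.n; ++i) {
        t.row_ptr[i + 1] += t.row_ptr[i];
    }
    // Scatter every entry (i, j) of a into row j of the transpose.
    std::vector<std::int32_t> next(t.row_ptr.begin(), t.row_ptr.end() - 1);
    for (std::int32_t i = 0; i < a.n; ++i) {
        for (std::int32_t k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k) {
            const std::int32_t dst = next[a.col_idx[k]]++;
            t.col_idx[dst] = i;
            t.values[dst] = a.values[k];
        }
    }
    return t;
}
```

Because the rows of the input are visited in increasing order, the column indices within each row of the result come out sorted.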

> Is TRSV with transpose operation an important part of your planned application?

Well, this is one of the algorithms for the direct solution of a sparse linear system, as far as I know. You do a Cholesky factorization of an SPD matrix, get the upper (or lower) triangular factor, and then solve the system using two calls to trsv. From A*x=b you get Ut*U*x=b (where Ut means U transposed). Solving this involves two calls to trsv: the first solves Ut*y=b using the transposed U, and the second solves U*x=y using the non-transposed U. I understand I can use pardiso to solve the whole system on a CPU, but I have a use case where I need to perform the factorization in a separate library and then solve the system using oneMKL. So yes, TRSV with a transposed matrix is an important part of my planned application (although I could probably just transpose the matrix manually and call trsv with nontrans, but that just doesn't seem right).

> Another thing you could try is calling oneapi::mkl::sparse::optimize_trsv()

I tried that, and the segfault now actually occurred in the optimize_trsv function before the second call to trsv.

> Please know that the way you are using oneapi::mkl::sparse::set_csr_data and oneapi::mkl::sparse::release_matrix_handle APIs is now deprecated

Thanks, I did not notice that yet. I will take that into consideration.

> We would also recommend using USM APIs instead of sycl::buffer APIs

I considered the two approaches equivalent, so I chose the buffer approach. I will find more info about the USM approach.

Thank you very much for your response, you have been incredibly helpful in making me a better sycl+onemkl programmer already.

Jakub

PS: I am relatively a newbie in onemkl and sycl, so sorry for all the beginner mistakes

PS: this forum should support Markdown, as it is practically a standard today

Jakub_H · ‎01-18-2023

Hi @Gajanan_Choudhary ,

is there any update on this? Are you still investigating it? Is the issue on my side, or on oneMKL's side?

It's not a big priority for me, but it would be nice to know what exactly is wrong.

Thanks,

Jakub

ShanmukhS_Intel · ‎01-17-2023

Hi Jakub,

Have the workarounds provided helped? Could you please let us know if you need any other information?

Best Regards,

Shanmukh.SS

Jakub_H · ‎01-17-2023

Hi,

Gajanan did give me some helpful information, but not a solution; the problem is not solved yet.

Jakub

ShanmukhS_Intel · ‎01-24-2023

Hi Jakub,

Thanks for letting us know. We will get back to you soon with an update.

Best Regards,

Shanmukh.SS

ShanmukhS_Intel · ‎01-31-2023

Hi Jakub,

We are discussing your issue internally. We will get back to you soon with an update.

Best Regards,

Shanmukh.SS

ShanmukhS_Intel · ‎02-07-2023

Hi Jakub,

Our development team is investigating your issue. We will get back to you on this soon!

Best Regards,

Shanmukh.SS

ShanmukhS_Intel · ‎02-07-2023

Hi Jakub,

Thanks for reporting this issue. We were able to reproduce it and we have informed the development team about it.

Best Regards,

Shanmukh.SS

Gajanan_Choudhary · ‎02-07-2024

Hi Jakub,

This issue will be fixed in the next oneAPI toolkit release (i.e., the 2024.1 release). The problem was actually on the oneAPI DPC++ compiler side, with some incorrect interaction between sub-buffers of sycl::buffers and host_task() submissions after SYCL kernels. The Intel compiler team has fixed the issue on their end, and the fix will be available to you in the 2024.1 oneAPI toolkit release. At least in my local tests, this issue appears to be fixed.

After the next oneAPI toolkit release, which should be in the next month or two, we will post a reminder for you here to check on your end whether your issue is resolved. At that time, please let us know what you find, if you can.

Thanks for reaching out to us about this issue and helping us improve our products!

Regards,

Gajanan

Jakub_H · ‎02-07-2024

That's good to hear!

It took a year, but the solution seems to be reached.

I'm looking forward to the update, will test it then and let you know.

Gennady_F_Intel · ‎02-08-2024

Just FYI: we will make an announcement about this release at the top of this oneMKL forum when the update is officially available.

Gennady

Jakub_H · ‎08-23-2024

Hello,

I just updated the example code to 2024.2.0, and I can confirm the issue is fixed: the program no longer segfaults.

Thanks,

Jakub
