Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

How do I debug MKL functions?

gamersensual14
New Contributor I

Hi! I would like to ask how I am supposed to know whether a function completed correctly. By this I mean:

 

I had these cuBLAS and cuSOLVER functions calls:

cublas_error = cublasDgemm(handle_gemm,CUBLAS_OP_T, CUBLAS_OP_N, bands, bands, N, &alpha, image, N, image, N, &beta, Cov, bands);
cusolver_status = cusolverDnDgesvd(cusolverHandle,'N', 'N', bands, bands, Cov, bands, CovEigVal, U, bands, VT, bands, work, lwork, rwork, info);
cusolver_status = cusolverDnDgesvd(cusolverHandle,'N', 'N', bands, bands, Corr, bands, CorrEigVal, U, bands, VT, bands, work, lwork, rwork, info);

Thanks to the cusolver_status and cublas_error variables and the error codes that the functions return I was able to debug them.

 

Now I'm trying to translate this using the MKL functions, and I've come up with this:

oneapi::mkl::blas::column_major::gemm(my_queue, yesTrans, nonTrans, bands, bands, N, alpha, image, N, image, N, beta, Cov, bands);
oneapi::mkl::lapack::gesvd(my_queue, oneapi::mkl::jobsvd::novec, oneapi::mkl::jobsvd::novec, bands, bands, Cov, bands, CovEigVal, U, bands, VT, bands, work, lwork);
oneapi::mkl::lapack::gesvd(my_queue, oneapi::mkl::jobsvd::novec, oneapi::mkl::jobsvd::novec, bands, bands, Corr, bands, CorrEigVal, U, bands, VT, bands, work, lwork);

I also did this once before the gesvd() calls (I believe once is enough in this case):

lwork = oneapi::mkl::lapack::gesvd_scratchpad_size<double>(my_queue, oneapi::mkl::jobsvd::novec, oneapi::mkl::jobsvd::novec, bands, bands, bands, bands, bands);
double* work = sycl::malloc_device<double>(lwork, my_queue);

 

I believe they are translated correctly (correct me here). The program doesn't crash, but I'm not getting the same results as the other version. So my question is: how do I debug these functions? I tried using a standard exception handler:

 

try {
 // function
}
catch(...) {
// some handler or printf()
}

 

But nothing happens. Am I doing something wrong?
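
One way to narrow down a mismatch like this is to compare the two outputs numerically rather than by eye. The following is a minimal sketch in standard C++, assuming both result arrays (for example, the singular values produced by each version) have already been copied to host memory. `max_rel_diff` and `roughly_equal` are hypothetical helper names, not part of oneMKL or cuSOLVER, and the default tolerance is an arbitrary choice.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: largest scaled difference between two host-side result vectors.
// Returns infinity when the sizes differ or when a NaN shows up in either input.
double max_rel_diff(const std::vector<double>& ref, const std::vector<double>& test) {
    if (ref.size() != test.size()) {
        return std::numeric_limits<double>::infinity();
    }
    double worst = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double diff = std::fabs(ref[i] - test[i]);
        if (std::isnan(diff)) {
            return std::numeric_limits<double>::infinity();
        }
        // Scale by the larger magnitude, but never by less than 1, so values near zero
        // are effectively compared by absolute difference.
        const double scale = std::max({std::fabs(ref[i]), std::fabs(test[i]), 1.0});
        worst = std::max(worst, diff / scale);
    }
    return worst;
}

// Hypothetical helper: true when the two results agree within the given tolerance.
bool roughly_equal(const std::vector<double>& ref, const std::vector<double>& test,
                   double tol = 1e-9) {
    return max_rel_diff(ref, test) <= tol;
}
```

Pointers returned by sycl::malloc_device cannot be dereferenced on the host, so the results need to be copied back first (for instance with my_queue.memcpy(...).wait()) before running a check like this.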

 

Thank you!

1 Solution
VidyalathaB_Intel
Moderator

Hi Adrian,

 

Thanks for reaching out to us.

 

>>how do I debug these functions? I tried using standard exception handler:....But nothing happens.

 

oneMKL error handling relies on C++ exceptions and is documented here:

https://www.intel.com/content/www/us/en/develop/documentation/oneapi-mkl-dpcpp-developer-reference/top/error-handling.html

To quote that page:

"oneMKL introduces a class that defines the base class in the hierarchy of oneMKL exception classes. All oneMKL routines throw exceptions inherited from this base class.

In the hierarchy of oneMKL exceptions, oneapi::mkl::exception is the base class inherited from the std::exception class. All other oneMKL exception classes are derived from this base class."
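
With a hierarchy like the one quoted above, handlers are usually ordered from the most specific class to the most general. The sketch below illustrates that ordering in standard C++ only. The `demo` classes are hypothetical stand-ins that mimic the shape of the hierarchy (the real classes live in the oneapi::mkl namespace), and `run_and_classify` is an illustrative name.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins: a library base exception derived from std::exception,
// and one problem-specific exception derived from that base.
namespace demo {
class exception : public std::exception {
public:
    explicit exception(std::string msg) : msg_(std::move(msg)) {}
    const char* what() const noexcept override { return msg_.c_str(); }
private:
    std::string msg_;
};

class invalid_argument : public exception {
public:
    using exception::exception;  // reuse the base constructor
};
}  // namespace demo

// Runs `call` and reports which handler, if any, caught a failure.
std::string run_and_classify(const std::function<void()>& call) {
    try {
        call();
        return "ok";
    } catch (const demo::invalid_argument& e) {  // most specific first
        return std::string("invalid_argument: ") + e.what();
    } catch (const demo::exception& e) {         // then the library base class
        return std::string("library exception: ") + e.what();
    } catch (const std::exception& e) {          // finally anything else standard
        return std::string("std::exception: ") + e.what();
    }
}
```

A handler only runs when something is actually thrown. A call that completes but produces numerically different output takes the "ok" path, which would be consistent with a try/catch block appearing to do nothing.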

In the above link, you can find the list of all oneMKL problem-specific exceptions.

 

For LAPACK routines, you can catch oneapi::mkl::lapack::exception. The info() method of the exception object helps you determine the position of the incorrect argument:

https://www.intel.com/content/www/us/en/develop/documentation/oneapi-mkl-dpcpp-developer-reference/top/lapack-routines/gesvd.html
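
As a rough illustration of how the code carried by a LAPACK exception can be read, here is a minimal sketch in standard C++. It assumes the convention given on the gesvd reference page: a negative info value points at an illegal argument, a positive value signals a computational failure, and an info value equal to the scratchpad size that was passed, together with a non-zero detail() value, means the scratchpad was too small. `describe_lapack_info` is a hypothetical helper, not a oneMKL function, and the exact meaning of each value should be checked against the reference page for the routine in question.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: turns the values obtained from a caught LAPACK exception
// (info(), detail()) plus the scratchpad size passed to the routine into a message.
// The interpretation below is an assumption based on the gesvd reference page.
std::string describe_lapack_info(std::int64_t info, std::int64_t detail,
                                 std::int64_t scratchpad_size) {
    if (info == 0) {
        return "success";
    }
    if (info < 0) {
        // Negative values identify the position of the offending argument.
        return "argument " + std::to_string(-info) + " had an illegal value";
    }
    if (info == scratchpad_size && detail != 0) {
        // In this case detail holds the minimum required scratchpad size.
        return "scratchpad too small; need at least " + std::to_string(detail) + " elements";
    }
    // Any other positive value: the computation itself did not succeed.
    return "computation failed with info = " + std::to_string(info);
}
```

In a handler this could be called as describe_lapack_info(e.info(), e.detail(), lwork), where e is the caught exception and lwork is the scratchpad size that was passed in.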

 

The oneMKL installation also ships with examples that should help you understand this better. You can find them under

"C:\Program Files (x86)\Intel\oneAPI\mkl\2022.0.2\examples\examples_dpcpp\dpcpp" for Windows

"/opt/intel/oneapi/mkl/latest/examples/dpcpp" for Linux

 

Please refer to the oneMKL examples. If you still encounter issues, get back to us with a sample reproducer and your OS details so that we can work on it from our end.

 

Hope the provided information helps in resolving the issue.

 

Regards,

Vidya.

 

gamersensual14
New Contributor I

Hey! For everyone wondering how this went: I did find the examples (omg, I still think Intel hides the examples so well...) and learnt from them.

 

I'm going to put an example for anyone looking for it. Here is how you would check for exceptions in one of the functions I listed above (at least one of the ways):

 

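try {
    // The USM gemm returns a sycl::event; wait_and_throw() blocks until the call finishes and reports asynchronous errors.
    auto event1 = oneapi::mkl::blas::column_major::gemm(my_queue, yesTrans, nonTrans, bands, bands, N, alpha, image_dev, N, image_dev, N, beta, Cov_dev, bands);
    event1.wait_and_throw();
}
catch(oneapi::mkl::exception const& e) {
    // gemm is a BLAS routine, so catch the oneMKL base exception here; lapack::exception applies to LAPACK calls such as gesvd.
    std::cout << "Unexpected exception caught during synchronous call to BLAS API:\nwhat: " << e.what() << std::endl;
    return -1;
}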

 

And in case someone is looking for how to declare the scratchpad size for anything, here is how it is supposed to be done:

try {
    // The size query returns the required scratchpad size as a number of elements of type double.
    lwork = oneapi::mkl::lapack::gesvd_scratchpad_size<double>(my_queue, oneapi::mkl::jobsvd::novec, oneapi::mkl::jobsvd::novec, bands, bands, bands, bands, bands);
    my_queue.wait_and_throw();
}
catch(oneapi::mkl::lapack::exception const& e) {
    printf("Error in gesvd_scratchpad_size()...\nExiting...\n");
    return -1;
}

// lwork counts elements, so convert it to bytes for the allocation.
double* work_dev = static_cast<double*>(sycl::malloc_device(lwork * sizeof(double), device, context));

// A null pointer is only an error when a non-empty scratchpad was requested.
if (lwork != 0 && !work_dev) {
    printf("Error allocating scratchpad in the device memory...\nExiting...\n");
    return -1;
}

Note: You can use malloc_shared instead of malloc_device when allocating the actual scratchpad (called "work_dev" in my case). My scratchpad size is called "lwork".

 

"device" and "context" can be easily obtained from the queue, via my_queue.get_device() and my_queue.get_context().

 

Thank you Vidya.

VidyalathaB_Intel
Moderator

Hi Adrian,


Glad to know that the provided information helped.

Thanks for accepting our solution and I appreciate your efforts as well.

As the issue is resolved we are closing this thread. Please post a new question if you need any additional assistance from Intel as this thread will no longer be monitored.


Have a Great Day!


Regards,

Vidya.

