I ported the QAT compression code from Ceph version 17 into a demo program. When I ran it, I got the following error:
Error userStarMultiProcess(-1), switch to SW if permitted g_process.qz_init_status = QZ_NO_HW.
I resolved this in the demo in either of two ways:
- Setting export QAT_SECTION_NAME="SSL", which allowed the demo to run normally, or
- Changing the section name from SSL to SHIM in my 4xxx_dev0.conf file, which also made it work.
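Both fixes make the section name that the library asks for match a section that exists in the device configuration file. A minimal sketch of that check follows. It assumes the library falls back to a section called SHIM when QAT_SECTION_NAME is unset (which is what the two fixes above suggest), and the sample file contents and helper names are hypothetical, not copied from a real 4xxx_dev0.conf.

```python
import re

# Assumed fallback when QAT_SECTION_NAME is not set; inferred from the
# behaviour described in this thread, not taken from library documentation.
DEFAULT_SECTION = "SHIM"


def section_names(conf_text):
    """Return the [SECTION] names found in a QAT-style config file."""
    names = []
    for line in conf_text.splitlines():
        stripped = line.split("#", 1)[0].strip()  # drop '#' comments
        match = re.fullmatch(r"\[([^\]]+)\]", stripped)
        if match:
            names.append(match.group(1).strip())
    return names


def section_is_configured(conf_text, env):
    """Return (requested section name, whether the file defines it)."""
    wanted = env.get("QAT_SECTION_NAME", DEFAULT_SECTION)
    return wanted, wanted in section_names(conf_text)


# Hypothetical, heavily shortened config used only for illustration.
SAMPLE_CONF = """\
[GENERAL]
ServicesEnabled = dc

[KERNEL]
NumberCyInstances = 0
NumberDcInstances = 0

# user process section
[SSL]
NumberCyInstances = 0
NumberDcInstances = 1
NumProcesses = 1
LimitDevAccess = 0
"""

if __name__ == "__main__":
    # No environment variable: the assumed default section is missing.
    print(section_is_configured(SAMPLE_CONF, {}))
    # Variable set to a section that the file defines.
    print(section_is_configured(SAMPLE_CONF, {"QAT_SECTION_NAME": "SSL"}))
```

In practice the second argument would be os.environ and the text would be read from each device's file under /etc.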
However, when I enable QAT compression in Ceph, it fails with the following errors:
ADF_UIO_PROXY err: get_bundle_from_dev_cached: failed to open uio dev /dev/uio4
[error] SalCtrl_ServiceInit() - : Failed to initialise all service instances
ADF_UIO_PROXY err: adf_user_subsystemInit: Failed to initialise Subservice SAL
[error] SalCtrl_ServiceEventStart() - : Private data is NULL
ADF_UIO_PROXY err: adf_user_subsystemStart: Failed to start Subservice SAL
ADF_UIO_PROXY err: get_bundle_from_dev_cached: failed to open uio dev /dev/uio12
[error] SalCtrl_ServiceInit() - : Failed to initialise all service instances
ADF_UIO_PROXY err: adf_user_subsystemInit: Failed to initialise Subservice SAL
[error] SalCtrl_ServiceEventStart() - : Private data is NULL
ADF_UIO_PROXY err: adf_user_subsystemStart: Failed to start Subservice SAL
[error] SalCtrl_AdfServicesStartedCheck() - : Sal Ctrl failed to start in given time
[error] do_userStart() - : Failed to start services
ADF_UIO_PROXY err: icp_adf_subsystemUnregister: Failed to shutdown subservice SAL.
ADF_UIO_PROXY err: icp_adf_subsystemUnregister: Failed to shutdown subservice SAL.
Error userStarMultiProcess(-1), switch to SW if permitted
Hi,
Are you following any specific guide for this? If so, can you please share the link here?
I can take a look and see if I can provide any suggestions for you.
Regards,
Diego V.
Thank you for the reply.
I followed this guide: https://www.intel.com/content/www/us/en/content-details/632506/intel-quickassist-technology-intel-qat-software-for-linux-getting-started-guide-hardware-version-2-0.html
and the README from QATzip 1.1.2:
https://github.com/intel/QATzip/blob/v1.1.2/README.md#build-intel-quickassist-technology-driver
I have now copied the 4xxx_dev0.conf from QATzip 1.1.2 into the /etc directory, and my program runs normally. With compression enabled in Ceph, uploading a single object works without problems. But when I use FIO to upload objects, I encounter the following timeout errors:
-66> 2025-11-27T09:06:01.058+0800 7fa2b2bfb640 0 bluestore(/var/lib/ceph/osd/ceph-1) log_latency slow operation observed for compress@_do_alloc_write, latency = 6.407702446s
-65> 2025-11-27T09:06:01.058+0800 7fa2b33fc640 0 bluestore(/var/lib/ceph/osd/ceph-1) log_latency slow operation observed for compress@_do_alloc_write, latency = 6.407705784s
-64> 2025-11-27T09:06:01.058+0800 7fa2b77dc640 0 bluestore(/var/lib/ceph/osd/ceph-1) log_latency slow operation observed for compress@_do_alloc_write, latency = 6.407700539s
-63> 2025-11-27T09:06:01.058+0800 7fa2b8fdf640 0 bluestore(/var/lib/ceph/osd/ceph-1) log_latency slow operation observed for compress@_do_alloc_write, latency = 6.407549381s
-62> 2025-11-27T09:06:01.058+0800 7fa2b4bff640 0 bluestore(/var/lib/ceph/osd/ceph-1) log_latency slow operation observed for compress@_do_alloc_write, latency = 6.407548904s
-61> 2025-11-27T09:06:01.059+0800 7fa2b43fe640 0 bluestore(/var/lib/ceph/osd/ceph-1) log_latency slow operation observed for compress@_do_alloc_write, latency = 6.407576561s
Then a segmentation fault occurs:
1: /usr/lib64/libc.so.6(+0x40ef0) [0x7fa2edbe2ef0]
2: /usr/lib64/libc.so.6(+0x88280) [0x7fa2edc2a280]
3: pthread_mutex_lock()
4: qzInit()
5: qzCompressCrcExt()
6: qzCompressExt()
7: qzCompress()
8: (QatAccel::compress(ceph::buffer::v15_2_0::list const&, ceph::buffer::v15_2_0::list&, boost::optional<int>&)+0xce) [0x7fa2e6cb6c6e]
9: (BlueStore::_do_alloc_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>, boost::intrusive_ptr<BlueStore::Onode>&, BlueStore::WriteContext*)+0x4f8) [0x557ea26ee6e8]
10: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x2c5) [0x557ea2742055]
11: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x8e) [0x557ea2742bee]
12: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x1ba0) [0x557ea2746230]
Hi,
I found this Ceph Configuration Guide for Intel QAT in the Ceph documentation. Perhaps you already found it too.
I haven't followed this setup before, but from the documentation I see some key elements that you should pay attention to. Let me list them below:
- The first step is to confirm the Intel QAT out-of-tree driver and the Intel QATZip library are working properly. From your description, it looks like this is already done but I'm including the information here just in case there is something you missed.
- Follow the Getting Started Guide to install and test the Intel QAT out-of-tree driver. This process will be successful if you are able to run the cpa_sample_code described in the guide.
- Important: Since you are using a 4xxx Intel QAT device, then the right Intel QAT out-of-tree driver to use is the following: Intel® QuickAssist Technology (Intel® QAT) Driver for Linux* for Hardware Version 2.0. The current latest version is 1.2.30-00109.
- Once the sample code is working, proceed to set up the Intel QATZip library. Note that the current latest version is 1.3.1 but you are using 1.1.2. Install the library from the Master branch so that you always use the latest version. This step is successful if you are able to run the run_perf_test.sh script referenced in the installation instructions.
- Important: To properly set up the Intel QATZip library, you have to update the QAT configuration file using one of the sample files available in the library. This is because the [SHIM] section has to be added. This configuration will be updated later when setting up Ceph.
- The next step, once the Intel QAT out-of-tree driver and the Intel QATZip library are up and running, is to configure Intel QAT in Ceph.
- The QAT configuration file for all devices should be updated with a new user process instance for [CEPH]. The Ceph guide above has an example to use.
- To support Intel QAT compression, Ceph must be built with the following options: ./do_cmake.sh -DWITH_QATDRV=ON -DWITH_QATZIP=ON -DWITH_SYSTEM_QATZIP=ON -DWITH_QATLIB=OFF
- The [CEPH] user process instance in the QAT configuration files must be specified in an environment variable: export QAT_SECTION_NAME=CEPH
- Update the ceph.conf file to enable Intel QAT compression. Refer to the Ceph guide above for details.
According to the Ceph documentation, this should be the required configuration to make Intel QAT compression using QATZip work with Ceph. Can you please verify and confirm you are following this setup correctly?
I can help review your QAT configuration files if you can provide them, as well as the ceph.conf file.
Please note my expertise and support coverage is specifically for Intel QAT components - namely the QAT driver and QATzip library. While I can help ensure these components are properly configured and functioning, any issues related to Ceph's integration, configuration, or behavior fall outside my support scope and would need to be addressed through Ceph community channels. That said, I'm happy to help verify that the QAT side of the integration is set up correctly, as this often resolves integration issues.
Regards,
Diego V.
Thank you very much for your reply again.
Currently, I am using Ceph version 16. The operation guide for this version does not include settings like "out-of-tree"; during compilation, it only requires adding -DWITH_QAT. In my testing, writing 4MB objects with rados bench using a single thread does not cause any issues. However, as soon as I increase the number of threads, the core dump shown earlier occurs.
I have reviewed the Ceph code and found that during compression, Ceph creates new temporary objects, which also means creating new QAT sessions. Could having too many sessions be the cause of this issue? I am not entirely sure, but I noticed that later versions of Ceph added session pool management for handling sessions.
My QAT configuration is attached. Thank you!
Hi,
Thanks for the extra details.
What you just described sounds exactly like the reason for the failure: too many sessions being created simultaneously.
One thing you can try is a configuration optimized for multiple threads rather than one optimized for multiple processes.
Your current QAT configuration is set for multiple-process optimization: it allows the maximum possible number of processes but gives each one only a single data compression instance. Based on your description, your workload looks more like multiple threads within one or a few processes.
A sample for multiple thread optimization is available here: https://github.com/intel/QATzip/tree/master/config_file/4xxx/multiple_thread_opt. Give it a try with this configuration for all QAT devices.
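To make the difference between the two layouts concrete, here is a rough sketch that reads the two relevant keys from a user section and applies a simple rule of thumb: one data compression instance per concurrently compressing thread. The rule, the helper names, and the numbers in both sample sections are assumptions for illustration only; they are not copied from the QATzip sample files, and real files also carry per-instance entries that are omitted here.

```python
import configparser

# Hypothetical process-optimized layout: many processes, one instance each.
PROCESS_OPT = """\
[SHIM]
NumberCyInstances = 0
NumberDcInstances = 1
NumProcesses = 16
LimitDevAccess = 1
"""

# Hypothetical thread-optimized layout: one process, many instances.
THREAD_OPT = """\
[SHIM]
NumberCyInstances = 0
NumberDcInstances = 16
NumProcesses = 1
LimitDevAccess = 1
"""


def dc_layout(conf_text, section="SHIM"):
    """Return (NumProcesses, NumberDcInstances) for one user section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names case-sensitive
    parser.read_string(conf_text)
    return (parser.getint(section, "NumProcesses"),
            parser.getint(section, "NumberDcInstances"))


def covers_workload(conf_text, processes, threads_per_process, section="SHIM"):
    """Rule of thumb (an assumption): one DC instance per compressing thread."""
    max_processes, dc_per_process = dc_layout(conf_text, section)
    return processes <= max_processes and threads_per_process <= dc_per_process


if __name__ == "__main__":
    # One process with 8 compressing threads, as in a multi-threaded daemon.
    print(covers_workload(PROCESS_OPT, processes=1, threads_per_process=8))
    print(covers_workload(THREAD_OPT, processes=1, threads_per_process=8))
```

Under this simplified rule, the process-optimized layout does not cover a single process with several compressing threads, while the thread-optimized layout does.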
If this configuration still doesn't resolve the issue, I would recommend considering an upgrade to a newer Ceph version that includes session pool management. The Ceph documentation referenced above mentions specific configuration settings to define the maximum number of sessions. The fact that these settings are documented suggests that your current issue was common enough to motivate the addition of session handling at the Ceph level for QAT compression with QATzip.
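For readers unfamiliar with the idea, session pooling can be sketched in a few lines. This is not Ceph's implementation: the class and function names are hypothetical, and zlib stands in for a hardware session so the example runs anywhere. The point is only that a fixed number of sessions is created once and then shared, so adding writer threads does not add sessions.

```python
import queue
import threading
import zlib
from contextlib import contextmanager


class SoftwareSession:
    """Stand-in for a hardware compression session (zlib here, not QAT)."""

    def compress(self, data: bytes) -> bytes:
        return zlib.compress(data)


class SessionPool:
    """Creates max_sessions sessions up front and lends them out."""

    def __init__(self, factory, max_sessions):
        self._idle = queue.Queue()
        for _ in range(max_sessions):
            self._idle.put(factory())

    @contextmanager
    def borrow(self):
        session = self._idle.get()  # blocks while all sessions are in use
        try:
            yield session
        finally:
            self._idle.put(session)  # always hand the session back


def run_writers(pool, writers, payload):
    """Compress the payload from several threads that share the pool."""
    results = []
    lock = threading.Lock()

    def worker():
        with pool.borrow() as session:
            compressed = session.compress(payload)
        with lock:
            results.append(compressed)

    threads = [threading.Thread(target=worker) for _ in range(writers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    pool = SessionPool(SoftwareSession, max_sessions=2)
    outputs = run_writers(pool, writers=8, payload=b"abc" * 1000)
    print(len(outputs))
```

Threads that find the pool empty simply wait for a session to be returned, which bounds the number of live sessions regardless of write concurrency.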
Hope this helps.
Regards,
Diego V.