Intel® QuickAssist Technology (Intel® QAT)

In recent times, we have encountered several null pointer issues related to QAT (QuickAssist Technol

liuhao1
Beginner
The call stack information is as follows:
[   43.012914] usdm_drv: Loading USDM Module Version 0.7.1 ...
[   43.018510] usdm_drv: IOCTLs: c0507100, c0507101, 7102, c0047104
[   43.031324] QAT: Stopping all acceleration devices.
[   43.036228] c3xxx 0000:01:00.0: qat_dev0 stopped 6 acceleration engines
[   43.043230] c3xxx 0000:01:00.0: Resetting device qat_dev0
[   43.048647] c3xxx 0000:01:00.0: Function level reset
[   43.051142] igb_uio 0000:09:00.1: mapping 1K dma=0x467395000 host=ffff880467395000
[   43.051145] igb_uio 0000:09:00.1: unmapping 1K dma=0x467395000 host=ffff880467395000
[   43.169770] c3xxx 0000:01:00.0: Enabling default configuration
[   43.175614] c3xxx 0000:01:00.0: Enabling default configuration
[   43.181728] c3xxx 0000:01:00.0: Starting acceleration device qat_dev0.
[   43.205221] c3xxx 0000:01:00.0: Enabling default configuration
[   43.268086] c3xxx 0000:01:00.0: Resetting device qat_dev0
[   43.273507] c3xxx 0000:01:00.0: Function level reset
[   43.282198] BUG: unable to handle kernel NULL pointer dereference at 0000000000000280
[   43.290051] IP: [<ffffffffc031a0e6>] qat_uclo_wr_all_uimage+0x46/0xe70 [intel_qat]
[   43.297657] PGD 0 
[   43.299691] Oops: 0000 [#1] SMP 
[   43.302957] Modules linked in: usdm_drv(OE) igb_uio(OE) uio_pci_generic intel_powerclamp coretemp iosf_mbi kvm_intel qat_c3xxx(OE-) kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel intel_qat(OE) lrw gf128mul glue_helper iTCO_wdt ablk_helper uio cryptd iTCO_vendor_support pcspkr ipmi_ssif ipmi_devintf sg ipmi_msghandler i2c_i801 i2c_ismt shpchp ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic igb ixgbe ahci libahci crct10dif_pclmul mdio libata crct10dif_common ptp i2c_algo_bit i2c_core pps_core crc32c_intel dca dm_mirror dm_region_hash dm_log dm_mod
[   43.354218] CPU: 3 PID: 1516 Comm: adf_ctl Tainted: G           OE  ------------   3.10.0-693.el7.x86_64 #1
[   43.363946] Hardware name: 0 0/Default string, BIOS 5.13 (Z169-004) 05/11/2020
[   43.371163] task: ffff88046a650000 ti: ffff88045ff60000 task.ti: ffff88045ff60000
[   43.378637] RIP: 0010:[<ffffffffc031a0e6>]  [<ffffffffc031a0e6>] qat_uclo_wr_all_uimage+0x46/0xe70 [intel_qat]
[   43.388649] RSP: 0018:ffff88045ff63bb0  EFLAGS: 00010246
[   43.393957] RAX: ffff8804694e9068 RBX: ffff8804694e9240 RCX: 000000000000003c
[   43.401086] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8804694e9068
[   43.408213] RBP: ffff88045ff63c88 R08: 0000000000000000 R09: ffffc90002401954
[   43.415343] R10: ffffc90002401850 R11: ffffc900024017d0 R12: ffff88045ff63cb0
[   43.422473] R13: ffff88047f80bc00 R14: ffffc90001bb6000 R15: ffff8804694e9060
[   43.429600] FS:  00007f49a2300740(0000) GS:ffff88047fcc0000(0000) knlGS:0000000000000000
[   43.437695] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   43.443435] CR2: 0000000000000280 CR3: 0000000465717000 CR4: 00000000003407e0
[   43.450572] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   43.457700] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   43.464827] Stack:
[   43.466846]  ffffffffc031af83 ffff88017fc03b00 0000000000019bc0 0000000000000008
[   43.474313]  ffff8804694e9068 000000000000b96a 0000000000040cdc ffffffff811e0613
[   43.481779]  ffffffffc031b76e ffff88046b649380 ffffc900023e1128 ffff8804667e3420
[   43.489244] Call Trace:
[   43.491706]  [<ffffffffc031af83>] ? qat_uclo_map_obj+0x73/0xa00 [intel_qat]
[   43.498675]  [<ffffffff811e0613>] ? __kmalloc+0x1e3/0x230
[   43.504074]  [<ffffffffc031b76e>] ? qat_uclo_map_obj+0x85e/0xa00 [intel_qat]
[   43.511125]  [<ffffffffc031b13c>] ? qat_uclo_map_obj+0x22c/0xa00 [intel_qat]
[   43.518172]  [<ffffffff8132bd82>] ? strlcpy+0x42/0x60
[   43.523224]  [<ffffffffc03187b0>] adf_gen2_ae_fw_load+0x150/0x2b0 [intel_qat]
[   43.530353]  [<ffffffffc031381d>] adf_dev_init+0x11d/0x4f0 [intel_qat]
[   43.536885]  [<ffffffffc0310edc>] adf_ctl_ioctl+0x8bc/0xc00 [intel_qat]
[   43.543526]  [<ffffffffc027b50c>] ? xfs_iunlock+0xec/0x130 [xfs]
[   43.549528]  [<ffffffff812151cd>] do_vfs_ioctl+0x33d/0x540
[   43.555009]  [<ffffffff816affd1>] ? __do_page_fault+0x171/0x450
[   43.560924]  [<ffffffff81215471>] SyS_ioctl+0xa1/0xc0
[   43.565973]  [<ffffffff816b4fc9>] system_call_fastpath+0x16/0x1b
[   43.571975] Code: 00 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 80 7f 28 00 0f 85 59 01 00 00 48 8b 47 10 4d 8b 07 48 89 c7 48 89 85 48 ff ff ff <41> 8b 80 80 02 00 00 48 89 45 c0 44 8b 8f 58 cc 00 00 45 85 c9 
[   43.591922] RIP  [<ffffffffc031a0e6>] qat_uclo_wr_all_uimage+0x46/0xe70 [intel_qat]
[   43.599599]  RSP <ffff88045ff63bb0>
[   43.603089] CR2: 0000000000000280
[   43.606406] ---[ end trace 857b665cf7c8a552 ]---
[   43.612330] Kernel panic - not syncing: Fatal exception
[   43.617565] Kernel Offset: disabled
Ronny_G_Intel
Moderator

Hi liuhao1,


Based on the provided log, the kernel panic is caused by a NULL pointer dereference in the Intel QAT driver (qat_uclo_wr_all_uimage in intel_qat) while adf_ctl was loading firmware during initialization of the QAT device.
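The conclusion can be cross-checked against the register dump. In the Code: line, the byte at the faulting RIP is marked <41>, and the instruction starting there, 41 8b 80 80 02 00 00, decodes to mov eax, DWORD PTR [r8+0x280]. R8 is 0, so the load targets address 0x280, which matches CR2: the driver read a field at offset 0x280 through a NULL pointer. The bytes just before it appear to include 4d 8b 07 (mov r8, [r15]), which is where the NULL value was loaded. The following bash sketch extracts the faulting bytes and redoes that arithmetic. It assumes this specific `41 8b 80 <disp32>` encoding and is not a general x86 decoder; for a full disassembly of a Code: line, the kernel source tree provides scripts/decodecode. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: recompute the fault address from the oops "Code:" line.
# Assumes the faulting instruction has the form "41 8b 80 <disp32>",
# i.e. mov eax, DWORD PTR [r8+disp32]; it is not a general x86 decoder.

# Print the first COUNT bytes starting at the <..> marker (the byte at RIP).
faulting_bytes() {
  local line="$1" count="${2:-7}"
  local rest="${line#*<}"   # drop everything up to and including '<'
  rest="${rest/>/}"         # drop the closing '>'
  local -a bytes
  read -r -a bytes <<< "$rest"
  echo "${bytes[@]:0:count}"
}

# Decode the little-endian 32-bit displacement (bytes 4..7) as a hex number.
disp32() {
  local -a b
  read -r -a b <<< "$1"
  printf '0x%s%s%s%s\n' "${b[6]}" "${b[5]}" "${b[4]}" "${b[3]}"
}

code='Code: 00 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 80 7f 28 00 0f 85 59 01 00 00 48 8b 47 10 4d 8b 07 48 89 c7 48 89 85 48 ff ff ff <41> 8b 80 80 02 00 00 48 89 45 c0 44 8b 8f 58 cc 00 00 45 85 c9'
r8=0x0000000000000000   # R8 from the register dump

insn=$(faulting_bytes "$code")
disp=$(disp32 "$insn")
echo "faulting instruction: $insn"
printf 'fault address: 0x%x (compare with CR2)\n' $(( r8 + disp ))
```

Running it prints the instruction bytes followed by `fault address: 0x280 (compare with CR2)`.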

Please provide more information about the system and QAT configuration:

  • Provide the icp_dump output for a better understanding of the system. You can find the script for this in the QAT SDK package at the following path: ICP_ROOT/quickassist/utilities/release-files/debug_tool/icp_dump.sh.
  • Please confirm if this issue is reproducible and the total number of systems affected.
  • Are you using any static analysis tools like cppcheck or clang-analyzer to detect potential issues with uninitialized pointers?
  • You mentioned that this issue has been occurring recently. Was there any change or update to the system that may have introduced this issue?


Regards,


Ronny G



liuhao1
Beginner

Hi Ronny G,

This problem turned out to be fairly involved. After several days of troubleshooting, we found that CentOS loads the QAT driver and its configuration during boot. At the same time, our software product runs scripts that bind devices to igb_uio, and because the two run concurrently, the QAT device fails to start due to this timing issue. The only workaround right now is to wait for the kernel to recover automatically.

How can I prevent the kernel from loading the QAT driver, or which file does the kernel use to load QAT?

Can I adjust the order in which the kernel loads QAT?
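One way to avoid the race described above, without changing what loads at boot, is to make the igb_uio binding script wait until QAT initialization has finished. A minimal bash sketch follows. It assumes that `adf_ctl status` (the default status command here) prints one line per device containing `state: up` once the device is ready; check the exact output format of your driver version. The function name wait_for_qat_up is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: block until every QAT device reports "state: up", or time out.
# Assumption: the status command (default: "adf_ctl status") prints one
# line per device containing "state: <value>"; verify with your driver.

# Usage: wait_for_qat_up TIMEOUT_SECONDS [STATUS_CMD ARGS...]
wait_for_qat_up() {
  local timeout="$1"; shift
  local -a cmd=("$@")
  (( ${#cmd[@]} > 0 )) || cmd=(adf_ctl status)

  local waited=0 out
  while (( waited < timeout )); do
    out=$("${cmd[@]}" 2>/dev/null) || out=""
    # Ready when at least one device is up and no device is in another state.
    if grep -q 'state: up' <<< "$out" &&
       ! grep 'state:' <<< "$out" | grep -qv 'state: up'; then
      return 0
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
  return 1
}
```

In the binding script, placing `wait_for_qat_up 60 || exit 1` before the igb_uio step would serialize the two. Passing a different command as the second argument makes the function easy to exercise without QAT hardware.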

 

Regards,

liuhao1
Ronny_G_Intel
Moderator

Hi liuhao1,


I recommend first preventing the kernel from loading the QAT driver, and then troubleshooting the QAT configuration.

You can prevent the kernel from loading the QAT driver by blacklisting it. This involves adding the QAT driver module to the blacklist configuration file.

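Create or edit the file /etc/modprobe.d/blacklist.conf and add a blacklist line for the device-specific QAT module. There is no module named plain "qat"; the module list in your log shows qat_c3xxx (which pulls in intel_qat as a dependency), so the line would be: blacklist qat_c3xxx

On other QAT hardware, use the matching module name instead (e.g., blacklist qat_c62x).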

After making this change, rebuild the initial RAM filesystem so the change also applies during early boot, then reboot: dracut -f
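A minimal bash sketch of the blacklist step. It assumes the module name qat_c3xxx, taken from the "Modules linked in" list in the log; the helper write_qat_blacklist and the file name blacklist-qat.conf are hypothetical (modprobe reads every *.conf file in /etc/modprobe.d, so a separate file works as well as blacklist.conf).

```shell
#!/usr/bin/env bash
# Sketch: write a modprobe blacklist file for the given QAT modules.
# The file name "blacklist-qat.conf" is a hypothetical choice; modprobe
# reads every *.conf file in /etc/modprobe.d.

# Usage: write_qat_blacklist DIR MODULE...
write_qat_blacklist() {
  local dir="$1"; shift
  local conf="$dir/blacklist-qat.conf" m
  {
    echo "# Stop QAT modules from being auto-loaded by alias at boot"
    for m in "$@"; do
      echo "blacklist $m"
    done
  } > "$conf" || return 1
  echo "$conf"
}
```

On the affected system this would be run as root as `write_qat_blacklist /etc/modprobe.d qat_c3xxx`, followed by `dracut -f` and a reboot; afterwards `lsmod | grep qat` shows whether any QAT module was still loaded. Note that `blacklist` only stops loading by alias: an explicit `modprobe qat_c3xxx`, for example from a QAT startup script that runs adf_ctl, still loads the module. Per modprobe.d(5), an `install qat_c3xxx /bin/false` line also blocks explicit loads.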


I have very few details about your configuration: I don't know which QAT hardware and software versions you are running, or how QAT is configured. Also, was there any change or update to the system that may have introduced this issue?


Thanks,

Ronny G


Ronny_G_Intel
Moderator

Hi liuhao1,


I am following up to see if there are any updates on this issue. Could you please let me know whether you have implemented my previous recommendation, and share the results?

Additionally, please provide any further configuration details if additional support is needed.


Regards,

Ronny G


Ronny_G_Intel
Moderator

Hi liuhao1,


Please let me know if you need further assistance with this issue.


Regards,

Ronny G


Ronny_G_Intel
Moderator

Hi liuhao1,


I haven't heard from you in some time, so I will be closing the internal case we opened for this issue. If you still need assistance, please create a new community post, as I will no longer be monitoring this community.


Regards,

Ronny G

