Hi,
I am currently trying to use the Data Streaming Accelerator (DSA) to (1) read two inputs, (2) XOR them, and (3) store the result.
According to the “Intel® Data Streaming Accelerator Architecture Specification (v3.0)”, this kind of compute operation appears to be supported by the hardware.
However, when using the Intel Data Movement Library (DML), or when using DSA directly via idxd.h, I cannot find any interface for XOR operations, only for memory copy, memory fill, and similar operations.
Are the compute operations described in the DSA architecture spec currently unavailable in software stacks?
If they are available, is there any guideline, header, or example for using them?
Thanks.
Hi asdasf,
Greetings for the day!
Thank you for reaching out to Intel Support. We acknowledge receipt of your concern and would like to assure you that assisting you is our top priority.
To assist you further, we require some additional information from your end.
Kindly provide the system details and the processor model for which you are seeking the necessary information.
This will help us review the complete details and assist you further.
We appreciate your understanding!
Best regards,
Poojitha N
Intel Customer Support Technician
Thank you for your prompt response.
Regarding your request, please find the system details below:
Product / Platform
Intel Xeon Platinum 8558 (2× sockets)
OS / Kernel / Drivers
OS: Ubuntu 25.04
Kernel: 6.14.0-1007-intel
DSA driver: idxd v1.0
Library: Using the linux/idxd.h UAPI headers for descriptor submission
accel-config version: accel-config 4.1.8+
Issue summary
We are attempting to submit a Reduce/XOR descriptor through the user-space write(fd, &desc, …) submission path on /dev/dsa/wq*. The device returns completion status 0x10 (DSA_COMP_BAD_OPCODE). The same descriptor pipeline works for DSA_OPCODE_MEMMOVE.
This raises the question of whether Reduce/XOR opcodes are currently supported on this CPU/driver combination or require a newer DSA specification / microcode / driver stack.
If you need additional traces, PCIe capability dumps, DSACAP registers, or accel-config dumps (accel-config list -i), I will gladly provide them.
Thank you again for your assistance, and please let me know if further details are required.
Best regards,
Juntaek
Hello asdasf,
Thank you for providing the detailed issue summary regarding the DSA Reduce/XOR descriptor submission failure.
To proceed with our analysis, could you please share the following details from the affected system:
1) DSA capability registers
2) Full DSA configuration and work queue information
3) PCIe capability and device information for the DSA device
4) Kernel log messages related to DSA initialization
Regards
Pujeeth_Intel
Here are the results of several commands that may contain the details you are looking for. Please let me know if you need any additional information.
$ sudo accel-config list -i
[
{
"dev":"dsa0",
"read_buffer_limit":0,
"max_groups":4,
"max_work_queues":8,
"max_engines":4,
"work_queue_size":128,
"numa_node":0,
"op_cap":"00000000,00000000,00000000,00000000,00000000,00000000,00000001,003f027d",
"gen_cap":"0x40915f0107",
"version":"0x100",
"state":"enabled",
"max_read_buffers":96,
"max_batch_size":1024,
"configurable":1,
"pasid_enabled":1,
"cdev_major":509,
"clients":0,
"groups":[
{
"dev":"group0.0",
"read_buffers_reserved":0,
"use_read_buffer_limit":0,
"read_buffers_allowed":96,
"grouped_workqueues":[
{
"dev":"wq0.0",
"mode":"dedicated",
"size":64,
"group_id":0,
"priority":1,
"block_on_fault":0,
"max_batch_size":32,
"max_transfer_size":2097152,
"cdev_minor":0,
"type":"user",
"name":"swq",
"driver_name":"user",
"threshold":0,
"ats_disable":0,
"state":"enabled",
"clients":0
}
],
"grouped_engines":[
{
"dev":"engine0.0",
"group_id":0
},
{
"dev":"engine0.1",
"group_id":0
},
{
"dev":"engine0.2",
"group_id":0
},
{
"dev":"engine0.3",
"group_id":0
}
]
},
{
"dev":"group0.1",
"read_buffers_reserved":0,
"use_read_buffer_limit":0,
"read_buffers_allowed":96
},
{
"dev":"group0.2",
"read_buffers_reserved":0,
"use_read_buffer_limit":0,
"read_buffers_allowed":96
},
{
"dev":"group0.3",
"read_buffers_reserved":0,
"use_read_buffer_limit":0,
"read_buffers_allowed":96
}
],
"ungrouped workqueues":[
{
"dev":"wq0.1",
"mode":"shared",
"size":0,
"priority":0,
"block_on_fault":0,
"max_batch_size":32,
"max_transfer_size":2097152,
"type":"none",
"name":"",
"driver_name":"",
"threshold":0,
"ats_disable":0,
"state":"disabled",
"clients":0
},
{
"dev":"wq0.2",
"mode":"shared",
"size":0,
"priority":0,
"block_on_fault":0,
"max_batch_size":32,
"max_transfer_size":2097152,
"type":"none",
"name":"",
"driver_name":"",
"threshold":0,
"ats_disable":0,
"state":"disabled",
"clients":0
},
{
"dev":"wq0.3",
"mode":"shared",
"size":0,
"priority":0,
"block_on_fault":0,
"max_batch_size":32,
"max_transfer_size":2097152,
"type":"none",
"name":"",
"driver_name":"",
"threshold":0,
"ats_disable":0,
"state":"disabled",
"clients":0
},
{
"dev":"wq0.4",
"mode":"shared",
"size":0,
"priority":0,
"block_on_fault":0,
"max_batch_size":32,
"max_transfer_size":2097152,
"type":"none",
"name":"",
"driver_name":"",
"threshold":0,
"ats_disable":0,
"state":"disabled",
"clients":0
},
{
"dev":"wq0.5",
"mode":"shared",
"size":0,
"priority":0,
"block_on_fault":0,
"max_batch_size":32,
"max_transfer_size":2097152,
"type":"none",
"name":"",
"driver_name":"",
"threshold":0,
"ats_disable":0,
"state":"disabled",
"clients":0
},
{
"dev":"wq0.6",
"mode":"shared",
"size":0,
"priority":0,
"block_on_fault":0,
"max_batch_size":32,
"max_transfer_size":2097152,
"type":"none",
"name":"",
"driver_name":"",
"threshold":0,
"ats_disable":0,
"state":"disabled",
"clients":0
},
{
"dev":"wq0.7",
"mode":"shared",
"size":0,
"priority":0,
"block_on_fault":0,
"max_batch_size":32,
"max_transfer_size":2097152,
"type":"none",
"name":"",
"driver_name":"",
"threshold":0,
"ats_disable":0,
"state":"disabled",
"clients":0
}
]
},
{
"dev":"dsa1",
"read_buffer_limit":0,
"max_groups":4,
"max_work_queues":8,
"max_engines":4,
"work_queue_size":128,
"numa_node":2,
"op_cap":"00000000,00000000,00000000,00000000,00000000,00000000,00000001,003f027d",
"gen_cap":"0x40915f0107",
"version":"0x100",
"state":"disabled",
"max_read_buffers":96,
"max_batch_size":1024,
"configurable":1,
"pasid_enabled":1,
"cdev_major":509,
"clients":0,
"groups":[
{
"dev":"group1.0",
"read_buffers_reserved":0,
"use_read_buffer_limit":0,
"read_buffers_allowed":96
},
{
"dev":"group1.1",
"read_buffers_reserved":0,
"use_read_buffer_limit":0,
"read_buffers_allowed":96
},
{
"dev":"group1.2",
"read_buffers_reserved":0,
"use_read_buffer_limit":0,
"read_buffers_allowed":96
},
{
"dev":"group1.3",
"read_buffers_reserved":0,
"use_read_buffer_limit":0,
"read_buffers_allowed":96
}
],
"ungrouped workqueues":[
{
"dev":"wq1.0",
"mode":"shared",
"size":0,
"priority":0,
"block_on_fault":0,
"max_batch_size":32,
"max_transfer_size":2097152,
"type":"none",
"name":"",
"driver_name":"",
"threshold":0,
"ats_disable":0,
"state":"disabled",
"clients":0
},
{
"dev":"wq1.1",
"mode":"shared",
"size":0,
"priority":0,
"block_on_fault":0,
"max_batch_size":32,
"max_transfer_size":2097152,
"type":"none",
"name":"",
"driver_name":"",
"threshold":0,
"ats_disable":0,
"state":"disabled",
"clients":0
},
{
"dev":"wq1.2",
"mode":"shared",
"size":0,
"priority":0,
"block_on_fault":0,
"max_batch_size":32,
"max_transfer_size":2097152,
"type":"none",
"name":"",
"driver_name":"",
"threshold":0,
"ats_disable":0,
"state":"disabled",
"clients":0
},
{
"dev":"wq1.3",
"mode":"shared",
"size":0,
"priority":0,
"block_on_fault":0,
"max_batch_size":32,
"max_transfer_size":2097152,
"type":"none",
"name":"",
"driver_name":"",
"threshold":0,
"ats_disable":0,
"state":"disabled",
"clients":0
},
{
"dev":"wq1.4",
"mode":"shared",
"size":0,
"priority":0,
"block_on_fault":0,
"max_batch_size":32,
"max_transfer_size":2097152,
"type":"none",
"name":"",
"driver_name":"",
"threshold":0,
"ats_disable":0,
"state":"disabled",
"clients":0
},
{
"dev":"wq1.5",
"mode":"shared",
"size":0,
"priority":0,
"block_on_fault":0,
"max_batch_size":32,
"max_transfer_size":2097152,
"type":"none",
"name":"",
"driver_name":"",
"threshold":0,
"ats_disable":0,
"state":"disabled",
"clients":0
},
{
"dev":"wq1.6",
"mode":"shared",
"size":0,
"priority":0,
"block_on_fault":0,
"max_batch_size":32,
"max_transfer_size":2097152,
"type":"none",
"name":"",
"driver_name":"",
"threshold":0,
"ats_disable":0,
"state":"disabled",
"clients":0
},
{
"dev":"wq1.7",
"mode":"shared",
"size":0,
"priority":0,
"block_on_fault":0,
"max_batch_size":32,
"max_transfer_size":2097152,
"type":"none",
"name":"",
"driver_name":"",
"threshold":0,
"ats_disable":0,
"state":"disabled",
"clients":0
}
],
"ungrouped_engines":[
{
"dev":"engine1.0"
},
{
"dev":"engine1.1"
},
{
"dev":"engine1.2"
},
{
"dev":"engine1.3"
}
]
}
]
$ lspci -nn | grep -Ei 'data streaming|I/O Accel|idxd|0b25'
6a:01.0 System peripheral [0880]: Intel Corporation Device [8086:0b25]
e7:01.0 System peripheral [0880]: Intel Corporation Device [8086:0b25]
$ sudo lspci -vvv -s 6a:01.0
6a:01.0 System peripheral: Intel Corporation Device 0b25
Subsystem: Intel Corporation Device 0000
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
NUMA node: 0
IOMMU group: 4
Region 0: Memory at afffff20000 (64-bit, prefetchable) [size=64K]
Region 2: Memory at afffff00000 (64-bit, prefetchable) [size=128K]
Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, IntMsgNum 0
DevCap: MaxPayload 128 bytes, PhantFunc 0
ExtTag+ RBE+ FLReset+ TEE-IO-
DevCtl: CorrErr- NonFatalErr- FatalErr+ UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
AtomicOpsCtl: ReqEn-
IDOReq- IDOCompl- LTR- EmergencyPowerReductionReq-
10BitTagReq+ OBFF Disabled, EETLPPrefixBlk-
Capabilities: [80] MSI-X: Enable+ Count=9 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00003000
Capabilities: [90] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
ECRC- UnsupReq+ ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
UESvrt: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP+
ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CorrIntErr- HeaderOF-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CorrIntErr- HeaderOF-
AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [150 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Capabilities: [160 v1] Transaction Processing Hints
Device specific mode supported
Steering table in TPH capability structure
Capabilities: [170 v1] Virtual Channel
Caps: LPEVC=1 RefClk=100ns PATEntryBits=1
Arb: Fixed+ WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
Status: NegoPending- InProgress-
VC1: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=1 ArbSelect=Fixed TC/VC=02
Status: NegoPending- InProgress-
Capabilities: [200 v1] Designated Vendor-Specific: Vendor=8086 ID=0005 Rev=0 Len=24 <?>
Capabilities: [220 v1] Address Translation Service (ATS)
ATSCap: Invalidate Queue Depth: 00
ATSCtl: Enable+, Smallest Translation Unit: 00
Capabilities: [230 v1] Process Address Space ID (PASID)
PASIDCap: Exec- Priv+, Max PASID Width: 14
PASIDCtl: Enable+ Exec- Priv+
Capabilities: [240 v1] Page Request Interface (PRI)
PRICtl: Enable+ Reset-
PRISta: RF- UPRGI- Stopped+ PASID+
Page Request Capacity: 00000200, Page Request Allocation: 00000200
Kernel driver in use: idxd
Kernel modules: idxd
$ sudo dmesg | grep -Ei 'dsa|idxd'
[ 13.462756] idxd 0000:6a:01.0: enabling device (0144 -> 0146)
[ 13.476012] idxd 0000:6a:01.0: failed to attach device pasid 1, domain type 4
[ 13.476325] idxd 0000:6a:01.0: No in-kernel DMA with PASID. -22
[ 13.528386] idxd 0000:6a:01.0: Intel(R) Accelerator Device (v100)
[ 13.528513] idxd 0000:e7:01.0: enabling device (0144 -> 0146)
[ 13.542617] idxd 0000:e7:01.0: failed to attach device pasid 1, domain type 4
[ 13.543174] idxd 0000:e7:01.0: No in-kernel DMA with PASID. -22
[ 13.559001] idxd 0000:e7:01.0: Intel(R) Accelerator Device (v100)
[153908.260496] idxd dsa0: attribute deprecated, see max_read_buffers.
[153908.260631] idxd dsa0: attribute deprecated, see read_buffer_limit.
Hi asdasf,
Greetings for the day!
As checked, we could see that the processor is a tray processor. We request you to contact your Intel account representative or the place of purchase for further assistance on this query.
Thanks for your understanding.
Regards
Jerome
Intel Customer Support Technician
Hi asdasf,
Greetings for the day!
Meanwhile, we will check with our internal resources regarding the requested details and will provide an update once available.
We appreciate your understanding!
Best regards,
Poojitha N
Intel Customer Support Technician