Intel® Xeon® Processor and Server Products
Intel® Xeon® Processors, Data Center Products including boards, integrated systems, and RAID Storage

Compute in data streaming accelerator

asdasf
Beginner
487 Views

Hi,

 

I am trying to use the Data Streaming Accelerator (DSA) to (1) read two inputs, (2) XOR them, and (3) store the result.

 

According to the “Intel® Data Streaming Accelerator Architecture Specification (v3.0)”, this kind of compute operation appears to be supported by the hardware.

 

However, whether I use the Intel Data Movement Library (DML) or drive DSA directly via idxd.h, I cannot find any interface for XOR operations, only memory copy, memory fill, and so on.

 

Are the compute operations described in the DSA architecture spec currently unavailable in software stacks?

If they are available, is there any guideline, header, or example for using them?

 

Thanks.

 

[Attached image: Screenshot 2026-01-17 at 1.58.41 PM.png]

 

// idxd.h

/* Opcode */
enum dsa_opcode {
    DSA_OPCODE_NOOP = 0,
    DSA_OPCODE_BATCH,
    DSA_OPCODE_DRAIN,
    DSA_OPCODE_MEMMOVE,
    DSA_OPCODE_MEMFILL,
    DSA_OPCODE_COMPARE,
    DSA_OPCODE_COMPVAL,
    DSA_OPCODE_CR_DELTA,
    DSA_OPCODE_AP_DELTA,
    DSA_OPCODE_DUALCAST,
    DSA_OPCODE_TRANSL_FETCH,
    DSA_OPCODE_CRCGEN = 0x10,
    DSA_OPCODE_COPY_CRC,
    DSA_OPCODE_DIF_CHECK,
    DSA_OPCODE_DIF_INS,
    DSA_OPCODE_DIF_STRP,
    DSA_OPCODE_DIF_UPDT,
    DSA_OPCODE_DIX_GEN = 0x17,
    DSA_OPCODE_CFLUSH = 0x20,
};

 

7 Replies
Poojitha
Employee
443 Views

Hi asdasf,


Greetings for the day!


Thank you for reaching out to Intel Support. We acknowledge receipt of your concern and would like to assure you that assisting you is our top priority.


To assist you further, we require some additional information from your end.


Kindly provide the system details and the processor model for which you are seeking the necessary information.


This will help us review the complete details and assist you further.


We appreciate your understanding!


Best regards,

Poojitha N

Intel Customer Support Technician


asdasf
Beginner
393 Views

Thank you for your prompt response.

 

Regarding your request, please find the system details below:

 

Product / Platform

  • Intel Xeon Platinum 8558 (2× sockets)

OS / Kernel / Drivers

  • OS: Ubuntu 25.04 LTS

  • Kernel: 6.14.0-1007-intel

  • DSA driver: idxd v1.0

  • Library: Using the linux/idxd.h UAPI headers for descriptor submission

  • accel-config version: accel-config 4.1.8+

Issue summary

We are attempting to submit a descriptor for a Reduce/XOR operation through the user-space write(fd, &desc, …) submission path using /dev/dsa/wq*. The device returns completion status 0x10 (DSA_COMP_BAD_OPCODE). The same descriptor pipeline works for DSA_OPCODE_MEMMOVE.

This raises the question of whether Reduce/XOR opcodes are currently supported on this CPU/driver combination or require a newer DSA specification / microcode / driver stack.
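
One way to sanity-check this from user space is to test the opcode's bit in the op_cap string that accel-config list -i reports before submitting a descriptor. The helper below is only a minimal sketch and is not part of idxd.h or accel-config: the name opcap_has_opcode is made up for illustration, and it assumes op_cap is printed as comma-separated 32-bit hex words with the least-significant word last, where bit N corresponds to opcode N.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return 1 if bit `opcode` is set in an op_cap string such as
 * "00000000,...,00000001,003f027d", 0 if it is clear, and -1 if the
 * relevant group is malformed.
 * Assumption: groups are 32-bit hex words, least-significant word last,
 * and the string has no trailing newline (strip it if read from a file).
 */
static int opcap_has_opcode(const char *op_cap, unsigned opcode)
{
    unsigned word_from_end = opcode / 32; /* group index, counted from the right */
    unsigned bit = opcode % 32;           /* bit position inside that group */
    const char *end = op_cap + strlen(op_cap);

    for (;;) {
        /* Find the start of the group that ends at `end`. */
        const char *start = end;
        while (start > op_cap && start[-1] != ',')
            start--;

        if (word_from_end == 0) {
            char buf[9];
            char *stop;
            unsigned long word;
            size_t len = (size_t)(end - start);

            if (len == 0 || len > 8)
                return -1;
            memcpy(buf, start, len);
            buf[len] = '\0';
            word = strtoul(buf, &stop, 16);
            if (*stop != '\0')
                return -1;
            return (int)((word >> bit) & 1u);
        }

        if (start == op_cap)
            return 0;        /* fewer groups than needed: bit not present */
        end = start - 1;     /* step over the comma to the previous group */
        word_from_end--;
    }
}
```

For example, opcap_has_opcode(op_cap, 0x03) tests the DSA_OPCODE_MEMMOVE bit. A return value of 0 for the opcode being submitted would be consistent with a DSA_COMP_BAD_OPCODE completion.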

If you need additional traces, PCIe capability dumps, DSACAP registers, or accel-config dumps (accel-config list -i), I will gladly provide them.

 

Thank you again for your assistance, and please let me know if further details are required.

 

Best regards,

Juntaek

pujeeth
Employee
378 Views

Hello asdasf,


Thank you for providing the detailed issue summary regarding the DSA Reduce/XOR descriptor submission failure.


To proceed with our analysis, could you please share the following details from the affected system:


1) DSA capability registers

2) Full DSA configuration and work queue information:

3) PCIe capability and device information for the DSA device:

4) Kernel log messages related to DSA initialization:


Regards

Pujeeth_Intel



asdasf
Beginner
321 Views

Here are the results of several commands that may contain the details you are looking for. Please let me know if you need any additional information.

$ sudo accel-config list -i
[
  {
    "dev":"dsa0",
    "read_buffer_limit":0,
    "max_groups":4,
    "max_work_queues":8,
    "max_engines":4,
    "work_queue_size":128,
    "numa_node":0,
    "op_cap":"00000000,00000000,00000000,00000000,00000000,00000000,00000001,003f027d",
    "gen_cap":"0x40915f0107",
    "version":"0x100",
    "state":"enabled",
    "max_read_buffers":96,
    "max_batch_size":1024,
    "configurable":1,
    "pasid_enabled":1,
    "cdev_major":509,
    "clients":0,
    "groups":[
      {
        "dev":"group0.0",
        "read_buffers_reserved":0,
        "use_read_buffer_limit":0,
        "read_buffers_allowed":96,
        "grouped_workqueues":[
          {
            "dev":"wq0.0",
            "mode":"dedicated",
            "size":64,
            "group_id":0,
            "priority":1,
            "block_on_fault":0,
            "max_batch_size":32,
            "max_transfer_size":2097152,
            "cdev_minor":0,
            "type":"user",
            "name":"swq",
            "driver_name":"user",
            "threshold":0,
            "ats_disable":0,
            "state":"enabled",
            "clients":0
          }
        ],
        "grouped_engines":[
          {
            "dev":"engine0.0",
            "group_id":0
          },
          {
            "dev":"engine0.1",
            "group_id":0
          },
          {
            "dev":"engine0.2",
            "group_id":0
          },
          {
            "dev":"engine0.3",
            "group_id":0
          }
        ]
      },
      {
        "dev":"group0.1",
        "read_buffers_reserved":0,
        "use_read_buffer_limit":0,
        "read_buffers_allowed":96
      },
      {
        "dev":"group0.2",
        "read_buffers_reserved":0,
        "use_read_buffer_limit":0,
        "read_buffers_allowed":96
      },
      {
        "dev":"group0.3",
        "read_buffers_reserved":0,
        "use_read_buffer_limit":0,
        "read_buffers_allowed":96
      }
    ],
    "ungrouped workqueues":[
      {
        "dev":"wq0.1",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      },
      {
        "dev":"wq0.2",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      },
      {
        "dev":"wq0.3",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      },
      {
        "dev":"wq0.4",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      },
      {
        "dev":"wq0.5",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      },
      {
        "dev":"wq0.6",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      },
      {
        "dev":"wq0.7",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      }
    ]
  },
  {
    "dev":"dsa1",
    "read_buffer_limit":0,
    "max_groups":4,
    "max_work_queues":8,
    "max_engines":4,
    "work_queue_size":128,
    "numa_node":2,
    "op_cap":"00000000,00000000,00000000,00000000,00000000,00000000,00000001,003f027d",
    "gen_cap":"0x40915f0107",
    "version":"0x100",
    "state":"disabled",
    "max_read_buffers":96,
    "max_batch_size":1024,
    "configurable":1,
    "pasid_enabled":1,
    "cdev_major":509,
    "clients":0,
    "groups":[
      {
        "dev":"group1.0",
        "read_buffers_reserved":0,
        "use_read_buffer_limit":0,
        "read_buffers_allowed":96
      },
      {
        "dev":"group1.1",
        "read_buffers_reserved":0,
        "use_read_buffer_limit":0,
        "read_buffers_allowed":96
      },
      {
        "dev":"group1.2",
        "read_buffers_reserved":0,
        "use_read_buffer_limit":0,
        "read_buffers_allowed":96
      },
      {
        "dev":"group1.3",
        "read_buffers_reserved":0,
        "use_read_buffer_limit":0,
        "read_buffers_allowed":96
      }
    ],
    "ungrouped workqueues":[
      {
        "dev":"wq1.0",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      },
      {
        "dev":"wq1.1",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      },
      {
        "dev":"wq1.2",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      },
      {
        "dev":"wq1.3",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      },
      {
        "dev":"wq1.4",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      },
      {
        "dev":"wq1.5",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      },
      {
        "dev":"wq1.6",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      },
      {
        "dev":"wq1.7",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      }
    ],
    "ungrouped_engines":[
      {
        "dev":"engine1.0"
      },
      {
        "dev":"engine1.1"
      },
      {
        "dev":"engine1.2"
      },
      {
        "dev":"engine1.3"
      }
    ]
  }
]
$ lspci -nn | grep -Ei 'data streaming|I/O Accel|idxd|0b25'
6a:01.0 System peripheral [0880]: Intel Corporation Device [8086:0b25]
e7:01.0 System peripheral [0880]: Intel Corporation Device [8086:0b25]
$ sudo lspci -vvv -s 6a:01.0
6a:01.0 System peripheral: Intel Corporation Device 0b25
        Subsystem: Intel Corporation Device 0000
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        NUMA node: 0
        IOMMU group: 4
        Region 0: Memory at afffff20000 (64-bit, prefetchable) [size=64K]
        Region 2: Memory at afffff00000 (64-bit, prefetchable) [size=128K]
        Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, IntMsgNum 0
                DevCap: MaxPayload 128 bytes, PhantFunc 0
                        ExtTag+ RBE+ FLReset+ TEE-IO-
                DevCtl: CorrErr- NonFatalErr- FatalErr+ UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR+
                         10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
                         AtomicOpsCtl: ReqEn-
                         IDOReq- IDOCompl- LTR- EmergencyPowerReductionReq-
                         10BitTagReq+ OBFF Disabled, EETLPPrefixBlk-
        Capabilities: [80] MSI-X: Enable+ Count=9 Masked-
                Vector table: BAR=0 offset=00002000
                PBA: BAR=0 offset=00003000
        Capabilities: [90] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
                        ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
                        PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
                        ECRC- UnsupReq+ ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
                        PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
                UESvrt: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP+
                        ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
                        PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CorrIntErr- HeaderOF-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CorrIntErr- HeaderOF-
                AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [150 v1] Latency Tolerance Reporting
                Max snoop latency: 0ns
                Max no snoop latency: 0ns
        Capabilities: [160 v1] Transaction Processing Hints
                Device specific mode supported
                Steering table in TPH capability structure
        Capabilities: [170 v1] Virtual Channel
                Caps:   LPEVC=1 RefClk=100ns PATEntryBits=1
                Arb:    Fixed+ WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
                VC1:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=1 ArbSelect=Fixed TC/VC=02
                        Status: NegoPending- InProgress-
        Capabilities: [200 v1] Designated Vendor-Specific: Vendor=8086 ID=0005 Rev=0 Len=24 <?>
        Capabilities: [220 v1] Address Translation Service (ATS)
                ATSCap: Invalidate Queue Depth: 00
                ATSCtl: Enable+, Smallest Translation Unit: 00
        Capabilities: [230 v1] Process Address Space ID (PASID)
                PASIDCap: Exec- Priv+, Max PASID Width: 14
                PASIDCtl: Enable+ Exec- Priv+
        Capabilities: [240 v1] Page Request Interface (PRI)
                PRICtl: Enable+ Reset-
                PRISta: RF- UPRGI- Stopped+ PASID+
                Page Request Capacity: 00000200, Page Request Allocation: 00000200
        Kernel driver in use: idxd
        Kernel modules: idxd

$ sudo dmesg | grep -Ei 'dsa|idxd'
[   13.462756] idxd 0000:6a:01.0: enabling device (0144 -> 0146)
[   13.476012] idxd 0000:6a:01.0: failed to attach device pasid 1, domain type 4
[   13.476325] idxd 0000:6a:01.0: No in-kernel DMA with PASID. -22
[   13.528386] idxd 0000:6a:01.0: Intel(R) Accelerator Device (v100)
[   13.528513] idxd 0000:e7:01.0: enabling device (0144 -> 0146)
[   13.542617] idxd 0000:e7:01.0: failed to attach device pasid 1, domain type 4
[   13.543174] idxd 0000:e7:01.0: No in-kernel DMA with PASID. -22
[   13.559001] idxd 0000:e7:01.0: Intel(R) Accelerator Device (v100)
[153908.260496] idxd dsa0: attribute deprecated, see max_read_buffers.
[153908.260631] idxd dsa0: attribute deprecated, see read_buffer_limit.

 

Steve_Jerome22
Employee
273 Views

Hi asdasf,


Greetings for the day!


Our check shows that this is a tray processor. Please contact your Intel account representative or the place of purchase for further assistance with this query.


Thanks for your understanding


Regards

Jerome

Intel Customer Support Technician


Poojitha
Employee
171 Views

Hi asdasf,


Greetings for the day!


Meanwhile, we will check with our internal resources regarding the requested details and will provide an update once available.


We appreciate your understanding!


Best regards,

Poojitha N

Intel Customer Support Technician


Subhashish
Employee
144 Views

Hello asdasf,

 

After carefully reviewing the ongoing issue, we would like to share our findings.

 

The Intel® Data Streaming Accelerator (DSA) primarily supports data movement and transformation operations such as memory copy, fill, compare, CRC, DIF, delta, and flush. However, neither the Intel Data Movement Library (DML) nor the idxd.h header currently lists XOR or similar compute operations as supported.

 

If XOR operations are described in the architecture specification but are not available in the software stack, this may mean that they are either not implemented in the current software or require specific configurations or updates. The Intel® DSA Architecture Specification and User Guide provide further details and guidelines.

For further clarification or updates on the availability of XOR operations, you may need to consult development forums.
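
Until the hardware path is clarified, the operation from the original question (read two inputs, XOR them, store the output) can be written as a plain CPU loop, which is also handy later for validating accelerator results. This is only a reference sketch that runs on the CPU and does not use DSA in any way; xor_buffers is an illustrative name, not an Intel API.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * CPU reference for the requested operation: dst[i] = a[i] ^ b[i]
 * for n bytes. Not DSA code; dst may alias a or b.
 */
static void xor_buffers(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(a[i] ^ b[i]);
}
```

A byte-wise loop keeps the sketch free of alignment assumptions; compilers typically vectorize it at higher optimization levels.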

 

Also, based on the provided logs, here is a basic analysis:

 

Intel DSA Configuration and Capabilities

1. Device Configuration:

  • The logs show two DSA devices (dsa0 and dsa1) with configurations for work queues, engines, and groups.
  • dsa0 is enabled, while dsa1 is disabled. This indicates that only one device is actively configured for operations.

2. Work Queue Details:

  • dsa0 has one dedicated work queue (wq0.0) enabled, with a size of 64 and a maximum transfer size of 2 MB. This work queue is configured for user mode operations.
  • Other work queues are in shared mode but are disabled, which limits the operational capacity of the device.

3. PASID and Virtualization:

  • PASID (Process Address Space ID) is enabled for dsa0, which supports virtualization and user-level Shared Virtual Memory (SVM). However, the logs indicate issues with PASID attachment for in-kernel DMA operations (No in-kernel DMA with PASID).

4. Operational Capabilities:

  • The op_cap field indicates the supported operations, including memory move, fill, compare, and transformation tasks such as CRC generation and DIF. Every bit set in op_cap corresponds to an opcode already listed in idxd.h, so the device does not advertise an XOR operation.
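
To make the op_cap reading concrete, here is a small sketch that maps set bits to the opcode names from the idxd.h enum quoted in the original post. It is illustrative only: describe_opcap and the name table are hypothetical helpers, it looks only at the low 64 bits (the upper words are all zero in the accel-config output above), and it assumes bit N of op_cap corresponds to opcode N.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Opcode values as listed in the idxd.h enum quoted in the original post. */
struct dsa_op_name { unsigned opcode; const char *name; };

static const struct dsa_op_name dsa_op_names[] = {
    { 0x00, "NOOP" },     { 0x01, "BATCH" },     { 0x02, "DRAIN" },
    { 0x03, "MEMMOVE" },  { 0x04, "MEMFILL" },   { 0x05, "COMPARE" },
    { 0x06, "COMPVAL" },  { 0x07, "CR_DELTA" },  { 0x08, "AP_DELTA" },
    { 0x09, "DUALCAST" }, { 0x0a, "TRANSL_FETCH" },
    { 0x10, "CRCGEN" },   { 0x11, "COPY_CRC" },  { 0x12, "DIF_CHECK" },
    { 0x13, "DIF_INS" },  { 0x14, "DIF_STRP" },  { 0x15, "DIF_UPDT" },
    { 0x17, "DIX_GEN" },  { 0x20, "CFLUSH" },
};

/*
 * Write a space-separated list of the opcode names whose bits are set in
 * `low64` (the low 64 bits of op_cap) into `out`; out_len must be >= 1.
 * Returns the number of set bits that have no name in the table above.
 */
static unsigned describe_opcap(uint64_t low64, char *out, size_t out_len)
{
    unsigned unnamed = 0;

    out[0] = '\0';
    for (unsigned bit = 0; bit < 64; bit++) {
        const char *name = NULL;
        size_t used;

        if (!((low64 >> bit) & 1u))
            continue;
        for (size_t i = 0; i < sizeof dsa_op_names / sizeof dsa_op_names[0]; i++)
            if (dsa_op_names[i].opcode == bit)
                name = dsa_op_names[i].name;
        if (name == NULL) {
            unnamed++;      /* advertised bit that the enum does not name */
            continue;
        }
        used = strlen(out);
        snprintf(out + used, out_len - used, "%s%s", used ? " " : "", name);
    }
    return unnamed;
}
```

For the posted value (low 64 bits 0x00000001003f027d) this lists NOOP, DRAIN, MEMMOVE, MEMFILL, COMPARE, COMPVAL, DUALCAST, CRCGEN, COPY_CRC, DIF_CHECK, DIF_INS, DIF_STRP, DIF_UPDT and CFLUSH, and returns 0, meaning every advertised bit maps to an opcode that is already in the enum.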

 

Error Analysis

1. PASID Attachment Issues:

The error "failed to attach device pasid 1, domain type 4" suggests that the kernel driver is unable to attach a PASID for in-kernel DMA operations. This could be due to hardware or software limitations in the current setup.

2. Deprecated Attributes:

The logs report that deprecated attributes were accessed on dsa0 and name max_read_buffers and read_buffer_limit as their replacements. This suggests that the configuration or tooling in use should be updated to the newer attribute names.

3. PCIe Error Handling:

The lspci output includes the device's PCIe Advanced Error Reporting registers. In the dump provided, no uncorrectable (UESta) or correctable (CESta) error status bits are set.

 

Recommendations

1. Driver and Firmware Updates:

Update the DSA driver and firmware to address PASID attachment issues and deprecated attributes.

2. Configuration Review:

Enable additional work queues and optimize their configurations for specific workloads. Ensure that the device is properly configured for virtualization and SVM.

3. Error Mitigation:

Investigate PCIe error handling and ensure that the kernel is configured to handle DSA-related errors effectively.

4. Documentation Reference:

Refer to the Intel® Data Streaming Accelerator Architecture Specification and User Guide for detailed configuration and optimization guidelines.

 

 

Please note that this analysis is shared on a best-effort basis by our team (Xeon Hardware break fix team), as this issue is more closely related to software/OS configuration.

 

We hope this information helps. If there is anything else we can assist you with, please feel free to write back to us.

 

Happy Troubleshooting!

Regards,

Subhashish_Intel.

 
