Intel® Xeon® Processor and Server Products
Intel® Xeon® Processors, Data Center Products including boards, integrated systems, and RAID Storage

Compute in data streaming accelerator

asdasf
Beginner

Hi,

 

I am currently trying to use the Data Streaming Accelerator (DSA) to (1) read two inputs, (2) do XOR, and (3) store the outputs.

 

According to the “Intel® Data Streaming Accelerator Architecture Specification (v3.0)”, this kind of compute operation appears to be supported by the hardware.

 

However, when using the Intel Data Movement Library (DML) or when using DSA directly via idxd.h, I cannot find any interface for XOR operations -- only memory copy, memory fill, ...

 

Are the compute operations described in the DSA architecture spec currently unavailable in software stacks?

If they are available, is there any guideline, header, or example for using them?

 

Thanks.

 

[Attached screenshot: Screenshot 2026-01-17 at 1.58.41 PM.png]

 

// idxd.h

/* Opcode */
enum dsa_opcode {
	DSA_OPCODE_NOOP = 0,
	DSA_OPCODE_BATCH,
	DSA_OPCODE_DRAIN,
	DSA_OPCODE_MEMMOVE,
	DSA_OPCODE_MEMFILL,
	DSA_OPCODE_COMPARE,
	DSA_OPCODE_COMPVAL,
	DSA_OPCODE_CR_DELTA,
	DSA_OPCODE_AP_DELTA,
	DSA_OPCODE_DUALCAST,
	DSA_OPCODE_TRANSL_FETCH,
	DSA_OPCODE_CRCGEN = 0x10,
	DSA_OPCODE_COPY_CRC,
	DSA_OPCODE_DIF_CHECK,
	DSA_OPCODE_DIF_INS,
	DSA_OPCODE_DIF_STRP,
	DSA_OPCODE_DIF_UPDT,
	DSA_OPCODE_DIX_GEN = 0x17,
	DSA_OPCODE_CFLUSH = 0x20,
};

 

Poojitha
Employee

Hi asdasf,


Greetings for the day!


Thank you for reaching out to Intel Support. We acknowledge receipt of your concern and would like to assure you that assisting you is our top priority.


To assist you further, we require some additional information from your end.


Please provide the system details and the processor model you are asking about.


This will help us review the complete details and assist you further.


We appreciate your understanding!


Best regards,

Poojitha N

Intel Customer Support Technician


asdasf
Beginner

Thank you for your prompt response.

 

Regarding your request, please find the system details below:

 

Product / Platform

  • Intel Xeon Platinum 8558 (2× sockets)

OS / Kernel / Drivers

  • OS: Ubuntu 25.04 LTS

  • Kernel: 6.14.0-1007-intel

  • DSA driver: idxd v1.0

  • Library: Using the linux/idxd.h UAPI headers for descriptor submission

  • accel-config version: accel-config 4.1.8+

Issue summary

We are attempting to submit a descriptor for a Reduce/XOR operation through the user-space write(fd, &desc, …) submission path using /dev/dsa/wq*. The device returns completion status 0x10 (DSA_COMP_BAD_OPCODE). The same descriptor pipeline works for DSA_OPCODE_MEMMOVE.

This raises the question of whether Reduce/XOR opcodes are currently supported on this CPU/driver combination or require a newer DSA specification / microcode / driver stack.

If you need additional traces, PCIe capability dumps, DSACAP registers, or accel-config dumps (accel-config list -i), I will gladly provide them.
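
To make the failure mode above concrete, here is a minimal sketch of how the status byte of a completion record could be classified. The constants are mirrored locally for illustration only: 0x10 is the DSA_COMP_BAD_OPCODE value quoted above, while the success value (0x01) and the 0x7f status mask are assumptions based on linux/idxd.h and should be taken from that header in real code. The names classify_status and the SKETCH_* identifiers are hypothetical.

```c
#include <stdint.h>

/* Local mirrors for illustration; real code should use linux/idxd.h. */
#define SKETCH_COMP_STATUS_MASK 0x7f /* assumed: low 7 bits carry the status */
#define SKETCH_COMP_NONE        0x00 /* record not written yet */
#define SKETCH_COMP_SUCCESS     0x01 /* assumed value of DSA_COMP_SUCCESS */
#define SKETCH_COMP_BAD_OPCODE  0x10 /* value reported in this thread */

enum sketch_outcome {
    SKETCH_PENDING,        /* device has not completed the descriptor */
    SKETCH_OK,             /* operation completed successfully */
    SKETCH_UNSUPPORTED_OP, /* device rejected the opcode */
    SKETCH_OTHER_ERROR     /* any other status code */
};

/* Classify the raw status byte read from a completion record. */
static enum sketch_outcome classify_status(uint8_t raw)
{
    switch (raw & SKETCH_COMP_STATUS_MASK) {
    case SKETCH_COMP_NONE:
        return SKETCH_PENDING;
    case SKETCH_COMP_SUCCESS:
        return SKETCH_OK;
    case SKETCH_COMP_BAD_OPCODE:
        return SKETCH_UNSUPPORTED_OP;
    default:
        return SKETCH_OTHER_ERROR;
    }
}
```

With this helper, the 0x10 status seen for the Reduce/XOR descriptor maps to SKETCH_UNSUPPORTED_OP, while the working MEMMOVE path would be expected to map to SKETCH_OK.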

 

Thank you again for your assistance, and please let me know if further details are required.

 

Best regards,

Juntaek

pujeeth
Employee

Hello asdasf,


Thank you for providing the detailed issue summary regarding the DSA Reduce/XOR descriptor submission failure.


To proceed with our analysis, could you please share the following details from the affected system:


1) DSA capability registers

2) Full DSA configuration and work queue information

3) PCIe capability and device information for the DSA device

4) Kernel log messages related to DSA initialization


Regards

Pujeeth_Intel



asdasf
Beginner

Here are the results of several commands that may contain the details you are looking for. Please let me know if you need any additional information.

$ sudo accel-config list -i
[
  {
    "dev":"dsa0",
    "read_buffer_limit":0,
    "max_groups":4,
    "max_work_queues":8,
    "max_engines":4,
    "work_queue_size":128,
    "numa_node":0,
    "op_cap":"00000000,00000000,00000000,00000000,00000000,00000000,00000001,003f027d",
    "gen_cap":"0x40915f0107",
    "version":"0x100",
    "state":"enabled",
    "max_read_buffers":96,
    "max_batch_size":1024,
    "configurable":1,
    "pasid_enabled":1,
    "cdev_major":509,
    "clients":0,
    "groups":[
      {
        "dev":"group0.0",
        "read_buffers_reserved":0,
        "use_read_buffer_limit":0,
        "read_buffers_allowed":96,
        "grouped_workqueues":[
          {
            "dev":"wq0.0",
            "mode":"dedicated",
            "size":64,
            "group_id":0,
            "priority":1,
            "block_on_fault":0,
            "max_batch_size":32,
            "max_transfer_size":2097152,
            "cdev_minor":0,
            "type":"user",
            "name":"swq",
            "driver_name":"user",
            "threshold":0,
            "ats_disable":0,
            "state":"enabled",
            "clients":0
          }
        ],
        "grouped_engines":[
          {
            "dev":"engine0.0",
            "group_id":0
          },
          {
            "dev":"engine0.1",
            "group_id":0
          },
          {
            "dev":"engine0.2",
            "group_id":0
          },
          {
            "dev":"engine0.3",
            "group_id":0
          }
        ]
      },
      {
        "dev":"group0.1",
        "read_buffers_reserved":0,
        "use_read_buffer_limit":0,
        "read_buffers_allowed":96
      },
      {
        "dev":"group0.2",
        "read_buffers_reserved":0,
        "use_read_buffer_limit":0,
        "read_buffers_allowed":96
      },
      {
        "dev":"group0.3",
        "read_buffers_reserved":0,
        "use_read_buffer_limit":0,
        "read_buffers_allowed":96
      }
    ],
    "ungrouped workqueues":[
      {
        "dev":"wq0.1",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      },
      {
        "dev":"wq0.2",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      },
      {
        "dev":"wq0.3",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      },
      {
        "dev":"wq0.4",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      },
      {
        "dev":"wq0.5",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      },
      {
        "dev":"wq0.6",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      },
      {
        "dev":"wq0.7",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      }
    ]
  },
  {
    "dev":"dsa1",
    "read_buffer_limit":0,
    "max_groups":4,
    "max_work_queues":8,
    "max_engines":4,
    "work_queue_size":128,
    "numa_node":2,
    "op_cap":"00000000,00000000,00000000,00000000,00000000,00000000,00000001,003f027d",
    "gen_cap":"0x40915f0107",
    "version":"0x100",
    "state":"disabled",
    "max_read_buffers":96,
    "max_batch_size":1024,
    "configurable":1,
    "pasid_enabled":1,
    "cdev_major":509,
    "clients":0,
    "groups":[
      {
        "dev":"group1.0",
        "read_buffers_reserved":0,
        "use_read_buffer_limit":0,
        "read_buffers_allowed":96
      },
      {
        "dev":"group1.1",
        "read_buffers_reserved":0,
        "use_read_buffer_limit":0,
        "read_buffers_allowed":96
      },
      {
        "dev":"group1.2",
        "read_buffers_reserved":0,
        "use_read_buffer_limit":0,
        "read_buffers_allowed":96
      },
      {
        "dev":"group1.3",
        "read_buffers_reserved":0,
        "use_read_buffer_limit":0,
        "read_buffers_allowed":96
      }
    ],
    "ungrouped workqueues":[
      {
        "dev":"wq1.0",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      },
      {
        "dev":"wq1.1",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      },
      {
        "dev":"wq1.2",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      },
      {
        "dev":"wq1.3",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      },
      {
        "dev":"wq1.4",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      },
      {
        "dev":"wq1.5",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      },
      {
        "dev":"wq1.6",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      },
      {
        "dev":"wq1.7",
        "mode":"shared",
        "size":0,
        "priority":0,
        "block_on_fault":0,
        "max_batch_size":32,
        "max_transfer_size":2097152,
        "type":"none",
        "name":"",
        "driver_name":"",
        "threshold":0,
        "ats_disable":0,
        "state":"disabled",
        "clients":0
      }
    ],
    "ungrouped_engines":[
      {
        "dev":"engine1.0"
      },
      {
        "dev":"engine1.1"
      },
      {
        "dev":"engine1.2"
      },
      {
        "dev":"engine1.3"
      }
    ]
  }
]
$ lspci -nn | grep -Ei 'data streaming|I/O Accel|idxd|0b25'
6a:01.0 System peripheral [0880]: Intel Corporation Device [8086:0b25]
e7:01.0 System peripheral [0880]: Intel Corporation Device [8086:0b25]
$ sudo lspci -vvv -s 6a:01.0
6a:01.0 System peripheral: Intel Corporation Device 0b25
        Subsystem: Intel Corporation Device 0000
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        NUMA node: 0
        IOMMU group: 4
        Region 0: Memory at afffff20000 (64-bit, prefetchable) [size=64K]
        Region 2: Memory at afffff00000 (64-bit, prefetchable) [size=128K]
        Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, IntMsgNum 0
                DevCap: MaxPayload 128 bytes, PhantFunc 0
                        ExtTag+ RBE+ FLReset+ TEE-IO-
                DevCtl: CorrErr- NonFatalErr- FatalErr+ UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR+
                         10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
                         AtomicOpsCtl: ReqEn-
                         IDOReq- IDOCompl- LTR- EmergencyPowerReductionReq-
                         10BitTagReq+ OBFF Disabled, EETLPPrefixBlk-
        Capabilities: [80] MSI-X: Enable+ Count=9 Masked-
                Vector table: BAR=0 offset=00002000
                PBA: BAR=0 offset=00003000
        Capabilities: [90] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
                        ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
                        PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
                        ECRC- UnsupReq+ ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
                        PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
                UESvrt: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP+
                        ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
                        PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CorrIntErr- HeaderOF-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CorrIntErr- HeaderOF-
                AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [150 v1] Latency Tolerance Reporting
                Max snoop latency: 0ns
                Max no snoop latency: 0ns
        Capabilities: [160 v1] Transaction Processing Hints
                Device specific mode supported
                Steering table in TPH capability structure
        Capabilities: [170 v1] Virtual Channel
                Caps:   LPEVC=1 RefClk=100ns PATEntryBits=1
                Arb:    Fixed+ WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
                VC1:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=1 ArbSelect=Fixed TC/VC=02
                        Status: NegoPending- InProgress-
        Capabilities: [200 v1] Designated Vendor-Specific: Vendor=8086 ID=0005 Rev=0 Len=24 <?>
        Capabilities: [220 v1] Address Translation Service (ATS)
                ATSCap: Invalidate Queue Depth: 00
                ATSCtl: Enable+, Smallest Translation Unit: 00
        Capabilities: [230 v1] Process Address Space ID (PASID)
                PASIDCap: Exec- Priv+, Max PASID Width: 14
                PASIDCtl: Enable+ Exec- Priv+
        Capabilities: [240 v1] Page Request Interface (PRI)
                PRICtl: Enable+ Reset-
                PRISta: RF- UPRGI- Stopped+ PASID+
                Page Request Capacity: 00000200, Page Request Allocation: 00000200
        Kernel driver in use: idxd
        Kernel modules: idxd

$ sudo dmesg | grep -Ei 'dsa|idxd'
[   13.462756] idxd 0000:6a:01.0: enabling device (0144 -> 0146)
[   13.476012] idxd 0000:6a:01.0: failed to attach device pasid 1, domain type 4
[   13.476325] idxd 0000:6a:01.0: No in-kernel DMA with PASID. -22
[   13.528386] idxd 0000:6a:01.0: Intel(R) Accelerator Device (v100)
[   13.528513] idxd 0000:e7:01.0: enabling device (0144 -> 0146)
[   13.542617] idxd 0000:e7:01.0: failed to attach device pasid 1, domain type 4
[   13.543174] idxd 0000:e7:01.0: No in-kernel DMA with PASID. -22
[   13.559001] idxd 0000:e7:01.0: Intel(R) Accelerator Device (v100)
[153908.260496] idxd dsa0: attribute deprecated, see max_read_buffers.
[153908.260631] idxd dsa0: attribute deprecated, see read_buffer_limit.

 

Steve_Jerome22
Employee

Hi asdasf,


Greetings for the day!


We checked and found that the processor is a tray processor. Please contact your Intel account representative or the place of purchase for further assistance with this query.


Thank you for your understanding.


Regards

Jerome

Intel Customer Support Technician


Poojitha
Employee

Hi asdasf,


Greetings for the day!


Meanwhile, we will check with our internal resources regarding the requested details and will provide an update once available.


We appreciate your understanding!


Best regards,

Poojitha N

Intel Customer Support Technician

