
Accelerating Memory Bound AI Inference Workloads with Intel® Stratix® 10 MX Devices

Artificial intelligence (AI) systems are increasingly constrained by memory solutions that limit available memory bandwidth. For example, Recurrent Neural Networks (RNNs) used in AI applications such as finance, genome mapping, and speech AI – including Automatic Speech Recognition (ASR) and Natural Language Processing/Understanding (NLP/NLU) – share two common traits:

  • They are memory intensive

  • They require very low latency

Consequently, RNN applications can become memory bound when implemented with the wrong memory architecture. Intel® Stratix® 10 MX FPGAs – with integrated, in-package, 3D stacked HBM2 DRAM – provide 10X more memory bandwidth with better performance per watt compared to conventional memory solutions such as DDR SDRAMs1.

Manjeera Digital Systems has developed a Universal Multifunction Accelerator (UMA) IP that solves memory-bound bottlenecks for applications like RNNs. The Manjeera UMA is a scalable, programmable datapath processor that delivers the performance of a hardware datapath while retaining software-programmable flexibility. Manjeera’s UMA implemented in an Intel® FPGA like the Intel Stratix 10 MX FPGA is called a Programmable Inference Engine (PIE), which can accelerate a wide variety of deep neural network (DNN) workloads including RNNs.

When instantiated in an Intel Stratix 10 MX FPGA, the Manjeera PIE connects to all sixteen pseudo-channels of the HBM2 DRAM stack and partitions the stack's available memory into sixteen independent blocks, resulting in an aggregate data transfer rate of 170 GBps per HBM2 stack. (An Intel Stratix 10 MX FPGA incorporates one or two 3D stacked HBM2 memories.) This high data rate maximizes the PIE's use of the HBM2 stack's available bandwidth and delivers significantly more performance than external DDR SDRAM can provide. High memory bandwidth has proven to be a key factor in achieving low-latency RNN performance.
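To put the figures above in perspective, a quick back-of-the-envelope calculation shows the per-pseudo-channel bandwidth implied by the 170 GBps aggregate. This is only an illustrative sketch: the sixteen-channel count and the aggregate rate come from the text, while the assumption of an even split across pseudo-channels is ours.

```python
# Hedged sketch: per-pseudo-channel bandwidth implied by the numbers above.
# Assumption (not stated in the source): the 170 GBps aggregate is split
# evenly across the sixteen HBM2 pseudo-channels the PIE connects to.
PSEUDO_CHANNELS = 16          # pseudo-channels per HBM2 stack (from the text)
AGGREGATE_GBPS = 170.0        # aggregate transfer rate per stack (from the text)

per_channel_gbps = AGGREGATE_GBPS / PSEUDO_CHANNELS
print(f"~{per_channel_gbps:.3f} GBps per pseudo-channel")  # ~10.625 GBps
```

Because each of the sixteen memory blocks sits behind its own pseudo-channel, the PIE can issue sixteen independent access streams in parallel, which is how the aggregate rate is sustained rather than bottlenecked on a single shared channel.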

The PIE is integrated into the OpenVINO™ environment for direct import of TensorFlow models, and it also comes with a software stack for direct import of Keras models.

A new Intel White Paper titled “Accelerating Memory Bound AI Inference Workloads with Intel® Stratix® 10 MX Devices” provides additional technical details on this topic. To download this White Paper, click here to access the Intel FPGA Partner Solution page, scroll down to the Manjeera Digital Systems section, and click on the White Paper link.

Intel’s silicon and software portfolio empowers our customers’ intelligent services from the cloud to the edge.



Notices and Disclaimers

1 Tests measure performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.

Intel technologies may require enabled hardware, software or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation.  Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries.  Other names and brands may be claimed as the property of others.