
Addressing the Greatest Memory and Compute Challenges with Intel® Agilex™ M-Series FPGAs


The way the world uses the Internet is undergoing a paradigm shift: from centralized clusters of compute and data storage to a more distributed architecture that processes data everywhere, in the cloud, at the edge, and at all points in between. This evolving cloud-to-edge infrastructure model combines unprecedented scale and compute capacity in the cloud with exponential increases in network bandwidth. This infrastructure model extends all the way to the edge, where keeping data close at hand allows better decisions to be made faster.

Intel believes this new model is a technology superpower that’s shaping the world’s digital transformation and spurring a digital renaissance, which is why we are focused on introducing an array of new technologies that enable ubiquitous compute, the cloud-to-edge infrastructure, pervasive connectivity, and AI. This global paradigm shift is driving explosive growth in semiconductor use around the world and represents a huge opportunity for both Intel and the entire technology industry. Intel’s highly flexible and programmable logic portfolio plays an essential role in accelerating these technology inflections.

We are enabling our customers to achieve higher performance with better power efficiency across all end markets and applications with our flagship Intel® Agilex™ FPGA family. We continue to build on that momentum with the Intel® Agilex™ M-Series family variant. Intel Agilex M-Series FPGAs incorporate several new features that, together, significantly boost absolute performance and performance per watt along several dimensions that are critical to the development of ever more powerful and more efficient systems. These features include:

  • The industry’s highest memory bandwidth for an FPGA [1]
  • The industry’s highest DSP compute density in an HBM-enabled FPGA [2]
  • Greater than 2X fabric performance per watt vs competitive 7 nm FPGAs [3]

The capabilities of the new Intel Agilex M-Series FPGAs provide the industry with the high-speed networking, computing, and storage acceleration required to meet ever more ambitious performance and capability goals for network, cloud, and embedded edge applications.


“M” is for Memory

The “M” in “M-Series” means “memory.” More and faster memory is certainly one prominent and important benefit incorporated into Intel Agilex M-Series FPGAs. Almost without exception, advanced applications require a memory hierarchy that ranges from fast, to faster, to fastest while allowing design teams to trade off memory bandwidth and latency versus memory capacity. Intel Agilex M-Series FPGAs feature a wide and flexible memory hierarchy, which encompasses ultra-low latency, ultra-high bandwidth, on-chip SRAM; higher-capacity, high-bandwidth, in-package memory in the form of HBM2e (High-Bandwidth Memory) DRAM stacks; support for fast, high-capacity external synchronous DRAM (SDRAM) including DDR4, DDR5, and LPDDR5; and ultra-high-capacity, non-volatile Intel® Optane™ persistent memory.

All Intel Agilex FPGAs, including members of the M-Series, include fast, on-chip SRAM in the form of MLAB and M20K blocks. These SRAMs are integrated into the FPGA’s programmable-logic fabric and are therefore located immediately adjacent to the logic that will exchange data with these memories. Some Intel Agilex M-Series FPGAs also incorporate in-package HBM in the form of HBM2e memory stacks, managed by hardened memory controllers.

Intel Agilex M-Series FPGAs push the HBM envelope even further by incorporating two HBM2e DRAM stacks. The two in-package HBM2e DRAM stacks in the Intel Agilex M-Series FPGAs provide a maximum of 32 Gbytes of capacity and a memory bandwidth of up to 410 Gbytes/second per HBM2e stack, for a total in-package HBM2e memory bandwidth of up to 820 Gbytes/second. That’s a 60% bandwidth increase compared to the prior-generation Intel Stratix 10 MX FPGAs, which makes it possible for designers to use Intel Agilex M-Series FPGAs in more challenging system designs. [4]
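The bandwidth figures above can be checked with simple arithmetic. The following is a minimal sketch that uses only the per-stack numbers quoted in this article and in note 4; the constant and function names are illustrative and do not come from any Intel tool.

```python
# Per-stack HBM bandwidth figures quoted in the article and in note 4,
# in GBytes/second. All names here are illustrative only.
HBM2E_STACKS = 2
AGILEX_M_GBYTES_PER_STACK = 410
STRATIX10_MX_GBYTES_PER_STACK = 256


def total_hbm_bandwidth(stacks, gbytes_per_stack):
    """Aggregate in-package bandwidth across all HBM stacks."""
    return stacks * gbytes_per_stack


def percent_increase(new, old):
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


total = total_hbm_bandwidth(HBM2E_STACKS, AGILEX_M_GBYTES_PER_STACK)
gain = percent_increase(AGILEX_M_GBYTES_PER_STACK, STRATIX10_MX_GBYTES_PER_STACK)

print(f"Total HBM2e bandwidth: {total} GBytes/second")
print(f"Per-stack increase over Stratix 10 MX: {gain:.1f}%")
```

The sketch reproduces the 820 Gbytes/second total and a per-stack increase of roughly 60%. These are theoretical peak figures, so sustained bandwidth in a real design will depend on access patterns and controller efficiency.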

For applications that require additional high-speed DRAM capacity, Intel Agilex M-Series FPGAs also support external DDR5 and LPDDR5 SDRAMs through integrated, hardened, high-efficiency memory controllers. Intel Agilex M-Series FPGAs, along with Intel® Optane™ persistent memory, give system designers additional flexibility and even more memory capacity when constructing a system-specific memory hierarchy.

DDR5 and LPDDR5 SDRAMs are currently the fastest mainstream SDRAM types available. Each of the independent memory controllers in the Intel Agilex M-Series FPGAs can operate DDR5 SDRAM at 5600 MTransfers/second with a data width as wide as 40 bits per channel (plus ECC bits). When the HBM2e and DDR5 memory bandwidths are combined, Intel Agilex M-Series FPGAs with eight attached DDR5 SDRAM DIMMs deliver a theoretical maximum memory bandwidth of 1.099 TBytes/second.
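As a rough illustration of how these numbers relate, peak bandwidth for a memory channel is the transfer rate multiplied by the bytes moved per transfer. The sketch below applies that formula to the quoted DDR5 figures and then subtracts the 820 Gbytes/second HBM2e figure from the combined maximum. The article does not break down the DDR5 share (channels per DIMM, or the "ECC as data" configuration mentioned in note 1), so the remainder is only a residual and not an Intel-published DDR5 number. All names are illustrative.

```python
# Figures quoted in the article; GBytes are decimal (10**9 bytes).
DDR5_MTRANSFERS_PER_S = 5600    # per memory controller
CHANNEL_WIDTH_BITS = 40         # widest per-channel data width quoted
HBM2E_TOTAL_GBYTES_PER_S = 820  # two stacks at 410 GBytes/second each
COMBINED_GBYTES_PER_S = 1099    # 1.099 TBytes/second theoretical maximum
DDR5_DIMMS = 8


def peak_gbytes_per_s(mtransfers_per_s, width_bits):
    """Peak bandwidth = transfers per second x bytes per transfer."""
    return mtransfers_per_s * width_bits / 8 / 1000


# Peak for one 40-bit channel at 5600 MTransfers/second.
per_channel = peak_gbytes_per_s(DDR5_MTRANSFERS_PER_S, CHANNEL_WIDTH_BITS)

# Residual left for external memory once the 820 GBytes/second
# HBM2e figure is subtracted from the combined maximum (an assumption
# about how the total splits, not a figure stated in the article).
residual = COMBINED_GBYTES_PER_S - HBM2E_TOTAL_GBYTES_PER_S
residual_per_dimm = residual / DDR5_DIMMS

print(f"Peak per 40-bit DDR5 channel: {per_channel:.1f} GBytes/second")
print(f"Residual after HBM2e: {residual} GBytes/second")
print(f"Residual per DIMM: {residual_per_dimm:.1f} GBytes/second")
```

Under these assumptions, a single 40-bit channel peaks at 28 Gbytes/second, and 279 Gbytes/second of the combined figure remains after the HBM2e share is removed.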

Channeling a TByte/second of data between the FPGA’s programmable-logic fabric and the HBM2e and DDR5 memories could prove challenging. This potential bottleneck drove the development of another Intel Agilex M-Series FPGA innovation: a dual, hardened Memory Network on Chip (NoC). This Memory NoC acts as a superhighway that connects the in-package HBM2e DRAM, the external DDR5 SDRAM, and the FPGA’s high-speed external I/O to the FPGA fabric. The dual Memory NoC’s aggregate peak bandwidth is 7.52 TBytes/second – a truly massive amount of on-chip bandwidth that consumes none of the FPGA’s on-chip programmable logic resources – which greatly reduces the potential for a memory bottleneck.


Fast I/O and Fast Compute

Smoothly moving data into and out of this memory hierarchy and throughout the FPGA is critical to meeting aggressive, system-level performance goals, but it’s also essential that designers have access to other high-bandwidth resources that can move data between the external system and the FPGA. The Intel Agilex M-Series FPGAs incorporate as many as 72 high-speed SERDES transceivers, including as many as eight transceivers that can each operate at 116 Gbps using PAM4 modulation, to exchange data between the FPGA and the rest of the system at very high data rates. Intel Agilex M-Series SERDES transceivers support a variety of today’s emerging, industry-standard, high-speed serial protocols including 400G Ethernet and can be directly interfaced to advanced CPUs, including the newest Intel® Xeon® CPUs using the PCIe Gen 5 and CXL interface protocols.

Finally, once data has entered the FPGA, it must generally be processed through a wide variety of computing algorithms. The programmable-logic fabric in the Intel Agilex M-Series FPGAs is extremely fast and can implement a variety of computing algorithms at high data rates. In addition, this programmable-logic fabric incorporates up to 12,300 variable-precision, floating-point DSP blocks capable of delivering 18.5 single-precision TFLOPS or 88.6 INT8 TOPS, for handling even heavier computational loads.
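To put the headline compute numbers in per-block terms, the sketch below divides the quoted totals by the quoted DSP block count. It assumes every block contributes equally and that the totals are reached with all 12,300 blocks in use. The article does not state a clock frequency, so none is inferred here. The names are illustrative.

```python
# Headline figures quoted in the article.
DSP_BLOCKS = 12_300
FP32_TFLOPS = 18.5   # single-precision floating point
INT8_TOPS = 88.6     # 8-bit integer


def per_block_giga_ops(total_tera_ops, blocks):
    """Average throughput per DSP block in giga-operations/second."""
    return total_tera_ops * 1000 / blocks


fp32_per_block = per_block_giga_ops(FP32_TFLOPS, DSP_BLOCKS)
int8_per_block = per_block_giga_ops(INT8_TOPS, DSP_BLOCKS)
int8_to_fp32 = INT8_TOPS / FP32_TFLOPS

print(f"FP32 per block: {fp32_per_block:.2f} GFLOPS")
print(f"INT8 per block: {int8_per_block:.2f} GOPS")
print(f"INT8-to-FP32 throughput ratio: {int8_to_fp32:.2f}")
```

On these assumptions, each block averages about 1.5 GFLOPS in single precision and about 7.2 GOPS in INT8, which is why lower-precision workloads see a markedly higher aggregate throughput from the same DSP resources.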

For more detailed information about the Intel Agilex M-Series FPGAs including a Device Overview, a White Paper, and a Solution Brief, click here.




Legal Notices and Disclaimers

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.

Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.

Your costs and results may vary.

Intel technologies may require enabled hardware, software or service activation.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

  1. Intel Agilex M-Series theoretical maximum bandwidth of 1.099 TBps with 2 banks of HBM2e using ECC as data and 8 DDR5 DIMMs as compared to Xilinx Versal HBM memory bandwidth of 1.056 TBps from https://www.xilinx.com/products/silicon-devices/acap/versal-hbm.html#productAdvantages and from https://www.xilinx.com/content/dam/xilinx/support/documentation/selection-guides/versal-hbm-product-selection-guide.pdf as of October 14, 2021 and to Achronix Speedster 7t memory bandwidth of 0.5 TBps from https://www.achronix.com/sites/default/files/docs/Speedster7t_Product_Brief_PB033.pdf as of October 14, 2021.
  2. Intel Agilex M-Series DSP compute density projected at 88.6 INT8 TOPs and 18.45 FP32 TFLOPs, compared to Xilinx Versal HBM at 74.9 INT8 TOPs and 17.5 FP32 TFLOPs from https://www.xilinx.com/content/dam/xilinx/support/documentation/selection-guides/versal-hbm-product-selection-guide.pdf as of October 14, 2021 and to Achronix Speedster 7t at 61.4 INT8 TOPs and no support for FP32, from https://www.achronix.com/machine-learning-processor as of October 14, 2021.
  3. Agilex M-Series >2x fabric performance/W results are based on projections of Agilex AGM039-R31B compared to measurements on Agilex AGI027-R31B, and power comparison of AGF014-2 to a Xilinx Versal FPGA fabric of equivalent density, where Agilex AGI027-R31B is projected to have the same core fabric performance/watt as measured on AGF014-2. Comparison assumes Xilinx Versal HBM has the same core fabric as similar Versal devices without HBM as of October 2021.
  4. Intel Agilex M-Series compute density is projected at 18.45 FP32 TFLOPs, HBM memory bandwidth is projected at 410 GBps per stack, and EMIF DDR5 performance projected at 5600 MT/s. Prior generation Stratix 10 MX compute density is 6.3 FP32 TFLOPs, HBM memory bandwidth is 256 GBps per stack, and EMIF DDR4 performance is 2667 MT/s.

About the Author
Sabrina is Director of Marketing, FPGA Platforms and has over 20 years of experience serving a wide breadth of markets including data center, communications and industrial with a strong technical background in FPGAs. She has a Bachelor of Science degree in Electrical Engineering from San Jose State University.