Why Building Products with Integrated AI is Easier with FPGAs

Deepali_Trehan · 09-18-2023

It seems like GPT-4 has blown the doors off all earlier expectations for Artificial Intelligence (AI). There is little doubt that AI will be a part of the Digital Transformation in a big way going forward. At Intel®, we’re embracing AI in all aspects of our business, from the data center to the cloud, within the network, and in embedded applications at the edge of the internet.

We’re all familiar with self-driving cars and robots, but there is a myriad of applications that are less talked about but no less exciting. Applying AI and machine learning (ML) to medical imaging and diagnosis, industrial inspection, speech recognition, video and image processing and recognition, smart city, smart signage, and many, many more embedded applications will soon result in products with amazing capabilities. AI will also become a competitive requirement in virtually all product areas.

Architects and engineers are increasingly adopting FPGAs for their AI solutions because FPGAs have many characteristics that make them ideal for the purpose. Their programmable logic fabric makes them excellent implementation targets for neural networks and other AI workloads, providing highly parallel processing at low power. In addition, FPGAs offer an enormous choice of external I/O options that are generally not available in other hardware architectures. These I/O options can connect to a wide variety of data sources such as radar, audio, vibration, and vision sensors.

Solution providers are taking advantage of the inherent strengths of Intel Agilex® FPGAs to implement AI. Leveraging Intel’s leading chiplet technology, Intel Agilex FPGAs combine the key building blocks of dedicated AI processing, custom logic, memory, RF and digital signal processing, standard CPUs, and fast I/O in a small, low-power, secure package. The alternative to FPGAs generally involves multiple discrete devices. The higher integration provided by FPGAs enables a more efficient implementation in terms of performance, power efficiency, and size.

Built to meet the need for AI in new systems, Intel Agilex® 5 FPGAs are the first, and currently the only, AI-enhanced FPGA product family. For example, Intel Agilex 5 FPGAs deliver 5x higher INT8 capability for AI operations than our previous-generation products [1]. While GPU approaches typically use FP32 or FP16 operations for training, newer AI models and training techniques employ “scalable precision” to increase speed and reduce power. Intel Agilex 5 AI Tensor Blocks deliver the speed of INT8 with the precision of FP16, which in most cases eliminates the need for additional quantization-aware training (QAT), a step that extends development time.
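To make the INT8-versus-FP16 trade-off concrete, here is a minimal sketch of generic symmetric, per-tensor INT8 quantization in plain Python. It is an assumption-laden illustration, not a model of how the Intel Agilex 5 AI Tensor Block works: the function names and the sample weights are hypothetical. It shows how values that are small relative to the largest value in a tensor can collapse to zero at 8 bits, which is the kind of accuracy loss that QAT is used to recover.

```python
# Generic symmetric, per-tensor INT8 quantization (illustrative only).
# This is NOT a model of the Intel Agilex 5 AI Tensor Block.


def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Map floats to int8 codes that all share a single scale factor."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # The largest magnitude maps to +/-127; every other value scales with it.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes: list[int], scale: float) -> list[float]:
    """Recover approximate float values from the int8 codes."""
    return [c * scale for c in codes]


def max_abs_error(original: list[float], restored: list[float]) -> float:
    """Worst-case absolute error introduced by the quantize/dequantize round trip."""
    return max(abs(a - b) for a, b in zip(original, restored))


if __name__ == "__main__":
    weights = [0.02, -0.5, 1.27, 0.0031, -1.0]  # hypothetical sample weights
    codes, scale = quantize_int8(weights)
    print(codes)  # the small weight 0.0031 rounds to code 0
    print(max_abs_error(weights, dequantize(codes, scale)))
```

With these sample weights the scale is about 0.01, so 0.0031 falls below half a quantization step and is lost entirely. Higher-precision arithmetic, or retraining with quantization in the loop, are the usual ways to avoid that loss.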

Intel Agilex 5 FPGAs also offer significantly faster inference to reduce response latency. Intel Agilex 5 delivers 3.8x higher frames per second (FPS) [2] on the ResNet-50 AI benchmark than our previous generation, and 69% higher FPS than competitor FPGAs [3]. Intel Agilex 5 FPGAs provide low AI latency, high power efficiency, and hardened security, and they fit into packages as small as 15 mm × 15 mm. They give you all the features you need to build custom deep-learning FPGA solutions that integrate easily with other system elements through advanced PCIe and CXL interface IP.
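The notes at the end of this post give the raw figures behind the 69% comparison. The short Python sketch below reproduces that arithmetic. One assumption: the notes do not spell out the scaling formula, but scaling the published VE2802 result linearly by the ratio of peak TOPS reproduces the stated 342 FPS, so that is the method used here. The helper names are hypothetical; the numbers are copied from the notes, and the Agilex figure is an estimate rather than a measurement.

```python
def scale_fps_by_tops(ref_fps: float, ref_tops: float, target_tops: float) -> float:
    """Scale a published FPS figure linearly by the ratio of peak TOPS (assumed method)."""
    return ref_fps * target_tops / ref_tops


def percent_higher(a: float, b: float) -> float:
    """How much higher a is than b, in percent."""
    return (a / b - 1.0) * 100.0


# Figures copied from the notes at the end of this post.
VE2802_FPS = 4793       # published ResNet-50 result on VE2802
VE2802_TOPS = 169.16
VE2302_TOPS = 12.08
AGILEX_EST_FPS = 578    # estimated, not measured or simulated

if __name__ == "__main__":
    ve2302_fps = scale_fps_by_tops(VE2802_FPS, VE2802_TOPS, VE2302_TOPS)
    print(f"VE2302 scaled FPS: {ve2302_fps:.0f}")
    print(f"Agilex advantage: {percent_higher(AGILEX_EST_FPS, round(ve2302_fps)):.0f}%")
```

Because the competitor figure is derived by linear scaling and the Agilex figure is a projection, the comparison is only as good as those two assumptions.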

Once you’ve chosen the perfect Intel Agilex FPGA for your AI application, how do you put all the elements of your solution together? The Intel FPGA design environment now has a component called Intel FPGA AI Suite to simplify the job of incorporating AI into custom FPGA solutions. Intel FPGA AI Suite works with Intel Quartus® Prime in a single, integrated design flow that takes you from a trained machine learning (ML) model to an AI IP block that you drop into your Intel Quartus Prime environment to integrate with memory, I/O, CPUs, and custom logic for a finished Intel Agilex 5 FPGA solution.

Intel FPGA AI Suite includes the Intel release of OpenVINO™, which accepts a trained TensorFlow, Caffe, Keras, ONNX, PyTorch, MXNet, or PaddlePaddle model. OpenVINO optimizes the model to reduce its size and then passes the optimized model to Intel FPGA AI Suite, which maps the resources needed to implement it in an Intel Agilex 5 FPGA.

Intel FPGA AI Suite also includes Architecture Optimizer, which helps users fine-tune device resources (e.g., memory vs. processing elements) to achieve maximum throughput and minimum power consumption. The ability to customize Intel Agilex 5 FPGAs enables architects to explore the design space and optimize the footprint of AI applications for size, weight, and power. Once your AI block is defined, you can use the Intel Quartus Prime toolset to integrate it with custom logic, CPUs, memory, I/O, and other IP blocks.
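The memory-versus-processing-element trade-off can be illustrated with a toy design-space search. This post does not describe Architecture Optimizer's actual cost model or search strategy, so everything below is hypothetical: the `Config` type, the area budget, the per-PE and per-KB costs, and the throughput model (capped by whichever is slower, compute or on-chip data supply) are invented for illustration. The sketch simply brute-forces all candidate configurations that fit the budget and picks the highest throughput, breaking ties in favor of lower power.

```python
from dataclasses import dataclass
from itertools import product

# Every constant below is invented for illustration; none comes from
# Intel FPGA AI Suite or from an Intel Agilex datasheet.
AREA_BUDGET = 1000    # total fabric "area" units available (hypothetical)
AREA_PER_PE = 10      # area cost of one processing element (PE)
AREA_PER_KB = 2       # area cost of 1 KB of on-chip buffer
OPS_PER_PE = 1.0      # compute rate contributed by each PE
FEED_PER_KB = 0.5     # rate at which 1 KB of buffer can keep PEs fed
WATTS_PER_PE = 0.05
WATTS_PER_KB = 0.01


@dataclass(frozen=True)
class Config:
    num_pes: int
    buffer_kb: int

    def area(self) -> int:
        return self.num_pes * AREA_PER_PE + self.buffer_kb * AREA_PER_KB

    def throughput(self) -> float:
        # Toy model: limited by the slower of compute and data supply.
        return min(self.num_pes * OPS_PER_PE, self.buffer_kb * FEED_PER_KB)

    def power(self) -> float:
        return self.num_pes * WATTS_PER_PE + self.buffer_kb * WATTS_PER_KB


def explore(pe_options, buffer_options) -> Config:
    """Brute-force search: best throughput that fits the budget, ties broken by lower power."""
    feasible = [
        Config(p, b)
        for p, b in product(pe_options, buffer_options)
        if Config(p, b).area() <= AREA_BUDGET
    ]
    return max(feasible, key=lambda c: (c.throughput(), -c.power()))


if __name__ == "__main__":
    best = explore(range(8, 129, 8), range(16, 513, 16))
    print(best, best.throughput(), round(best.power(), 2))
```

In this toy model, adding processing elements stops paying off once the buffer can no longer keep them fed, and vice versa, so the best configuration balances the two within the shared budget. A real tool would use a far more detailed, device-specific model.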

One of our customers, Exor International, used Intel FPGA AI Suite with an Intel Agilex FPGA to build an industrial optical inspection system that identifies defects on display units in real time. The system uses a deep-learning autoencoder neural network, trained only on display units without anomalies, to detect possible imperfections. Exor’s measurements show that their neural network, running on an Intel FPGA designed with Intel FPGA AI Suite, is 19.1 times faster than on a CPU alone. According to Exor, “Intel FPGA AI Suite and OpenVINO™ toolkit saved us months of time to add FPGA AI inference to Exor’s GigaSOM gS01 kit. The example designs provided with the suite enabled the team to quickly evaluate different algorithms for different image sources. Intel FPGA AI Suite and the Intel® Distribution of OpenVINO toolkit enables data scientists and FPGA engineers to seamlessly work together to develop optimized deep learning inference for medical applications.”
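An autoencoder trained only on good units detects defects indirectly: it reconstructs normal images well and anomalous ones poorly, so a large reconstruction error signals a possible defect. The sketch below shows only that decision step, in plain Python. It assumes the reconstructions come from the autoencoder running on the FPGA, and it uses a mean-plus-k-standard-deviations threshold calibrated on defect-free units; that thresholding rule, the function names, and the sample error values are our own illustrative assumptions, since the post does not describe how Exor sets its threshold.

```python
from statistics import mean, stdev


def reconstruction_error(image: list[float], reconstruction: list[float]) -> float:
    """Mean squared error between a flattened image and the autoencoder's output."""
    return sum((a - b) ** 2 for a, b in zip(image, reconstruction)) / len(image)


def calibrate_threshold(good_errors: list[float], k: float = 3.0) -> float:
    """Assumed rule: mean + k * sample std. dev. of errors measured on defect-free units."""
    return mean(good_errors) + k * stdev(good_errors)


def is_defective(error: float, threshold: float) -> bool:
    """Flag a unit when the autoencoder cannot reconstruct it well."""
    return error > threshold


if __name__ == "__main__":
    # Hypothetical reconstruction errors measured on known-good display units.
    good = [0.010, 0.012, 0.011, 0.009, 0.013]
    threshold = calibrate_threshold(good)
    print(round(threshold, 4))
    print(is_defective(0.05, threshold), is_defective(0.011, threshold))
```

The choice of k trades missed defects against false alarms; a production system would tune it on labeled data rather than fix it at 3.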

Now is the time to learn about Intel Agilex FPGAs and how they can help you stay at the crest of the AI wave. There is a lot to learn, and I invite you to take advantage of the resources Intel offers to bring you up to speed, such as this white paper on AI in FPGAs. Also check out our range of FPGA products at intel.com/fpga and sign up for our FPGA newsletter to stay abreast of the many new products and technologies coming soon. Information about our Intel Quartus Prime FPGA development tools and Intel FPGA AI Suite, our FPGA development kits, and a wealth of training is also available.

Notes/disclaimers: Performance varies by use, configuration and other factors. Learn more on the Performance Index site. Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates.  See backup for configuration details.  No product or component can be absolutely secure. Your costs and results may vary. Intel technologies may require enabled hardware, software or service activation. Intel does not control or audit third-party data.  You should consult other sources to evaluate accuracy. © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries.  Other names and brands may be claimed as the property of others. 

1. 5x versus earlier Intel FPGAs: Intel Agilex 5-Series versus Intel Agilex 9-Series and 7-Series FPGAs, and Arria 10, Cyclone V, and Stratix 10 FPGAs.
2. Agilex D-Series AGD064 estimated device performance = 578 FPS / 13 W. Projected performance with the generic CNN architecture of Intel FPGA AI Suite (not measured or simulated). Host CPU and DDR capacity can affect estimated performance.
- FPS estimated performance with two instances of inference IP, 90% utilization in mid speed grade.
- Performance and power may not reflect future hardware measurements.
3. Competitor FPGA information (VE2302 at ~20 W) based on published AMD Versal AI Edge performance and utilization information as of August 24, 2023; may change in future revisions.
- tf_resnetv1-50_3.5 @ 4793 FPS in VE2802 scales to 342 FPS in VE2302
- DPU VE2802 /C20B14CU1/169.16 TOPS/4793 FPS scaled to DPU VE2302 / C20B1CU1/12.08 TOPS / 342 FPS
- https://docs.xilinx.com/r/en-US/pg425-dpucv2dx8g/Resource-Utilization
- https://xilinx.github.io/Vitis-AI/3.5/html/docs/reference/ModelZoo_Github_web.htm
- https://www.xilinx.com/publications/presentations/versal-ai-edge-product-announcement.pdf