
DAOS Momentum Demonstrated with New IO500 Rankings and Community DAOS Traction

Rick_Johnson
Employee

Posted on behalf of Andrey Kudryavtsev

 

DAOS[1] continues to gain traction and benefit users of high-performance application services, including traditional high-performance computing (HPC), artificial intelligence/machine learning/deep learning (AI/ML/DL), high-performance data analytics (HPDA), and converged enterprise IT. In this blog, I share some of the most recent developments around DAOS, many of which were revealed at ISC 2022.

Intel Continues Its Strong Commitment/Execution of the DAOS Roadmap

DAOS remains a key part of the Intel HPC/AI technology strategy. The project continues to march along at its intended cadence of releases. DAOS 2.0.x releases have continued this year with features and fixes generally on target with the DAOS community roadmap. Release information for version 2.x can be found here. Version 2.2 updates are scheduled for after ISC22.  

DAOS interest and deployments continue to grow. From seven rankings on the IO500 announced at ISC ’21, to nine rankings later that year at SC ’21, and now 13 rankings just announced at ISC ’22 (five in the top 10 and all within the top 50), DAOS is proving to be a leading solution for low-latency, high-performance storage in the HPC/AI/Analytics and cloud spaces.

At ISC this year, the Intel DAOS demo was a leading presentation in the Intel booth. The DAOS architecture roadmap has always looked toward integrating new middleware to interface the DAOS API with other applications. This year, the DAOS demo included TensorFlow integration, allowing AI applications that use the TensorFlow framework to interface directly with the DAOS stack (CosmoFlow in the demo). You can view the demo in the Intel Digital Lounge.

Additionally, at ISC ’22, DAOS was part of the official program with a Birds of a Feather (BoF) session that included Intel, Argonne National Laboratory (where DAOS is the primary storage system for the Argonne Leadership Computing Facility’s (ALCF) Aurora exascale supercomputer), and the Edinburgh Parallel Computing Centre (EPCC). Members of the DAOS community were able to gather, share their experiences running the storage stack, and collaborate on ideas for future enhancements.

DAOS Deployments

At the University of Cambridge, DAOS has moved from a research project in the University of Cambridge Open Zettascale Lab to a production testbed on the Cumulus supercomputer run within the University of Cambridge’s Service for Data Driven Discovery (CSD3), which constitutes one of the UK's largest and most widely used National HPC capabilities.

The Cumulus heterogeneous supercomputer was ranked ninth on the IO500.org June 2022 list just released at ISC ’22. Cumulus storage delivered 283.19 GB/s bandwidth at 4,328.68k IOPS. The system also has a computational performance of 10 petaflops and is powered by Dell EMC PowerEdge servers. CSD3 supports a wide range of UK Research and Innovation communities funded via the Science and Technology Facilities Council's DiRAC and IRIS programs, the Engineering and Physical Sciences Research Council Tier 2, and the Medical Research Council. CSD3 enables leading UK health projects across cancer genomics, COVID modeling, patient care, and advanced medical imaging. It also supports breakthrough research in climate change prediction, green energy production, and frontier experimental physics and astronomy projects.

Additionally, phase one of the DAOS storage system on SuperMUC-NG at LRZ, the Leibniz Supercomputing Center, was completed this year. SuperMUC-NG ranked eighth on the IO500.org June 2022 list with 321.75 GB/s of bandwidth and 5,844.40k IOPS. The DAOS installation is part of a SuperMUC-NG Phase 2 deployment, designed specifically to include higher-capacity, high-performance DAOS storage and more compute to meet demand from a growing number of AI workloads and projects. When completed, the system, built by Lenovo, will include 4th Generation Intel® Xeon® Scalable processors and the Intel® data center GPU (formerly code named Ponte Vecchio). The IO500 ranking for DAOS at LRZ is expected to improve further once the full cluster is available.
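For readers wondering how the two headline numbers combine into a single IO500 result, the sketch below assumes the IO500 convention of scoring a system by the geometric mean of its bandwidth score and its metadata (IOPS) score. That convention comes from the IO500 benchmark rules, not from this post, and the function name `io500_score` is purely illustrative. The inputs are the figures quoted above, taken as-is.

```python
import math


def io500_score(bandwidth: float, kiops: float) -> float:
    """Geometric mean of the bandwidth score and the metadata score in kIOPS.

    Assumption: this mirrors how the IO500 list combines its two sub-scores.
    """
    return math.sqrt(bandwidth * kiops)


# Figures quoted in this post for the June 2022 list: (bandwidth, kIOPS).
systems = {
    "Cumulus": (283.19, 4328.68),
    "SuperMUC-NG": (321.75, 5844.40),
}

for name, (bandwidth, kiops) in systems.items():
    print(f"{name}: {io500_score(bandwidth, kiops):.2f}")
```

Because the combined value is a geometric mean, a system cannot reach a high result on bandwidth alone; it also needs strong metadata performance.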

New Tech Support Services Partners

The DAOS software-defined storage stack is an open source project. Several OEMs are partnering with Intel to deploy and support solutions built on DAOS and Intel® technologies. As more OEM customers deploy DAOS, Intel is working with its OEM partners to provide technical support through these partnerships.

“Lenovo and Intel have partnered to deliver an unrivalled DAOS storage solution, built on Lenovo ThinkSystem servers featuring Intel Xeon processors. Customers can now purchase Intel DAOS with deployment support from experts from Lenovo Professional Services and Intel. The Lenovo Professional Services team will handle Level 1 and Level 2 support, backed by Intel experts for Level 3. This should give customers peace of mind when deploying the solution into production environments,” said Scott Tease, Vice President and General Manager of High Performance Computing and Artificial Intelligence, Lenovo Infrastructure Solutions Group.

In another support partnership for DAOS, Croit GmbH is bringing its expertise in managing and supporting open source software-defined storage products by releasing its Croit Platform for DAOS. The platform will enable users to easily deploy and manage DAOS clusters and to use additional access methods common in traditional enterprise products, such as NVMe-oF, S3, and NFS. Croit is also announcing the creation of a U.S. subsidiary to better serve the U.S. market and expand its reach there. These two strategic support partnerships will help accelerate adoption of DAOS solutions in HPC, AI, HPDA, and converged enterprise IT services.

Google Cloud Now Offering DAOS to Optimize Storage Performance and Performance/$

DAOS has arrived at Google Cloud, making it fast and easy to optimize cost and performance/$ for Google Cloud projects. Demand for storage performance is growing in HPC cloud infrastructure. While Lustre has been a mainstay among high-performance parallel file systems in HPC, DAOS deployments in traditional HPC, AI, HPDA, and enterprise settings are growing because DAOS delivers the high bandwidth, low latency, and high IOPS that workloads in these domains need. With the expansion of data in computing and the growing use of Google Cloud for HPC, integrating DAOS into Cloud HPC Toolkit was an important step toward continuing to provide high-performance cloud infrastructure for a variety of workloads.

Adding DAOS delivers a new high-performance storage tier for cloud projects. In financial services, for example, historical data is stored in lower-cost Google Cloud storage services. For algorithm and application testing, that data can be copied to a DAOS storage tier to support testing and analysis, and the results can then be copied back to lower-cost storage for reporting. Users get better cost-performance and can reduce their workload run times compared to relying on slower traditional storage technology.
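The stage-in, compute, stage-out pattern described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a Google Cloud or DAOS API: it assumes both tiers are reachable as ordinary directories (for example, a DAOS container exposed through a POSIX mount), and the names `run_on_fast_tier` and `count_lines` are hypothetical. Temporary directories stand in for the real storage tiers so the sketch runs anywhere.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def run_on_fast_tier(cold_input: Path, cold_output: Path, fast_tier: Path,
                     job: Callable[[Path, Path], None]) -> None:
    """Copy data to the fast tier, run a job there, and copy results back."""
    work_in = fast_tier / "input"
    work_out = fast_tier / "output"
    # Stage in: copy the historical data from the lower-cost tier.
    shutil.copytree(cold_input, work_in, dirs_exist_ok=True)
    work_out.mkdir(parents=True, exist_ok=True)
    try:
        # Compute: the job reads and writes only on the fast tier.
        job(work_in, work_out)
        # Stage out: copy results back to lower-cost storage for reporting.
        shutil.copytree(work_out, cold_output, dirs_exist_ok=True)
    finally:
        # Release fast-tier capacity whether or not the job succeeded.
        shutil.rmtree(work_in, ignore_errors=True)
        shutil.rmtree(work_out, ignore_errors=True)


def count_lines(src: Path, dst: Path) -> None:
    """Hypothetical stand-in for a real analysis job."""
    total = sum(len(p.read_text().splitlines())
                for p in src.rglob("*") if p.is_file())
    (dst / "report.txt").write_text(f"lines={total}\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        cold_in = root / "cold_in"  # stands in for low-cost storage
        cold_in.mkdir()
        (cold_in / "ticks.csv").write_text("t,price\n1,10.0\n2,10.5\n")
        run_on_fast_tier(cold_in, root / "cold_out", root / "fast", count_lines)
        print((root / "cold_out" / "report.txt").read_text(), end="")
```

Cleaning up in the `finally` block reflects the cost argument in the text: the fast tier holds data only for the duration of the job.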

AI training can also benefit from DAOS by running the training data set out of high-performance DAOS storage while keeping the data itself on disk-based storage. Google is looking at other use cases where DAOS will complement its traditional offerings in Google Cloud for customers.

Future Innovation is Wide Open

The future of high-performance storage built on DAOS is wide open to innovation. OEMs are adding DAOS solutions to their systems, and Intel partners are offering support to their customers in collaboration with Intel. Because DAOS is a community-driven software stack, innovation is open to whatever developers and system architects can design around the technology.

Storage appliances are a natural product evolution to enable easy adoption of DAOS for any deployment. High-performance storage is needed in more than traditional HPC and the largest AI projects. Smaller systems, from HPC to enterprise IT, can benefit from DAOS, and appliances make sense for users because they both speed up deployment of a system and reduce its cost.

Until now, high-performance storage solutions have typically been delivered through custom hardware, such as dual-port SAS appliances or vendor-specific all-flash systems. With open source DAOS and commodity server platforms equipped with Intel® architecture and NVMe storage, access to high-performance storage solutions can expand, reducing the cost of such solutions while making them available to more users.

Find out more about DAOS and DAOS performance by visiting these resources:

 

[1] DAOS (see https://docs.daos.io/) is an open-source scale-out object store designed from the ground up to deliver extremely high bandwidth/IOPS and low latency I/O to the most demanding data-intensive workloads. It aims at supporting next-generation HPC workflows that combine simulation, big data, and AI in a single storage tier. DAOS presents a rich and scalable storage interface that allows efficient storage of both structured and unstructured data. DAOS supports multiple application interfaces including a parallel filesystem, Hadoop/Spark connector, TensorFlow-IO, native Python dictionary bindings, HDF5, MPI-IO as well as domain-specific data models like SEGY.

 

Intel technologies may require enabled hardware, software or service activation.

No product or component can be absolutely secure. 

Your costs and results may vary. 

© Intel Corporation.  Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries.  Other names and brands may be claimed as the property of others.