Using Red Hat OpenShift Data Foundation for Storage in a Multicloud Scenario

Keith_Sundberg · ‎05-06-2022

This blog is the second in our series on using Red Hat OpenShift and technologies from Intel to enable a Hybrid Multicloud strategy. If you haven’t read the first blog, you can find it here and a video covering that material here.

Hybrid and multicloud infrastructures are becoming a major part of data center deployments, whether you’re talking about databases, AI, machine learning, or telecommunications workloads. Combining the two gives today’s cloud-native infrastructures the benefits of a hybrid multicloud approach. Distributing workloads is especially important for edge applications that need to run on-premise and for customers who want to control where they store applications and sensitive data.

As demand for hybrid and multicloud infrastructures rises, so does the demand for a method of storage that complements the advantages of these technologies. Traditional storage methods lack the data persistence, standardization, abstraction, and performance needed for hybrid multicloud deployments. Red Hat OpenShift Data Foundation paired with technologies from Intel provides the functionality and performance to fully enable the advantages of hybrid multicloud solutions.

For workload-optimized data nodes, OpenShift Data Foundation can leverage configurations based on Intel Optane technology and Intel Ethernet technology. For hybrid multicloud deployments, these technologies make it possible to move, process, and store data at high speed, which helps eliminate bottlenecks and enable affordable data sets.

Red Hat and Intel have a long history of collaboration and together we drive open source innovation to accelerate digital transformation in the industry. With Intel providing the infrastructure hardware and Red Hat providing the infrastructure software, together we enable solutions to transition workloads from VMs to containers, build hybrid or multicloud infrastructures, and leverage the power of microservices and Kubernetes.

Red Hat Data Services is a portfolio of solutions that includes persistent software-defined storage and data services integrated with and optimized for Red Hat® OpenShift® Container Platform. As part of the Red Hat Data Services portfolio, Red Hat OpenShift Data Foundation delivers resilient and persistent software-defined storage based on Ceph® technology. OpenShift Data Foundation runs anywhere Red Hat OpenShift does: on-premise or in the public cloud. The platform provides file, block, and object storage classes, enabling a wide range of data modalities and workloads.

The On-Premise Bare-Metal Cluster

For this exercise, we deployed Red Hat OpenShift Data Foundation (Red Hat ODF) on the on-premise bare-metal cluster running Red Hat OpenShift. We installed Red Hat ODF following this document: https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.9/html/deploying_openshift_data_foundation_using_bare_metal_infrastructure/index

Before installation, for testing purposes, we added extra storage to three worker nodes.

Next, to manage the storage, we added the Local Storage Operator and the Red Hat OpenShift Data Foundation Operator.

The AWS Cluster

On AWS we used Amazon S3 storage classes as the backing data store for objects.

Amazon S3 or Amazon Simple Storage Service is a service offered by Amazon Web Services that provides object storage through a web service interface. Amazon S3 is employed to store any type of object, which allows for uses like storage for Internet applications, backup and recovery, disaster recovery, data archives, data lakes for analytics, and hybrid cloud storage.

The Azure Cluster

On Azure we used Azure Blob storage as the backing data store for objects.

Azure Blob storage is Microsoft's object storage solution for the cloud. Blob storage is optimized for storing massive amounts of unstructured data. Unstructured data is data that doesn't adhere to a particular data model or definition, such as text or binary data.

https://azure.microsoft.com/en-us/services/storage/blobs/

The Use Cases

In this blog we work through three use cases that show the multicloud capabilities of Red Hat OpenShift Data Foundation.

Mirroring use case: The data is mirrored between the on-premise data store and the two public cloud data stores.
Caching use case: The data is cached locally on the on-premise data store, while the main store is on Azure Blob storage.
Data tiering use case: The data is stored in different buckets depending on its sensitivity, i.e. only data stored in non-sensitive buckets is replicated to the public data store, while the data in the sensitive buckets stays local.

The Mirroring Use Case

The Multi-Cloud Object Gateway is a new data federation service introduced in OpenShift Data Foundation. The technology is based on the NooBaa project. The Multi-Cloud Object Gateway has an object interface with an S3 compatible API. The service is deployed automatically as part of OpenShift Data Foundation and provides the same functionality regardless of its hosting environment.

NooBaa (Multi-Cloud Object Gateway) allows applications to write data regardless of the physical data repositories. The core machine is the data controller and an endpoint.

NooBaa endpoints can scale out and be placed anywhere. Applications connect to these endpoints to gain access to the data repositories.

Orchestrating The Storage

NooBaa can consume data from AWS S3, Microsoft Azure Blob storage, Google Cloud Storage, or any S3-compatible private cloud storage.

Every NooBaa bucket has its own data placement policy, which determines where the data will be physically placed. Buckets are similar to file folders and store objects, each of which consists of data and its descriptive metadata.

Data placement policies are flexible and can be modified during the data's lifetime. When data is written to a NooBaa endpoint, it is deduplicated, compressed, and encrypted, and the resulting data chunks are distributed according to the policy. NooBaa prioritizes local data writes when possible, and mirrors or spreads the data to remote locations in the background.

To demonstrate a mirroring use case, we created an Amazon S3 bucket named “multicloud-aws-test-bucket” and an Azure Blob storage container named “azuremulticloudcontainer”.

Using the NooBaa dashboard, we added two storage resources: one from AWS and one from Azure.

After adding the storage resources, we created an object bucket with a mirroring policy type. More detailed information about the process can be obtained from the following documentation.

https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.9/html-single/managing_hybrid_and_multicloud_resources/index#mirroring-data-for-hybrid-and-Multicloud-buckets
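
We used the dashboard for these steps, but a mirroring policy can also be described declaratively. The following is a minimal sketch, not the exact configuration we used. It assumes the storage resources already exist as NooBaa BackingStore objects and that ODF runs in the openshift-storage namespace. The bucket class and backing store names are hypothetical, and the field names should be checked against the documentation linked above. The function only prints a BucketClass manifest with Mirror placement:

```shell
# Print a NooBaa BucketClass manifest that mirrors every object across
# the given backing stores. All resource names passed in are hypothetical.
mirror_bucketclass_manifest() {   # usage: NAME BACKINGSTORE...
    name="$1"; shift
    cat <<EOF
apiVersion: noobaa.io/v1alpha1
kind: BucketClass
metadata:
  name: ${name}
  namespace: openshift-storage
  labels:
    app: noobaa
spec:
  placementPolicy:
    tiers:
    - placement: Mirror
      backingStores:
EOF
    # One list entry per backing store that should hold a full copy.
    for store in "$@"; do
        printf '      - %s\n' "$store"
    done
}
```

Assuming backing stores named aws-backingstore and azure-backingstore next to a local one, the manifest could be applied with: mirror_bucketclass_manifest mirroring-bucket-class local-backingstore aws-backingstore azure-backingstore | oc apply -f -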

For testing purposes, we copied four test files (test_file_1, test_file_2, test_file_3 and test_txt_file_1) into the bucket we created, where the data is first stored on the local backing store.

To upload the test files, we used the AWS S3 CLI. The AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables were set to the credentials configured for the testing account in NooBaa. S3_ENDPOINT_URL was set to the local NooBaa endpoint https://s3-openshift-storage.apps.test.cluster.local and S3_BUCKET was set to the mirroring bucket, s3://mirroringbucket.

Example of the AWS S3 CLI command to copy ./test_file_1:

# aws s3 cp --no-verify-ssl --endpoint-url ${S3_ENDPOINT_URL} ./test_file_1 ${S3_BUCKET}

We observed that the data was first written onto the local backing store, and then mirrored to the AWS S3 and Azure Blob storage backing stores as per the bucket policy.
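
The upload step can be scripted for all of the test files. The sketch below is one possible wrapper rather than the exact commands we ran. It assumes the AWS CLI is installed and that S3_ENDPOINT_URL, S3_BUCKET, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set as described above. The function name upload_test_files is our own invention for illustration.

```shell
# Upload each given file to the bucket in S3_BUCKET through the endpoint
# in S3_ENDPOINT_URL, then list the bucket so the result can be inspected.
# Credentials are read from AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY.
upload_test_files() {   # usage: upload_test_files FILE...
    for f in "$@"; do
        aws --no-verify-ssl --endpoint-url "${S3_ENDPOINT_URL}" \
            s3 cp "$f" "${S3_BUCKET}/" || return 1
    done
    aws --no-verify-ssl --endpoint-url "${S3_ENDPOINT_URL}" \
        s3 ls "${S3_BUCKET}/"
}
```

For example: upload_test_files ./test_file_1 ./test_file_2 ./test_file_3 ./test_txt_file_1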

The Caching Use Case

To leverage the caching feature in NooBaa, applications access the cache bucket using the NooBaa S3 endpoints, which are stateless, lightweight, and scalable pods. As objects are written or read via the endpoint to a cache bucket, the NS-Cache policy takes over. It goes to the hub to write data and to read data that is unavailable or already expired in the local cache storage, and it automatically updates the local cache with objects (or parts of objects).

NooBaa caching works on a “write-through” cache model, which is optimized for read workflows. In “write-through” mode, new objects created on the cache bucket are written directly to the hub, while a local copy of the object is also stored. Read requests are first served from local storage when possible. If the object is not found there, it is fetched from the hub and stored locally for a set time to live (TTL). Note that only the relevant parts of objects (ranges) that are being used are kept locally, based on the available capacity and a least-recently-used (LRU) policy.

Source: https://next.redhat.com/2021/07/27/bucket-caching-for-kubernetes/

The cache model ensures that workflows that require frequent access to the same object or parts of the object do not have to repeatedly get the object from the hub. Also, by only storing the relevant parts of the objects locally, the efficiency of the cache is optimized for maximizing hit ratio even with a relatively small capacity in the local storage.
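
To make the write-through behavior concrete, here is a toy model in shell. It is not NooBaa code: two temporary directories stand in for the hub bucket and the local cache, whole objects are cached instead of ranges, and LRU eviction is left out. The function names and the 60-second TTL are assumptions chosen for illustration.

```shell
# Toy model of a write-through cache. HUB_DIR stands in for the remote hub
# bucket and CACHE_DIR for the local cache storage.
HUB_DIR="$(mktemp -d)"
CACHE_DIR="$(mktemp -d)"
CACHE_TTL=60   # seconds a cached copy stays valid (assumed value)

# Write-through: store the object in the hub and keep a local copy.
cache_put() {   # usage: cache_put KEY VALUE
    printf '%s' "$2" > "$HUB_DIR/$1"
    printf '%s' "$2" > "$CACHE_DIR/$1"
    date +%s > "$CACHE_DIR/$1.ts"
}

# Read: serve the local copy while it is fresh. Otherwise fetch the object
# from the hub and refresh the local copy. Sets CACHE_RESULT to hit or miss.
cache_get() {   # usage: cache_get KEY
    now="$(date +%s)"
    stamp="$(cat "$CACHE_DIR/$1.ts" 2>/dev/null || echo 0)"
    if [ -f "$CACHE_DIR/$1" ] && [ $(( $now - $stamp )) -lt "$CACHE_TTL" ]; then
        CACHE_RESULT=hit
    else
        [ -f "$HUB_DIR/$1" ] || return 1
        cp "$HUB_DIR/$1" "$CACHE_DIR/$1"
        echo "$now" > "$CACHE_DIR/$1.ts"
        CACHE_RESULT=miss
    fi
    cat "$CACHE_DIR/$1"
}
```

A read right after cache_put is a hit. Deleting the local copy, or letting the TTL expire, turns the next read into a miss that is served from the hub and cached again.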

To demonstrate the caching use case, we created a separate Azure Blob storage container named “azuremulticloudcachecontainer”.

After that, we added a Namespace resource from Azure and created a Namespace Bucket.

More detailed information about the process can be obtained from the following documentation.

https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.9/html-single/managing_hybrid_and_multicloud_resources/index#caching-policy-for-object-buckets_rhodf
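
As a rough illustration of what a cache bucket class can look like when written as a manifest, the sketch below prints a BucketClass with a Cache namespace policy. It assumes the Azure container is already registered as a NooBaa NamespaceStore and that a local backing store exists. All resource names are hypothetical, the TTL is assumed to be given in milliseconds, and the field layout should be verified against the documentation linked above before use.

```shell
# Print a NooBaa BucketClass manifest for a cache bucket: the hub is a
# NamespaceStore (for example, an Azure container) and the local cache
# lives on a backing store. Names and TTL are supplied by the caller.
cache_bucketclass_manifest() {   # usage: NAME HUB_NAMESPACESTORE BACKINGSTORE TTL_MS
    cat <<EOF
apiVersion: noobaa.io/v1alpha1
kind: BucketClass
metadata:
  name: $1
  namespace: openshift-storage
  labels:
    app: noobaa
spec:
  namespacePolicy:
    type: Cache
    cache:
      hubResource: $2
      caching:
        ttl: $4
  placementPolicy:
    tiers:
    - backingStores:
      - $3
EOF
}
```

With a hypothetical NamespaceStore called azure-cache-namespacestore and a local backing store called local-backingstore, a one-hour TTL could be requested with: cache_bucketclass_manifest caching-bucket-class azure-cache-namespacestore local-backingstore 3600000 | oc apply -f -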

For testing purposes, we copied two test files (test_cache_txt_file_1, test_cache_txt_file_2) into the bucket we created using the AWS S3 CLI, in the same way as in the previous step. S3_BUCKET was set to the caching bucket s3://cachingbucket.

To check that the files have been stored, we can list the local bucket with the aws s3 ls command and then confirm that the same data is present in the Azure storage container.
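
The two checks can be sketched as a single helper. This is an illustration under assumptions: S3_ENDPOINT_URL and S3_BUCKET are set as described above, the Azure CLI (az) is installed and already authenticated, and AZURE_ACCOUNT is a placeholder for the storage account name, which this post does not give. No output is shown because it depends on the environment.

```shell
# List the cached objects from both sides of the cache bucket.
list_cached_objects() {
    # Objects as seen through the local NooBaa S3 endpoint.
    aws --no-verify-ssl --endpoint-url "${S3_ENDPOINT_URL}" \
        s3 ls "${S3_BUCKET}/" || return 1
    # Blob names in the Azure hub container. AZURE_ACCOUNT is a placeholder
    # for the storage account that owns the container.
    az storage blob list --account-name "${AZURE_ACCOUNT}" \
        --container-name azuremulticloudcachecontainer \
        --query '[].name' --output tsv
}
```

Both listings should show test_cache_txt_file_1 and test_cache_txt_file_2 if the write-through to the hub succeeded.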

The Data Tiering Use Case

In this use case we mirror the non-sensitive data onto data stores outside of the data center, while the sensitive data does not leave the data center. For this, we created two buckets: one with a policy that mirrors the data onto the public cloud backing store, and another where the data is stored only on the local Ceph backing store. In the case of data tiering, the application controls the separation and placement of the data in the appropriate buckets.
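
Because the application decides where each object goes, the routing logic can be very small. The sketch below shows one way an application-side script might pick the bucket from a sensitivity label. The bucket names and the two labels are hypothetical, and S3_ENDPOINT_URL is assumed to point at the NooBaa S3 endpoint as in the earlier examples.

```shell
# Hypothetical bucket names: one bucket's policy mirrors data to the public
# cloud, the other keeps data on the local Ceph-backed store only.
MIRRORED_BUCKET="s3://mirroredbucket"
LOCAL_ONLY_BUCKET="s3://localonlybucket"

# Map a sensitivity label to the bucket whose policy matches it.
bucket_for() {   # usage: bucket_for sensitive|non-sensitive
    case "$1" in
        sensitive)     echo "$LOCAL_ONLY_BUCKET" ;;
        non-sensitive) echo "$MIRRORED_BUCKET" ;;
        *)             echo "unknown sensitivity: $1" >&2; return 1 ;;
    esac
}

# Upload FILE to the bucket chosen by its sensitivity label.
tiered_upload() {   # usage: tiered_upload LABEL FILE
    target="$(bucket_for "$1")" || return 1
    aws --no-verify-ssl --endpoint-url "${S3_ENDPOINT_URL}" \
        s3 cp "$2" "${target}/"
}
```

Rejecting unknown labels, rather than falling back to a default bucket, avoids sending sensitive data to a mirrored bucket by accident.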

Conclusion

Managing data storage in a hybrid multicloud environment provides distributed or failover applications the ability to achieve data consistency across your fleet. It reduces storage footprint and the cost associated with data transfer across the network between public clouds and on-premise storage resources. And it leverages the scale of the cloud with the capabilities and flexibility of on-premise clusters.

In this article we discussed how to migrate data between an on-premise bare-metal OpenShift cluster and managed OpenShift clusters on ROSA and ARO, and we explored how to manage data storage in a hybrid multicloud architecture.