The need for a Hybrid cloud approach for Data Platforms

In this brief write-up we discuss why a Hybrid cloud approach for hosting data platforms is beneficial compared to a pure on-prem or cloud-only play.

Data is the fuel that drives Digital Transformation. Access to Data and the insights that can be gleaned from it provide the necessary edge for organizations to compete, thrive and innovate. Data by itself does not provide that edge, though. The ability to analyze data in a timely and cost-effective manner is the key.

A self-service data platform (a requirement in a Data Mesh architecture) is a key component in empowering organizations and their employees to perform such data analysis. As adoption of a self-service platform grows, so does the need for storage and compute. Traditional data warehouses struggle to keep up with such elastic demand.

Traditional data warehouses used to be an on-prem play. Teradata, VMware Tanzu Greenplum and Oracle Exadata are examples of on-prem implementations, and massive deployments of them exist even today in big organizations. The challenges with the on-prem model are well known: inability to scale quickly, an expensive footprint, datacenter challenges, upfront capital expenditure, the resources needed to maintain the platforms, and so on.

Over the past few years Cloud data platforms have been gaining a lot of momentum. The ability to deploy quickly, scale compute and storage independently, pay through consumption-based billing and rely on fully managed platforms are some of the advantages with which a cloud implementation overcomes these challenges.

While all of these are certainly advantageous, consumption-based billing can be a nightmare for an organization. Not knowing how much operating expense (opex) will hit in a given month or quarter is unacceptable. A single bad or unoptimized query, run by an employee, that sifts through terabytes of data or uses a large amount of compute can result in high costs.
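
To make the billing risk concrete, here is a minimal sketch of a pre-run cost check. It assumes a platform that bills per terabyte scanned; the price, the budget and the function names are hypothetical and are not taken from any vendor's price list or API.

```python
# Hypothetical pre-run cost guard for a consumption-billed platform.
# Assumption: the platform charges a flat price per terabyte scanned.

BYTES_PER_TB = 10**12


def estimate_scan_cost(bytes_scanned: int, price_per_tb: float) -> float:
    """Return the estimated cost of a query that scans `bytes_scanned` bytes."""
    return (bytes_scanned / BYTES_PER_TB) * price_per_tb


def within_budget(bytes_scanned: int, price_per_tb: float, max_cost: float) -> bool:
    """Return True if the estimated cost does not exceed the per-query budget."""
    return estimate_scan_cost(bytes_scanned, price_per_tb) <= max_cost


if __name__ == "__main__":
    # An unfiltered scan of a 40 TB table at an assumed 5.00 per TB.
    cost = estimate_scan_cost(40 * BYTES_PER_TB, price_per_tb=5.0)
    print(f"Estimated cost: {cost:.2f}")
    print("Allowed" if within_budget(40 * BYTES_PER_TB, 5.0, max_cost=50.0) else "Blocked")
```

A check like this only helps if the platform can report an estimated scan size before the query runs, which is itself an assumption here.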

Organizations that adopt a mix of on-prem and Cloud data platforms will be able to reap the benefits of both and keep control over their costs. The approach would be to run the majority of their workloads on-prem, with the ability to burst to the cloud when required. This way, when an organization is reaching capacity, it can leverage the cloud to run its workloads and, over time, work towards expanding the platform's on-prem capacity. This Hybrid cloud approach also helps organizations leverage their existing on-prem data warehouses and scale to the cloud. Finally, this approach also helps in maintaining the on-prem non-prod environments.
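
The burst-to-cloud idea described above can be sketched as a simple placement rule. This is an illustration under stated assumptions, not a real scheduler: capacity is measured in CPU cores, and the `Workload` class, the `place_workloads` function and the 80% threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    cpu_cores: int  # assumed unit of capacity for this sketch


def place_workloads(workloads, on_prem_capacity, burst_threshold=0.8):
    """Keep workloads on-prem until utilisation would pass the threshold,
    then send the remainder to the cloud."""
    limit = on_prem_capacity * burst_threshold
    used = 0
    placements = {}
    for workload in workloads:
        if used + workload.cpu_cores <= limit:
            used += workload.cpu_cores
            placements[workload.name] = "on-prem"
        else:
            placements[workload.name] = "cloud"
    return placements


if __name__ == "__main__":
    jobs = [Workload("nightly_etl", 50), Workload("reports", 20), Workload("adhoc", 20)]
    for name, target in place_workloads(jobs, on_prem_capacity=100).items():
        print(f"{name}: {target}")
```

Keeping the threshold below 100% leaves headroom on-prem, so the cloud absorbs only the overflow while on-prem capacity is expanded over time.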

A key requirement for a hybrid approach is having the same vendor-provided data platform on-prem and in the cloud. Running one platform on-prem and moving to another vendor's platform in the cloud will require separate code bases, resources with multiple skill sets, file format conversions and so on. Though there are ways to architect around this (for example, using common file formats with Spark as the processing engine in the cloud), it is not a seamless process.

Vendors who offer both on-prem and cloud versions of their software will have an advantage. A few examples are Teradata, VMware Tanzu Greenplum, Oracle Exadata and Yellowbrick.

A key challenge that still needs work is making on-prem data accessible in the cloud. Network bandwidth between on-prem and cloud continues to be a constraint. However, moving key datasets to the cloud on an incremental or daily basis is one option. The data moved to the cloud can also serve as a backup for disaster recovery.
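
One common way to implement the incremental movement mentioned above is a high-water mark: each run ships only the rows changed since the previous run. The sketch below assumes every row carries an `updated_at` timestamp; that column name and the `select_incremental` function are hypothetical, and the actual transfer to the cloud is left out.

```python
from datetime import datetime


def select_incremental(rows, last_synced_at):
    """Return the rows changed after the watermark, plus the new watermark."""
    changed = [row for row in rows if row["updated_at"] > last_synced_at]
    # If nothing changed, the watermark stays where it was.
    new_watermark = max((row["updated_at"] for row in changed), default=last_synced_at)
    return changed, new_watermark


if __name__ == "__main__":
    rows = [
        {"id": 1, "updated_at": datetime(2024, 1, 1)},
        {"id": 2, "updated_at": datetime(2024, 1, 3)},
    ]
    changed, watermark = select_incremental(rows, datetime(2024, 1, 2))
    print([row["id"] for row in changed], watermark)
```

Sending only the changed rows keeps the daily transfer small, which matters when bandwidth between on-prem and cloud is the bottleneck.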

In short, relying completely on an on-prem platform or completely on a cloud platform comes with the challenges discussed above. A hybrid approach provides the best of both worlds.