A recommended approach: Multi-cloud strategy with Snowflake

Published on: April 21, 2021
Written by: Tridant

Multi-cloud is growing in popularity as an enterprise architecture strategy, as organisations progressing their digital transformation look to assure business continuity and seamless access to applications. This approach also helps avoid vendor lock-in, enabling businesses to move between different cloud environments to maximise their flexibility.

A multi-cloud strategy enables an organisation to leverage multiple cloud computing and storage vendors within a single architecture, advancing its data maturity and achieving business goals more efficiently. For example, an organisation may hold its data in a Snowflake database hosted on Microsoft Azure and, at the same time, leverage AWS and some of its data services and tools to stream, land and ingest data into Snowflake, and to perform downstream data analytics and data science activities that inform growth strategies.

Determining the right cloud service for the job is no small task. CIOs need to ensure optimal user experiences and application availability whilst managing CapEx, scalability, governance, security, and other considerations. Cloud platform choices truly come to life with integration – the ability to run different applications, workloads, and business processes is critical.

Business Challenges

Tridant was approached by a large sports organisation facing several data challenges that prevented business stakeholders from getting to the information they needed to make agile, informed, and validated decisions:

  • Their existing on-premise data warehouse, built on Microsoft SQL Server, was unable to provide data for decision making to the business in a timely manner.
  • The internal IT team was unable to service data requests with the desired frequency to meet the organisation’s requirements, causing a bottleneck.
  • Data from the data warehouse did not match or reconcile with their source systems.
  • There was no clear or easy way to verify if the data from the data warehouse reconciled with source system data.
  • There was a growing need to load semi-structured data (JSON, AVRO, PARQUET) from various source systems.
  • There was a growing need to refresh the data more frequently than overnight, to provide closer to real-time data.
  • Data governance, and more specifically giving the right people the right access to the right data, was difficult to implement.

Solution Architecture

After intensive discovery sessions with the client, Tridant cloud architects recommended a multi-cloud architecture that would address their organisational requirements and critical goals, and comply with the client’s overarching vendor strategy.

The diagram below depicts the recommended multi-cloud architecture to resolve our client’s unique challenges.

This multi-cloud architecture was partly dictated by the organisation’s requirement to have Microsoft Azure as its main cloud vendor, while also leveraging AWS and some of its services and tools to achieve specific outcomes.

The organisation used Microsoft Active Directory and Microsoft Azure Active Directory for Identity and Access Management (IAM), and Microsoft Power BI for its reporting and analytics needs. It sourced data from a mix of internal and external data sources, both on-premise and cloud-based. Some of the cloud data sources require an Amazon Kinesis stream and an Amazon S3 bucket to deliver their data.

Over time, the solution can evolve to include other tools and technologies to achieve specific outcomes. For example, a data integration tool like Matillion may be considered if the number and complexity of data sources to be stored and analysed increase, and orchestration needs increase accordingly. Amazon SageMaker can also be considered to meet some of the data science outcomes.

Figure 1:  Cloud data architecture highlighting a multi-cloud approach

From the Data Landing Zone, we used a combination of Snowflake’s built-in capabilities, such as bulk loading, Snowpipe, stages and tasks, to load the data into a Staging Area within Snowflake. Comparing the Data Landing Zone with the Snowflake Staging Area establishes a reconciliation process, which can be both automated and exposed to a business intelligence tool like Power BI or Tableau as a report for continuous monitoring of the state of the incoming data from the data source, the quality of that data and the health of the data load process. This reconciliation report can then be used to alert stakeholders as needed.
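To make the reconciliation idea concrete, here is a minimal sketch in plain Python. It is not the client’s implementation and does not call Snowflake; it assumes that rows from the landing zone and the staging area have already been extracted as lists of dictionaries. The names `reconcile`, `row_fingerprint`, `ReconciliationResult` and the `member_id` key are hypothetical and used only for illustration.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class ReconciliationResult:
    source_rows: int
    staged_rows: int
    missing_keys: list      # keys that landed but are absent from staging
    mismatched_keys: list   # keys present in both, but with different content

    @property
    def is_clean(self) -> bool:
        return not self.missing_keys and not self.mismatched_keys


def row_fingerprint(row: dict) -> str:
    """Hash a row's content so that column order does not matter."""
    canonical = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reconcile(landed: list, staged: list, key: str) -> ReconciliationResult:
    """Compare landed rows against staged rows on a business key."""
    staged_by_key = {row[key]: row_fingerprint(row) for row in staged}
    missing, mismatched = [], []
    for row in landed:
        fingerprint = staged_by_key.get(row[key])
        if fingerprint is None:
            missing.append(row[key])
        elif fingerprint != row_fingerprint(row):
            mismatched.append(row[key])
    return ReconciliationResult(len(landed), len(staged), missing, mismatched)


# Hypothetical sample data: row 3 never reached staging, row 2 was altered.
landed = [
    {"member_id": 1, "name": "Ann", "visits": 3},
    {"member_id": 2, "name": "Bob", "visits": 5},
    {"member_id": 3, "name": "Cy", "visits": 1},
]
staged = [
    {"member_id": 1, "name": "Ann", "visits": 3},
    {"member_id": 2, "name": "Bob", "visits": 4},
]

result = reconcile(landed, staged, key="member_id")
print(result.missing_keys, result.mismatched_keys, result.is_clean)
```

A result like this could be written to a table on each load and surfaced in a BI report, so that a non-empty `missing_keys` or `mismatched_keys` list triggers an alert.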

From the Staging Area, we used data modelling techniques to model the data into a data foundation layer that forms a robust foundation for all downstream data science and data analytics applications.

An important part of this solution is the integration between the on-premise Active Directory, Azure Active Directory, Snowflake and any downstream data analytics applications like Power BI and Tableau, which alleviates the data governance concerns faced by the organisation. We also enabled single sign-on (SSO), so that a user logs in once, on their workstation, and is automatically granted access to all applications approved for that user.
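The governance goal described above, the right access to the right data for the right people, can be pictured as a mapping from directory groups to data roles, and from roles to the data they may read. The sketch below is illustrative only: the group names, role names and schema names are hypothetical, and in a real deployment this mapping would live in the identity provider and the data platform’s own access controls rather than in application code.

```python
# Hypothetical mapping from directory security groups to data roles.
GROUP_TO_ROLE = {
    "SG-Finance-Analysts": "FINANCE_READER",
    "SG-Data-Engineers": "DATA_ENGINEER",
}

# Hypothetical schemas that each role is allowed to read.
ROLE_TO_SCHEMAS = {
    "FINANCE_READER": {"FOUNDATION.FINANCE"},
    "DATA_ENGINEER": {"STAGING.RAW", "FOUNDATION.FINANCE", "FOUNDATION.MEMBERSHIP"},
}


def readable_schemas(user_groups: list) -> set:
    """Resolve a user's directory groups to the set of schemas they may read."""
    roles = {GROUP_TO_ROLE[g] for g in user_groups if g in GROUP_TO_ROLE}
    schemas = set()
    for role in roles:
        schemas |= ROLE_TO_SCHEMAS.get(role, set())
    return schemas


def can_read(user_groups: list, schema: str) -> bool:
    """Deny by default: access exists only through an explicitly mapped group."""
    return schema in readable_schemas(user_groups)


print(can_read(["SG-Finance-Analysts"], "FOUNDATION.FINANCE"))
print(can_read(["SG-Finance-Analysts"], "STAGING.RAW"))
print(can_read(["SG-Unmapped-Group"], "FOUNDATION.FINANCE"))
```

The design choice worth noting is the deny-by-default behaviour: a user in no mapped group resolves to an empty set of schemas, so access must always be granted explicitly through a group.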

Outcomes

Today, Snowflake provides a powerful and flexible data cloud solution for our client. It is able to ingest large amounts of data within minutes (if not seconds). We benchmarked the time taken to load the entire historical data set into the on-premise SQL Server data warehouse using Microsoft SSIS against the time taken by Snowflake to load the same data. Figure 2 shows some of the results:

Figure 2: Benchmark test of data load time using the on-premise SQL Server/SSIS data warehouse vs. the Snowflake solution

  • The client can now refresh the data warehouse multiple times during the day, as and when they require. It also enables the client to load in more data sets that can add value.
  • The IT bottleneck in servicing data requests has been addressed. With proper data governance controls, it was easy to assign the right access to the different stakeholders and contributors to the Snowflake data, using data analytics tools and/or direct access. Such access is recommended only after adequate training.
  • Unlike the previous on-premise solution, Snowflake provides the ability to load and store semi-structured data (JSON, AVRO, PARQUET). Snowflake allows users to store semi-structured data in its native format for analysis without transformation. Once the business understands this semi-structured data better, it can then determine whether to convert the data to structured data and model it.
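The “store raw first, model later” approach to semi-structured data described above can be illustrated with a small Python sketch. It does not use Snowflake; it only shows the kind of conversion the business might decide on once it understands the data, turning a nested JSON record into flat, column-like fields. The sample event and the `flatten` helper are hypothetical.

```python
import json


def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested objects into dotted column names.

    Lists are kept as JSON text, since deciding how to model them
    (child table, array column, etc.) is a separate design choice.
    """
    flat = {}
    for name, value in record.items():
        column = f"{prefix}{name}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{column}."))
        elif isinstance(value, list):
            flat[column] = json.dumps(value)
        else:
            flat[column] = value
    return flat


# Hypothetical raw event, kept exactly as it arrived from the source system.
raw_event = '{"event": "ticket_scan", "venue": {"id": 7, "gate": "B"}, "tags": ["vip", "season"]}'

row = flatten(json.loads(raw_event))
print(sorted(row))
```

Because the raw event is retained unchanged, the flattening rules can be revised later without reloading data from the source.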

Word of caution

A multi-cloud strategy has complex logistical and operational considerations, with inherent challenges that need to be understood and worked through:

  • Cloud vendors levy egress charges for taking data out of their cloud and moving it to another vendor’s cloud.
  • Leveraging multiple cloud vendors increases security concerns, especially in ensuring security in a heterogeneous environment.
  • A multi-cloud strategy needs to be adopted after taking a broad look at the data strategy, to ensure that cloud vendors are chosen to achieve specific business outcomes.
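As a rough way to reason about the egress point above, the following sketch estimates a monthly cross-cloud transfer cost. The per-gigabyte rate and the daily volume are placeholders, not any vendor’s actual pricing; real rates vary by vendor, region and volume tier, and should be taken from the vendor’s current price list.

```python
def monthly_egress_cost(gb_per_day: float, rate_per_gb: float, days: int = 30) -> float:
    """Estimate the monthly cost of moving data out of one cloud into another.

    rate_per_gb is an assumed flat rate; tiered pricing is not modelled here.
    """
    if gb_per_day < 0 or rate_per_gb < 0 or days < 0:
        raise ValueError("inputs must be non-negative")
    return round(gb_per_day * days * rate_per_gb, 2)


# Placeholder figures for illustration only.
cost = monthly_egress_cost(gb_per_day=50, rate_per_gb=0.09)
print(f"Estimated monthly egress cost: {cost:.2f}")
```

Running an estimate like this for each cross-cloud data flow in the architecture helps show which flows are worth keeping within a single vendor’s cloud.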

Conclusions

  • A multi-cloud approach not only enables an organisation to avoid vendor bias and lock-in, it also allows the organisation to cherry-pick the solutions that each cloud technology offers, to suit its unique business requirements.
  • Snowflake data cloud allows an organisation to remain cloud vendor agnostic as it can be hosted on AWS, Azure and Google Cloud platforms.
  • While widely documented benefits outweigh the cons, an organisation must always assess the challenges of a multi-cloud environment, including managing costs and pathways to mitigate security risks.
  • Data governance should form the basis of any data application, be it on-premise or cloud. Modern cloud data tools like Snowflake allow for a simple code-based integration process with Active Directory, AD FS and Okta, and the steps are well documented on the Snowflake documentation site.
  • A data cloud application like Snowflake empowers organisations to ingest both structured and semi-structured data and to create a data foundation layer in rapid agile cycles, and it offers a clear process for access control. It is a user-friendly self-service solution, placing data into the hands of approved users and stakeholders who have been provided with the necessary skills and confidence to use that data effectively.
  • Snowflake also enables teams and stakeholders to load large amounts of data quickly. This capability enables the loading of business data multiple times during a day, to suit stakeholder demands for updated or real-time data.

Prioritising business resilience, productivity, and agility

The value of a single vendor-specific cloud architecture is in question, particularly as contracts, costs, and capacity issues struggle to keep pace with the flexibility embedded in a multi-cloud strategy.

As organisations assess their cloud architecture to optimally manage cost, performance and scalability, multi-cloud will continue to gain momentum.

Assure uninterrupted availability, mitigate risk, and optimise security in your cloud strategy. Talk with Tridant cloud platform architects today.

Oswald Almeida | Michelle Susay
