Using Azure Databricks to Facilitate a Modern Data Architecture

Published on: 3 August 2022
Written by: Bhavna Gupta

With Azure Databricks gaining popularity in the data community, it is worth looking at how Azure Databricks and Delta Lake integrate with other Azure services, such as Azure Data Factory and Azure Data Lake, to facilitate a modern, scalable, flexible and cost-effective data and analytics architecture.

Modern Analytics Architecture

In this architecture, data flows as follows:

1. Source data arrives in one of two ways:

  • As a continuous flow (Streaming Data): for example, data generated by sensors and IoT (Internet of Things) devices.
  • Data that arrives in batches (Batch Data): for example, data generated from media, logs, files, custom applications, etc.

2. Raw streaming data is ingested into Azure Databricks using Azure Event Hubs.

3. Raw batch data is ingested into Azure Data Lake using Azure Data Factory (ADF) pipelines, which can also automate and schedule these data loads.

4. ADF pipelines are also used to schedule and trigger Azure Databricks notebooks. Once the data lands in Azure Data Lake (ADL), an Azure Databricks notebook:

  • reads and transfers the raw data from ADL to Bronze Delta Lake format,
  • cleans, filters and transforms the data from Bronze Delta Lake format and loads it to Silver Delta Lake format,
  • finally, aggregates the data from Silver Delta Lake format and loads it into Gold Delta Lake format.

5. The aggregated data is then delivered to analytical tools such as Power BI for analysis and for generating dashboards and reports.

6. Throughout this process, Azure monitoring and governance services can be used for the following purposes:

  • Azure Key Vault: to manage secrets, keys, and certificates.
  • Azure Active Directory: to provide single sign-on (SSO) for Azure Databricks users and to automate user provisioning tasks such as managing user access and associated privileges.
  • Azure Monitor: to identify problems and maximize performance and reliability across the Azure services being used.
  • Azure Cost Management and Billing: to financially govern Azure workloads.
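
The Bronze, Silver and Gold stages in step 4 can be illustrated with a minimal sketch. This is not the notebook code itself: in Azure Databricks these stages would normally operate on Spark DataFrames stored in Delta Lake format, whereas the example below uses plain Python lists and dictionaries so that it runs anywhere. The sample records, the field names (device, reading, source) and the function names are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical raw records as they might land in the data lake (illustrative data only).
RAW_ROWS = [
    {"device": "sensor-a", "reading": "21.5"},
    {"device": "sensor-a", "reading": "22.5"},
    {"device": "sensor-b", "reading": "n/a"},   # malformed reading
    {"device": "", "reading": "19.0"},          # missing device id
    {"device": "sensor-b", "reading": "18.0"},
]


def to_bronze(raw_rows):
    """Bronze: keep the raw values unchanged, adding only load metadata."""
    return [dict(row, source="adl-landing") for row in raw_rows]


def to_silver(bronze_rows):
    """Silver: drop invalid rows and cast readings to float."""
    silver = []
    for row in bronze_rows:
        if not row["device"]:
            continue  # filter out rows without a device id
        try:
            reading = float(row["reading"])
        except ValueError:
            continue  # filter out readings that are not numeric
        silver.append({"device": row["device"], "reading": reading})
    return silver


def to_gold(silver_rows):
    """Gold: aggregate to one value per device (the average reading)."""
    grouped = defaultdict(list)
    for row in silver_rows:
        grouped[row["device"]].append(row["reading"])
    return {device: sum(vals) / len(vals) for device, vals in grouped.items()}


bronze = to_bronze(RAW_ROWS)
silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Keeping each stage as a separate function mirrors the layered design: Bronze preserves the raw data so that the Silver and Gold layers can be rebuilt if cleaning or aggregation rules change.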

Organisations in the following industries can benefit from the above solution:

  • Energy sector
  • Retail and e-commerce
  • Banking and Finance
  • Medicine and Healthcare
  • Insurance

Databricks Integration

Azure Databricks integrates well with other tools in the Azure data ecosystem. It is a flexible solution for any use case that involves analysing large volumes of data arriving from several sources in a variety of formats, and combining that data for analysis.

Contact us if you would like to discuss how implementing Azure Databricks can facilitate a modern data and analytics architecture in your business.

Copyright © Tridant Pty Ltd.