What is Azure Databricks?
Azure Databricks is an analytics and data engineering platform based on Apache Spark that runs as a managed service in Microsoft Azure. It is used to process large data volumes, build lakehouses and develop machine learning models, and is tightly integrated with Azure storage and services.
Also known as: Databricks · Databricks on Azure · Spark platform · lakehouse platform
Where Azure Databricks is used
Azure Databricks provides scalable compute clusters for processing very large data volumes. With the open table format Delta Lake, teams build lakehouses that offer ACID transactions and good query performance. On the platform, they develop data pipelines (ETL/ELT), refine data along a medallion architecture and train machine learning models.
As a managed service in Azure, Databricks is tightly connected with Azure storage, security and identity services and scales compute on demand. The refined data is frequently analyzed in Power BI.
A practical example
In the dy Project AG data platform for a large construction project worth over CHF 1 billion, Azure Databricks served as the central processing platform. Data from SQL Server, Excel and REST APIs was integrated there and refined along a medallion architecture (bronze, silver, gold) before being made available as a validated basis for Power BI reporting.
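The bronze, silver and gold layering described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the project's actual pipeline: the records, field names (`cost_center`, `amount_chf`) and cleaning rules are hypothetical, and in Databricks each layer would typically be a Delta table processed with Spark rather than a Python list.

```python
from collections import defaultdict

# Bronze: hypothetical raw records, kept exactly as they arrived from each source.
bronze = [
    {"source": "sql_server", "cost_center": "A100", "amount_chf": "1200.50"},
    {"source": "excel", "cost_center": "a100 ", "amount_chf": "300"},
    {"source": "rest_api", "cost_center": "B200", "amount_chf": "not available"},
    {"source": "rest_api", "cost_center": "B200", "amount_chf": "950.00"},
]


def to_silver(rows):
    """Silver: normalize keys, parse amounts, skip rows that cannot be parsed."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount_chf"])
        except ValueError:
            # A real pipeline would usually quarantine such rows for review.
            continue
        cleaned.append({
            "source": row["source"],
            "cost_center": row["cost_center"].strip().upper(),
            "amount_chf": amount,
        })
    return cleaned


def to_gold(rows):
    """Gold: aggregate to a reporting-ready shape (total per cost center)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["cost_center"]] += row["amount_chf"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Each layer only reads from the one before it, so a cleaning rule can be corrected in the silver step and the gold result recomputed without touching the raw data.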
How it relates & how smiit uses it
Azure Databricks is a powerful processing and lakehouse platform, whereas Microsoft Fabric is a broader, integrated analytics offering; both can be combined. Databricks is not the reporting tool itself but provides the refined data that Power BI, for example, visualizes via a semantic model. ETL/ELT, the medallion architecture and data modeling are put into practice in Databricks. smiit uses Azure Databricks when large data volumes, demanding transformations or machine learning require a powerful, scalable platform.
Common mistakes & misconceptions
- Azure Databricks is not just hosted Spark; it is a lakehouse platform with Delta Lake, collaborative notebooks and integrated governance.
- Many believe Databricks is only for data scientists. It equally serves data engineering, ETL/ELT and analytics over structured and unstructured data.
- A common error is to assume clusters must run permanently. Without auto-termination and right-sizing, idle or oversized clusters quickly drive costs unnecessarily high.
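The last point can be made concrete with a cluster definition. The sketch below builds a cluster specification as a Python dictionary and checks it for the two cost guards mentioned above. The field names `autoscale`, `min_workers`, `max_workers` and `autotermination_minutes` follow the Databricks Clusters API; the cluster name, the placeholder values and the helper `check_cost_guards` are hypothetical and only serve the illustration.

```python
import json

# Field names follow the Databricks Clusters API; all values are hypothetical.
cluster_spec = {
    "cluster_name": "etl-nightly",         # hypothetical name
    "spark_version": "<runtime-version>",  # placeholder, choose a supported runtime
    "node_type_id": "<azure-vm-size>",     # placeholder, choose a suitable VM size
    "autoscale": {"min_workers": 2, "max_workers": 8},
    "autotermination_minutes": 30,
}


def check_cost_guards(spec):
    """Return a list of warnings for missing cost guards (hypothetical helper)."""
    warnings = []
    # In this sketch, a missing or zero value counts as "no auto-termination".
    if spec.get("autotermination_minutes", 0) <= 0:
        warnings.append("auto-termination is not configured")
    autoscale = spec.get("autoscale")
    if autoscale is None:
        warnings.append("no autoscale range; cluster size is fixed")
    elif autoscale["min_workers"] > autoscale["max_workers"]:
        warnings.append("autoscale range is inverted")
    return warnings


print(json.dumps(cluster_spec, indent=2))
print(check_cost_guards(cluster_spec))
```

A fixed cluster size is not wrong in itself; the check merely flags it so that the choice is made deliberately.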
Frequently asked questions
What is the difference between Azure Databricks and Microsoft Fabric?
Azure Databricks is specialized in powerful data engineering, large data volumes and data science. Microsoft Fabric is a broader, integrated platform with tight Power BI integration. Both use lakehouse concepts and can be combined.
Do you need programming skills for Azure Databricks?
For demanding pipelines, knowledge of languages such as Python, SQL or Scala is helpful. smiit contributes this expertise so companies can use the platform without having to build deep Spark know-how themselves.
What is Delta Lake in the context of Azure Databricks?
Delta Lake is an open table format that gives a lakehouse transactional safety, versioning and good query performance. It forms the storage foundation on which reliable data pipelines and a medallion architecture are built in Databricks.
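To illustrate what versioning means for a table, here is a toy model in plain Python: every write is appended as a new commit, and a reader can ask for the table as it looked at any earlier version. This is a conceptual sketch only and not how Delta Lake is implemented; the class `VersionedTable` and its methods are hypothetical.

```python
class VersionedTable:
    """Toy append-only commit log; a conceptual model, not Delta Lake itself."""

    def __init__(self):
        self._commits = []  # one entry per version: the rows added by that commit

    def commit(self, rows):
        """Append a batch of rows as one unit and return the new version number."""
        self._commits.append(list(rows))
        return len(self._commits) - 1

    def read(self, version=None):
        """Return all rows visible at the given version (latest by default)."""
        if version is None:
            if not self._commits:
                return []
            version = len(self._commits) - 1
        if not 0 <= version < len(self._commits):
            raise ValueError(f"unknown version: {version}")
        return [row for batch in self._commits[: version + 1] for row in batch]


table = VersionedTable()
v0 = table.commit([{"id": 1, "status": "planned"}])
v1 = table.commit([{"id": 2, "status": "approved"}])

print(table.read())            # latest state: both rows
print(table.read(version=v0))  # state as of the first commit: one row
```

Because earlier commits are never modified, a report can be reproduced against the exact version of the data it was originally built on.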
How does scaling in Azure Databricks affect costs?
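The compute clusters scale up and down on demand, so costs follow the compute that is actually running: Databricks Units (DBUs) plus the underlying Azure virtual machines. A cluster that sits idle still incurs costs until it is terminated. Clusters that shut down automatically when idle, together with appropriately sized clusters, are therefore the main levers for keeping costs under control.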
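A rough calculation shows how strongly auto-termination affects the bill. All figures below (VM rate, DBU consumption, DBU price, node count, hours) are hypothetical and chosen only to make the arithmetic easy to follow; actual prices depend on region, VM size and pricing tier.

```python
def daily_cost(hours, nodes, vm_rate, dbu_per_node_hour, dbu_price):
    """Estimate one day's cluster cost: VM charge plus DBU charge per node-hour."""
    cost_per_node_hour = vm_rate + dbu_per_node_hour * dbu_price
    return hours * nodes * cost_per_node_hour


# Hypothetical assumptions, not real Azure or Databricks prices.
NODES = 4                 # driver plus workers
VM_RATE = 0.50            # currency units per node-hour for the VM
DBU_PER_NODE_HOUR = 1.0   # DBUs consumed per node-hour
DBU_PRICE = 0.30          # currency units per DBU

# Cluster left running around the clock.
always_on = daily_cost(24, NODES, VM_RATE, DBU_PER_NODE_HOUR, DBU_PRICE)

# Six hours of actual work plus 30 minutes idle before auto-termination.
auto_terminated = daily_cost(6.5, NODES, VM_RATE, DBU_PER_NODE_HOUR, DBU_PRICE)

print(f"always on:       {always_on:.2f}")
print(f"auto-terminated: {auto_terminated:.2f}")
print(f"saving:          {1 - auto_terminated / always_on:.0%}")
```

The same function can be reused to compare node counts, which is the right-sizing half of the same question.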