Unified Data Service: Databricks Runtime

Unified Data Service 

 
The Databricks Unified Data Service (UDS) is the engine powering the work practitioners do 
in the Data Science Workspace. The UDS is made up of the Databricks Runtime, Delta Lake, 
and Databricks Ingest.  
 
Databricks Runtime 
The Databricks Runtime is the processing engine of the Unified Data Analytics Platform 
(UDAP). It’s built on an optimized version of Apache Spark that has been tuned to run 
faster than open-source Apache Spark, and it is designed to run more reliably in the UDAP 
than in any other environment.  
 
The Databricks Runtime runs on auto-scaling infrastructure. In other words, it “knows” 
when it needs more processing power and can get that processing power on its own, 
without waiting for human involvement. It also “knows” when to scale back down, and even 
when to terminate, once it detects that it’s not being used. This matters because it 
reduces the compute resources consumed and, therefore, the money spent.  
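
The scale-up, scale-down, and terminate-when-idle behavior described above can be sketched 
as a small decision function. This is only an illustrative toy, not the Databricks 
Runtime's actual algorithm: the function name `desired_workers`, its parameters, and every 
threshold value are assumptions made up for this example.

```python
def desired_workers(pending_tasks, idle_minutes,
                    min_workers=2, max_workers=8,
                    tasks_per_worker=4, idle_timeout_minutes=30):
    """Return the worker count to move to; 0 means terminate the cluster.

    All names and default values here are hypothetical, chosen for illustration.
    """
    # Terminate once there is no work and the cluster has been idle long enough.
    if pending_tasks == 0 and idle_minutes >= idle_timeout_minutes:
        return 0
    # Workers needed to cover the backlog (ceiling division).
    needed = -(-pending_tasks // tasks_per_worker)
    # Stay within the configured lower and upper bounds.
    return max(min_workers, min(max_workers, needed))


print(desired_workers(pending_tasks=40, idle_minutes=0))   # 8: scale up, capped at max_workers
print(desired_workers(pending_tasks=4, idle_minutes=0))    # 2: scale down to min_workers
print(desired_workers(pending_tasks=0, idle_minutes=45))   # 0: idle too long, terminate
```

The point of the sketch is that no human is in the loop: the decision depends only on 
observed load and idle time.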
 
Delta Lake 
Organizations that use Databricks use a data lake as their long-term data storage solution. 
Delta Lake adds a layer of intelligence to that data lake by maintaining a transaction 
log. In addition to the usual functionality of a data lake for data storage, you get 
functionality that helps prepare data for machine learning and data science workflows, 
plus added performance. These benefits include:  
● Reliability - Delta Lake helps organizations clean up the raw data kept in 
data lakes so that data practitioners doing analytics use only 
high-quality data. It also maintains historical versions of data so that you can 
see how data has changed over time.  
● Easier data management - Delta Lake includes ways for organizations to 
delete or modify data records, which is important for compliance purposes.  
● Connections to visualization tools - Data analysts can work in their 
preferred visualization tools by connecting directly to Delta Lake. This way, 
they can take advantage of the UDS compute power in tools that they 
already work with.  
● Faster performance - Delta Lake is faster than most typical storage formats.  
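
To make the role of the transaction log more concrete, here is a minimal sketch of how an 
append-only log of commits can provide both historical versions and record deletion. It is 
a pure-Python toy and not Delta Lake's implementation or API: the class 
`ToyVersionedTable` and its methods are hypothetical names invented for this example.

```python
class ToyVersionedTable:
    """Toy table backed by an append-only log of commits (hypothetical, for illustration)."""

    def __init__(self):
        self._log = []  # one entry per committed version

    def append(self, rows):
        """Commit new rows and return the new version number."""
        self._log.append({"add": list(rows), "remove": []})
        return len(self._log) - 1

    def delete_where(self, predicate):
        """Commit the removal of every current row matching predicate."""
        removed = [row for row in self.read() if predicate(row)]
        self._log.append({"add": [], "remove": removed})
        return len(self._log) - 1

    def read(self, version=None):
        """Rebuild the table by replaying the log up to and including version."""
        if version is None:
            version = len(self._log) - 1  # latest version by default
        rows = []
        for commit in self._log[: version + 1]:
            rows = [row for row in rows if row not in commit["remove"]]
            rows.extend(commit["add"])
        return rows


table = ToyVersionedTable()
v0 = table.append([{"id": 1, "email": "a@example.com"},
                   {"id": 2, "email": "b@example.com"}])
v1 = table.delete_where(lambda row: row["id"] == 2)

print(table.read())            # latest version: only the row with id 1
print(table.read(version=v0))  # earlier version: both rows are still visible
```

Because every change is a new log entry rather than an in-place overwrite, earlier 
versions stay readable. In this toy, a deleted record also remains visible in older 
versions, so a real compliance workflow would additionally need to purge that history.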
 
 
Databricks Ingest 
Databricks Ingest gives organizations the ability to bring data together for use in 
analytics, both from their own data stores and from third parties (see the Data Ingestion 
Network partners in the image below). It does this by incrementally ingesting real-time 
data into their Delta Lakes (data lakes built with Delta Lake) from a variety of data 
sources, including:  
● Applications like Salesforce, Marketo, Zendesk, SAP, Google Analytics 
● Databases and event streaming systems like Kafka, Cassandra, Oracle, MySQL, MongoDB 
● File storage systems like Amazon S3, Azure Data Lake Storage, Google Cloud 
Storage 
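
The idea of incremental ingestion can be sketched as follows: each run picks up only the 
files that earlier runs have not already loaded. The source does not describe how 
Databricks Ingest does this internally, so this is only a toy under assumed names; 
`ingest_new_files`, the checkpoint set, and the file names are all hypothetical.

```python
def ingest_new_files(available_files, already_ingested):
    """Return (files to ingest in this run, updated checkpoint set).

    Hypothetical helper: `already_ingested` plays the role of a checkpoint
    that remembers which files earlier runs have loaded.
    """
    new_files = sorted(f for f in available_files if f not in already_ingested)
    return new_files, already_ingested | set(new_files)


checkpoint = set()

# First run: everything in the source location is new.
batch1, checkpoint = ingest_new_files(["a.json", "b.json"], checkpoint)
print(batch1)  # ['a.json', 'b.json']

# Second run: one more file has arrived, and only that file is ingested.
batch2, checkpoint = ingest_new_files(["a.json", "b.json", "c.json"], checkpoint)
print(batch2)  # ['c.json']
```

Tracking what has already been loaded is what keeps repeated runs cheap: the cost of each 
run scales with the newly arrived data rather than with the full history.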
