Unified Data Service: Databricks Runtime
The Databricks Unified Data Service (UDS) is the engine powering the work practitioners do
in the Data Science Workspace. The components that make up the UDS are the Databricks
Runtime, Delta Lake, and Databricks Ingest.
Databricks Runtime
The Databricks Runtime is the processing engine of the Unified Data Analytics Platform
(UDAP). It’s built on an optimized version of Apache Spark that is tuned to run faster
than open-source Apache Spark, and it is designed to run more reliably in the UDAP than
in other environments.
The Databricks Runtime runs on auto-scaling infrastructure. In other words, it “knows”
when it needs more processing power, and can get that processing power on its own,
without waiting for human involvement. It also “knows” when to scale back down, and even
to terminate when it detects that it’s not being used, which reduces the computational
resources consumed and, therefore, the money spent.
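As a rough illustration of the behavior described above, a cluster definition can state the range of workers the runtime may scale between and how long it may sit idle before terminating. The sketch below is an assumption-laden example, not an official template: the helper build_cluster_spec is hypothetical, the field names follow the Databricks Clusters API as commonly documented (autoscale, min_workers, max_workers, autotermination_minutes), and the runtime version and node type are placeholders to be filled in for a real workspace.

```python
import json


def build_cluster_spec(name, min_workers, max_workers, idle_minutes):
    """Return a cluster definition with auto-scaling and auto-termination set.

    Hypothetical helper for illustration only; it builds a plain dict and
    does not call any Databricks service.
    """
    if not 0 <= min_workers <= max_workers:
        raise ValueError("expected 0 <= min_workers <= max_workers")
    return {
        "cluster_name": name,
        # Placeholders: choose a runtime version and node type that exist
        # in your own workspace.
        "spark_version": "<runtime-version>",
        "node_type_id": "<node-type>",
        # The cluster may grow and shrink between these two bounds.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # Terminate the cluster after this many idle minutes.
        "autotermination_minutes": idle_minutes,
    }


spec = build_cluster_spec("example-cluster", 2, 8, 30)
print(json.dumps(spec, indent=2))
```

Setting a lower and an upper bound, rather than a fixed size, is what lets the platform add capacity under load and release it when demand drops.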
Delta Lake
Organizations that use Databricks use a data lake as their long-term data storage solution.
Delta Lake adds intelligence, via a transaction log, to your data lake. This means that in
addition to getting the usual functionality of a data lake for data storage, you get additional
functionality that helps prepare data for machine learning and data science workflows,
plus added performance. These benefits include:
● Delta reliability - Delta Lake helps organizations clean up the raw data kept in
data lakes to ensure that data practitioners using data for analytics only use
high-quality data. It also maintains historical versions of data so that you can
see how data has changed over time.
● Easier data management - Delta Lake includes ways for organizations to
delete or modify data records, which is important for compliance purposes.
● Connections to visualization tools - Data analysts can work directly in their
preferred visualization tools by connecting directly to Delta Lake. This way,
they can take advantage of the UDS compute power in tools that they
already work with.
● Faster performance - Delta Lake is faster than most typical storage formats.
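To make the role of the transaction log more concrete, here is a deliberately simplified toy model in Python. It is not how Delta Lake is implemented (the real log records file-level actions, not individual rows), and the class ToyTable and its methods are hypothetical names invented for this sketch. It only illustrates the idea that an append-only log of changes lets you delete records and still read the table as it looked at an earlier version.

```python
class ToyTable:
    """Toy model of a table backed by an append-only transaction log."""

    def __init__(self):
        self.log = []  # one entry per committed version

    def commit(self, action, key, value=None):
        """Append one change and return the version number it created."""
        self.log.append((action, key, value))
        return len(self.log) - 1

    def snapshot(self, version=None):
        """Rebuild the table contents as of a version (latest by default)."""
        if version is None:
            version = len(self.log) - 1
        rows = {}
        # Replay the log from the start up to and including `version`.
        for action, key, value in self.log[: version + 1]:
            if action == "upsert":
                rows[key] = value
            elif action == "delete":
                rows.pop(key, None)
        return rows


table = ToyTable()
v0 = table.commit("upsert", "user-1", {"country": "DE"})
v1 = table.commit("upsert", "user-2", {"country": "US"})
v2 = table.commit("delete", "user-1")  # e.g. a compliance deletion

current = table.snapshot()                  # state after the delete
before_delete = table.snapshot(version=v1)  # historical version
```

Because every change is a new log entry and nothing is overwritten, the same structure supports both record-level changes and a view of how the data changed over time.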
Databricks Ingest
Databricks Ingest lets organizations bring data together for use in analytics, both from
their own data stores and from third parties (see the Data Ingestion Network partners in
the image below). It does this by incrementally ingesting real-time data into their Delta
Lakes (data lakes built with Delta Lake) from a variety of data sources, including:
● Applications like Salesforce, Marketo, Zendesk, SAP, Google Analytics
● Databases and streaming systems like Kafka, Cassandra, Oracle, MySQL, MongoDB
● File storage systems like Amazon S3, Azure Data Lake Storage, Google Cloud
Storage
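The idea of incremental ingestion can be sketched in a few lines of plain Python. This is a conceptual illustration only, not the Databricks Ingest implementation: the function ingest_new_files, the in-memory checkpoint, and the file paths are all hypothetical. The point it shows is that each run loads only the files that have not been seen before, instead of reloading everything.

```python
def ingest_new_files(available_files, checkpoint, load):
    """Load only files not seen before; return the files ingested this run.

    `checkpoint` is a set of already-ingested paths and `load` is any
    callable that takes one path. Both are stand-ins for real storage.
    """
    new_files = sorted(set(available_files) - checkpoint)
    for path in new_files:
        load(path)
        checkpoint.add(path)  # record progress after each successful load
    return new_files


checkpoint = set()
loaded = []  # stands in for the target table

# First run: both files are new, so both are loaded.
first_run = ingest_new_files(
    ["s3://example-bucket/a.json", "s3://example-bucket/b.json"],
    checkpoint,
    loaded.append,
)

# Second run: only the newly arrived file is loaded.
second_run = ingest_new_files(
    [
        "s3://example-bucket/a.json",
        "s3://example-bucket/b.json",
        "s3://example-bucket/c.json",
    ],
    checkpoint,
    loaded.append,
)
```

Tracking what has already been processed is what keeps repeated runs cheap and avoids duplicate records in the target table.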