Welcome to Scribd!

M4 - Summary

Uploaded by

0% found this document useful (0 votes)

22 views5 pages

This document summarizes a course on data lakes and data warehouses. It discusses key concepts including the role of data engineers in building data pipelines to enable stakeholders to make decisions using data. The primary advantages of doing data engineering in the cloud are the ability to separate compute and storage, use of serverless products, and not having to manage infrastructure. It also defines data lakes and data warehouses, and discusses Google Cloud options for data lakes, data warehouses, and data pipelines including Cloud Storage, BigQuery, and reference architectures.

Original Description:

M4 - Summary

Copyright

Available Formats

PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as pdf or txt

0% found this document useful (0 votes)

22 views5 pages

M4 - Summary

Uploaded by

Edgar Sanchez

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as pdf or txt

Jump to Page

You are on page 1of 5

Search inside document

Data Lakes and Data

Warehouses Course
Summary

Let’s review some keys concepts we covered in this course on data lakes and data
warehouses.
Module summary
● Data engineers build data
pipelines

● The customers of a data

engineer are all the people who
make decisions with data

The primary role of a data engineer is to build data pipelines. The ultimate purpose of
a data pipeline is to enable stakeholders in a business to use data to make faster and
better decisions to improve their business. While the role of a data engineer is not
new, being able to build data pipelines entirely in the cloud is relatively new.
Module summary
● The three primary advantages of
doing data engineering in the
cloud are:

○ Ability to separate compute

and storage
○ Serverless products
○ Not having to manage
infrastructure

We argue that doing data engineering in the cloud is advantageous because you can
separate compute from storage, and you don’t have to worry about managing
infrastructure and even software. This allows you to spend more time on what
matters; getting insights from data.
Module summary
● Difference between a data lake
and data warehouse

● Google Cloud Storage as a data

lake solution

● Getting your data to Google

Cloud

We introduced data lakes and data warehouses and the key differences between the
two. At a high level, a data lake is a place to store unprocessed data. A data
warehouse is a place to store transformed data that you ultimately want to use for
analytics, machine learning, and dashboards. Next, we discussed Cloud Storage as
the data lake solution on GCP in some technical depth. We also presented other GCP
solutions for low-latency requirements, transactional workloads, and structured data.
Module summary
● BigQuery as a data warehouse
solution

● Differences between ETL, ELT

and EL

● Google Cloud reference

architectures for ETL, ELT and
EL

Finally, we introduced BigQuery as the data warehouse solution on Google Cloud. We

discussed partitioning and clustering in BigQuery as techniques for improving query
performance. Also, we talked about EL, ELT, and ETL and how these relate to data
lakes and warehouses. Finally, we presented some reference architectures on GCP
for streaming and batch data pipelines. The hope is that these reference architectures
serve as a starting point for your data pipeline.

Affine - Interview Questions - 20170223 v2
Document12 pages
Affine - Interview Questions - 20170223 v2
Shashank Reddy
No ratings yet
Modern Data Warehouse Proposal
Document3 pages
Modern Data Warehouse Proposal
Juan Pablo Garicoits
No ratings yet
Infrastructure And Operations Automation A Complete Guide - 2020 Edition
From Everand
Infrastructure And Operations Automation A Complete Guide - 2020 Edition
Gerardus Blokdyk
No ratings yet
Saas Brochure
Document2 pages
Saas Brochure
Vivek Jaiswal
No ratings yet
Teaching Science IN The Elementary Grades (Biology and Chemistry)
Document23 pages
Teaching Science IN The Elementary Grades (Biology and Chemistry)
Eddie Mamusog Awit
78% (9)
M3 - Building A Data Warehouse
Document123 pages
M3 - Building A Data Warehouse
Edgar Sanchez
No ratings yet
M1 - Introduction To Analytics and AI
Document21 pages
M1 - Introduction To Analytics and AI
Edgar Sanchez
No ratings yet
M2 - Prebuilt ML Model APIs For Unstructured Data
Document18 pages
M2 - Prebuilt ML Model APIs For Unstructured Data
Edgar Sanchez
No ratings yet
Google Next 2017 Keybook
Document53 pages
Google Next 2017 Keybook
Rodolfo Saldias
No ratings yet
M3 - Serverless Data Processing With Cloud Dataflow
Document70 pages
M3 - Serverless Data Processing With Cloud Dataflow
Edgar Sanchez
No ratings yet
Module 5 - Deriving Insights From Unstructured Data Using Machine Learning
Document91 pages
Module 5 - Deriving Insights From Unstructured Data Using Machine Learning
Edgar Sanchez
No ratings yet
M1 - Introduction To Building Batch Data Pipelines
Document31 pages
M1 - Introduction To Building Batch Data Pipelines
Edgar Sanchez
No ratings yet
M4 - High-Throughput BigQuery and Bigtable Streaming Features
Document41 pages
M4 - High-Throughput BigQuery and Bigtable Streaming Features
Edgar Sanchez
No ratings yet
Module 2 - Product Recommendations Using Cloud SQL and Spark
Document98 pages
Module 2 - Product Recommendations Using Cloud SQL and Spark
Edgar Sanchez
No ratings yet
M4 - Manage Data Pipelines With Cloud Data Fusion and Cloud Composer
Document95 pages
M4 - Manage Data Pipelines With Cloud Data Fusion and Cloud Composer
Edgar Sanchez
No ratings yet
Business Case For EA - Iserver
Document23 pages
Business Case For EA - Iserver
Nishant Kulshrestha
No ratings yet
Fin 27
Document17 pages
Fin 27
suggestionbox
No ratings yet
Module 1 - Introduction To Google Cloud Platform
Document94 pages
Module 1 - Introduction To Google Cloud Platform
Edgar Sanchez
No ratings yet
The Right Database
Document30 pages
The Right Database
Chingore Paul
No ratings yet
M2 - Serverless Messaging With Cloud Pub - Sub
Document38 pages
M2 - Serverless Messaging With Cloud Pub - Sub
Edgar Sanchez
No ratings yet
How Google Big Query Changed The Game
Document11 pages
How Google Big Query Changed The Game
Priya Sharma
No ratings yet
Dataengineering - v2.0 - PDF - 2 - Batch Processing of Data With Spark and Hadoop On GCP - M2 - Executing Spark On Cloud Dataproc
Document67 pages
Dataengineering - v2.0 - PDF - 2 - Batch Processing of Data With Spark and Hadoop On GCP - M2 - Executing Spark On Cloud Dataproc
Edgar Sanchez
No ratings yet
Business Intelligence On The Cloud PDF
Document15 pages
Business Intelligence On The Cloud PDF
jua1991
No ratings yet
GCP Fund Module 7 Developing Deploying and Monitoring in The Cloud
Document15 pages
GCP Fund Module 7 Developing Deploying and Monitoring in The Cloud
Rayzach Arjuna
No ratings yet
Digital Transformation Concepts
Document11 pages
Digital Transformation Concepts
Tarini Suri
No ratings yet
A - Aws - G C: Loud Omparison
Document5 pages
A - Aws - G C: Loud Omparison
Mohit Kumar
No ratings yet
NIT5082 Cloud Security Lecture 9: Google Cloud Platform (GCP)
Document37 pages
NIT5082 Cloud Security Lecture 9: Google Cloud Platform (GCP)
seenu933
No ratings yet
Google Cloud Professional Architect Case-Studies
Document13 pages
Google Cloud Professional Architect Case-Studies
Alfredo Alberto
No ratings yet
This Is Cloud Data Management
Document2 pages
This Is Cloud Data Management
Rayan
No ratings yet
Cloud Data Warehouse Modernization For Powercenter
Document13 pages
Cloud Data Warehouse Modernization For Powercenter
ajmaladesh
No ratings yet
Cloud Solution Provider
Document70 pages
Cloud Solution Provider
adrian gonzalez
No ratings yet
Big Data Architectural Patterns and Best Practices On AWS Presentation
Document56 pages
Big Data Architectural Patterns and Best Practices On AWS Presentation
munnaz
No ratings yet
Google App Engine
Document14 pages
Google App Engine
PARAGJAIN000
100% (1)
C. Dropout Methods
Document8 pages
C. Dropout Methods
Adiputra Sinaga
100% (1)
Cloud Architect - How To Build Architectural Diagrams of Google Cloud Platform (GCP) - by Sphe Malgas - Medium
Document15 pages
Cloud Architect - How To Build Architectural Diagrams of Google Cloud Platform (GCP) - by Sphe Malgas - Medium
nnaemeken
No ratings yet
Best Practices in Data Modeling PDF
Document31 pages
Best Practices in Data Modeling PDF
leeanntoto1118
No ratings yet
Prep4Sure: Real Exam & Prep4Sureexam Is For Well Preparation of It Exam
Document7 pages
Prep4Sure: Real Exam & Prep4Sureexam Is For Well Preparation of It Exam
Muralidharan Krishnamoorthy
No ratings yet
Well Architected Lakehouse Workshop
Document49 pages
Well Architected Lakehouse Workshop
ekanshmmax
No ratings yet
Mittr-X-Databricks Survey-Report Final 090823
Document25 pages
Mittr-X-Databricks Survey-Report Final 090823
Grisha Karunas
No ratings yet
Cloud Management Buyer's Guide 2019
Document36 pages
Cloud Management Buyer's Guide 2019
hatemshams3394
No ratings yet
GCP Fund Module 2 Getting Started With Google Cloud Platform
Document48 pages
GCP Fund Module 2 Getting Started With Google Cloud Platform
Trần Đình Bảo
No ratings yet
Gartner and Data Pipelines
Document11 pages
Gartner and Data Pipelines
dgloyn_cox
100% (1)
Google Certkey Professional-Data-Engineer v2021-04-19 by Charlotte 93q
Document65 pages
Google Certkey Professional-Data-Engineer v2021-04-19 by Charlotte 93q
Sunil Nain
No ratings yet
PLAYBOOK - Data & AI - Migrate and Modernize Data Estate
Document5 pages
PLAYBOOK - Data & AI - Migrate and Modernize Data Estate
anh
No ratings yet
Coach IT Brochure International v3
Document17 pages
Coach IT Brochure International v3
Mario Henrique Nascimento
No ratings yet
GCP Fund Module 7 Developing, Deploying, and Monitoring in The Cloud
Document15 pages
GCP Fund Module 7 Developing, Deploying, and Monitoring in The Cloud
Nahian Chowdhury
No ratings yet
Behind The Cloud - 5 Keys To Building A True Cloud Service
Document5 pages
Behind The Cloud - 5 Keys To Building A True Cloud Service
Suresh Kumar
No ratings yet
Gartner Report 2020 - Business Intelligence Applications
Document39 pages
Gartner Report 2020 - Business Intelligence Applications
Ram Sidh
No ratings yet
GCP Fund Module 5 Containers in The Cloud
Document62 pages
GCP Fund Module 5 Containers in The Cloud
pradana fajrin riyanda
No ratings yet
Agile-Software-Development - 2020
Document26 pages
Agile-Software-Development - 2020
Shekhar Soni
No ratings yet
Google Cloud
Document22 pages
Google Cloud
Jose Luiz M Souza
No ratings yet
PWC Cloud Computing and Revenue Recognition Whitepaper PDF
Document14 pages
PWC Cloud Computing and Revenue Recognition Whitepaper PDF
wclonn
No ratings yet
Data Pipeline Essentials: See Ya Later
Document6 pages
Data Pipeline Essentials: See Ya Later
Dev
No ratings yet
Delivering Reliable Software in The Enterprise: The Complete Guide To
Document16 pages
Delivering Reliable Software in The Enterprise: The Complete Guide To
asmuchanga
No ratings yet
Oracle Cloud Marketplace
Document10 pages
Oracle Cloud Marketplace
brunohf1208
No ratings yet
Claranet - WP - Business Case For Cloud Computing
Document12 pages
Claranet - WP - Business Case For Cloud Computing
Praveen GY
No ratings yet
Big Query
Document5 pages
Big Query
Milena Zurita Campos
No ratings yet
Fundamentals of Big Data Engineering: A Guide To The
Document14 pages
Fundamentals of Big Data Engineering: A Guide To The
hemanth katkam
No ratings yet
2020-11-Apps Modernization With Google Cloud
Document15 pages
2020-11-Apps Modernization With Google Cloud
Bala Subramanyam
No ratings yet
Discriminative and Generative Models in Machine Learning
Document9 pages
Discriminative and Generative Models in Machine Learning
Eknath Dake
No ratings yet
AppDynamics Third Edition
From Everand
AppDynamics Third Edition
Gerardus Blokdyk
No ratings yet
Budget Laboratory - Part 2 - ISCSI Virtual SAN With FreeNAS 8 - Fix The Exchange!
Document31 pages
Budget Laboratory - Part 2 - ISCSI Virtual SAN With FreeNAS 8 - Fix The Exchange!
Edgar Sanchez
No ratings yet
Budget Laboratory - Part 2 - IsCSI Virtual SAN With FreeNAS 8 - Fix The Exchange!
Document47 pages
Budget Laboratory - Part 2 - IsCSI Virtual SAN With FreeNAS 8 - Fix The Exchange!
Edgar Sanchez
No ratings yet
Module 1 - Introduction To Google Cloud Platform
Document94 pages
Module 1 - Introduction To Google Cloud Platform
Edgar Sanchez
No ratings yet
Dataengineering - v2.0 - PDF - 2 - Batch Processing of Data With Spark and Hadoop On GCP - M2 - Executing Spark On Cloud Dataproc
Document67 pages
Dataengineering - v2.0 - PDF - 2 - Batch Processing of Data With Spark and Hadoop On GCP - M2 - Executing Spark On Cloud Dataproc
Edgar Sanchez
No ratings yet
M1 - Introduction To Data Engineering
Document65 pages
M1 - Introduction To Data Engineering
Edgar Sanchez
No ratings yet
M1 - Introduction To Building Batch Data Pipelines
Document31 pages
M1 - Introduction To Building Batch Data Pipelines
Edgar Sanchez
No ratings yet
M4 - Manage Data Pipelines With Cloud Data Fusion and Cloud Composer
Document95 pages
M4 - Manage Data Pipelines With Cloud Data Fusion and Cloud Composer
Edgar Sanchez
No ratings yet
Module 2 - Product Recommendations Using Cloud SQL and Spark
Document98 pages
Module 2 - Product Recommendations Using Cloud SQL and Spark
Edgar Sanchez
No ratings yet
Module 5 - Deriving Insights From Unstructured Data Using Machine Learning
Document91 pages
Module 5 - Deriving Insights From Unstructured Data Using Machine Learning
Edgar Sanchez
No ratings yet
M2 - Prebuilt ML Model APIs For Unstructured Data
Document18 pages
M2 - Prebuilt ML Model APIs For Unstructured Data
Edgar Sanchez
No ratings yet
Module 4 - Real-Time Dashboards With Pub - Sub, Dataflow, and Data Studio
Document69 pages
Module 4 - Real-Time Dashboards With Pub - Sub, Dataflow, and Data Studio
Edgar Sanchez
No ratings yet
M3 - Building A Data Warehouse
Document123 pages
M3 - Building A Data Warehouse
Edgar Sanchez
No ratings yet
Module 3 - Predicting Visitor Purchases With Classification Using BigQuery ML
Document112 pages
Module 3 - Predicting Visitor Purchases With Classification Using BigQuery ML
Edgar Sanchez
No ratings yet
M3 - Big Data Analytics With Cloud AI Platform Notebooks
Document18 pages
M3 - Big Data Analytics With Cloud AI Platform Notebooks
Edgar Sanchez
No ratings yet
M3 - Serverless Data Processing With Cloud Dataflow
Document70 pages
M3 - Serverless Data Processing With Cloud Dataflow
Edgar Sanchez
No ratings yet
M5 - Advanced BigQuery Functionality and Performance
Document58 pages
M5 - Advanced BigQuery Functionality and Performance
Edgar Sanchez
No ratings yet
M1 - Introduction To Analytics and AI
Document21 pages
M1 - Introduction To Analytics and AI
Edgar Sanchez
No ratings yet
Vmware Basic Support Datasheet
Document1 page
Vmware Basic Support Datasheet
Edgar Sanchez
No ratings yet
M2 - Serverless Messaging With Cloud Pub - Sub
Document38 pages
M2 - Serverless Messaging With Cloud Pub - Sub
Edgar Sanchez
No ratings yet
M4 - High-Throughput BigQuery and Bigtable Streaming Features
Document41 pages
M4 - High-Throughput BigQuery and Bigtable Streaming Features
Edgar Sanchez
No ratings yet
M1 - Introduction To Processing Streaming Data
Document16 pages
M1 - Introduction To Processing Streaming Data
Edgar Sanchez
No ratings yet
M 7
Document41 pages
M 7
Edgar Sanchez
No ratings yet
M3 - Cloud Dataflow Streaming Features
Document28 pages
M3 - Cloud Dataflow Streaming Features
Edgar Sanchez
No ratings yet
R502 MgmtCard CLI 5660-000013 RevR PDF
Document153 pages
R502 MgmtCard CLI 5660-000013 RevR PDF
Edgar Sanchez
No ratings yet
ZFS Implementation and Operations Guide: December 2016 (Edition 1.0) Fujitsu Limited
Document75 pages
ZFS Implementation and Operations Guide: December 2016 (Edition 1.0) Fujitsu Limited
Edgar Sanchez
No ratings yet
Support by Product Matrix VMware
Document2 pages
Support by Product Matrix VMware
Edgar Sanchez
No ratings yet
Nbu 76 SCL PDF
Document101 pages
Nbu 76 SCL PDF
Edgar Sanchez
No ratings yet
Manage of Ovm Server Sparc With The VM Manager, Opscenter or VDCF
Document56 pages
Manage of Ovm Server Sparc With The VM Manager, Opscenter or VDCF
Edgar Sanchez
No ratings yet
T5C Us
Document2 pages
T5C Us
Edgar Sanchez
No ratings yet
Oracle® Communications Performance Intelligence Center: Feature Guide
Document90 pages
Oracle® Communications Performance Intelligence Center: Feature Guide
Edgar Sanchez
No ratings yet
Divide and Conquer-Based 1D CNN Human Activity Rec
Document24 pages
Divide and Conquer-Based 1D CNN Human Activity Rec
Trương Vỉ Bùi
No ratings yet
Concept Note
Document2 pages
Concept Note
vikas_psy
No ratings yet
I Used To Think Now I Think Template
Document3 pages
I Used To Think Now I Think Template
api-535301747
No ratings yet
3 References
Document2 pages
3 References
Princess Fatima De Juan
No ratings yet
Notes of CH 10 Gravitation - Class 9th Science
Document19 pages
Notes of CH 10 Gravitation - Class 9th Science
Anuj
No ratings yet
KSSR Translation
Document23 pages
KSSR Translation
aran92
No ratings yet
Introduction To Arabic Egyptian Arabic For First Year Students 1689797941
Document424 pages
Introduction To Arabic Egyptian Arabic For First Year Students 1689797941
17st03
No ratings yet
Teaching Organ in An Holistic Manner Laver
Document6 pages
Teaching Organ in An Holistic Manner Laver
manuel1971
No ratings yet
Nagpandayan Es Grade 6 Project An Post
Document6 pages
Nagpandayan Es Grade 6 Project An Post
Jhun Santiago
No ratings yet
Reaction Paper
Document2 pages
Reaction Paper
Gary Tanato
No ratings yet
Arunoday
Document182 pages
Arunoday
lallilal
No ratings yet
Racy D. Caparas: National Historical Commission of The Philippines
Document7 pages
Racy D. Caparas: National Historical Commission of The Philippines
irvin2030
No ratings yet
Template Artikel Jurnal
Document4 pages
Template Artikel Jurnal
yustinus erasi rosario
No ratings yet
0 Lesson Plan 6
Document11 pages
0 Lesson Plan 6
Melania Anghelus
No ratings yet
Application For Petitioned Class: College of Engineering and Architecture
Document7 pages
Application For Petitioned Class: College of Engineering and Architecture
John A. Ceniza
No ratings yet
Learning Plan Sample
Document17 pages
Learning Plan Sample
ruclito morata
No ratings yet
Math6 q1 Mod2 SDOv2
Document24 pages
Math6 q1 Mod2 SDOv2
Yna Ford
No ratings yet
Expert Classes I Offer On
Document2 pages
Expert Classes I Offer On
LearnItLive
No ratings yet
Mak CoBAMS Postgraduate Courses Admission List 2023 2024
Document33 pages
Mak CoBAMS Postgraduate Courses Admission List 2023 2024
isaac
No ratings yet
Republic of The Philippines Department of Education Region MIMAROPA Division of Palawan
Document5 pages
Republic of The Philippines Department of Education Region MIMAROPA Division of Palawan
Vince Torremoro Cabebe
No ratings yet
PR2 - Thesis Group 5
Document15 pages
PR2 - Thesis Group 5
Aloe-Jewel Foronda
No ratings yet
Detailed Lesson Plan by Sitoy CHS 3 Block A
Document8 pages
Detailed Lesson Plan by Sitoy CHS 3 Block A
Sherwin D. Sungcalang
No ratings yet
Lecture Notes in Practical Research 2 Part 1
Document11 pages
Lecture Notes in Practical Research 2 Part 1
Charline A. Radislao
75% (8)
Plantilla of Teachers 2016
Document6 pages
Plantilla of Teachers 2016
Jhasper FLores
No ratings yet
Learning Guide Assessment 1 National University Hospital Pte LTD
Document5 pages
Learning Guide Assessment 1 National University Hospital Pte LTD
Vksathiamoorthy Krishnan
No ratings yet
9073 48105 1 PB
Document15 pages
9073 48105 1 PB
Heru Setiawan
No ratings yet
Teologi Rasional Ijtihadi Dan Dogmatis Taqlidi Penalaran Filsafat Kalam
Document22 pages
Teologi Rasional Ijtihadi Dan Dogmatis Taqlidi Penalaran Filsafat Kalam
arya kamandanu
No ratings yet
Accounting Assignment 3
Document9 pages
Accounting Assignment 3
Divyesh Bundela
No ratings yet
Certificate of Participation Imee Loren S. Abonales: Jaclyn Joyce T. Bascar Gertrudes Anastacia D. Lopez, Ed. D
Document35 pages
Certificate of Participation Imee Loren S. Abonales: Jaclyn Joyce T. Bascar Gertrudes Anastacia D. Lopez, Ed. D
Imee Loren Abonales
No ratings yet