Introduction to Google Cloud Big Data Platform
Lecturer: PhD. Tran Minh Quang
Data Engineering - Group 12
● Projects: Top-level containers in the Google Cloud Platform that store the data
● Datasets: Within projects, datasets hold one or more tables of data
● Tables: Within datasets, tables are row-column structures that hold actual data
● Jobs: The tasks you perform on the data, such as running queries, loading data, and exporting data
Example: BigQuery, Datasets, and Tables
● Here is an example of the left-pane navigation within BigQuery
● Projects are identified by the project name, for example ‘bigquery-public-data’
● You can expand projects to see the corresponding datasets, for example ‘github_repos’
● Tables are referenced by their project and dataset as: <project>.<dataset>.<table>
○ for example ‘bigquery-public-data.github_repos.contents’
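The <project>.<dataset>.<table> convention can be sketched in Python as below. This is only an illustration: `TableRef` and `parse_table_ref` are hypothetical helper names, and the sketch assumes that none of the three parts contains a dot.

```python
from typing import NamedTuple


class TableRef(NamedTuple):
    """The three parts of a fully qualified table reference."""
    project: str
    dataset: str
    table: str


def parse_table_ref(ref: str) -> TableRef:
    """Split '<project>.<dataset>.<table>' into its three parts.

    Assumption: no part contains a dot. Surrounding backticks, as used
    when quoting a table name in a query, are stripped first.
    """
    parts = ref.strip("`").split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <project>.<dataset>.<table>, got {ref!r}")
    return TableRef(*parts)


ref = parse_table_ref("bigquery-public-data.github_repos.contents")
print(ref.project)  # bigquery-public-data
print(ref.dataset)  # github_repos
print(ref.table)    # contents
```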
Accessing BigQuery
● Web UI (bigquery.cloud.google.com)
● Console/command line (the bq tool, installed with the gcloud CLI)
● Third-party tools
○ Tableau
○ QlikView
○ R
○ Excel
○ …
● RESTful API
RESTful API
● Methods and HTTP requests for datasets
● Methods and HTTP requests for jobs
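As a rough illustration of how method names map to HTTP requests, the sketch below builds request URLs for a few dataset and job methods. The paths follow the public BigQuery v2 REST reference as best understood here and should be checked against the current documentation. `ENDPOINTS` and `build_request` are hypothetical names, and the code only builds strings; it sends nothing and handles no authentication.

```python
# Assumed base URL of the BigQuery v2 REST API.
BASE_URL = "https://bigquery.googleapis.com/bigquery/v2"

# (HTTP method, path template) for a small subset of methods, not the full API.
ENDPOINTS = {
    "datasets.list": ("GET", "/projects/{projectId}/datasets"),
    "datasets.get": ("GET", "/projects/{projectId}/datasets/{datasetId}"),
    "datasets.insert": ("POST", "/projects/{projectId}/datasets"),
    "jobs.list": ("GET", "/projects/{projectId}/jobs"),
    "jobs.get": ("GET", "/projects/{projectId}/jobs/{jobId}"),
    "jobs.insert": ("POST", "/projects/{projectId}/jobs"),
    "jobs.query": ("POST", "/projects/{projectId}/queries"),
}


def build_request(method_name, **params):
    """Return (HTTP method, full URL) for one of the methods above."""
    http_method, template = ENDPOINTS[method_name]
    return http_method, BASE_URL + template.format(**params)


# "my-project" is a placeholder project ID.
print(build_request("datasets.list", projectId="my-project"))
print(build_request("jobs.get", projectId="my-project", jobId="job_123"))
```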
BigQuery Architecture: Dremel
● Data model/Storage
● Query execution
Data model/Storage
● Columnar Storage
● Nested/Repeated Fields
● No indexing => single full table scan from disk
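Why columnar storage helps a full scan can be sketched with a toy table. The rows below are made up for illustration, and `to_columns` is a hypothetical helper, not part of BigQuery. The point is only that a query touching one column reads far fewer values from a column layout than from a row layout.

```python
# Illustrative flat table (values are made up for this sketch).
rows = [
    {"title": "A", "author": "x", "eur": 19},
    {"title": "B", "author": "y", "eur": 12},
    {"title": "C", "author": "z", "eur": 11},
]


def to_columns(rows):
    """Pivot row-oriented records into one list per column.

    Assumes every row has the same keys.
    """
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: a scan for "eur" still walks every field of every row.
values_read_row_layout = sum(len(row) for row in rows)

# Column layout: the same scan reads only the "eur" column.
values_read_column_layout = len(columns["eur"])
total_eur = sum(columns["eur"])

print(values_read_row_layout, values_read_column_layout, total_eur)  # 9 3 42
```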
BOOK 1:
  AUTHOR: Dumas
  TITLE: The Three Musketeers
  PRICE:
    DISCOUNT: 0
    USD: 20
    EUR: 19
BOOK 2:
  AUTHOR: Yrsa Sigurdardottir
  AUTHOR: Tina Flecken
  AUTHOR: Elma Klein
  TITLE: Feuernacht
BOOK 3:
  TITLE: Get Fit, Stay Fit
  PRICE:
    DISCOUNT: 0
    EUR: 12
  PRICE:
    DISCOUNT: 1
    EUR: 11
Columnar Representation
● AUTHOR
○ Dumas (0, 1)
○ Yrsa Sigurdardottir (0, 1)
○ Tina Flecken (1, 1)
○ Elma Klein (1, 1)
○ NULL (0, 0)
● TITLE
○ The Three Musketeers (0, 1)
○ Feuernacht (0, 1)
○ Get Fit, Stay Fit (0, 1)
● PRICE.DISCOUNT
○ 0 (0, 2)
○ NULL (0, 0)
○ 0 (0, 2)
○ 1 (1, 2)
● PRICE.USD
○ 20 (0, 2)
○ NULL (0, 0)
○ NULL (0, 1)
○ NULL (1, 1)
● PRICE.EUR
○ 19 (0, 2)
○ NULL (0, 0)
○ 12 (0, 2)
○ 11 (1, 2)
R = In the path to the field, what is the last repeated field?
D = In the path to the field, how many defined fields?
Names in parentheses are fields that are missing from the record.

                                    FIELD                 R  D
BOOK 1:
  AUTHOR: Dumas                     AUTHOR                0  1
  TITLE: The Three Musketeers       TITLE                 0  1
  PRICE:
    DISCOUNT: 0                     PRICE.DISCOUNT        0  2
    USD: 20                         PRICE.USD             0  2
    EUR: 19                         PRICE.EUR             0  2
BOOK 2:
  AUTHOR: Yrsa Sigurdardottir       AUTHOR                0  1
  AUTHOR: Tina Flecken              AUTHOR[1]             1  1
  AUTHOR: Elma Klein                AUTHOR[2]             1  1
  TITLE: Feuernacht                 TITLE                 0  1
  (PRICE)
    (DISCOUNT): NULL                (PRICE).(DISCOUNT)    0  0
    (EUR): NULL                     (PRICE).(EUR)         0  0
    (USD): NULL                     (PRICE).(USD)         0  0
BOOK 3:
  (AUTHOR): NULL                    (AUTHOR)              0  0
  TITLE: Get Fit, Stay Fit          TITLE                 0  1
  PRICE:
    DISCOUNT: 0                     PRICE.DISCOUNT        0  2
    EUR: 12                         PRICE.EUR             0  2
    (USD): NULL                     PRICE.(USD)           0  1
  PRICE:
    DISCOUNT: 1                     PRICE[1].DISCOUNT     1  2
    EUR: 11                         PRICE[1].EUR          1  2
    (USD): NULL                     PRICE[1].(USD)        1  1
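The R and D values for the three book records can be reproduced with a small sketch. This is a simplified illustration of Dremel-style column striping, not BigQuery's actual implementation. It assumes every field is either optional or repeated (no required fields), so each defined step in a path adds one to D. `SCHEMA`, `BOOKS`, `stripe`, and `emit_null` are hypothetical names.

```python
# Field name -> (mode, children); children is None for a leaf field.
SCHEMA = {
    "AUTHOR": ("repeated", None),
    "TITLE": ("optional", None),
    "PRICE": ("repeated", {
        "DISCOUNT": ("optional", None),
        "USD": ("optional", None),
        "EUR": ("optional", None),
    }),
}

BOOKS = [
    {"AUTHOR": ["Dumas"], "TITLE": "The Three Musketeers",
     "PRICE": [{"DISCOUNT": 0, "USD": 20, "EUR": 19}]},
    {"AUTHOR": ["Yrsa Sigurdardottir", "Tina Flecken", "Elma Klein"],
     "TITLE": "Feuernacht"},
    {"TITLE": "Get Fit, Stay Fit",
     "PRICE": [{"DISCOUNT": 0, "EUR": 12}, {"DISCOUNT": 1, "EUR": 11}]},
]


def emit_null(path, children, columns, r, d):
    """Write NULL with levels (r, d) into every leaf column under `path`."""
    if children is None:
        columns.setdefault(".".join(path), []).append((None, r, d))
    else:
        for name, (_, grandchildren) in children.items():
            emit_null(path + (name,), grandchildren, columns, r, d)


def stripe(record, schema, columns, path=(), r=0, d=0, rep_depth=0):
    """Append (value, R, D) triples for one record to the leaf columns."""
    for name, (mode, children) in schema.items():
        field_path = path + (name,)
        child_d = d + 1  # one more defined field on the path
        child_rep_depth = rep_depth + (1 if mode == "repeated" else 0)
        value = record.get(name)
        if mode == "repeated":
            values = value or []
        else:
            values = [] if value is None else [value]
        if not values:
            # Missing field: D stays at the number of fields defined so far.
            emit_null(field_path, children, columns, r, d)
            continue
        for i, item in enumerate(values):
            # The first item inherits R; later items repeat at this field.
            cur_r = r if i == 0 else child_rep_depth
            if children is None:
                columns.setdefault(".".join(field_path), []).append(
                    (item, cur_r, child_d))
            else:
                stripe(item, children, columns, field_path,
                       cur_r, child_d, child_rep_depth)


columns = {}
for book in BOOKS:
    stripe(book, SCHEMA, columns)

for name, triples in columns.items():
    print(name, triples)
```

Each printed column lists (value, R, D) triples in record order, with None standing in for NULL.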
Query execution
● Tree architecture
● Uses tens of thousands of machines over Google’s petabit network (more than 1 petabit/s)
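The tree architecture can be pictured with a toy aggregation: leaf servers scan their own shard and return a partial result, and each level above merges the partials it receives until the root holds the answer. The sketch below is an assumption-laden simplification. `leaf_scan`, `merge`, and `run_tree` are hypothetical names, everything runs in one process, and the shard values are made up.

```python
def leaf_scan(shard):
    """Leaf server: scan one shard and return a partial (sum, count)."""
    return sum(shard), len(shard)


def merge(partials):
    """Intermediate or root server: combine partial (sum, count) results."""
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total, count


def run_tree(shards, fanout=2):
    """Aggregate level by level until a single root result remains.

    Assumes at least one shard and a fanout of 2 or more.
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = [leaf_scan(shard) for shard in shards]
    while len(level) > 1:
        level = [merge(level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
    return level[0]


# Four shards of made-up values, merged through a binary tree.
shards = [[19, 12], [11], [20, 5], [3]]
total, count = run_tree(shards)
print(total, count)  # 70 6
```

Because sum and count can be merged in any grouping, the same result comes back for any fanout, which is what lets the work be spread over many machines.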
DEMO
References
● https://www.oreilly.com/library/view/google-bigquery-the/9781492044451/
● https://cloud.google.com/files/BigQueryTechnicalWP.pdf
● https://cloud.google.com/bigquery/docs/
● https://www-conf.slac.stanford.edu/xldb2012/talks/xldb2012_tue_1415_RyanBoyd.pdf