
Project Overview

Project Vector at Google is the first large data management deployment on Google Cloud
Platform (GCP) and also one of the largest Salesforce implementations for Deloitte.

Key Data Impact Indicators

• 80 million Leads and Campaign Members.
• 12 million Accounts.
• Overall, 180+ million records processed using GCP.

Note:- The above metrics are measured after data clean-up/mastering.

Applications:-
Informatica MDM, Dell Boomi, GCP and Salesforce.

Copyright © 2018 Deloitte Development LLC. All rights reserved.


Legacy Architecture design

Implemented Architecture

Google Compute Engine (VM):-
Virtual machine used to access different servers/applications across the organization.
Comes with gsutil preinstalled.

gsutil:-
Command-line tool used to move files to and from Google Storage.

Frequently used commands


gsutil cp / gsutil cp -r / gsutil -m cp -r
gsutil ls
gsutil rb (removes a bucket; the bucket must be empty)
gsutil rm
gsutil rsync / gsutil rsync -r / gsutil rsync -r -d

Note:- If you have a large number of files to move or copy, consider the gsutil -m option, which runs the operation in parallel (multi-threaded/multi-processing).
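
As a rough illustration of how the commands above are assembled, the following Python sketch builds gsutil argument lists without running them. The helper name gsutil_command, the local folder and the bucket name are hypothetical and are not part of the project. The sketch only encodes what the list above shows: -m is a top-level option placed before the subcommand, -r follows the subcommand, and -d is used with rsync.

```python
from typing import List


def gsutil_command(action: str, *args: str, recursive: bool = False,
                   parallel: bool = False, delete_extra: bool = False) -> List[str]:
    """Build an argv list for a gsutil call. Nothing is executed here."""
    cmd = ["gsutil"]
    if parallel:
        # -m is a top-level gsutil option, so it goes before the subcommand.
        cmd.append("-m")
    cmd.append(action)
    if recursive:
        cmd.append("-r")
    if delete_extra:
        # In this sketch -d is only allowed for rsync, where it deletes
        # destination files that are missing from the source.
        if action != "rsync":
            raise ValueError("-d only applies to rsync in this sketch")
        cmd.append("-d")
    cmd.extend(args)
    return cmd


# Hypothetical local folder and bucket, for illustration only.
print(" ".join(gsutil_command("cp", "./exports", "gs://example-bucket/exports",
                              recursive=True, parallel=True)))
print(" ".join(gsutil_command("rsync", "./exports", "gs://example-bucket/exports",
                              recursive=True, delete_extra=True)))
```

The resulting list can be passed to subprocess.run on a machine where gsutil is installed and authenticated. Since rsync -d deletes files at the destination, it is worth checking the source and destination paths before running it.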

Google Dataprep:-
Google Cloud Dataprep is a managed cloud service for quick data exploration and transformation. Dataprep makes it easy to clean and transform large datasets for analysis.

Can import data only from BigQuery and Google Storage.

Google Storage :-
Can store data of any type and any size.

BigQuery:-
Enterprise data warehouse that can query gigabytes of data in seconds.

Points to note:-

• Batch inserts are free, but streaming inserts incur extra charges, currently $0.05 per GB sent.
• Query cost depends on the amount of data scanned, not the amount of data returned.
• Best suited for ELT.
• Select only the columns you need.
• String comparison in a WHERE clause: for case-insensitive filters, avoid LOWER() and UPPER() and use a regular-expression match instead, REGEXP_MATCH() in legacy SQL or REGEXP_CONTAINS() in standard SQL.
• ORDER BY is the most expensive clause, so think twice before using it.
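
The points above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's code: the table name, column names and helper names are hypothetical, the $0.05 per GB rate is the figure quoted above and may have changed, and the filter assumes BigQuery standard SQL, where REGEXP_CONTAINS accepts an RE2 pattern and a leading (?i) makes it case-insensitive.

```python
from typing import Sequence

# Rate quoted in the notes above; check current pricing before relying on it.
STREAMING_USD_PER_GB = 0.05


def streaming_insert_cost(gb_sent: float) -> float:
    """Estimated streaming-insert charge in USD for the given volume."""
    return gb_sent * STREAMING_USD_PER_GB


def build_query(table: str, columns: Sequence[str],
                filter_column: str, pattern: str) -> str:
    """Build a query that names its columns and filters case-insensitively."""
    if not columns:
        # Naming columns keeps the scanned (and billed) data small.
        raise ValueError("name the columns you need instead of SELECT *")
    column_list = ", ".join(columns)
    # (?i) makes the pattern case-insensitive, so LOWER()/UPPER() is not needed.
    return (f"SELECT {column_list} FROM `{table}` "
            f"WHERE REGEXP_CONTAINS({filter_column}, r'(?i){pattern}')")


# Hypothetical table and columns, for illustration only.
print(streaming_insert_cost(200))
print(build_query("my_project.crm.accounts", ["account_id", "name"], "name", "^acme"))
```

The query is built as a plain string to keep the sketch self-contained. For values that come from users, query parameters are a safer choice than string formatting.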
