
FPT SOFTWARE Enrich Role-based Skills Program

Data Engineer Advance


Enrich Role-based Skills Training Program

1. Introduction
Data engineers work in a variety of settings to build systems that collect, manage, and convert raw data into usable
information for data scientists and business analysts.

2. Objectives

Learn advanced knowledge, methods, and techniques required for engineering and managing data and databases, and dive deeper into the advanced data integration features available in the cloud.

Learners in this program will:
- Learn a wide range of OSS big data tools for the various stages of big data projects.
- Learn advanced Spark techniques through hands-on examples.
- Build a full data pipeline in the cloud with Google Cloud Platform or Azure.
- Learn to preprocess, wrangle and visualize data for practical data science applications in Python.
- Learn emerging data processing techniques: streaming, IoT data…

3. Skills Take-Away

- Big data tools/solutions
- Advanced Spark
- Build Data Management Platform on Cloud
- Data Architecture

4. Program Information

- Duration: 2.5 to 3 months
- Skills: 65% courses and 35% projects
- Time: 7~10 hours/week
- Approximate hours: 71 hours
- Entry Level: Intermediate
- Achievement Level: Advanced

5. Prerequisites

In addition to foundation-level knowledge, skills and practices, learners are required to have:
- Fundamentals of Scala
- Fundamentals of database technology
- Fundamentals of a cloud platform
- Docker

6. Required Technology Stack

- Scala
- Azure/Google Cloud
- Docker
7. Mentor’s Contact Points

- Name: Nguyen Dang Duy
- Mobile: +84 988997033
- Email: duynd4@fsoft.com.vn

CTC - Corporation of Training Center Syllabus Template



8. Training Syllabus

The program provides work-in-progress courses and sections covering the skills of the data engineer role, including but not limited to the following topics:

Each entry gives the No. and role-based skill with its hours, vendor and lessons covered, followed by the course name and its topics.
I. Section 1: Big data landscape & Advanced Spark and Python (21 hours)
1. Big data foundation (4 hours, Udemy, lessons: All)
Course: A foundation course for big data that covers the big data tools for the various stages of a big data project.
- Big data characteristics
- Big data storage
- Big data ingestion
- Big data analytics
- Big data visualization, security and vendors
- Big data project
2. Introduction to Apache NiFi | Cloudera DataFlow (2 hours, Udemy, lessons: All)
Course: Apache NiFi - An Introductory Course to Learn Installation, Basic Concepts and Efficient Streaming of Big Data Flows
- Introduction to NiFi and first concepts
- Getting started with NiFi
- Apache NiFi in depth
- Annexes

3. Taming Big Data with Apache Spark and Python - Hands On! (7 hours, Udemy, lessons: All)
Course: Apache Spark tutorial with 20+ hands-on examples of analyzing large data sets on your desktop or on Hadoop with Python
- Getting started with Spark
- Spark basics and the RDD interface
- Spark SQL, DataFrames and Datasets
- Running Spark on a cluster
- Machine learning with Spark ML
- Spark Streaming, Structured Streaming and GraphX

4. Mock Project 1 (8 hours, self-paced)
Use NiFi to ingest data from many sources and process it with Spark.
II. Section 2: Cloud, learners select Azure or GCP (29~33 hours)

1. GCP - Google Cloud Professional Data Engineer Certification (17 hours, Udemy, lessons: All)
Course: Learn Google Cloud Professional Data Engineer Certification with 80+ hands-on demos on storage, database and ML GCP services
- Grasp basic data engineering and database concepts
- Provision basic GCP infrastructure services: VMs, containers, GKE, GAE, Cloud Run
- Learn various storage products such as Cloud Storage, Disk and Filestore for unstructured data
- Structured data solutions: Cloud SQL, Spanner, BigQuery
- Store massive semi-structured data in Bigtable and Datastore
- Deploy data pipelines on different data processing products: Dataflow (Apache Beam), Dataproc (Hadoop/Spark), Data Fusion, Composer (Airflow)
- Cleanse, wrangle and prepare your data with Dataprep
- Machine learning basics and the related GCP products
- Search data assets in Data Catalog
- Visualize data by creating reports and dashboards with Google Data Studio
- Use prebuilt ML APIs (Vision, Language, Speech) in your application
- Apply AutoML to your own data to build a model
- Build custom machine learning models with notebooks and the scikit-learn library
- Deploy scikit-learn and TensorFlow models as endpoints for prediction
- Detect sensitive PII data with the Data Loss Prevention (DLP) API
- Store, process and analyze petabyte-scale data with Google's data warehousing solution, BigQuery
- Decouple applications through asynchronous communication with Google Cloud Pub/Sub
- Store data in memory for faster access with Memorystore for Redis
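The idea behind decoupling applications with asynchronous messaging can be sketched in-process with Python's standard `queue` module: the producer publishes messages and moves on, while a separate consumer processes them at its own pace. This is only an analogy on a single machine; Cloud Pub/Sub is a managed, distributed service with topics and subscriptions, and the names below (`producer`, `consumer`, `run`) are hypothetical, not part of its client library.

```python
import queue
import threading

_STOP = object()  # sentinel telling the consumer there are no more messages


def producer(q, messages):
    """Publish messages without waiting for them to be processed."""
    for message in messages:
        q.put(message)
    q.put(_STOP)


def consumer(q, results):
    """Process messages independently of the producer's pace."""
    while True:
        message = q.get()
        if message is _STOP:
            break
        results.append(message.upper())  # stand-in for real processing


def run(messages):
    """Start a consumer thread, publish all messages, wait for processing."""
    q = queue.Queue()
    results = []
    worker = threading.Thread(target=consumer, args=(q, results))
    worker.start()
    producer(q, messages)
    worker.join()
    return results


if __name__ == "__main__":
    print(run(["order-created", "order-paid"]))
```

In a real deployment the in-memory queue would be replaced by a Pub/Sub topic and subscription, and producer and consumer would run as separate services, so either side can be scaled or restarted without the other.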
2. Data Engineering on Microsoft Azure 2021 (23 hours, Udemy, lessons: All)
Course: Learn the various Azure services that pertain to data engineering
- The purpose of an Azure Data Lake Storage Gen2 account
- Basics of Transact-SQL commands
- How to work with Azure Synapse, including building a data warehouse in a dedicated SQL pool
- How to build an ETL pipeline with Azure Data Factory, with various scenarios for creating mapping data flows
- How to stream data with Azure Stream Analytics, and how SQL commands can be used on streaming data
- Basics of the Scala programming language and Spark
- How to work with Spark and Scala in Azure Databricks, including working with notebooks and streaming data into Azure Databricks
- The security measures and monitoring aspects to consider when working with Azure services
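Streaming SQL of the kind used with Azure Stream Analytics typically aggregates events over time windows, for example with `GROUP BY TumblingWindow(second, 10)`. The plain-Python sketch below illustrates what a tumbling (fixed-size, non-overlapping) window computes; it is not Stream Analytics code, and the `(timestamp, key)` event format and the `tumbling_window_counts` function are assumptions for illustration.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    `events` is an iterable of (timestamp_in_seconds, key) pairs. Each event
    belongs to exactly one window, identified here by the window's start time.
    """
    counts = Counter()
    for ts, key in events:
        window_start = (ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    events = [(1, "a"), (4, "a"), (9, "b"), (10, "a"), (15, "b"), (21, "a")]
    print(tumbling_window_counts(events, 10))
    # {(0, 'a'): 2, (0, 'b'): 1, (10, 'a'): 1, (10, 'b'): 1, (20, 'a'): 1}
```

A real stream processor also has to handle late and out-of-order events, and emits each window's result when the window closes instead of after reading the whole input.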
3. Mock Project 2 (12 hours, self-paced)
Build a full data pipeline on GCP/Azure.

III. Section 3: Data Wrangling, Data Visualization and Processing Time Series Data (21 hours)
1. Complete Data Wrangling & Data Visualization With Python (5 hours, Udemy, lessons: All)
Course: Learn to Preprocess, Wrangle and Visualize Data For Practical Data Science Applications in Python
- Read in data from different sources with pandas
- Data cleaning
- Data wrangling
- Feature selection and transformation
- Theory behind data visualization
- Common data visualizations
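The cleaning and transformation steps listed above can be sketched without third-party libraries. The course itself uses pandas; this standard-library version, with hypothetical sample data and function names, trims and normalizes a text field, drops rows with missing values and exact duplicates, and min-max scales a numeric feature. In pandas, roughly the same steps map to `read_csv`, `dropna`, `drop_duplicates` and column arithmetic.

```python
import csv
import io

# Hypothetical raw extract: stray whitespace, inconsistent casing,
# a missing value (id 2) and a duplicated row (id 3).
RAW = """id,city,temp_c
1, hanoi ,31.5
2,Da Nang,
3,Hanoi,29.0
3,Hanoi,29.0
4,ho chi minh,33.0
"""


def clean_rows(text):
    """Parse CSV text and return cleaned (id, city, temp_c) tuples."""
    seen, cleaned = set(), []
    for row in csv.DictReader(io.StringIO(text)):
        if not row["temp_c"].strip():  # drop rows with a missing temperature
            continue
        city = row["city"].strip().title()  # normalize whitespace and casing
        record = (int(row["id"]), city, float(row["temp_c"]))
        if record in seen:  # drop exact duplicates after normalization
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned


def min_max_scale(values):
    """Rescale values to [0, 1]; assumes at least two distinct values."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    rows = clean_rows(RAW)
    print(rows)
    print(min_max_scale([temp for _, _, temp in rows]))
```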
2. An Introduction to Machine Learning for Data Engineers (1 hour, Udemy, lessons: All)
- Basic algorithms used in machine learning
- Real-world models built using Python
3. Time Series Analysis in Python (7 hours, Udemy, lessons: All)
Course: Time Series Analysis in Python: Theory, Modeling: AR to SARIMAX, Vector Models, GARCH, Auto ARIMA, Forecasting
- Differentiate between time series data and cross-sectional data.
- Understand the fundamental assumptions of time series data and how to take advantage of them.
- Transform a data set into a time series.
- Start coding in Python and learn how to use it for statistical analysis.
- Understand the need to normalize data when comparing different time series.
- Encounter special types of time series, such as white noise and random walks.
- Learn about autocorrelation and how to account for it.
- Learn how to account for "unexpected shocks" via moving averages.
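Three of the ideas above (normalizing series before comparing them, autocorrelation, and the contrast between white noise and a random walk) can be illustrated with a short standard-library sketch. The function names and sample values are illustrative assumptions; the autocorrelation uses the usual sample estimator, the lag-k covariance around the series mean divided by the variance.

```python
import random
from statistics import mean


def normalize_to_base(series, base=100.0):
    """Rescale a series so its first value equals `base` (e.g. 100)."""
    first = series[0]
    return [base * x / first for x in series]


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    mu = mean(series)
    denom = sum((x - mu) ** 2 for x in series)
    num = sum((series[t] - mu) * (series[t - lag] - mu)
              for t in range(lag, len(series)))
    return num / denom


def white_noise(n, seed=0):
    """Independent draws from N(0, 1): no memory between steps."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def random_walk(n, seed=0):
    """Each value is the previous value plus a white-noise step."""
    level, walk = 0.0, []
    for step in white_noise(n, seed):
        level += step
        walk.append(level)
    return walk


if __name__ == "__main__":
    # Two series on very different scales become comparable after indexing.
    print(normalize_to_base([50.0, 55.0, 60.0]))        # [100.0, 110.0, 120.0]
    print(normalize_to_base([2000.0, 2100.0, 2300.0]))  # [100.0, 105.0, 115.0]
    # White noise has near-zero lag-1 autocorrelation; a random walk, close to 1.
    print(round(autocorrelation(white_noise(1000), 1), 3))
    print(round(autocorrelation(random_walk(1000), 1), 3))
```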
4. Mock Project 3 (8 hours, self-paced)
Wrangle, cleanse, process and visualize time series data.
Total hours: 72

9. Mock Projects

In the mock projects, you’ll define the scenario specifications and the environment to set up on the Azure cloud platform.

Each entry gives the No., mock project and technical stack, followed by its description.

I. Mock Project 1 (technical stack: MongoDB, Cassandra, NiFi, Spark, Python)
Use NiFi to ingest data from many sources (MongoDB, social networks, CMS DB...), process it with the Spark framework, and store it on Azure Data Lake.
II. Mock Project 2 (technical stack: Azure/Google Cloud Platform, Spark, Scala/Python)
Build a full data pipeline on GCP/Azure:
- Extract from multiple data sources.
- Transform/process the data using cloud services.
- Load and store the data in a cloud data warehouse (DWH).
- Visualize using cloud visualization tools.

III. Mock Project 3 (technical stack: Python)
Apply the full process of data wrangling, cleansing, processing, applying ML and visualizing data in a full-life-cycle data engineering project.
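The extract, transform and load steps of Mock Project 2 can be sketched end to end at toy scale. In this sketch, all data, table and function names are hypothetical: two differently shaped sources (a CSV export and a JSON feed) are mapped onto one schema and loaded into an in-memory SQLite table standing in for the cloud data warehouse, and the final query is the kind of aggregate a dashboard in a cloud visualization tool would read.

```python
import csv
import io
import json
import sqlite3

# Hypothetical source extracts; in the mock project these would come from
# real systems (databases, APIs, files in cloud storage).
CSV_ORDERS = "order_id,amount,region\n1,120.0,north\n2,80.5,south\n"
JSON_ORDERS = '[{"id": 3, "total": 42.0, "area": "North"}]'


def extract():
    """Read both sources into Python structures."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_ORDERS)))
    json_rows = json.loads(JSON_ORDERS)
    return csv_rows, json_rows


def transform(csv_rows, json_rows):
    """Map both sources onto one schema: (order_id, amount, region)."""
    unified = [(int(r["order_id"]), float(r["amount"]), r["region"].lower())
               for r in csv_rows]
    unified += [(int(r["id"]), float(r["total"]), r["area"].lower())
                for r in json_rows]
    return unified


def load(rows, conn):
    """Write the unified rows into a fact table (SQLite stands in for the DWH)."""
    conn.execute("CREATE TABLE IF NOT EXISTS fact_orders "
                 "(order_id INTEGER PRIMARY KEY, amount REAL, region TEXT)")
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()


def revenue_by_region(conn):
    """Aggregate query a reporting dashboard might run."""
    return conn.execute("SELECT region, SUM(amount) FROM fact_orders "
                        "GROUP BY region ORDER BY region").fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(*extract()), conn)
    print(revenue_by_region(conn))  # [('north', 162.0), ('south', 80.5)]
```

Swapping SQLite for the chosen warehouse (for example BigQuery on GCP or a Synapse dedicated SQL pool on Azure) would change the client code in the load step, while the split into separate extract, transform and load functions can stay the same.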

