
Roadmap to become a Data Engineer, 2023

January

Basics of Programming

Operators, Variables & Data Types in Python
Conditional Statements & Looping Constructs
Data Structures in Python
Writing custom Functions
Standard Libraries in Python
Regular Expressions


February

Fundamentals of Computing

Shell Scripting
Working with APIs
Data Structures in Python
Git and GitHub


March

Relational Databases

Basic Querying in SQL
Keys in SQL
Joins in SQL
Subqueries in SQL
Window Functions
Normalisation


April

Cloud Computing Fundamentals with AWS

Basics of AWS
IAM users and IAM Roles
AWS EC2
Lambda Functions on AWS
AWS S3
API Gateway
AWS VPC
AWS RDS and Aurora


May

Data Processing with Apache Spark

Spark architecture
RDDs in Spark
Working with Spark DataFrames
Understand Spark Execution
Broadcast and Accumulators
Spark SQL


June

Fundamentals of Computing

Overview of the Hadoop Ecosystem
Understand MapReduce architecture
Understand the working of YARN
Work with Hadoop on the cloud with AWS EMR


July

Data Warehousing with Apache Hive

Hive Query Language
Managed vs External tables
Partitioning and Bucketing
Types of File formats
SerDes in Hive


August

Ingesting streaming data with Apache Kafka

Learn Kafka architecture
Learn about Producers and Consumers
Create topics in Kafka
Ingest streaming data on cloud with AWS Kinesis


September

Process streaming data with Spark Streaming

DStreams
Stateless vs Stateful transformations
Checkpointing
Structured Streaming


October

Advanced Programming

OOPs concepts
Understand Recursive functions
Unit testing
Integration testing


November

NoSQL

CAP theorem
Documents and Collections
CRUD operations
Different types of operators
Aggregation Pipeline
Sharding and Replication


December

Workflow Scheduling

DAGs
Task dependencies
Operators
Scheduling
Branching
