
Apexon – AWS training Project 1

Project Statement:
Data Pipeline with Glue and Redshift:

Challenge:
Build a simple ETL pipeline using Glue crawlers and jobs to
Extract data from a CSV file (e.g., customer data) stored in S3,
Transform it (e.g., filter, format), and
Load it into a Redshift table.

Steps:
1. Creating an S3 bucket.
2. Creating Redshift Cluster.
3. IAM role Creation.
4. AWS Glue – Database, Table, Crawler and ETL job Creation.
5. Integrate all services using Visual ETL.

Step 1: Creating an S3 Bucket:

Created an S3 bucket with the name “apxawsprojectbucket”.


Added a folder named “Customer-Data” and uploaded a CSV file “customerdata.csv” to that folder.
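For reference, the same bucket and folder setup can be scripted with boto3. This is a minimal sketch, assuming the bucket lives in us-east-1 and customerdata.csv is available locally; the bucket name and key match Step 1.

import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Create the bucket (outside us-east-1, a CreateBucketConfiguration would be required)
s3.create_bucket(Bucket="apxawsprojectbucket")

# Upload the CSV into the "Customer-Data" folder (S3 folders are just key prefixes)
s3.upload_file("customerdata.csv", "apxawsprojectbucket", "Customer-Data/customerdata.csv")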
Step 2: Setting up Redshift:

Open the Amazon Redshift service.

Created a workgroup and namespace, which AWS Glue needs in order to connect to
Redshift:
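The console steps above can also be expressed through the Redshift Serverless API. This is a rough sketch; the namespace and workgroup names, admin credentials and database name are placeholders, since the write-up does not list them.

import boto3

rs = boto3.client("redshift-serverless")

# The namespace holds the database and the admin credentials
rs.create_namespace(
    namespaceName="apx-namespace",          # placeholder name
    adminUsername="admin",                  # placeholder credentials
    adminUserPassword="ChangeMe123!",
    dbName="dev",
)

# The workgroup provides the compute endpoint that Glue will connect to
rs.create_workgroup(
    workgroupName="apx-workgroup",          # placeholder name
    namespaceName="apx-namespace",
    publiclyAccessible=False,
)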

Step 3: IAM role creation.

Now we are creating an IAM role to make sure AWS Glue has the necessary permissions to access the S3
bucket and Redshift. Since we need access to both S3 and Redshift, I am attaching the
“AWSGlueConsoleFullAccess” policy.
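Roughly the same role can be created with boto3, as sketched below. The role name is a placeholder (the write-up does not mention it); AWSGlueConsoleFullAccess is the managed policy named above.

import json
import boto3

iam = boto3.client("iam")

# Trust policy letting the Glue service assume this role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "glue.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(
    RoleName="apx-glue-role",               # placeholder role name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Attach the managed policy used in this project
iam.attach_role_policy(
    RoleName="apx-glue-role",
    PolicyArn="arn:aws:iam::aws:policy/AWSGlueConsoleFullAccess",
)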
Creating a database, table and Glue crawler to discover the schema of the CSV file in S3.
Also, we must specify the S3 location and the IAM role created earlier.

• Creating a database in AWS Glue.

• Creating a crawler in AWS Glue, which will help us catalog the data stored in the
S3 bucket.

• Upon running the crawler, we can see one table is created in the database we created
above.

1. In Running status:
2. Once the crawler is done, we have to check whether the table has been created:

• Now in Tables we can see the schema has been detected automatically by Glue itself.
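The database and crawler created in the console above map to the following boto3 calls. The database, crawler and role names are illustrative placeholders, while the S3 path matches the bucket and folder from Step 1.

import boto3

glue = boto3.client("glue")

# Glue Data Catalog database
glue.create_database(DatabaseInput={"Name": "apx_customer_db"})    # placeholder name

# Crawler pointed at the Customer-Data folder, writing tables into that database
glue.create_crawler(
    Name="apx-customer-crawler",                                   # placeholder name
    Role="apx-glue-role",                                          # IAM role from Step 3
    DatabaseName="apx_customer_db",
    Targets={"S3Targets": [{"Path": "s3://apxawsprojectbucket/Customer-Data/"}]},
)

# Run the crawler; when it finishes, the inferred table appears under Tables
glue.start_crawler(Name="apx-customer-crawler")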

Connection Creation: This connection is important because it is what allows AWS Glue to
access Amazon Redshift.
Look for Amazon Redshift on the Choose data source page.

In this step we must provide the workgroup details that we created earlier. Upon selecting
the workgroup, AWS can identify the database name and username automatically; we only need
to provide the password.
In the next step we review the details and continue creating the connection.
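If the same connection were scripted instead of clicked through, it could look roughly like this. The connection name, JDBC URL, credentials, subnet and security group are placeholders that would have to be taken from the actual workgroup and VPC.

import boto3

glue = boto3.client("glue")

glue.create_connection(
    ConnectionInput={
        "Name": "apx-redshift-connection",                         # placeholder name
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            # Endpoint, database, user and password come from the workgroup created earlier
            "JDBC_CONNECTION_URL": "jdbc:redshift://<workgroup-endpoint>:5439/dev",
            "USERNAME": "admin",
            "PASSWORD": "ChangeMe123!",
        },
        "PhysicalConnectionRequirements": {
            "SubnetId": "subnet-xxxxxxxx",                         # placeholder VPC details
            "SecurityGroupIdList": ["sg-xxxxxxxx"],
            "AvailabilityZone": "us-east-1a",
        },
    }
)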

We must test the connection:

To do this we have to select an IAM role.


Test connection is in progress:

Successful connection:
Now we must focus on the target storage system, which in our case is Redshift.
Click the Query editor v2 option on the left navigation bar as shown:

Creating a table in Amazon Redshift using the query editor so that it matches the column data
types present in the S3 file.
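The query editor step can equally be run through the Redshift Data API. The column names and types below are only an assumed example layout for customerdata.csv, since the write-up does not show the actual schema; the workgroup name is the placeholder from Step 2.

import boto3

rsd = boto3.client("redshift-data")

# Assumed example schema; adjust the columns to match customerdata.csv
ddl = """
CREATE TABLE IF NOT EXISTS public.customer_data (
    customer_id  INTEGER,
    first_name   VARCHAR(100),
    last_name    VARCHAR(100),
    email        VARCHAR(255),
    country      VARCHAR(100)
);
"""

rsd.execute_statement(
    WorkgroupName="apx-workgroup",   # placeholder workgroup name
    Database="dev",
    Sql=ddl,
)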

Cross-checking whether the table was created properly:


Creating an AWS Glue Job:

Select the source as S3:


Transformation: Dropping one column
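Visual ETL generates a PySpark script behind the scenes; the sketch below shows what an equivalent hand-written job could look like. The catalog table name, the dropped column ("email") and the connection name are assumptions, since the write-up does not name them.

import sys
from awsglue.transforms import DropFields
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glueContext = GlueContext(SparkContext())
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Source: the catalog table produced by the crawler (table name is a placeholder)
source = glueContext.create_dynamic_frame.from_catalog(
    database="apx_customer_db", table_name="customerdata_csv"
)

# Transformation: drop one column ("email" is an assumed column name)
dropped = DropFields.apply(frame=source, paths=["email"])

# Target: the Redshift table, written through the Glue connection created earlier
glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=dropped,
    catalog_connection="apx-redshift-connection",
    connection_options={"dbtable": "public.customer_data", "database": "dev"},
    redshift_tmp_dir="s3://apxawsprojectbucket/temp/",
)

job.commit()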
Job run status:

Once done:

Once the job is done, we can see the data loaded into the Redshift table as below.
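As a quick sanity check outside the console, the loaded rows can also be counted through the Redshift Data API; the workgroup, database and table names are the same placeholders used above.

import time
import boto3

rsd = boto3.client("redshift-data")

resp = rsd.execute_statement(
    WorkgroupName="apx-workgroup",   # placeholder workgroup name
    Database="dev",
    Sql="SELECT COUNT(*) FROM public.customer_data;",
)

# Wait for the statement to finish before reading the result
while rsd.describe_statement(Id=resp["Id"])["Status"] not in ("FINISHED", "FAILED", "ABORTED"):
    time.sleep(1)

result = rsd.get_statement_result(Id=resp["Id"])
print("Rows loaded:", result["Records"][0][0]["longValue"])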
