Cloud Based Developer - SantoshKedar (4y - 0m)
Career Objective:
Professional Summary:
Dedicated Big Data Engineering professional with 4 years of experience, proficient in Spark,
PySpark, Snowflake, AWS services, Python, and SQL, with a strong focus on processing large data
volumes. Skilled in building optimized big data pipelines orchestrated with AWS Step Functions and in
processing structured, semi-structured, and unstructured data using AWS Glue across input data
formats such as CSV and JSON. Hands-on experience with AWS services including S3, Glue,
Lambda, CloudWatch, SQS, and IAM roles. Previously worked as a Business Process Lead on an
SAP deployment project (Manufacturing Module, Healthcare) and in a Manufacturing Operations department.
Professional Skills:
Certification:
Job Summary:
➢ Designed and implemented ETL processes to extract, transform, and load data from diverse
sources into a centralized data warehouse.
➢ Automated data ingestion and transformation processes, reducing manual data handling and
improving data quality and operational efficiency.
➢ Assisted in the development of ETL pipelines and data warehousing solutions, gaining hands-on
experience with tools and technologies such as Python, SQL, PySpark, Terraform, Snowflake, and
AWS Glue, Lambda, S3, CloudWatch, SQS, and IAM roles.
➢ Supported the team in troubleshooting and resolving data-related issues and contributed to
on-call support as needed.
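As a simplified illustration of the event-driven ingestion described above, the sketch below shows a minimal Lambda-style handler that parses S3 "object created" notifications delivered via SQS and collects the newly arrived objects. All names (bucket, keys, the `handler` function) are hypothetical, and the parsing is a stand-in for the real Glue/Lambda pipeline, not its actual code.

```python
import json

def handler(event, context=None):
    """Hypothetical Lambda entry point: extract (bucket, key) pairs from
    S3 event notifications delivered through an SQS-triggered invocation."""
    arrivals = []
    for sqs_record in event.get("Records", []):
        # Each SQS message body wraps an S3 event notification as JSON.
        s3_event = json.loads(sqs_record["body"])
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            key = s3_record["s3"]["object"]["key"]
            arrivals.append((bucket, key))
    return arrivals

# Example: one SQS message carrying one S3 notification (made-up names).
fake_event = {
    "Records": [
        {"body": json.dumps({
            "Records": [
                {"s3": {"bucket": {"name": "raw-data-bucket"},
                        "object": {"key": "input/2024/01/data.json"}}}
            ]
        })}
    ]
}
```

In the real setup, a handler like this would go on to enqueue or trigger the downstream load rather than just return the list.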
Project Summary:
➢ The primary objective of this project is to create a seamless and efficient workflow where data
is continuously updated in the Snowflake data warehouse, eliminating the need for manual
interventions and ensuring the availability of up-to-date data for analysis and reporting.
➢ The project focuses on establishing a streamlined data pipeline from raw JSON data stored
in an AWS S3 bucket.
➢ It utilizes an AWS Glue ETL job written in PySpark to transform the data and store it in a
designated S3 landing bucket.
➢ The transformed data is seamlessly integrated with Snowflake, a cloud-based data warehousing
platform, to ensure high performance and scalability.
➢ To maintain real-time data synchronization, an SQS (Simple Queue Service) queue is set up to
receive notifications when new data files arrive.
➢ These SQS notifications trigger Snowpipe, a native Snowflake feature, to automate the data
loading process into the Snowflake table.
➢ Utilized Git and GitHub for version control to manage and collaborate on the codebase
effectively.
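The transformation step above runs as a PySpark job inside AWS Glue; as a minimal, dependency-free sketch of the same idea, the function below flattens nested JSON records into the single-level, tabular shape a Snowflake landing table expects. The field names and nesting are hypothetical, not taken from the actual project.

```python
def flatten(record, parent_key="", sep="_"):
    """Recursively flatten a nested JSON record (a dict) into a
    single-level dict, suitable for loading into a tabular landing zone."""
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, prefixing child keys.
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat

# Hypothetical raw event as it might land in the S3 source bucket.
raw = {"id": 1, "user": {"name": "a", "geo": {"city": "Pune"}}}
```

In PySpark the equivalent operation is typically expressed with nested-column selection or `explode`, but the flattening logic is the same.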
Project Summary:
➢ Established an end-to-end data pipeline for retrieving and processing movie and TV show data
from The Movie Database (TMDB) API.
➢ Developed Python scripts using pandas for efficient data extraction, processing, and
transformation.
➢ Deployed on Amazon Web Services (AWS) infrastructure using Terraform for infrastructure
as code.
➢ Utilized a serverless architecture with AWS Lambda functions to streamline infrastructure
management.
➢ Key technologies used include Python and Pandas for data processing, AWS Lambda for
serverless computing, AWS S3 for data storage, AWS CloudWatch for monitoring and
Terraform for infrastructure management.
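As a rough sketch of the extraction-and-transformation step, the function below flattens a TMDB-style JSON response into CSV rows. The real pipeline uses pandas against the live TMDB API inside a Lambda function; this stand-in uses only the standard library and a canned payload with made-up values, so the field names and data are illustrative assumptions.

```python
import csv
import io
import json

def tmdb_to_csv(payload: str) -> str:
    """Flatten a TMDB-style JSON response into CSV text.
    Stand-in for the pandas-based extraction script."""
    results = json.loads(payload)["results"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["id", "title", "vote_average"])
    writer.writeheader()
    for movie in results:
        # Keep only the columns the downstream table needs.
        writer.writerow({k: movie.get(k) for k in ("id", "title", "vote_average")})
    return out.getvalue()

# Canned payload shaped like a TMDB API response (values are made up).
sample = json.dumps({"results": [
    {"id": 603, "title": "The Matrix", "vote_average": 8.2},
    {"id": 27205, "title": "Inception", "vote_average": 8.4},
]})
```

In the deployed version, the resulting CSV would be written to the S3 data-storage bucket, with CloudWatch monitoring the Lambda invocations.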
Professional Experience:
Duration : 3.5 years (SAP/ERP domain as an end user) and 16 years in Healthcare Manufacturing Operations.
Project : Pfizer SAP-ERP deployment project with the IBM team for implementation of SAP
at the site location, including the following phases.
Project Summary:
Educational Qualification: