
Harsh G Miyani

(814)-844-3102
Hmiyani8@gmail.com
Atlanta, GA

Senior GCP Data Engineer with leadership abilities: experienced, results-oriented, resourceful, and skilled at problem-solving. Accustomed to delivering under the constraints of tight release dates. Around 6 years of diversified IT expertise, including the development and implementation of several applications on big data and mainframe systems.

PROFILE SUMMARY:
 Around 6 years of comprehensive experience as a Data Engineer and Big Data & Analytics Developer.
 Experienced in leading the full project lifecycle from design to implementation, with a focus
on ensuring data integrity within data lakes.
 Proficient in designing and developing ETL processes using AWS Glue to migrate data from
various sources into AWS Redshift.
 Skilled in logical and physical data modeling, specializing in star schemas for Enterprise Data
Warehouses and data marts.
 Proficient in orchestrating ETL processes using Oozie workflows and Python scripting for
automation, including tasks with Airflow DAGs.
 Strong expertise in architecting and implementing complex data pipelines, leveraging Airflow
for scheduling and monitoring workflows.
 Well-versed in data modeling concepts such as star-schema and snowflake-schema modeling, with proficiency in PySpark for data quality checks and report generation.
 Experienced in data visualization and reporting techniques for effective analytical insights
presentation.
 Expertise in Spark technologies for interactive analysis, batch processing, and stream
processing.
 Extensive experience with Amazon Web Services (AWS), including EC2, S3, Glue, Lambda,
and EMR.
 Skilled in implementing NoSQL solutions and managing Oracle and NoSQL databases in production environments.
 Proficient in developing ETL pipelines within data warehouses and producing regulatory and
financial reports using advanced SQL queries.
 Experienced in real-time streaming pipelines using Kafka, Spark Streaming, and Redshift.
 Skilled in designing logical and physical data flow models for ETL applications like
Informatica.
 Proficient in cloud platforms like AWS and GCP, including services like S3, EMR, Snowflake,
and BigQuery.
 Experienced in developing and maintaining data pipelines using Boto3 and AWS services for
smooth data ingestion, transformation, and storage.
 Skilled in performance tuning and optimization of Spark applications, along with experience
in troubleshooting and debugging Python applications.
 Proficient in utilizing Talend for real-time data integration and ensuring data quality checks
and validation processes.
 Strong communication and collaboration skills, with experience working with stakeholders,
business analysts, and data scientists.
 Experienced in designing, developing, and testing databases, stored procedures, and SQL
queries.
 Proficient in job scheduling, deployment, and maintenance using tools like Control-M and
SAP Data Services Management Console.
 Skilled in technical documentation, including Technical Design Documents (TDDs) and system
design documents.
 Experienced in working with various development methodologies, including Agile.

TECHNICAL SKILLS:

PROGRAMMING LANGUAGES Java, Scala, Python, Shell Scripting
BIG DATA ECOSYSTEM Spark, Hive, Sqoop, Oozie, ELK, Pig, Kafka, Play2, MapReduce
CLOUD Snowflake, AWS EMR, EC2, S3, RDS, Dataflow, Azure Data Factory, Blob Storage, Azure Data Lake
DBMS SQL Server, MySQL, PL/SQL, Oracle, PostgreSQL, database modeling
NoSQL DATABASES Cassandra, MongoDB, DynamoDB
IDEs Eclipse, Visual Studio, Version Control
OPERATING SYSTEMS Windows, Unix, Linux, macOS, CentOS
FRAMEWORKS MVC, Struts, Power BI, Maven, JUnit, Log4j, Ant, Tableau, Qlik, Splunk, Aqua Data Studio
ETL TOOLS Databricks Lakehouse Platform, Fivetran
METHODOLOGIES Agile, Waterfall, TDD, ATDD

EMPLOYMENT DETAILS
UnitedHealth Group Atlanta, GA July 2022 – Present
Role: Sr. GCP Data Engineer
Responsibilities:

 Lead the project life cycle, encompassing design, development, and implementation, with a
focus on verifying data integrity within the data lake.
 Design and develop ETL processes utilizing AWS Glue to seamlessly migrate data from diverse
external sources such as S3 and text files into AWS Redshift.
 Proficient in logical and physical data modeling, specializing in the creation of star schemas
for Enterprise Data Warehouses and data marts.
 Develop and orchestrate ETL processes using Oozie workflows, alongside Python scripting for
automation, including tasks like extracting weblogs using Airflow DAGs.
 Architect and implement complex data pipelines, leveraging Airflow for scheduling and monitoring workflows (see the Airflow DAG sketch after this section).
 Work with AWS and GCP cloud services, including GCP Cloud Storage, Dataproc, Dataflow, and BigQuery, as well as AWS EMR, S3, Glacier, and EC2 with EMR clusters.
 Apply data modeling concepts such as star-schema and snowflake-schema modeling, authoring PySpark scripts for rigorous data quality checks and report generation.
 Proficient in data visualization and reporting techniques to present analytical insights
effectively.
 Expertise in Spark technologies, including Spark Core, Spark SQL, and Spark streaming, for
interactive analysis, batch processing, and stream processing.
 Extensive experience working with Amazon Web Services (AWS), utilizing EC2 for computing
and S3 for storage.
 Set up databases in GCP using RDS, configured storage using GCS buckets, and configured instance backups to GCS buckets.
 Design and implement NoSQL solutions, managing Oracle and NoSQL databases in
production environments.
 Evaluate system performance and validate NoSQL solutions, while providing consultation on
Snowflake Data Platform Solution Architecture and Deployment.
 Develop ETL Pipelines within the data warehouse, producing regulatory and financial reports
using advanced SQL queries in Snowflake.
 Stage API or Kafka data into Snowflake DB, employing Looker for report generation based on
Snowflake connections.
 Analyze impact changes on existing ETL/ELT processes to ensure timely data availability for
reporting.
 Architect, build, and manage ELK (Elasticsearch, Logstash, Kibana) clusters for centralized
logging and search functionalities.
 Implement real-time streaming pipelines using Kafka, Spark Streaming, and Redshift.
 Design logical and physical data flow models for Informatica ETL applications.
 Incorporate AWS services like S3 and RDS for hosting static & media files and databases in
the cloud.
 Develop and execute job streams using Control-M, alongside writing shell scripts for data extraction from Unix servers into Hadoop HDFS.
 Work extensively on building data pipelines within Docker container environments during
the development phase.

Environment: Hadoop, Spark, Hive, Teradata, Tableau, Linux, Python, Kafka, Snowflake, AWS S3
Buckets, AWS Glue, NiFi, PostgreSQL, AWS EC2, Oracle PL/SQL, AWS stack, Development tool kit
(JIRA, Bitbucket/Git, ServiceNow, etc.)
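
To illustrate the Airflow-based orchestration described in this role, the following is a minimal sketch, not the project's actual code: a daily DAG that extracts weblogs and loads them to S3. The DAG id, task callables, and bucket path are hypothetical, and the sketch assumes the Airflow 2.x API.

# Minimal Airflow DAG sketch (hypothetical names and paths; Airflow 2.x API).
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_weblogs(**context):
    # Placeholder: pull raw weblog files from the source system for the run date.
    print("extracting weblogs for", context["ds"])


def load_to_s3(**context):
    # Placeholder: push the extracted files to an S3 landing location (assumed bucket).
    print("loading weblogs to s3://example-landing-bucket/weblogs/")


with DAG(
    dag_id="weblog_etl",                  # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule="@daily",                    # Airflow 2.4+ name; older releases use schedule_interval
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract = PythonOperator(task_id="extract_weblogs", python_callable=extract_weblogs)
    load = PythonOperator(task_id="load_to_s3", python_callable=load_to_s3)

    extract >> load                       # extract must finish before the load runs

In practice the placeholder callables would call the actual extraction and upload logic, and the DAG would be monitored from the Airflow UI as described in the bullets above.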

Exomoon Infotech Surat, Gujarat March 2020 – December 2021


Role: GCP Data Engineer
Responsibilities:

 Migrated an existing on-premises application to AWS and used AWS services such as EC2 and S3 for processing and storage of small data sets.
 Loaded data into S3 buckets using AWS Lambda Functions, AWS Glue and PySpark and
filtered data stored in S3 buckets using Elasticsearch and loaded data into Hive external
tables. Maintained and operated Hadoop cluster on AWS EMR.
 Used AWS EMR Spark cluster and Cloud Dataflow on GCP to compare the efficiency of a POC
on a developed pipeline.
 Implemented AWS Step functions to automate and orchestrate the Amazon SageMaker
related tasks such as publishing data to S3, training ML model and deploying it for prediction.
 Installed and configured Apache Airflow to work with AWS S3 buckets and created DAGs to run Airflow workflows.
 Integrated Apache Airflow with AWS to monitor multi-stage ML workflows with the tasks
running on Amazon SageMaker.
 Developed Oozie workflows for scheduling and orchestrating the ETL process, and wrote Python scripts to automate the extraction of weblogs using Airflow DAGs.
 Performed tuning and optimization of Glue and Lambda jobs for optimal performance.
 Wrote Python scripts to design and develop ETL (Extract-Transform-Load) processes that map, transform, and load data into the target, and wrote Python unit tests to ensure their accuracy and usefulness.
 Set up GCP firewall rules to allow or deny traffic to and from VM instances based on specified configurations, and used GCP Cloud CDN (content delivery network) to deliver content from GCP cache locations, drastically improving user experience and latency.
 Implemented real-time data integration solutions using Talend Real-Time Big Data Platform, enabling streaming data processing and event-driven architectures.
 Created data quality checks and validation processes using Talend to ensure data accuracy, completeness, and consistency.
 Collaborated with stakeholders, such as business analysts and data scientists, to understand
data requirements and design data integration solutions using Talend.
 Developed and maintained data pipelines using Boto3 and AWS services, such as Amazon S3, AWS Glue, and AWS Lambda, to ensure smooth data ingestion, transformation, and storage (see the Boto3 sketch after this section).
 Implemented ETL (Extract, Transform, Load) processes to efficiently process and integrate
large volumes of structured and unstructured data from various sources into a centralized
data lake using Boto3 and AWS Glue.
 Performed troubleshooting and deployed numerous Python bug fixes for the main applications being maintained.
 Worked with Spark to improve performance and optimize existing algorithms in Hadoop using Spark Context, Spark SQL, Spark MLlib, DataFrames, Pair RDDs, and Spark on YARN.
 Created Airflow scheduling scripts in Python.
 Designed and developed components to ingest real-time stream data into Snowflake using DynamoDB, Lambda, S3, Snowpipe, and Snowflake Tasks, with Python as the programming language.
 Set up a Snowflake stage and Snowpipe for continuous loading of data from S3 buckets into the landing table.
 Developed Snowflake procedures to perform transformations, load the data into target tables, and purge the stage tables.
 Configured Snowpipe to pull data from S3 buckets into Snowflake tables and stored incoming data in the Snowflake staging area.
 Performed tuning of Spark applications to set the batch interval time and the correct level of parallelism, along with memory tuning.
 Collaborated with the Basis team to implement SAP Web IDE in SAP Cloud Platform with Git integration for building SAP HANA artifacts, and used SAP PAL to develop machine learning models.

Environment: Spark, Spark-Streaming, Spark SQL, AWS EMR, S3, EC2, MapR, HDFS, Hive, PIG, Apache
Kafka, Sqoop, Python, Scala, Pyspark, SAP HANA, Shell scripting, Linux, Avro, MySQL, NoSQL, SOLR,
Jenkins, Eclipse, Oracle, Git, Oozie, SOAP, Cassandra, and Agile Methodologies.
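
To illustrate the Boto3-based pipeline work described in this role, here is a minimal sketch, not the project's actual code: it stages a raw file in an S3 landing bucket and then triggers an AWS Glue job to transform it. The bucket name, key layout, and Glue job name are assumptions for illustration.

# Minimal Boto3 sketch (hypothetical bucket, key layout, and Glue job name).
import boto3

s3 = boto3.client("s3")
glue = boto3.client("glue")


def ingest_and_transform(local_path: str) -> str:
    """Upload a raw file to the landing bucket, then start a Glue ETL job."""
    bucket = "example-landing-bucket"            # assumed bucket name
    key = f"raw/{local_path.split('/')[-1]}"     # place the file under the raw/ prefix

    # Stage the raw file in S3.
    s3.upload_file(local_path, bucket, key)

    # Kick off the (hypothetical) Glue job that transforms raw/ data into the curated zone.
    response = glue.start_job_run(
        JobName="raw_to_curated_etl",
        Arguments={"--input_path": f"s3://{bucket}/{key}"},
    )
    return response["JobRunId"]


if __name__ == "__main__":
    print(ingest_and_transform("/tmp/orders_2021-06-01.csv"))

In a production pipeline this kind of function would typically run inside an AWS Lambda handler or an Airflow task rather than as a standalone script.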

BlueRibbon Solution Surat, Gujarat June 2018 – February 2020


Role: ETL Developer
Responsibilities:

 Analyzed, designed, and developed databases using ER diagrams, normalization, and relational database concepts.
 Involved in the design, development, and testing of the system.
 Developed SQL Server stored procedures, tuned SQL queries (using indexes and execution
plans).
 Developed user-defined functions and created views.
 Experienced in job debugging, job check-in and check-out to the central repository, and
labeling.
 Loaded data into the target system via AWS services such as AWS Glue, AWS Data Pipeline,
or custom ETL tools.
 Executed jobs for respective environments (Development, Quality, and Production systems)
in SAP Data Services Management Console.
 Implemented IDOC integration in BODS.
 Created Technical Design Documents (TDDs) to track code changes.
 Implemented triggers to maintain referential integrity.
 Implemented exception handling.
 Worked on client requirements and wrote complex SQL queries to generate Crystal Reports.
 Created and automated regular jobs.
 Tuned and optimized SQL queries using execution plans and profilers.
 Developed the controller component with Servlets and action classes.
 Developed business components (model components) using Enterprise Java Beans (EJB).
 Established schedule and resource requirements by planning, analyzing, and documenting
development effort to include timelines, risks, test requirements, and performance targets.
 Analyzed system requirements and prepared system design documents.
 Developed dynamic user interfaces with HTML, JavaScript, using JSP and Servlet technology.
 Used JMS elements for sending and receiving messages.
 Created ETL packages with different data sources (SQL Server, Flat Files, Excel source files,
XML files) and loaded data into destination tables by performing different kinds of
transformations using SSIS/DTS packages.
 Experienced in slowly changing dimensions in SSIS packages.
 Developed, monitored, and deployed SSIS packages.
 Responsible for scheduling jobs, alerting, and maintaining SSIS packages.
 Created and executed test plans using Quality Center (formerly TestDirector).
 Mapped requirements with test cases in Quality Center.
 Supported system tests and user acceptance tests.
 Rebuilt indexes and tables as part of performance tuning exercises.
 Involved in performing database backup and recovery.
 Worked on documentation using MS Word.

Environment: MS SQL Server, SSRS, SSIS, SSAS, DB2, HTML, XML, JSP, Servlet, JavaScript, EJB, JMS, MS Excel, MS Word, Amazon Web Services (AWS).

EDUCATIONAL QUALIFICATION:

 Bachelor of Computer Application from VNSGU, 2019


 Master’s in Computer Science from Gannon University, Erie, PA, USA, 2023
