Pavani

Senior Data Engineer


PROFESSIONAL SUMMARY
 8 years of overall experience as a Big Data Engineer, Data Analyst and ETL Developer, comprising design (including Power BI reporting), development and implementation of data models for enterprise-level applications.
 Good knowledge of technologies for systems that process massive amounts of data in highly distributed mode on Cloudera and Hortonworks Hadoop distributions and Amazon AWS.
 Hands-on experience using Hadoop ecosystem components such as Hive, Pig, Sqoop, HBase, Cassandra, Spark, Spark Streaming, Spark SQL, Oozie, ZooKeeper, Kafka, Flume, the MapReduce framework, YARN, Scala and Hue.
 Extensive experience working with NoSQL databases and their integration: DynamoDB, Cosmos DB, MongoDB, Cassandra and HBase.
 Good knowledge of the architecture and components of Spark; efficient in working with Spark Core, Spark SQL and Spark Streaming, with expertise in building PySpark and Spark-Scala applications for interactive analysis, batch processing and stream processing.
 Experience configuring Spark Streaming to receive real-time data from Apache Kafka and persist the stream to HDFS, and expertise in using Spark SQL with data sources such as JSON, Parquet and Hive (see the sketch following this summary).
 Extensively used the Spark DataFrames API on the Cloudera platform to perform analytics on Hive data, and used DataFrame operations to perform required validations on the data.
 Proficient in Python scripting; worked with statistical functions in NumPy, visualization using Matplotlib, and Pandas for organizing data.
 Involved in loading structured and semi-structured data into Spark clusters using Spark SQL and the DataFrames application programming interface (API).
 Wrote complex HiveQL queries for required data extraction from Hive tables and wrote Hive User Defined Functions (UDFs) as required.
 Excellent knowledge of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
 Proficient in converting Hive/SQL queries into Spark transformations using Data frames and Data
sets.
 Experienced in working with Amazon Web Services (AWS) using EC2 for computing and S3 as storage.
 Capable of using AWS utilities such as EMR, S3 and CloudWatch to run and monitor Hadoop and Spark jobs on Amazon Web Services (AWS).
 Strong knowledge in working with Amazon EC2 to provide a complete solution for computing, query
processing, and storage across a wide range of applications.
 Capable of using Amazon S3 to support data transfer over SSL, with data encrypted automatically once it is uploaded.
 Skilled in using Amazon Redshift to perform large scale database migrations.
 Ingested data into the Snowflake cloud data warehouse using Snowpipe. Extensive experience working with micro-batching to ingest millions of files into Snowflake as they arrive in the staging area.
 Worked in developing Impala scripts for extraction, transformation, loading of data into data
warehouse.
 Extensive knowledge in working with Azure cloud platform (HDInsight, DataLake, DataBricks, Blob
Storage, Data Factory, Synapse, SQL, SQL DB, DWH and Data Storage Explorer).
 Design and implement database solutions in Azure SQL Data Warehouse, Azure SQL.
 Implement ad hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight.
 Hands on Experience in using Visualization tools like Tableau, Power BI.
 Experience in importing and exporting the data using Sqoop from HDFS to Relational Database
Systems and from Relational Database Systems to HDFS.
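
The following is a minimal PySpark sketch of the Kafka-to-HDFS streaming pattern referenced above, using Structured Streaming; the broker address, topic name and HDFS paths are illustrative placeholders, and the job assumes the Spark-Kafka connector package is available on the cluster.

```python
# Minimal sketch: read a Kafka topic and persist the stream to HDFS as Parquet.
# Requires the spark-sql-kafka connector package on the cluster.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (SparkSession.builder
         .appName("kafka-to-hdfs")
         .getOrCreate())

# Subscribe to a Kafka topic; Kafka delivers key/value as binary columns.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")   # hypothetical broker
          .option("subscribe", "events")                       # hypothetical topic
          .load()
          .select(col("value").cast("string").alias("payload")))

# Write the stream to HDFS in Parquet, with checkpointing for fault tolerance.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/events")               # hypothetical path
         .option("checkpointLocation", "hdfs:///chk/events")  # hypothetical path
         .start())

query.awaitTermination()
```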

TECHNICAL SKILLS:

Big Data Ecosystem: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Kafka, Spark, Flume, Cassandra, Impala, Oozie, Zookeeper, Amazon Web Services (AWS), EMR
Cloud Technologies: AWS, Azure, Google Cloud Platform (GCP)
IDEs: IntelliJ, Eclipse, Spyder, Jupyter
Operating Systems: Linux, Unix, Windows 8, Windows 7, Windows Server 2008/2003
Programming Languages: Python, Scala, Linux shell scripts, JavaScript, PL/SQL, Java, Pig Latin, HiveQL
Databases: Oracle, MySQL, DB2, MS SQL Server, MongoDB, HBase
Web Technologies: HTML, XML, JSON, CSS, jQuery, JavaScript
Java Technologies: Core Java, Servlets, JSP, JDBC, JavaBeans, J2EE
Business Tools: Tableau, Power BI

PROFESSIONAL EXPERIENCE
Role: Senior AWS Data Engineer
Client: Fiserv, Brookfield, WI Jan 2023 to Present
Responsibilities:

 Implemented a 'serverless' architecture using API Gateway, Lambda, and Dynamo DB and deployed
AWS Lambda code from Amazon S3 buckets.
 Created a Lambda deployment function and configured it to receive events from an S3 bucket (see the sketch at the end of this section).
 Designed data models for data-intensive AWS Lambda applications aimed at complex analysis, creating analytical reports for end-to-end traceability, lineage and definition of key business elements from Aurora.
 Writing code that optimizes performance of AWS services used by application teams and provide
Code level application security for clients (IAM roles, credentials, encryption, etc.).
 Using SonarQube for continuous inspection of code quality and to perform automatic reviews of
code to detect bugs. Managing AWS infrastructure and automation with CLI and API.
 Creating AWS Lambda functions using Python for deployment management in AWS; designed, investigated and implemented public-facing websites on Amazon Web Services and integrated them with other application infrastructure.
 Creating different AWS Lambda functions and API Gateways so that data submitted via API Gateway is accessible to the Lambda functions.
 Responsible for building CloudFormation templates for SNS, SQS, Elasticsearch, DynamoDB, Lambda, EC2, VPC, RDS, S3, IAM and CloudWatch service implementation, integrated with Service Catalog.
 Performed regular monitoring activities on Unix/Linux servers, such as log verification, server CPU usage, memory checks, load checks and disk-space verification, to ensure application availability and performance using CloudWatch and AWS X-Ray. Implemented the AWS X-Ray service inside Confidential, which allows development teams to visually detect node and edge latency distribution directly from the service map.
 Design and Develop ETL Processes in AWS Glue to migrate Campaign data from external sources like
S3, ORC/Parquet/Text Files into AWS Redshift.
 Automate Datadog Dashboards with the stack through Terraform Scripts.
 Developed file-cleaning utilities using Python libraries.
 Experience building Snowpipe; in-depth knowledge of data sharing in Snowflake and of database, schema and table structures.
 Explored DAGs, their dependencies and logs, using Airflow pipelines for automation.
 Designed and implemented a fully operational, production-grade, large-scale data solution on Snowflake.
 Utilized Python Libraries like Boto3, NumPy for AWS.
 Used Amazon EMR for MapReduce jobs and tested locally using Jenkins.
 Data Extraction, aggregations and consolidation of Adobe data within AWS Glue using PySpark.
 Create external tables with partitions using Hive, AWS Athena and Redshift.
 Developed PySpark code for AWS Glue jobs and for EMR. Installed and configured Splunk clustered search heads and indexers, deployment servers and deployers.
 Designed and implemented Splunk-based best-practice solutions. Designed and developed ETL jobs to extract data from a Salesforce replica and load it into a data mart in Redshift.
 Responsible for designing logical and physical data models for various data sources on Confidential Redshift.
 Experienced with event-driven and scheduled AWS Lambda functions to trigger various AWS
resources.
 Integrated Lambda with SQS and DynamoDB using Step Functions to iterate through lists of messages and update their status in a DynamoDB table.
Technologies: Python, Power BI, AWS Glue, Athena, SSRS, SSIS, AWS S3, AWS Redshift, AWS EMR, AWS RDS, DynamoDB, SQL, Tableau, Distributed Computing, Snowflake, Spark, Kafka, MongoDB, Hadoop, Linux Command Line, Data Structures, PySpark, Oozie, HDFS, MapReduce, Cloudera, HBase, Hive, Pig, Docker.
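
Below is a minimal sketch of the S3-triggered Lambda pattern referenced in this section, writing one audit item per uploaded object to DynamoDB; the table name and attribute names are hypothetical.

```python
# Minimal sketch of an S3-triggered Lambda handler that records each uploaded
# object in a DynamoDB table. Table and attribute names are hypothetical.
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("ingest-audit")  # hypothetical table

def lambda_handler(event, context):
    # An S3 event notification can carry multiple records per invocation.
    records = event.get("Records", [])
    for record in records:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        size = record["s3"]["object"].get("size", 0)

        # Store one audit item per uploaded object.
        table.put_item(Item={
            "object_key": key,
            "bucket": bucket,
            "size_bytes": size,
        })

    return {"status": "ok", "records": len(records)}
```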

Role: Senior Azure Data Engineer
Client: American Airlines, Fort Worth, TX Jan 2021 to Aug 2022
Responsibilities:
 Analyze, design and build modern data solutions using Azure PaaS services to support visualization of data.
 Understand current Production state of application and determine the impact of new
implementation on existing business processes.
 Extract, transform and load data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL and U-SQL (Azure Data Lake Analytics).
 Data ingestion into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processing of the data in Azure Databricks (see the sketch at the end of this section).
 Created pipelines in ADF using linked services, datasets and pipelines to extract, transform and load data from different sources such as Azure SQL, Blob Storage and Azure SQL Data Warehouse, as well as a write-back tool, and in the reverse direction.
 Developed Spark applications using PySpark and Spark SQL for data extraction, transformation and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
 Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.
 Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory tuning.
 Wrote UDFs in Scala and PySpark to meet specific business requirements. Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the SQL activity.
 Hands-on experience developing SQL scripts for automation purposes. Created build and release pipelines for multiple projects (modules) in the production environment using Visual Studio Team Services (VSTS).

Technologies: PL/SQL, Python, Azure-Data factory, Azure Blob storage, Azure table storage, Azure SQL
server, Apache Hive, Apache Spark, MDM, Netezza, Teradata, Oracle 12c, SQL Server, Teradata SQL
Assistant, Teradata Vantage, Microsoft Word/Excel, Flask, Snowflake, DynamoDB, Athena, Lambda,
MongoDB, Pig, Sqoop, Tableau, Power BI and UNIX, Docker, Kubernetes.
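
The sketch below illustrates the Databricks-style ingestion referenced in this section: reading raw files from Blob Storage, applying a simple cleanup, and writing curated Parquet to Azure Data Lake. The storage account, container, column and key names are placeholders, and cluster-level storage credentials are assumed to be configured.

```python
# Minimal PySpark sketch: raw CSV from Blob Storage -> curated Parquet in ADLS.
# Storage account, containers, paths, columns and keys are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

spark = SparkSession.builder.appName("adls-curation").getOrCreate()

# Read the raw zone; assumes storage credentials are set on the cluster.
raw = (spark.read
       .option("header", "true")
       .csv("wasbs://raw@storageacct.blob.core.windows.net/bookings/"))

# Basic cleanup: typed date, de-duplication on a business key, null filtering.
curated = (raw
           .withColumn("flight_date", to_date(col("flight_date")))
           .dropDuplicates(["booking_id"])
           .filter(col("booking_id").isNotNull()))

# Write the curated zone partitioned by date.
(curated.write
 .mode("overwrite")
 .partitionBy("flight_date")
 .parquet("abfss://curated@storageacct.dfs.core.windows.net/bookings/"))
```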
Role: Data Engineer
Client: Honeywell, India Apr 2017 to Dec 2020
Responsibilities:
 Experienced in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in AWS and Spark.
 Leveraged cloud and GPU computing technologies, such as AWS, for automated machine learning and analytics pipelines.
 Participated in all phases of data mining: data collection, data cleaning, model development, validation and visualization; performed gap analysis and provided feedback to the business team to improve software delivery.
 Data Mining with large datasets of Structured and Unstructured data, Data Acquisition, Data
Validation, Predictive modeling, Data Visualization on provider, member, claims, and service
fund data.
 Involved in developing RESTful APIs (microservices) using the Python Flask framework, packaged in Docker and deployed in Kubernetes using Jenkins pipelines.
 Experience in building and architecting multiple Data pipelines, end to end ETL, and ELT processes
for Data ingestion and transformation in Pyspark.
 Created reusable Rest API's that exposed data blended from a variety of data sources by reliably
gathering requirements from businesses directly.
 Worked on the development of Data Warehouse, a Business Intelligence architecture that
involves data integration and the conversion of data from multiple sources and platforms.
 Responsible for full data loads from production to AWS Redshift staging environment and
worked on migrating EDW to AWS using EMR and various other technologies.
 Experience in Creating, Scheduling, and Debugging Spark jobs using Python. Performed Data
Analysis, Data Migration, Transformation, Integration, Data Import, and Data Export through
Python.
 Gathered and processed raw data at scale (including writing scripts, web scraping, calling
APIs, writing SQL queries, and writing applications).
 Creating reusable Python scripts to ensure data integrity between the source
(Teradata/Oracle) and target system.
 Migrated on-premise database structure to Confidential Redshift data warehouse.
 Created data pipelines for different events to load data from DynamoDB into an AWS S3 bucket and then into HDFS, delivering high success metrics (see the sketch at the end of this section).
 Implemented tooling for authoring, scheduling and monitoring data pipelines using Scala and Spark.
 Developed and designed system to collect data from multiple platforms using Kafka and then
process it using spark.
 Created modules for streaming data into the data lake using Spark; worked with different data feeds such as JSON, CSV and XML and implemented the data lake concept.
Technologies: Python, Power BI, AWS Glue, Athena, SSIS, AWS S3, AWS Redshift, AWS EMR, AWS RDS,
DynamoDB, SQL, AWS Lambda, Scala, Spark
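
A minimal boto3 sketch of the DynamoDB-to-S3 pipeline referenced in this section: scan the table in pages and land each page in S3 as JSON. The table and bucket names are hypothetical, and in practice such an export would typically be scheduled (for example via Lambda or Airflow).

```python
# Minimal sketch: export a DynamoDB table to S3 as paged JSON files.
# Table and bucket names are hypothetical.
import json
import boto3

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")

table = dynamodb.Table("events")          # hypothetical table
bucket = "raw-events-landing"             # hypothetical bucket

def export_table_to_s3():
    scan_kwargs = {}
    page = 0
    while True:
        response = table.scan(**scan_kwargs)
        items = response.get("Items", [])
        if items:
            # default=str handles DynamoDB Decimal values during serialization.
            s3.put_object(
                Bucket=bucket,
                Key=f"dynamodb-export/part-{page:05d}.json",
                Body=json.dumps(items, default=str),
            )
            page += 1
        # DynamoDB paginates scans; follow LastEvaluatedKey until exhausted.
        if "LastEvaluatedKey" not in response:
            break
        scan_kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]

if __name__ == "__main__":
    export_table_to_s3()
```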
Role: Hadoop Developer
Client: Soft labs Group, India Aug 2015 to Mar 2017
Responsibilities:
 Worked closely with the business, translating business requirements into technical requirements as part of design reviews and daily project scrums, and wrote custom MapReduce programs with custom input formats.
 Created Sqoop jobs with incremental load to populate Hive External tables.
 Involved in the development of real time streaming applications using PySpark, Kafka on distributed
Hadoop Cluster.
 Worked on Partitioning, Bucketing, Join Optimizations, and query optimizations in Hive.
 Worked closely with business, transforming business requirements to technical requirements.
 Designed and developed Hadoop MapReduce programs and algorithms for analysis of cloud scale
classified data stored in Cassandra.
 Optimized the Hive tables using optimization techniques like partitioning and bucketing to provide
better performance with HiveQL queries.
 Evaluated data import-export capabilities, data analysis performance of Apache Hadoop framework.
 Involved in installation of HDP Hadoop, configuration of the cluster and the eco system components
like Sqoop, Pig, Hive, HBase and Oozie.
 Created reports for BI team using Sqoop to export data into HDFS and Hive.
 Worked extensively with Sqoop for importing and exporting the data from HDFS to Relational
Database system and vice versa.
 Created RDDs in Spark.
 Extracted data from the data warehouse (Teradata) into Spark RDDs.
 Experience with Spark using Scala/Python.
 Worked on stateful transformations in Spark Streaming.
 Worked on Batch processing and Real-time data processing and Spark Streaming using Lambda
architecture.
 Worked on Spark SQL UDFs and Hive UDFs (see the sketch after this section).

Technologies: Spark, Kafka, Hadoop, Linux Command Line, Data structures, PySpark, Oozie, HDFS,
MapReduce, Cloudera, HBase, Hive, Pig, Docker
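
A minimal PySpark sketch of the Spark SQL UDF work referenced in this section: register a Python function and use it from both the DataFrame API and SQL. The function, column and view names are illustrative.

```python
# Minimal sketch: define, register and use a Spark SQL UDF.
# Function, column and view names are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-example").getOrCreate()

def normalize_code(value):
    """Trim and upper-case a free-text code field."""
    return value.strip().upper() if value else None

# For the DataFrame API.
normalize_udf = udf(normalize_code, StringType())
# For Spark SQL queries.
spark.udf.register("normalize_code", normalize_code, StringType())

df = spark.createDataFrame([(" ab12 ",), (None,)], ["raw_code"])

# DataFrame API usage.
df.withColumn("code", normalize_udf(col("raw_code"))).show()

# Spark SQL usage against a temporary view.
df.createOrReplaceTempView("codes")
spark.sql("SELECT raw_code, normalize_code(raw_code) AS code FROM codes").show()
```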
