
Chaitanya N                                      Email: nchaitanya008@gmail.com
Sr. Data Engineer                                Phone: +1 (385) 999-0502

Summary:

 9+ years of experience in Information Technology, including Big Data, the Hadoop ecosystem, Core Java/J2EE, and Databricks/Spark, with strong skills in design, software processes, requirement gathering, analysis, and development of software applications.
 Excellent hands-on experience developing Hadoop architectures on Windows and Linux platforms.
 Experience building big data solutions with the Lambda Architecture using the Cloudera distribution of Hadoop, MapReduce, Cascading, Hive, and Sqoop.
 Deep understanding of, and design and development experience with, the Google Cloud Platform (GCP).
 Experience designing and developing ETL workflows and datasets in Alteryx for use by the BI reporting tool.
 Strong development experience in Java, Apache Maven, Jenkins, Pig, Servlets, Spring Boot, JUnit, and Log4j.
 Experienced in J2EE design patterns.
 Experience implementing Data Pipelines using Lambda Architecture and Kappa Architecture.
 Excellent working experience in Big Data integration and analytics based on the Hadoop, Spark, and Kafka frameworks.
 Hands on experience working on NoSQL databases including HBase, MongoDB.
 Hands-on experience writing ad-hoc queries to move data from HDFS to Hive and analyzing the data using HiveQL.
 Good understanding of cloud-based technologies such as AWS, GCP, and Azure, and of Spark architecture with Databricks.
 Good knowledge of Bitbucket and GitHub Enterprise.
 Knowledge of Docker for creating containers from Dockerfiles and orchestrating them with Docker Compose and Kubernetes.
 Experience in ETL (DataStage) analysis, design, development, testing, and implementation, including performance tuning and query optimization of databases.
 Experience extracting source data from sequential files, XML files, and Excel files, then transforming and loading it into the target data warehouse.
 Strong experience with Java/J2EE technologies such as Core Java, JDBC, JSP, JSTL, HTML, JavaScript, and JSON.
 Experience developing iterative algorithms using Spark Streaming in Scala and Python to build near-real-time dashboards (a brief PySpark sketch follows this summary).
 Expertise in Amazon Web Services, including Elastic Compute Cloud (EC2) and DynamoDB.
 Good understanding and experience with Software Development methodologies like Agile and Waterfall.
 Experienced in importing and exporting data with Sqoop between HDFS (Hive and HBase) and relational database systems (Oracle, MySQL).
 Experienced in designing and developing RESTful web services.
 Expertise in various Java/J2EE technologies such as Servlets and Spring Boot.
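
A minimal, hedged PySpark Structured Streaming sketch of the near-real-time dashboard feed described above; the broker address, topic name, and event schema are hypothetical placeholders, and the spark-sql-kafka connector is assumed to be on the classpath.

# Read usage events from Kafka, aggregate per minute, and emit updates for a dashboard.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col, window
from pyspark.sql.types import StructType, StringType, TimestampType

spark = SparkSession.builder.appName("nrt-dashboard").getOrCreate()

schema = (StructType()
          .add("user_id", StringType())
          .add("event_type", StringType())
          .add("event_time", TimestampType()))

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
          .option("subscribe", "usage-events")                 # placeholder topic
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# One-minute event counts with a watermark for late-arriving data
counts = (events
          .withWatermark("event_time", "5 minutes")
          .groupBy(window("event_time", "1 minute"), "event_type")
          .count())

# Console sink for illustration; a memory table or external store would back a real dashboard
query = (counts.writeStream
         .outputMode("update")
         .format("console")
         .option("truncate", "false")
         .start())
query.awaitTermination()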

Education:

Bachelors: Computer Science, JNTU K, AP, India, 2014. GPA: 3.05


Masters: Systems Engineering, Colorado Technical University, CO, 2017. GPA: 3.20

Professional Experience:

NS Corp, Atlanta, GA Jan 2022 – Present


Sr. Data Engineer
Responsibilities:

 Developed Batch Processing data pipelines using Spark.


 Developed stream-processing data pipelines using Spark Streaming and Kafka.
 Handled large amounts of data across many commodity servers with Apache big data tools.
 Used Spark to perform in-memory processing on data in Hive.
 Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
 Built ETL scripts in PL/SQL, Informatica, Hive, Pig, and PySpark; created, debugged, scheduled, and monitored jobs using Airflow and Oozie.
 Transformed and analyzed the data using PySpark and Hive based on ETL mappings.
 Utilized Azure Data Factory to orchestrate data migration pipelines and transfer data from various data sources
such as Azure SQL databases and Azure Blob storage to GCP storage solutions like BigQuery and Google Cloud
Storage.
 Configured network connectivity between Azure and GCP to enable data transfer and maintain data security
through VPC peering and VPN gateways.
 Utilized GCP Transfer Service to simplify the migration process and transfer large amounts of data from Azure
Blob storage to Google Cloud Storage in a highly available and scalable manner.
 Used Azure Analysis Services to design and develop measures, KPIs, derived and custom fields to meet business
needs and improve data analysis.
 Implemented monitoring and logging using GCP Stackdriver to track data transfer and identify any anomalies or
errors in real-time, ensuring quick resolution of issues.
 Implemented Data Ingestion, Data Processing frameworks for building Scalable Data Platform solutions.
 Developed plumbing services to gather metrics for monitoring and benchmarking the data pipeline.
 Migrated the data to the target state and built sub-data pipelines in the Google Cloud Platform (GCP).
 Worked on improving the performance and optimization of existing algorithms in Hadoop using Spark, including Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
 Handled importing of data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from SQL databases into HDFS using Sqoop.
 Deployed applications to AWS and monitored load balancing across EC2 instances.
 Created pipeline scripts for build and deployment automation of a Java-based project using Jenkins and UDFs.
 Deployed Spring Boot based microservices in Docker containers using the Amazon EC2 Container Service.
 Developed a POC for project migration from an on-prem Hadoop MapR system to Snowflake.
 Developed Kafka producer and consumers using the Kafka Java API.
 Wrote Python scripts to analyze customer data.
 Successfully translated the SQL scripts to Spark jobs.
 Used Python for data engineering and modeling.
 Generated scripts in AWS Glue to transfer the data and utilized AWS Glue to run ETL jobs and aggregations on PySpark code (a brief Glue sketch follows this list).
 Implemented AWS EC2, Key Pairs, Security Groups, Auto Scaling, ELB, SQS, and SNS using the AWS API and exposed them as RESTful web services.
 Worked on data processing, transformations, and actions in Spark using Python (PySpark).
 Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
 Implemented solutions in Scala, Python, and SQL for faster testing and processing of data, with real-time streaming of data using Kafka.
 Developed and designed automation framework using Shell scripting.
 Developed Hive scripts and UNIX shell scripts for all ETL loading processes and converted the files into Parquet in the Hadoop file system.
 Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code
review sessions
 Took requirements on raw data and produced consumable data based on those requirements using DataStage (standard ETL development).
 Worked on Version control systems like Subversion, GIT, CVS.
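
A hedged AWS Glue (PySpark) sketch of the kind of ETL job noted above; the catalog database, table, column, and S3 bucket names are hypothetical placeholders.

# Glue job: read a cataloged source, aggregate with PySpark, write Parquet to S3.
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Source table registered in the Glue Data Catalog (names are illustrative)
dyf = glue_context.create_dynamic_frame.from_catalog(database="sales_db", table_name="orders")

# Aggregate daily order totals in plain PySpark
daily = dyf.toDF().groupBy("order_date").agg(F.sum("amount").alias("total_amount"))

# Write the aggregate back to S3 as Parquet
glue_context.write_dynamic_frame.from_options(
    frame=DynamicFrame.fromDF(daily, glue_context, "daily_totals"),
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/daily_totals/"},
    format="parquet",
)
job.commit()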

Broadridge, Lake Success, NY May 2020 – Dec 2021


Sr. Data Engineer
Responsibilities:

 Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System); developed multiple MapReduce jobs in Java for data cleaning and processing.
 Worked on implementing Spark using Scala and Spark SQL for faster analyzing and processing of data.
 Applied Java/J2EE application development skills with object-oriented analysis and was extensively involved throughout the Software Development Life Cycle (SDLC).
 Led the rollout of data management toolsets on GCP, including sandbox testing and training for platform
engineers, to support efficient data management and storage across the organization.
 Worked in a CI/CD environment, deploying applications on Amazon Elastic Kubernetes Service (EKS).
 Deployed and managed Airflow on AWS using Kubernetes, resulting in improved system scalability and
reliability, and reduced infrastructure costs.
 Used Python to convert Hive/SQL queries into RDD transformations in Apache Spark.
 Built the infrastructure required for optimal extraction, transformation, and loading (ETL) of data from a wide variety of sources such as Salesforce, SQL Server, Oracle, and SAP using Azure, Spark, Python, Hive, Kafka, and other big data technologies.
 Worked on Apache NiFi as an ETL tool for batch processing and real-time processing.
 Containerized applications using Docker and orchestrated using Kubernetes.
 Handled importing and exporting of data into HDFS and Hive using Sqoop and Kafka.
 Solved performance issues in Hive scripts with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
 Implemented design patterns in Scala for the application.
 Developed ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, and wrote SQL queries against Snowflake.
 Worked on Migrating Objects from Netezza to Snowflake.
 Worked on writing Python scripts to parse JSON documents and load the data into the database.
 Worked on data cleaning and reshaping and generated segmented subsets using NumPy and Pandas in Python.
 Developed Python scripts to automate the data sampling process for the Payment system.
 Developed a PySpark framework to bring data from DB2 to Amazon S3 (a brief sketch follows this list).
 Set up and configured infrastructure, including HTTPD with mod_jk, mod_rewrite, and mod_proxy, JNDI, SSL, etc.
 Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala.
 Developed UDFs in Java as and when necessary to use in HIVE queries.
 Coordinated with various stakeholders such as the End Client, DBA Teams, Testing Team and Business Analysts.
 Involved in gathering requirements and developing a project plan.
 Designed and developed ETL workflows and datasets in Alteryx for use by the BI reporting tool.
 Developed Hive scripts and UNIX shell scripts for all ETL loading processes and converted the files into Parquet in the Hadoop file system.
 Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code
review sessions.
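
A hedged PySpark sketch of the DB2-to-S3 load mentioned above; the hostname, credentials, table name, and bucket path are placeholders, and the IBM DB2 JDBC driver jar is assumed to be on the classpath.

# Read a DB2 table over JDBC and land it in S3 as Parquet.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("db2-to-s3").getOrCreate()

src = (spark.read.format("jdbc")
       .option("url", "jdbc:db2://db2-host:50000/SAMPLE")     # placeholder connection
       .option("driver", "com.ibm.db2.jcc.DB2Driver")
       .option("dbtable", "SCHEMA.PAYMENTS")                   # placeholder table
       .option("user", "db2_user")
       .option("password", "db2_password")
       .load())

# Partition by an assumed load-date column and write Parquet to S3
(src.write
    .mode("overwrite")
    .partitionBy("LOAD_DATE")
    .parquet("s3a://example-bucket/landing/payments/"))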

Change Healthcare, Nashville, TN June 2018 to April 2020

Sr. Data Engineer


Responsibilities:

 Proactively monitored systems and services; handled architecture design and implementation of the Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
 Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala.
 Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System); developed multiple MapReduce jobs in Java for data cleaning and processing.
 Involved in converting Hive/SQL queries into transformations using Python.
 Worked on implementing Spark using Scala and Spark SQL for faster analyzing and processing of data.
 Applied Java/J2EE application development skills with object-oriented analysis and was extensively involved throughout the Software Development Life Cycle (SDLC).
 Built and optimized data pipelines in Apache Airflow for ETL-related jobs, resulting in a 60% increase in data
processing efficiency.
 Developed and implemented Ab Initio ETL workflows to extract, transform, and load data from various sources
such as flat files, databases, and APIs into data warehouses.
 Collaborated with data engineers, data analysts, and business stakeholders to deliver Airflow-based data
solutions that met business requirements and industry best practices.
 Designed and implemented over 50 Airflow DAGs to manage ETL workflows and data pipelines, resulting in the successful processing of over 10 TB of data daily (a minimal DAG sketch follows this list).
 Created custom Airflow operators and sensors to handle various data sources and destinations, resulting in reduced development time and improved data processing accuracy.
 Containerized applications using Docker and orchestrated using Kubernetes.
 Handled importing and exporting of data into HDFS and Hive using Sqoop and Kafka.
 Involved in creating Hive tables, loading data, and writing Hive queries, which run internally as MapReduce jobs.
 Applied MapReduce framework jobs in Java for data processing by installing and configuring Hadoop and HDFS.
 Worked on reading and writing multiple data formats like JSON, ORC, Parquet on HDFS using Spark.
 Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
 Worked in different parts of data lake implementations and maintenance for ETL processing.
 Developed Spark Streaming applications using Scala and Python for processing data from Kafka.
 Performed end-to-end delivery of PySpark ETL pipelines on GCP Databricks.
 Implemented various optimization techniques in Spark Streaming with Python applications.
 Imported batch data using Sqoop to load data from MySQL to HDFS at regular intervals.
 Extracted data from various APIs and performed data cleansing and processing using Java and Scala.
 Converted Hive queries into Spark SQL integrated with the Spark environment for optimized runs.
 Developed a migration data pipeline from an on-prem HDFS cluster to HDInsight.
 Developed complex queries and ETL processes in Jupyter notebooks using Databricks Spark.
 Created UDFs in Hive to implement custom functions.
 Involved in developing shell scripts to ease execution of all other scripts (Hive and MapReduce) and to move data files within and outside of HDFS.
 Collaborated with a team to develop, deploy, maintain, and update cloud solutions, and designed solution architecture on the Google Cloud Platform (GCP).
 Built data integrations with Python, Python API data extraction, Airflow, containerized applications, Docker, Kubernetes, BigQuery, and GCP.
 Implemented a script to transmit information from Oracle to HBase using Sqoop.
 Implemented business logic using Pig scripts and UDFs.
 Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for
the BI team.
 Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
 Worked with NoSQL databases such as HBase, creating tables to load large sets of semi-structured data.
 Generated Java APIs for retrieval and analysis on NoSQL databases such as HBase.
 Created ETL jobs to generate and distribute reports from MySQL database using Pentaho Data Integration.
 Worked on loading data from the UNIX file system to HDFS.
 Advanced knowledge of the Google Cloud Platform (GCP) ecosystem around BigQuery.
 Analyzed large data sets to determine the optimal way to aggregate and report on them.
 Set up continuous integration using Jenkins for nightly builds with automatic emails to the team.
 Used Jenkins plugins for code coverage and to run all tests before generating the WAR file.
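
A minimal Airflow DAG sketch of the ETL orchestration described above; the DAG id, schedule, and task callables are hypothetical placeholders.

# Three-step extract/transform/load DAG wired with PythonOperator tasks.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # pull raw records from the source system (placeholder)
    pass


def transform(**context):
    # clean and reshape the extracted data (placeholder)
    pass


def load(**context):
    # write curated output to the warehouse (placeholder)
    pass


with DAG(
    dag_id="daily_etl_example",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 1, "retry_delay": timedelta(minutes=5)},
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load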

Charter Communications, Negaunee, MI March 2016 to May 2018

Big Data Analyst

Responsibilities:

 Involved in writing client-side scripts using JavaScript and server-side scripts using JavaBeans, and used servlets for handling the business logic.
 Developed Scala programs with Spark for data in the Hadoop ecosystem.
 Used RESTful web services (JAX-RS) for integration with other systems.
 Implemented AWS EC2, Key Pairs, Security Groups, Auto Scaling, ELB, SQS, and SNS using the AWS API and exposed them as RESTful web services (a brief boto3 sketch follows this list).
 Data-modeled HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of other data sources.
 Solved performance issues in Hive scripts with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
 Developed UDFs in Java as and when necessary to use in HIVE queries.
 Coordinated with various stakeholders such as the End Client, DBA Teams, Testing Team and Business Analysts.
 Involved in gathering requirements and developing a project plan.
 Involved in understanding requirements, functional specifications, design documentation, and testing strategies.
 Involved in UI design, coding, and database handling.
 Involved in Unit Testing and Bug Fixing.
 Worked over the entire Software Development Life Cycle (SDLC) as a part of a team as well as independently.
 Wrote SQL queries against the database and provided data extracts to users on request.
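
A hedged boto3 sketch of the SNS/SQS provisioning noted above; the topic, queue, and region names are placeholders, and the SQS queue policy that allows SNS delivery is omitted for brevity.

# Create an SNS topic and SQS queue, subscribe the queue, and publish a test message.
import json
import boto3

sns = boto3.client("sns", region_name="us-east-1")
sqs = boto3.client("sqs", region_name="us-east-1")

topic_arn = sns.create_topic(Name="order-events")["TopicArn"]
queue_url = sqs.create_queue(QueueName="order-events-queue")["QueueUrl"]
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Wire the queue to the topic
sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)

# Publish a sample event
sns.publish(TopicArn=topic_arn, Message=json.dumps({"orderId": 123, "status": "created"}))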

Hudda InfoTech Private Limited, Hyderabad, India October 2014 to November 2015

Java Developer
Responsibilities

 Developed the web tier using the Spring MVC framework.


 Performed database operations on the consumer portal using the Spring JdbcTemplate.
 Implemented design patterns in Scala for the application.
 Set up and configured infrastructure, including HTTPD with mod_jk, mod_rewrite, and mod_proxy, JNDI, SSL, etc.
 Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala.
 Implemented RESTful services in Spring.
 Serialized and deserialized objects using the Play JSON library.
 Developed traits, case classes, etc. in Scala.
 Developed quality code adhering to Scala coding standards and best practices.
 Wrote complex SQL queries.
 Developed the GUI using jQuery, JSON, and JavaScript.
 Performed unit testing, integration testing, and bug fixing.
 Followed Agile methodology (Stand up meetings, retrospective meetings, sprint development and Pair
programming).
 Developed application code using Eclipse IDE and configured with Maven, Glassfish server and JUnit.
 Developed Use Case Diagrams, Sequence Diagrams and Class Diagrams using Rational Rose.
 Developed the controller servlet to handle the requests and responses.
 Developed JSP pages with MVC architecture using Spring MVC, Servlets and Simple tags.
 Configured Maven dependencies for application building processes.
 Used Spring Dependency Injection to set up dependencies between the objects.
 Optimized the source code and queries to improve performance using Hibernate.
 Assisted other team members with various technical issues including JavaScript, CSS, JSP and Server related
issues.

Dhruvsoft Services Private Limited, Hyderabad, India June 2013 to September 2014

Data Engineer

Responsibilities

 Migrated data from FS to Snowflake within the organization.


 Imported Legacy data from SQL Server and Teradata into Amazon S3.
 Created consumption views on top of metrics to reduce the running time for complex queries.
 Exported data into Snowflake by creating staging tables to load data from different files in Amazon S3 (a brief load sketch follows this list).
 Compared data at the leaf level across various databases whenever data transformation or data loading took place, and analyzed data quality after these loads to check for any data loss or data corruption.
 As part of the data migration, wrote many SQL scripts to identify data mismatches and worked on loading the history data from Teradata SQL to Snowflake.
 Developed SQL scripts to upload, retrieve, manipulate, and handle sensitive data (National Provider Identifier data, i.e., name, address, SSN, phone number) in Teradata, SQL Server Management Studio, and Snowflake databases for the project.
 Worked on retrieving data from FS to S3 using Spark commands.
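
A hedged sketch of the S3-to-Snowflake load described above, using the Snowflake Python connector; the account, credentials, stage, file format, and table names are placeholders.

# Create an external stage over the S3 landing area and bulk-load it into a target table.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345",          # placeholder account identifier
    user="etl_user",
    password="********",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="STAGING",
)
cur = conn.cursor()
try:
    cur.execute("""
        CREATE STAGE IF NOT EXISTS s3_landing
        URL = 's3://example-bucket/landing/'
        CREDENTIALS = (AWS_KEY_ID = '***' AWS_SECRET_KEY = '***')
        FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    """)
    # COPY INTO pulls all matching staged files into the target table
    cur.execute("COPY INTO STAGING.PAYMENTS FROM @s3_landing PATTERN = '.*payments.*[.]csv'")
finally:
    cur.close()
    conn.close()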

