Chaitanya - Sr. Data Engineer
Phone: +1 (385) 999-0502
Summary:
Over 9 years of experience in Information Technology, including Big Data, the Hadoop ecosystem, Core Java/J2EE, and Databricks/Spark, with strengths in design, software processes, requirements gathering, analysis, and development of software applications.
Excellent hands-on experience developing Hadoop architecture on Windows and Linux platforms.
Experience building big data solutions with a Lambda Architecture using the Cloudera distribution of Hadoop, MapReduce, Cascading, Hive, and Sqoop.
Deep understanding of, and design and development experience with, the Google Cloud Platform (GCP).
Experience designing and developing ETL workflows and datasets in Alteryx for use by the BI reporting tool.
Strong development experience in Java, Apache, Maven, Jenkins, Pig, Servlets, Spring Boot, JUnit, Log4j.
Experienced in J2EE design patterns.
Experience implementing Data Pipelines using Lambda Architecture and Kappa Architecture.
Excellent working experience in Big Data integration and analytics based on the Hadoop, Spark, and Kafka frameworks.
Hands on experience working on NoSQL databases including HBase, MongoDB.
Hands-on experience writing ad-hoc queries to move data from HDFS to Hive and analyzing the data using HiveQL.
Good understanding of cloud-based technologies such as AWS, GCP, and Azure, and of the Spark architecture with Databricks.
Good knowledge of Bitbucket and GitHub Enterprise.
Knowledge of Docker for creating containers from Dockerfiles and orchestrating them using Docker Compose and Kubernetes.
Experience in ETL (DataStage) analysis, design, development, testing, and implementation of ETL processes, including performance tuning and query optimization of databases.
Experience in extracting source data from Sequential files, XML files, Excel files, transforming and loading it into the
target data warehouse.
Strong experience with Java/J2EE technologies such as Core Java, JDBC, JSP, JSTL, HTML, JavaScript, and JSON.
Experience developing iterative algorithms using Spark Streaming in Scala and Python to build near real-time
dashboards.
Expertise in Amazon Web Services, including Elastic Compute Cloud (EC2) and DynamoDB.
Good understanding and experience with Software Development methodologies like Agile and Waterfall.
Experienced in importing and exporting data using Sqoop from HDFS (Hive & HBase) to Relational Database
Systems (Oracle, MySQL) and vice-versa.
Experienced in designing and developing RESTful web services.
Expertise in various Java/J2EE technologies like Servlets, Spring Boot.
Education:
Professional Experience:
Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System); developed multiple MapReduce jobs in Java for data cleaning and processing.
Worked on implementing Spark using Scala and Spark SQL for faster analysis and processing of data.
Applied Java/J2EE application development skills with object-oriented analysis and was extensively involved throughout the Software Development Life Cycle (SDLC).
Led the rollout of data management toolsets on GCP, including sandbox testing and training for platform
engineers, to support efficient data management and storage across the organization.
Worked on CI/CD environment on deploying application on Amazon Elastic Kubernetes Service (EKS).
Deployed and managed Airflow on AWS using Kubernetes, resulting in improved system scalability and
reliability, and reduced infrastructure costs.
Used Python to convert Hive/SQL queries into RDD transformations in Apache Spark.
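A minimal PySpark sketch of this conversion, assuming a hypothetical Hive table employees(dept, salary):

```python
# Rewrite of a Hive aggregation as RDD transformations; table and
# column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-to-rdd").enableHiveSupport().getOrCreate()

# Original Hive query:
#   SELECT dept, AVG(salary) FROM employees GROUP BY dept
rdd = (spark.table("employees").rdd
       .map(lambda row: (row["dept"], (row["salary"], 1)))
       .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
       .mapValues(lambda s: s[0] / s[1]))

print(rdd.collect())
```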
Built the infrastructure required for optimal extraction, transformation, and loading (ETL) of data from a wide variety of data sources such as Salesforce, SQL Server, Oracle, and SAP using Azure, Spark, Python, Hive, Kafka, and other big data technologies.
Worked on Apache NiFi as an ETL tool for batch processing and real-time processing.
Containerized applications using Docker and orchestrated using Kubernetes.
Handled importing and exporting data into HDFS and Hive using Sqoop and Kafka.
Solved performance issues in Hive scripts with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
Implemented design patterns in Scala for the application.
Developed ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, and wrote SQL queries against Snowflake.
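A hedged sketch of the Python side using the snowflake-connector-python package; the account, credentials, and table are placeholders, and a direct INSERT stands in for the usual stage-and-COPY bulk load:

```python
import snowflake.connector

# All connection values below are placeholders.
conn = snowflake.connector.connect(
    user="ETL_USER",
    password="...",
    account="my_account",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)
cur = conn.cursor()
try:
    # Insert one parsed record; real pipelines would bulk-load via a stage.
    cur.execute(
        "INSERT INTO payments (id, amount) VALUES (%s, %s)",
        (42, 19.99),
    )
finally:
    cur.close()
    conn.close()
```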
Worked on Migrating Objects from Netezza to Snowflake.
Worked on writing Python scripts to parse JSON documents and load the data into the database.
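A self-contained sketch of such a script, using sqlite3 as a stand-in target database and a hypothetical events schema:

```python
import json
import sqlite3

# Parse a JSON document assumed to hold a list of {"id": ..., "status": ...}.
with open("events.json") as fh:
    records = json.load(fh)

conn = sqlite3.connect("staging.db")
conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO events (id, status) VALUES (?, ?)",
    [(r["id"], r["status"]) for r in records],
)
conn.commit()
conn.close()
```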
Worked on data cleaning and reshaping and generated segmented subsets using NumPy and Pandas in Python.
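An illustrative snippet of that kind of NumPy/Pandas cleaning and segmentation; the column names and threshold are made up for the example:

```python
import numpy as np
import pandas as pd

df = pd.read_csv("transactions.csv")
df = df.dropna(subset=["amount"])             # drop incomplete rows
df["amount"] = df["amount"].clip(lower=0)     # remove negative noise
df["bucket"] = np.where(df["amount"] > 100, "high", "low")

# Segmented subsets, one DataFrame per bucket
segments = {name: grp for name, grp in df.groupby("bucket")}
```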
Developed Python scripts to automate the data sampling process for the Payment system.
Developed a PySpark framework to bring data from DB2 to Amazon S3.
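A sketch of the DB2-to-S3 copy, assuming the DB2 JDBC driver jar is on the Spark classpath; the URL, table, credentials, and bucket are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("db2-to-s3").getOrCreate()

# Read the source table over JDBC (connection details are placeholders).
df = (spark.read.format("jdbc")
      .option("url", "jdbc:db2://db2-host:50000/SAMPLE")
      .option("driver", "com.ibm.db2.jcc.DB2Driver")
      .option("dbtable", "SCHEMA.PAYMENTS")
      .option("user", "db2user")
      .option("password", "...")
      .load())

# Land the data in S3 as Parquet.
df.write.mode("overwrite").parquet("s3a://my-bucket/payments/")
```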
Set up infrastructure: implementing, configuring, and externalizing HTTPD (mod_jk, mod_rewrite, mod_proxy), JNDI, SSL, etc.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
Developed UDFs in Java as needed for use in Hive queries.
Coordinated with various stakeholders such as the End Client, DBA Teams, Testing Team and Business Analysts.
Involved in gathering requirements and developing a project plan.
Designed and developed ETL workflows and datasets in Alteryx for use by the BI reporting tool.
Developed Hive scripts and UNIX shell scripts for all ETL loading processes and converted the files into Parquet in the Hadoop file system.
Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code
review sessions.
Proactively monitored systems and services; handled architecture design and implementation of the Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala.
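A PySpark Structured Streaming sketch of the same Kafka-to-HDFS pattern (the original work was in Scala); the broker, topic, and paths are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

# Subscribe to a Kafka topic (placeholder broker and topic).
stream = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "events")
          .load())

# Persist the raw message payloads to HDFS as Parquet.
query = (stream.selectExpr("CAST(value AS STRING) AS value")
         .writeStream.format("parquet")
         .option("path", "hdfs:///data/events/")
         .option("checkpointLocation", "hdfs:///checkpoints/events/")
         .start())
query.awaitTermination()
```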
Involved in converting Hive/SQL queries into transformations using Python
Built and optimized data pipelines in Apache Airflow for ETL-related jobs, resulting in a 60% increase in data
processing efficiency.
Developed and implemented Ab Initio ETL workflows to extract, transform, and load data from various sources
such as flat files, databases, and APIs into data warehouses.
Collaborated with data engineers, data analysts, and business stakeholders to deliver Airflow-based data
solutions that met business requirements and industry best practices.
Designed and implemented over 50 Airflow DAGs to manage ETL workflows and data pipelines, resulting in the
successful processing of over 10 TB of data daily.
Created custom Airflow operators and sensors to handle various data sources and destinations, resulting in reduced development time and improved data processing accuracy.
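A minimal sketch of one such DAG with a custom operator; the operator body, schedule, and names are illustrative only:

```python
from datetime import datetime

from airflow import DAG
from airflow.models.baseoperator import BaseOperator


class S3ToWarehouseOperator(BaseOperator):
    """Hypothetical operator copying one S3 prefix into the warehouse."""

    def __init__(self, prefix: str, **kwargs):
        super().__init__(**kwargs)
        self.prefix = prefix

    def execute(self, context):
        # Real load logic would go here.
        self.log.info("Loading %s", self.prefix)


with DAG(
    dag_id="daily_payments_etl",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    load = S3ToWarehouseOperator(task_id="load_payments", prefix="payments/")
```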
Involved in creating Hive tables, loading the data, and writing Hive queries that run internally as MapReduce jobs.
Applied MapReduce framework jobs in Java for data processing after installing and configuring Hadoop and HDFS.
Worked on reading and writing multiple data formats like JSON, ORC, Parquet on HDFS using Spark.
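For example, the round trip between these formats looks like this in PySpark (paths are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("formats").getOrCreate()

# Read JSON, then persist in the columnar formats.
json_df = spark.read.json("hdfs:///raw/events.json")
json_df.write.mode("overwrite").orc("hdfs:///curated/events_orc/")
json_df.write.mode("overwrite").parquet("hdfs:///curated/events_parquet/")

# Read the columnar copies back.
orc_df = spark.read.orc("hdfs:///curated/events_orc/")
parquet_df = spark.read.parquet("hdfs:///curated/events_parquet/")
```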
Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
Worked in different parts of data lake implementations and maintenance for ETL processing.
Developed Spark Streaming applications using Scala and Python for processing data from Kafka.
Performed end-to-end delivery of PySpark ETL pipelines on Databricks on GCP.
Implemented various optimization techniques in Spark Streaming with Python applications.
Imported batch data using Sqoop to load data from MySQL to HDFS at regular intervals.
Extracted data from various APIs and performed data cleansing and processing using Java and Scala.
Converted Hive queries into Spark SQL integrated with the Spark environment for optimized runs.
Developed a migration data pipeline from the on-prem HDFS cluster to HDInsight.
Developed complex queries and ETL processes in Jupyter notebooks using Databricks Spark.
Created UDFs in Hive to implement custom functions.
Involved in developing shell scripts to ease execution of all other scripts (Hive and MapReduce) and to move the data files within and outside of HDFS.
Collaborated with a team to develop, deploy, maintain, and update cloud solutions, and designed solution architecture on the Google Cloud Platform (GCP).
Built data integrations with Python, including Python API data extraction, Airflow, containerized applications, Docker, Kubernetes, BigQuery, and GCP.
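A hedged google-cloud-bigquery sketch of the extraction side; the project, dataset, and query are placeholders and assume default application credentials:

```python
from google.cloud import bigquery

# Project, dataset, and table names below are placeholders.
client = bigquery.Client(project="my-gcp-project")
query = """
    SELECT user_id, COUNT(*) AS events
    FROM `my-gcp-project.analytics.events`
    GROUP BY user_id
"""
for row in client.query(query).result():
    print(row.user_id, row.events)
```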
Implemented a script to transmit data from Oracle to HBase using Sqoop.
Implemented business logic using Pig scripts and UDFs.
Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for
the BI team.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
Worked with NoSQL databases like HBase in creating tables to load large sets of semi-structured data.
Generated Java APIs for retrieval and analysis on NoSQL databases such as HBase.
Created ETL jobs to generate and distribute reports from MySQL database using Pentaho Data Integration.
Worked on loading data from the UNIX file system to HDFS.
Advanced knowledge of the Google Cloud Platform (GCP) ecosystem around BigQuery.
Analyzed large data sets to determine the optimal way to aggregate and report on them.
Set up continuous integration using Jenkins for nightly builds, with automatic emails sent to the team.
Used Jenkins plugins for code coverage and to run all tests before generating the WAR file.
Responsibilities:
Involved in writing client-side scripts using JavaScript and server-side scripts using JavaBeans, and used servlets for handling the business logic.
Developed Scala programs with Spark for data in Hadoop ecosystem.
Used RESTful web services (JAX-RS) for integration with other systems.
Implemented AWS EC2, key pairs, security groups, Auto Scaling, ELB, SQS, and SNS using the AWS API and exposed them as RESTful web services.
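A boto3 sketch of these provisioning calls; the AMI, key pair, security group, and topic names are placeholders:

```python
import boto3

# Launch a single instance (all identifiers are placeholders).
ec2 = boto3.client("ec2", region_name="us-east-1")
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="t3.micro",
    KeyName="my-key-pair",
    MinCount=1,
    MaxCount=1,
    SecurityGroupIds=["sg-0123456789abcdef0"],
)

# Publish a notification to an SNS topic.
sns = boto3.client("sns", region_name="us-east-1")
topic = sns.create_topic(Name="deploy-events")
sns.publish(TopicArn=topic["TopicArn"], Message="instance launched")
```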
Data modeled HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of other data sources.
Solved performance issues in Hive scripts with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
Developed UDFs in Java as needed for use in Hive queries.
Coordinated with various stakeholders such as the End Client, DBA Teams, Testing Team and Business Analysts.
Involved in gathering requirements and developing a project plan.
Involved in understanding requirements, functional specifications, designing documentations and testing
strategies.
Involved in UI designing, Coding, Database Handling.
Involved in Unit Testing and Bug Fixing.
Worked over the entire Software Development Life Cycle (SDLC) as a part of a team as well as independently.
Wrote SQL queries against the database and provided data extracts to users on request.
Hudda InfoTech Private Limited Hyderabad, India October 2014 to November 2015
Java Developer
Responsibilities:
Dhruvsoft Services Private Limited, Hyderabad, India June 2013 to September 2014
Data Engineer
Responsibilities: