
Name: Akhil Sai Bheemshetty

Email: akhil@crispmails.com
Contact: +1 6147697922
Professional Summary:

• Around 5 years of professional experience in the IT industry, involved in developing, implementing, configuring, and testing Hadoop ecosystem components and maintaining various web-based applications.
• Experience in designing and implementing complete end-to-end Hadoop Infrastructure using
MapReduce, HDFS, HBase, Hive, Pig, Sqoop, Spark, Oozie and Zookeeper.
• Expert hands-on experience in installing, configuring, and testing Hadoop ecosystem components.
• Experience with Hadoop clusters on the major Hadoop distributions – Cloudera and Hortonworks (HDP).
• Good exposure to the Apache Spark ecosystem, including Shark and Spark Streaming, using Scala and Python.
• Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task
Tracker, Name Node, Data Node and MapReduce concepts.
• Experience in developing MapReduce programs on Hadoop for working with Big Data.
• Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs.
• Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and
vice-versa.
• Collected and aggregated large amounts of log data using Apache Flume and stored the data in HDFS for further analysis.
• Monitored Hadoop clusters using tools such as Nagios, Ganglia, Ambari and Cloudera Manager.
• Experience with job/workflow scheduling and monitoring tools such as Oozie.
• Using Informatica PowerCenter Designer, analyzed the source data to extract and transform data from various source systems (Oracle 10g, DB2, SQL Server and flat files), incorporating business rules using the different objects and functions that the tool supports.
• Experience in designing both time-driven and data-driven automated workflows using Oozie.
• Experience in different layers of Hadoop Framework – Storage (HDFS), Analysis (Pig and Hive),
Engineering (Jobs and Workflows).
• Good understanding of NoSQL databases and hands on experience with HBase.
• Experienced in loading datasets into Hive for ETL (Extract, Transform and Load) operations.
• Working knowledge of SQL, PL/SQL, Stored Procedures, Functions, Packages, DB Triggers, Indexes and SQL*Loader.
• Experience in Amazon AWS cloud services (EC2, EBS, S3).
• Experienced in using development environments and editors such as Eclipse, NetBeans, Kate and gEdit.

TECHNICAL SKILLS:

Hadoop/Big Data Technologies: Hadoop, MapReduce, Sqoop, Hive, Oozie, Spark, Zookeeper, Cloudera Manager, Kafka, Flume
ETL Tools: Informatica
NoSQL Databases: Cosmos DB, HBase, Cassandra, DynamoDB, MongoDB
Monitoring and Reporting: Tableau, custom shell scripts, Power BI
Hadoop Distributions: Hortonworks, Cloudera
Build Tools: Maven
Programming & Scripting: Python, Scala, SQL, Shell Scripting, C, C++
Databases: Oracle, MySQL, Teradata
Version Control: Git
Operating Systems: Linux, Unix, Mac OS X, CentOS, Windows 10, Windows 8, Windows 7
Cloud Computing: AWS, AWS EC2, AWS S3, Azure SQL Database, Azure Data Studio, Azure SQL Data Warehouse, Azure Data Factory (ADF)
Web Technologies: HTML, XML, JDBC, JSP, CSS, JavaScript, SOAP

EDUCATION:

MASTER'S IN COMPUTER SCIENCE: UNIVERSITY OF CENTRAL MISSOURI, December 2022

PROFESSIONAL EXPERIENCE:

Client: DaVita HealthCare, TN                                                                        February 2022 – Present
Role: Senior Data Engineer

Responsibilities:

• Proficient in working with the Azure cloud platform (HDInsight, Data Lake, Databricks, Blob Storage, Data Factory, Synapse, SQL, SQL DB, DWH and Data Storage Explorer).
• Designed and deployed data pipelines using DataLake, DataBricks, and Apache Airflow.
• Enabled other teams to work with more complex scenarios and machine learning solutions.
• Worked on Azure Data Factory to integrate data from both on-prem (MySQL, Cassandra) and cloud (Blob Storage, Azure SQL DB) sources and applied transformations to load the results back into Azure Synapse.
• Developed Spark Scala functions for mining data to provide real-time insights and reports.
• Configured Spark Streaming to receive real-time data from Apache Flume and store the streamed data in Azure Table storage using Scala.
• Used Azure Data Lake to store data and perform all types of processing and analytics.
• Created data marts after analyzing the raw data in the data warehouse, organized the data into sections by business or department area, and made it available downstream so that users can easily access insights.
• Ingested data into Azure Blob Storage and processed the data using Databricks. Involved in writing Spark Scala scripts and UDFs to perform transformations on large datasets.
• Utilized Spark Streaming API to stream data from various sources. Optimized existing Scala code and
improved the cluster performance.  
• Involved in using Spark DataFrames to create various datasets and applied business transformations and data cleansing operations using Databricks notebooks.
• Efficient in writing Python scripts to build ETL pipelines and Directed Acyclic Graph (DAG) workflows using Airflow and Apache NiFi (see the sketch after this list).
• Distributed tasks across Celery workers to manage communication between multiple services.
• Monitored Spark cluster using Log Analytics and Ambari Web UI. Transitioned log storage from
Cassandra to Azure SQL Datawarehouse and improved the query performance.  
• Involved in developing data ingestion pipelines on Azure HDInsight Spark cluster using Azure Data
Factory and Spark SQL. Also Worked with Cosmos DB (SQL API and Mongo API).  
• Designed custom-built input adapters using Spark, Hive, and Sqoop to ingest and analyze data
(Snowflake, MS SQL, MongoDB) into HDFS.  
• Loaded data from Web servers and Teradata using Sqoop, Flume and Spark Streaming API. 
• Used Flume sink to write directly to indexers deployed on cluster, allowing indexing during ingestion. 
• Migrated from Oozie to Apache Airflow. Involved in developing Oozie and Airflow workflows for daily incremental loads, getting data from RDBMS sources (MongoDB, MS SQL).
• Implemented performance tuning logic on targets, sources, mappings and sessions to provide maximum efficiency and performance.
• Managed resources and scheduling across the cluster using Azure Kubernetes Service. AKS can be used
to create, configure and manage a cluster of Virtual machines. 
• Extensively used Kubernetes, which makes it possible to handle all the online and batch workloads required to feed analytics and machine learning applications.
• Used Azure DevOps and VSTS (Visual Studio Team Services) for CI/CD, Active Directory for
authentication and Apache Ranger for authorization.
• Using Informatica PowerCenter Designer, analyzed the source data to extract and transform data from various source systems (Oracle 10g, DB2, SQL Server and flat files), incorporating business rules using the different objects and functions that the tool supports.
• Using Informatica PowerCenter, created mappings and mapplets to transform the data according to the business rules.
• Tuned the Informatica mappings for optimal load performance.
• Experience in tuning Spark applications (batch interval time, level of parallelism, memory tuning) to improve processing time and efficiency.
• Used Scala for its strong concurrency support, which plays a key role in parallelizing the processing of large data sets.
• Developed MapReduce jobs in Scala, compiling the program code into bytecode for the JVM for data processing.
• Proficient in utilizing data for interactive Power BI dashboards and reporting purposes based on business
requirements.  
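
A minimal, illustrative Airflow sketch of the Python DAG work referenced above; the DAG name, schedule, and task callables are hypothetical placeholders rather than the client's actual pipeline.

```python
# Hypothetical Airflow DAG sketch for a daily incremental ETL load.
# Task names and the extract/transform callables are illustrative only.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_to_blob(**context):
    # Placeholder: pull the day's records from a source system
    # and land them as files in Azure Blob storage.
    pass


def transform_in_databricks(**context):
    # Placeholder: trigger a Spark/Databricks job that cleanses and
    # transforms the landed files before loading them into Synapse.
    pass


default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="daily_incremental_load",   # hypothetical name
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_to_blob", python_callable=extract_to_blob)
    transform = PythonOperator(task_id="transform_in_databricks", python_callable=transform_in_databricks)

    extract >> transform
```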

Environment: Azure HDInsight, Databricks (ADBX), DataLake (ADLS), CosmosDB, MySQL, Snowflake,
MongoDB, Teradata, Ambari, Flume, VSTS, Tableau, PowerBI, Azure DevOps, Ranger, Informatica,
Azure AD, Git, Blob Storage, Data Factory, Data Storage Explorer, Scala, Hadoop 2.x (HDFS,
MapReduce, Yarn), Spark v2.0.2, Airflow, Hive, Sqoop, HBase.

Client: The Bridge Corp, Philadelphia, PA                                                    August 2021 – February 2022
Role: Data Engineer

Responsibilities:

• Experience in Big Data analysis using Pig and Hive and an understanding of Sqoop and Puppet.
• Created YAML files for each data source, including Glue table stack creation.
• Worked extensively on AWS components and related services such as Airflow, Elastic MapReduce (EMR), Athena and Snowflake.
• Extensive experience on Hadoop ecosystem components like Hadoop, Map Reduce, HDFS, HBase, Hive,
Sqoop, Pig, ZooKeeper and Flume.  
• Developed a Python script to hit REST APIs and extract data to AWS S3 (a sketch follows this list).
• Worked on ingesting data through cleansing and transformation steps, leveraging AWS Lambda, AWS Glue and Step Functions.
• Expertise in modelling HANA calculation views, extending CDS views for external data consumption, designing flow graphs and using the SAP Predictive Analytics Library for building machine learning models.
• Developed real-time SLA monitoring dashboards in Tableau for the Kafka message load into SAP HANA.
• Proposed an automated system using shell scripts to run the Sqoop jobs.
• Developed Spark jobs using Scala for faster real-time analytics and used Spark SQL for querying.
• Data acquisition from REST APIs/JSON; data wrangling with Python and Unix tools; segmenting and organizing data from disparate sources; and loading data into Google BigQuery.
• Implemented data ingestion and handled clusters for real-time processing using Kafka.
• Good knowledge of AWS CloudFormation templates; configured the SQS service through the Java API to send and receive information.
• Designed a data analysis pipeline in Python, using Amazon Web Services such as S3, EC2 and Elastic Map
Reduce.
• Designed and implemented Sqoop for the incremental job to read data from DB2 and load to Hive tables
and connected to Tableau for generating interactive reports using Hive server2.
• Used Sqoop to channel data between HDFS and different RDBMS sources.
• Extensive Experience on importing and exporting data using stream processing platforms
like Flume and Kafka.
• Enhanced HANA costing models (SAP Purchasing/Order Management).
• Ability to spin up different AWS instances, including EC2-Classic and EC2-VPC, using CloudFormation templates.
• Deep analytics and understanding of Big Data and algorithms using Hadoop, Map Reduce, NoSQL and
distributed computing tools.
• Developed Oozie workflow schedulers to run multiple Hive and Pig jobs that run independently based on time and data availability.
• Experienced in troubleshooting errors in HBase Shell/API, Pig, Hive and map Reduce.
• Developed Spark applications using PySpark and Spark-SQL for data extraction, transformation and
aggregation from multiple file formats.
• Used HBase/Phoenix to support front end applications that retrieve data using row keys
• Developed a strategy for Full load and incremental load using Sqoop.
• Experience in developing custom UDFs in Java to extend Hive and Pig Latin functionality.
• Worked on a Python script to extract data from Netezza databases and transfer it to AWS S3.
• Developed Lambda functions and assigned IAM roles to run Python scripts along with various triggers (SQS, EventBridge, SNS).
• Imported documents into HDFS and HBase and created HAR files.
• Involved in importing real-time data into Hadoop using Kafka and implemented a zombie-runner job for daily imports.
• Good knowledge of and experience with Amazon Web Services (AWS) concepts such as EMR and EC2; successfully loaded files to HDFS from Oracle, SQL Server, Teradata and Netezza using Sqoop.
• Involved in creating HiveQL queries on HBase tables and importing work order data efficiently into Hive tables.
• Worked on migrating Map Reduce programs into Spark transformations using Spark and Scala.
• Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
• Used Hadoop on a cloud service (Qubole) to process data in AWS S3 buckets.
• Executed parameterized Pig, Hive, Impala, and UNIX batches in production.
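
A minimal sketch of the kind of REST-API-to-S3 extraction script referenced above; the endpoint URL, bucket name, and object key are hypothetical placeholders rather than the client's actual resources.

```python
# Hypothetical REST-API-to-S3 extraction sketch using requests and boto3.
import json
from datetime import datetime, timezone

import boto3
import requests

API_URL = "https://api.example.com/v1/records"   # hypothetical endpoint
BUCKET = "example-raw-data"                       # hypothetical bucket


def extract_to_s3() -> str:
    """Pull one page of records from the REST API and land it in S3 as JSON."""
    response = requests.get(API_URL, timeout=30)
    response.raise_for_status()
    records = response.json()

    # Partition objects by load date so downstream Glue/Athena tables
    # can prune partitions on the landing prefix.
    load_date = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    key = f"landing/records/load_date={load_date}/records.json"

    s3 = boto3.client("s3")
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(records).encode("utf-8"))
    return key


if __name__ == "__main__":
    print(f"Wrote s3://{BUCKET}/{extract_to_s3()}")
```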

Environment: Big Data, Hadoop, Oracle, PL/SQL, Scala, Spark SQL, PySpark, Python, Kafka, SAS, SQL, Oozie, SSIS, T-SQL, ETL, HDFS, Cosmos, AWS, Zookeeper, Hive, HBase.

Client: Hyundai AutoEver America, Fountain Valley, CA                                   March 2020 – August 2021
Role: Big Data Engineer

Responsibilities:

• Responsible for building scalable distributed data solutions using Hadoop.
• Installed and configured Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
• Developed Simple to complex Map/Reduce Jobs using Hive and Pig.
• Contributed to building hands-on tutorials for the community to learn how to set up the Hortonworks Data Platform (powered by Hadoop) and Hortonworks DataFlow (powered by NiFi).
• Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
• Performed processing on large sets of structured, unstructured and semi-structured data.
• Handled importing of data from various data sources, performed transformations using Hive,
MapReduce, loaded data into HDFS and extracted the data from Oracle into HDFS using Sqoop.
• Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
• Used Pig UDF's to implement business logic in Hadoop.
• Avoided MapReduce by using PySpark, boosting performance by up to 3x.
• Worked on RDD and DataFrame techniques in PySpark for processing data at a faster rate.
• Involved in ETL data cleansing, integration and transformation using Hive and PySpark (see the sketch after this list).
• Involved in setting up a Spark notebook on the Ubuntu operating system.
• Responsible for migrating from Hadoop to the Spark framework, using in-memory distributed computing for real-time fraud detection.
• Used Spark to store data in-memory.
• Implemented batch processing of data sources using Apache Spark. 
• Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
• Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing
with Pig and HiveQL.
• Developed the Pig UDF’S to pre-process the data for analysis.
• Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
• Provided cluster coordination services through ZooKeeper.
• As part of a POC, set up Amazon Web Services (AWS) to evaluate whether Hadoop was a feasible solution.
• Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
• Installed Oozie workflow engine to run multiple Hive and Pig jobs.
• Exported the analyzed data to the relational databases using Sqoop for visualization and to generate
reports for the BI team.
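
A minimal PySpark sketch of the DataFrame-based cleansing and transformation work referenced above; the input path, column names, and output layout are hypothetical placeholders.

```python
# Hypothetical PySpark cleansing/transformation sketch.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-cleansing-sketch").getOrCreate()

# Read raw web-server/Sqoop output landed on HDFS (path is illustrative).
raw = spark.read.option("header", "true").csv("hdfs:///data/raw/orders/")

cleansed = (
    raw.dropDuplicates(["order_id"])                       # hypothetical key column
       .filter(F.col("order_date").isNotNull())
       .withColumn("amount", F.col("amount").cast("double"))
       .withColumn("load_date", F.current_date())
)

# Write the transformed data back to HDFS, partitioned for Hive access.
cleansed.write.mode("overwrite").partitionBy("load_date").parquet(
    "hdfs:///data/curated/orders/"
)
```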

Environment: Hadoop, MapReduce, Hortonworks, HDFS, Hive, SQL, Cloudera Manager, Pig, Apache Sqoop, Spark, Oozie, HBase, AWS, PL/SQL, MySQL and Windows.

Client: IBM, Hyderabad, India                                                                             May 2018 – March 2020
Role: Hadoop Developer

Responsibilities:

• Involved in loading data into the HBase NoSQL database.
• Building, managing and scheduling Oozie workflows for end-to-end job processing.
• Worked on the Hortonworks HDP 2.5 distribution.
• Responsible for building scalable distributed data solutions using Hadoop.
• Involved in importing data from MS SQL Server, MySQL, and Teradata into HDFS using Sqoop.
• Played a key role in dynamic partitioning and Bucketing of the data stored in Hive Metadata.
• Wrote HiveQL queries integrating different tables to create views that produce the required result sets.
• Collected the log data from Web Servers and integrated into HDFS using Flume.
• Worked on loading and transforming large sets of structured and unstructured data.
• Used MapReduce programs for data cleaning and transformations and loaded the output into Hive tables in different file formats.
• Worked on extending Hive and Pig core functionality by writing custom UDFs using Java.
• Analyzed large volumes of structured data using Spark SQL.
• Migrated HiveQL queries into Spark SQL to improve performance (see the sketch below).
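
A minimal sketch of running a migrated HiveQL query through Spark SQL, assuming an existing Hive metastore; the table and column names are hypothetical placeholders.

```python
# Hypothetical sketch: executing a formerly-Hive query on Spark's engine.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hiveql-to-sparksql-sketch")
    .enableHiveSupport()        # read existing Hive metastore tables
    .getOrCreate()
)

# The same query that previously ran as a Hive MapReduce job now executes
# on Spark via the shared metastore.
result = spark.sql("""
    SELECT customer_id, COUNT(*) AS order_count
    FROM   orders
    WHERE  order_date >= '2019-01-01'
    GROUP  BY customer_id
""")

result.show(20)
```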

Environment: Hortonworks, Hadoop, HDFS, Pig, Sqoop, Hive, Oozie, Zookeeper, NoSQL, HBase, Shell
Scripting, Scala, Spark, SparkSQL.
