Implement scalable and sustainable data engineering solutions using tools such as AWS, Azure Data Lake, Databricks, and
Apache Spark, using Scala and Python.
Experience in developing Spark applications using Spark-SQL in Databricks.
Experience in data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the
data to uncover insights using Azure Databricks and building Analysis Services cubes consumed by Power BI.
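The extract-and-aggregate pattern above can be sketched in plain Python; this is a minimal illustration using only the standard library rather than Azure Databricks, and the column names and sample data are hypothetical:

```python
import csv
import io
import json
from collections import defaultdict

def aggregate_revenue(csv_text, json_text):
    """Combine records from two file formats and aggregate revenue by region."""
    totals = defaultdict(float)
    # Extract from CSV: one record per row, keyed by header names.
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["region"]] += float(row["revenue"])
    # Extract from JSON: a list of record objects.
    for rec in json.loads(json_text):
        totals[rec["region"]] += float(rec["revenue"])
    return dict(totals)

csv_data = "region,revenue\neast,100.0\nwest,50.0\n"
json_data = '[{"region": "east", "revenue": 25.0}]'
print(aggregate_revenue(csv_data, json_data))  # {'east': 125.0, 'west': 50.0}
```

In Databricks the same shape would typically be two `spark.read` calls followed by a union and a `groupBy`; the stdlib version just makes the per-format extraction explicit.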
Experience as a Data Engineer in large scale Data Lake/Cloud Data Platform implementation projects.
Working experience with Apache Pulsar and Apache Kafka; Pulsar is a cloud-native, distributed messaging and streaming
platform originally created at Yahoo.
5 years of programming experience developing web-based applications and client-server technologies using Java/J2EE.
Data migration experience from on-premises to the cloud.
Building and implementing analytical tools to utilize the data pipeline, providing actionable insights into key business
performance metrics.
Automated jobs to run on a schedule or on a data-availability trigger, with the ability to restart on failure.
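The restart-on-failure behavior above can be sketched as a small wrapper, assuming the job is exposed as a Python callable; `flaky_job`, the counter, and the restart limit are all hypothetical:

```python
import time

def run_with_restart(job, max_restarts=3, delay_seconds=0.0):
    """Run a job callable; restart it on failure, up to max_restarts times."""
    attempts = 0
    while True:
        try:
            return job()
        except Exception:
            attempts += 1
            if attempts > max_restarts:
                raise  # give up after exhausting restarts
            time.sleep(delay_seconds)  # back off before restarting

calls = {"n": 0}

def flaky_job():
    """A job that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "done"

print(run_with_restart(flaky_job))  # done
```

In production this logic usually lives in the scheduler itself (e.g. Oozie or AutoSys retry settings); the wrapper just makes the retry loop explicit.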
Experience in writing workflows and scheduling jobs using Oozie.
Experienced in working with the Hive data warehouse tool: creating tables, distributing data by implementing partitioning
and bucketing strategies, and writing and optimizing HiveQL queries.
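Hive partitioning physically lays data out in `key=value` directories so queries can prune partitions. A minimal stdlib-only sketch of that directory layout (table and column names are hypothetical, and real Hive manages this itself):

```python
import os
import tempfile

def write_partitioned(base_dir, rows, partition_key):
    """Write rows into Hive-style partition directories: base/key=value/part-0.csv."""
    for row in rows:
        # Each distinct partition value gets its own directory.
        part_dir = os.path.join(base_dir, f"{partition_key}={row[partition_key]}")
        os.makedirs(part_dir, exist_ok=True)
        with open(os.path.join(part_dir, "part-0.csv"), "a") as f:
            # The partition column is stored in the path, not in the file.
            f.write(",".join(str(v) for k, v in sorted(row.items())
                             if k != partition_key) + "\n")

rows = [
    {"claim_id": 1, "state": "CA"},
    {"claim_id": 2, "state": "NY"},
    {"claim_id": 3, "state": "CA"},
]
base = tempfile.mkdtemp()
write_partitioned(base, rows, "state")
print(sorted(os.listdir(base)))  # ['state=CA', 'state=NY']
```

A query filtering on `state` would only read the matching directory, which is the point of the partitioning strategy; bucketing further splits each partition into a fixed number of files by hashing a column.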
Implemented Sqoop jobs to migrate large sets of structured and semi-structured data between HDFS and other data stores
such as Hive or an RDBMS.
Working experience with CI/CD processes, which bridge the gap between development and operations teams by automating
the building, testing, and deployment of applications.
Good experience with Apache Zeppelin, a multi-purpose web-based notebook that enables interactive data analytics.
Experience with version control (GitHub, Bitbucket) and issue tracking (Jira).
Experience working in an Agile/Scrum delivery model.
Technical Skills:
Business Deliverable - the name of the application as defined in the AGGRIFY database. A single Business Deliverable can
have multiple SLT IDs, which allows the application team to group their tasks and batches under separate, unique SLT IDs.
To allow grouping on the report, teams can use the same Business Deliverable name for all their SLT IDs.
Environment: AWS, EMR, EC2, Hadoop 2.7.3, Spark 3.3, Python 3.8, open-source Delta Lake, Kafka, Oracle, VS Code.
Azure Data Engineer
Project Title: ClaimSphere
Client: Molina Health Care, Long Beach (California) Duration: Nov 2020 to July 2022
Description: ClaimSphere is a digital platform that gives health plans a better understanding of their population, enabling
better care at reduced cost. It is a simple-to-use product for analyzing, monitoring, intervening in, and improving the
quality of care management.
Description: The Rx Surveillance project monitors invoice claims to ensure they are processed correctly against a set of criteria
and rules; thousands of claims are identified as outliers every day, and the business needs to verify that each claim has been
adjudicated properly. This tool serves as a single repository for all outlier claims and allows users to filter them and write back
comments. Writeback lets users add a comment to a single claim or mass-update multiple claims; comment updates are then
saved to a database in real time.
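The writeback flow described above can be sketched with SQLite standing in for the actual database; the `outlier_claims` table and its columns are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outlier_claims (claim_id INTEGER PRIMARY KEY, comment TEXT)")
conn.executemany("INSERT INTO outlier_claims (claim_id, comment) VALUES (?, NULL)",
                 [(1,), (2,), (3,)])

def writeback(claim_ids, comment):
    """Apply a comment to one claim or mass-update many; committed immediately."""
    conn.executemany("UPDATE outlier_claims SET comment = ? WHERE claim_id = ?",
                     [(comment, cid) for cid in claim_ids])
    conn.commit()  # persist in real time, as the writeback feature requires

writeback([1], "verified")         # single-claim comment
writeback([2, 3], "needs review")  # mass update across multiple claims
print(conn.execute("SELECT comment FROM outlier_claims ORDER BY claim_id").fetchall())
# [('verified',), ('needs review',), ('needs review',)]
```

The same function covers both the single-claim and mass-update cases because the claim IDs are passed as a list; committing inside `writeback` is what makes the update visible to other readers immediately.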
Environment: Microsoft Azure, Cloudera Hadoop 6.3.3, Impala 3.2.0, Hive 2.1.1, Oozie, Spark 2.4, Scala 2.11.2, GitHub, JIRA,
Jenkins, IntelliJ, Hue, AutoSys, Databricks.
Sr. Big Data Engineer
Project Title: Network Personalization
Client: Verizon, Branchburg (New Jersey) Duration: March 2020 to Sep 2020
Description: NSP (Network Service Personalization) is one of the critical system applications in the Network Personalization
program. It creates an interface tailored to the needs of the Network Repair Bureau (NRB) technicians and ties into the functions
they perform day to day.
NSP leverages scoring data produced by the Network Data Lake (NDL, a Hadoop platform) based on business-identified Key
Performance Indicators (KPIs) and presents the customer experience in the form of an interactive Grafana dashboard. NSP also
combines NRB ticket information, provisioning data, and record details in a single pane of glass used for network outage
troubleshooting. It automates process streams, removes unnecessary troubleshooting steps, and communicates downstream with
other applications to prevent redundant tickets from reaching the NRB.
The main objective of this project is to achieve better control over the automation of tickets and the customer experience.
Environment: Hadoop 2.8, HDFS, Pig 0.14.0, Hive 1.2, Oozie, Spark 2.4, Scala 2.11, GitHub, JIRA, Screwdriver, IntelliJ, Hue,
Jenkins
Sr. Big Data Engineer
Project Title: Payment Integrity
Client: Aetna Inc., Hyderabad (India)/New York (U.S) Duration: Sep 2015 to Dec 2019
Description: Aetna is a US-based health care company that sells traditional and consumer-directed health care insurance plans and
related services, such as medical, pharmaceutical, dental, behavioral health, long-term care, and disability plans. On average, Aetna
receives 1 million claims each day. The sheer number of providers, members, and plan types makes pricing these claims
incredibly complex, and through misinterpretation of provider contracts and human error, a small number of claims are paid improperly.
As part of the Data Science team, all the data from critical domains such as Aetna Medicare, Traditional Group membership, Member,
Plan, Claim, and Provider is migrated to the Hadoop environment. All demographic information is moved from MySQL to Hadoop,
where the analysis is done. Claims data is also moved from MQ to Hadoop; after the claims are processed, the response is sent back
to MQ, and a history is built to track all changes to the processed claims.
Environment: Hadoop 2.7, HDFS, Pig 0.14.0, Hive 0.13.0, Sqoop, Flume, Apache NiFi, Oozie, Git, JIRA, PySpark, H2O, Netezza,
MySQL, Aginity.
Description: “iTunes OPS Reporting” is a near-real-time data warehouse and reporting solution for the iTunes Online Store and acts
as a reporting system for external and operational reporting needs. It also publishes data to downstream systems such as Piano (for
Label Reporting) and ICA (for Business Objects reporting, campaign list pulls, and analytics). De-normalized data is used for
publishing various reports to the users. In addition, this project caters to the needs of the ITS (iTunes Store) business user groups,
requiring complex analytical expertise: deep domain knowledge, a detailed understanding of iTunes features and its data flow, and
measurement of the accuracy of the system in place.
Environment: Hadoop 2.7, HDFS, Hive 0.13.0, Oozie, Git, JIRA, Java, Teradata