Implement scalable and sustainable data engineering solutions using tools such as AWS, Azure Data Lake, Databricks, and
Apache Spark, using Scala and Python.
Experience in developing Spark applications using Spark-SQL in Databricks.
Experience in data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the
data to uncover insights using Azure Databricks and building Analysis Services cubes consumed by Power BI.
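The extract-and-aggregate pattern above can be sketched in plain Python; this is a minimal illustration using only the standard library rather than Azure Databricks, and the column names and sample data are hypothetical:

```python
import csv
import io
import json
from collections import defaultdict

def aggregate_revenue(csv_text, json_text):
    """Combine records from two file formats and aggregate revenue by region."""
    totals = defaultdict(float)
    # Extract from CSV: one record per row, keyed by header names.
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["region"]] += float(row["revenue"])
    # Extract from JSON: a list of record objects.
    for rec in json.loads(json_text):
        totals[rec["region"]] += float(rec["revenue"])
    return dict(totals)

csv_data = "region,revenue\neast,100.0\nwest,50.0\n"
json_data = '[{"region": "east", "revenue": 25.0}]'
print(aggregate_revenue(csv_data, json_data))  # {'east': 125.0, 'west': 50.0}
```

In Databricks the same shape would typically be two `spark.read` calls followed by a union and a `groupBy`; the stdlib version just makes the per-format extraction explicit.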
Experience as a Data Engineer in large scale Data Lake/Cloud Data Platform implementation projects.
Working experience with Apache Pulsar and Apache Kafka; Pulsar is a cloud-native, distributed messaging and streaming
platform originally created at Yahoo.
5 years of programming experience developing web-based applications and client-server technologies using Java/J2EE.
Data migration experience from on-premises to the cloud.
Building and implementing analytical tools to utilize the data pipeline, providing actionable insights into key business
performance metrics.
Automated jobs to run on a schedule or on a data-availability trigger, with the ability to restart on failure.
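The restart-on-failure behavior above can be sketched as a small wrapper, assuming the job is exposed as a Python callable; `flaky_job`, the counter, and the restart limit are all hypothetical:

```python
import time

def run_with_restart(job, max_restarts=3, delay_seconds=0.0):
    """Run a job callable; restart it on failure, up to max_restarts times."""
    attempts = 0
    while True:
        try:
            return job()
        except Exception:
            attempts += 1
            if attempts > max_restarts:
                raise  # give up after exhausting restarts
            time.sleep(delay_seconds)  # back off before restarting

calls = {"n": 0}

def flaky_job():
    """A job that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "done"

print(run_with_restart(flaky_job))  # done
```

In production this logic usually lives in the scheduler itself (e.g. Oozie or AutoSys retry settings); the wrapper just makes the retry loop explicit.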
Experience in writing workflows and scheduling jobs using Oozie.
Experienced in working with the Hive data warehouse tool: creating tables, distributing data by implementing partitioning
and bucketing strategies, and writing and optimizing HiveQL queries.
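Hive partitioning physically lays data out in `key=value` directories so queries can prune partitions. A minimal stdlib-only sketch of that directory layout (table and column names are hypothetical, and real Hive manages this itself):

```python
import os
import tempfile

def write_partitioned(base_dir, rows, partition_key):
    """Write rows into Hive-style partition directories: base/key=value/part-0.csv."""
    for row in rows:
        # Each distinct partition value gets its own directory.
        part_dir = os.path.join(base_dir, f"{partition_key}={row[partition_key]}")
        os.makedirs(part_dir, exist_ok=True)
        with open(os.path.join(part_dir, "part-0.csv"), "a") as f:
            # The partition column is stored in the path, not in the file.
            f.write(",".join(str(v) for k, v in sorted(row.items())
                             if k != partition_key) + "\n")

rows = [
    {"claim_id": 1, "state": "CA"},
    {"claim_id": 2, "state": "NY"},
    {"claim_id": 3, "state": "CA"},
]
base = tempfile.mkdtemp()
write_partitioned(base, rows, "state")
print(sorted(os.listdir(base)))  # ['state=CA', 'state=NY']
```

A query filtering on `state` would only read the matching directory, which is the point of the partitioning strategy; bucketing further splits each partition into a fixed number of files by hashing a column.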
Implemented Sqoop jobs to migrate large sets of structured and semi-structured data between HDFS and other data stores
such as Hive or an RDBMS.
Working experience with CI/CD processes, which bridge the gap between development and operations teams by automating
the building, testing, and deployment of applications.
Good experience with Apache Zeppelin, a multi-purpose web-based notebook that enables interactive data analytics.
Experience with version control (GitHub, Bitbucket) and issue tracking (Jira).
Experience working in an Agile/Scrum delivery model.
Technical Skills:
Business Deliverable - the name of the application as defined in the AGGRIFY database. A single Business Deliverable can
have multiple SLT IDs, which allows the application team to group their tasks and batches under separate, unique SLT IDs.
To allow grouping on the report, teams can use the same Business Deliverable name for all their SLT IDs.
Environment: AWS, EMR, EC2, Hadoop 2.7.3, Spark 3.3, Python 3.8, open-source Delta Lake, Kafka, Oracle, VS Code.
Azure Data Engineer
Project Title: ClaimSphere
Client: Molina Health Care, Long Beach (California) Duration: Nov 2020 to July 2022
Description: ClaimSphere is a digital platform that gives health plans a better understanding of their population, enabling
better care at reduced cost. It is a simple-to-use product for analyzing, monitoring, intervening in, and improving the
quality of care management.
Description: The Rx Surveillance project monitors invoice claims to ensure they are processed correctly against a set of criteria
and rules; thousands of claims are identified as outliers every day, and the business needs to verify that each claim has been
adjudicated properly. This tool serves as a single repository for all outlier claims and allows users to filter them and write back
comments. Writeback lets users add a comment to a single claim or mass-update multiple claims; comment updates are then
saved to a database in real time.
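The writeback flow described above can be sketched with SQLite standing in for the actual database; the `outlier_claims` table and its columns are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outlier_claims (claim_id INTEGER PRIMARY KEY, comment TEXT)")
conn.executemany("INSERT INTO outlier_claims (claim_id, comment) VALUES (?, NULL)",
                 [(1,), (2,), (3,)])

def writeback(claim_ids, comment):
    """Apply a comment to one claim or mass-update many; committed immediately."""
    conn.executemany("UPDATE outlier_claims SET comment = ? WHERE claim_id = ?",
                     [(comment, cid) for cid in claim_ids])
    conn.commit()  # persist in real time, as the writeback feature requires

writeback([1], "verified")         # single-claim comment
writeback([2, 3], "needs review")  # mass update across multiple claims
print(conn.execute("SELECT comment FROM outlier_claims ORDER BY claim_id").fetchall())
# [('verified',), ('needs review',), ('needs review',)]
```

The same function covers both the single-claim and mass-update cases because the claim IDs are passed as a list; committing inside `writeback` is what makes the update visible to other readers immediately.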
Environment: Microsoft Azure, Cloudera Hadoop 6.3.3, Impala 3.2.0, Hive 2.1.1, Oozie, Spark 2.4, Scala 2.11.2, GitHub, JIRA,
Jenkins, IntelliJ, Hue, AutoSys, Databricks.
Sr. Big Data Engineer
Project Title: Network Personalization
Client: Verizon, Branchburg (New Jersey) Duration: March 2020 to Sep 2020
Description: NSP (Network Service Personalization) is one of the critical system applications in the Network Personalization
program. It creates an interface tailored to the needs of the Network Repair Bureau (NRB) technicians and ties into the functions
they perform day to day.
NSP leverages scoring data produced by the Network Data Lake (NDL, a Hadoop platform) based on business-identified Key
Performance Indicators (KPIs) and presents the customer experience in the form of an interactive Grafana dashboard. NSP also
combines NRB ticket information, provisioning data, and record details in a single pane of glass used for network outage
troubleshooting. It automates process streams, removes unnecessary troubleshooting steps, and communicates downstream with
other applications to prevent redundant tickets from reaching the NRB.
The main objective of this project is to achieve better control over the automation of tickets and the customer experience.
Environment: Hadoop 2.8, HDFS, Pig 0.14.0, Hive 1.2, Oozie, Spark 2.4, Scala 2.11, GitHub, JIRA, Screwdriver, IntelliJ, Hue,
Jenkins
Sr. Big Data Engineer
Project Title: Payment Integrity
Client: Aetna Inc., Hyderabad (India)/New York (U.S) Duration: Sep 2015 to Dec 2019
Description: Aetna is a US-based health care company that sells traditional and consumer-directed health care insurance plans and
related services, such as medical, pharmaceutical, dental, behavioral health, long-term care, and disability plans. On average, Aetna
receives 1 million claims each day. The sheer number of providers, members, and plan types makes pricing these claims
incredibly complex, and through misinterpretation of provider contracts and human error, a small number of claims are paid improperly.
As part of the Data Science team, all the data from critical domains such as Aetna Medicare, Traditional Group membership, Member,
Plan, Claim, and Provider is migrated to the Hadoop environment. All demographic information is moved from MySQL to Hadoop,
where the analysis is done. Claims data is also moved from MQ to Hadoop; after the claims are processed, the response is sent back
to MQ, and a history is built to track all changes to the processed claims.
Environment: Hadoop 2.7, HDFS, Pig 0.14.0, Hive 0.13.0, Sqoop, Flume, Apache NiFi, Oozie, Git, JIRA, PySpark, H2O, Netezza,
MySQL, Aginity.
Description: “iTunes OPS Reporting” is a near-real-time data warehouse and reporting solution for the iTunes Online Store and acts
as a reporting system for external and operational reporting needs. It also publishes data to downstream systems such as Piano (for
Label Reporting) and ICA (for Business Objects reporting, campaign list pulls, and analytics). De-normalized data is used for
publishing various reports to the users. In addition, this project caters to the needs of the ITS (iTunes Store) business user groups,
requiring complex analytical expertise: deep domain knowledge, a detailed understanding of iTunes features and its data flow, and
measurement of the accuracy of the system in place.
Environment: Hadoop 2.7, HDFS, Hive 0.13.0, Oozie, Git, JIRA, Java, Teradata