
Summary:

9+ years of experience with a strong emphasis on methodologies for processing big data using Hadoop, Spark, Java/J2EE,
Scala and Python. Around 6 years of experience working with Hadoop ecosystem components including
MapReduce, HDFS, Hive, Pig, Spark SQL, Spark Streaming, YARN, Kafka, HBase, Cassandra, ZooKeeper, Sqoop,
Flume, Impala, Oozie and Storm.

Skills Inventory:
● Experience with different Hadoop distributions such as Cloudera (CDH), Hortonworks (HDP) and Amazon Elastic MapReduce (EMR).
● Good knowledge of AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data.
● Strong experience in writing complex Pig scripts, Hive data modeling and extending core functionality with custom UDFs.
● Hands-on experience with message brokers such as Apache Kafka.
● Comprehensive knowledge of debugging, optimizing and performance tuning DB2, Oracle and MySQL databases.
● Experience in handling different file formats such as Parquet, Avro, SequenceFile, JSON, spreadsheets, text files, XML and flat files.
● Expertise in writing shell and Python scripts, cron automation and regular expressions.
● Working knowledge of ETL tools such as Informatica PowerCenter, RDBMS/Oracle, SDLC, QA/UAT and technical documentation.
● Data visualization using Tableau, Pentaho, and Google Visualization API.
● Well versed in software development methodologies such as Rapid Application Development (RAD), Agile and Scrum.

Professional Experience:
Client: American Express, Phoenix, AZ Nov 2021 to Present
Role: Sr. Hadoop/Big Data Developer

Responsibilities:
● Designed and deployed the Hadoop cluster and various Big Data analytic tools including Spark, Kafka, Hive, Cassandra, Oozie, ZooKeeper, Elasticsearch, Consul and Kubernetes with the Cloudera distribution.
● Implemented real-time streaming of data using Spark with Kafka (see the sketch after this list).
● Implemented Spark RDD transformations and actions to support business analysis.
● Created Hive tables and involved in data loading and writing Hive UDFs.
● Loaded data into the cluster from dynamically generated files and from relational database management systems.
● Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
● Worked on a framework to generate different kinds of reports, and developed a framework to parse those reports and send out an acknowledgment confirming whether each was accepted or rejected.
● Wrote and implemented Cassandra queries.
● Automated the extraction of data from warehouses and weblogs by developing workflows and coordinator jobs in Oozie.
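
Below is a minimal sketch of the Spark-with-Kafka streaming pattern referenced in the list above. It uses Spark Structured Streaming in Scala; the broker address, topic name and HDFS paths are placeholders, and the actual job may have used the DStream API instead.

```scala
import org.apache.spark.sql.SparkSession

object KafkaToHdfsStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-stream-sketch")
      .getOrCreate()

    // Read events from a Kafka topic as a streaming DataFrame.
    // Broker list and topic name are placeholders.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events")
      .load()

    // Kafka delivers key/value as binary; cast the value to a string
    // and keep the Kafka timestamp for downstream processing.
    val events = raw.selectExpr("CAST(value AS STRING) AS payload", "timestamp")

    // Persist each micro-batch to HDFS as Parquet, with a checkpoint
    // directory so the query can recover after a restart.
    val query = events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/events")
      .option("checkpointLocation", "hdfs:///checkpoints/events")
      .start()

    query.awaitTermination()
  }
}
```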
Environment: Spark, Spark Streaming, Spark SQL, Kafka, Hadoop, MapReduce, HDFS, Hive, Cassandra, Elasticsearch, Consul, Kubernetes, Git, Java, Scala, Python, Shell Scripting.

Client: Florida Blue, Jacksonville, FL Nov 2018 to Oct 2021
Role: Sr. Hadoop Developer

Responsibilities:
● Worked with the Hadoop cluster and various Big Data analytic tools including Spark, Spark Streaming, Hive, Kafka, Sqoop, HBase and Oozie on both the Hortonworks and Cloudera Hadoop distributions.
● Developed Spark/Scala code and Spark SQL for data analysis of the legacy system, identifying which user IDs still used the legacy tables so that the business could retire them.
● Gathered requirements and designed solutions to migrate Ab Initio jobs to Spark/Scala, and moved data from DB2 to Hive using Sqoop.
● Built data dictionaries, functions and synonyms for NoSQL (Elasticsearch).
● Created stored procedures to import data into the Elasticsearch engine.
● Worked on streaming data using Kafka with multiple producers writing into topics with multiple partitions, and used Spark Streaming to consume data from those topics and write it into HDFS.
● Transformed and landed large sets of structured, semi-structured and unstructured data into HDFS using Sqoop
imports.
● Created the DDL for Hive tables/views on top of HBase and wrote Hive queries for data analysis to meet business requirements.
● Read data from the raw schema, applied transformations using Spark/Python and Spark SQL, and wrote it to HBase for analysis and to DB2 for business users (see the sketch after this list).
● Worked on XL files, Sequence files, ORC files, bucketing and partitioning for Hive performance enhancement and storage improvement.
● Wrote build scripts for various tasks to automate the build and deployment process with the Jenkins continuous integration tool, ensuring the build passes before code is deployed to all environments (DEV, TEST, UAT, Prod).
● Wrote shell scripts to run the jobs in production through the Control-M job scheduler.
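
A minimal Scala sketch of the raw-to-curated pattern described above: read a raw-schema Hive table, apply Spark SQL transformations, and write the result as partitioned ORC. Table and column names (raw.claims, curated.claims, claim_id, claim_ts) are placeholders, and the HBase and DB2 sinks mentioned above are omitted for brevity.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object RawToCuratedJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("raw-to-curated-sketch")
      .enableHiveSupport() // read/write Hive-managed tables
      .getOrCreate()

    // Read from the raw-schema Hive table (placeholder name).
    val raw = spark.table("raw.claims")

    // Typical cleansing/derivation steps expressed with Spark SQL functions.
    val curated = raw
      .filter(col("claim_id").isNotNull)
      .withColumn("claim_date", to_date(col("claim_ts")))
      .withColumn("load_dt", current_date())

    // Write as partitioned ORC for Hive query performance and storage efficiency.
    curated.write
      .mode("overwrite")
      .format("orc")
      .partitionBy("claim_date")
      .saveAsTable("curated.claims")
  }
}
```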
Environment: AWS S3, EMR, Spark Core, Spark SQL, Spark Streaming, Hadoop, MapReduce, HDFS, Hive, Python,
Scala, SQL, Sqoop, HBase, Kafka, ZooKeeper, IBM DB2, IBM Sailfish, Microsoft SQL Server, Oracle, Elasticsearch, Git,
Jenkins and Control-M.

Client: Hexstream, Boston, MA May 2016 to Aug 2018
Role: Hadoop/Spark Developer

Responsibilities:
● Designed and deployed the Hadoop cluster and various Big Data analytic tools including Spark, Spark Streaming, Hive, Sqoop, HBase, NiFi and Oozie on the AWS Elastic MapReduce (EMR) Hadoop distribution.
● Configured, deployed and maintained multi-node development and test Kafka clusters and brokers to pipeline server log data into Spark Streaming.
● Used AWS services such as EC2 to deploy NiFi, which pulled data from SQL Server and stored it in S3.
● Collected data from an AWS S3 bucket in near-real-time using Spark Streaming, performed the necessary transformations and aggregations to build the data model, and persisted the data in HDFS; Spark Streaming divided the incoming data into micro-batches as input to the Spark engine (see the sketch after this list).
● Developed Spark code using Scala and Spark SQL to process large volumes of data, and implemented Spark RDD and DataFrame transformations and actions to migrate MapReduce algorithms.
● Used DStreams, accumulators, broadcast variables and RDD caching for Spark Streaming.
● Parsed XML files using MapReduce to extract attributes and stored them in HDFS.
● Worked on Sequence files, ORC files, bucketing and partitioning for Hive performance enhancement and storage improvement.
● Used Bash shell scripting, Sqoop, Avro, Hive, Scala and MapReduce daily to develop ETL, batch processing, and data storage functionality.
● Automated all the jobs for pulling data from the FTP server and loading it into Hive tables using Oozie workflows.
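
A minimal Scala sketch of the near-real-time S3 ingestion described above, using the DStream API with a broadcast lookup and DStream caching. The bucket name, field layout, lookup values and batch interval are placeholder assumptions.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object S3StreamAggregation {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("s3-stream-sketch")
    val ssc = new StreamingContext(conf, Seconds(60)) // 60s micro-batches (illustrative)
    ssc.checkpoint("hdfs:///checkpoints/s3-stream")

    // Broadcast a small reference lookup once rather than shipping it with every task.
    val statusLookup = ssc.sparkContext.broadcast(Map("200" -> "ok", "500" -> "error"))

    // Pick up new objects landing under an S3 prefix (bucket/prefix are placeholders).
    val lines = ssc.textFileStream("s3a://example-bucket/incoming/")

    // Parse delimited records, enrich via the broadcast lookup, and count per status per batch.
    val statusCounts = lines
      .map(_.split(",", -1))
      .filter(_.length > 2)
      .map(fields => (statusLookup.value.getOrElse(fields(2), "unknown"), 1))
      .reduceByKey(_ + _)

    statusCounts.cache()                                        // reused by both outputs below
    statusCounts.print()                                        // console sanity check
    statusCounts.saveAsTextFiles("hdfs:///data/status_counts")  // persist each batch to HDFS

    ssc.start()
    ssc.awaitTermination()
  }
}
```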
Environment: AWS S3, EMR, Athena, Spark Core, Spark SQL, Spark Streaming, Hadoop, MapReduce, HDFS, Hive,
Python, Scala, SQL, Sqoop, HBase, ZooKeeper, Apache NiFi, SQL Server, and Oozie.

IGATE, Bangalore, India Oct 2011 to Dec 2014
Role: Java/Hadoop Developer

Responsibilities:
● Involved in the complete software development life cycle (SDLC) of the application, including requirement analysis, design, development and testing.
● Developed and implemented the MVC architectural pattern using the Struts framework, including JSP, Servlets, EJB, Form Bean and Action classes.
● Wrote SQL queries and PL/SQL stored procedures, functions, triggers, cursors, object types, sequences, indexes, etc.
● Developed UI using JavaScript, JSP, HTML, CSS and AngularJS for interactive cross browser functionality and
complex user interface.
● Developed the web-based presentation layer using JSP, AJAX and Servlet technologies, implemented with the Struts framework.
● Developed simple and complex MapReduce programs in Java for data analysis on different data formats (see the sketch after this list).
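
A minimal sketch of the kind of MapReduce data-analysis job described above: counting records per category from delimited input. It is written in Scala against the Hadoop MapReduce API to stay consistent with the other sketches (the original programs were in Java), and the input layout and field positions are placeholder assumptions.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import scala.collection.JavaConverters._

// Mapper: emit (category, 1) for each comma-delimited record; the category is
// assumed to be the first field, which is a placeholder assumption.
class CategoryMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one = new IntWritable(1)
  private val category = new Text()
  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit = {
    val fields = value.toString.split(",", -1)
    if (fields.nonEmpty && fields(0).trim.nonEmpty) {
      category.set(fields(0).trim)
      ctx.write(category, one)
    }
  }
}

// Reducer: sum the per-category counts.
class CategoryCountReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    val sum = values.asScala.foldLeft(0)(_ + _.get())
    ctx.write(key, new IntWritable(sum))
  }
}

// Driver: wire the mapper/reducer together and point at HDFS input/output paths.
object CategoryCountJob {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "category-count-sketch")
    job.setJarByClass(classOf[CategoryMapper])
    job.setMapperClass(classOf[CategoryMapper])
    job.setCombinerClass(classOf[CategoryCountReducer])
    job.setReducerClass(classOf[CategoryCountReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```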
Environment: Java, Spring MVC, Struts, Hibernate, JavaScript, JSP, AJAX, IBM WebSphere, Apache Tomcat, Oracle
10g, SQL, XML, UML, Eclipse.
