Sanket Patel

(201) 917-8118


10+ years of technical expertise in all phases of the SDLC (Software Development Life
Cycle), with a major concentration on Big Data analysis frameworks, various relational
databases, NoSQL databases and Java/J2EE technologies, following recommended software
practices.
● Worked on diverse enterprise applications in the banking, financial and healthcare
sectors as a Big Data Engineer, with a good understanding of the Hadoop framework and
various data analysis tools.
● 4+ years of industrial IT experience in data manipulation using Big Data Hadoop ecosystem
components: MapReduce, HDFS, YARN/MRv2, Pig, Hive, HBase, Spark, Kafka, Flume, Sqoop,
Oozie, Avro, AWS, Spark integration with Cassandra, Solr and ZooKeeper.
● Extensive experience working with Cloudera (CDH 4 and 5) and Hortonworks Hadoop
distributions and Amazon EMR to fully leverage and implement new Hadoop features.
● Hands-on experience with the data ingestion tools Kafka and Flume and the workflow
management tool Oozie.
● Strong experience and knowledge of real-time data analytics using Spark Streaming,
Kafka and Flume. Good experience in writing Spark applications using Python and Scala.
● Experience in developing Spark programs for batch and real-time processing, including
Spark Streaming applications for real-time processing.
● Used Spark's built-in transformations such as map, flatMap, filter, reduceByKey,
groupByKey, aggregateByKey and combineByKey.
● Involved in converting Cassandra/Hive/SQL queries into Spark transformations using
RDDs and Scala.
● Knowledge of unifying data platforms using Kafka producers/consumers and implementing
pre-processing using Storm topologies.
● Experience processing Avro data files using Avro tools and MapReduce programs.
● Hands-on experience in writing MapReduce programs in Java to handle different data
sets using Map and Reduce tasks.
● Developed multiple MapReduce jobs to perform data cleaning and preprocessing.
● Designed Hive queries and Pig scripts to perform data analysis, data transfer and table design.
● Implemented ad-hoc queries using Hive to perform analytics on structured data.
● Expertise in writing Hive UDFs and GenericUDFs to incorporate complex business logic into
Hive queries.
● Experienced in optimizing Hive queries by tuning configuration parameters.
● Involved in designing the data model in Hive for migrating the ETL process into Hadoop
and wrote Pig Scripts to load data into Hadoop environment.
● Compared the performance of Hive and Big SQL for our data warehousing systems.
● Wrote and implemented custom UDFs in Pig for data filtering.
● Used Sqoop for large dataset transfers between Hadoop and RDBMS.
● Extensively used Apache Flume to collect the logs and error messages across the cluster.
● Worked on NoSQL databases like HBase, Cassandra and MongoDB.
● Used Cassandra CQL with Java APIs to retrieve data from Cassandra tables.
● Extracted and updated data in MongoDB using the mongoimport and mongoexport
command-line utilities.
● Experience in writing shell scripts to dump shared data from MySQL
servers to HDFS.
● Worked on implementing and optimizing Hadoop/MapReduce algorithms for Big Data.
● Extensive knowledge of cloud-based technologies on Amazon Web Services
(AWS): VPC, EC2, Route 53, S3, DynamoDB, ElastiCache, Glacier, RRS, CloudWatch,
CloudFront, Kinesis, Redshift, SQS, SNS and RDS.
● Worked with AWS services such as EMR and EC2 for fast and efficient
processing of Big Data.
● Hands-on experience in developing applications with Java, J2EE (Servlets, JSP,
EJB), SOAP, Web Services, JNDI, JMS, JDBC, Hibernate, Struts, Spring, XML, HTML, XSD,
XSLT, PL/SQL, Oracle 10g and MS SQL Server RDBMS.
● Worked with Oozie and ZooKeeper to manage job flow and coordination in the cluster.
● Experience in performance tuning and monitoring of the Hadoop cluster, gathering and
analyzing data on the existing infrastructure using Cloudera Manager.
● Added security to the cluster by integrating Kerberos.
● Worked with different file formats (ORCFILE, TEXTFILE) and different compression codecs.
● Worked with BI (Business Intelligence) teams in generating reports and designing
ETL workflows on Tableau.
● Experienced in writing test cases and implementing unit tests using testing frameworks
such as JUnit, EasyMock and Mockito.
● Worked on Talend Open Studio and Talend Integration Suite.
● Good understanding of all aspects of testing, such as unit, regression, Agile and white-box testing.
● Good knowledge of Web/Application Servers like Apache Tomcat, IBM WebSphere and
Oracle WebLogic.
Work Experience
Sr. Spark/Scala Developer
Microsoft, WA
May 2018 to Present
Client Description: Microsoft is an American multinational technology company
headquartered in Redmond, Washington. It develops, manufactures, licenses, supports and
sells computer software, consumer electronics, personal computers and related services.
The company also delivers a wide range of other consumer and enterprise software for
desktops and servers, and is active in the digital services market (through MSN), mixed
reality (HoloLens), cloud computing (Azure) and software development (Visual Studio).
• Worked with Spark to improve the performance and optimization of existing
algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, Spark
• Serialized JSON data and stored it in tables using Spark SQL.
• Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs
and Scala.
• Knowledge of cloud infrastructure technologies in Azure.
• Experience with Microsoft Azure cloud services, Storage Accounts, Azure data storage,
Azure Data Factory, Data Lake and Virtual Networks.
• Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
• Worked with Azure Monitoring and Data Factory.
• Supported migrations from on premise to Azure.
• Provided support services to enterprise customers for Microsoft Azure cloud
networking, including handling critical-situation cases.
• Implemented Spark scripts using Scala and Spark SQL to access Hive tables from Spark for
faster processing of data.
• Experience in writing Shell scripts to automate the process flow.
• Experience in writing business analytics scripts using HiveQL.
• Provided consulting and cloud architecture for premier customers and internal projects
running on the MS Azure platform, for high availability of services and low operational costs.
• Optimized test content and process, reducing false positives by 20%. Used SQL
and Excel to pull, analyze, polish and visualize data.
• Followed Agile methodology and participated in Scrum meetings to track, optimize and
tailor features to customer needs.
Environment: Hadoop, Hive, Spark, Spark SQL, Spark Streaming, Scala, Hortonworks,
Azure Data Factory, Azure Storage and Agile Methodologies.
Sr. Spark/Scala Developer
Northern Trust - Chicago, IL
April 2017 to May 2018

• Worked with Spark to improve the performance and optimization of existing
algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
• Performed advanced procedures like text analytics and processing, using the in-memory
computing capabilities of Spark using Scala.
• Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs,
processed the data as DataFrames and saved it in Parquet format in HDFS.
• Experienced in writing real-time processing and core jobs using Spark Streaming with
Kafka as a data pipeline system.
• Configured Spark Streaming to receive ongoing data from Kafka and store the
stream data in HDFS.
• Used Spark and Spark SQL to read Parquet data and create tables in Hive using the
Scala API.
• Experienced in using the Spark application master to monitor Spark jobs and capture
their logs.
• Worked on Spark using Python and Spark SQL for faster testing and processing of data.
• Implemented sample Spark programs in Python using PySpark.
• Analyzed the SQL scripts and designed the solution to implement using PySpark.
• Developed PySpark code to mimic the transformations performed in the on-premise environment.
• Developed multiple Kafka producers and consumers as per the software requirements.
• Used Spark Streaming APIs to perform transformations and actions on the fly for building
a common learner data model, which receives data from Kafka in near real time and persists it
to Cassandra.
• Experience in building Real-time Data Pipelines with Kafka Connect and Spark Streaming.
• Used Kafka and Kafka brokers, initiated the SparkContext and processed live streaming
data with RDDs, and used Kafka to load data into HDFS and NoSQL databases.
• Used Zookeeper to store offsets of messages consumed for a specific topic and partition
by a specific Consumer Group in Kafka.
• Used Kafka features such as distribution, partitioning and the replicated commit log service
for messaging systems by maintaining feeds, and created applications that monitor
consumer lag within Apache Kafka clusters.
• Involved in Cassandra cluster planning and had a good understanding of Cassandra cluster
• Responsible for developing a Spark Cassandra connector to load data from flat files to
Cassandra for analysis; modified cassandra.yaml and other files to set various
configuration properties.
• Used Sqoop to import data into Cassandra tables from different relational databases
such as Oracle and MySQL, and designed column families.
• Developed efficient MapReduce programs for filtering out the unstructured data and
developed multiple MapReduce jobs to perform data cleaning and preprocessing on
• Used Hortonworks Apache Falcon for data management and pipeline process in the
Hadoop cluster.
• Implemented a data interface to retrieve customer information using a REST API,
preprocessed the data using MapReduce 2.0 and stored it in HDFS (Hortonworks).
• Maintained ELK (Elasticsearch, Logstash and Kibana) and wrote Spark scripts using the
Scala shell.
• Worked in an AWS environment for development and deployment of custom Hadoop applications.
• Strong experience in working with Elastic MapReduce (EMR) and setting up
environments on Amazon AWS EC2 instances.
• Wrote Oozie workflows to run Sqoop and HQL scripts in Amazon EMR.
• Involved in creating custom UDFs for Pig and Hive to incorporate Python methods and
functionality into Pig Latin and HQL (HiveQL).
• Developed shell scripts to generate Hive CREATE statements from the data and load
data into the tables.
• Involved in writing custom MapReduce programs using the Java API for data processing.
• Created Hive tables as required, as internal or external tables defined
with appropriate static and dynamic partitions and bucketing for efficiency.
• Worked on Apache NiFi: executed Spark and Sqoop scripts through NiFi, created a
scatter-gather pattern, ingested data from Postgres to HDFS, fetched Hive metadata and
stored it in HDFS, and created a custom NiFi processor for filtering text from FlowFiles.
• Used ZooKeeper for cluster coordination services.
Environment: HDP 2.3.4, Hadoop, Hive, HDFS, Spark, Spark SQL, Spark Streaming, Scala,
Kafka, AWS, Cassandra, Hortonworks, ELK, Java and Agile Methodologies.
Spark/Scala Developer
Molina Healthcare - Long Beach, CA
Jan 2016 to April 2017
• Performed advanced procedures like text analytics and processing using the in-memory
computing capabilities of Spark.
• Developed Spark code using Scala and Spark-SQL for faster processing and testing.
• Worked on Spark SQL to join multiple Hive tables, write the results to a final Hive table and
store them on S3.
• Implemented Spark RDD transformations to map business analysis logic and applied actions on
top of those transformations.
• Created Spark jobs to perform lightning-fast analytics over the Spark cluster.
• Evaluated Spark's performance against Impala on transactional data. Used Spark
transformations and aggregations to compute min, max and average on transactional data.
• Extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
• Worked on MongoDB for distributed storage and processing.
• Used MongoDB to store processed products and commodities data, which can be further
streamed downstream into web applications (Green Box/Zoltar).
• Responsible for storing processed data in MongoDB.
• Experienced in migrating HiveQL to Impala to minimize query response time.
• Experience using Impala for data processing on top of Hive for better utilization.
• Performed querying of both managed and external tables created by Hive using Impala.
• Developed Impala scripts for end-user/analyst requirements for ad-hoc analysis.
• Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
• Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
• Collected data using Spark Streaming from an AWS S3 bucket in near real time, performed
the necessary transformations and aggregations to build the data model and persisted the data
in HDFS.
• Responsible for creating Hive tables, loading them with data and writing Hive queries.
• Worked on user-defined functions in Hive to run aggregation functions over multiple rows
of data loaded from HDFS.
• Optimized HiveQL/Pig scripts by using execution engines such as Tez and Spark.
• Wrote MapReduce jobs using the Java API and Pig Latin.
• Responsible for creating mappings and workflows to extract and load data from relational
databases, flat file sources and legacy systems using Talend.
• Fetched and generated monthly reports and visualized them using Tableau.
• Used Oozie Workflow engine to run multiple Hive and Pig jobs.
• Expertise in web-based GUI architecture and development using HTML, CSS, AJAX,
jQuery, AngularJS and JavaScript.
Environment: Hadoop, Cloudera, Flume, HBase, HDFS, MapReduce, YARN, Hive, Pig, Sqoop,
Oozie, Tableau, Java, Solr, JUnit and Agile Methodologies.
Big Data Hadoop Consultant
Bank of America - Hyderabad, India
September 2012 to Feb 2014
• Collected and aggregated large amounts of web log data from different sources such as
web servers, mobile and network devices using Apache Flume and stored the data into
HDFS for analysis.
• Collected data from various Flume agents deployed on various servers using a
multi-hop flow.
• Ingested real-time and near-real-time (NRT) streaming data into HDFS using Flume.
• Worked with NoSQL databases like HBase, creating HBase tables to load large
sets of semi-structured data.
• Involved in transforming data from Mainframe tables to HDFS, and HBase tables using
• Loaded data into HBase using the HBase shell and the HBase client API.
• Experienced in handling administration activities using Cloudera Manager.
• Involved in developing Impala scripts for extraction, transformation, loading of data into
data warehouse.
• Experience working with Apache Solr for indexing and querying.
• Created custom Solr query segments to optimize search matching.
• Involved in data ingestion into HDFS using Sqoop for full loads and Flume for incremental
loads from a variety of sources such as web servers, RDBMS and data APIs.
• Installed the Oozie workflow engine to run multiple Hive and Pig jobs, which run
independently based on time and data availability.
• Implemented the workflows using Apache Oozie framework to automate tasks.
• Involved in migrating tables from RDBMS into Hive tables using Sqoop and later
generating visualizations using Tableau.
• Involved in writing optimized Pig scripts along with developing and testing Pig Latin scripts.
Java Developer
Sakthi Finance - India
August 2009 to September 2012
Client Description: Sakthi Finance offers various financing schemes to cater to the funding
requirements of commercial vehicle operators. The company is one of the leading
companies engaged in commercial vehicle financing.
• Prepared functional requirement specifications and performed coding, bug fixing and support.
• Involved in various phases of the Software Development Life Cycle (SDLC), such as requirement
gathering, data modeling, analysis, architecture design and development for the project.
• Implemented GUI pages by using JSP, JSTL, HTML, XHTML, CSS, JavaScript, and AJAX.
• Developed JSPs and Servlets to dynamically generate HTML and display the data to the
client side.
• Developed service classes, domain objects/DAOs and controllers using Java/J2EE technologies.
• Responsible for deploying the application using WebSphere Server and worked with
SOAP, XML messaging.
• Used Eclipse as IDE, Maven for build management, JIRA for issue tracking,
Confluence for documentation, Git for version control, ARC (Advanced REST
Client) for endpoint testing, Crucible for code review and SQL Developer as DB client.
• Used JUnit to develop Test cases for performing Unit Testing.
• Used JSP and JSTL Tag Libraries for developing User Interface components.
• Configured log4j to log the warning and error messages.
• Developed new and maintained existing functionality using Spring MVC and Hibernate.
• Used JIRA as a bug-reporting tool for updating the bug report.
Environment: HTML, JavaScript, Ajax, Servlets, JSP, CSS, XML, ANT, Git,
Eclipse, SOAP, Log4j, JIRA.
Junior Java Developer
TMW Systems - Chennai, India
August 2007 to July 2009

• Actively involved from the start of the project, from requirement gathering to quality
assurance testing.
• Coded and developed a multi-tier architecture in Java, J2EE and Servlets.
• Involved in gathering business requirements, analyzing the project and creating UML
diagrams such as use cases, class diagrams, sequence diagrams and flowcharts.
• Developed client-side web services components using JAX-WS technologies.
• Extensively worked on JUnit for testing the application code of server-client data
• Developed the front end using JSTL, JSP, HTML and JavaScript.
• Created new and maintained existing web pages built with JSP and Servlets.
• Extensively worked on views, stored procedures, triggers and SQL queries for
loading data (staging), and to enhance and maintain the existing functionality.
• Involved in developing Web Services using SOAP for sending and getting data from
external interface.
• Involved in Database design and developing SQL Queries, stored procedures on MySQL.
• Consumed Web Services (WSDL, SOAP, and UDDI) from third party for authorizing
payments to/from customers.
• Developed Hibernate mapping (.hbm.xml) files for mapping declarations.
• Wrote and modified database queries and stored procedures for Oracle 9i.
Environment: Java JDK (1.5), Java/J2EE, Eclipse, WebLogic Application Server, Oracle, JSP,
HTML, JavaScript, JMS, Servlets, UML, XML, WSDL, SOAP, UDDI, JUnit.
Hadoop (7 years), Hive (7 years), Java (7 years), SQL (8 years),
Drupal (less than 1 year), .NET (less than 1 year), MVC (less than 1 year), C# (less than 1
year), Angular (less than 1 year)
Driver's License
Additional Information
Technical Skills:

Big Data Ecosystems: Hadoop, MapReduce, HDFS, ZooKeeper, Hive, Pig, Sqoop, Oozie,
Flume, YARN, Spark, NiFi
Database Languages: SQL, PL/SQL, Oracle
Programming Languages: Java, Scala
Frameworks: Spring, Hibernate, JMS
Scripting Languages: JSP, Servlets, JavaScript, XML, HTML, Python
Web Services: RESTful web services
Databases: RDBMS, HBase, Cassandra
IDE: Eclipse, IntelliJ
Platforms: Windows, Linux, Unix
Application Servers: Apache Tomcat, WebSphere, WebLogic, JBoss
Methodologies: Agile, Waterfall
ETL Tools: Talend

