
Shesh Raj

Email ID: sheshrajc11@gmail.com


Contact: 469-294-5069

Professional Summary:

• Highly dedicated professional with over 8 years of IT industry experience across technologies, tools, and databases such as Big Data, AWS, S3, Snowflake, Hadoop, Hive, Spark, Python, Sqoop, CDL (Cassandra), Teradata, Tableau, SQL, PL/SQL, and Redshift.
• Extensive experience in IT data analytics projects, with hands-on experience migrating on-premise ETLs to Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Cloud Dataproc, Google Cloud Storage, and Composer.
• Hands-on experience implementing Azure big data and data transformation solutions across industries such as Banking and Financial Services, High Tech, and Utilities.
• Expertise in creating Apache Spark programs using Python (PySpark) and well versed in using Datasets and DataFrames (see the PySpark sketch after this summary).
• Good knowledge of and hands-on experience with Azure cloud services such as Azure Databricks, Azure Data Factory, Azure Storage, Azure SQL, and Azure DW.
• Strong knowledge of creating and monitoring Hadoop clusters on Azure Databricks.
• Around 5 years of experience in end-to-end big data application design, with strong experience in major components of the Hadoop ecosystem such as MapReduce, Storm, Spark, HDFS, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Kafka, Python, Scala, and Cassandra.
• 4+ years of experience with NoSQL databases HBase and MongoDB and search engines such as Solr and Elasticsearch.
• Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premise databases to Azure Data Lake Store using Azure Data Factory.
• Skillful in building big data applications with open-source frameworks such as Hadoop, Hive, Apache Spark, MapReduce, and Python.
• Good at developing applications with various Azure components such as HDInsight, Data Factory, Data Lake, Storage, and Machine Learning Studio.
• Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2, which provide fast and efficient processing for Teradata big data analytics, using S3 as a storage mechanism.
• Working experience with ETL implementation using AWS services such as Glue, Lambda, EMR, Athena, S3, SNS, Kinesis, Data Pipeline, and PySpark.
• Hands-on experience with Azure cloud services (PaaS and IaaS): Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, and Azure Data Lake.
• Working experience with functional programming languages such as Scala and Java.
• Hands-on experience with the CI/CD tool Jenkins and containerization tools such as Docker and Kubernetes (K8s).
• Experience optimizing MapReduce jobs using combiners and partitioners to deliver the best results.
• Experience importing and exporting relational data into HDFS using Sqoop and custom MapReduce.
• Involved in developing dashboards and reports using Tableau and Power BI.
• Hands-on experience with data analytics services such as Athena, Glue Data Catalog, and QuickSight.
• Hands-on expertise with AWS databases such as RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached and Redis).
• Expert in Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin, and Airflow.
• Experienced in manipulating, merging, and restructuring datasets from big data using machine learning, Hadoop, SAS, Python, and SQL, and in building predictive models.
• Developed reusable solutions to maintain proper coding standards across different Java projects.
• Very good at application development and maintenance of SDLC projects using different programming languages such as Java, C, Scala, SQL, and NoSQL.
• Experience in creating complex SQL queries, SQL tuning, and writing PL/SQL blocks such as stored procedures, functions, cursors, indexes, triggers, and packages.
• Good knowledge of NoSQL databases such as HBase, Cassandra, MongoDB, and Redis.
• Expertise in core Java, J2EE, multithreading, JDBC, and shell scripting, and proficient in using Java APIs for application development.
• Extensive experience in SQL, stored procedures, functions, and triggers with databases such as Oracle, IBM DB2, and MS SQL Server.
• 2+ years of experience as an Azure cloud data engineer with Microsoft Azure technologies including Azure Data Factory (ADF), Azure Data Lake Storage (ADLS), Azure Synapse Analytics (SQL Data Warehouse), Azure SQL Database, Azure Analysis Services, PolyBase, Azure Cosmos DB (NoSQL), Azure Key Vault, Azure DevOps, and Azure HDInsight, along with big data technologies such as Hadoop, Apache Spark, and Azure Databricks.
• Managed the Hadoop distribution with Cloudera Manager, Cloudera Navigator, and Hue.
• Set up high availability for Hadoop cluster components and edge nodes.
• In-depth understanding of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
• Worked on the Cascading API for Hadoop application development and workflows.
• Very good understanding of SQL, ETL, and data warehousing technologies, with sound knowledge of designing data warehousing applications using tools such as Teradata, Oracle, and SQL Server.
• Experience in building scripts using Maven and continuous integration systems such as Jenkins.
• Expertise in using Kafka as a messaging system to implement real-time streaming solutions, and implemented Sqoop for large data transfers from RDBMS to HDFS/HBase/Hive and vice versa.
• Worked on Apache NiFi as an ETL tool for batch and real-time processing.
• Experience in migrating ETL processes into Hadoop, designing Hive data models, and writing Pig Latin scripts to load data into Hadoop.
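
Illustrative PySpark sketch for the DataFrame work noted above (a minimal example under assumed inputs; the S3 paths, column names, and aggregation are hypothetical, not taken from any client project):

# Read raw JSON events, apply DataFrame transformations, and write partitioned Parquet.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("event-aggregation").getOrCreate()

raw = spark.read.json("s3a://example-bucket/raw/events/")        # hypothetical source path
daily = (
    raw.filter(F.col("event_type").isNotNull())                  # drop records with no event type
       .withColumn("event_date", F.to_date("event_ts"))
       .groupBy("event_date", "event_type")
       .agg(F.count("*").alias("event_count"))
)
daily.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3a://example-bucket/curated/daily_event_counts/")          # hypothetical target path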

TECHNICAL SKILLS:

Hadoop/Big Data: Apache Spark, Hadoop, HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, Flume, HBase, YARN, Cassandra, Phoenix, Airflow
Frameworks: Hibernate, Spring, Cloudera CDH, Hortonworks HDP, MapR
Programming & Scripting Languages: Java, Python, R, C, C++, HTML, JavaScript, XML, Git
Databases: Oracle 10g/11g, PostgreSQL, DB2, SQL Server, MySQL, Redshift
NoSQL Databases: HBase, Cassandra, MongoDB
IDEs: Eclipse, NetBeans, Maven, STS (Spring Tool Suite), Jupyter Notebook
ETL Tools: Pentaho, Informatica, Talend
Reporting Tools: Tableau, Power BI
Operating Systems: Windows, UNIX, Linux, Sun Solaris
Testing Tools: JUnit, MRUnit
AWS: EMR, Glue, Athena, DynamoDB, Redshift, RDS, Data Pipelines, Lake Formation, S3, IAM, CloudFormation, EC2, ELB/CLB
Azure: Data Lake, Data Factory, SQL Data Warehouse, Data Lake Analytics, Databricks, and other Azure services

Professional Experience:

Advent Health - Orlando, FL Aug 2020 – Present


Big Data Developer
Roles and Responsibilities:

• Designed applications from data ingestion through report delivery to third-party vendors using big data technologies such as Flume, Kafka, Sqoop, MapReduce, Hive, and Pig.
• Processed raw log files from the set-top boxes using Java MapReduce code and shell scripts and stored them as text files in HDFS.
• Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
• Built analytical data pipelines to port data in and out of Hadoop/HDFS from structured and unstructured sources, and designed and implemented the system architecture for a Microsoft Azure based cloud-hosted solution for the client.
• Designed and provisioned the platform architecture to execute Hadoop and machine learning use cases on cloud infrastructure: Azure Databricks and Azure Data Factory.
• Ingested data from legacy and upstream systems to HDFS using Apache Sqoop, Flume, Java MapReduce programs, Hive queries, and Pig scripts.
• Generated the required reports for the operations team from the ingested data using Oozie workflows and Hive queries.
• Built alerting and monitoring for the Oozie jobs, covering failure and success conditions, using email notifications.
• Involved in the Blue Coat proxy approach for sharing data from the in-house Hadoop cluster with external vendors.
• Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
• Wrote MapReduce code to convert unstructured and semi-structured data into structured data and loaded it into Hive tables.
• Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL.
• Designed, set up, maintained, and administered Azure SQL Database, Azure Analysis Services, Azure SQL Data Warehouse, and Azure Data Factory.
• Worked on debugging and performance tuning of Hive and Pig jobs.
• Analyzed system failures, identified root causes, and recommended courses of action as part of operations support.
• Involved in a Spark Streaming solution for time-sensitive, revenue-generating reports to keep pace with upstream set-top box (STB) data (see the streaming sketch after this section).
• Worked on HBase with Apache Phoenix as a data layer to serve web requests and meet SLA requirements.
• Created HBase tables and loaded the data using a Spark Streaming application.
• Worked on the SFDC OData connector to get data from NodeJS services, which in turn fetch the data from HBase.
• Utilized AWS S3 to push/store and pull data between AWS and external applications.
• Responsible for functional requirements gathering, code reviews, deployment scripts and procedures, offshore coordination, and on-time deliverables.

Environment: Hadoop, HDFS, Pig, Hive, Flume, Kafka, Azure, MapReduce, Sqoop, Spark, Oozie, Linux, NodeJS, SFDC, OData, and AWS.
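
A minimal Structured Streaming sketch along the lines of the Kafka/Spark streaming work above (assumes the spark-sql-kafka package is available; the broker, topic, and HDFS paths are placeholders):

# Consume set-top-box events from Kafka and land them in HDFS as Parquet.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("stb-stream-ingest").getOrCreate()

events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker1:9092")        # placeholder broker
         .option("subscribe", "stb-events")                        # placeholder topic
         .load()
         .select(F.col("value").cast("string").alias("payload"), F.col("timestamp"))
)

query = (
    events.writeStream.format("parquet")
          .option("path", "hdfs:///data/stb/events/")              # placeholder output path
          .option("checkpointLocation", "hdfs:///checkpoints/stb-events/")
          .outputMode("append")
          .start()
)
query.awaitTermination()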

Comcast Aug 2018 – Jun 2020


Data Engineer
Roles and Responsibilities:

• Worked on designing the content and delivering solutions based on an understanding of the requirements.
• Wrote a web service client for order-tracking operations that accesses the web services API and is utilized in our web application.
• Used the AJAX API for intensive user operations and client-side validations.
• Worked with Java, J2EE, SQL, JDBC, XML, JavaScript, and web servers.
• Utilized Servlets for the controller layer and JSP and JSP tags for the interface.
• Worked on the Model View Controller pattern and various design patterns.
• Implemented medium to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
• Worked with designers, architects, and developers to translate data requirements into physical schema definitions for SQL sub-programs, and modified the existing SQL program units.
• Designed and developed SQL functions and stored procedures.
• Involved in debugging and bug fixing of application modules.
• Efficiently dealt with exceptions and flow control.
• Designed and developed Spark workflows using Scala to pull data from AWS S3 buckets and Snowflake and apply transformations on it.
• Migrated data from AWS S3 buckets to Snowflake by writing a custom read/write Snowflake utility function using Scala.
• Worked on Snowflake schemas and data warehousing, and processed batch and streaming data load pipelines using Snowpipe and Matillion from the data lake (AWS S3 bucket).
• Profiled structured, unstructured, and semi-structured data across various sources to identify patterns in the data, and implemented data quality metrics using the necessary queries or Python scripts based on the source.
• Installed and configured Apache Airflow for the S3 bucket and the Snowflake data warehouse and created DAGs to run the loads in Airflow (see the DAG sketch after this section).
• Loaded data from multiple data sources (SQL, DB2, and Oracle) into HDFS using Sqoop and loaded it into Hive tables.
• Worked on object-oriented programming concepts.
• Added Log4j to log the errors.
• Spearheaded coding for site management, which included change requests for enhancements and bug fixes pertaining to all parts of the website.
Environment: Java, JavaScript, JSP, REST API, JDBC, Servlets, MS SQL, XML, Windows XP, Ant, SQL Server database, Eclipse Luna, SVN.
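
A hedged sketch of an Airflow DAG in the spirit of the S3-to-Snowflake orchestration above (Airflow 2.x-style imports are assumed; the Snowflake load is stubbed out, and the DAG id, schedule, and table names are illustrative):

# Daily DAG that loads staged S3 files into Snowflake.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def load_to_snowflake(**context):
    # Stub: in a real pipeline this step would run a Snowflake COPY INTO
    # (or hand off to Snowpipe/Matillion) against the staged S3 files.
    print("COPY INTO analytics.orders FROM @s3_stage/orders/")   # illustrative only


with DAG(
    dag_id="s3_to_snowflake_load",          # illustrative DAG id
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    load_orders = PythonOperator(task_id="load_orders", python_callable=load_to_snowflake)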

Client: Hancock Whitney Nov 2017 – Jun 2018

Role: Data Engineer

• Built a real-time streaming and ingestion platform for events, to publish and subscribe to events.
• Worked on several use cases such as AutoPay, Flexloan, ATM/Debit Card PIN Request, Salesforce integration (address update), Zelle email address and phone number update, and negative data sharing.
• Initiated data governance for real-time events by designing and implementing a Schema Registry (Avro format).
• Maintained the EAP (Enterprise Application Platform) and persisted real-time and batch data.
• Implemented tokenization using DTAAS and Previtaar to support REST-level, file-level, and field-level encryption.
• Kafka Connect integrations: AWS S3 Sink Connector, Salesforce Connector, Splunk Connector, and HDFS Connector.
• Integrated CMP (Marketplace) with the Kafka Admin APIs.
• Developed Schema Registry APIs and implemented them in PROD.
• Migrated on-premise data (Oracle, SQL Server, DB2, MongoDB) to Azure Data Lake Store (ADLS) using Azure Data Factory (ADF v1/v2).
• Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
• Implemented proofs of concept for SOAP and REST APIs.
• Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premise databases to Azure Data Lake Store using Azure Data Factory.
• Built transformations using Databricks, Spark SQL, and Scala/Python, with results stored into ADLS.
• Developed Hadoop solutions on AWS, from developer to admin roles, utilizing the Hortonworks Hadoop stack.
• Developed producer and consumer SDKs and published the libraries to JFrog Artifactory.
• Connected to Kafka servers using SSL and SASL connectivity (see the producer sketch after this section).
• Persisted data in EAP (HDFS), created tables using Hive, and stored the data in HBase.
• Persisted data in S3 using Kinesis Firehose, VPC, EMR, Lambda, and CloudWatch.
• MongoDB persistence and maintenance.
• Developed workflows in Oozie and Airflow to automate loading data into HDFS and pre-processing it with Hive.
• Responsible for migrating data from on-premise to the Azure cloud.
• Created pipelines in Azure Data Factory using linked services, datasets, and pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, and to write data back.
• Developed JSON scripts for deploying the pipelines in Azure Data Factory (ADF) that process the data.
• Created a Python/Django based web application using Python scripting for data processing, MySQL for the database, and HTML/CSS/jQuery and Highcharts for data visualization of the served pages.
• Used the existing Deal Model in Python to inherit and create object data structures for regulatory reporting.
• Used the Python library BeautifulSoup for web scraping to extract data for building graphs.
• Involved in loading data from the edge node to HDFS using shell scripting.
• Verified data consistency between systems.
• Worked extensively with flat files, loading them into on-premise applications and retrieving data from applications back to files.
• Monitored Kafka metrics, performance metrics, and health checks using AppDynamics.
• Log monitoring and info monitoring using Splunk.
• Built real-time dashboards using Grafana.
• Onboarded APIM, PSG, OAuth, SSO, and channel ID for the applications and APIs and deployed them in PCF.
• Spark Streaming integration with Kafka (POC).
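
A minimal producer sketch for the SSL-connected Kafka publishing described above (the kafka-python client is assumed purely for illustration; the actual SDK was custom, and the broker, topic, certificate paths, and payload are placeholders):

# Publish a JSON event to a Kafka topic over SSL.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["broker1:9093"],              # placeholder broker
    security_protocol="SSL",
    ssl_cafile="/etc/kafka/certs/ca.pem",            # placeholder certificate paths
    ssl_certfile="/etc/kafka/certs/client.pem",
    ssl_keyfile="/etc/kafka/certs/client.key",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

producer.send("address-update-events", {"accountId": "12345", "zip": "70112"})  # illustrative topic and payload
producer.flush()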

Maruti Suzuki, India Nov 2016 – Oct 2017


Big Data Engineer
Roles and Responsibilities:

• Built a real-time streaming and ingestion platform for events, to publish and subscribe to events.
• Worked on several use cases such as AutoPay, Flexloan, ATM/Debit Card PIN Request, Salesforce integration (address update), Zelle email address and phone number update, and negative data sharing.
• Initiated data governance for real-time events by designing and implementing a Schema Registry (Avro format).
• Maintained the EAP (Enterprise Application Platform) and persisted real-time and batch data.
• Implemented tokenization using DTAAS and Previtaar to support REST-level, file-level, and field-level encryption.
• Kafka Connect integrations: AWS S3 Sink Connector, Salesforce Connector, Splunk Connector, and HDFS Connector.
• Integrated CMP (Marketplace) with the Kafka Admin APIs.
• Experience in migrating existing databases from on-premise to AWS Redshift using various AWS services.
• Developed PySpark code for AWS Glue jobs and for EMR (see the Glue job sketch after this section).
• Created functions and assigned roles in AWS Lambda to run Python scripts, and used AWS Lambda with Java to perform event-driven processing.
• Worked on configuring and managing disaster recovery and backup for Cassandra data.
• Developed Schema Registry APIs and implemented them in PROD.
• Implemented message-level encryption for events at all levels.
• Developed producer and consumer SDKs and published the libraries to JFrog Artifactory.
• Connected to Kafka servers using SSL and SASL connectivity.
• Persisted data in EAP (HDFS), created tables using Hive, and stored the data in HBase.
• Persisted data in S3 using Kinesis Firehose, VPC, EMR, Lambda, and CloudWatch.
• MongoDB persistence and maintenance.
• Monitored Kafka metrics, performance metrics, and health checks using AppDynamics.
• Log monitoring and info monitoring using Splunk.
• Built real-time dashboards using Grafana.
• Onboarded APIM, PSG, OAuth, SSO, and channel ID for the applications and APIs and deployed them in PCF.
• Spark Streaming integration with Kafka (POC).
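
An illustrative skeleton for the Glue PySpark jobs mentioned above (the standard awsglue job entry-point pattern is assumed; the catalog database, table, and S3 output path are placeholders):

# Read a table from the Glue Data Catalog, drop null fields, and write Parquet to S3.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import DropNullFields
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

source = glue_context.create_dynamic_frame.from_catalog(
    database="example_db", table_name="raw_orders")            # placeholder catalog names
cleaned = DropNullFields.apply(frame=source)
glue_context.write_dynamic_frame.from_options(
    frame=cleaned,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/orders/"},  # placeholder path
    format="parquet",
)
job.commit()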

Datamatic Technology, India Sep 2014 – Oct 2015


SQL Developer
Roles and Responsibilities:

• Developed Oracle PL/SQL stored procedures, functions, packages, and SQL scripts.
• Participated in designing databases (schemas) to ensure that the relationships between data are guided by tightly bound key constraints.
• Worked with users and application developers to identify business needs and provide solutions.
• Created database objects such as tables, indexes, views, and constraints.
• Extensive experience with Data Definition, Data Manipulation, Data Query, and Transaction Control Language statements.
• Enforced database integrity using primary keys and foreign keys.
• Tuned pre-existing PL/SQL programs for better performance.
• Created many complex SQL queries and used them in Oracle Reports to generate reports.
• Implemented data validations using database triggers.
• Used import/export utilities such as UTL_FILE for data transfer between tables and flat files.
• Performed SQL tuning using Explain Plan.
• Provided support during the implementation of the project.
• Created scripts for system administration and AWS using languages such as Bash and Python, with experience in streaming tools such as Spark, Spark Structured Streaming, and Kafka streaming.
• Experience with Go (Golang) concepts such as slices, maps, structs, interfaces, goroutines, and channels.
• Developed a fully automated continuous integration system using Git, Gerrit, Jenkins, MySQL, and custom tools developed in Python and Bash, and used PyMongo to perform CRUD operations on MongoDB (see the PyMongo sketch after this section).
• Deployed AWS VPC, ingress and egress rules, AWS EMR, S3, and many other interconnected components using a CloudFormation template auto-generated by a Python script.
• Using AWS Redshift, extracted, transformed, and loaded data from various heterogeneous data sources and destinations.
• Created tables and stored procedures, and extracted data using T-SQL for business users whenever required.
• Performed data analysis and design, and created and maintained large, complex logical and physical data models and metadata repositories using Erwin and MB MDR.
• Wrote shell scripts to trigger DataStage jobs.
• Worked with built-in Oracle standard packages such as DBMS_SQL, DBMS_JOB, and DBMS_OUTPUT.
• Created and implemented report modules into the database from the client system using Oracle Reports as per the business requirements.
• Used dynamic PL/SQL procedures during package creation.
Environment: Oracle 9i, Oracle Reports, SQL, PL/SQL, SQL*Plus, SQL*Loader, Windows XP.
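
An illustrative PyMongo CRUD sketch in the spirit of the MongoDB operations described above (the connection string, database, collection, and document fields are placeholders):

# Basic create / read / update / delete against a MongoDB collection.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")    # placeholder connection string
orders = client["reporting"]["orders"]                # placeholder database and collection

orders.insert_one({"order_id": 1001, "status": "NEW"})                  # create
doc = orders.find_one({"order_id": 1001})                               # read
orders.update_one({"order_id": 1001}, {"$set": {"status": "DONE"}})     # update
orders.delete_one({"order_id": 1001})                                   # delete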
