
Ram Madhav

Azure Data Engineer


Phone: +1(913)-208-1755
E-Mail: mrammadhav07@gmail.com

PROFESSIONAL SUMMARY:

Seeking a challenging role as a Data Engineer, leveraging 9+ years of software industry experience with a focus on Azure cloud services and Big Data technologies such as Spark, MapReduce, Hive, YARN, and HDFS, using programming languages such as Scala and Python. With 4 years of experience in Data Warehousing, I possess a deep understanding of ETL processes, data modeling, and data warehouse design, and I am committed to delivering efficient, scalable data solutions that drive business growth and support strategic decision-making.

✔ In-depth knowledge of Big Data with Hadoop, Hive, MapReduce, Spark, Spark Core, Spark SQL, and the DataFrame/Dataset/RDD APIs
✔ Developed applications using Spark with PySpark and Scala for data processing
✔ Experience in writing Spark jobs for data cleansing and transformations
✔ Good knowledge of Spark architecture and real-time streaming using Spark
✔ Experience writing in-house UNIX shell scripts for Hadoop & Big Data development
✔ Good experience working with Azure Blob and Data Lake Storage and loading data into Azure Synapse Analytics (SQL DW)
✔ Hands-on experience in Python programming and PySpark implementations, and in building Azure Data Factory data pipeline infrastructure to support deployments of Machine Learning models
✔ Proficient in writing complex Spark (PySpark) user-defined functions (UDFs), Spark SQL, and HiveQL (a brief sketch follows this list)
✔ Experience working with Azure services such as Data Lake, Data Lake Analytics, SQL Database, Synapse, Databricks, Data Factory, Logic Apps, Function Apps, and Event Hubs
✔ Experience in developing data pipelines using Hive and Sqoop to extract data from weblogs, store it in HDFS, and develop HiveQL for data analytics
✔ Extensively dealt with Spark Streaming and Apache Kafka to fetch live stream data
✔ Experience in converting Hive/SQL queries into Spark transformations using Java, and experience in ETL development using Kafka and Sqoop
✔ Developed Spark scripts by using Scala shell commands as per the requirement
✔ Good experience in writing Sqoop queries for transferring bulk data between Apache Hadoop and structured data stores
✔ Substantial experience in writing Map Reduce jobs in Java.
✔ Experience in developing Spark applications using PySpark and Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats (structured/unstructured), analyzing and transforming the data to uncover insights into customer usage patterns
✔ Experience in Data Warehousing, Data Marts, and Data Wrangling using Azure Synapse Analytics
✔ Experience in understanding business requirements for analysis, database design & development of applications
✔ Worked with Kafka tools like Kafka migration, MirrorMaker, and Consumer Offset Checker
✔ Experience with real-time data ingestion using Kafka
✔ Experience with CI/CD pipelines using Jenkins, Bitbucket, GitHub, etc.
✔ Strong expertise in troubleshooting and performance fine-tuning of Spark and Hive applications
✔ Hands-on experience in developing Spark applications using RDD transformations, Spark Core, Spark Streaming, and Spark SQL
✔ Extensive experience in developing applications that perform data processing tasks using Teradata, Oracle, SQL Server, and MySQL databases
✔ Worked with data warehousing and ETL/BI tools like Informatica and Power BI
✔ Acquainted with Agile and Waterfall methodologies; responsible for handling several client-facing meetings with strong communication skills
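The PySpark UDF work called out above can be illustrated with a minimal sketch. Everything in it is a hypothetical placeholder (the SparkSession, the sample phone column, and the normalize_phone helper) rather than code from any client engagement.

    # Minimal PySpark UDF sketch; all names are illustrative placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf-sketch").getOrCreate()

    @F.udf(returnType=StringType())
    def normalize_phone(raw):
        # Keep digits only so downstream joins use a consistent key.
        if raw is None:
            return None
        return "".join(ch for ch in raw if ch.isdigit())

    df = spark.createDataFrame([("+1 (555) 010-1234",)], ["phone"])
    df.select(normalize_phone("phone").alias("phone_clean")).show()

At scale, the same cleanup is often better expressed with built-in functions such as regexp_replace, since Python UDFs add serialization overhead.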

EDUCATION

✔ Master's in Cybersecurity, USA.

TECHNICAL SKILLS

✔ Big Data: Hadoop 3.0, Spark 2.3, Hive 2.3, Cassandra 3.11, MongoDB 3.6, MapReduce, Sqoop.
✔ Programming Languages: Python, Scala, SQL, PySpark
✔ Big Data Technologies: Spark, Hadoop, HDFS, Hive, Yarn
✔ Databases: PostgreSQL, MySQL, Oracle, MongoDB, DynamoDB
✔ Other Tools: Eclipse, PyCharm, GitHub, Jira
✔ Cloud: Azure Data Factory, Azure Databricks, Logic Apps, Function Apps, Azure Synapse, Event Hubs, Azure DevOps.
✔ Methodologies: Agile, Waterfall Model.

WORK EXPERIENCE

Client: Caseys, Ankeny, Iowa Sep 2021 - Present


Role: Azure Data Engineer
Responsibilities:
✔ Involved in the complete Big Data flow of the application, from data ingestion from upstream systems into HDFS to processing and analyzing the data in HDFS
✔ Working knowledge of Azure cloud components (Databricks, Data Lake, Blob Storage, Data Factory, Storage Explorer, SQL DB, SQL DWH, Cosmos DB).
✔ Experience in analyzing data from Azure data storages using Databricks for deriving insights using Spark cluster
capabilities.
✔ Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, Databricks, PySpark, Spark SQL, and U-SQL (Azure Data Lake Analytics).
✔ Created pipelines in ADF using Linked Services/Datasets/Pipelines to extract, transform, and load data from Azure SQL, Blob Storage, and Azure SQL Data Warehouse
✔ Worked with Azure Blob and Data Lake Storage and loaded data into Azure Synapse Analytics (DW).
✔ Involved in developing data ingestion pipelines on Azure HDInsight Spark cluster using Azure Data Factory and
Spark SQL.
✔ Designed and implemented a real-time data streaming solution using Azure EventHub.
✔ Conducted performance tuning and optimization activities to ensure optimal performance of Azure Logic Apps
and associated data processing pipelines.
✔ Developed a Spark Streaming application to process real-time data from sources such as Kafka and Azure Event Hubs (a brief sketch follows this list).
✔ Built streaming ETL pipelines using Spark Streaming to extract data from various sources, transform it in real time, and load it into a data warehouse such as Azure Synapse Analytics
✔ Used tools such as Azure Databricks and HDInsight to scale out the Spark Streaming cluster as needed.
✔ Developed a Spark API to import data into HDFS from Teradata and created Hive tables
✔ Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression.
✔ Loaded data into Parquet Hive tables from Avro Hive tables.
✔ Built a learner data model that gets data from Kafka in real time and persists it to Cassandra.
✔ Hands-on programming experience in scripting languages such as Python and Scala
✔ Involved in running all the Hive scripts through Hive on Spark and some through Spark SQL
✔ Used JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables
✔ Implemented Hive partitioning, bucketing, and optimization through set parameters
✔ Performed different types of joins on Hive tables and implemented Hive SerDes such as Avro and JSON.
✔ Managed logs through Kafka with Logstash
✔ Involved in performance tuning of Hive from design, storage, and query perspectives
✔ Developed Kafka consumer APIs in Scala for consuming data from Kafka topics
✔ Monitored Spark cluster using Log Analytics and Ambari Web UI.
✔ Developed Spark core and Spark SQL scripts using Scala for faster data processing.
✔ Worked on Hadoop ecosystem in PySpark on HDInsight and Databricks.
✔ Extensively used Spark core – Spark Context, Spark SQL and Spark Streaming for real time data
✔ Performed data profiling and transformation on the raw data using Python.
✔ Orchestrated a number of Sqoop and Hive scripts using Oozie workflows and scheduled them using the Oozie coordinator
✔ Implemented RDD/Dataset/DataFrame transformations in Scala through SparkContext and HiveContext
✔ Used Jira for bug tracking and BitBucket to check-in and checkout code changes
✔ Experienced in version control tools like GIT and ticket tracking platforms like JIRA.
✔ Expert at handling unit testing using JUnit 4, JUnit 5, and Mockito
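The streaming work above can be sketched roughly as follows. This is a minimal, assumption-laden example rather than the production pipeline: the broker address, topic name, and storage paths are placeholders, and it assumes the spark-sql-kafka connector is available on the cluster.

    # Structured Streaming sketch: read from Kafka, land raw events on storage.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("stream-sketch").getOrCreate()

    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
        .option("subscribe", "store_events")                # placeholder topic
        .load()
    )

    # Kafka delivers key/value as binary; cast the payload for downstream parsing.
    parsed = events.select(F.col("value").cast("string").alias("json_payload"))

    query = (
        parsed.writeStream
        .format("parquet")                                          # landing zone
        .option("path", "/mnt/landing/store_events")                # placeholder path
        .option("checkpointLocation", "/mnt/checkpoints/store_events")
        .start()
    )
    query.awaitTermination()

Azure Event Hubs also exposes a Kafka-compatible endpoint, so the same readStream pattern applies with the Event Hubs bootstrap server and its SASL connection settings.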

Environment: Azure, Hadoop, HDFS, YARN, MapReduce, Hive, Sqoop, Oozie, Kafka, Spark SQL, Spark Streaming, Eclipse, Informatica, Oracle, CI/CD, PL/SQL, UNIX Shell Scripting, Cloudera.

Client: Capital One, Richmond, Virginia Nov 2019 - July 2021


Role: Azure Data Engineer
Responsibilities:
✔ Experience in designing and developing POCs in Spark using Scala to compare the performance of Spark with MapReduce and Hive
✔ Hands - on experience in Azure Cloud Services, Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis
services, Application Insights, Azure Monitoring, Key Vault, and Azure Data Lake.
✔ Created batch & streaming pipelines in Azure Data Factory (ADF) using Linked Services/Datasets/Pipelines to extract, transform, and load data.
✔ Created Azure Data Factory (ADF) batch pipelines to ingest data from relational sources into Azure Data Lake Storage (ADLS Gen2) in an incremental fashion and then load it into Delta tables after cleansing (see the sketch after this list)
✔ Created Azure Logic Apps that trigger when a new email with an attachment is received and load the file to Blob Storage
✔ Implemented CI/CD pipelines using Azure DevOps in the cloud with Git and Maven, along with Jenkins plugins.
✔ Built a Spark Streaming application to perform real-time analytics on streaming data.
✔ Used Spark SQL to query and aggregate data in real time and output the results to visualization tools such as Power BI or Azure Data Studio.
✔ Developed a Spark Streaming application that integrates with event-driven architectures such as Azure Functions or Azure Logic Apps.
✔ Used Spark Streaming to process events in real time and trigger downstream workflows based on the results.
✔ Involved in creating Hive tables and loading and analyzing data using Hive queries
✔ Designed and developed custom Hive UDFs
✔ Used JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables
✔ Involved in migration of ETL processes from Oracle to Hive to test easy data manipulation.
✔ Implemented reprocessing of failed messages in Kafka using offset IDs
✔ Used HiveQL to analyze the partitioned and bucketed data.
✔ Developed a Spark job in Java which indexes data into Azure Functions from external Hive tables which are in HDFS.
✔ Wrote Hive queries on the analyzed data for aggregation and reporting
✔ Developed Sqoop jobs to load data from RDBMS to external systems like HDFS and Hive
✔ Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats.
✔ Worked on converting the dynamic XML data for ingestion into HDFS
✔ Transformed and Copied data from the JSON files stored in a Data Lake Storage into an Azure Synapse Analytics
table by using Azure Databricks 
✔ Used Azure Databricks, Azure Storage Accounts, etc. for source stream extraction, cleansing, consumption, and publishing across multiple user bases.
✔ Created resources using Terraform modules for Azure and automated infrastructure management.
✔ Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing
✔ Loaded data from the UNIX file system to HDFS
✔ Configured Spark Streaming to receive real-time data from Apache Flume and store the stream data in Azure Tables using Scala.
✔ Loaded the data into Spark RDDs and performed in-memory data computation to generate the output response
✔ Used several RDD transformations to filter the data ingested into Spark SQL
✔ Used HiveContext and SQLContext to integrate the Hive metastore and Spark SQL for optimum performance.
✔ Used the version control system Git to access the repositories and to coordinate with CI tools.
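The incremental Delta load mentioned above can be sketched as below. It assumes Databricks (or Spark with the delta-spark package); the mount paths, table locations, and the customer_id merge key are placeholders, not the actual client schema.

    # Hedged sketch of an incremental upsert into a Delta table after cleansing.
    from pyspark.sql import SparkSession
    from delta.tables import DeltaTable

    spark = SparkSession.builder.appName("delta-upsert-sketch").getOrCreate()

    # Incremental slice already cleansed by an upstream ADF/Databricks step.
    incremental = spark.read.parquet("/mnt/raw/customers_increment")     # placeholder path

    target = DeltaTable.forPath(spark, "/mnt/curated/customers")         # placeholder path

    (
        target.alias("t")
        .merge(incremental.alias("s"), "t.customer_id = s.customer_id")  # hypothetical key
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

In ADF, a step like this is typically wired in as a Databricks notebook activity at the end of the copy pipeline.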

Environment: Spark SQL, HDFS, Hive, Pig, Apache Sqoop, Java (JDK SE 6, 7), Scala, Shell scripting, Linux, MySQL, Oracle Enterprise DB, IntelliJ, CI/CD, Oracle, Subversion, and Agile Methodologies.

Client: Guild Mortgage - Houston, TX Sep 2018 - Oct 2019


Role: Data Engineer
Responsibilities:
✔ Worked on development of data ingestion pipelines using the ETL tool Talend and bash scripting, with big data technologies including, but not limited to, Hadoop, Hive, Spark, and Kafka.
✔ Experience in developing scalable & secure data pipelines for large datasets.
✔ Gathered requirements for ingestion of new data sources including life cycle, data quality check,
transformations, and metadata enrichment.
✔ Developed data pipeline using Flume, Sqoop, Pig and Java Map Reduce to ingest customer behavioral data into
HDFS for analysis. 
✔ Supported data quality management by implementing proper data quality checks in data pipelines.
✔ Enhancing Data Ingestion Framework by creating more robust and secure data pipelines.
✔ Implemented data streaming capability using Kafka and Informatica for multiple data sources.
✔ Involved in Sqoop implementation, which helps in loading data from various RDBMS sources to Hadoop systems and vice versa.
✔ Worked with multiple storage formats (Avro, Parquet) and databases (Hive, Azure SQL).
✔ Optimized query performance in Hive using bucketing and partitioning techniques (see the sketch after this list)
✔ Created and managed partitions and buckets in Hive tables.
✔ Responsible for maintaining and handling data inbound and outbound requests through big data platform.
✔ Used Sqoop to transfer data between relational databases and Hadoop.
✔ Knowledge on implementing the JILs to automate the jobs in production cluster.
✔ Worked with SCRUM team in delivering agreed user stories on time for every Sprint. 
✔ Worked on analyzing and resolving the production job failures in several scenarios.
✔ Implemented UNIX scripts to define the use case workflow, process the data files, and automate the jobs.
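The partitioning and bucketing technique referenced above can be sketched with Spark's DataFrame writer against the Hive metastore; note that Spark's bucketing layout is its own rather than Hive's native one, and the table names, columns, and bucket count below are illustrative placeholders. In pure HiveQL the same idea is expressed with PARTITIONED BY and CLUSTERED BY ... INTO n BUCKETS.

    # Sketch of a partitioned, bucketed table registered in the Hive metastore.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hive-layout-sketch")
        .enableHiveSupport()
        .getOrCreate()
    )

    orders = spark.table("sales_staging")          # placeholder staging table

    (
        orders.write
        .partitionBy("order_date")                 # prune scans by business date
        .bucketBy(32, "customer_id")               # co-locate rows on the join key
        .sortBy("customer_id")
        .format("parquet")
        .mode("overwrite")
        .saveAsTable("sales_curated")              # placeholder curated table
    )

    # Partition pruning plus a bucketed join key keep downstream queries cheap.
    spark.table("sales_curated").where("order_date = '2019-06-30'").count()

The bucket count is not universal; it would be tuned to the data volume and the size of the join keys.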

Environment: Spark, Azure SQL, Python, HDFS, Hive, Sqoop, Scala, Kafka, Shell scripting, Linux, Eclipse, Git, Oozie, Informatica, Agile Methodology.

Client: UGA Finance, Parkville, MO Jan 2018 - Aug 2018


Role: Hadoop Developer
Responsibilities:
✔ Used Git to maintain source code in Git and GitHub repositories
✔ Prepared an ETL framework with the help of Sqoop, Pig, and Hive to frequently bring in data from the source and make it available for consumption
✔ Processed HDFS data and created external tables using Hive and developed scripts to ingest and repair tables
that can be reused across the project.
✔ Developed ETL jobs using Spark/Scala to migrate data from Oracle to new MySQL tables (an illustrative sketch follows this list)
✔ Rigorously used Spark/Scala (RDDs, DataFrames, Spark SQL) and Spark-Cassandra Connector APIs for various tasks (data migration, business report generation, etc.)
✔ Developed a Spark Streaming application for real-time sales analytics
✔ Analyzed the source data and handled it efficiently by modifying data types; used Excel sheets, flat files, and CSV files to generate Power BI ad-hoc reports
✔ Analyzed the SQL scripts and designed the solution to implement using PySpark
✔ Extracted the data from other data sources into HDFS using Sqoop
✔ Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
✔ Extracted the data from MySQL into HDFS using Sqoop
✔ Implemented automation for deployments by using YAML scripts for massive builds and releases
✔ Installed and configured Apache Hadoop clusters using YARN for application development, along with Apache toolkits like Apache Hive, Apache Pig, HBase, Apache Spark, ZooKeeper, Flume, Kafka, and Sqoop.
✔ Implemented Data classification algorithms using MapReduce design patterns.
✔ Extensively worked on creating combiners, Partitioning, distributed cache to improve the performance of
MapReduce jobs.
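The Oracle-to-MySQL migration jobs above were written in Spark/Scala; the sketch below shows the same JDBC read/write pattern in PySpark for consistency with the other sketches in this document. Hosts, credentials, and table names are placeholders.

    # Illustrative JDBC migration: read from Oracle, write to MySQL.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("oracle-to-mysql-sketch").getOrCreate()

    source = (
        spark.read.format("jdbc")
        .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB")  # placeholder
        .option("dbtable", "SALES.ORDERS")                              # placeholder
        .option("user", "etl_user")
        .option("password", "***")
        .option("fetchsize", "10000")                                   # batch the reads
        .load()
    )

    (
        source.write.format("jdbc")
        .option("url", "jdbc:mysql://mysql-host:3306/reporting")        # placeholder
        .option("dbtable", "orders")                                    # placeholder
        .option("user", "etl_user")
        .option("password", "***")
        .mode("append")
        .save()
    )

Both JDBC drivers need to be on the Spark classpath (for example, supplied via --jars at submit time).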
Environment: Hadoop, Hive, Spark, PySpark, Sqoop, Spark SQL, Cassandra, YAML, ETL.

Client: Huntington Bank, Ohio                                                                            Mar 2015 – Jun 2017


Role: Data Warehouse Developer
Responsibilities:
✔ Created new database objects like Procedures, Functions, Packages, Triggers, Indexes & Views using T-SQL in
Development and Production environment for SQL Server 2008R2.
✔ Developed Database Triggers to enforce Data integrity and Referential Integrity.
✔ Developed SQL Queries to fetch complex data from different tables in remote databases using joins, database
links and formatted the results into reports and kept logs.
✔ Defined relationship between tables and enforced referential integrity.
✔ Extensively used various SSIS objects such as Control Flow Components, Dataflow Components, Connection
managers, Logging, Configuration Files etc
✔ Established connectivity to database and monitored systems performances.
✔ Performed performance tuning and monitoring of both T-SQL and PL/SQL blocks.
✔ Used SQL Profiler and Query Analyzer to optimize DTS package queries and stored procedures.
✔ Wrote T-SQL procedures to generate DML scripts that modified database objects dynamically based on user
inputs.
✔ Created stored procedures to transform the data and worked extensively in T-SQL for the various transformations needed while loading the data (an illustrative sketch follows this list).
✔ Participated in performance tuning using indexing (clustered and non-clustered indexes) on tables.
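The stored-procedure work in this role was written directly in T-SQL against SQL Server 2008 R2; purely as an illustration, and to keep one language across the sketches in this document, the snippet below drives a parameterized procedure call from Python with pyodbc. The server, database, driver version, procedure, and table names are hypothetical.

    # Illustrative only: invoke a (hypothetical) T-SQL transformation procedure.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=sql-host;DATABASE=SalesDW;UID=etl_user;PWD=***"   # placeholders
    )
    cursor = conn.cursor()

    # Parameterized call: the procedure loads one business date into a reporting table.
    cursor.execute("EXEC dbo.usp_LoadDailySales @LoadDate = ?", "2016-06-30")
    conn.commit()

    # Spot-check the rows the procedure produced (hypothetical table).
    for row in cursor.execute(
        "SELECT TOP 5 * FROM dbo.DailySales ORDER BY LoadDate DESC"
    ):
        print(row)

    conn.close()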
Environment: SQL Server 2008 R2, SSIS, Windows Server, SQL Profiler, SQL Query Analyzer, T-SQL.

Client: AutoZone, Dallas Apr 2013 – Feb 2015


Role: Data Warehouse Developer
Responsibilities:
✔ Responsible for mentoring Developers and Code Review of Mappings developed by other developers
✔ Meeting with Business Process and Solution Development teams to deliver technical solutions based on
functional requirements.
✔ Coordinated with business users to understand business needs and implement them into a functional data warehouse design.
✔ Used various transformations such as Expression, Filter, Joiner, and Lookup for better data massaging, to migrate clean and consistent data
✔ Developed mapping using Informatica transformations
✔ Involved in ETL design for the new requirements
✔ Implemented procedures, triggers, cursors using Sybase T-SQL
✔ Finalized flat file structures with the business
✔ Used Informatica PowerCenter as the ETL tool for constantly moving data from sources into the staging area, creating complex SQL
✔ Troubleshot and tuned SQL using EXPLAIN PLAN
✔ Handled quality assurance such as test case writing, integration testing and unit testing
✔ Fixed the issues that come out of integration testing
Environment: Oracle, Informatica PowerCenter, Sybase, UNIX Scripting, Selenium, Maven, Eclipse, TOAD
