
Ahmed Mohammed

(804) 368-4870| mdahmed.tech@yahoo.com


Software Engineer – Big Data
Professional Summary

 8 years of experience in the software industry, specializing in software development using Hadoop and Java/J2EE technologies.
 6 years of hands-on experience working with the Hadoop ecosystem – MapReduce, HDFS, YARN, Hive, Pig, HBase, Zookeeper, Sqoop, Flume, Spark, Kafka, Storm, Cassandra, AWS EMR and AWS EC2.
 Extensive work experience in the areas of Automobile, Banking and Financial Services.
 Experience working with Google DFP (Ad Manager), Salesforce CRM and the HubSpot marketing platform.
 Strong ability to understand simple to complex processing needs of Big Data and develop
programs and modules to address those needs. 
 Experienced and well versed in understanding data architecture, data mining, data modeling and reporting needs, and in performing data cleansing and preprocessing with the Hadoop ecosystem.
 Strong ability to write MapReduce programs, Pig Latin scripts and Hive queries for processing data in HDFS, and to monitor job execution and recover from job failures arising from task failure, node failure, etc.
 Strong ability to write custom User Defined Functions in Java for Pig and Hive to perform operations such as aggregation and transformation, extending Hive and Pig core functionality.
 Experienced and well versed with NoSQL databases such as HBase, MongoDB and Cassandra, using native shell commands and Java APIs to create tables, insert new data and retrieve data by row key.
 Experience working with Spark modules such as Spark Streaming and Spark SQL to process data on top of Hadoop by creating Resilient Distributed Datasets (RDDs).
 Experience working with streaming technologies such as Apache Kafka and Storm.
 Experience importing and exporting structured data with Sqoop between HDFS and RDBMSs such as MySQL and Oracle.
 Well versed in installing, configuring, supporting and managing the Cloudera Hadoop platform (CDH3 and CDH4 clusters) and IBM BigInsights for the Hadoop ecosystem.
 Experience configuring Java APIs to create development environments for MapReduce applications, creating test cases with the MRUnit and JUnit frameworks, and debugging jobs to remove errors from code.
 Experience working with web UIs and web tools such as Hue, HTTP/REST APIs, jQuery, AngularJS, JSON and Node.js to access data across the Hadoop ecosystem.
 Experience creating RESTful web services using Spring Boot and the Jersey framework.
 Experience writing UNIX shell scripts to schedule cron jobs and automate tasks.
 Hardworking, enthusiastic and highly committed to the growth and success of the organization.
 Possess strong analytical, verbal and interpersonal skills that help in communicating with team members and carrying out requirement analysis.
 Highly capable of picking up new technologies with a minimal learning curve, with the ability to work independently.

Technical Skills
Big Data: Hadoop Framework, HDFS, MapReduce, YARN, Pig, Hive, HBase, Zookeeper, Hue, Impala,
Sqoop, Flume, Spark, Kafka, Storm, Oozie, AWS EMR, AWS EC2, AWS Data Pipeline and Amazon Redshift.
Programming and Scripting Languages: Java, C, Python, Scala and UNIX Shell Script.
Java & J2EE Development: Servlets, JSP, JDBC, Java Beans
Web Development: HTML5, CSS3, JavaScript, jQuery and Node.js
Database Management Systems: MS SQL Server, Oracle, MySQL and NoSQL Database
IDE: Eclipse, JetBrains IntelliJ IDEA, NetBeans and PyCharm
Web Servers: Apache Tomcat 7, Oracle WebLogic and IBM WebSphere
ETL Tools: AWS Data Pipeline, Talend for Big Data and Informatica
Predictive Modeling and Analytics Tools: Tableau, SAP BW 7.3 and IBM Cognos
Software Engineering Methods: Object Oriented Methodologies, Scrum and Agile Methodologies
Virtual Machine: VMware Workstation 11, Oracle Virtual Box
Operating Systems/Environment: Windows (XP/7/8/10), Linux (Ubuntu/CentOS), Cloudera CDH

Professional Experience

Software Engineer–Big Data July 2019 – Present


Choice Hotels International, Phoenix AZ
Choice Hotels is an exciting intersection of the travel, hospitality, and franchising sectors, fueled by the
power of cutting-edge technology to welcome every guest, every partner, everywhere their journey takes
them. Choice Hotels is building a Data Acquisition Platform (DAP) to ingest data from multiple systems such as Kafka, relational databases and file systems, process it using Hadoop/Spark, and expose the processed data for purposes such as data analysis, machine learning and BI reporting. The framework is developed and orchestrated on AWS (Amazon Web Services).
Responsibilities:
 Involved in the implementation of various modules on the Hadoop Big Data platform and the AWS cloud platform.
 Involved in the design and development of the Data Acquisition Platform (DAP) solution using Big Data technologies such as Apache Spark, Apache Kafka, Hive, HBase and MongoDB.
 Develop data pipelines using Kafka streaming to ingest data from the Choice Central system, which includes JSON files and an Oracle database, into the Delivery Layer hosted on AWS S3.
 Write Spark jobs using the Java Spark API and the PySpark module to parse the raw data collected on AWS S3 and store the processed data in the SSD (Single Source Drive) layer (see the PySpark sketch after this list).
 Create Spark Dataset to load data from multiple sources such as Hive, JSON and Parquet files.
 Perform transformations, cleaning and filtering on data stored in Spark Dataset and load the final
data in Redshift.
 Develop Kafka producer and consumer applications using Java API on Kafka cluster.
 Create Hive Scripts to preprocess the data stored on AWS S3 and load the processed data in
Redshift tables.
 Create Hive Partitions and Buckets to optimize the performance of join operations.
 Write SQL queries to perform several operations on data stored in Redshift.
 Write User Defined Functions (UDFs) in Python to parse and extract information from the URLs stored in Redshift tables.
 Create AWS Data Pipelines for daily ETL activities that load data from the Oracle database onto S3, process it with the PySpark module and load the results into Redshift tables.
 Create Airflow DAGs using Python to schedule tasks and to define relationships and dependencies between them (see the DAG sketch at the end of this section).
 Develop microservices with REST architecture using Spring Boot framework to automate Kafka
Streaming, Spark Jobs and data processing activities.
 Work closely on setting up the different environments and updating configurations.
 Write Shell Scripts to run HDFS, Spark, Kafka commands, Unix File processing commands and
cron jobs.
 Manage the day-to-day operations of AWS Cloud services - EC2 Instances, S3, EMR, Data Pipelines
and Redshift.
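
A minimal PySpark sketch of the kind of raw-to-processed job described above; the bucket names, paths and column names are hypothetical, and the production jobs also use the Java Spark API:

    # dap_parse_raw.py - illustrative only; paths and columns are assumptions
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, to_date

    spark = SparkSession.builder.appName("dap-raw-to-ssd").getOrCreate()

    # Read raw JSON files landed on S3 by the Kafka ingestion pipeline
    raw = spark.read.json("s3a://dap-raw/choice-central/reservations/*.json")

    # Basic cleansing: drop malformed rows, normalize types, keep needed columns
    clean = (raw
             .dropna(subset=["confirmation_id"])
             .withColumn("checkin_date", to_date(col("checkin_date")))
             .select("confirmation_id", "property_code", "checkin_date", "room_rate"))

    # Write the processed data back to S3 as partitioned Parquet
    clean.write.mode("overwrite").partitionBy("checkin_date").parquet("s3a://dap-ssd/reservations/")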

Environment: Hadoop, Spark, Kafka, Amazon EMR, Amazon Redshift, AWS S3, Hive, AWS EC2, Docker, Jenkins, Java (jdk1.8), RESTful Web Services, Airflow, Python, Scala, Shell Scripts, SQL Server, MySQL, Oracle 11g.
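
A sketch of an Airflow DAG of the sort referenced above, using Airflow 2.x import paths; the DAG id, schedule, task names and spark-submit arguments are hypothetical:

    # dap_daily_etl.py - illustrative only
    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    default_args = {"owner": "dap", "retries": 1, "retry_delay": timedelta(minutes=10)}

    with DAG(
        dag_id="dap_daily_etl",
        default_args=default_args,
        start_date=datetime(2021, 1, 1),
        schedule_interval="0 2 * * *",   # run once a day at 02:00
        catchup=False,
    ) as dag:

        parse_raw = BashOperator(
            task_id="parse_raw",
            bash_command="spark-submit dap_parse_raw.py",
        )

        load_redshift = BashOperator(
            task_id="load_redshift",
            bash_command="python load_to_redshift.py",
        )

        # The Redshift load runs only after the Spark parse step succeeds
        parse_raw >> load_redshift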

Software Developer–Big Data April 2016 – June 2019


Advantage Business Marketing, Rockaway NJ
Advantage Business Marketing (ABM) helps more than one million innovators at science, design
engineering, and manufacturing companies discover and procure new technologies that give them a
competitive advantage. ABM implemented Hadoop for two purposes: first, to store and analyze legacy data generated over the years, and second, to ingest data from sources such as Google Ad Manager, Salesforce CRM, ExactTarget and MuleSoft middleware onto HDFS. With the Big Data implementation, ABM achieved a significant improvement in reporting performance, which helped reach the right customers, provide content based on their interests, give publishers insight into customer behavior, and provide advertisers real-time reports on ad performance on the magazine websites.
Responsibilities:
 Set up and configured the AWS EMR environment for applications such as MapReduce, Hive, Pig, HBase, Spark, Sqoop and Flume to store and process data collected over the years in the legacy warehouse, which includes SQL Server, an FTP server, raw files collected on S3 and a Redshift database.
 Involved in designing the data flow model for the end-to-end ETL process of extracting raw data, processing and transforming it, and loading it into the data staging environment on AWS.
 Developed MapReduce jobs to parse raw data such as nested JSON and XML files collected from the FTP server into structured format and store it on S3.
 Created Sqoop jobs to import large volumes of data from SQL Server onto S3 to be processed by Hive and Spark.
 Wrote Hive and Pig Scripts to pre-process decision driven data and loaded to Redshift database to
be analyzed by Tableau reporting tool.
 Worked with different storage formats such as AVRO, Parquet, ORC files for read and write
operations in Hive.
 Migrated data to Spark Cluster from sources like SQL Server, remote Hadoop Cluster etc.
 Used Spark SQL to analyze structured and semi-structured data collected in real time from sources such as AWS S3 and Hive (see the Spark SQL sketch after this list).
 Developed custom User Defined Functions in Hive to parse large numbers of nested JSON and XML files to CSV format and to add additional features to the raw files based on reporting requirements.
 Resolved connection issues between AWS EMR, Spark Cluster, Amazon Redshift and Tableau
Server.
 Set up and maintained security rules in cloud services such as AWS EMR, Redshift and Tableau by defining inbound rules for SSH, TCP and HTTP access by internal users.
 Created and maintained Loggly integration for capturing logs from Java applications.
 Created Java REST applications to pull data from the remote servers into the Hadoop
Environment.
 Created messaging queues using the AWS SQS service and the Apache Kafka platform, and created producer and consumer applications in Java to address the use case requirements (see the SQS sketch at the end of this section).
 Actively participated in the code reviews, meetings and resolving any technical issues.
 Provided assistance for root cause of performance issues in Tableau reporting.
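
A compact Spark SQL sketch of the kind of analysis described above; the bucket, view, table and column names are hypothetical:

    # illustrative Spark SQL over semi-structured S3 data joined with a Hive table
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("abm-ad-performance")
             .enableHiveSupport()
             .getOrCreate())

    # Semi-structured ad-server exports on S3, registered as a temporary view
    ads = spark.read.json("s3a://abm-ingest/admanager/line_items/*.json")
    ads.createOrReplaceTempView("line_items")

    # Join against a Hive table and aggregate with SQL
    report = spark.sql("""
        SELECT c.campaign_name,
               SUM(l.impressions) AS impressions,
               SUM(l.clicks)      AS clicks
        FROM line_items l
        JOIN campaigns c ON l.campaign_id = c.campaign_id
        GROUP BY c.campaign_name
    """)
    report.show()
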
Environment: Hadoop, Spark, Amazon EMR, Amazon Redshift, AWS S3, Hive, Pig, HBase, AWS EC2, Java (jdk1.8), RESTful Web Services, Python, Scala, Shell Scripts, SQL Server, MySQL, Salesforce, HubSpot, DFP.
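
A Python sketch of the SQS send/receive flow mentioned above, using boto3; the production producer and consumer applications were written in Java, and the queue name and message fields shown here are hypothetical:

    # illustrative only - not the original Java producer/consumer
    import json
    import boto3

    sqs = boto3.client("sqs", region_name="us-east-1")
    queue_url = sqs.get_queue_url(QueueName="abm-ingest-events")["QueueUrl"]

    # Producer side: publish one event describing a file that landed on S3
    sqs.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps({"bucket": "abm-ingest", "key": "exacttarget/2019-05-01.json"}),
    )

    # Consumer side: long-poll for messages and delete each one once processed
    resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        event = json.loads(msg["Body"])
        print("processing", event["key"])
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])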

Hadoop Developer Dec 2013 – April 2016


Client: Cox Automotive Inc., Atlanta, GA
Employer: Intone Networks Inc.
Cox Automotive is a leading provider of vehicle remarketing services and digital marketing and software
solutions for the automotive industry. Cox implemented Cloudera Enterprise Data Hub to store an
abundant amount of data from multiple sources and allow for much faster analysis. By incorporating Spark, we built executive dashboards that showcase core metrics in real time, allowing Cox and its clients to make changes that improve behaviors and results. Cox has achieved its goal of delivering new, better customer experiences through greater data insight, while improving operational efficiencies.
Responsibilities:
 Involved in Implementation of Cloudera Enterprise Data Hub (EDH) powered by Hadoop for data
processing, analytics and to prepare data to be loaded into BI tools for query, data visualization
and reporting.
 Understood data architecture, data mining, data modeling and reporting needs, and performed data cleansing and preprocessing with the Hadoop ecosystem.
 Developed MapReduce programs to cleanse, parse and transform raw data in HDFS obtained from
various data sources.
 Used Sqoop to import data from sources such as Oracle, MySQL and Netezza into Hive tables for further processing, and exported the Hive-processed data to the Oracle database for reporting by the BI team.
 Developed Hive queries to get insight of decision driven structured data stored in Hive tables.
 Wrote Pig scripts to transform raw data from several data sources into forming baseline data.
 Created partitions and buckets on Hive tables to analyze and compute various metrics.
 Created Hive UDF libraries and used in Hive queries to analyze the data.
 Created HBase tables to load large data sets of structured and semi-structured data in HDFS, and analyzed the data using HBase shell commands and native Java APIs.
 Involved in integration of Spark with Hadoop for running Spark on YARN.
 Loaded data from sources such as HDFS, Hive and HBase into Spark RDDs for in-memory computation supporting real-time reports (see the sketch after this list).
 Used Sqoop with Spark for data ingestion from sources such as Hive, HBase and Kafka by running Sqoop jobs.
 Used Spark SQL to analyze structured and semi-structured data collected from sources such as Hive, JSON and Parquet files.
 Developed Kafka and Storm based data pipelines for real-time streaming analytics on fast-moving
data.
 Involved in processing streaming data using Kafka and Flume integrated with the Spark Streaming API.
 Created Oozie workflows to automate Hive and Pig jobs using Hue UI.
 Monitored the progress and status updates of MapReduce jobs including Pig and Hive queries.
 Involved in testing and debugging MapReduce jobs and created counters to keep statistics of MapReduce jobs.
 Developed Project Reports and Updates regularly and shared them among the team members.
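
A PySpark sketch of loading Hive data into memory for dashboard metrics, as referenced above; it uses current PySpark APIs and hypothetical database, table and column names (the original work ran on Spark under CDH4):

    # illustrative in-memory computation over Hive and HDFS data
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("cox-dashboard-metrics")
             .enableHiveSupport()
             .getOrCreate())

    # Load a Hive table and keep it cached for repeated dashboard queries
    auctions = spark.table("remarketing.auction_events").cache()

    # Core metric: vehicles sold per auction site per day
    sold_per_site = (auctions
                     .filter(F.col("status") == "SOLD")
                     .groupBy("site_id", "event_date")
                     .count()
                     .withColumnRenamed("count", "vehicles_sold"))
    sold_per_site.show()

    # Raw files on HDFS can be pulled into an RDD in the same session
    log_lines = spark.sparkContext.textFile("hdfs:///data/raw/activity_logs/*.log")
    print(log_lines.count())
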
Environment: Cloudera CDH4, Hadoop, MapReduce, HDFS, Hive, Pig, HBase, Cassandra, Spark, Cloudera
Impala, Zookeeper, MySQL, Oracle 11g, Amazon EC2, Java (jdk1.7), Web Logic, Vertica, Tableau, Talend
Open Studio for Big Data, Windows, Linux.

Software Developer - Java Aug 2010 – Mar 2012


Amazon, Hyderabad, India
Developed Big Data tools and applications to analyze, develop and test potential business use cases on large data sets, improving the user experience of querying and analyzing data.
Responsibilities:
 Involved in Implementation and configuration of AWS EMR to manage, query and analyze
business data by assessing the existing MySQL and Oracle databases.
 Developed MapReduce programs to support distributed processing of data generated by various
business units.
 Worked on various file formats such as Sequence files, AVRO and HAR files to be processed by MapReduce programs.
 Continuously monitored and controlled MapReduce job executions and recovered from job failures arising from task failure, node failure, etc.
 Developed a log parser using MapReduce, where daily logs were stored on AWS S3 and further processed with EMR applications for analytical queries (see the streaming sketch after this list).
 Loaded data from UNIX file system to HDFS by writing cron jobs.
 Created Pig Latin scripts in the areas where extensive MapReduce code can be reduced.
 Developed Pig Scripts to perform operations on data such as Filter, Joins, Grouping, Sorting and
Splitting.
 Developed User Defined Functions in Pig using Java to perform data load/store, column transformation and aggregation.
 Removed bugs from code by performing testing and debugging of MapReduce jobs and Pig Latin
scripts.
 Responsible for performing code reviews, troubleshooting issues and maintaining status report. 
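
A Hadoop Streaming sketch of a daily log-parser mapper, standing in for the MapReduce log parser described above; the original jobs were developed in Java, and the log format shown is hypothetical:

    # mapper.py - illustrative Hadoop Streaming mapper
    import sys

    # Each input line is assumed to look like:
    # 2011-08-14T10:22:31 GET /product/123 200
    for line in sys.stdin:
        parts = line.strip().split()
        if len(parts) < 4:
            continue                      # skip malformed lines
        timestamp, method, path, status = parts[0], parts[1], parts[2], parts[3]
        day = timestamp.split("T")[0]
        # Emit "day|status" as the key with a count of 1; a reducer sums per key
        print("%s|%s\t1" % (day, status))

A job like this would typically be submitted with the streaming jar bundled with the Hadoop distribution (hadoop jar hadoop-streaming*.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input ... -output ...), with input and output locations on S3 or HDFS; the exact paths are illustrative.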

Environment: Java, J2EE, Hadoop, MapReduce, Apache Pig, AWS EMR, AWS EC2, MySQL, Oracle 10g, JavaScript, XML, JUnit

Education

 Texas A&M University-Kingsville – M.S. in Computer Science
 Jawaharlal Nehru Technological University, Hyderabad – B.Tech in Computer Science Engineering
