
Anwesh Babu

Hadoop Developer - Wells Fargo


Alpharetta, GA - Email me on Indeed: indeed.com/r/Anwesh-Babu/96074ef43f6504f7
7+ years of professional experience in the IT industry, with 3 years of experience in Hadoop ecosystem implementation, maintenance, ETL and Big Data analysis operations.
Excellent understanding of Hadoop architecture and the underlying framework, including storage management.
Experience in using various Hadoop ecosystem components such as MapReduce, Pig, Hive, ZooKeeper, HBase, Sqoop, Oozie, Flume and SOLR for data storage and analysis.
Experience in developing custom UDFs for Pig and Hive to incorporate methods and functionality of Python/Java into Pig Latin and HiveQL (a minimal Java UDF sketch follows this summary).
Experience with the Oozie scheduler in setting up workflows composed of MapReduce and Pig jobs.
Knowledge of the architecture and functionality of NoSQL databases such as HBase, Cassandra and MongoDB.
Experience in managing Hadoop clusters and services using Cloudera Manager.
Experience in troubleshooting errors in HBase Shell/API, Pig, Hive and MapReduce.
Experience in importing and exporting data between HDFS and Relational Database Management systems
using Sqoop.
Collected log data from various sources and integrated it into HDFS using Flume.
Assisted Deployment team in setting up Hadoop cluster and services.
Hands-on experience in setting up Apache Hadoop and Cloudera CDH clusters on Ubuntu, Fedora and
Windows (Cygwin) environments.
In-depth knowledge of the modifications required to static IP (interfaces), hosts and bashrc files, password-less SSH setup and Hadoop configuration for cluster setup and maintenance.
Excellent understanding of virtualization, with experience setting up a POC multi-node virtual cluster leveraging the underlying bridged networking and NAT technologies.
Experience in loading data into HDFS from UNIX (Ubuntu, Fedora, CentOS) file systems.
Knowledge of the project life cycle (design, development, testing and implementation) of client-server and web applications.
Experience in writing shell scripts in Ubuntu/UNIX to automate sequential script execution.
Knowledge of hardware, software, networking and desktop tools including but not limited to Excel and Access, with experience in applying them as needed to enhance productivity and ensure accuracy.
Determined, committed and hardworking individual with strong communication, interpersonal and
organizational skills.
Technology enthusiast; highly motivated and an avid blog reader, keeping track of the latest advancements in the hardware and software fields.
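
As a flavor of the custom UDF work listed above, here is a minimal sketch of a Hive UDF in Java using the classic UDF API; the class name and the normalization rule are illustrative assumptions, not code from any of the projects below.

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Illustrative Hive UDF: trims and upper-cases a string column so that
// free-text codes compare consistently in HiveQL joins and GROUP BYs.
public final class NormalizeCode extends UDF {
    public Text evaluate(final Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}

Once packaged in a JAR, a function like this would typically be registered in a Hive session with ADD JAR and CREATE TEMPORARY FUNCTION before being used in a query.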

WORK EXPERIENCE

Hadoop Developer
Wells Fargo - New York, NY - July 2013 to Present
Wells Fargo & Company is an American multinational diversified financial services company. The CORE project deals with improving the end-to-end approach to real estate-secured lending and the overall customer experience, and with achieving the vision of satisfying all of the customers' financial needs. The purpose of the project is to build a big data platform used to load, manage and process terabytes of transactional data, machine log data, performance metrics and other ad hoc data sets, and to extract meaningful information from them. The solution is based on Cloudera Hadoop.

Responsibilities:
Worked on the implementation and maintenance of a Cloudera Hadoop cluster.
Assisted in upgrading, configuring and maintaining various Hadoop ecosystem components such as Pig, Hive and HBase.
Developed and executed custom MapReduce programs, Pig Latin scripts and HiveQL queries (a minimal MapReduce sketch follows this entry).
Used Hadoop FS scripts for HDFS (Hadoop Distributed File System) data loading and manipulation.
Performed Hive test queries on local sample files and HDFS files.
Developed and optimized Pig and Hive UDFs (user-defined functions) to implement functionality from external languages as and when required.
Extensively used Pig for data cleaning and optimization.
Developed Hive queries to analyze data and generate results.
Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report
generation.
Managed, reviewed and interpreted Hadoop log files.
Worked on SOLR for indexing and search optimization.
Analyzed business requirements and cross-verified them against the functionality and features of NoSQL databases such as HBase and Cassandra to determine the optimal database.
Analyzed user request patterns and implemented various performance optimization measures including but
not limited to implementing partitions and buckets in HiveQL.
Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
Monitored workload, job performance and node health using Cloudera Manager.
Used Flume to collect and aggregate weblog data from different sources and pushed to HDFS.
Integrated Oozie with MapReduce, Pig, Hive and Sqoop.
Environment: Hadoop 1.x, HDFS, MapReduce, Pig 0.11, Hive 0.10, Crystal Reports, Sqoop, HBase, Shell Scripting, UNIX.
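
A minimal sketch of the kind of custom MapReduce program referenced above, counting log lines per severity level against the Hadoop 1.x API; the class names and the assumed log layout (timestamp, level, message) are hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogSeverityCount {

    // Mapper: emits (severity, 1) per log line, assuming the second
    // whitespace-separated field carries the level (INFO, WARN, ERROR...).
    public static class SeverityMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text severity = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws java.io.IOException, InterruptedException {
            String[] fields = value.toString().split("\\s+");
            if (fields.length > 1) {
                severity.set(fields[1]);
                context.write(severity, ONE);
            }
        }
    }

    // Reducer: sums the counts per severity level.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws java.io.IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "log-severity-count"); // Hadoop 1.x style
        job.setJarByClass(LogSeverityCount.class);
        job.setMapperClass(SeverityMapper.class);
        job.setCombinerClass(SumReducer.class);  // counting is associative, so reuse reducer
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}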

Hadoop Developer
PG&E - San Francisco, CA - May 2012 to June 2013
The Pacific Gas and Electric Company, commonly known as PG&E, is an investor-owned utility that provides natural gas and electricity to most of the northern two-thirds of California, from Bakersfield to the Oregon border. The purpose of this project was to build and maintain a bill forecasting product to help reduce electricity consumption by leveraging the features and functionality of Cloudera Hadoop. A second cluster was implemented for historical data warehousing, increasing the sample size for power and gas usage pattern analysis and providing readily available data storage through HBase.
Responsibilities:
Involved in the design and development of a 3-node Hadoop cluster using Apache Hadoop for POC and sample data analysis.
Successfully implemented Cloudera on a 30-node cluster for PG&E consumption forecasting.
Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing
Hadoop clusters.
Involved in planning and implementing an additional 10-node Hadoop cluster for data warehousing, historical data storage in HBase and sampling reports.
Used Sqoop extensively to import data from RDBMS sources into HDFS.
Performed transformations, cleaning and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
Developed Pig UDFs to pre-process data for analysis (see the sketch after this entry).

Worked with business teams and created Hive queries for ad hoc access.
Responsible for creating Hive tables, partitions, loading data and writing hive queries.
Created Pig Latin scripts to sort, group, join and filter the enterprise-wide data.
Worked on Oozie to automate job flows.
Maintained cluster coordination services through ZooKeeper.
Generated summary reports using Hive and Pig and exported the results via Sqoop for business reporting and intelligence analysis, to determine whether the power-saving programs implemented were effective.
Environment: Hadoop, HDFS, Pig 0.10, Hive, MapReduce, Sqoop, Java Eclipse, SQL Server, Shell Scripting.
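
A minimal sketch of a Pig UDF of the sort used above to pre-process data; the class name and the assumed raw reading format ("012.50kWh") are invented for illustration.

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Illustrative Pig UDF: parses a raw meter reading such as "012.50kWh"
// into a double, returning null for malformed values so a Pig FILTER
// can drop them downstream.
public class ParseKwh extends EvalFunc<Double> {
    @Override
    public Double exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        String raw = input.get(0).toString().trim();
        String digits = raw.replaceAll("[^0-9.]", "");   // keep digits and the decimal point
        try {
            return Double.parseDouble(digits);
        } catch (NumberFormatException e) {
            return null;
        }
    }
}

In Pig Latin such a function would be registered with REGISTER and then called like a built-in inside a FOREACH ... GENERATE.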

Hadoop Developer
RelayHealth - Atlanta, GA - October 2011 to April 2012
RelayHealth, a subsidiary of McKesson, processes healthcare provider-to-payer interactions among 200,000 physicians, 2,000 hospitals and 1,900 payers (health plans). We processed millions of claims per day on Cloudera Enterprise, analyzing more than 1 million (150 GB of) log files per day and integrating with multiple Oracle systems. As a result, we were able to help our healthcare providers get paid faster, improving their cost models and productivity.
Responsibilities:
Involved in loading, transforming and analyzing healthcare data from various providers in Hadoop, using Flume on an ongoing basis.
Filtered, transformed and combined data from multiple providers based on payer filter criteria using custom
Pig UDFs.
Analyzed transformed data using HiveQL and Hive UDFs to generate per-payer reports for transmission to payers for payment summaries.
Exported analyzed data to downstream RDBMS systems using Sqoop for generating end-user reports, business analysis reports and payment reports.
Responsible for creating Hive tables based on business requirements.
Analyzed large data sets by running Hive queries and Pig scripts.
Implemented partitioning, dynamic partitions and buckets in Hive for efficient data access (see the sketch after this entry).
Ran Hadoop streaming jobs to process terabytes of XML-format data.
Analyzed large data sets from hospitals and providers to determine the optimal way to aggregate and generate summary reports.
Worked with the data science team to gather requirements for various data mining projects.
Loaded and transformed large sets of structured, semi-structured and unstructured data.
Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
Extensively used Pig for data cleansing.
Implemented test scripts to support test driven development and continuous integration.
Environment: Hadoop, HDFS, Pig 0.10, Hive, MapReduce, Sqoop, Java Eclipse, SQL Server, Shell Scripting.
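
A sketch of the dynamic-partitioning pattern noted above, driven from Java over Hive's JDBC interface; the HiveServer2 driver, host, table names and columns are all assumptions for the example, not details from the project.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ClaimsPartitionLoad {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver"); // HiveServer2 JDBC driver (assumed)
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://hiveserver.example.com:10000/default", // hypothetical host
                 "hive", "");
             Statement stmt = conn.createStatement()) {
            // Let Hive derive the partition values (payer_id, claim_date) per row.
            stmt.execute("SET hive.exec.dynamic.partition = true");
            stmt.execute("SET hive.exec.dynamic.partition.mode = nonstrict");
            // Hypothetical tables: an unpartitioned staging table feeds a
            // claims table partitioned by payer and date.
            stmt.execute("INSERT OVERWRITE TABLE claims PARTITION (payer_id, claim_date) "
                    + "SELECT claim_id, provider_id, amount, payer_id, claim_date "
                    + "FROM claims_staging");
        }
    }
}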

Java/J2EE Interface Developer
Avon Products - New York, NY - October 2010 to September 2011
Avon Products, Inc. is an American international manufacturer and direct-selling company in the beauty, household and personal care categories. The objective of this project was to support existing applications and develop an M-Commerce application for the Avon mobile purchase portal.
Responsibilities:
Created use case diagrams, sequence diagrams, functional specifications and user interface diagrams.
Involved in complete requirement analysis, design, coding and testing phases of the project.

Participated in JAD meetings to gather requirements and understand the end users' system.
Migrated global internet applications from standard MVC to Spring MVC and Hibernate.
Integrated content management configurations for each page with the web application's JSPs.
Assisted in the design and development of the Avon M-Commerce application from scratch using HTTP, XML, Java, Oracle objects, Toad and Eclipse.
Created Stored Procedures & Functions.
Used JDBC to process database calls for DB2 and SQL Server databases (see the sketch after this entry).
Developed user interfaces using JSP, HTML, XML and JavaScript.
Actively involved in code review and bug fixing to improve performance.
Environment: Spring MVC, Oracle 11g, J2EE, Java, JDBC, Servlets, JSP, XML, Design Patterns, CSS, HTML, JavaScript 1.2, JUnit, Apache Tomcat, MS SQL Server 2008.
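
An illustrative sketch of the JDBC access pattern mentioned above; the DB2 URL, credentials, and the table and column names are assumptions made for the example.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Illustrative DAO: looks up a catalog item, e.g. for a mobile purchase portal.
public class ProductDao {
    private static final String URL = "jdbc:db2://dbhost:50000/SAMPLEDB"; // hypothetical DB2 URL

    public String findProductName(int productId) throws SQLException {
        String sql = "SELECT name FROM products WHERE product_id = ?";
        try (Connection conn = DriverManager.getConnection(URL, "user", "password");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, productId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("name") : null;
            }
        }
    }
}

Using a PreparedStatement rather than string concatenation guards against SQL injection and lets the driver reuse the compiled statement across calls.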

Java Developer
D&B Corporation - Parsippany, NJ - November 2009 to September 2010
D&B is the world's leading provider of business information, helping reduce credit risk and manage business between customers and vendors efficiently. D&B stores and maintains information on over 77 million companies worldwide.
Responsibilities:
Utilized Agile Methodologies to manage full life-cycle development of the project.
Implemented the MVC design pattern using the Struts framework (see the sketch after this entry).
Used Struts Form classes to write the routing logic and to call different services.
Created Tiles definitions, struts-config files, validation files and resource bundles for all modules using the Struts framework.
Developed the web application using JSP custom tag libraries and Struts Action classes.
Designed Java Servlets and Objects using J2EE standards.
Used JSP for the presentation layer and developed a high-performance object/relational persistence and query service for the entire application using Hibernate.
Developed the XML Schema and Web services for the data maintenance and structures.
Used WebSphere Application Server to develop and deploy the application.
Worked with Cascading Style Sheets (CSS).
Involved in coding JUnit test cases.
Environment: Java/J2EE, Oracle 11g, SQL, JSP, Struts 1.2, Hibernate 3, WebLogic 10.0, HTML, AJAX, JavaScript, JDBC, XML, JMS, UML, JUnit, log4j, WebSphere, MyEclipse
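
A minimal sketch of a Struts 1.x Action of the kind described above; CompanyLookupForm and CompanyService are hypothetical stand-ins invented for the example, not classes from the project.

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.struts.action.Action;
import org.apache.struts.action.ActionForm;
import org.apache.struts.action.ActionForward;
import org.apache.struts.action.ActionMapping;

// Illustrative Struts 1.x action: routes a lookup request to a service and
// forwards to a result page defined in struts-config.xml.
public class CompanyLookupAction extends Action {
    @Override
    public ActionForward execute(ActionMapping mapping, ActionForm form,
                                 HttpServletRequest request,
                                 HttpServletResponse response) throws Exception {
        CompanyLookupForm lookupForm = (CompanyLookupForm) form; // hypothetical form bean
        request.setAttribute("company",
                new CompanyService().findByName(lookupForm.getCompanyName()));
        return mapping.findForward("success");
    }
}

// Hypothetical form bean backing the lookup page.
class CompanyLookupForm extends ActionForm {
    private String companyName;
    public String getCompanyName() { return companyName; }
    public void setCompanyName(String name) { this.companyName = name; }
}

// Hypothetical service stub standing in for the real business layer.
class CompanyService {
    String findByName(String name) { return "Company record for " + name; }
}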

Java/J2EE developer
Wilshire Software Technologies - Hyderabad, Andhra Pradesh - April 2007 to October 2009
Wilshire Technologies is committed to providing high-quality service with a high level of client satisfaction. Wilshire has the right mix of technical skills and experience to provide real-time client solutions, supported by high-end infrastructure for design and development.
Responsibilities:
Developed the application under the JEE architecture; designed and developed dynamic, browser-compatible user interfaces using JSP, custom tags, HTML, CSS and JavaScript.
Deployed and maintained JSP and Servlet components on WebLogic 8.0.
Developed the application server persistence layer using JDBC and SQL.
Used JDBC to connect the web applications to Databases.

Implemented test-first unit testing using JUnit (see the sketch after this entry).


Developed and utilized J2EE services and JMS components for messaging in WebLogic.
Configured the development environment using the WebLogic application server for developers' integration testing.
Environment: Java/J2EE, SQL, Oracle 10g, JSP 2.0, AJAX, JavaScript, WebLogic 8.0, HTML, JDBC
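
A minimal sketch of the test-first style referenced above, using JUnit 4; PriceCalculator and its discount rule are invented for illustration.

import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Test written first: it pins down the behavior before the class exists.
public class PriceCalculatorTest {
    @Test
    public void appliesTenPercentDiscountAboveThreshold() {
        PriceCalculator calc = new PriceCalculator();
        assertEquals(90.0, calc.finalPrice(100.0), 0.001);
    }

    @Test
    public void leavesSmallOrdersUnchanged() {
        PriceCalculator calc = new PriceCalculator();
        assertEquals(40.0, calc.finalPrice(40.0), 0.001);
    }
}

// Simplest implementation that makes the tests pass (hypothetical rule).
class PriceCalculator {
    double finalPrice(double amount) {
        return amount > 50.0 ? amount * 0.9 : amount;
    }
}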
REFERENCES WILL BE PROVIDED ON REQUEST

ADDITIONAL INFORMATION
Technical Skills:
Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, ZooKeeper, Sqoop, Oozie, Flume, Avro
Web Technologies: HTML, XML, JDBC, JSP, JavaScript, AJAX
Methodologies: Agile, UML, Design Patterns (Core Java and J2EE)
NoSQL Databases: HBase, MongoDB, Cassandra
Databases: Oracle 11g/10g, DB2, MS SQL Server, MySQL, MS Access
Programming Languages: C, C++, Java, SQL, PL/SQL, Python, Linux shell scripting
Tools: Eclipse, PuTTY, Cygwin, MS Office, Crystal Reports
