
Professional Summary:

 An AWS-certified Sr. Data Scientist with over 10 years of professional experience in Machine Learning, Artificial Intelligence, Python, PySpark, Statistical Modeling, Deep Learning, Data Analytics, Data Modeling, Data Architecture, Data Analysis, Data Mining, Text Mining & Natural Language Processing (NLP), and Business Intelligence.
 Experienced in building analytics models such as Decision Trees and Linear & Logistic Regression using Hadoop (Hive, Pig), R, Python, Spark, Scala, MS Excel, SQL, PostgreSQL, and Erwin.
 Comfortable with R, Python, SAS, Weka, MATLAB, and relational databases, with a deep understanding of and exposure to the Big Data ecosystem.
 Expertise in Data Analysis, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export using ETL tools such as Informatica PowerCenter.
 Good knowledge of and experience with deep learning algorithms such as Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and LSTM- and RNN-based speech recognition using TensorFlow. Good knowledge of Natural Language Processing (NLP) and Time Series Analysis and Forecasting using the ARIMA model in Python and R (a minimal forecasting sketch follows this list).
 Hands-on experience with the Hadoop ecosystem and the Spark framework, including HDFS, MapReduce, HiveQL, Spark SQL, and PySpark.
 Experience in building visual data processing pipelines, including the design of the necessary validation automation. Well versed in problem solving, debugging, and troubleshooting.
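
The ARIMA bullet above references time-series forecasting; below is a minimal illustrative sketch in Python using statsmodels on a synthetic monthly series (the data, dates, and (p, d, q) order are assumptions for illustration, not a production model).

    # Minimal ARIMA forecasting sketch (synthetic series; statsmodels assumed installed).
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    # Synthetic monthly series with trend plus noise, a stand-in for real data.
    rng = np.random.default_rng(42)
    idx = pd.date_range("2020-01-01", periods=48, freq="MS")
    y = pd.Series(50 + 0.8 * np.arange(48) + rng.normal(0, 3, 48), index=idx)

    model = ARIMA(y, order=(1, 1, 1))   # (p, d, q) chosen for illustration only
    fitted = model.fit()
    print(fitted.forecast(steps=6))     # 6-month-ahead point forecasts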

Technical Skills:
Languages: Python (2.x/3.x), R, SAS, SQL, T-SQL
AWS Cloud Services: Amazon EC2, Amazon S3, Amazon SimpleDB, Amazon MQ, Amazon ECS, AWS Lambda, Amazon SageMaker, Amazon RDS, Elastic Load Balancing, Elasticsearch, Amazon SQS, AWS Identity and Access Management (IAM), Amazon CloudWatch, Amazon EBS, AWS CloudFormation
Databases: MySQL, PostgreSQL, Oracle, HBase, Amazon Redshift, MS SQL Server 2016/2014/2012/2008 R2/2008, Teradata
Statistical Methods: Hypothesis Testing, ANOVA, Time Series, Confidence Intervals, Bayes' Law, Principal Component Analysis (PCA), Dimensionality Reduction, Cross-Validation, Autocorrelation
AI/ML: Regression Analysis, Bayesian Methods, Decision Trees, Random Forests, Support Vector Machines, Neural Networks, Sentiment Analysis, K-Means Clustering, KNN, Ensemble Methods
Hadoop Ecosystem: Hadoop 2.x, Spark 2.x, MapReduce, Hive, HDFS, Sqoop, Flume
Reporting Tools: Tableau 10.x/9.x/8.x (Desktop, Server, and Online), SQL Server Reporting Services (SSRS)
Data Analytics Tools: Python (NumPy, SciPy, pandas, Gensim, Keras), R (caret, Weka, ggplot2)
Data Visualization: Tableau, Matplotlib, Seaborn, ggplot2, Microsoft Office
Operating Systems & Scripting: UNIX/Linux, Windows, UNIX Shell Scripting, PowerShell

Certifications:
 AWS Certified Solutions Architect Associate.
 Google Certified TensorFlow Developer.

Professional Experience:

T-Mobile, NJ Apr 2022 – Present


Sr. Data Scientist (Machine Learning, Artificial Intelligence, Python)
Responsibilities:
 Performed statistical analysis on textual data, building Artificial Intelligence/Machine Learning/Deep Learning models in the Natural Language Processing domain.
 Used Python 3.x (NumPy, SciPy, pandas, scikit-learn, seaborn) and Spark 2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes.
 Applied various Artificial Intelligence/Machine Learning algorithms and statistical modeling techniques, such as decision trees, Random Forest and XGBoost regression models, neural networks, SVM, and clustering, to identify volume, using the scikit-learn package in Python and MATLAB.
 Performed data collection, data cleaning, and data visualization, and developed Artificial Intelligence/Machine Learning algorithms using several packages: NumPy, pandas, scikit-learn, and Matplotlib.
 Designed and developed NLP models for sentiment analysis (a minimal sketch follows this list).
 Applied machine learning to fraud detection using Hadoop/Spark/Hive (Big Data), Python, scikit-learn, MLlib, and C++.
 Applied NLP and Recurrent Neural Networks (LSTM RNNs) trained with Deep Learning techniques to a fraud detection system.
 Developed Python tools for data visualization that helped determine errors in the capture
system.
 Performed research and development to solve problems across the entire Security & Transportation Technology business unit. Performed data imputation using the scikit-learn package in Python.
 Implemented new statistical algorithms and operators on Hadoop and SQL platforms, utilizing optimization techniques, linear regression, K-means clustering, Naïve Bayes, and other approaches.
 Applied knowledge of Information Extraction and NLP algorithms coupled with Deep Learning.
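
As referenced in the sentiment-analysis bullet above, here is a minimal sketch of one common baseline: TF-IDF features feeding a logistic-regression classifier in scikit-learn. The toy texts and labels are illustrative stand-ins, not the actual project data.

    # Sentiment-analysis baseline sketch: TF-IDF + logistic regression.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    texts = ["great service, very happy", "terrible support, never again",
             "works fine, would recommend", "awful experience, total waste"]
    labels = [1, 0, 1, 0]  # 1 = positive sentiment, 0 = negative

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    clf.fit(texts, labels)
    print(clf.predict(["very happy with the service"]))  # expected: [1]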

Amazon, Seattle Feb 2020 – Feb 2022


Sr. Data Scientist
Responsibilities:
 Tackled a highly imbalanced fraud dataset using undersampling, oversampling with SMOTE, and cost-sensitive algorithms with Python scikit-learn (see the sketch after this list).
 Wrote complex Spark SQL queries for data analysis to meet business requirements.
 Developed MapReduce/Spark Python modules for predictive analytics & Artificial
Intelligence/Machine Learning in Hadoop on AWS.
 Built detection and classification models using Python, TensorFlow, Keras, and scikit-learn.
 Constructed a data ingestion pipeline, implemented APIs, and built a model deployment framework to add deep-learning-generated intelligence to our system.
 Implemented the MQTT protocol and REST APIs, handled serialization, designed schemas, improved data visualization using Python, and designed routing algorithms to improve fuel efficiency and reduce consumption.
 Built optimization models using XGBoost and Deep Learning algorithms.
 Worked on data cleaning and ensured data quality, consistency, and integrity using pandas and NumPy.
 Participated in feature engineering, including feature intersection generation, feature normalization, and label encoding with scikit-learn preprocessing.
 Improved default-prediction performance by using random forest and gradient boosting for feature selection with Python scikit-learn.
 Performed feature engineering and NLP using techniques such as Word2Vec, Bag of Words (BOW), TF-IDF, and Doc2Vec.
 Applied Naïve Bayes, KNN, Logistic Regression, Random Forest, SVM, and XGBoost to identify whether a loan would default or not.
 Implemented an ensemble of Ridge Regression, Lasso Regression, and XGBoost to predict the potential loan default loss.
 Performed data cleaning and feature selection using the MLlib package in PySpark and worked with deep learning frameworks.
 Developed text mining models using TensorFlow and NLP tools (NLTK, SciPy, and CoreNLP) on call transaction and social media interaction data for existing-customer management.
 Performed data integration validation and data quality controls for ETL processes and data warehousing using MS Visual Studio, SSAS, SSIS, and SSRS.
 Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
 Installed and configured Hive and wrote Hive UDFs in Java and Python.
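
A minimal sketch of the imbalanced-data handling referenced above, assuming the imbalanced-learn package for SMOTE; the dataset here is synthetic, not the actual fraud data.

    # Rebalance a skewed binary dataset with SMOTE, then fit a classifier.
    from collections import Counter
    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Synthetic 95/5 class split, standing in for a real fraud dataset.
    X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
    print(Counter(y_tr), "->", Counter(y_res))  # minority class oversampled

    # Cost-sensitive model on the rebalanced data.
    model = RandomForestClassifier(class_weight="balanced", random_state=0)
    model.fit(X_res, y_res)
    print("held-out accuracy:", model.score(X_te, y_te))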

Transunion, Chicago, IL Jan 2019 – Feb 2020


Data Scientist
Responsibilities:
 Worked with data designers to leverage existing data models and build new ones for fraud and anomaly detection as needed.
 Wrote user-defined functions (UDFs) in Hive to manipulate strings, dates, and other data.
 Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python (a short sketch follows this list).
 Implemented a CI/CD pipeline using Azure DevOps (VSTS/TFS) in both cloud and on-premises environments with Git, MSBuild, Docker, and Maven, along with Jenkins plugins.
 Performed 3NF business-area data modeling with de-normalized physical implementation, and data and information requirements analysis, using the Erwin tool.
 Worked on snowflaking the dimensions to remove redundancy. Worked with Teradata 14 tools such as FastLoad, MultiLoad, TPump, FastExport, Teradata Parallel Transporter (TPT), and BTEQ.
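
A minimal sketch of the pandas/NumPy cleaning and feature-scaling step referenced above; the frame, columns, and rules are hypothetical.

    # Clean, impute, scale, and engineer features on a small hypothetical frame.
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"amount": [120.0, np.nan, 80.0, 5000.0],
                       "age_days": [10, 400, 35, 2]})

    df["amount"] = df["amount"].fillna(df["amount"].median())  # impute missing values
    # z-score scaling, one common feature-scaling choice
    df["amount_z"] = (df["amount"] - df["amount"].mean()) / df["amount"].std()
    df["log_age"] = np.log1p(df["age_days"])                   # engineered feature
    print(df)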

ACL Digital – Bangalore, India Oct 2014 – Dec 2016


BI Developer
Responsibilities:
 Used SSIS to create ETL packages to validate, extract, transform, and load data into the Data Warehouse and Data Mart.
 Created views, table-valued functions, Common Table Expressions (CTEs), joins, and complex subqueries to provide reporting solutions (a query sketch follows this list).
 Optimized query performance by modifying T-SQL queries, removing unnecessary columns and redundant data, normalizing tables, establishing joins, and creating indexes.
 Worked on data generation and machine learning for anti-fraud detection, data modeling, operations decisioning, and loss forecasting, including product-specific fraud and buyer-vs-seller fraud.
 Ran simulation-based algorithms many times in succession to obtain numerical results and estimate probabilities for machine learning models.
 Created SSIS packages using Pivot Transformation, Fuzzy Lookup, Derived Column, Conditional Split, Aggregate, Execute SQL Task, Data Flow Task, and Execute Package Task.
 Migrated data from a legacy system to SQL Server using SQL Server Integration Services 2012.
 Used C# scripts to map records. Created and modified Stored Procedures, Functions, and
Indexes.
 Used SQL to pull data from databases and aggregate it to provide detailed reporting based on user requirements. Provided statistical research analysis and data modeling support for mortgage products.
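
As referenced in the CTE bullet above, here is a sketch of a CTE-based reporting query, shown executed from Python via pyodbc; the connection string, database, and table names are hypothetical.

    # Run a CTE-based reporting query against SQL Server via pyodbc.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=reporting-db;"
        "DATABASE=SalesDW;Trusted_Connection=yes;"
    )
    query = """
    WITH monthly AS (
        SELECT CustomerID, MONTH(OrderDate) AS mo, SUM(Amount) AS total
        FROM dbo.Orders
        GROUP BY CustomerID, MONTH(OrderDate)
    )
    SELECT CustomerID, AVG(total) AS avg_monthly_spend
    FROM monthly
    GROUP BY CustomerID;
    """
    for row in conn.cursor().execute(query):
        print(row.CustomerID, row.avg_monthly_spend)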

Hyundai Motors Group, Hyderabad, INDIA June 2012 – Sep 2014


Data Analyst
Responsibilities:
 Wrote SQL queries for data validation on the backend systems, using tools such as TOAD and DbVisualizer for the Oracle DBMS (a validation sketch follows this list).
 Performed data analysis, backend database testing, and data modeling, and developed SQL queries to solve problems and meet users' needs for database management in the Data Warehouse.
 Utilized object-oriented languages and concepts, database design, star schemas, and databases.
 Created algorithms as needed to manage and implement proposed solutions.
 Participated in test planning and test execution for functional, system, integration, regression,
UAT (User Acceptance Testing), load and performance testing.
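
A minimal sketch of the kind of backend data-validation query referenced above, using Python's built-in sqlite3 so it runs anywhere; the table and rules are hypothetical (the real checks ran against Oracle).

    # Validate a hypothetical orders table: no negative amounts, no orphan rows.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, customer_id INTEGER);
    INSERT INTO orders VALUES (1, 19.99, 7), (2, -5.00, NULL), (3, 42.00, 9);
    """)

    bad = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE amount < 0 OR customer_id IS NULL"
    ).fetchone()[0]
    print(f"{bad} row(s) failed validation")  # -> 1 row(s) failed validation
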
Education:
 Master’s degree in Computer Science from SUNY Binghamton.
