
R. Joseph Manoj, Ph.D.

Data Scientist (Full Stack), Katalyst Labs (www.dataflo.io), Chennai

rjmanoj79@gmail.com | 98414 93362 | www.datasigns.info


https://www.linkedin.com/in/dr-rjmanoj | https://github.com/rjmanoj79

PROFESSIONAL SUMMARY_________________________________________________________________
• 4+ years of experience in Data Science, solving business problems in Sales, Marketing, and Retail
• Experience in Data Visualization & Storytelling. Rich experience in Statistics, ML Modelling, NLP (NER,
Redaction, Document Classification)
• Proficiency in ML model development, productization, the AWS MLOps framework, GitLab, and Docker
• Experience in AWS SageMaker, Kinesis, EMR, Glue, S3, RDS, EC2, Lambda, ECS, EKS, Secrets Manager
• Familiar with designing data ingestion pipelines using Azure ADF, Logic App, EventHub, Blob, SQL
Analytics, ADLS, CosmosDB, Azure Databricks/Spark
• Exposure to Big Data technologies: Hadoop, Sqoop, Pig, Hive, NoSQL, Spark, Kafka, NiFi, ZooKeeper
• Involved in data collection & integration from different sources into the data warehouse
• Involved in data modeling and data transformation in the Snowflake data warehouse/dbt
• Implemented various regression/classification algorithms using Python/Scikit-learn
• Forecasted univariate and multivariate time-series data using various time-series algorithms
• Handled streaming and large volumes of data using Databricks/Spark
• Experience in Agile methodologies: writing user stories, story refinement, and monitoring
• Certified as “Microsoft Certified Azure Data Scientist” and “Tableau Certified Data Scientist”
• Contributed to research projects and published over 30 research articles/blogs in the field of Data Science
• Refer to my data science/machine learning blogs at www.datasigns.info
• Exposure to Quantum Computing (IBM Qiskit, QML, QNLP)

TECHNICAL SKILLS_______________________________________________________________________
• Programming Language: Python 3.0
• Packages: Pandas, NumPy, Matplotlib, seaborn, Plotly, Scikit-Learn, BeautifulSoup, Pyspark, Flask
• Cloud Services: AWS (SageMaker, EMR, ECS, EKS, Fargate, Glue, Kinesis), Azure (Databricks, ADF,
EventHub, Logic App, Blob, CosmosDB), GitLab and Docker
• Big Data: Apache Spark, Hadoop, HIVE, Nifi, PIG
• Database: MySQL, PostgreSQL, MongoDB (NoSQL)
• CloudML/AutoML: H2O.ai, MS-Azure ML Studio, AWS Sagemaker
• ML Algorithms: Linear/Logistic Regression, K-Means, Random Forest, SVM, XGBoost, LightGBM, etc.
• DL/NLP: NLTK, spaCy, LSTM, GAN, and exposure to RNN and CNN networks
• Data warehouse: Snowflake; BI Tools: Tableau, MS-Excel
• Statistical Testing: A/B Testing, Hypothesis Testing and ANOVA testing
• Quantum Computing: IBM Qiskit, QML, QNLP (beginner)

WORK EXPERIENCE_______________________________________________________________________
• Data Scientist, Katalyst Labs Pvt Ltd (www.dataflo.io), Chennai Jan 2021 - Till Date
• Data Scientist, Hexaware Technologies, Chennai Sep.2019 - Dec.2020 (1.4 Years)
• Data Science Mentor, Stigmata Technologies, Chennai (Freelance) Jan 2018 - Sep. 2019 (1.9 Years)
• Prof/Researcher, St. Joseph’s College of Engg, Chennai June 2005 - Sep 2019 (14.4 Years)
• Software Programmer, Swaminathan Networking, Chennai June 2003 - May 2005 (2 Years)

EDUCATION_______________________________________________________________________________
• Ph.D. (CSE) Manonmaniam Sundaranar University, Tirunelveli, Tamil Nadu, India. Year: Mar. 2015
• M.E. (CSE) Sathyabama University, Chennai, Tamil Nadu, India. Year: Apr. 2009
• M.C.A. Manonmaniam Sundaranar University, Tirunelveli, Tamil Nadu, India. Year: Apr. 2002
• B.Sc. (Chem) Manonmaniam Sundaranar University, Tirunelveli, Tamil Nadu, India. Year: Apr. 1999

CERTIFICATIONS__________________________________________________________________________
• Microsoft Azure Certified Data Science Associate, January 2020
• Tableau Certified Data Scientist, May 2020
• AgileKB Certification in Agile Project Management & Delivery, May 2020
Curriculum Vitae
PROFESSIONAL EXPERIENCE_______________________________________________________________

1. DATA SCIENTIST, KATALYST LABS (WWW.DATAFLO.IO), CHENNAI JAN.2021 – TILL DATE


Katalyst Labs is a SaaS-based IT product company founded in 2020. Its product, dataflo.io, integrates data from
different CRM and Ad applications and generates data insights using ML/NLP/DL.
Responsibilities @ Katalyst Labs, Chennai:
• Data Science team member; involved in various studies, including selecting use cases after discussion
with clients
• Listed various client use cases in sales, marketing, and Ad app data, and examined the data sources
and the quality of the data
• Researched the technology stack for developing and deploying ML models to production, and completed a few
PoCs to understand the data
• Supported the data engineering team in creating the transformed data with the needed KPIs

Project #1 (Marketing KPIs Anomaly Detection): Anomaly detection in marketing KPIs (Google Analytics) –
Google Analytics KPIs such as sessions, sessions/user, and bounce rate are forecast from time-series data to flag
anomalies. The alerts help marketing people understand website performance and make decisions.

Environment: AWS SageMaker/Python, S3, Snowflake, Fivetran, dbt, PostgreSQL, Scikit-learn
Deployment: AWS SageMaker, AWS Fargate/ECS, AWS Secrets Manager, AWS CloudWatch, GitLab/Flask
Responsibilities:
• Discussed business use cases and requirements with stakeholders such as external clients, the Product
Manager, and the Delivery Head
• Participated in Agile ceremonies and demonstrated demos and the progress of data science projects
• Supported writing scrum user stories and the story refinement process
• Involved in setting up the data pipeline from data sources (sales & marketing apps) using Fivetran
• Set up the data pipeline with Snowflake using Snowpipe and fetched the data
• Created PostgreSQL tables to store the model inferences/output
• Model selection; wrote the source code in AWS SageMaker and versioned it in GitLab
• Built Docker images and implemented REST APIs using Flask
• Stored key credentials in AWS Secrets Manager
• Implemented data and model pipelines using AWS services (SageMaker, Fargate)
• Data collection, feature engineering/selection, hyperparameter tuning, model optimization
• Built, deployed, and monitored the models (MLOps)
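
The anomaly-alert idea behind Project #1 can be sketched as a rolling z-score check on a daily KPI series. This is an illustrative simplification only — the production system used time-series forecasting in SageMaker, and the window size and threshold below are assumed values:

```python
# Illustrative rolling z-score anomaly detector for a daily KPI
# (e.g. Google Analytics sessions). Window and threshold are
# hypothetical parameters, not the production configuration.
from statistics import mean, stdev

def detect_anomalies(series, window=7, threshold=3.0):
    """Flag indices deviating more than `threshold` standard
    deviations from the trailing `window`-day mean."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(series[i] - mu) > threshold * sigma:
            anomalies.append(i)
    return anomalies

# A stable sessions series with one sudden spike at index 10
sessions = [100, 102, 98, 101, 99, 100, 103, 97, 101, 100, 300, 99]
print(detect_anomalies(sessions))  # -> [10]
```

In the deployed pipeline, a flagged index would trigger an alert to the marketing dashboard rather than a print.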

Project #2 (Sales Conversion Prediction): Predicting the conversion probability of sales leads – Fetches live
updates of incoming leads and applies an ML model to estimate each new lead's conversion probability, so that
sales people can focus on high-probability leads and convert them quickly.

Environment: AWS SageMaker/Python, AWS EMR, S3, Snowflake, Fivetran, dbt

Deployment: AWS Fargate/ECS, AWS Secrets Manager, GitLab, Flask
Responsibilities:
• Client interaction and use case selection for data insights
• Scrum master for the data science team: writing user stories, allotting tasks, and story refinement
• Set up the data pipeline with Snowflake and read the streaming data using AWS EMR/Spark
• Stored the data in AWS S3 and AWS DynamoDB
• Data collection, feature engineering/selection, and model building using AWS SageMaker/Flask
• Versioned the source code in GitLab; Docker image creation and CI/CD pipeline
• Supported Docker containerization and syncing with AWS Fargate through ECS
• Coordinated with front-end developers to bring the data insights into a dashboard
• Coordinated with the data engineering team on the ETL process that converts raw data into transformed data
• Exposure to model monitoring and MLOps operations using AWS CloudWatch
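
The scoring step of Project #2 can be sketched as logistic-regression inference over lead features. The feature names, weights, and bias below are purely hypothetical; the real model was trained in AWS SageMaker on Snowflake data:

```python
# Minimal logistic-regression scoring sketch for lead conversion.
# Weights/bias are toy values for illustration, not trained ones.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def conversion_probability(lead, weights, bias):
    """Score one lead: dot(features, weights) + bias -> probability."""
    z = sum(f * w for f, w in zip(lead, weights)) + bias
    return sigmoid(z)

# Hypothetical features: [email_opens, site_visits, days_since_contact]
weights = [0.8, 0.5, -0.3]
bias = -2.0
leads = [[5, 4, 1], [0, 1, 30]]
scores = [conversion_probability(l, weights, bias) for l in leads]
# Rank leads so sales can prioritise the highest probabilities
print(sorted(range(len(scores)), key=lambda i: -scores[i]))  # -> [0, 1]
```

In production, each new lead streamed in via EMR/Spark would be scored this way and the result written to the dashboard store.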


2. DATA SCIENTIST, HEXAWARE TECHNOLOGIES, CHENNAI SEPT.2019 – DEC.2020 (1.4 YRS.)


Hexaware is an information technology and services company based in Mumbai, India, founded in 1990.
Responsibilities @ Hexaware Technologies, Chennai:
• Member of the Decision Sciences Community, which guides various Hexaware teams in implementing data
science applications
• Completed PoCs using AutoML tools like H2O.ai and Azure ML Studio
• Demonstrated PoCs with web apps developed using the Streamlit package
• Researched various tools and their applications based on the requirements of various teams and submitted
reports

Project #1 (Amaze-DIF): Hexaware's unique automated application transformation platform. With a high level
of automation and customization capability, it can deploy web applications on the cloud seamlessly, making the
application replatforming journey simple yet secure and future-proof.
It has now been extended to ingest structured data into Azure cloud storage based on its metadata, after which
real-time analytics are performed using NLP and other ML algorithms. It is a generic model that pulls data from
different RDBMSs such as MySQL and Oracle and pushes the processed data into Azure storage.
Environment: Databricks, PySpark, Azure Synapse, Azure Data Lake, EventHub, Logic App, MySQL
Responsibilities:
• Connecting Azure EventHub & Logic App with the MySQL database and pulling the streaming data
• Processing the consumed data: cleaning and finding errors in the dataset
• Matching against existing data and loading the data into Azure Data Lake
• Using PySpark for real-time analytics based on the requirements; the final version of the data is
stored in Azure Synapse (SQL data warehouse)

Project #2 (Document Redaction): The customer, a world-renowned consulting company, expressed interest
in creating a common repository of all consulting documents without violating data security rules &
regulations, in order to share knowledge and best practices across the company. Natural Language
Processing (NLP) subfields such as NER and redaction were applied and deployed successfully.
Environment: Python, SpaCy (NER and Document Redaction), GOCR (OCR Tool)
Responsibilities:
• Client interaction for document identification and infrastructure requirements
• Involved in various phases such as data collection, statistical analysis, building the ML model, and optimization
• NLP model testing using Postman. Involved in demonstrations & preparing project deliverables.
Key Achievements:
• Reduced manual work for identifying document content and redacting sensitive data
• More than 50% of manual effort saved by the NLP solution
• Improved accuracy by 89% by removing/finding invalid entities in the documents
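
The redaction step in Project #2 amounts to masking detected entity spans in the text. In the real project spaCy's NER produced the spans; the sketch below hard-codes example spans so it stays self-contained, and the sample sentence is invented for illustration:

```python
# Minimal redaction sketch: replace entity spans with a mask.
# In production the (start, end) spans came from spaCy NER
# (PERSON, ORG, etc.); here they are hard-coded examples.
def redact(text, entities, mask="[REDACTED]"):
    """Replace each (start, end) span in `entities` with `mask`,
    working right-to-left so earlier offsets stay valid."""
    for start, end in sorted(entities, reverse=True):
        text = text[:start] + mask + text[end:]
    return text

doc = "John Smith signed the contract with Acme Corp on 3 May."
# Spans spaCy would typically tag as PERSON and ORG
spans = [(0, 10), (36, 45)]
print(redact(doc, spans))
# -> [REDACTED] signed the contract with [REDACTED] on 3 May.
```

Processing spans right-to-left is the key design choice: replacing a span changes the string length, so left-to-right replacement would invalidate the remaining offsets.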

Project #3 (Port Call Cost Prediction): The client is a shipping services company that provides various harbor
services to ships around the world. They issue a baseline value for the various services they provide before a
ship reaches the harbor. Since the actual value varies enormously from the baseline, they needed a model to
predict the baseline value accurately. The model and user interface were deployed in the AWS cloud.
Environment: Python, Tableau, AWS Cloud Deployment, ML Algorithms: Linear Regression, XGBoost, NodeJS
Responsibilities:
• Use case discussion with the client; understanding the client's requirements from the functional document
• Understanding the bus matrix and acquiring data from their data warehouse
• Involved in various phases of the data science life cycle: data collection, statistical analysis, building the ML
model, optimization, and deployment in the Azure/AWS cloud. Model testing in Postman
Key Achievements:
• Helped the client quote the base value correctly, maintaining their goodwill with their clients
• Increased customer retention by 23%; the quoted base value is now consistent
• Increased model accuracy from 86% to 91% and demonstrated the model to the client
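
The baseline-cost regression in Project #3 can be illustrated with ordinary least squares on a single feature. The feature (ship tonnage) and the data points are assumed for illustration; the real model used multiple features with linear regression and XGBoost:

```python
# Toy ordinary-least-squares sketch for baseline cost prediction.
# One hypothetical feature (tonnage); the production model was
# multivariate and also used XGBoost.
def fit_ols(xs, ys):
    """Fit y = a*x + b by least squares; return (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

tonnage = [10, 20, 30, 40]       # assumed feature values
cost    = [110, 210, 310, 410]   # assumed baseline costs
a, b = fit_ols(tonnage, cost)
print(round(a * 25 + b))  # predicted cost for a 25-tonne call -> 260
```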

Project #4 (Loss Prevention Insights): The client is the leading provider of insurance and related risk
management services to the international transport and logistics industry. The objective is to analyze internal and
publicly available data on equipment fire incidents to derive actionable insights on loss prevention for claims.
Environment: Conda, JupyterLab, Azure VM, Azure Blob Storage
Role and Responsibilities:
• End to end solution designing, identifying the tools, technologies, and recommendations
• Work with team members for solution implementation, review, and optimization.
• Collect fire incident data from the public domain using web scraping and web crawling, and convert the
collected data to structured form using NLP techniques.
• Merge with internally available data, analyze the combined data, and visualize the results.
• Interface with stakeholders to remove roadblocks; showcase results to clients and act on feedback
Key Achievements:
• Supports insurance underwriters in handling the claims process or claim issuance quickly.
• Reduced the underwriting process time considerably.
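
The scraping step in Project #4 can be sketched with the standard library alone. The real pipeline used BeautifulSoup plus NLP to structure the scraped text; the page markup and tag choice (`h2` for incident headlines) below are assumptions:

```python
# Minimal web-scraping sketch: pull incident headlines out of an
# HTML page using only the stdlib html.parser. The production
# pipeline used BeautifulSoup and then NLP to structure the text.
from html.parser import HTMLParser

class HeadlineParser(HTMLParser):
    """Collect the text content of every <h2> element."""
    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if self.in_h2 and data.strip():
            self.headlines.append(data.strip())

page = ("<html><body><h2>Container fire at port</h2><p>...</p>"
        "<h2>Warehouse blaze</h2></body></html>")
parser = HeadlineParser()
parser.feed(page)
print(parser.headlines)  # -> ['Container fire at port', 'Warehouse blaze']
```

Each scraped headline would then be run through NLP extraction and merged with the insurer's internal incident records.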

Project #5 (Document Classification): The customer is a well-known company serving Health Information
Technology and Clinical Research. They receive millions of clinical trial documents every year; the QC team
reviews the documents, checks for missing fields, classifies the documents per a reference model, extracts
metadata, and configures and uploads them to the eTMF. They spend an average of 20 mins/doc and 400 FTEs
of manual effort, yet there is still a high number of backlogs and classification errors.
Environment: Conda, JupyterLab, Pandas, spaCy, GOCR, Scikit-learn, TensorFlow
Solution: Provided NLP/ML-based document classification, metadata extraction, validation, and import of
documents into the eTMF. The solution resulted in approximately 75% manual effort reduction.
Responsibilities:
• Involved as a Decision Sciences Community member and participated in the DS Community forum
• Assisted in algorithm selection, technique evaluation, and optimization
• Contributed to the common knowledge repository
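
The classification idea in Project #5 can be illustrated with a naive keyword-score classifier. The class labels and keyword sets below are hypothetical stand-ins for the reference model; the deployed solution used trained Scikit-learn/TensorFlow models, not keyword counts:

```python
# Naive bag-of-words classification sketch. The real solution
# used trained ML models; the classes and keywords here are
# invented to illustrate the idea only.
from collections import Counter

CLASS_KEYWORDS = {
    "protocol": {"protocol", "amendment", "objective"},
    "consent":  {"consent", "participant", "signature"},
}

def classify(text):
    """Pick the class whose keywords appear most often in `text`."""
    words = Counter(text.lower().split())
    scores = {label: sum(words[w] for w in kws)
              for label, kws in CLASS_KEYWORDS.items()}
    return max(scores, key=scores.get)

print(classify("The participant must provide a signature on the consent form"))
# -> consent
```

A production classifier replaces the keyword scores with TF-IDF features and a trained model, but the interface — document text in, reference-model class out — is the same.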

Project #6 (Risk Leaders Board): The client is a marine/defense/property insurer involved in issuing cargo
insurance to different ship owners. The customer runs their business under the Lloyd's insurance market. The
proposed system allows them to choose the best leaders or laggards in the market for a particular policy.
Environment: H2O.ai (AutoML), Python, PowerBI, ML Algorithms: XGBoost and K-Means
Responsibilities:
• Involved in use case discussions with the client; understood the client's requirements by studying the
functional document; understood the bus matrix and acquired data from their data warehouse
• Involved in various phases of the data science life cycle: data collection, statistical analysis, building the ML
model, hypothesis testing, Z-scores, hyperparameter tuning, and deployment in the Azure/AWS cloud
• Reported the status of test activities to the onsite coordinator on a daily basis
• Applied Agile methodology and attended daily stand-up and sprint meetings
Key Achievements:
• Helped insurance underwriters complete the claims process and new-policy issue decisions quickly
• Reduced the underwriting process by 20%; model accuracy improved up to 93%
• Developed using an AutoML tool and extracted meaningful insights from the historical data

RESEARCH CONTRIBUTIONS - INTERNATIONAL JOURNALS ONLY____________________________


1. R. Joseph Manoj, M.D. Anto Praveena “Analysis and Prediction on crime rate Ascended against Women and
childrens in India”, Research Journal of Pharmaceuticals, Biological and Chemical Sciences, 2018, Vol 9 Iss:2,
ISSN: 0975-8585 (SCI Indexed-Accepted-Yet to be Published)
2. Pandeeswaran Chelliah, Muthuraj Bose, Joseph Manoj. R, “Energy Efficient MAC Protocol Design for
Remote Patient Monitoring and Medication System Using Internet of Things (IoT)”, Journal of Electrical
Engineering, 2018, Vol.18 Issue: 4 PP. No:347-362. (SCI & SCOPUS Indexed).
3. R. Joseph Manoj, M.D. Anto Praveena, K. Vijayakumar, “An ACO–ANN based feature selection algorithm for
big data”, Cluster Computing (Springer), March 2018 (SCI & SCOPUS-Accepted).
4. R. Shiny Sharon, R. Joseph Manoj, “A Literature Survey on Deduplication of Health Care Data in Cloud
Environment”, Research Journal of Pharmaceuticals, Biological and Chemical Sciences” June 2017, Vol 8,
No.3 PP:1723-1727, ISSN: 0975-8585 (SCI and Scopus Indexed).
5. M.D. Anto Praveena, Dr. R. Joseph Manoj, “A Web Services Authentication System Based on Web Server
Log Analysis”, International Journal of Pharmacy & Technology, Sep 2016,Vol 8, No.3, pp.4545-4557, ISSN:
0975-766X (Scopus Indexed).
6. Dr. R. Joseph Manoj, Dr. Adlin Sheeba, Anto Praveena M.D., “User Behavior Based Trust Estimation For Web
Service Access Control Model”, International Journal of Advanced Engineering Technology, Vol 7, No.1, May
2016, pp.791-796, E-ISSN 0976-3945.
7. Dr. Adlin Sheeba, Dr. R. Joseph Manoj, “A Graph-Based Algorithm for Detection of Composition Loops
Dynamically in Web Services”, International Journal of Advanced Engineering Technology, Vol 7, No.1, May
2016, pp.748-750, E-ISSN 0976-3945.
8. M.D. Anto Praveena, R. Joseph Manoj and Dr. V. Shanthi, “An Authentication Method for Secure Web
Services Access with Preventing Tautology Type SQL Injection” International Journal of Applied Engineering
Research, Vol 9, No.20, pp. 7531-7542, 2014, ISSN: 0973-4562 (Scopus Indexed).
9. R. Joseph Manoj and Dr. A. Chandrasekar, “An Enhanced Trust Authorization Based Web Services Access
Control Model”, Journal of Theoretical and Applied Information Technology, Vol 64, No.2, pp.522-530,
June 2014. ISSN: 1992-8645 (Scopus Indexed).
10. R. Joseph Manoj, Dr. A. Chandrasekar, and M. D. Anto Praveena, “An Approach to detect tautology type
SQL injection in web services based on XSchema Validation”, International Journal of Engineering and
Computer Science, Vol 3, No.1, pp.3695-3699, Jan 2014. ISSN: 2319-7242.
11. R. Joseph Manoj and Dr.A. Chandrasekar, “An Authentication System of web services based on Web Server
Log Analysis” International Journal of Engineering and Technology, Vol 5, No. 6, pp.4786-4793, Jan 2014.
ISSN: 0975-4024 (Scopus Indexed).
12. R. Joseph Manoj and Dr. A. Chandrasekar, “An Access control model of web services based on multifactor
trust-based management”, International Review on Computers and Software, Vol 8, No.10, pp.2460-2466, Oct
2013. ISSN: 1828-6003 (Scopus Indexed).

Place: Chennai, India


Date: 25-Jun-21
