Partha's Resume
SKILLS
• Python, PySpark, Data Analysis, Machine Learning, MLOps, MySQL, SQL, Deep Learning, Computer Vision, Natural Language Processing, Tableau, NumPy,
Pandas, Statistics & Probability, Neural Networks, Snowflake, StreamSets, Databricks, Streamlit, Flask, AWS, Azure, Product Analytics, Control-M, Cassandra,
GitHub Actions, Bitbucket, Hadoop, Hive, Java
EXPERIENCE
Data Scientist Jul 2023 - Apr 2024
Vestas Bangalore
Recruited as a Data Scientist; worked on several use cases spanning machine learning, time series forecasting, AWS cloud deployment using
SageMaker, application development using Flask and Streamlit, workflow orchestration using Databricks, and building data pipelines with StreamSets as the
data ingestion tool.
Conducted exploratory data analysis (EDA) on sensor data, manufacturing logs, and maintenance records to identify key failure patterns and maintenance triggers
using NumPy, Pandas, Matplotlib, Seaborn, and SQL; implemented business logic in Python and PySpark and stored the cleaned data in AWS S3 in Parquet
format.
Deployed the model using MLOps practices, building CI/CD pipelines integrated with GitHub Actions.
Employed five machine learning algorithms (Logistic Regression, Decision Tree, Random Forest, Bagging Classifier, and a Boosting Classifier such
as XGBoost) and performed 5-fold cross-validation to compute an average customized metric score on the training dataset.
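The model-comparison step above can be sketched as follows; the synthetic dataset, hyperparameters, and F1 scoring (standing in for the customized metric, with `GradientBoostingClassifier` standing in for XGBoost) are illustrative assumptions, not the actual setup:

```python
# Sketch: comparing five classifiers with 5-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Toy dataset standing in for the training data.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "random_forest": RandomForestClassifier(random_state=42),
    "bagging": BaggingClassifier(random_state=42),
    "boosting": GradientBoostingClassifier(random_state=42),  # XGBoost stand-in
}

# Average metric over 5 folds (scoring="f1" stands in for the custom metric).
avg_scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="f1").mean()
    for name, model in models.items()
}
for name, score in avg_scores.items():
    print(f"{name}: {score:.3f}")
```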
Developed a predictive maintenance model using historical sensor and operational data to forecast potential failures before they occur; achieved an
F1 score of 0.79 after iterative refinement, improving efficiency in component failure identification.
Genpact Bangalore
Spearheaded the development of a classification model to identify potential customers, applying data analysis, machine learning, application
development using Flask and Streamlit, Heroku cloud deployment, workflow orchestration using Databricks, and cleaning and analysis of customer data
with Snowflake, AWS SageMaker, and Python.
Conducted data preprocessing, EDA, and feature engineering on large-scale insurance datasets using NumPy, Pandas, Matplotlib, and Seaborn, covering
policy information, claim history, and external risk factors.
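A minimal pandas sketch of that preprocessing and feature-engineering step; the column names and derived features are illustrative assumptions, not the actual insurance schema:

```python
# Sketch: impute missing claim amounts, then derive per-policy features.
import numpy as np
import pandas as pd

claims = pd.DataFrame({
    "policy_id": ["P1", "P2", "P3", "P3"],
    "claim_amount": [1200.0, np.nan, 450.0, 900.0],  # NaN = missing claim value
    "claim_date": pd.to_datetime(
        ["2022-01-05", "2022-02-10", "2022-03-01", "2022-06-15"]
    ),
})

# Impute missing amounts with the median before aggregating.
claims["claim_amount"] = claims["claim_amount"].fillna(claims["claim_amount"].median())

# Per-policy features: number of claims and total claimed amount.
features = claims.groupby("policy_id").agg(
    n_claims=("claim_amount", "size"),
    total_amount=("claim_amount", "sum"),
)
print(features)
```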
Implemented the end-to-end flow: built data pipelines using StreamSets for data ingestion, applied ETL operations, stored data in Avro and
Parquet formats in staging and target tables, implemented business logic in Python and PySpark, and deployed using MLOps.
Worked on KPI monitoring, KPI analysis, and capacity planning across different technologies, and worked on an insurance management system for different
customers, increasing project deliveries by 15%.
Delivered Tableau dashboards for customer presentations and engagements across different technologies, increasing new business by 30%.
Decreased claims processing time by 40% and improved customer satisfaction.
Deloitte Bangalore
Trained and worked on different aspects of Data Science (through projects), including data analysis, machine learning, application development using
Flask and Streamlit, AWS cloud deployment, workflow orchestration using Databricks, and statistical and probabilistic analysis of a customer details
dataset.
Built StreamSets and Python competency to support automation and pipeline development, reducing manual hours by 20%.
Conducted data preprocessing and exploratory data analysis (EDA) using SQL, NumPy, Pandas, Matplotlib, and Seaborn; implemented PySpark for
distributed data processing and deployed the model with MLOps on SageMaker, building CI/CD pipelines using GitHub Actions.
Developed optimization models to streamline the supply chain and reduce costs.
Intuit Bangalore
Improved job performance from 45 minutes to 7 minutes by optimizing the batch size in the StreamSets Data Collector tool.
Built data ingestion pipelines using StreamSets, processed big data (10 GB daily), and scheduled jobs using Control-M.
Utilized Cassandra in the staging layer, Tableau for visualization, Hive for data query and analysis, Hadoop (via Cloudera) for data storage, and Python
and PySpark for data processing and ETL operations.
Handled big data processing by enabling auto-scaling and auto-termination in Databricks.
Built scalable ETL pipelines using StreamSets and scheduled jobs using Control-M.
Optimized batch jobs and resolved Tungsten issues by handling DBNULL values; provided round-the-clock support for 10+ applications.
Designed, developed, and supported data processing pipelines in Python and PySpark to process customer eligibility data.
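A hedged sketch of the DBNULL handling and eligibility processing described above, using pandas as a lightweight stand-in for the actual PySpark pipeline; the column names and eligibility rule are illustrative assumptions:

```python
# Pandas stand-in for the PySpark eligibility pipeline: DBNULL-style
# missing values are normalized before the eligibility rule is applied.
import numpy as np
import pandas as pd

customers = pd.DataFrame({
    "customer_id": ["C1", "C2", "C3"],
    "income": [52000.0, np.nan, 31000.0],  # NaN models a DBNULL from the source
    "active": [True, True, False],
})

# Treat missing income as 0 so the rule below never sees a null.
customers["income"] = customers["income"].fillna(0.0)

# Illustrative eligibility rule: active customers with income >= 40000.
customers["eligible"] = customers["active"] & (customers["income"] >= 40000)
print(customers[["customer_id", "eligible"]])
```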
Migrated from an on-premises Cloudera environment (Hadoop, Hive, and Hue for data storage and analysis) to a cloud-based Snowflake data
warehouse.
Worked in the RTB (Run the Bank) team, providing 24/5 availability support for a wealth planning application.
Experienced in production support processes such as outage/incident management, current issues, onsite/offshore coordination workflow, documentation,
shift handover, and escalation procedures.
Analyzed and debugged production issues and provided updates as quickly as possible.
Initiated conference calls/bridges for incidents based on severity and ensured issues were resolved as quickly as possible.
Engaged with multiple stakeholders across the organization and handled their support-related queries and issues.
Worked with a tech stack including Java 1.6, J2EE, UNIX, Linux, Oracle 10g, WebSphere, PuTTY, AutoSys, and ServiceNow (SNOW) on this project.
Led production calls for the development team: understood the infrastructure and production environment, assisted with site availability issues,
and engaged developers as needed to resolve problems in both the short and long term.
Specialized in production support processes such as outage/incident management; conducted daily huddle calls with onsite and offshore teams.
Automated application checkout, ensuring all applications were up and running 24/7 after weekend deployments.
Displayed custom messages in the UI using HTML5 during maintenance activities via the Log Viewer tool.
Created and exposed web APIs using Flask and performed web development using Java 6.
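The Flask API work could be sketched as a minimal example; the route name and payload are illustrative assumptions, not the actual service:

```python
# Sketch: exposing a minimal web API with Flask.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/health")
def health():
    # Simple JSON health-check endpoint (illustrative route).
    return jsonify(status="ok")

if __name__ == "__main__":
    app.run(port=5000)
```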
Integrated Spring 3.0 MVC with Struts and Hibernate using the MVC architecture.
PROJECTS
Time Series Forecasting for Wind Turbine
Led a cross-functional project to develop predictive maintenance models for critical components in wind turbines, aimed at reducing downtime and optimizing
maintenance schedules. Collaborated with engineering, operations, and IT teams to define project objectives and data requirements.
Analyzed sensor data from over 800 wind turbines to identify patterns and optimize energy output that led to a 22% improvement in overall turbine performance.
Evaluated model performance using precision, recall, and F1-score, achieving a 30% reduction in maintenance costs and a 20% improvement in turbine
uptime.
Saved $4 million annually in maintenance costs by enabling proactive maintenance and improving turbine reliability.
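The evaluation metrics named above can be computed as in this sketch; the toy labels are illustrative, not the project's data:

```python
# Sketch: evaluating a failure-prediction model with precision, recall, and F1
# (toy labels; 1 = component failure).
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

precision = precision_score(y_true, y_pred)  # of predicted failures, how many were real
recall = recall_score(y_true, y_pred)        # of real failures, how many were caught
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```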
ACHIEVEMENTS
More than 5 years of teaching experience in Data Science; AI & ML instructor at one of India's largest ed-tech platforms.
Oracle Certified Java Programmer (OCJP, Java 1.6) with a score of 88%.
Received the best employee award and sport award at HCL Technologies.
EDUCATION
Scaler 2024
BPUT 2011
BE/B.Tech/BS in Electronics 8.01 CGPA
Completed B.Tech in Electronics and Telecommunication Engineering (2007-2011) with an aggregate CGPA of 8.01.