
CS253 Python Assignment Report

Kawaljeet Singh (220514)


April 13, 2024

1. Methodology
1.1 Data Preprocessing Steps:
1. Reading data: The code reads the training and test data from CSV files using Pandas.

2. Converting amount strings to numeric: A custom function convert_to_numeric is defined to convert amount strings (e.g., "10 Crore+") to numeric values.

3. Handling missing values: Rows with missing values are dropped from both the training and test datasets using dropna.

4. Target and features definition: The target variable is defined as 'Education', while the independent variables are extracted from the dataset, excluding the 'ID' and 'Candidate' columns.
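
A minimal sketch of these preprocessing steps is given below. The file names, the amount column names ('Total Assets', 'Liabilities'), and the unit multipliers are assumptions for illustration, not taken from the submitted code.

    import pandas as pd

    def convert_to_numeric(amount):
        # Convert strings such as "10 Crore+" or "5 Lac+" to plain numbers.
        # The unit names and multipliers here are assumptions about the data.
        amount = str(amount).replace("+", "").strip()
        if "Crore" in amount:
            return float(amount.replace("Crore", "")) * 1e7
        if "Lac" in amount:
            return float(amount.replace("Lac", "")) * 1e5
        try:
            return float(amount)
        except ValueError:
            return None

    # Read the training and test data (file names are assumptions).
    train = pd.read_csv("train.csv")
    test = pd.read_csv("test.csv")

    # Convert the amount columns (column names are assumptions).
    for col in ["Total Assets", "Liabilities"]:
        train[col] = train[col].apply(convert_to_numeric)
        test[col] = test[col].apply(convert_to_numeric)

    # Drop rows with missing values from both datasets.
    train = train.dropna()
    test = test.dropna()

    # Define the target and the features, excluding 'ID' and 'Candidate'.
    y = train["Education"]
    X = train.drop(columns=["Education", "ID", "Candidate"])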

1.2 Feature Engineering:


No explicit feature engineering is performed in the code. However, the categorical features are encoded using one-hot encoding, and the 'Education' target is encoded using label encoding.
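
A sketch of how this encoding could be set up with scikit-learn, continuing from the variables X and y defined in the preprocessing sketch above (the use of ColumnTransformer is an assumption; the report only states that OneHotEncoder and label encoding are used):

    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import LabelEncoder, OneHotEncoder

    # Label-encode the 'Education' target.
    label_encoder = LabelEncoder()
    y_encoded = label_encoder.fit_transform(y)

    # One-hot encode the categorical columns; numeric columns pass through unchanged.
    categorical_cols = X.select_dtypes(include="object").columns.tolist()
    preprocessor = ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols)],
        remainder="passthrough",
    )
    X_encoded = preprocessor.fit_transform(X)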

1.3 Identifying Outliers:


There is no explicit step for identifying outliers in the code. Outlier detection
and treatment could be a valuable addition to improve model robustness.
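
One possible addition, not present in the submitted code, is a simple IQR-based filter on the numeric columns; a minimal sketch, assuming a hypothetical 'Total Assets' column:

    # Keep only rows within k * IQR of the quartiles (illustrative only).
    def remove_outliers_iqr(df, column, k=1.5):
        q1, q3 = df[column].quantile([0.25, 0.75])
        iqr = q3 - q1
        return df[df[column].between(q1 - k * iqr, q3 + k * iqr)]

    # Example usage on the training data (column name is an assumption).
    train = remove_outliers_iqr(train, "Total Assets")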

1.4 Dimensionality Reduction Techniques:


There is no dimensionality reduction technique applied in the code. Techniques
like PCA (Principal Component Analysis) could be considered to reduce the
dimensionality of the dataset, especially if there are many features.
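
As a sketch of how PCA could be applied to the encoded features from Section 1.2 (this is not in the submitted code, and the 95% variance threshold is an arbitrary illustrative choice):

    from scipy import sparse
    from sklearn.decomposition import PCA

    # One-hot encoded output may be sparse; PCA needs a dense array.
    X_dense = X_encoded.toarray() if sparse.issparse(X_encoded) else X_encoded

    # Keep enough components to explain roughly 95% of the variance.
    pca = PCA(n_components=0.95)
    X_reduced = pca.fit_transform(X_dense)
    print(f"Reduced from {X_dense.shape[1]} to {X_reduced.shape[1]} features")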

1.5 Normalization, Standardization, or Transformation:
No normalization or standardization is applied explicitly. However, one-hot
encoding is performed for categorical variables using OneHotEncoder.
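
If standardization were added, the numeric columns could be scaled alongside the one-hot encoding in a single ColumnTransformer; a sketch of this alternative preprocessing (not the submitted pipeline):

    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    numeric_cols = X.select_dtypes(exclude="object").columns.tolist()
    categorical_cols = X.select_dtypes(include="object").columns.tolist()

    # Scale numeric features and one-hot encode categorical features together.
    preprocessor = ColumnTransformer([
        ("scale", StandardScaler(), numeric_cols),
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])
    X_encoded = preprocessor.fit_transform(X)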

1.6 Other Considerations:


• Model Selection: The code employs a Support Vector Machine (SVM)
classifier.
• Cross-Validation: Stratified K-Fold cross-validation with 5 splits is used
for model evaluation.
• Grid Search: Grid search is conducted over a parameter grid to find the
best hyperparameters for the SVM model.
• Evaluation Metric: The F1 score with micro averaging is used as the
evaluation metric for the grid search.

• Model Deployment: Finally, the best model selected from grid search is
used to make predictions on the test data, and the predictions are saved
to a CSV file for submission.
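
Putting these pieces together, the training and submission stage described above could look roughly like the sketch below, continuing from the preprocessing and encoding sketches in Sections 1.1 and 1.2 (the parameter grid values are taken from Table 1; variable names and the output file name are assumptions):

    import pandas as pd
    from sklearn.model_selection import GridSearchCV, StratifiedKFold
    from sklearn.svm import SVC

    # Hyperparameter grid (values as listed in Table 1).
    param_grid = {
        "C": [0.1, 1, 10],
        "kernel": ["linear", "poly", "rbf", "sigmoid"],
    }

    # Stratified 5-fold cross-validation with micro-averaged F1 as the metric.
    cv = StratifiedKFold(n_splits=5)
    grid = GridSearchCV(SVC(), param_grid, scoring="f1_micro", cv=cv)
    grid.fit(X_encoded, y_encoded)

    # Predict on the identically preprocessed test features and save a submission file.
    X_test_encoded = preprocessor.transform(test.drop(columns=["ID", "Candidate"]))
    predictions = label_encoder.inverse_transform(grid.best_estimator_.predict(X_test_encoded))
    pd.DataFrame({"ID": test["ID"], "Education": predictions}).to_csv("submission.csv", index=False)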

2. Experiment Details

Table 1: Summary of Models Used

Model: Support Vector Machine (SVM)
Reason for Use: SVMs are suitable for classification tasks and can handle high-dimensional data well.
Hyperparameters: C: 0.1, 1, 10; Kernel: linear, poly, rbf, sigmoid
Reason for Hyperparameter Tuning: Hyperparameters like C and the kernel type significantly impact SVM model performance. Tuning these parameters helps optimize the model's predictive capability.
Cross-Validation Strategy: Stratified K-Fold (5 splits)
Reason for Cross-Validation: Cross-validation helps in estimating the model's performance on unseen data and prevents overfitting.
Evaluation Metric: F1 Score (micro)
Reason for Metric Choice: The F1 score provides a balance between precision and recall, making it suitable for imbalanced datasets and multi-class classification tasks.

2.1 Data Insights

3. Results
Public F1 score is 0.24841.
Private F1 score is 0.24647.
Public Leaderboard Rank: 69.
Private Leaderboard Rank: 65.

4. References
Pandas
Intro to Machine Learning
Model Selection and Boosting - Scikit-learn
Classification using SVC
Category Encoders Examples
Pipelines
Seaborn
Matplotlib
OpenAI

5. GitHub Repository
CS253 ML Assignment
