1. Load and understand the training and test data separately, checking for missing values, bad data, and categorical variables.
2. Clean the data by treating missing values, removing bad data, and encoding categorical variables in both the training and test sets. Perform feature engineering and scaling as needed.
3. Build models on the training data and make predictions on the test data. Submit predictions by matching IDs and replacing the target column or creating a new dataframe with IDs and predictions.
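The loading and inspection steps above can be sketched as follows. This is a minimal illustration using a toy in-memory dataframe; the column names (`id`, `income`, `city`, `target`) are made up for the example, and in practice you would load the real files with `pd.read_csv`.

```python
import pandas as pd

# Toy stand-in for the real training file (in practice: pd.read_csv("train.csv"))
train = pd.DataFrame({
    "id": [1, 2, 3],
    "income": ["50000", "$62000", None],  # note the stray '$' (bad data)
    "city": ["A", "B", "A"],
    "target": [0, 1, 0],
})

# Step 1 checks: head, summary info, nulls, categoricals, bad characters
print(train.head())
print(train.info())
null_counts = train.isnull().sum()                # missing values per column
cat_cols = train.select_dtypes("object").columns  # candidate categorical columns
bad_rows = train["income"].str.contains(r"[$#]", na=False)  # unwanted characters
```

The same checks would be repeated on the test set, since the two files are loaded and inspected separately.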
1. Load the train & test datasets separately.
2. Understand the data (check each of the following in both the train & test datasets):
   a. Check the head and tail of the data.
   b. Use the info() and describe() functions for more information.
   c. Look for the presence of null values in the dataset.
   d. Look for the presence of bad data or unwanted characters like '$' or '#' in the numerical columns.
3. Clean the data:
   a. Treat missing values in both the train & test sets.
   b. Remove bad data values in both the train & test sets.
   c. Encode the categorical (object) variables in both the train & test sets.
   d. Perform feature engineering if necessary.
   e. Scale/normalize the dataset if necessary.
4. Perform model building using any algorithm you feel will suit this problem:
   a. Fit the model on the training data.
   b. Make predictions on the test data.
   c. Store the predicted values in an array.
5. Submission (Approach 1):
   a. Import the sample submission file.
   b. Check that your test data and the sample submission file have the same ID column sequence (if not, sort them so that each ID's predicted value is placed against the corresponding ID in the submission file).
   c. Replace the "target" column in the submission file with your array of predictions.
   d. Export this file to a CSV; this is your personal submission file.
   e. Make sure it has the same headers as the sample submission file.
   f. Upload it to the platform and check your score.
6. Submission (Approach 2):
   a. Take the ID column from your original test dataset.
   b. Create a new dataframe with the IDs and the corresponding predicted values.
   c. Export that dataframe to a CSV; this is your personal submission file.
   d. The number of rows (including headers) and columns should match the sample submission file, otherwise the platform will not accept the submission.
   e. Upload it to the platform and check your score.
7. Now go back to step 3 (this is an iterative process):
   a. Check whether scaling the data helps performance.
   b. Check whether additional feature engineering helps performance.
   c. Try removing unnecessary variables (use feature importance) and check whether that helps performance.
   d. Experiment with different algorithm choices and check whether that improves final model performance.
   e. Try GridSearch, RandomSearch, or other model tuning techniques to see if they improve the model's final performance.
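The end-to-end workflow above can be sketched in a few lines. This is a hedged sketch on synthetic data, not the actual competition pipeline: the column names, the random-forest choice, and the file name "submission.csv" are all assumptions made for illustration. It follows Submission Approach 2 (building a fresh dataframe from the test IDs and predictions).

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 100

# Toy stand-ins for the competition's train/test files (column names are made up)
train = pd.DataFrame({
    "id": np.arange(1, n + 1),
    "income": rng.normal(50_000, 10_000, n),
    "city": rng.choice(["A", "B", "C"], n),
    "target": rng.integers(0, 2, n),
})
test = pd.DataFrame({
    "id": np.arange(n + 1, 2 * n + 1),
    "income": rng.normal(50_000, 10_000, n),
    "city": rng.choice(["A", "B", "C"], n),
})

# Step 3a: fill missing numeric values with the *training* median in both sets
med = train["income"].median()
for df in (train, test):
    df["income"] = df["income"].fillna(med)

# Step 3c: encode the categorical column identically in both sets
mapping = {"A": 0, "B": 1, "C": 2}
for df in (train, test):
    df["city"] = df["city"].map(mapping)

features = ["income", "city"]

# Step 4: fit on the training data, predict on the test data
model = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
model.fit(train[features], train["target"])
preds = model.predict(test[features])

# Submission (Approach 2): new dataframe with the IDs and predictions
submission = pd.DataFrame({"id": test["id"], "target": preds})
submission.to_csv("submission.csv", index=False)
```

Note that the median used to fill missing values and the category mapping are both derived once and applied to train and test alike, which is exactly the "same preprocessing on both sets" rule from the pointers below.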
A few pointers to keep in mind:
1. Do not drop null values from the test dataset.
2. Any preprocessing step performed on the training dataset must also be performed on the test dataset.
3. Set the hyperparameter n_jobs=-1 while fitting the model, so that parallel processing takes place and decreases the time taken to fit the model. However, this may take up all your local computational resources, and your PC may slow down for other running tasks.
4. It is recommended to make copies of the datasets at every checkpoint, so you don't have to restart from scratch. In case of any issues, this lets you pick up the dataset from the latest checkpoint and resume from there.
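The checkpointing pointer can be illustrated with a toy dataframe; the names below are made up for the example:

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, None], "y": ["a", "b", "c"]})

# Snapshot the dataframe at a checkpoint before a risky transformation
checkpoint = df.copy()  # deep copy by default; later edits to df won't touch it

# ...a transformation you might want to undo...
df["x"] = df["x"].fillna(df["x"].mean())

# If something goes wrong, resume from the snapshot instead of reloading from disk
df = checkpoint.copy()
```

Plain assignment (`checkpoint = df`) would not work here, since both names would point at the same underlying data; `.copy()` is what makes the checkpoint independent.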