
VISVESVARAYA TECHNOLOGICAL UNIVERSITY,

BELAGAVI – 590 018

Artificial Intelligence and Machine Learning (18CS71)


A MINI PROJECT REPORT ON

“PERSONALITY PREDICTION SYSTEM”


Submitted as subject assignment work,

BY

TUSHIT SHUKLA 4AL18CS093

SUDARSHAN SHETTY 4AL18CS098

Under the Guidance of

Ms. Shilpa
Assistant Professor

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


ALVA’S INSTITUTE OF ENGINEERING AND TECHNOLOGY
MOODBIDRI-574225, KARNATAKA

2021 – 2022
ALVA’S INSTITUTE OF ENGINEERING AND TECHNOLOGY MIJAR,
MOODBIDRI D.K. -574225 KARNATAKA

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

CERTIFICATE

This is to certify that the assignment work for the subject “Artificial Intelligence and
Machine Learning (18CS71)” has been successfully completed and the report submitted by
TUSHIT SHUKLA (4AL18CS093) and SUDARSHAN SHETTY (4AL18CS098) during the academic
year 2021–2022. It is certified that all corrections/suggestions indicated during the
presentation session have been incorporated in the report, which scored ______ marks out
of 10 and has been deposited in the departmental library.

Ms. Shilpa

Assistant Professor

ACKNOWLEDGEMENT
The satisfaction and euphoria that accompany the successful completion of any task would be
incomplete without mentioning the people who made it possible. Success is the epitome of hard work
and perseverance, but most steadfast of all is encouraging guidance.

So, with gratitude, we acknowledge all those whose guidance and encouragement served as a
beacon of light and crowned our effort with success.

We thank our subject faculty, Ms. Shilpa, Assistant Professor, Department of Computer
Science & Engineering, who has been our source of inspiration. She has been especially enthusiastic
in giving her valuable guidance and critical reviews.

We sincerely thank Dr. Manjunath Kotari, Professor and Head, Department of Computer
Science & Engineering, who has been the constant driving force behind the completion of the group
task.

We thank our beloved Principal, Dr. Peter Fernandes, for his constant help and support
throughout.

We are indebted to the Management of Alva’s Institute of Engineering and Technology,
Mijar, Moodbidri, for providing an environment that helped us complete our group task in
Artificial Intelligence and Machine Learning.

Also, we thank all the teaching and non-teaching staff of the Department of Computer Science &
Engineering for the help rendered.

TUSHIT SHUKLA 4AL18CS093

SUDARSHAN SHETTY 4AL18CS098

ABSTRACT
Machine learning (ML) is one of the intelligent methodologies that have shown promising results in
classification and prediction. One expanding area that requires good predictive accuracy is sport
prediction, owing to the large sums of money involved in betting. In addition, club managers and owners
seek classification models that help them understand the game and formulate the strategies needed to win
matches. These models are based on the numerous factors involved in a game, such as the results of
historical matches, player performance indicators and information about the opposition. This report
provides a critical analysis of the ML literature, focusing on the application of Naïve Bayes to sport
result prediction. In doing so, we identify the learning methodologies used, the data sources, appropriate
means of model evaluation and the specific challenges of predicting sport results. This leads us to
propose a novel sport prediction framework in which ML is used as the learning strategy. We hope our
work will be informative and useful to those performing future research in this application area.
TABLE OF CONTENTS

CHAPTER NO.   DESCRIPTION                                   PAGE NO.

              ACKNOWLEDGEMENT                               i
              ABSTRACT                                      ii
              TABLE OF CONTENTS                             iii
              LIST OF FIGURES                               iv
              LIST OF TABLES                                v

1.            INTRODUCTION
              1.1 INTRODUCTION ABOUT THE TOPIC              1
              1.2 PROBLEM STATEMENT                         2
              1.3 OBJECTIVE                                 2

2.            SYSTEM REQUIREMENT SPECIFICATION
              2.1 HARDWARE REQUIREMENT                      3
              2.2 SOFTWARE SPECIFICATIONS                   4

3.            SYSTEM DESIGN
              3.1 DATA FLOW DIAGRAM                         5
              3.2 USE CASE DIAGRAM                          6

4.            IMPLEMENTATION
              4.1 IMPLEMENTATION                            7-14

5.            TESTING
              5.1 UNIT TESTING                              15-17
              5.2 TESTING OF OUR MODEL                      17-18

6.            RESULTS                                       19-21

7.            CONCLUSION                                    22

              REFERENCES                                    23

LIST OF FIGURES

Figure No.   Description                               Page No.

Fig 3.1      Data Flow Diagram                         5
Fig 3.2      Use Case Diagram                          6
Fig 3.3      Working Diagram                           6
Fig 6.1      Final Predicted Output                    19
Fig 6.2      Accuracy of the Training and Test Data    20

SPORTS WINNING PREDICTION

CHAPTER 01

INTRODUCTION
Cricket is a well-known sport, and its popularity and viewership have increased tremendously in the
past two decades. To cater to potential future growth, the International Cricket Council (ICC)
commissioned global market research, which revealed that cricket has more than one billion fans
worldwide, with the potential for significant growth. Among all formats of cricket, T20 Internationals
(T20Is) are the most popular. Cricket fans are eager to follow upcoming events and tournaments and want
to learn about the prospects of their favorite teams.

1.1 INTRODUCTION ABOUT THE TOPIC


Our study aims to predict the winner of the 7th edition of the T20 Cricket World Cup. For this purpose,
the dataset was extracted from ESPN Cricinfo. After the dataset was collected, various techniques were
used to check its integrity and cleanliness. Afterwards, different machine learning algorithms belonging
to the decision tree family, namely decision trees (ID3, C4.5 and Extra Trees) and the Random Forest
classifier, were used to build predictive models. After testing these classifiers on the extracted
dataset(s), we found that Random Forest gave the best results, with a custom accuracy of 80.86%. As a
result of this work, Australia is predicted to be the winner of the 2020 T20 World Cup. The data
comprised four datasets: fixtures of the ICC T20 World Cup, results of previous matches, current ICC T20
rankings and previous appearances in T20 World Cups. This model can also be applied to predict the
winner of other cricket events, or even of other sports of a similar nature.
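The classifier comparison described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' code: synthetic data from `make_classification` stands in for the four ESPN Cricinfo datasets, and because scikit-learn implements an optimised CART rather than ID3 or C4.5, `DecisionTreeClassifier(criterion="entropy")` is used only as an approximation of their information-gain splits. The fold count, tree counts and seeds are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cricket match features (assumption: the real
# features would come from the fixtures, results, rankings and appearances data).
X, y = make_classification(n_samples=400, n_features=10, n_informative=5, random_state=0)

# Candidate models from the decision-tree family. scikit-learn's tree is CART-based;
# criterion="entropy" only approximates the information-gain splits of ID3/C4.5.
models = {
    "Decision tree (entropy)": DecisionTreeClassifier(criterion="entropy", random_state=0),
    "Extra Trees": ExtraTreesClassifier(n_estimators=200, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

def compare_models(models, X, y, folds=5):
    """Return the mean cross-validated accuracy of each model."""
    return {name: cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()
            for name, model in models.items()}

scores = compare_models(models, X, y)
for name, acc in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {acc:.3f}")

best = max(scores, key=scores.get)
print("Best model:", best)
```

Cross-validation scores each model on several held-out folds rather than a single split. The "custom accuracy" metric reported above is not reproduced here because its definition is not given.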

Department of Computer Science and Engineering Page 1



1.2 PROBLEM STATEMENT

The problem is to build a Cricket Outcome Predictor for ODI cricket matches. Modern classification
techniques such as Naïve Bayes, Support Vector Machine and Random Forest are used to predict the results,
and a comparative study is conducted based on these outcomes. Several factors affect the outcome of ODI
cricket matches, including the home ground, the toss, the first and second innings, the condition of the
pitch and team strategies. However, these factors and strategies vary as the game proceeds. In terms of
the accuracy of the models used to predict ODI match outcomes, the Naïve Bayes classifier was found to be
the best when the predictors were independent, and it also performed well when the dataset was
imbalanced.
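The Naïve Bayes idea mentioned above, scoring each outcome as a prior multiplied by independent per-factor likelihoods, can be illustrated with a minimal pure-Python sketch. The factor names (`home`, `toss`, `innings`) and the six training rows are hypothetical and made up for illustration; Laplace smoothing (`alpha`) is an added assumption so that a factor value never seen with a class does not zero out that class.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: match factors -> result. Made up for illustration only.
TRAIN = [
    ({"home": "yes", "toss": "won", "innings": "first"}, "win"),
    ({"home": "yes", "toss": "lost", "innings": "second"}, "win"),
    ({"home": "no", "toss": "won", "innings": "first"}, "loss"),
    ({"home": "no", "toss": "lost", "innings": "first"}, "loss"),
    ({"home": "yes", "toss": "won", "innings": "second"}, "win"),
    ({"home": "no", "toss": "won", "innings": "second"}, "win"),
]

def fit_naive_bayes(rows):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)   # (label, feature) -> Counter of values
    feature_values = defaultdict(set)     # feature -> set of observed values
    for features, label in rows:
        for name, value in features.items():
            value_counts[(label, name)][value] += 1
            feature_values[name].add(value)
    return class_counts, value_counts, feature_values

def predict_proba(model, features, alpha=1.0):
    """P(class | features) assuming independent features, with Laplace smoothing."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for name, value in features.items():
            k = len(feature_values[name])  # number of distinct values of this feature
            score *= (value_counts[(label, name)][value] + alpha) / (count + alpha * k)
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

model = fit_naive_bayes(TRAIN)
probs = predict_proba(model, {"home": "yes", "toss": "won", "innings": "first"})
print(max(probs, key=probs.get), probs)
```

With these rows, a home team that won the toss and batted first gets a posterior win probability of 256/337 (about 0.76).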

1.3 OBJECTIVE

The main objective of sports prediction is to improve team performance and enhance the chances of
winning the game. The value of a win takes many forms and trickles down to the fans filling the
stadium seats, television contracts, fan-store merchandise, parking, concessions, sponsorships,
enrollment and retention.
We followed the general machine learning workflow step by step:
1. Data cleaning and formatting.
2. Exploratory data analysis.
3. Feature engineering and selection.
4. Compare several machine learning models on a performance metric.
5. Perform hyper-parameter tuning on the best model.
6. Evaluate the best model on the testing set.
7. Interpret the model results.
8. Draw conclusions and document work.
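Steps 4 to 6 of this workflow can be sketched as follows, assuming scikit-learn and a Random Forest as the "best model" (consistent with Section 1.1). The synthetic data, the parameter grid and the 3-fold cross-validation are illustrative assumptions, not the project's settings. The key design point is that the test set is split off before tuning and is used only once, for the final evaluation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data stands in for the cleaned, feature-engineered match table (assumption).
X, y = make_classification(n_samples=300, n_features=8, random_state=42)

# Hold out a test set first so it is never seen during tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Step 5: tune hyper-parameters with cross-validation on the training set only.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid,
                      cv=3, scoring="accuracy")
search.fit(X_train, y_train)

# Step 6: evaluate the refitted best model once on the untouched test set.
test_accuracy = accuracy_score(y_test, search.best_estimator_.predict(X_test))
print("Best parameters:", search.best_params_)
print(f"Test accuracy: {test_accuracy:.3f}")
```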


CHAPTER 02

SYSTEM REQUIREMENT SPECIFICATION


A System Requirements Specification is a document or set of documents that describes the
features and behavior of a system or software application. It includes a variety of elements
that attempt to define the functionality the customer requires to satisfy their different
users.
2.1 HARDWARE REQUIREMENT
• 400 MB of hard disk space.
• 4 GB/8 GB RAM.
• Intel Core i3 or higher processor with a 4-core CPU.
• Windows 10 operating system.
• Active internet connection; a scanner is optional.

2.2 SOFTWARE SPECIFICATIONS


• Windows (x64) operating system
• OpenCV, TensorFlow, Keras, NumPy
• Required datasets

2.2.1 TENSORFLOW AND OPENCV

TensorFlow is a free and open-source software library for dataflow and differentiable
programming across a range of tasks. It is a symbolic math library and is also used for machine
learning applications such as neural networks. It is used for both research and production at
Google. Applications for which TensorFlow is the foundation include automated image-captioning
software and DeepDream.
OpenCV (Open Source Computer Vision Library) is a library of programming functions mainly
aimed at real-time computer vision. Originally developed by Intel, it was later supported by
Willow Garage and then Itseez (which was later acquired by Intel). The library is cross-platform and
free for use under the open-source Apache 2 License. Since 2011, OpenCV has featured GPU
acceleration for real-time operations. The first alpha version of OpenCV was released to the
public at the IEEE Conference on Computer Vision and Pattern Recognition in 2000, and five betas
were released between 2001 and 2005. The second major release of OpenCV was in October 2009.
OpenCV 2 includes major changes to the C++ interface, aiming at easier, more type-safe patterns,
new functions and better implementations of existing ones in terms of performance (especially on
multi-core systems).


CHAPTER 03
SYSTEM DESIGN
This chapter discusses the system design through the data flow diagram and the use case
diagram, which together explain the overall structure of the system.

3.1 DATA FLOW DIAGRAM

Fig 3.1 Data Flow Diagram

The flowchart of the learning performance prediction system is illustrated in Fig. 3.1. Data with
known target values is taken from the training set, its features are represented, and it is sent to
the learning algorithm, which produces the learned model. Unseen data from the prediction set is
then taken, its features are represented in the same way, and the learned model gives the predicted
target value. A specific output is given and the personality is predicted.
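The flow in Fig. 3.1 (feature representation, learning algorithm, learned model, prediction on unseen data) maps naturally onto a scikit-learn pipeline. This is a sketch under assumptions: one-hot encoding is assumed as the feature representation, logistic regression as the learning algorithm, and the team rows and the label encoding (1 = Team_1 won, 2 = Team_2 won) are hypothetical. Chaining the steps guarantees that unseen data passes through exactly the same feature representation as the training data.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Training set with known target values (hypothetical rows and label encoding).
train = pd.DataFrame({
    "Team_1": ["India", "Australia", "India", "England", "Australia", "England"],
    "Team_2": ["Pakistan", "England", "Australia", "Pakistan", "India", "India"],
    "Winner": [1, 1, 2, 1, 2, 2],
})

# Feature representation + learning algorithm chained into one learned model.
model = make_pipeline(OneHotEncoder(handle_unknown="ignore"), LogisticRegression())
model.fit(train[["Team_1", "Team_2"]], train["Winner"])

# Unseen data from the prediction set -> predicted target values.
unseen = pd.DataFrame({"Team_1": ["India", "Australia"], "Team_2": ["England", "Pakistan"]})
predicted = model.predict(unseen)
print(predicted)
```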


3.2 USE CASE DIAGRAM

Fig 3.2 USE CASE Diagram (actor: USER; elements: Personality Types, Response to Questions, Normalized Scores, Displays Graph As Result)

Fig 3.3 Working Diagram

The use case diagram explains the overall structure of the system. A use case diagram
summarizes the system's users and their interactions with the system, using a set of specialized
symbols and connectors. In this use case diagram, the personality-traits model captures the
relationship between personality and academic behavior. This model was defined by several
independent sets of researchers who used factor analysis of verbal descriptors of human behavior.


CHAPTER 04

IMPLEMENTATION
This chapter describes the implementation of the code and the main functions of the
system.

4.1 Implementation
We started by importing all the libraries and dependencies.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.ticker as ticker
import matplotlib.ticker as plticker
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
# Assumed source of the 'logreg' model passed to clean_and_predict
from sklearn.linear_model import LogisticRegression

Load the CSV file


Display the head of the data file
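The code for loading the file and displaying its head is not reproduced in the report, so the following is only a sketch. `Team_1`, `Team_2` and `Winner` are the column names used later by `clean_and_predict`; `date`, `Margin` and `Ground` are assumed columns, and the rows are made up. An in-memory string replaces the real CSV so that the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical stand-in for the results CSV (made-up rows; some columns assumed).
CSV_TEXT = """date,Team_1,Team_2,Winner,Margin,Ground
2009-01-10,India,Pakistan,India,5 wickets,Venue A
2011-02-20,Australia,England,England,20 runs,Venue B
2014-03-30,India,Sri Lanka,Sri Lanka,6 wickets,Venue C
2016-04-15,Pakistan,India,India,4 wickets,Venue D
"""

def load_results(source):
    """Read the match-results CSV from a file path or a file-like object."""
    return pd.read_csv(source)

# With the real data this would be load_results("results.csv") (hypothetical path).
results = load_results(io.StringIO(CSV_TEXT))
print(results.head())  # head() shows up to the first five rows
```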

Filter the matches played by India


Create a column for the matches played in 2010

Display the results of the newly created dataframe
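The filtering and year-column steps can be sketched in pandas as below. The column names and rows are hypothetical, and the sketch assumes that "matches played in 2010" means matches from 2010 onwards, since the report does not show the exact condition.

```python
import pandas as pd

# Hypothetical results table (made-up rows; column names assumed).
results = pd.DataFrame({
    "date": ["2009-01-10", "2011-02-20", "2012-03-15", "2014-03-30"],
    "Team_1": ["India", "India", "Australia", "Sri Lanka"],
    "Team_2": ["Pakistan", "England", "Pakistan", "India"],
    "Winner": ["India", "England", "Pakistan", "Sri Lanka"],
})

# Keep only matches in which India played on either side (.copy() avoids
# modifying a view of the original frame).
india = results[(results["Team_1"] == "India") | (results["Team_2"] == "India")].copy()

# Derive the match year from the date and keep matches from 2010 onwards (assumption).
india["match_year"] = pd.to_datetime(india["date"]).dt.year
india_2010 = india[india["match_year"] >= 2010]
print(india_2010)
```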


Delete the columns that won't affect match results

Convert Team_1 and Team_2 from categorical variables to numerical inputs
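A sketch of these two steps, using the same `pd.get_dummies` call that appears in `clean_and_predict`. Which columns were dropped is not shown in the report, so `date`, `Margin`, `Ground` and `match_year` are assumptions, as is the label encoding (1 if Team_1 won, 2 if Team_2 won). The rows are made up.

```python
import pandas as pd

# Hypothetical output of the previous filtering step (made-up rows).
india_2010 = pd.DataFrame({
    "date": ["2011-02-20", "2014-03-30", "2016-04-15"],
    "Team_1": ["India", "Sri Lanka", "India"],
    "Team_2": ["England", "India", "Australia"],
    "Winner": ["England", "Sri Lanka", "India"],
    "Margin": ["20 runs", "6 wickets", "5 wickets"],
    "Ground": ["Venue B", "Venue C", "Venue E"],
    "match_year": [2011, 2014, 2016],
})

# Delete columns assumed not to affect the match result.
df = india_2010.drop(columns=["date", "Margin", "Ground", "match_year"])

# Encode the label numerically: 1 if Team_1 won, 2 if Team_2 won (assumed encoding).
df["Winner"] = (df["Winner"] == df["Team_2"]).astype(int) + 1

# One-hot encode the team names so the model receives numeric inputs.
final = pd.get_dummies(df, prefix=["Team_1", "Team_2"],
                       columns=["Team_1", "Team_2"], dtype=int)
print(final)
```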


Adding the ICC rankings

Loop to add teams to new prediction dataset based on the ranking position of each team
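The ranking-based ordering performed by the loop in `clean_and_predict` (the better-ranked team, i.e. the lower `Position`, becomes `Team_1`) can also be written with a Series lookup instead of two parallel index counters. This is an illustrative alternative, not the report's code; the ranking positions and fixtures are made up.

```python
import pandas as pd

# Hypothetical ICC ranking table with the Team/Position columns used by clean_and_predict.
ranking = pd.DataFrame({"Team": ["India", "England", "Australia", "Pakistan"],
                        "Position": [1, 2, 3, 4]})

# Hypothetical fixtures as (team, team) tuples.
fixtures = [("Pakistan", "India"), ("England", "Australia"), ("Australia", "India")]

def order_by_ranking(fixtures, ranking):
    """Put the better-ranked team (lower Position) in Team_1 for each fixture."""
    position = ranking.set_index("Team")["Position"]
    rows = []
    for a, b in fixtures:
        first, second = (a, b) if position[a] < position[b] else (b, a)
        rows.append({"Team_1": first, "Team_2": second})
    return pd.DataFrame(rows)

pred_set = order_by_ranking(fixtures, ranking)
print(pred_set)
```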


Create dummy variables and drop the winning-team column

Get the results of league matches


Function Code
def clean_and_predict(matches, ranking, final, logreg):

    # Initialization of auxiliary list for data cleaning
    positions = []

    # Loop to retrieve each team's position according to ICC ranking
    for match in matches:
        positions.append(ranking.loc[ranking['Team'] == match[0], 'Position'].iloc[0])
        positions.append(ranking.loc[ranking['Team'] == match[1], 'Position'].iloc[0])

    # Creating the DataFrame for prediction
    pred_set = []

    # Initializing iterators for while loop
    i = 0
    j = 0

    # 'i' will be the iterator for the 'positions' list, and 'j' for the list of matches (list of tuples)
    while i < len(positions):
        dict1 = {}

        # If position of first team is better (lower), this team will be 'Team_1', and vice-versa
        if positions[i] < positions[i + 1]:
            dict1.update({'Team_1': matches[j][0], 'Team_2': matches[j][1]})
        else:
            dict1.update({'Team_1': matches[j][1], 'Team_2': matches[j][0]})

        # Append updated dictionary to the list, that will later be converted into a DataFrame
        pred_set.append(dict1)
        i += 2
        j += 1

    # Convert list into DataFrame
    pred_set = pd.DataFrame(pred_set)
    backup_pred_set = pred_set

    # Get dummy variables
    pred_set = pd.get_dummies(pred_set, prefix=['Team_1', 'Team_2'], columns=['Team_1', 'Team_2'])

    # Add missing columns compared to the model's training dataset
    missing_cols2 = set(final.columns) - set(pred_set.columns)
    for c in missing_cols2:
        pred_set[c] = 0
    pred_set = pred_set[final.columns]

    # Drop the winning-team column (the label)
    pred_set = pred_set.drop(['Winner'], axis=1)

    # Predict!
    predictions = logreg.predict(pred_set)
    for i in range(len(pred_set)):
        print(backup_pred_set.iloc[i, 1] + " and " + backup_pred_set.iloc[i, 0])
        if predictions[i] == 1:
            print("Winner: " + backup_pred_set.iloc[i, 1])
        else:
            print("Winner: " + backup_pred_set.iloc[i, 0])
        print("")
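In `clean_and_predict`, the loop over `missing_cols2` followed by `pred_set[final.columns]` aligns the prediction set with the columns the model was trained on. A shorter equivalent idiom is `DataFrame.reindex` with `fill_value=0`, which adds missing columns as zeros, drops columns the model never saw and restores the training column order in one call. The column names below are hypothetical; this is an alternative sketch, not the report's code.

```python
import pandas as pd

# Hypothetical training matrix columns, as produced by get_dummies on the training data.
final = pd.DataFrame(columns=["Winner", "Team_1_England", "Team_1_India",
                              "Team_2_Australia", "Team_2_India", "Team_2_Pakistan"])

# A prediction set that contains only some of the teams seen in training.
pred_set = pd.DataFrame({"Team_1": ["India"], "Team_2": ["Pakistan"]})
dummies = pd.get_dummies(pred_set, prefix=["Team_1", "Team_2"],
                         columns=["Team_1", "Team_2"], dtype=int)

# Align with the training features (everything except the Winner label).
feature_cols = final.columns.drop("Winner")
X_pred = dummies.reindex(columns=feature_cols, fill_value=0)
print(X_pred)
```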


CHAPTER 05

TESTING
This chapter discusses testing the prediction of a person's personality. Testing is a
quality-assurance mechanism for catching residual errors; test techniques include, but are not
limited to, executing a program or application with the intent of finding software bugs. The
personality of a person is tested and predicted from their responses to a few questions.

5.1 UNIT TESTING


MODULE DIVISION

Module division is the process of dividing the collection of source files required in the project
into discrete units of functionality. Each module can be independently built, tested and
debugged. The modules into which our project is divided are listed below.
1. Data Collection
2. Attribute Selection
3. Pre-processing of data
4. Prediction of personality

Data Collection
The first step of the prediction system is data collection and deciding on the training and testing
datasets. In this project, we imported the dataset from the Kaggle website, which is split into 70%
training data and 30% testing data. Data collection is the procedure of collecting, measuring and
analyzing accurate insights for research using standard validated techniques. A researcher can
evaluate their hypothesis on the basis of the collected data. In most cases, data collection is the
primary and most important step of research, irrespective of the field. The approach to data
collection differs between fields of study, depending on the information required.
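The 70%/30% split described above can be reproduced with `train_test_split`, which the implementation already imports. The 100-row random dataset is a hypothetical placeholder; `random_state` makes the split reproducible, and `stratify=y` (an added choice, not stated in the report) keeps the label balance similar in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical dataset: 100 samples, 5 features scored 1-10, binary label.
rng = np.random.default_rng(0)
X = rng.integers(1, 11, size=(100, 5))
y = rng.integers(0, 2, size=100)

# 70% training / 30% testing split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y)
print(len(X_train), len(X_test))
```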


Training Dataset:

In a dataset, a training set is used to build a model, while a test (or validation) set is used
to validate the model built. Here, the complete training dataset is available, so features can be
extracted and a model trained to fit it.

Testing Dataset:

Once the model is obtained from the training set, it can be used to make predictions on the test set.
Some data may be used in a confirmatory way, typically to verify that a given set of inputs to a
given function produces an expected result. Other data may be used to challenge the ability of
the program to respond to unusual, extreme, exceptional or unexpected input.

Attribute Selection

Attributes are the properties of the dataset that the system uses. For personality, the attributes
include the gender and age of the person and the Big Five traits: Openness, Neuroticism,
Extraversion, Agreeableness and Conscientiousness (each valued from 1 to 10). The importance of
feature selection is best recognized when dealing with a dataset that contains a vast number of
features, often referred to as a high-dimensional dataset. High dimensionality brings several
problems: it significantly increases the training time of a machine learning model, and it can make
the model very complicated, which in turn may lead to overfitting.
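The report does not name a feature-selection method, so the following is one common option, shown as a sketch: scikit-learn's `SelectKBest` with the ANOVA F-score (`f_classif`). The trait values are random and the label is synthetic, constructed to depend only on `extraversion`, so that feature should be among those kept; the choice `k=2` is arbitrary.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical Big Five trait scores from 1 to 10 (random values).
feature_names = ["openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism"]
rng = np.random.default_rng(0)
X = rng.integers(1, 11, size=(200, len(feature_names)))

# Synthetic label that depends only on extraversion.
y = (X[:, 2] > 5).astype(int)

# Keep the k attributes with the highest ANOVA F-score against the label.
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
selected = [feature_names[i] for i in selector.get_support(indices=True)]
print(selected)
```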

Pre-Processing of Data

Pre-processing is needed to achieve the best results from machine learning algorithms. We gathered
the dataset and pre-processed it before sending it to the training stage. Sampling is a very common
method for selecting a subset of the dataset being analysed. In many cases, working with the
complete dataset can turn out to be too expensive in terms of memory. A sampling algorithm can
reduce the size of the dataset to a point where a better, but more expensive, machine learning
algorithm can be used. When we talk about data, we usually think of large datasets with a huge
number of rows and columns. While that is a likely scenario, it is not always the case: data can
come in many different forms, such as structured tables, images, audio files and videos. Machines do
not understand free text, image or video data as it is; they understand 1s and 0s. So we pre-process
the data.
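The two pre-processing ideas above, sampling a subset and turning text into numbers, can be sketched in pandas. The records and column names are hypothetical and made up; the 50% sampling fraction is an arbitrary choice, and one-hot encoding is assumed as the text-to-number conversion.

```python
import pandas as pd

# Hypothetical raw records mixing numbers and text (made-up values).
raw = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F", "M", "F", "M", "F", "M"],
    "age": [21, 22, 20, 23, 21, 24, 22, 20, 23, 21],
    "openness": [7, 5, 8, 6, 9, 4, 7, 5, 6, 8],
})

# 1) Sampling: work on a reproducible 50% random subset to save memory and time.
sample = raw.sample(frac=0.5, random_state=1)

# 2) Machines need numbers, so one-hot encode the text column.
numeric = pd.get_dummies(sample, columns=["gender"], dtype=int)
print(numeric)
```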

Prediction of Personality Classification

In this module, several machine learning algorithms are applied, and whichever gives the best
accuracy is used for personality prediction. By applying all these modules, the personality of the
user is finally predicted and given as the result. The personality of the user is classified using the
training and testing datasets.

5.2 TESTING OF OUR MODEL

TABLE 5.2.1 PREDICTING THE BEHAVIORAL TRAIT IS SUCCESSFUL

Sl. No.                1
Feature being tested   Behavioral Traits
Description            Tests the prediction on the already collected data
                       of different matches and predicts the final output.
Input                  Data set of teams
Expected Output        Display the result as accuracy and the winning team.
Actual Output          Display the result as accuracy and the winning team.
Remark                 SUCCESSFUL

From Table 5.2.1, all the behavioral traits are calculated by answering the entire questionnaire,
and the expected and actual outputs are then determined. If the actual output has the same
normalized score as the expected output, the remark is given as successful, along with the
graph.
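The expected-versus-actual comparison used in Tables 5.2.1 and 5.2.2 can be automated as unit tests with Python's standard `unittest` module. In this sketch, `predict_by_ranking` is a hypothetical helper (the better-ranked team is predicted to win) and the ranking values are made up; the `remark` function mirrors the SUCCESSFUL/UNSUCCESSFUL rule stated above.

```python
import unittest

def predict_by_ranking(team_a, team_b, positions):
    """Hypothetical helper under test: the better-ranked (lower position) team wins."""
    return team_a if positions[team_a] < positions[team_b] else team_b

def remark(expected, actual):
    """SUCCESSFUL when the actual output equals the expected output."""
    return "SUCCESSFUL" if expected == actual else "UNSUCCESSFUL"

class PredictionTest(unittest.TestCase):
    POSITIONS = {"India": 1, "England": 2, "Pakistan": 4}  # made-up ranking

    def test_better_ranked_team_predicted(self):
        actual = predict_by_ranking("Pakistan", "India", self.POSITIONS)
        self.assertEqual(remark("India", actual), "SUCCESSFUL")

    def test_mismatch_is_unsuccessful(self):
        actual = predict_by_ranking("England", "Pakistan", self.POSITIONS)
        self.assertEqual(remark("Pakistan", actual), "UNSUCCESSFUL")

# Run the test case explicitly and keep the result object for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(PredictionTest)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```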


TABLE 5.2.2 PREDICTING THE BEHAVIORAL TRAIT IS UNSUCCESSFUL

Sl. No.                2
Feature being tested   Behavioral Traits
Description            Tests the prediction on the already collected data
                       of different matches and predicts the final output.
Input                  Data set of teams
Expected Output        Display the result as accuracy and the winning team.
Actual Output          Display the result as accuracy and the winning team.
Remark                 SUCCESSFUL

From Table 5.2.2, all the behavioral traits are calculated by answering the entire questionnaire, and
the graph and score are calculated from the expected inputs. If the actual output is equal to the
expected output, the remark shows successful. In this test, the actual output does not match the
expected output; therefore, the remark shows unsuccessful.


CHAPTER 06
RESULT

Fig 6.1 Final Predicted output

Fig 6.1 shows the final predicted output, which helps the user predict the winner using
the Naïve Bayes theorem.


Fig 6.2 Accuracy of the training and test data


Fig 6.2 shows the training and test data accuracy obtained with the logistic regression model (logreg).
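Training and test accuracy of the kind shown in Fig 6.2 can be computed with `LogisticRegression.score`, which returns the mean accuracy on the data passed to it. In this sketch, synthetic data from `make_classification` stands in for the encoded match features, and the 70/30 split and seeds are assumptions; it does not reproduce the report's figures.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded match features (assumption).
X, y = make_classification(n_samples=200, n_features=6, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

logreg = LogisticRegression()
logreg.fit(X_train, y_train)

# score() returns mean accuracy on the given data.
train_accuracy = logreg.score(X_train, y_train)
test_accuracy = logreg.score(X_test, y_test)
print(f"Training set accuracy: {train_accuracy:.3f}")
print(f"Test set accuracy: {test_accuracy:.3f}")
```

A training accuracy much higher than the test accuracy would suggest overfitting.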


CHAPTER 07
CONCLUSION
In this project, we discuss how personality is identified using different classification
algorithms, studying the relationship between a user and his or her personality. We used
logistic regression because it gave the best accuracy, around 86.53%, compared with algorithms
used previously, such as Naïve Bayes and SVM; logistic regression is fast and gives accurate
results compared with the other algorithms. The personality is thus classified automatically by the
system, using the dataset provided in the back end, after the user attempts the survey. Sports
prediction has grown in recent times, so more accurate traits can be added in future. The dataset
and algorithms can be further improved to increase accuracy, and the system could be helpful for a
career guidance module. This project discusses sports winning prediction, mainly for World Cup
prediction.


