Artificial Intelligence and Machine Learning (18CS71) : "Personality Prediction System"
BY
Ms. Shilpa
Assistant Professor
2021 – 2022
ALVA’S INSTITUTE OF ENGINEERING AND TECHNOLOGY MIJAR,
MOODBIDRI D.K. -574225 KARNATAKA
CERTIFICATE
This is to certify that, assignment work for the subject “Artificial Intelligence and
Ms. Shilpa
Assistant Professor
ACKNOWLEDGEMENT
The satisfaction and euphoria that accompany the successful completion of any task would be
incomplete without mention of the people who made it possible. Success is the epitome of hard work
and perseverance, but most steadfast of all is encouraging guidance.
So, with gratitude we acknowledge all those whose guidance and encouragement served as
beacon of light and crowned the effort with success.
We thank our Subject faculty Ms. Shilpa, Assistant Professor, Department of Computer
Science & Engineering, who has been our source of inspiration. She has been especially enthusiastic
in giving her valuable guidance and critical reviews.
We sincerely thank Dr. Manjunath Kotari, Professor and Head, Department of Computer
Science & Engineering, who has been the constant driving force behind the completion of the group
task.
We thank our beloved Principal, Dr. Peter Fernandes, for his constant help and support
throughout.
We are indebted to the Management of Alva’s Institute of Engineering and Technology,
Mijar, Moodbidri for providing an environment which helped us in completing our group task in
Artificial Intelligence and Machine Learning.
Also, we thank all the teaching and non-teaching staff of the Department of Computer Science &
Engineering for the help rendered.
ABSTRACT
Machine learning (ML) is one of the intelligent methodologies that have shown promising results in the
domains of classification and prediction. One of the expanding areas necessitating good predictive accuracy is
sport prediction, due to the large monetary amounts involved in betting. In addition, club managers and owners
are striving for classification models so that they can understand and formulate strategies needed to win matches.
These models are based on numerous factors involved in the games, such as the results of historical matches,
player performance indicators, and opposition information. This paper provides a critical analysis of
the literature in ML, focusing on the application of Naïve Bayes to sport results prediction. In doing so, we
identify the learning methodologies utilised, data sources, appropriate means of model evaluation, and specific
challenges of predicting sport results. This then leads us to propose a novel sport prediction framework through
which ML can be used as a learning strategy. Our research will hopefully be informative and of use to those
performing future research in this application area.
TABLE OF CONTENTS
1. INTRODUCTION
1.1 INTRODUCTION TO SIGN LANGUAGE RECOGNITION
1.2 PROBLEM STATEMENT
1.3 OBJECTIVE
3. SYSTEM DESIGN
3.1 DATA-FLOW DIAGRAM
3.2 USE CASE DIAGRAM
4. IMPLEMENTATION
4.1 PSEUDO-CODE
5. TESTING
5.1 UNIT TESTING
5.2 TESTING OF OUR MODEL
6. RESULTS
7. CONCLUSION
REFERENCES
LIST OF FIGURES
SPORTS WINNING PREDICTION
CHAPTER 01
INTRODUCTION
Cricket is a well-known sport. The popularity of cricket and its viewership has increased tremendously
in the past two decades. To cater to potential future growth, global market research was commissioned
by the International Cricket Council (ICC) which revealed that cricket has more than one billion fans
worldwide, with the potential for significant growth. Among all formats of cricket, T20 Internationals
(T20Is) are the most popular. Fans are eager to follow upcoming cricket events and tournaments, and
they want to know the prospects of their favorite team.
The Cricket Outcome Predictor predicts the results of ODI cricket matches. Modern classification
techniques such as Naïve Bayes, Support Vector Machine, and Random Forest were used to predict
results, and a comparative study was conducted on their outcomes. Several factors influence the
outcome of an ODI match, including home ground, the toss, batting first or second, the condition of
the pitch, and team strategies, and these factors vary as the game proceeds. In terms of accuracy,
the Naïve Bayes classifier was judged the best model when the predictors were independent, and it
also performed well when the dataset was imbalanced.
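To make the Naïve Bayes idea concrete, the following is a minimal sketch of a categorical Naïve Bayes classifier written from scratch. It is an illustration only, not the project's implementation: the toy match records, the feature names `toss` and `venue`, and the helper functions `train_naive_bayes` and `predict_naive_bayes` are all hypothetical, and add-one (Laplace) smoothing is an added assumption.

```python
from collections import Counter, defaultdict
import math

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature, label) -> Counter of values
    vocab = defaultdict(set)              # feature -> set of values seen
    for row, label in zip(rows, labels):
        for feature, value in row.items():
            value_counts[(feature, label)][value] += 1
            vocab[feature].add(value)
    return class_counts, value_counts, vocab

def predict_naive_bayes(model, row):
    """Return the label maximising log P(label) + sum of log P(value | label)."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for feature, value in row.items():
            seen = value_counts[(feature, label)][value]
            # Add-one smoothing avoids zero probabilities for unseen values
            score += math.log((seen + 1) / (count + len(vocab[feature])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy records: toss result and venue for six past matches
rows = [
    {"toss": "won", "venue": "home"},
    {"toss": "won", "venue": "home"},
    {"toss": "lost", "venue": "away"},
    {"toss": "lost", "venue": "away"},
    {"toss": "won", "venue": "away"},
    {"toss": "lost", "venue": "home"},
]
labels = ["win", "win", "loss", "loss", "win", "loss"]

model = train_naive_bayes(rows, labels)
prediction = predict_naive_bayes(model, {"toss": "won", "venue": "home"})
print(prediction)
```

The classifier treats each feature as independent given the class, which is the assumption the text refers to when it says the predictors are independent.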
1.3 OBJECTIVE
The main objective of sports prediction is to improve team performance and enhance the chances of
winning the game. The value of a win takes different forms: it trickles down to fans filling the
stadium seats, television contracts, fan store merchandise, parking, concessions, sponsorships,
enrollment, and retention.
We followed the general machine learning workflow step by step:
1. Data cleaning and formatting.
2. Exploratory data analysis.
3. Feature engineering and selection.
4. Compare several machine learning models on a performance metric.
5. Perform hyper-parameter tuning on the best model.
6. Evaluate the best model on the testing set.
7. Interpret the model results.
8. Draw conclusions and document work.
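Steps 5 and 6 of the workflow above can be sketched with scikit-learn as follows. This is a hedged illustration under stated assumptions: the data is synthetic (generated with `make_classification`) rather than the project's dataset, and the choice of logistic regression with a small grid over `C` is only an example of a model and parameter to tune.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the cleaned, feature-engineered data (assumption)
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Hold out 30% of the rows; they are not used during tuning
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Step 5: tune the regularisation strength C with 5-fold cross-validation
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Step 6: evaluate the tuned model once on the held-out testing set
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print("best C:", search.best_params_["C"])
print("test accuracy:", round(test_accuracy, 3))
```

Keeping the testing set out of the tuning loop means the final accuracy is measured on data the model selection never saw.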
CHAPTER 02
TensorFlow is a free and open-source software library for dataflow and differentiable
programming across a range of tasks. It is a symbolic math library and is also used for machine
learning applications such as neural networks. It is used for both research and production at
Google. Applications built on TensorFlow include automated image-captioning software and
DeepDream.
OpenCV (Open Source Computer Vision Library) is a library of programming functions mainly
aimed at real-time computer vision. Originally developed by Intel, it was later supported by
Willow Garage and then Itseez (which was later acquired by Intel). The library is cross-platform and
free for use under the open-source Apache 2 License. Since 2011, OpenCV has featured
GPU acceleration for real-time operations. The first alpha version of OpenCV was released to
the public at the IEEE Conference on Computer Vision and Pattern Recognition in 2000, and
five betas were released between 2001 and 2005. The second major release of OpenCV was
in October 2009. OpenCV 2 includes major changes to the C++ interface, aiming at easier,
more type-safe patterns, new functions, and better implementations for existing ones in terms of
performance (especially on multi-core systems).
CHAPTER 03
SYSTEM DESIGN
This chapter discusses the system design, covering the data flow diagram and the Entity Relationship
Diagram, both of which describe the overall structure of the system.
The flowchart of the learning performance prediction system is illustrated in Fig. 3.1. Data with a
known target value is taken from the training set, its features are extracted, and they are passed to
the learning algorithm, which produces the learned model. Unseen data from the prediction set is then
taken, its features are extracted in the same way, and the learned model outputs the predicted
target value. This output is the predicted personality.
[Figure: diagram with labels Personality Types, Response to Questions, Normalized Scores]
The use case diagram also describes the overall structure of the system. A use case diagram
summarizes the system's users and their interactions with the system, and it is built from a set of
specialized symbols and connectors. In this use case diagram, personality traits form the model that
captures the relationship between personality and academic behavior. This model was defined by
several independent sets of researchers who used factor analysis of verbal descriptors of human
behavior.
CHAPTER 04
IMPLEMENTATION
This chapter discusses the implementation of the code and describes the main functions of the
system.
4.1 Implementation
We started by importing all the libraries and dependencies.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.ticker as ticker
import matplotlib.ticker as plticker
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
The function below adds teams to a new prediction dataset based on the ranking position of each team
and then predicts the winner of each match.
Function Code
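def clean_and_predict(matches, ranking, final, logreg):
    # Look up each team's ranking position.
    # Assumption: 'ranking' is a DataFrame with 'Team' and 'Position' columns.
    positions = []
    for match in matches:
        positions.append(ranking.loc[ranking['Team'] == match[0], 'Position'].iloc[0])
        positions.append(ranking.loc[ranking['Team'] == match[1], 'Position'].iloc[0])

    pred_set = []
    i = 0
    j = 0
    # 'i' will be the iterator for the 'positions' list, and 'j' for the list of matches (list of tuples)
    while i < len(positions):
        dict1 = {}
        # If position of first team is better then this team will be the 'Team_1' team, and vice-versa
        if positions[i] < positions[i + 1]:
            dict1.update({'Team_1': matches[j][0], 'Team_2': matches[j][1]})
        else:
            dict1.update({'Team_1': matches[j][1], 'Team_2': matches[j][0]})
        # Append updated dictionary to the list, that will later be converted into a DataFrame
        pred_set.append(dict1)
        i += 2
        j += 1

    # Convert the list into a DataFrame and keep a readable copy of the team names
    pred_set = pd.DataFrame(pred_set)
    backup_pred_set = pred_set

    # One-hot encode the teams and align the columns with the training data.
    # Assumption: 'final' is the encoded training DataFrame with a 'Winning_Team' label column.
    pred_set = pd.get_dummies(pred_set, prefix=['Team_1', 'Team_2'], columns=['Team_1', 'Team_2'])
    feature_cols = final.columns.drop('Winning_Team')
    pred_set = pred_set.reindex(columns=feature_cols, fill_value=0)

    # Predict!
    predictions = logreg.predict(pred_set)
    for i in range(len(pred_set)):
        print(backup_pred_set.iloc[i, 1] + " and " + backup_pred_set.iloc[i, 0])
        if predictions[i] == 1:
            print("Winner: " + backup_pred_set.iloc[i, 1])
        else:
            print("Winner: " + backup_pred_set.iloc[i, 0])
        print("")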
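The function above expects an already trained `logreg` model and a `final` training table. The following is a minimal sketch of how such inputs might be prepared, and how a new fixture can be encoded with the same columns before prediction. The match history is made up, and the column names `Team_1`, `Team_2` and `Winning_Team` (1 when `Team_1` won, else 0) are assumptions for illustration, not the project's actual data.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical historical results; Winning_Team is 1 when Team_1 won, else 0
history = pd.DataFrame({
    "Team_1": ["India", "India", "Australia", "England", "India", "Australia"],
    "Team_2": ["England", "Australia", "England", "India", "Australia", "India"],
    "Winning_Team": [1, 1, 1, 0, 1, 0],
})

# One-hot encode the team names so the model receives numeric features
final = pd.get_dummies(
    history, columns=["Team_1", "Team_2"], prefix=["Team_1", "Team_2"], dtype=int
)
X = final.drop(columns=["Winning_Team"])
y = final["Winning_Team"]

logreg = LogisticRegression()
logreg.fit(X, y)

# Encode an upcoming fixture and align it with the training columns;
# columns for teams that do not appear in this fixture are filled with 0
fixture = pd.DataFrame([{"Team_1": "India", "Team_2": "England"}])
fixture_encoded = pd.get_dummies(
    fixture, columns=["Team_1", "Team_2"], prefix=["Team_1", "Team_2"], dtype=int
)
fixture_encoded = fixture_encoded.reindex(columns=X.columns, fill_value=0)

prediction = logreg.predict(fixture_encoded)[0]
winner = fixture.loc[0, "Team_1"] if prediction == 1 else fixture.loc[0, "Team_2"]
print("Winner:", winner)
```

Aligning the fixture to the training columns with `reindex` matters because the model can only score inputs that have exactly the features it was fitted on.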
CHAPTER 05
TESTING
This chapter discusses the testing of the system. Testing is a quality assurance
mechanism for catching residual errors. Test techniques include, but are not limited to,
executing a program or application with the intent of finding software bugs. Here, the
system is tested by predicting the personality of a person from their responses to a few questions.
Module division is the process of dividing the collection of source files required in the project
into discrete units of functionality. Each module can be independently built, tested, and
debugged. The modules in our project are listed below.
1. Data Collection
2. Attribute Selection
3. Pre-processing of data
4. Prediction of personality
Data Collection
The first step for the prediction system is data collection and deciding on the training and testing
datasets. In this project we imported the dataset from the Kaggle website and divided it into a 70%
training dataset and a 30% testing dataset. Data collection is defined as the procedure of
collecting, measuring, and analyzing accurate insights for research using standard validated
techniques. A researcher can evaluate their hypothesis on the basis of the collected data. In most
cases, data collection is the primary and most important step for research, irrespective of the
field of research. The approach to data collection differs between fields of study,
depending on the required information.
Training Dataset:
In a dataset, a training set is used to build a model, while a test (or validation) set is
used to validate the model built. With the complete training dataset, you can extract
features and fit a model.
Testing Dataset:
Once the model is obtained from the training set, you can use it to make predictions on the testing set.
Some data may be used in a confirmatory way, typically to verify that a given set of input to a
given function produces some expected result. Other data may be used in order to challenge
the ability of the program to respond to unusual, extreme, exceptional, or unexpected input.
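The 70% / 30% division into training and testing datasets can be sketched with `train_test_split`, which the implementation chapter already imports. The feature rows and labels here are placeholders rather than the Kaggle data, and the fixed `random_state` and the `stratify` option are assumptions added for reproducibility and balanced classes.

```python
from sklearn.model_selection import train_test_split

# Placeholder feature rows and labels standing in for the real dataset
X = [[i, i % 3] for i in range(100)]
y = [i % 2 for i in range(100)]

# Hold out 30% of the rows for testing; stratify keeps the class ratio equal
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y
)
print(len(X_train), len(X_test))
```

The model is fitted only on `X_train` and `y_train`; `X_test` and `y_test` are kept aside to check how well it generalizes.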
Attribute Selection
Attributes are the properties of the dataset that the system uses. For personality prediction, the
attributes include the gender of the person, the age of the person, and the Big Five traits: Openness,
Neuroticism, Extraversion, Agreeableness, and Conscientiousness (each scored 1 to 10). The importance
of feature selection is best recognized when dealing with a dataset that contains a
vast number of features, often referred to as a high-dimensional dataset.
High dimensionality brings several problems: it significantly increases the training time of
the machine learning model, and it can make the
model very complicated, which in turn may lead to overfitting.
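One common way to reduce a high-dimensional dataset is univariate feature selection. The sketch below uses scikit-learn's `SelectKBest` with the ANOVA F-score; this technique is an illustrative choice, not one the report states it used, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic high-dimensional data: 50 features, only 5 of them informative
X, y = make_classification(
    n_samples=200, n_features=50, n_informative=5, n_redundant=0, random_state=0
)

# Keep the 5 features with the highest ANOVA F-score against the label
selector = SelectKBest(score_func=f_classif, k=5)
X_reduced = selector.fit_transform(X, y)

print(X.shape, "->", X_reduced.shape)
print("kept feature indices:", selector.get_support(indices=True))
```

Training on `X_reduced` instead of `X` shortens training time and gives the model fewer irrelevant inputs to overfit on.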
Pre-Processing of Data
Pre-processing is needed to achieve the best results from machine learning algorithms. We
gathered the dataset and pre-processed it before sending it to the training stage. Sampling is a
very common method for selecting a subset of the dataset being analysed. In most cases,
working with the complete dataset can turn out to be too expensive in terms of
memory. Using a sampling algorithm can help reduce the size of the dataset to a point where
we can use a better, but more expensive, machine learning algorithm. When we talk about
data, we usually think of large datasets with a huge number of rows and columns. While
that is a likely scenario, it is not always the case: data can come in many different forms, such as
structured tables, images, audio files, and videos. Machines do not understand free text, image,
or video data as it is; they understand 1s and 0s. So we pre-process the data.
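The two pre-processing ideas above, sampling a subset and turning text categories into 1s and 0s, can be sketched in plain Python. The records and the `gender` field are hypothetical placeholders, and the 10% sample size is an arbitrary choice for illustration.

```python
import random

# Hypothetical raw records with one categorical (text) field
records = [{"id": i, "gender": "female" if i % 2 else "male"} for i in range(1000)]

# Sampling: draw a reproducible 10% subset instead of using every row
rng = random.Random(42)
sample = rng.sample(records, k=100)

# Encoding: turn the text category into 0/1 indicator columns
categories = sorted({r["gender"] for r in records})   # ['female', 'male']
encoded = [
    [r["id"]] + [1 if r["gender"] == c else 0 for c in categories]
    for r in sample
]
print(len(sample), encoded[0])
```

Each encoded row holds the record id followed by one indicator per category, so exactly one of the indicator values is 1.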
In this system, several machine learning algorithms are applied, and the algorithm that
gives the best accuracy for personality prediction is used. By applying all these modules,
the personality is finally predicted, and the final result is the personality of the user. The
personality of the user is classified using the training and testing datasets.
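Choosing the algorithm with the best accuracy can be sketched as a comparison of cross-validated scores. The three candidates mirror the algorithms named in this report (naive Bayes, SVM, logistic regression), but the data is synthetic and the default hyper-parameters and 5-fold setup are assumptions, so the scores say nothing about the project's real results.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic stand-in for the pre-processed dataset (assumption)
X, y = make_classification(n_samples=300, n_features=8, random_state=1)

candidates = {
    "naive_bayes": GaussianNB(),
    "svm": SVC(),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

# Mean 5-fold cross-validated accuracy for each candidate model
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

best_name = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("selected:", best_name)
```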
SI NO.: 1
Feature being tested: Behavioral Traits
Description: Tests prediction on already collected responses and data of different matches, and predicts the final output.
Input: Data set of teams
Expected Output: Displays the result as the accuracy and the team which wins.
Actual Output: Displays the result as the accuracy and the team which wins.
Remark: SUCCESSFUL
From Table 5.2.1 above, all the behavioral traits are calculated from the answers to the questionnaire,
and the expected and actual outputs are obtained from them. If the actual output has the same
normalized score as the expected output, the remark is given as successful, along with the
graph.
SI NO.: 2
Feature being tested: Behavioral Traits
Description: Tests prediction on already collected responses and data of different matches, and predicts the final output.
Input: Data set of teams
Expected Output: Displays the result as the accuracy and the team which wins.
Actual Output: Displays the result as the accuracy and the team which wins.
Remark: SUCCESSFUL
From Table 5.2.2 above, all the behavioral traits are calculated from the answers to the questionnaire,
and the graph and score are calculated from the expected inputs. If the actual output is
equal to the expected output, the remark shows successful. If the actual output
does not match the expected output, the remark shows unsuccessful.
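The expected-versus-actual comparison recorded in the test tables can be automated with Python's `unittest` module. The sketch below is illustrative only: `predict_winner` is a hypothetical stand-in that picks the better-ranked team, not the project's trained model, and the ranking values are made up.

```python
import unittest

def predict_winner(team_1, team_2, ranking):
    """Hypothetical stand-in for the model: the better-ranked team wins."""
    return team_1 if ranking[team_1] < ranking[team_2] else team_2

class PredictionTest(unittest.TestCase):
    # Made-up ranking positions (1 is best)
    ranking = {"India": 1, "England": 2, "Australia": 3}

    def test_expected_equals_actual(self):
        expected = "India"
        actual = predict_winner("England", "India", self.ranking)
        self.assertEqual(expected, actual)   # remark is SUCCESSFUL when equal

    def test_output_is_one_of_the_teams(self):
        actual = predict_winner("Australia", "England", self.ranking)
        self.assertIn(actual, ("Australia", "England"))

# Run the tests and report a remark in the style of the tables above
suite = unittest.defaultTestLoader.loadTestsFromTestCase(PredictionTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("SUCCESSFUL" if result.wasSuccessful() else "UNSUCCESSFUL")
```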
CHAPTER 06
RESULT
Fig. 6.1 above shows the final predicted output, which helps the user
predict the winner using the naive Bayes theorem.
CHAPTER 07
CONCLUSION
In this project, we discuss how personality is identified using different classification
algorithms. Here we study the relationship between a user and his/her personality. We used
logistic regression because it gives the best accuracy, around 86.53%, compared with the other
algorithms used previously, such as naive Bayes and SVM. Logistic regression is fast and
gives accurate results compared to the other algorithms. Thus the personality is automatically
classified by the system after the user attempts the survey, using the data set provided in the
back end. Sports prediction has become more common in recent times, so more accurate traits
can be added in future. Further improvements to the data set and algorithms can
improve the accuracy and can be helpful for a career guidance module. This project discusses
sports winning prediction, chiefly for World Cup prediction.