
SAVITRIBAI PHULE PUNE UNIVERSITY

A SEMINAR REPORT

ON

[ SOFT COMPUTING IN DATA SCIENCE ]


SUBMITTED BY

STUDENT NAME: SHIMPI AMOGH DEVRAM


SEAT NO: S191054206

UNDER THE GUIDANCE OF

Prof. Prathamesh Bajare

DEPARTMENT OF COMPUTER ENGINEERING


NBN SINHGAD TECHNICAL INSTITUTES CAMPUS
NBN SINHGAD SCHOOL OF
ENGINEERING 10/1, AMBEGAON (BK),
PUNE-411041
Year 2021−2022

Department of Computer Engineering
NBN Sinhgad School of Engineering, Pune-41
Date:-

CERTIFICATE
This is to certify that seminar report entitled,

Soft Computing In Data Science


has been successfully completed by

SHIMPI AMOGH DEVRAM TE(20) - S191054206

towards the partial fulfillment of the degree of Bachelor of
Engineering in Computer Engineering as awarded by the Savitribai
Phule Pune University, at NBN Sinhgad School of Engineering,
Ambegaon (BK), Pune, during the academic year 2020-21.

PROF. PRATHAMESH BAJARE
Name of the Guide
(Computer Engineering)

PROF. SHAILESH P. BENDALE
Head of the Department
(Computer Engineering)

DR. S.P. PATIL
Principal
NBN Sinhgad School of Engineering

Abstract

Educational data mining has been widely used to predict student
performance and establish intervention strategies to improve that
performance. Most studies have implemented machine learning
algorithms for interventions, but the use of data mining in appraising
student performance in learning software engineering has received
less attention. Some of the studies that have explored the use of
machine learning in predicting student performance in software
learning have only used Random Forest, and as such, this study used
the same dataset to implement 7 other algorithms and establish the
most efficient. The study used two different sets of data and
established that Neural Network was the most efficient with regard to
the first dataset, although Random Forest was the most efficient with
regard to the second dataset. Both the NN graphics and the RF tree
diagram are presented, and the predictions from the two models are
also compared.

Keywords: Data mining · Random Forest · Performance prediction ·
Software engineering · Machine learning
Acknowledgement
My first and foremost acknowledgment is to my supervisor and guide,
PROF. PRATHAMESH BAJARE. During the long journey of this study,
he supported me in every aspect. He was the one who helped and
motivated me to propose research in this field and inspired me with
his enthusiasm for research, his experience, and his lively character.

I express a true sense of gratitude to my guide, PROF. PRATHAMESH
BAJARE, for his valuable guidance, constant support, and the
encouragement that he gave me.

I would also like to thank our Head of Department, Prof. Shailesh
P. Bendale, the Principal, and the Management for inspiring me and
providing all lab and other facilities, which made this seminar
presentation very convenient.

I am really thankful to all those who rendered their valuable help
for the successful completion of the seminar presentation.

SHIMPI AMOGH DEVRAM (TE)


Name of the Student
List of Figures
1 INTRODUCTION
2 METHODS OF SOFT COMPUTING
3 ALGORITHMIC EQUATIONS
4 RESULTS AND DISCUSSIONS
Contents
Certificate
Abstract
Acknowledgement
List of Figures

1 Introduction
2 Methodology
2.1 Data Sources And Types
2.2 Study Approach
3 Algorithms
3.1 Random Forest Classifier
3.2 Neural Network
3.3 K-Nearest Neighbor
3.4 Support Vector Machine
4 Results And Discussions
4.1 Basic Statistics
4.2 Experiments
5 Conclusion
6 Future Scopes
7 References
1 Introduction

Machine learning refers to computer programs that make deductions,
draw inferences, and reach conclusions based on experience gained from
classes of tasks. The program has a performance measure for each task,
and its basic intention is to improve that performance with the
experience it gains. Machine learning algorithms, or computer programs
that learn from experience, are becoming popular, especially with the
progression towards artificial intelligence. At the core of the development
and increased likelihood of using machine learning or artificial intelligence in
education is Industry 4.0 and its related concepts. That is, Industry
4.0 proposes and promises to instigate Education 4.0, which will promote and
place education as a “smart factor”.

The application of Industry 4.0 supersedes the conventional application
of industrial revolution in higher education. In most cases, concepts and
technologies aligned with Industry 4.0 are used in institutions of higher
education to adopt different learning approaches, including blackboard classes
and distance learning modules. However, the influence of artificial intelligence
in transforming the workspace in the education sector is yet to be fully realized
because stakeholders are undecided whether or not to pursue the technologies
due to perceived insecurity. Nonetheless, there is broad consensus that
Industry 4.0 will reduce the gap between humanity and social sciences.

2. Methodology

The data used in the study and the study approach, including a
detailed discussion of the algorithms used in the study, are presented in the
following subsections.



2.1 Data Sources And Types

The dataset used in the study was a product of the Software
Engineering Team Assessment and Prediction (SETAP) project, and it consists
of over 100 team activity measures as well as outcomes for 74 student
teams. The data consists of distinct student activity measures, and each
student was a member of either a local (same university) or a global
(different universities) team. The weekly student activity measures were
aggregated for each team to create the team activity measures, and each
student was evaluated on the efficiency of handling software engineering
processes and the understanding of software engineering products. Hence,
the outcome attribute consists of product and process. The data has five
major milestones, namely M1 (High-Level Requirements & Specs), M2 (Detailed
Requirements & Specs), M3 (First Prototype), M4 (Beta Release), and M5
(Final Delivery). The data was collected over 7 semesters in the period
spanning Fall 2012 to Fall 2017. The information was gathered for over 383
students corresponding to 18 class sections. The collated Team Activity
Measure (TAM) data has 115 attributes and 2 class labels. The TAM consists
of 59 local teams and over 15 global teams.
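
The aggregation of weekly student activity measures into team activity measures can be pictured with a minimal Python sketch. The record layout, the field names (coding_hours, meeting_hours), and the choice of a plain sum are assumptions made for illustration; the report does not state how SETAP performs the aggregation.

```python
from collections import defaultdict

# Hypothetical weekly records: (team_id, student_id, week, coding_hours, meeting_hours).
# The layout and values are illustrative and are not taken from the SETAP files.
weekly_records = [
    ("team_01", "s1", 1, 4.0, 1.0),
    ("team_01", "s2", 1, 6.0, 1.0),
    ("team_01", "s1", 2, 5.0, 2.0),
    ("team_02", "s3", 1, 2.0, 0.5),
    ("team_02", "s4", 1, 3.0, 0.5),
]


def aggregate_by_team(records):
    """Sum each weekly student measure over all students and weeks of a team.

    Summation is an assumed aggregation rule; a mean or a count would follow
    the same pattern.
    """
    totals = defaultdict(lambda: {"coding_hours": 0.0, "meeting_hours": 0.0})
    for team_id, _student, _week, coding, meeting in records:
        totals[team_id]["coding_hours"] += coding
        totals[team_id]["meeting_hours"] += meeting
    return dict(totals)


team_measures = aggregate_by_team(weekly_records)
```

Each key of team_measures then plays the role of one row of team-level attributes, which is the granularity at which the outcomes are recorded.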

2.2 Study Approach

The study uses visual programming tools and conventional
performance metrics to compare Decision Tree, Random Forest, Naïve
Bayes, Neural Network, k-Nearest Neighbor (kNN), Logistic Regression, and
Support Vector Machine. The preprocessing techniques used in the study
involved feature selection, and the experiment focused on milestones 1 to 5
and their influence on final grades. That is, the final grades were set as the
target variable while the rest of the attributes were treated as features.

The visual program for executing the analysis is shown in Fig. 1.

Fig. 1. The visual program for assessing the algorithms used in
predicting the performance of the students in software engineering learning.
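
Setting the final grade as the target variable and treating the remaining attributes as features could look roughly like the following Python sketch. The attribute names (m1_coding_hours, m5_commit_count, final_grade) are hypothetical placeholders, and the feature selection step mentioned above is not reproduced because the report does not say which method was used.

```python
# Hypothetical team rows; attribute names are illustrative only.
rows = [
    {"team": "t1", "m1_coding_hours": 12.0, "m5_commit_count": 40, "final_grade": "A"},
    {"team": "t2", "m1_coding_hours": 3.5, "m5_commit_count": 9, "final_grade": "F"},
]

TARGET = "final_grade"
ID_COLUMNS = {"team"}  # identifiers are dropped because they are not activity measures


def split_features_target(rows, target=TARGET, drop=ID_COLUMNS):
    """Return feature names, the feature matrix X, and the target vector y."""
    feature_names = sorted(k for k in rows[0] if k != target and k not in drop)
    X = [[row[name] for name in feature_names] for row in rows]
    y = [row[target] for row in rows]
    return feature_names, X, y


feature_names, X, y = split_features_target(rows)
```

Every classifier in the comparison would then be trained on the same X and y, so that differences in the scores come from the algorithms rather than from the data preparation.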

3. Algorithms

The mathematical information on the 8 algorithms is presented in the following
subsections.

3.1 Random Forest Classifier (RF Classifier)

i. The RF classifier has been used in different fields including remote
sensing, land classification, and diagnosis and prognosis of different
maladies.
ii. Each tree is grown from a random vector that is drawn through
independent sampling and is assumed to have the same distribution as
the random vectors of the other trees in the forest.
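
The idea of independently sampled random vectors can be illustrated with a deliberately simplified Python sketch. Here each "tree" is only a one-split decision stump, the random vector consists of a bootstrap sample plus a random feature subset chosen once per tree, and the toy data are invented; a full Random Forest grows deeper trees and normally re-draws the feature subset at every split, so this is a sketch of the principle and not the implementation used in the study.

```python
import random
from collections import Counter


def train_stump(X, y, feature_indices):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_indices:
        for threshold in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(lab != left_label for lab in left)
            errors += sum(lab != right_label for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    _, f, threshold, left_label, right_label = best
    return f, threshold, left_label, right_label


def train_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Train n_trees stumps, each on its own independently drawn random vector."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(X) row indices with replacement.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        # Random feature subset for this tree (same distribution for every tree).
        features = rng.sample(range(len(X[0])), n_features)
        forest.append(train_stump(Xb, yb, features))
    return forest


def predict(forest, row):
    """Majority vote over the individual stump predictions."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# Invented toy data: two activity measures per team and an A/F outcome.
X_toy = [[1.0, 10.0], [2.0, 9.0], [8.0, 1.0], [9.0, 2.0]]
y_toy = ["F", "F", "A", "A"]
forest = train_forest(X_toy, y_toy)
```

Calling predict(forest, row) returns the class that most stumps vote for, which is how the ensemble smooths out the errors of individual trees trained on unlucky samples.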

3.2 Neural Network

i. The neural network algorithms are based on the mathematical
processing of information through emulation of biological
systems.

[Figure: neuron diagram with the labels Xn, Σ, h, a, and w]
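
A single artificial neuron, the building block that the diagram labels appear to describe (inputs, weights w, and a summation Σ), can be sketched in Python as follows. The logistic (sigmoid) activation and the bias term are assumptions chosen for illustration; the report does not specify which activation function the study's network used.

```python
import math


def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs followed by a logistic (sigmoid) activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical inputs and weights, for illustration only.
activation = neuron_output([1.0, 2.0], [0.5, -0.25])
```

A network stacks many such units in layers and adjusts the weights during training so that the final output approximates the target class.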
3.3 K-Nearest Neighbor

i. The nearest neighbor algorithm is memory based and does not require any
model fitting.
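
Because the method is memory based, "training" amounts to storing the labelled examples, and all work happens at prediction time. The Python sketch below assumes Euclidean distance and k = 3; both are illustrative choices, not settings reported by the study, and the toy data are invented.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest stored examples."""
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Invented toy data: two activity measures per team and an A/F outcome.
stored_X = [[1, 1], [1, 2], [2, 1], [8, 8], [9, 8], [8, 9]]
stored_y = ["F", "F", "F", "A", "A", "A"]
label = knn_predict(stored_X, stored_y, [1.5, 1.5])
```

No parameters are estimated from the data; changing k or the distance function changes the predictions directly, which is why these two choices are usually tuned.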
4 Results And Discussions

4.1 Basic Statistics
i. The SETAP Product T8 data consists of 42 A scores and 32 F scores.
ii. The 42 teams had different numbers of students, and the aggregation led to the
conclusion that all members of each team scored an A.
iii. Similar assumptions and conclusions were made about the teams that scored an F.
iv. For the teams that scored an A, total coding hours, sharing of unique
commit messages, and personal meeting hours per week were the attributes with
the greatest influence.

4.2 Experiments

AUC: Area Under the Curve
CA: Classification Accuracy

Table 1. The performance metrics of the 8 algorithms (average A and F classes) for SETAP product T8

Method               AUC    CA     F1     Precision  Recall  Log loss
kNN                  0.641  0.593  0.596  0.602      0.593    2.193
SVM                  0.535  0.533  0.525  0.600      0.533    0.693
SAMME.R              0.539  0.547  0.550  0.557      0.547   15.658
SAMME                0.539  0.547  0.550  0.557      0.547    0.767
Random Forest        0.569  0.553  0.546  0.542      0.553    0.933
Neural Network       0.659  0.587  0.591  0.608      0.587    1.280
Naive Bayes          0.591  0.553  0.555  0.593      0.553    3.747
Logistic Regression  0.643  0.620  0.624  0.634      0.620    3.052

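Based on the scoring methods in Table 1, Neural Network achieved the highest
AUC (0.659) and, on that measure of prediction efficiency, outperformed the
other models. Logistic Regression, however, recorded the highest classification
accuracy (0.620) as well as the highest F1, precision, and recall.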
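
The scores reported in Table 1 follow standard definitions, which the Python sketch below restates for a two-class A/F problem. The function names are hypothetical, the metrics are computed for a single positive class rather than averaged over both classes as in the table, and AUC is left out to keep the example short.

```python
import math


def classification_metrics(y_true, y_pred, positive="A"):
    """Classification accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"CA": correct / len(pairs), "precision": precision, "recall": recall, "F1": f1}


def log_loss(y_true, prob_positive, positive="A", eps=1e-15):
    """Mean negative log-likelihood of the true labels under predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, prob_positive):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -math.log(p) if t == positive else -math.log(1 - p)
    return total / len(y_true)


# Invented labels and predictions, for illustration only.
scores = classification_metrics(["A", "A", "F", "F"], ["A", "F", "F", "A"])
uninformative = log_loss(["A", "A", "F", "F"], [0.5, 0.5, 0.5, 0.5])
```

A model that outputs a probability of 0.5 for every team has a log loss of ln 2, about 0.693, which is the value listed for SVM in Table 1. That row is therefore consistent with probability estimates that carry little information, although this is an observation on the numbers and not a claim made in the report.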
5. Conclusion

Computational intelligence (soft computing) is a new concept for advanced
information processing. The objective of computational intelligence
approaches is to realize a new way of analyzing and creating flexible
information processing modeled on human abilities such as sensing,
understanding, learning, recognizing, and thinking.

6. Future Scopes
1. Aerospace Application
Soft computing is used for aerospace systems because
of the high degree of uncertainty and complexity of
these problems and because of the involvement of
human beings.

2. Communication Systems
Since communication systems involve human beings,
soft computing can be effectively applied to such
systems. Soft computing is used mainly in
communication networks and data communication.

7. References
i. Reddy, L., et al.: A modern approach student
performance prediction using multi-agent data mining
technique. i-Manager’s J. Softw. Eng. 10(1), 14–20
(2015)
ii. Asif, R., Merceron, A., Pathan, M.: Predicting student
academic performance at degree level: a case study. Int.
J. Intell. Syst. Appl. 7(1), 49–61 (2014)
iii. Mueen, A., Zafar, B., Manzoor, U.: Modeling and
predicting students’ academic performance using data
mining techniques. Int. J. Mod. Educ. Comput. Sci.
8(11), 36–42 (2016)
iv. Devasia, T., Vinushree, T., Hegde, V.: Prediction of
students’ performance using educational data mining.
In: International Conference on Data Mining and
Advanced Computing (SAPIENCE) (2016)
v. Petkovic, D., et al.: Using the random forest classifier to
assess and predict student learning of software
engineering teamwork. In: IEEE Frontiers in Education
Conference (FIE) (2016)
