fiNAL RESULT Merged
ON
“DATA SCIENCE”
Submitted to
RAJASTHAN TECHNICAL UNIVERSITY
BACHELOR’S DEGREE IN
COMPUTER SCIENCE AND ENGINEERING
BY
2022-2023
Swami Keshvanand Institute of Technology, Jaipur
Department of Computer Science and Engineering
CERTIFICATE
Acknowledgement
I would also like to express my thanks to my parents for their support and
blessings. Special thanks go to all my friends for their support in the completion
of this work.
VEDANT KALIA
20ESKCA068
Contents
1 Introduction
OVERVIEW
MOTIVATION
OBJECTIVES OF TRAINING
3 Description of Modules
MODULE 1 : Python For Data Science
MODULE 2 : Understanding Statistics For Data Science
MODULE 3 : Predictive Modeling and Introduction to Machine Learning
4 Conclusion
TAKEAWAYS OF TRAINING
FUTURE SCOPE
List of Figures
Introduction
OVERVIEW
MOTIVATION
1. Use their learned skills, knowledge and abilities to deal with data for the
internet and apply basic design principles to present ideas, information, products,
and services when creating models.
3. Demonstrate communication skills, service management skills, and
presentation skills.
4. Complete job preparation tasks including writing resumes and cover letters,
conducting job interviews and developing an E-Portfolio. Apply
employability skills including fundamental skills, personal management skills,
and teamwork skills.
Chapter 2
Problem Description
In [64]: train.head()
ID age job marital education default balance housing loan contact day duration campaign pdays previous sub…
ID
age
job
education 0
default 0
balance 0
housing 0
contact 0
day 0
duration 0
campaign 0
pdays 0
previous 0
poutcome 0
subscribed 0
dtype: int64
In [66]: corr = train.corr()
mask = np.array(corr)
plt.figure(figsize=(14, 5))
sn.heatmap(corr, linewidths=0.3, mask=mask, annot=True, square=True)
train = pd.get_dummies(train)
Logistic Regression
from sklearn.linear_model import LogisticRegression
lreg = LogisticRegression()
lreg.fit(x_train, y_train)
accuracy_score(y_value, pred)
0.8555766192733618
DECISION TREE
clf = DecisionTreeClassifier(max_depth=4)
clf.fit(x_train, y_train)
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=4, max_features=None,
max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, presort=False, random_state=None,
splitter='best')
accuracy_score(y_value, predict)
0.9072669826224329
test = pd.get_dummies(test)
submission = pd.DataFrame()
submission['subscribed'].replace(0, 'no', inplace=True)
submission['subscribed'].replace(1, 'yes', inplace=True)
submission.to_csv('submission.csv', header=True, index=False)
Chapter 3
Description of Modules
Module-1: Python for Data Science
Introduction to Python
Python is a high-level, general-purpose and very popular programming
language. Python (currently Python 3) is used in web development, machine
learning applications and many other cutting-edge areas of the software
industry. It is well suited to beginners as well as to programmers who are
experienced in other languages such as C++ and Java.
OPERATOR   DESCRIPTION                                                                        SYNTAX
>          Greater than: True if left operand is greater than the right                       x > y
<          Less than: True if left operand is less than the right                             x < y
==         Equal to: True if both operands are equal                                          x == y
<=         Less than or equal to: True if left operand is less than or equal to the right     x <= y
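As a small illustration of the relational operators listed above, the following sketch compares two example values. The values of x and y are assumptions chosen only for demonstration.

```python
# Example operands, chosen only for illustration
x, y = 7, 10

greater = x > y          # False: 7 is not greater than 10
less = x < y             # True: 7 is less than 10
equal = x == y           # False: the values differ
less_or_equal = x <= y   # True: 7 is less than 10

print(greater, less, equal, less_or_equal)
```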
c. Logical operators:
Logical operators perform Logical AND, Logical OR and
Logical NOT operations.
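A minimal sketch of the three logical operators, using two example Boolean values (a and b are illustrative names, not taken from the training material):

```python
a, b = True, False

and_result = a and b   # True only when both operands are true
or_result = a or b     # True when at least one operand is true
not_result = not a     # Inverts the truth value of a

print(and_result, or_result, not_result)
```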
d. Bitwise operators:
Bitwise operators act on bits and perform bit-by-bit operations.
| Bitwise OR x|y
~ Bitwise NOT ~x
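The bitwise OR and NOT operators from the table can be tried out as follows. The operand values are assumptions picked so that the binary patterns are easy to follow.

```python
x = 0b1010   # 10 in decimal
y = 0b0100   # 4 in decimal

or_result = x | y   # 0b1110: a bit is set if it is set in either operand
not_result = ~x     # Inverts every bit; for Python integers this equals -x - 1

print(or_result, bin(or_result), not_result)
```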
e. Assignment operators:
OPERATOR   DESCRIPTION                                                                                        SYNTAX     EQUIVALENT
%=         Modulus AND: Takes modulus using left and right operands and assigns the result to left operand    a %= b     a = a % b
//=        Divide (floor) AND: Divides left operand by right operand and assigns the floor value to left operand    a //= b    a = a // b
<<=        Performs Bitwise left shift on operands and assigns the value to left operand                       a <<= b    a = a << b
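The compound assignment operators for modulus, floor division and left shift can be checked with a short sketch. The starting values below are assumptions chosen for illustration.

```python
a = 17
a %= 5     # Same as a = a % 5, the remainder of 17 / 5

b = 17
b //= 5    # Same as b = b // 5, the floor of 17 / 5

c = 3
c <<= 2    # Same as c = c << 2, shifting the bits two places to the left

print(a, b, c)
```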
f. Special operators: There are some special types of operators, such as:
Identity operators:
is and is not are the identity operators. Both are used to check whether two
values are located at the same place in memory. Two variables
that are equal are not necessarily identical.
is True if the operands are identical
is not True if the operands are not identical
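The difference between equality and identity can be seen in a short sketch. The list names here are illustrative assumptions.

```python
first = [1, 2, 3]
second = [1, 2, 3]   # Equal contents, but a separate object in memory
alias = first        # A second name for the same object

equal_values = first == second      # True: the contents match
same_object = first is second       # False: two distinct objects
alias_same = alias is first         # True: both names refer to one object
different = first is not second     # True

print(equal_values, same_object, alias_same, different)
```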
Membership operators:
in and not in are the membership operators. They are used to test whether a
value or variable is present in a sequence.
in True if value is found in the sequence
not in True if value is not found in the sequence
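A minimal sketch of the membership operators on a string and a list (the example sequences are assumptions):

```python
vowels = "aeiou"
numbers = [1, 2, 3]

has_e = "e" in vowels          # True: "e" occurs in the string
missing_z = "z" not in vowels  # True: "z" does not occur in the string
has_three = 3 in numbers       # True: 3 is an element of the list

print(has_e, missing_z, has_three)
```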
OPERATOR   DESCRIPTION                         ASSOCIATIVITY
()         Parentheses                         left-to-right
**         Exponent                            right-to-left
* / %      Multiplication/division/modulus     left-to-right
+ -        Addition/subtraction                left-to-right
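The effect of precedence and associativity can be verified with a few arithmetic expressions. The numbers are arbitrary examples.

```python
right_assoc = 2 ** 3 ** 2     # Evaluated as 2 ** (3 ** 2) because ** is right-to-left
forced_left = (2 ** 3) ** 2   # Parentheses override the default grouping
mixed = 2 + 3 * 4             # Multiplication binds tighter than addition
grouped = (2 + 3) * 4         # Parentheses are evaluated first
left_assoc = 10 - 4 - 3       # Evaluated as (10 - 4) - 3

print(right_assoc, forced_left, mixed, grouped, left_assoc)
```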
b. If-else statements
An if-else statement evaluates a Boolean expression. If the condition is true,
the statements inside the if block are executed; if the condition is false, the
statements inside the else block are executed instead.
The else block runs only when the condition is false, so it is the place to
put the actions to perform when the condition does not hold.
Syntax:
if (Boolean expression):
    Block of code #Set of statements to execute if condition is true
else:
    Block of code #Set of statements to execute if condition is false
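A minimal runnable sketch of the if-else form follows. The function name parity is a hypothetical example, not part of the training material.

```python
def parity(number):
    """Return 'even' or 'odd' for an integer."""
    if number % 2 == 0:
        # Executed when the condition is true
        return "even"
    else:
        # Executed when the condition is false
        return "odd"

print(parity(4), parity(7))
```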
c. elif statements
In Python, we have one more conditional statement called the elif statement.
An elif statement is used to check a further condition only if the preceding if
condition is false. It is similar to an if-else statement; the only difference is
that else does not check a condition, whereas elif does.
elif statements therefore allow a single if statement to evaluate
multiple conditions.
Syntax:
if (condition):
    #Set of statements to execute if condition is true
elif (condition):
    #Set of statements to be executed when if condition is false and elif condition is true
else:
    #Set of statements to be executed when both if and elif conditions are false
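The if / elif / else form can be sketched with a small function that classifies a number. The function name sign is an illustrative assumption.

```python
def sign(number):
    """Classify a number as positive, negative or zero."""
    if number > 0:
        return "positive"
    elif number < 0:
        # Checked only when the if condition is false
        return "negative"
    else:
        # Reached when both conditions above are false
        return "zero"

print(sign(5), sign(-2), sign(0))
```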
d. Nested if-else statements
if (condition):
    #Statements to execute if condition is true
    if (condition):
        #Statements to execute if condition is true
    else:
        #Statements to execute if condition is false
else:
    #Statements to execute if condition is false
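A nested conditional of this shape might look like the sketch below. The function can_enter and its rules are hypothetical, invented only to show an if-else placed inside another if block.

```python
def can_enter(age, has_ticket):
    """Decide entry using an if-else nested inside an outer if-else."""
    if age >= 18:
        # The inner check runs only when the outer condition is true
        if has_ticket:
            return "allowed"
        else:
            return "ticket required"
    else:
        return "too young"

print(can_enter(20, True), can_enter(20, False), can_enter(15, True))
```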
e. elif Ladder
We have seen the elif statement, but what is an elif ladder? As the
name suggests, it is a program containing a series of elif statements
structured like the rungs of a ladder.
It is used to test multiple expressions in turn.
Syntax:
if (condition):
    #Set of statements to execute if condition is true
elif (condition):
    #Set of statements to be executed when if condition is false and elif condition is true
elif (condition):
    #Set of statements to be executed when both if and first elif conditions are false and second elif condition is true
elif (condition):
    #Set of statements to be executed when if, first elif and second elif conditions are false and third elif condition is true
else:
    #Set of statements to be executed when all if and elif conditions are false
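As an illustration of an elif ladder, the sketch below maps marks to a letter grade. The function name and the grade boundaries are assumptions made up for this example.

```python
def grade(marks):
    """Map marks out of 100 to a letter grade using an elif ladder."""
    if marks >= 90:
        return "A"
    elif marks >= 75:
        return "B"
    elif marks >= 60:
        return "C"
    elif marks >= 40:
        return "D"
    else:
        # Reached only when every condition above is false
        return "F"

print([grade(m) for m in (95, 80, 65, 45, 10)])
```

Because the conditions are tested from top to bottom, each rung only needs to state its lower bound.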
Looping Constructs
Loops:
a. while loop:
Repeats a statement or group of statements while a given condition is TRUE.
It tests the condition before executing the loop body.
Syntax:
while expression:
    statement(s)
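A minimal while loop that adds the numbers from 5 down to 1 is sketched below. The variable names are illustrative.

```python
n = 5
total = 0

# The condition is tested before every pass through the loop body
while n > 0:
    total += n
    n -= 1

print(total, n)
```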
b. for loop:
Executes a sequence of statements multiple times and abbreviates the code
that manages the loop variable.
Syntax:
for iterating_var in sequence:
    statement(s)
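A short sketch of a for loop that squares each item of an example list (the data is an assumption):

```python
squares = []

# The loop variable takes each value of the sequence in turn
for value in [1, 2, 3, 4]:
    squares.append(value * value)

print(squares)
```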
c. nested loops:
You can use one or more loops inside another while or for loop.
Syntax of nested for loop:
for iterating_var in sequence:
    for iterating_var in sequence:
        statement(s)
    statement(s)
Syntax of nested while loop:
while expression:
    while expression:
        statement(s)
    statement(s)
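Nested loops can be sketched by pairing every row number with every column number. The ranges used here are arbitrary.

```python
pairs = []

for row in range(1, 3):        # Outer loop: rows 1 and 2
    for col in range(1, 4):    # Inner loop runs fully for each row: columns 1 to 3
        pairs.append((row, col))

print(pairs)
```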
Loop Control Statements:
a. break statement:
Terminates the loop statement and transfers execution to the statement
immediately following the loop.
b. continue statement:
Causes the loop to skip the remainder of its body and immediately retest
its condition prior to reiterating.
c. pass statement:
The pass statement in Python is used when a statement is required
syntactically but you do not want any command or code to execute.
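The three loop control statements can be combined in one small sketch. The thresholds below are assumptions chosen so that each statement is triggered at least once.

```python
collected = []

for number in range(1, 10):
    if number % 2 == 0:
        continue   # Skip even numbers and move on to the next iteration
    if number > 7:
        break      # Leave the loop entirely once the number exceeds 7
    if number == 3:
        pass       # Placeholder: syntactically required, does nothing
    collected.append(number)

print(collected)
```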
Functions
A. Built-in Functions or pre-defined functions:
These are the functions which are already defined by Python. For example:
id(), type(), print(), etc.
B. User-Defined Functions:
These are functions that are defined by the user for simplicity and to avoid
repetition of code. They are created using the def keyword.
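A user-defined function and a built-in function can be used together as in the sketch below. The function rectangle_area is a hypothetical example.

```python
def rectangle_area(width, height):
    """User-defined function: return the area of a rectangle."""
    return width * height

area = rectangle_area(3, 4)

# type() and print() are built-in functions
type_name = type(area).__name__
print(area, type_name)
```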
Data Structure
Python has built-in support for data structures, which enable you to store
and access data. These structures are called List, Dictionary, Tuple and Set.
Lists
Lists in Python are the most versatile data structure. They are used to store
heterogeneous data items, from integers to strings or even another list! They
are also mutable, which means that their elements can be changed even after
the list is created.
Creating Lists
Lists are created by enclosing elements within [square] brackets and each
item is separated by a comma.
Creating lists in Python
Since each element in a list has its own distinct position, having duplicate
values in a list is not a problem.
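A short sketch of creating lists, including one with mixed types and one with duplicate values. The contents are made up for illustration.

```python
# A list can hold items of different types, even another list
mixed = [1, "two", 3.0, [4, 5]]

# Duplicate values are allowed because each item has its own position
duplicates = [7, 7, 8]

# Lists are mutable: an element can be replaced after creation
mixed[0] = "one"

print(mixed, duplicates)
```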
Accessing List elements
To access elements of a list, we use Indexing. Each element in a list has an
index related to it depending on its position in the list. The first element of
the list has the index 0, the next element has index 1, and so on. The last
element of the list has an index of one less than the length of the list.
Indexing in Python lists
While positive indexes return elements from the start of the list, negative
indexes return values from the end of the list. This saves us from the trivial
calculation which we would otherwise have to perform if we wanted to return
the nth element from the end of the list. So instead of trying to return the
List_name[len(List_name)-1] element, we can simply write List_name[-1].
Using negative indexes, we can return the nth element from the end of the
list easily. If we wanted to return the first element from the end, or the last
index, the associated index is -1. Similarly, the index for the second last
element will be -2, and so on. Remember, the 0th index will still refer to the
very first element in the list.
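Positive and negative indexing can be compared on a small example list (the list contents are an assumption):

```python
letters = ["a", "b", "c", "d"]

first = letters[0]                      # Index 0 is the first element
last_long = letters[len(letters) - 1]   # Last element using the length
last_short = letters[-1]                # Same element using a negative index
second_last = letters[-2]               # Second element from the end

print(first, last_long, last_short, second_last)
```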
Appending values in Lists
We can add new elements to an existing list using the append() or insert() methods:
append () – Adds an element to the end of the list
insert() – Adds an element to a specific position in the list which needs to be
specified along with the value
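A minimal sketch of append() and insert() on an example list:

```python
numbers = [1, 2, 3]

numbers.append(4)       # Adds 4 at the end of the list
numbers.insert(1, 99)   # Inserts 99 at index 1, shifting later items right

print(numbers)
```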
Removing elements from Lists
Removing elements from a list is as easy as adding them and can be done
using the remove() or pop() methods:
remove() – Removes the first occurrence from the list that matches the given
value
pop() – This is used when we want to remove an element at a specified
index from the list. However, if we don't provide an index value, the last
element will be removed from the list.
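The behaviour of remove() and pop() can be followed step by step in the sketch below. The list contents are illustrative.

```python
items = ["a", "b", "a", "c"]

items.remove("a")         # Removes only the first "a"
after_remove = list(items)

popped_first = items.pop(0)   # Removes and returns the element at index 0
popped_last = items.pop()     # No index given: removes and returns the last element

print(after_remove, popped_first, popped_last, items)
```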
Sorting Lists
When a list of strings is sorted, two strings are compared by the integer
values of their characters, starting from the beginning. If the characters at
the same position in both strings are equal, the next characters are compared,
and so on until two differing characters are found.
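The character-by-character comparison can be observed when sorting a small list of words. The words are arbitrary examples; sorted() returns a new list, while the sort() method reorders the list in place.

```python
words = ["banana", "apple", "apricot"]

# "apple" and "apricot" share "ap"; the third characters decide the order
p_code, r_code = ord("p"), ord("r")

ordered = sorted(words)   # Returns a new sorted list
words.sort()              # Sorts the original list in place

print(p_code, r_code, ordered)
```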
Concatenating Lists
We can even concatenate two or more lists by simply using the + symbol.
This will return a new list containing elements from both the lists:
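For example, with two small lists whose contents are chosen only for illustration:

```python
first_half = [1, 2]
second_half = [3, 4]

combined = first_half + second_half   # A new list; the originals are unchanged

print(combined, first_half, second_half)
```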
List comprehensions
A very interesting application of Lists is List comprehension which provides
a neat way of creating new lists. These new lists are created by applying an
operation on each element of an existing list. It will be easy to see their
impact if we first check out how it can be done using the good old for-loops.
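The comparison can be sketched as follows, first with a for loop and then with an equivalent list comprehension. The input list is an assumption.

```python
numbers = [1, 2, 3, 4]

# Using a for loop
squares_loop = []
for n in numbers:
    squares_loop.append(n * n)

# Using a list comprehension: the same operation in a single expression
squares_comp = [n * n for n in numbers]

print(squares_loop, squares_comp)
```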
Stacks & Queues using Lists
A list is an in-built data structure in Python. But we can use it to create user-
defined data structures. Two very popular user-defined data structures built
using lists are Stacks and Queues.
Stacks are a list of elements in which the addition or deletion of elements is
done from the end of the list. Think of it as a stack of books. Whenever you
need to add or remove a book from the stack, you do it from the top. It uses
the simple concept of Last-In-First-Out.
Queues, on the other hand, are a list of elements in which the addition of
elements takes place at the end of the list, but the deletion of elements takes
place from the front of the list. You can think of it as a queue in the real
world. The queue becomes shorter when people at the front exit the
queue. The queue becomes longer when someone new joins the queue
at the end. It uses the concept of First-In-First-Out.
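Both structures can be sketched with plain lists, as described above. The item names are made up. Note that pop(0) on a list has to shift every remaining element, so collections.deque from the standard library is often preferred for larger queues.

```python
# Stack: add and remove at the end of the list (Last-In-First-Out)
stack = []
stack.append("book 1")
stack.append("book 2")
stack.append("book 3")
top = stack.pop()            # The most recently added item comes off first

# Queue: add at the end, remove from the front (First-In-First-Out)
queue = []
queue.append("person 1")
queue.append("person 2")
queue.append("person 3")
served = queue.pop(0)        # The earliest added item leaves first

print(top, stack, served, queue)
```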
Module-2: Understanding the Statistics for Data Science
Introduction to Statistics
Statistics is a field of mathematics that deals with the collection,
tabulation and interpretation of numerical data. It is a form of mathematical
analysis that uses different quantitative models to represent a set of
experimental data or real-life studies. As an area of applied mathematics, it
is concerned with data collection, analysis, interpretation and presentation,
and with how data can be used to solve complex problems. Some people consider
statistics to be a distinct mathematical science rather than a branch of
mathematics. Statistics makes work easier and provides a clear picture of the
work you do on a regular basis.
Basic terminology of Statistics:
Population –
It is the collection of individuals, objects or events
whose properties are to be analyzed.
Sample –
It is the subset of a population.
Types of Statistics :
Measures of Central Tendency
(i) Mean :
It is the average of all values in a sample set.
For example,
(ii) Median:
It is the central value of a sample set. The data set is ordered
from lowest to highest value and the exact middle value is taken.
For example,
(iii) Mode:
It is the value that occurs most frequently in a sample set, that is, the
value repeated the greatest number of times.
For example,
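The three measures can be computed with Python's standard statistics module. The sample below is made-up data, used only to show the calls.

```python
import statistics

sample = [2, 3, 3, 5, 7]   # Illustrative data, already sorted

mean_value = statistics.mean(sample)       # Sum of values divided by their count
median_value = statistics.median(sample)   # Middle value of the ordered data
mode_value = statistics.mode(sample)       # Most frequently occurring value

print(mean_value, median_value, mode_value)
```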
Data Distribution
Terms related to Exploration of Data Distribution
-> Boxplot
-> Frequency Table
-> Histogram
-> Density Plot
Boxplot : It is based on the percentiles of the data as shown in the figure
below. The top and bottom of the box are the 75th and 25th percentiles of
the data. The extended lines are known as whiskers and cover the range
of the rest of the data.
# BoxPlot Population In Millions
import matplotlib.pyplot as plt
fig, ax1 = plt.subplots()
fig.set_size_inches(9, 15)
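The percentiles that define the box can also be computed directly. The sketch below uses statistics.quantiles from the standard library (Python 3.8 or later) on made-up data; the "inclusive" method is only one of several quartile conventions, so plotting libraries may report slightly different values.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # Illustrative data

# n=4 splits the data into quartiles and returns the three cut points
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1   # Interquartile range: the height of the box

print(q1, q2, q3, iqr)
```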
Frequency Table: It is a tool that divides the data into equally spaced
ranges (segments) and tells us how many values fall in each segment.
Histogram: It is a way of visualizing the data distribution through a frequency
table, with bins on the x-axis and data count on the y-axis.
Code – Histogram
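The original code is not reproduced here. As a stand-in, the following minimal sketch builds a frequency table with equally spaced bins and prints it as a text histogram using only the standard library. The data and the bin width are assumptions.

```python
values = [1, 2, 2, 3, 5, 6, 7, 8, 8, 9]   # Illustrative data
bin_width = 3                              # Assumed width of each bin
lowest = min(values)

# Frequency table: map the start of each bin to the number of values in it
counts = {}
for v in values:
    start = lowest + ((v - lowest) // bin_width) * bin_width
    counts[start] = counts.get(start, 0) + 1

# Text histogram: one row per bin, one '#' per value
for start in sorted(counts):
    print(f"{start}-{start + bin_width - 1}: {'#' * counts[start]}")
```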
Introduction to Probability
Probability refers to how likely an event is to occur. When an experiment is
performed, like throwing a ball or picking a card from a deck, there must be
some probability associated with each resulting event.
In terms of mathematics, probability refers to the ratio of wanted outcomes to
the total number of possible outcomes. There are three approaches to the
theory of probability, namely:
1. Empirical Approach
2. Classical Approach
3. Axiomatic Approach
In this section, we study the Axiomatic Approach. In this
approach, we represent the probability in terms of the sample space (S) and
other terms.
Basic Terminologies:
Random Experiment – If an experiment is repeated several times under
similar conditions and does not produce the same outcome every time, but
the outcome of each trial is one of several possible outcomes, then it is
called a random experiment or a probabilistic experiment.
Elementary Event – An elementary event is a single outcome of a random
experiment. Whenever the random experiment is performed, each
associated outcome is known as an elementary event.
Sample Space – The sample space is the set of all
possible outcomes of a random experiment. For example, when a coin
is tossed, the possible outcomes are head and tail.
Event – An event is a subset of the sample space
associated with a random experiment.
Occurrence of an Event – An event associated with a random
experiment is said to occur if any one of the elementary events
belonging to it is the outcome.
Sure Event – An event associated with a random experiment is said
to be a sure event if it always occurs whenever the random
experiment is performed.
Impossible Event – An event associated with a random experiment is
said to be an impossible event if it never occurs whenever the
random experiment is performed.
Compound Event – An event associated with a random experiment is
said to be a compound event if it is the disjoint union of two or
more elementary events.
Mutually Exclusive Events – Two or more events associated
with a random experiment are said to be mutually exclusive if the
occurrence of any one of them prevents the occurrence of all
the others. This means that no two of the events can occur simultaneously.
Exhaustive Events – Two or more events associated with a
random experiment are said to be exhaustive events if their union is
the sample space.
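Several of these terms can be illustrated with Python sets for a single roll of a die. The event names below are assumptions chosen for the example.

```python
sample_space = {1, 2, 3, 4, 5, 6}   # All possible outcomes of one roll

even = {2, 4, 6}        # A compound event: union of three elementary events
odd = {1, 3, 5}
at_most_six = {1, 2, 3, 4, 5, 6}
seven = set()           # No outcome belongs to it

is_event = even <= sample_space                 # An event is a subset of the sample space
mutually_exclusive = even.isdisjoint(odd)       # No outcome belongs to both events
exhaustive = (even | odd) == sample_space       # Their union is the whole sample space
sure_event = at_most_six == sample_space        # Occurs on every roll
impossible_event = len(seven) == 0              # Never occurs

print(is_event, mutually_exclusive, exhaustive, sure_event, impossible_event)
```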
Probability of an Event – If there are total p possible outcomes associated
with a random experiment and q of them are favourable outcomes to the event
A, then the probability of event A is denoted by P(A) and is given by
P(A) = q/p
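As a worked example of P(A) = q/p, consider the event "an even number is rolled" on a fair six-sided die. The sketch below uses the standard fractions module so that the result stays exact; the choice of event is an assumption made for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
event_a = {outcome for outcome in sample_space if outcome % 2 == 0}

p = len(sample_space)   # Total number of possible outcomes
q = len(event_a)        # Number of outcomes favourable to A

probability = Fraction(q, p)   # P(A) = q/p, reduced automatically

print(q, p, probability)
```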
Probabilities of Discrete and Continuous Variables
A random variable is a function that maps the sample space to the set of
real numbers. Its purpose is to summarise the result of a particular
situation for which the probabilities of the different outcomes are given.
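A discrete random variable can be sketched as an ordinary function on the sample space. In the example below, which is an assumption made for illustration, two fair coins are tossed and the random variable counts the number of heads.

```python
from fractions import Fraction
from itertools import product

# Sample space of two coin tosses: ('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')
sample_space = list(product("HT", repeat=2))

def heads(outcome):
    """Random variable: maps an outcome to the number of heads in it."""
    return outcome.count("H")

# Probability distribution of the random variable (outcomes are equally likely)
distribution = {}
for outcome in sample_space:
    x = heads(outcome)
    distribution[x] = distribution.get(x, 0) + Fraction(1, len(sample_space))

expected_value = sum(x * prob for x, prob in distribution.items())

print(distribution, expected_value)
```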
Conclusion
Data science is one of the most innovative fields in the modern world.
It offers ways of tackling the challenges posed by increasing demand and
the need for a sustainable future. The need for data scientists is growing
along with the importance of data science. Data scientists are the world's
future. A data scientist must therefore be able to offer excellent
solutions that address the problems in all industries.
Future Scope