
SUMMER INTERNSHIP REPORT

ON

PYTHON AND DATA ANALYTICS

TechTrunk Ventures Pvt. Ltd.

Submitted in partial fulfilment of the requirement for the degree of

Bachelor Of Engineering

Department of Electronics and Communication Engineering

In

Deccan College of Engineering and Technology

Affiliated to Osmania University

By

Syed Abrar Ur Rahman

160319735061
DECLARATION

I hereby certify that the work presented in the report entitled
“Python and Data Analytics”, in fulfilment of the requirement for completion of the
four-week internship in the Department of Electronics and Communication
Engineering of “Deccan College of Engineering and Technology”, is an authentic
record of my own work carried out during the internship.

Syed Abrar Ur Rahman

1603-1973-5061

ECE, 7th Sem

DCET, Hyderabad

CERTIFICATE

This is to certify that the project entitled “Python and Data Analytics”, submitted by “Syed
Abrar Ur Rahman” (1603-1973-5061) of B.E. III year, Department of Electronics and
Communication Engineering (ECE), DECCAN COLLEGE OF ENGINEERING AND
TECHNOLOGY, affiliated to OU, Hyderabad, in partial fulfilment for the award of Bachelor
of Engineering, is a record of bona fide work carried out by him.

TechTrunk Ventures Pvt. Ltd.

ACKNOWLEDGEMENT

It is with a sense of gratitude that I acknowledge the efforts of the many well-wishers who
have, each in their own way, contributed to the success and completion of the Summer
Training. Successful completion of any technical work requires help from a number of
people, and I have also taken help from different people in preparing this report. This is a
small effort to express my deep gratitude to them.

First, I express from the bottom of my heart my gratitude and indebtedness to our training
mentor, Ashish Panday, for his immense support and guidance throughout the training.
Without his kind direction and guidance this study would have had little success. In every
phase of the project his supervision and guidance shaped this training and helped bring it
to completion.

TABLE OF CONTENTS

Contents

DECLARATION
CERTIFICATE
ACKNOWLEDGEMENT
1. INTRODUCTION TO PYTHON PROGRAMMING
2. DATA TYPES IN PYTHON
3. CONDITIONAL AND ITERATIVE STATEMENTS IN PYTHON
4. PYTHON LIBRARIES USED IN DATA ANALYTICS
5. EXPLORATORY DATA ANALYSIS
6. EDA ON TITANIC DATASET
7. CONCLUSION

1. INTRODUCTION TO PYTHON PROGRAMMING

Python is an interpreted, high-level, general-purpose programming language. It has efficient
high-level data structures and a simple but effective approach to object-oriented
programming. Python’s elegant syntax and dynamic typing, together with its interpreted
nature, make it an ideal language for scripting and rapid application development in many
areas on most platforms.

Python for Data science:

Why Python?

1. Python is an open-source language.

2. Its syntax is simple and reads much like English.

3. It has a very large and collaborative developer community.

4. It offers an extensive collection of packages.

• UNDERSTANDING OPERATORS:

Theory of operators: Operators are symbols that represent mathematical and logical operations on values.

• VARIABLES AND DATATYPES:

Variables are names bound to objects. Basic data types in Python include int (integer), float,
bool (Boolean) and str (string).

• CONDITIONAL STATEMENTS:

If-else statements (Single condition)

If- elif- else statements (Multiple Condition)

• LOOPING CONSTRUCTS:

For loop

While loop

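Do-while loop (Python has no built-in do-while statement; the same behaviour is obtained
with a while loop that breaks out after the body has run)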
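Since Python has no dedicated do-while statement, the run-at-least-once behaviour is usually emulated. The following is a minimal sketch of one common idiom; the variable names and the starting value are illustrative assumptions only.

```python
# Emulating a do-while loop: the body runs once before the condition is tested.
count = 3          # hypothetical starting value
visited = []

while True:
    visited.append(count)   # loop body
    count -= 1
    if count <= 0:          # exit condition, checked after the body
        break

print(visited)
```

Because the test sits at the end of the body, the body would still run once even if `count` started at 0.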

• FUNCTIONS:

Functions are reusable pieces of code, each created to solve a specific problem.

Two types: built-in functions and user-defined functions.
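As a small illustration of the two kinds of function, the sketch below calls the built-in `len()` and defines a user-defined function. The function name `average` and the sample marks are hypothetical.

```python
def average(values):
    """User-defined function: arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


marks = [72, 85, 90]        # hypothetical sample data

print(len(marks))           # built-in function
print(average(marks))       # user-defined function
```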

• DATA STRUCTURES:

Two commonly used data structures:

LISTS: A list is an ordered, mutable data structure whose elements are separated by commas and
enclosed within square brackets [].

DICTIONARY: A dictionary is a data structure whose elements are stored as key: value pairs,
separated by commas and enclosed within curly braces {}. Values are looked up by key rather
than by position (since Python 3.7, dictionaries also preserve insertion order).
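A short sketch of both structures in use follows. The subject names and student fields are made-up illustrative values.

```python
# List: ordered, accessed by position
subjects = ["Signals", "Networks", "Python"]
subjects.append("Analytics")      # add an element at the end
first = subjects[0]               # indexing starts at 0

# Dictionary: accessed by key
student = {"name": "Syed", "branch": "ECE"}
student["semester"] = 7           # add a new key: value pair
branch = student["branch"]

print(subjects, first)
print(student, branch)
```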

2. DATA TYPES IN PYTHON
Variables can hold values, and each value has a data type.

Python is a dynamically typed language, so we do not need to declare the type of a
variable when creating it.

Python provides the type() function to find the data type of a variable.
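Dynamic typing and `type()` can be seen together in a brief sketch: the same name is bound first to an integer and then to a string. The variable names are illustrative.

```python
value = 50
first_type = type(value)     # the type belongs to the object, not the name
print(first_type)

value = "fifty"              # the same name can be rebound to another type
second_type = type(value)
print(second_type)
```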

Figure 1 Data types in Python

2.1. Numeric Data Type


• A variable of this type stores numeric values.
• Integer, float, and complex values belong to the Python numeric data types.
• Syntax: variable_name = value
• Examples
a = 50      # integer
a = 2.35    # float
a = 2 + 5j  # complex
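The three numeric types can be inspected as in the sketch below, which uses separate (illustrative) names so that all three values stay available.

```python
a = 50          # int
b = 2.35        # float
c = 2 + 5j      # complex

print(type(a).__name__, type(b).__name__, type(c).__name__)

real_part = c.real      # real component of the complex number
imag_part = c.imag      # imaginary component
mixed = a + b           # int + float produces a float

print(real_part, imag_part, mixed)
```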

2.2. Dictionary Data Type
• A dictionary is a mapping type, similar to a hash table of key: value pairs.
• Dictionaries are enclosed by curly braces {} and values can be assigned and accessed
using square brackets [].
• Syntax: dict_name = {key0: value0, key1: value1}
• Example
Dict = {'Name': 'Syed', 'Class': 'ECE', 'Roll no': 5061}
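Reading and assigning values with square brackets might look like the following sketch. The name `record`, the updated class value and the "Email" key are hypothetical.

```python
record = {"Name": "Syed", "Class": "ECE", "Roll no": 5061}

name = record["Name"]            # access a value by its key
record["Class"] = "ECE-A"        # assign a new value to an existing key (made-up value)

# .get() returns a default instead of raising KeyError for a missing key
email = record.get("Email", "not provided")

print(name, record["Class"], email)
```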

2.3. Boolean Data Type


• A Boolean value can be either True (1) or False (0).
• Syntax: b = bool(x)  # x can be any value; 0 and empty values give False, most other values give True
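A few conversions, sketched below with illustrative names, show how `bool()` treats different inputs and how True and False behave as the integers 1 and 0.

```python
zero_is_false = bool(0)          # 0 converts to False
seven_is_true = bool(7)          # any non-zero integer converts to True
empty_is_false = bool("")        # empty string converts to False
list_is_true = bool([1])         # non-empty list converts to True

total = True + True              # bool is a subclass of int, so this is 1 + 1

print(zero_is_false, seven_is_true, empty_is_false, list_is_true, total)
```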

2.4. Set Data Type


• A set is an unordered collection of elements in which each element is unique.
• Syntax: set_name = {value0, value1, value2}
• Example
Myset = {0, 1, 2, 3, 4, 5}
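The uniqueness property is easy to see when a list with repeats is converted to a set, as in this sketch with made-up values.

```python
readings = [3, 1, 3, 2, 1]       # hypothetical list with duplicates
unique = set(readings)           # duplicates are dropped

evens = {0, 2, 4}
small = {0, 1, 2}
union = evens | small            # elements in either set
common = evens & small           # elements in both sets

print(unique, union, common)
```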

2.5. Tuple Data Type
• A tuple in Python is an ordered, immutable sequence of elements, which may be of different
data types such as integer, float, string, list or even another tuple.
• Tuples are enclosed by round brackets ().
• A tuple can hold any number of elements, including none: () is an empty tuple, and a
single-element tuple needs a trailing comma, as in (5,).
• Syntax: t = (element0, element1, element2)
• Example
T = (1, 2, 'python', (12, 86))  # tuple with 2 integer values, a string and another tuple
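The sketch below, with illustrative names, shows tuples of different lengths, indexing into a nested tuple, and what happens when an element assignment is attempted.

```python
empty = ()                       # tuple with no elements
single = (42,)                   # the trailing comma makes this a tuple
not_a_tuple = (42)               # without the comma this is just the int 42

nested = (1, 2, "python", (12, 86))
inner_first = nested[3][0]       # index into the inner tuple

# Tuples are immutable: item assignment raises TypeError
assignment_failed = False
try:
    nested[0] = 99
except TypeError:
    assignment_failed = True

print(len(empty), single, not_a_tuple, inner_first, assignment_failed)
```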

3. CONDITIONAL AND ITERATIVE STATEMENTS IN PYTHON

3.1. Conditional statements
• Conditional statements are used when we need to check a condition before
running a block of code. If the condition is True the block of code is
executed; otherwise it is skipped.
• The different types of conditional statements are
➢ if
➢ if else
➢ elif
➢ Nested if else

3.1.1. if
• In an if statement the condition is evaluated first; the indented block of
code runs only if the condition is True.
• Syntax:
if condition:
    statement
• Example
if a > b:
    print("a is greater than b")

3.1.2. if else
• The if else statement is used for decision making.
• One block of code is executed if the condition is True; otherwise the other block of code
is executed.
• Syntax:
if condition:
    True_statement
else:
    False_statement
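A runnable sketch of the if, elif and else branches together is given below. The function name `compare` and its messages are illustrative, not taken from the training material.

```python
def compare(a, b):
    """Return a message describing how a relates to b."""
    if a > b:
        return "a is greater than b"
    elif a < b:                      # checked only when the first condition is False
        return "a is less than b"
    else:                            # runs when no condition above was True
        return "a is equal to b"


print(compare(5, 3))
print(compare(2, 9))
print(compare(4, 4))
```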

3.1.3. Nested if else
• We can have an if else statement inside another if else statement.
• This is called nesting in computer programming.
• Example
if condition1:
    if condition2:
        statement
    else:
        statement
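As a concrete instance of nesting, the sketch below places an inner if else inside the True branch of an outer if else. The function name `classify` is hypothetical.

```python
def classify(number):
    """Label a number as negative, zero or positive using nested conditions."""
    if number >= 0:
        # inner if else, reached only when the outer condition is True
        if number == 0:
            return "zero"
        else:
            return "positive"
    else:
        return "negative"


print(classify(-4), classify(0), classify(9))
```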

3.2. Iterative statements in python


• The iterative statements are also known as looping statements or repetitive
statements.
• The iterative statements are used to execute a part of the program repeatedly
as long as a given condition is True.
• Python provides the following iterative statements
➢ while statement
➢ for statement

3.2.1. while statement


• The while statement is also known as an entry-controlled loop statement because the
given condition is checked first, and the statements are executed only if the
condition is True.
• Syntax:
while condition:
    statement1
    statement2
    ….
• Example:
i = 3
while i > 0:
    print(i)
    i = i - 1
• The execution flow of while statement is as shown
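The entry-controlled behaviour can be checked with a small sketch: one loop whose condition is True at the start and one whose condition is False from the outset. The counters and starting values are illustrative.

```python
# Condition is True on entry: the body runs until i reaches 0
i = 3
printed = []
while i > 0:
    printed.append(i)
    i = i - 1

# Condition is False on entry: the body never runs
j = 0
runs = 0
while j > 0:
    runs += 1
    j = j - 1

print(printed, runs)
```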

3.2.2. for statement
• The for statement is used to iterate over a sequence or other iterable such as a list, a tuple, a
set, a dictionary, or a string.
• Syntax:
for <variable> in <sequence>:
    statement1
    statement2
    ….
• Example:
for i in range(0, 5):
    print(i)
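The same for syntax works across several iterable types, as the sketch below suggests. The sample string and dictionary are illustrative values.

```python
# Iterating over a range of integers
squares = []
for i in range(0, 5):
    squares.append(i * i)

# Iterating over the characters of a string
letters = []
for ch in "ECE":
    letters.append(ch)

# Iterating over the key: value pairs of a dictionary
pairs = []
for key, value in {"Name": "Syed", "Class": "ECE"}.items():
    pairs.append((key, value))

print(squares, letters, pairs)
```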

4. PYTHON LIBRARIES USED IN DATA ANALYTICS

4.1. What is Data Analytics?


• Data Analytics refers to the techniques used to analyze data to enhance
productivity and business gain.
• Data is extracted from various sources and is cleaned and categorized to
analyze various behavioral patterns.

4.2. Python Libraries


4.2.1. Numpy
• NumPy stands for Numerical Python.
• It is used for numerical computing on array data.
• This library provides basic linear algebra functions, Fourier
transforms, and advanced random number capabilities.
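The three capabilities mentioned above can be sketched briefly as follows. The matrix, signal and seed are arbitrary illustrative values, and the example assumes NumPy is installed.

```python
import numpy as np

# Linear algebra: solve 2x + y = 3 and x + 3y = 5
matrix = np.array([[2.0, 1.0], [1.0, 3.0]])
rhs = np.array([3.0, 5.0])
solution = np.linalg.solve(matrix, rhs)

# Fourier transform of a short sampled signal
signal = np.array([1.0, 0.0, -1.0, 0.0])
spectrum = np.fft.fft(signal)

# Random numbers from a seeded generator (the seed makes runs repeatable)
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=5)

print(solution)
print(spectrum)
print(samples)
```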

4.2.2. Pandas
• Pandas is a fast, powerful, flexible and easy-to-use open-source
data analysis and manipulation tool.
• Pandas is well suited to handling tabular data. It can detect and fill missing data,
helps clean up data, and supports multiple file formats.
• It can read or load data from many sources such as CSV files, Excel files and SQL databases.
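A minimal sketch of loading CSV data and handling missing values is given below. To keep it self-contained, the CSV text is an in-memory string with made-up rows rather than a real file; the column names are assumptions.

```python
import io

import pandas as pd

# Hypothetical CSV content with two missing entries
csv_text = "name,age,city\nAsha,21,Hyderabad\nRavi,,Pune\nMeena,23,\n"
df = pd.read_csv(io.StringIO(csv_text))

# Count missing values in each column
missing_per_column = df.isnull().sum()

# Fill the missing age with the column mean, then drop rows with no city
df["age"] = df["age"].fillna(df["age"].mean())
df = df.dropna(subset=["city"])

print(missing_per_column)
print(df)
```

Whether to fill or drop a missing value depends on the column and the question being asked; the choices here are only for illustration.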

4.2.3. Matplotlib
• Matplotlib is used for visualization with Python.
• It is a graph-plotting library that serves as a
visualization utility.
• Different graphs such as line, bar (column), pie and scatter plots can be easily drawn
using Matplotlib.
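The four chart types named above can be drawn side by side as in this sketch. The data points and labels are made up, and the non-interactive "Agg" backend is an assumption chosen so the script also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: render off-screen, no window needed
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

fig, axes = plt.subplots(1, 4, figsize=(14, 3))

axes[0].plot(x, y)                       # line plot
axes[0].set_title("Line")

axes[1].bar(x, y)                        # bar (column) chart
axes[1].set_title("Bar")

axes[2].scatter(x, y)                    # scatter plot
axes[2].set_title("Scatter")

axes[3].pie([3, 1], labels=["A", "B"])   # pie chart with hypothetical shares
axes[3].set_title("Pie")

fig.tight_layout()
```

In an interactive session, `plt.show()` would display the figure, and `fig.savefig(...)` would write it to a file.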

5. EXPLORATORY DATA ANALYSIS

• Exploratory Data Analysis (EDA) is an approach within data analysis used for
gaining a better understanding of aspects of the data such as:
➢ Main features of the data.
➢ Variables and the relationships that hold between them.
➢ Identifying which variables are important for our problem.
• We shall look at various exploratory data analysis methods such as
➢ Descriptive statistics
➢ Grouping data
➢ Correlation
5.1. Descriptive Statistics


• Descriptive statistics are a helpful way to understand the characteristics of your
data and to get a quick summary of it.
• Pandas provides the describe() function to view summary statistics of the
data (by default, for the numeric columns).
• Any missing (NaN) value is automatically skipped.
• Syntax
import pandas as pd
DF = pd.read_csv("file.csv")
DF.describe()

• Another useful method is value_counts(), which returns the count of each
category in a categorical column (Series).
• Syntax
DF["Variable"].value_counts()
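Both methods can be tried on a small in-memory table, as sketched below. The rows are made-up illustrative values, not records from any real dataset.

```python
import pandas as pd

# Hypothetical data with one missing age
DF = pd.DataFrame(
    {
        "Age": [22.0, 38.0, None, 35.0],
        "Sex": ["male", "female", "male", "male"],
    }
)

summary = DF.describe()              # statistics for the numeric column(s)
counts = DF["Sex"].value_counts()    # number of rows in each category

print(summary)
print(counts)
```

The `count` row of the summary reports 3 for Age, which reflects the missing value being skipped.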

5.2. Grouping Data


• groupby() is a pandas method that helps us figure out the effect of different
categorical attributes on other variables in the data. It is normally followed by an
aggregation such as mean() or count().
• Syntax
DF.groupby('variable')
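On its own, `groupby()` returns a grouped object; a result appears once an aggregation is applied. The sketch below uses made-up rows with Titanic-style column names, which are assumptions for illustration.

```python
import pandas as pd

# Hypothetical rows, not actual passenger records
DF = pd.DataFrame(
    {
        "Pclass": [1, 1, 3, 3],
        "Survived": [1, 0, 0, 0],
        "Fare": [80.0, 60.0, 8.0, 10.0],
    }
)

# Mean of each column within each passenger class
survival_rate = DF.groupby("Pclass")["Survived"].mean()
mean_fare = DF.groupby("Pclass")["Fare"].mean()

print(survival_rate)
print(mean_fare)
```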

5.3. Correlation
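• Correlation measures the strength and direction of the relationship between two
variables, that is, how closely a change in one is accompanied by a change in the other.
It does not by itself show that one variable causes the other.
• The corr() function can be used to find the pairwise correlation between the numeric variables.
• Syntax
DF.corr()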
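A small sketch with made-up columns shows a perfect positive and a perfect negative correlation. It assumes pandas 1.5 or later, where `numeric_only=True` can be passed so that text columns are skipped instead of raising an error.

```python
import pandas as pd

# Hypothetical data: score rises with hours, absences fall with hours
DF = pd.DataFrame(
    {
        "hours": [1, 2, 3, 4],
        "score": [2, 4, 6, 8],
        "absences": [4, 3, 2, 1],
        "group": ["a", "b", "a", "b"],   # non-numeric column
    }
)

matrix = DF.corr(numeric_only=True)      # Pearson correlation by default

print(matrix)
```

Values close to +1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.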

6. EDA ON TITANIC DATASET

• Importing the necessary library modules.

• Reading the data.

• Finding the missing data in the dataset. (True = missing, False = present)
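The True/False view of missing data described above is what pandas' `isnull()` returns. The sketch below is only an illustration on a tiny hand-made table with Titanic-style column names; it is not the code or data used in the internship.

```python
import pandas as pd

# Hypothetical stand-in for the dataset (made-up rows)
titanic = pd.DataFrame(
    {
        "Survived": [0, 1, 1, 0],
        "Age": [22.0, None, 26.0, 35.0],
        "Cabin": [None, "C85", None, None],
    }
)

missing_mask = titanic.isnull()       # True where a value is missing
missing_counts = missing_mask.sum()   # number of missing values per column

print(missing_mask)
print(missing_counts)
```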

• Heat map of missing values. (Yellow = missing value)

• Description of Data

• Survival of people (1 = Survived, 0 = Did not survive); male vs female survival

• Survival with respect to Passenger class (Pclass)

• Survival with respect to Age.

• Fare paid by passengers.

• Data Cleaning.
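The cleaning steps themselves are not reproduced in this report. As a general illustration only, one common approach is to drop a mostly empty column, fill missing numeric values with the median, and drop the few rows that still have gaps. The table, the column names and the choice of steps below are all assumptions.

```python
import pandas as pd

# Hypothetical stand-in data (made-up rows)
titanic = pd.DataFrame(
    {
        "Age": [22.0, None, 26.0, 35.0],
        "Cabin": [None, "C85", None, None],
        "Embarked": ["S", "C", None, "S"],
    }
)

# Drop a column that is mostly missing
cleaned = titanic.drop(columns=["Cabin"])

# Fill missing ages with the median age
cleaned["Age"] = cleaned["Age"].fillna(cleaned["Age"].median())

# Drop rows that still lack an embarkation port
cleaned = cleaned.dropna(subset=["Embarked"])

print(cleaned)
print(cleaned.isnull().sum())
```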

• Pie chart of the male vs female population on the Titanic.

• Fare paid by male and female passengers.

• Correlation

• Passenger class distribution on the Titanic, shown as a pie chart.

• Age distribution of passengers.

• Fare paid with respect to Age.

• Male vs Female Survival

7. CONCLUSION

Exploratory Data Analysis (EDA) has been performed on the Titanic dataset.

The purpose of the analysis is to answer questions posed about the data. Data analytics
describes the dataset: analysis of data is a process of inspecting, cleaning, transforming and
modeling data with the goal of highlighting and visualizing useful information.

Data analytics has many benefits and makes working with data much more efficient.

Data analytics will be very important in the near future, as the need for manual
computation and analysis of data appears to be coming to an end.
