
Introduction to Python: one of the easy tools for analysis

Dr. Praveen Ranjan Srivastava
Indian Institute of Management (IIM), Rohtak
Python
Python is a general-purpose interpreted, interactive, object-oriented,
and high-level programming language. It was created by Guido van
Rossum.

•Python is Interpreted − Python is processed at runtime by the
interpreter. You do not need to compile your program before executing
it. This is similar to Perl and PHP.
•Python is Interactive − You can actually sit at a Python prompt and
interact with the interpreter directly to write your programs.
•Python is Object-Oriented − Python supports Object-Oriented style
or technique of programming that encapsulates code within objects.
•Python is a Beginner's Language − Python is a great language for
the beginner-level programmers and supports the development of a
wide range of applications from simple text processing to WWW
browsers to games.

Python Features
Python's features include −
•Easy-to-learn − Python has few keywords, simple structure, and a clearly defined syntax. This allows the
student to pick up the language quickly.
•Easy-to-read − Python code is more clearly defined and visible to the eyes.
•Easy-to-maintain − Python's source code is fairly easy-to-maintain.
•A broad standard library − The bulk of Python's library is very portable and cross-platform compatible
on UNIX, Windows, and Macintosh.
•Interactive Mode − Python has support for an interactive mode which allows interactive testing and
debugging of snippets of code.
•Portable − Python can run on a wide variety of hardware platforms and has the same interface on all
platforms.
•Extendable − You can add low-level modules to the Python interpreter. These modules enable
programmers to add to or customize their tools to be more efficient.
•Databases − Python provides interfaces to all major commercial databases.
•GUI Programming − Python supports GUI applications that can be created and ported to many system
calls, libraries and windows systems, such as Windows MFC, Macintosh, and the X Window system of
Unix.
•Scalable − Python provides a better structure and support for large programs than shell scripting.
Apart from the above-mentioned features, Python has many other good features. A few are listed below −
•It supports functional and structured programming methods as well as OOP.
•It can be used as a scripting language or can be compiled to byte-code for building large applications.
•It provides very high-level dynamic data types and supports dynamic type checking.
•It supports automatic garbage collection.
•It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java.

Getting Python
The most up-to-date and current source code, binaries,
documentation, news, etc., is available on the official
website of Python https://www.python.org/
You can download Python documentation
from https://www.python.org/doc/. The documentation
is available in HTML, PDF, and PostScript formats.


pip is the de facto standard package-management
system used to install and manage software
packages written in Python. Many packages can
be found in the default source for packages and
their dependencies, the Python Package
Index (PyPI).

pip install pandas

If you get the error "'pip' (or 'python') is not recognized as an internal or external command":

Go to Control Panel >> Uninstall or change a program


And that should solve your path issues, so jump to command prompt
and you can use pip now.
Now check pip in cmd window

If it still does not work:

C:\Users\admin\AppData\Local\Programs\Python\Python37

Click OK and restart the computer.
In the command prompt: py -m pip install --upgrade pip setuptools

If pip is not working:

https://www.youtube.com/watch?v=zYdHr-LxsJ0

https://www.youtube.com/watch?v=An2UBGAlzpU

Overview of Python Libraries for Data Scientists

Reading Data; Selecting and Filtering the Data; Data
manipulation, sorting, grouping, rearranging

Plotting the data

Descriptive statistics

Inferential statistics

Python Libraries for Data Science
Many popular Python toolboxes/libraries:
– NumPy
– SciPy
– Pandas
– SciKit-Learn

Visualization libraries
– matplotlib
– Seaborn

and many more …

NumPy:
• introduces objects for multidimensional arrays and matrices, as well as functions that make it easy to perform advanced mathematical and statistical operations on those objects
• provides vectorization of mathematical operations on arrays and matrices, which significantly improves performance
• many other Python libraries are built on NumPy

Link: http://www.numpy.org/


The most fundamental package, around which the scientific
computation stack is built, is NumPy (short for Numerical Python). It
provides useful features for operations on n-dimensional arrays and matrices in
Python. The library provides vectorization of mathematical operations
on the NumPy array type, which improves performance and
speeds up execution.
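
To make "vectorization" concrete, here is a minimal sketch. The price values and the 10% discount are made-up sample data, not part of the original slides. It compares an explicit Python loop with a single NumPy expression that is applied to the whole array.

```python
import numpy as np

# Hypothetical sample data: five prices and a uniform 10% discount.
prices = np.array([100.0, 250.0, 80.0, 40.0, 10.0])

# Plain-Python version: an explicit loop over every element.
discounted_loop = [p * 0.9 for p in prices]

# Vectorized version: one expression applied to the whole array at once.
discounted_vec = prices * 0.9

print(discounted_vec)
print(np.allclose(discounted_loop, discounted_vec))
```

Both versions give the same numbers; the vectorized form is shorter and, on large arrays, usually much faster because the loop runs inside NumPy.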

SciPy:
• collection of algorithms for linear algebra, differential equations, numerical integration, optimization, statistics and more
• part of SciPy Stack
• built on NumPy

Link: https://www.scipy.org/scipylib/

Pandas:
• adds data structures and tools designed to work with table-like data (Series and DataFrame, similar to data frames in R)
• provides tools for data manipulation: reshaping, merging, sorting, slicing, aggregation etc.
• allows handling missing data

Link: http://pandas.pydata.org/

The standard Python distribution doesn't come bundled with the
Pandas module. A lightweight alternative is to install
Pandas using the popular Python package installer, pip.
pip install pandas
If you install the Anaconda Python package, Pandas will be
installed by default.
Windows
Anaconda (from https://www.continuum.io) is a free Python
distribution for the SciPy stack.


SciKit-Learn:
• provides machine learning algorithms: classification, regression, clustering, model validation etc.
• built on NumPy, SciPy and matplotlib

Link: http://scikit-learn.org/

matplotlib:
• Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats
• a set of functionalities similar to those of MATLAB
• line plots, scatter plots, barcharts, histograms, pie charts etc.
• relatively low-level; some effort needed to create advanced visualization

Link: https://matplotlib.org/
Seaborn:
• based on matplotlib
• provides high level interface for drawing attractive statistical graphics
• similar (in style) to the popular ggplot2 library in R

Link: https://seaborn.pydata.org/
Seaborn is mostly focused on the visualization of statistical models;
such visualizations include heat maps and other plots that summarize the data
but still depict the overall distributions. Seaborn is based on Matplotlib
and depends heavily on it.

Setting up the Up and Down arrow keys


Now check that the Up key is working.
Repeat the same steps for the Down key.

os.system('cls')

>>> 5+98
103
>>> -35+4
-31
>>> 21/7
3.0
>>> 23/7
3.2857142857142856
>>> 23//7
3
>>> 2**5
32
>>> 3*4+5-6/2
14.0
>>> myVariable=30
>>> myVariable
30
>>> myVariable**2
900
>>> value=input("Enter the value:")
Enter the value:50
>>> value
'50'
>>> value+20
TypeError: can only concatenate str (not "int") to str
>>> value=int(input("Enter the value:"))
Enter the value:60
>>> value
60
>>> myVariable+value
90
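
The difference between the text returned by input() and a number can be sketched as below. The helper name add_to_input is hypothetical, and a fixed string stands in for what the user would type so the example runs without a keyboard.

```python
def add_to_input(raw_text, amount):
    """Convert text (as returned by input()) to an int, then add amount."""
    return int(raw_text) + amount


raw = "50"  # stands in for input("Enter the value:")
print(add_to_input(raw, 20))

# Adding a number directly to the text fails, because str + int is not allowed.
try:
    raw + 20
except TypeError as exc:
    print("TypeError:", exc)
```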

>>> 2**3
8
>>> pow(2,3)
8

How many inbuilt functions
>>> dir(__builtins__) #double underscore
>>> len("Hello")
5
>>> len("praveen ranjan srivastava")
25
>>> max(4,2,1,5,8)
8

>>> import math
>>> math.sqrt
<built-in function sqrt>
>>> math.sqrt(64)
8.0
>>> squareRoot=math.sqrt
>>> squareRoot(64)
8.0
Saving your Program


C:\Users\admin\AppData\Local\Programs\Python\Python37


Double-click on the file

List
>>> names=[]
>>> names
[]
>>> names=["mark","john","july"]
>>> names
['mark', 'john', 'july']
>>> names[1]
'john'
>>> names[-3]
'mark'
>>> print(names[0])
mark
>>> names.append("patrik")
>>> names
['mark', 'john', 'july', 'patrik']
>>> age=[23,12,32,11]
>>> names.extend(age)
>>> names
['mark', 'john', 'july', 'patrik', 23, 12, 32, 11]
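
A short sketch of how append() and extend() differ when the argument is itself a list. The ages list here is shortened sample data chosen for illustration.

```python
names = ["mark", "john", "july"]
ages = [23, 12]

appended = list(names)
appended.append(ages)   # the whole list becomes ONE new element

extended = list(names)
extended.extend(ages)   # each item of ages is added separately

print(len(appended), appended)
print(len(extended), extended)
```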
Python If ... Else

Python supports the usual logical conditions from mathematics:
•Equals: a == b
•Not Equals: a != b
•Less than: a < b
•Less than or equal to: a <= b
•Greater than: a > b
•Greater than or equal to: a >= b
These conditions can be used in several ways, most
commonly in "if statements" and loops.
An "if statement" is written by using the if keyword.

Python For Loops
>>> a = 33
>>> b = 200
>>> if b > a:
...     print("b is greater than a")

Elif
The elif keyword is Python's way of saying "if the previous conditions
were not true, then try this condition".

https://www.w3schools.com/python/python_for_loops.asp
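
The if, elif and else keywords can be combined as in the sketch below. The function name compare is a hypothetical helper used only so the three branches can be exercised with different inputs.

```python
def compare(a, b):
    """Return a message describing how a and b relate."""
    if b > a:
        return "b is greater than a"
    elif a == b:
        return "a and b are equal"
    else:
        return "a is greater than b"


print(compare(33, 200))
print(compare(33, 33))
print(compare(200, 33))
```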
Python For Loops

Exit the loop when x is "banana":

fruits = ["apple", "banana", "cherry"]
for x in fruits:
    print(x)
    if x == "banana":
        break

Do not print "banana"; skip it with continue and see the result:

fruits = ["apple", "banana", "cherry"]
for x in fruits:
    if x == "banana":
        continue
    print(x)
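
To compare the two loops side by side, the sketch below collects the visited items into lists instead of printing them. The function names until_banana and skip_banana are hypothetical.

```python
fruits = ["apple", "banana", "cherry"]


def until_banana(items):
    """Collect items, stopping right after "banana" (break)."""
    seen = []
    for x in items:
        seen.append(x)
        if x == "banana":
            break
    return seen


def skip_banana(items):
    """Collect every item except "banana" (continue)."""
    kept = []
    for x in items:
        if x == "banana":
            continue
        kept.append(x)
    return kept


print(until_banana(fruits))
print(skip_banana(fruits))
```

break ends the loop entirely, so "cherry" is never reached; continue only skips the current item.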
Range function 
total = 0                  # named total so the built-in sum() is not shadowed
for val in range(1, 6):    # 1, 2, 3, 4, 5
    total = total + val
print(total)
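
A small sketch of what range() produces. The stop value is excluded, and an optional third argument sets the step. The variable names are illustrative only.

```python
# range(start, stop) counts from start up to, but not including, stop.
numbers = list(range(1, 6))

# The built-in sum() adds them up without an explicit loop.
total = sum(range(1, 6))

# A third argument sets the step size.
evens = list(range(0, 10, 2))

print(numbers, total, evens)
```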

C:\Users\Your Name\AppData\Local\Programs\Python\Python36-32\Scripts>pip --version

NumPy
NumPy, which stands for Numerical Python, is a library consisting of
multidimensional array objects and a collection of routines for
processing those arrays. Using NumPy, mathematical and logical
operations on arrays can be performed. This tutorial explains the basics
of NumPy such as its architecture and environment. It also discusses the
various array functions, types of indexing, etc. An introduction to
Matplotlib is also provided. All this is explained with the help of
examples for better understanding.

pip install numpy


pip install pandas

Installing NumPy and pandas

python -m pip install --upgrade pip

>>> import numpy as np
>>> print(np.__version__)

Matrix multiplication
>>> import numpy as np
>>> p = [[1, 0], [0, 1]]
>>> q = [[1, 2], [3, 4]]
>>> print("original matrix:")
>>> print(p)
>>> print(q)
>>> result = np.dot(p, q)
>>> print(result)

>>> import numpy as np
>>> x = np.array([[1, 2], [4, 5]])
>>> y = np.array([[7, 8], [9, 10]])
>>> print(np.add(x, y))          # the element wise addition of matrix
[[ 8 10]
 [13 15]]
>>> print(np.subtract(x, y))     # the element wise subtraction of matrix
[[-6 -6]
 [-5 -5]]
>>> print(np.divide(x, y))       # the element wise division of matrix
[[ 0.14285714  0.25      ]
 [ 0.44444444  0.5       ]]
>>> print(np.multiply(x, y))     # the element wise multiplication of matrix
[[ 7 16]
 [36 50]]
>>> print(np.dot(x, y))          # the product of matrices
[[25 28]
 [73 82]]
>>> print(x.T)                   # transpose


# element-wise addition
from numpy import array
A = array([[1, 2, 3], [4, 5, 6]])
print(A)
B = array([[1, 2, 3], [4, 5, 6]])
print(B)
C = A + B
print(C)

# element-wise multiplication
from numpy import array
A = array([[1, 2, 3], [4, 5, 6]])
print(A)
B = array([[1, 2, 3], [4, 5, 6]])
print(B)
C = A * B
print(C)

The matrix multiplication operation can be implemented in NumPy using the dot() function.
from numpy import array
A = array([[1, 2], [3, 4], [5, 6]])
print(A)
B = array([[1, 2], [3, 4]])
print(B)
C = A.dot(B)
print(C)
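
The following sketch reuses the 3x2 and 2x2 arrays to show how the matrix product differs from the element-wise `*`. The `@` operator is used here as an equivalent spelling of dot() for 2-D arrays. The flag name elementwise_ok is illustrative.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])   # shape (3, 2)
B = np.array([[1, 2], [3, 4]])           # shape (2, 2)

# Matrix product: (3, 2) times (2, 2) gives (3, 2).
C = A @ B
print(C.shape)
print(C)

# Element-wise multiplication needs compatible shapes, so it fails here.
try:
    A * B
    elementwise_ok = True
except ValueError:
    elementwise_ok = False
print(elementwise_ok)
```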

# Import pandas package
import pandas as pd

# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age':[27, 24, 22, 32],
        'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd']}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)

# select two columns
print(df[['Name', 'Qualification']])
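
Beyond selecting columns, rows can be filtered and sorted. The sketch below reuses the same employee data. The age threshold of 25 is an arbitrary value chosen for illustration.

```python
import pandas as pd

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}
df = pd.DataFrame(data)

# Keep only the rows where Age is above 25 (boolean filtering).
older = df[df['Age'] > 25]

# Select two columns and sort by Age, oldest first.
result = older[['Name', 'Age']].sort_values('Age', ascending=False)
print(result)
```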

READING AND WRITING OPERATIONS IN PYTHON

Reading an excel file into Python 3.7 shell

The openpyxl library must be installed from the Command Prompt with pip
before the files can be opened:

pip install openpyxl

Opening the excel file in Python shell:
>>> import pandas as pd
>>> import numpy as np
>>> import openpyxl
>>> data = openpyxl.load_workbook(r'C:\Users\admin\Desktop\sales-funnel.xlsx')

To get the sheet names use
>>> data.sheetnames    # replaces the deprecated data.get_sheet_names()
['Sheet1']

• For fetching cell value through this sheet object
>>> sheet = data['Sheet1']    # replaces the deprecated data.get_sheet_by_name('Sheet1')
>>> sheet['A2'].value

To print all contents of excel file onto the terminal shell:
for r in range(1, 19):
    # print the row number followed by the values in columns 1 to 7
    print(r, *[sheet.cell(row=r, column=c).value for c in range(1, 8)])
Reading multiple excel files into single excel file
Paste the sales-jan-2014, Feb and March data files into the Downloads folder.
>>> import pandas as pd
>>> import numpy as np
>>> import glob

>>> path = r"C:\Users\admin\Downloads"
>>> file_identifier = "*.xlsx"
>>> frames = []
>>> for f in glob.glob(r"C:\Users\admin\Downloads\sales*.xlsx"):
...     frames.append(pd.read_excel(f))
...
>>> # DataFrame.append was removed in pandas 2.0; pd.concat combines the files instead
>>> all_data = pd.concat(frames, ignore_index=True)
>>> all_data.head()
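
Combining several tables into one can be tried without any Excel files. In the sketch below, two small in-memory DataFrames with made-up column names and values stand in for the monthly sales files.

```python
import pandas as pd

# Hypothetical stand-ins for two monthly sales files.
jan = pd.DataFrame({"name": ["A", "B"], "quantity": [10, 5]})
feb = pd.DataFrame({"name": ["A", "C"], "quantity": [7, 3]})

# Stack the rows of both tables; ignore_index renumbers the rows 0..n-1.
all_data = pd.concat([jan, feb], ignore_index=True)
print(all_data)
```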

>>>all_data.describe()

Some operations on combined file
Combining this file with the customer status file by a left join
(pd.merge joins on the columns the two files have in common):
The other file is named customer-status.xlsx (stored in the Downloads folder)
>>>status = pd.read_excel(r"C:\Users\admin\Downloads\customer-status.xlsx")
>>>all_data_st = pd.merge(all_data, status, how='left')
>>>all_data_st.head()
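
A left join keeps every row of the left table and fills in missing matches with NaN. The sketch below uses small made-up tables. The column names name, quantity and status are assumptions for illustration, not the columns of the actual files.

```python
import pandas as pd

# Hypothetical sales rows and a status lookup that lacks customer "C".
sales = pd.DataFrame({"name": ["A", "B", "C"], "quantity": [10, 5, 3]})
status = pd.DataFrame({"name": ["A", "B"], "status": ["gold", "silver"]})

# how="left" keeps all three sales rows; "C" gets NaN for status.
merged = pd.merge(sales, status, how="left", on="name")
print(merged)
```

Naming the key with `on` makes the join column explicit instead of relying on whichever column names the two tables happen to share.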

If you want to take a quick look at how your top tier customers are
performing compared to the bottom, use groupby to get the average
of the values:

>>>all_data_st.groupby(["status"])[["quantity","unit price","ext price"]].mean()
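
The groupby step can be sketched on a tiny made-up table; the values below are invented for illustration. Note the double brackets: a list of column names is needed to select several columns after groupby.

```python
import pandas as pd

# Hypothetical data: two customer tiers with two rows each.
df = pd.DataFrame({
    "status": ["gold", "gold", "silver", "silver"],
    "quantity": [10, 20, 4, 6],
    "unit price": [5.0, 7.0, 2.0, 4.0],
})

# Average of the selected columns within each status group.
means = df.groupby("status")[["quantity", "unit price"]].mean()
print(means)
```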

Writing into a data file using the openpyxl save command:
In the example, we create a new xlsx file. We write data into three cells.
from openpyxl import Workbook
import time
From the openpyxl module, we import the Workbook class. A workbook
is the container for all other parts of the document.
book = Workbook()
We create a new workbook. A workbook is always created with at least
one worksheet.
sheet = book.active
We get the reference to the active sheet.
sheet['A1'] = 56
sheet['A2'] = 43
We write numerical data to cells A1 and A2.
now = time.strftime("%x")
sheet['A3'] = now
We write current date to the cell A3.
book.save("sample.xlsx")
The file is saved in the Python folder:
C:\Users\admin\AppData\Local\Programs\Python\Python37
Writing new values using append() function
from openpyxl import Workbook
book = Workbook()
sheet = book.active

rows = (
(88, 46, 57),
(89, 38, 12),
(23, 59, 78),
(56, 21, 98),
(24, 18, 43),
(34, 15, 67)
)

for row in rows:
    sheet.append(row)
book.save('appending.xlsx')
Create and write on excel file using xlsxwriter module

pip install xlsxwriter

>>>import xlsxwriter
# 'hello.xlsx' is the filename that we want to create.
>>>workbook = xlsxwriter.Workbook('hello.xlsx')

#The workbook object is then used to add a new worksheet via the add_worksheet() method.
>>>worksheet = workbook.add_worksheet()

#Use the worksheet object to write data via the write() method.
>>>worksheet.write('A1', 'Hello..')
>>>worksheet.write('B1', 'Geeks')
>>>worksheet.write('C1', 'For')
>>>worksheet.write('D1', 'Geeks')

>>>workbook.close()

#Now you can see the file hello.xlsx created in your Python folder.

Data Analysis
Most people likely have experience with pivot tables in Excel.
Pandas provides a similar function called (appropriately enough)
pivot_table.
This session will also focus on explaining the pandas pivot_table
function and how to use it for your data analysis as a decision maker.
The Data
One of the challenges with using the pandas pivot_table is making
sure you understand your data and what questions you are trying to
answer with the pivot table.
It is a simple function but can produce very powerful analysis very
quickly.
In this scenario, I'm going to be tracking a sales pipeline (also called
funnel). The basic problem is that some sales cycles are very long
(think "enterprise software", capital equipment, etc.) and management
wants to understand it in more detail throughout the year.
How to Create a Pivot Table in Python using Pandas
>>> import pandas as pd
>>> Employees = {'Name of Employee': ['Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill'],
...              'Sales': [1000,300,400,500,800,1000,500,700,50,60,1000,900,750,200,300,1000,900,250,750,50],
...              'Quarter': [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4],
...              'Country': ['US','Japan','Brazil','UK','US','Brazil','Japan','Brazil','US','US','US','Japan','Brazil','UK','Brazil','Japan','Japan','Brazil','UK','US']
...              }
>>> Df1 = pd.DataFrame(Employees)
>>> print(Df1)
Scenario 1: Total sales per employee

>>> pivot = Df1.pivot_table(index=['Name of Employee'], values=['Sales'], aggfunc='sum')
>>> print(pivot)

Scenario 2: Total sales by country

>>> pivot = Df1.pivot_table(index=['Country'], values=['Sales'], aggfunc='sum')
>>> print(pivot)

Scenario 3: Sales by both employee and country

>>> pivot = Df1.pivot_table(index=['Name of Employee','Country'], values=['Sales'], aggfunc='sum')
>>> print(pivot)

Scenario 4: Max individual sale by country

>>> pivot = Df1.pivot_table(index=['Country'], values=['Sales'], aggfunc='max')
>>> print(pivot)

Scenario 5: Mean, median and min sales by country

>>> pivot = Df1.pivot_table(index=['Country'], values=['Sales'], aggfunc=['mean','median','min'])
>>> print(pivot)
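
To check by hand what several aggregation functions produce, the sketch below uses a five-row table with made-up sales figures. Passing the functions as a list gives one block of columns per function, in the order listed.

```python
import pandas as pd

# Hypothetical sales: two rows for the US and three for the UK.
sales = pd.DataFrame({
    "Country": ["US", "US", "UK", "UK", "UK"],
    "Sales": [100, 300, 50, 150, 400],
})

pivot = sales.pivot_table(index=["Country"], values=["Sales"],
                          aggfunc=["mean", "median", "min"])
print(pivot)

# Each column is addressed by the pair (function name, value column).
uk_median = pivot.loc["UK", ("median", "Sales")]
print(uk_median)
```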

Python Data Analysis

>>> # named df because the following steps refer to df
>>> df = pd.read_excel(r"C:\Users\admin\Downloads\sales-funnel.xlsx")


• For convenience sake, let's define the status column as a category
and set the order we want to view.

>>>df["Status"] = df["Status"].astype("category")

Pivot the data
As we build up the pivot table, I think it’s easiest to take it one step at
a time. Add items and check each step to verify you are getting the
results you expect.
The simplest pivot table must have a data frame and an index. In this
case, let's use the Name as our index.
pd.pivot_table(df,index=["Name"])


pd.pivot_table(df,index=["Name","Rep","Manager"])


Manager-wise summary

pd.pivot_table(df,index=["Manager","Rep"])

You can see that the pivot table is smart enough to start aggregating the
data and summarizing it by grouping the reps with their managers.
Now we start to get a glimpse of what a pivot table can do for us.
pd.pivot_table(df,index=["Manager","Rep"],values=["Price"])


The price column automatically averages the data but we can do a count
or a sum. Adding them is simple using aggfunc and np.sum.

pd.pivot_table(df,index=["Manager","Rep"],values=["Price"],aggfunc=np.sum)

aggfunc can take a list of functions. Let's try a mean using the numpy
mean function and len to get a count.

pd.pivot_table(df,index=["Manager","Rep"],values=["Price"],aggfunc=[np.mean,len])

Columns vs. Values
I think one of the confusing points with the pivot_table is the use of
columns and values. Remember, columns are optional - they provide
an additional way to segment the actual values you care about.
The aggregation functions are applied to the values you list.

pd.pivot_table(df,index=["Manager","Rep"],values=["Price"],columns=["Product"],aggfunc=[np.sum])


The NaN's are a bit distracting. If we want to remove them, we could
use fill_value to set them to 0.

pd.pivot_table(df,index=["Manager","Rep"],values=["Price"],columns=["Product"],aggfunc=[np.sum],fill_value=0)
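
The effect of columns and fill_value can be seen on a three-row table. The manager labels, products and prices below are made up for illustration and are not taken from the sales-funnel file.

```python
import pandas as pd

# Hypothetical deals: "Manager B" has no Software sale.
df = pd.DataFrame({
    "Manager": ["Manager A", "Manager A", "Manager B"],
    "Product": ["CPU", "Software", "CPU"],
    "Price": [30000, 10000, 35000],
})

# One row per manager, one column per product; empty cells become 0.
table = pd.pivot_table(df, index="Manager", columns="Product",
                       values="Price", aggfunc="sum", fill_value=0)
print(table)
```

Without fill_value=0, the Manager B / Software cell would show NaN because there is no matching row to aggregate.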

Add Quantity to the values list:
pd.pivot_table(df,index=["Manager","Rep"],values=["Price","Quantity"],columns=["Product"],aggfunc=[np.sum],fill_value=0)

The whole table may not be visible: only a few columns and dots are shown.

pip install Quandl

>>> import quandl

>>>pd.set_option('display.max_columns', None)

What's interesting is that you can move items to the index to get a
different visual representation. Remove Product from the columns and
add to the index.
pd.pivot_table(df,index=["Manager","Rep","Product"],values=["Price","Quantity"],aggfunc=[np.sum],fill_value=0)

table = pd.pivot_table(df,index=["Manager","Status"],columns=["Product"],values=["Quantity","Price"],aggfunc={"Quantity":len,"Price":[np.sum,np.mean]},fill_value=0)

table.query('Manager == ["Debra Henley"]')

table.query('Status == ["pending","won"]')

https://pbpython.com/pandas-pivot-table-explained.html

Introducing our data set: World Happiness Report
We'll use the World Happiness Report, which is a survey about
the state of global happiness. The report ranks more than 150
countries by their happiness levels, and has been published
almost every year since 2012. We use data collected in the years
2015, 2016, and 2017, which is available for Analysis.

Put the data in the Python folder first:

C:\Users\admin\AppData\Local\Programs\Python\Python37

data = pd.read_csv('data.csv', index_col=0)


# sort the df by ascending years and descending happiness scores
data.sort_values(['Year', "Happiness Score"], ascending=[True, False], inplace=True)
>>>data.head()
# getting an overview of our data
print("Our data has {0} rows and {1} columns".format(data.shape[0], data.shape[1]))
# checking for missing values
print("Are there missing values? {}".format(data.isnull().any().any()))
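
The missing-value check can be sketched on a small made-up table; the scores below are invented and are not from the World Happiness Report. isnull().any().any() answers "is anything missing at all?", while isnull().sum() counts the gaps per column.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing score.
data = pd.DataFrame({
    "Year": [2015, 2016, 2017],
    "Happiness Score": [7.5, np.nan, 7.4],
})

has_missing = data.isnull().any().any()   # True if any cell is missing
per_column = data.isnull().sum()          # missing count for each column

print("Are there missing values? {}".format(has_missing))
print(per_column)
```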

data.describe()

The describe() method reveals that Happiness Rank
ranges from 1 to 158, which means that the largest
number of surveyed countries for a given year was
158.
The Year column doesn't have any missing values.
First, it's displayed in the data set as int;
second, the count for Year is 495, which is
the number of rows in our data set.
By comparing the count value for Year to the other
columns, it seems we can expect 25 missing values
in each column (495 in Year vs. 470 in all other
columns).
Average score of happiness value year wise
pd.pivot_table(data, index= 'Year', values= "Happiness Score")
By passing Year as the index parameter,
we chose to group our data by Year.
The output is a pivot table that displays
the three different values for Year as
index, and the Happiness Score as values.
It's worth noting that the aggregation
default value is mean (or average),
so the values displayed in the Happiness
Score column are the yearly average for
all countries.
The table shows the average for all
countries was highest in 2016, and is
currently the lowest in the past three
years.
Region wise happiness score
pd.pivot_table(data, index = 'Region', values="Happiness Score")

Region and year wise happiness score

pd.pivot_table(data, index = ['Region', 'Year'], values="Happiness Score")

pd.pivot_table(data, index= 'Region', columns='Year',
values="Happiness Score")

Let's add the median, minimum, maximum, and the standard
deviation for each region. This can help us evaluate how accurate
the average is, and if it's really representative of the real picture.

pd.pivot_table(data, index= 'Region', values= "Happiness Score", aggfunc= [np.mean, np.median, min, max, np.std])

pip install matplotlib

pip install seaborn

pip install ipython


Thank you !!!
Introduction to ORANGE

Anaconda, Jupyter, Spyder — Things You Need To Know
Many environments are available for free on the internet where you
can start programming straight away, but Anaconda, powered by
Continuum Analytics, is an environment that anyone can use to
program in Python and R. If you wish to program in Python, Jupyter
Notebook is the place to start, giving you access to many scientific
and numeric libraries. If you are more of an analytical person than a
solver of scientific problems, RStudio offers a ready environment
for those who wish to code in R.
A. Installing Anaconda:
•Go to https://www.anaconda.com/download/ to get Anaconda.
•Select the OS that you have on your PC (Linux, Windows, Mac).
•After downloading the file, install the software on your system. See this video for more information on how to install Anaconda (YouTube video link).
•This will be the main screen of Anaconda. It is like a browser such as Chrome that holds many webpages, except that here it holds the different tools Anaconda offers, such as Jupyter, RStudio, Orange, Spyder, etc.

Building Data and Business Analytics Models is fun using Orange…
Why Orange?

• Orange is a platform built for mining and analysis on a GUI-based workflow. This means that you do not have to know how to code to work with Orange, mine data, crunch numbers and derive insights.
• You can perform tasks ranging from basic visuals to data manipulations, transformations, and data mining. It consolidates all the functions of the entire process into a single workflow.
• The best part and the differentiator about Orange is that it has some wonderful visuals. You can try silhouettes, heat-maps, geo-maps and all sorts of visualizations available.


Setting up your System

Step 1: Go to https://orange.biolab.si and click on Download.


Step 2: Install the platform and set the working directory for
Orange to store its files.

This is what the start-up page of Orange looks like. You
have options that allow you to create new projects, open
recent ones or view examples and get started.
Before we delve into how Orange works, let's define a few
key terms to help us in our understanding:
• A widget is the basic processing point of any data manipulation. It can do a number of actions based on what you choose in your widget selector on the left of the screen.
• A workflow is the sequence of steps or actions that you take in your platform to accomplish a particular task.
You can also go to "Example Workflows" on your start-up
screen to check out more workflows once you have created
your first one. For now, click on "New" and let's start
building your first workflow.
Few Basic Computations
Double-click on the Orange icon


For saving your workflow:


Name of your workflow


This area is known as the CANVAS

IF YOU HAVE EXTERNAL DATA


NO DATA??


GOING TOWARDS FEW GRAPHS

Thank you !!!