Session 4 PDF
for analysis
Dr. Praveen Ranjan Srivastava
Indian Institute of Management (IIM), Rohtak
Python
That should resolve the PATH issue, so open a command prompt;
you can now use pip.
Now check pip in cmd window
C:\Users\admin\AppData\Local\Programs\Python\Python37
Click OK and restart the computer.
In the command prompt: py -m pip install --upgrade pip setuptools
https://www.youtube.com/watch?v=An2UBGAlzpU
https://www.youtube.com/watch?v=zYdHr-LxsJ0
Descriptive statistics
Inferential statistics
Visualization libraries
– matplotlib
– Seaborn
built on NumPy
Link: https://www.scipy.org/scipylib/
Link: http://pandas.pydata.org/
SciKit-Learn:
provides machine learning algorithms: classification, regression,
clustering, model validation, etc.
Link: http://scikit-learn.org/
Link: https://seaborn.pydata.org/
Seaborn focuses mainly on visualizing statistical models; its plots,
such as heat maps, summarize the data while still depicting the overall
distributions. Seaborn is built on Matplotlib and depends heavily on it.
import os
os.system('cls')  # clears the console window on Windows
3.0 900
Elif
The elif keyword is Python's way of saying "if the previous conditions
were not true, then try this condition".
https://www.w3schools.com/python/python_for_loops.asp
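A minimal sketch of how elif chains conditions: each test runs only when all the earlier ones were false, and else catches whatever is left. The grade() function and its thresholds are hypothetical, chosen only for illustration.

```python
def grade(score):
    """Return a letter grade; the thresholds are illustrative assumptions."""
    if score >= 90:
        return "A"
    elif score >= 75:   # checked only if score < 90
        return "B"
    elif score >= 50:   # checked only if score < 75
        return "C"
    else:               # none of the conditions above were true
        return "F"


for s in (95, 80, 60, 30):
    print(s, grade(s))
```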
C:\Users\Your Name\AppData\Local\Programs\Python\Python36-32\Scripts>pip --version
result = np.dot(p, q)
print(result)
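The np.dot call above needs p and q to be defined first. A small sketch, assuming two example vectors (the values are hypothetical, not taken from the slides):

```python
import numpy as np

# p and q are hypothetical example vectors, not the values from the slide.
p = np.array([1, 2, 3])
q = np.array([4, 5, 6])

# For 1-D arrays, np.dot returns the inner product: 1*4 + 2*5 + 3*6.
result = np.dot(p, q)
print(result)

# For 2-D arrays, np.dot performs matrix multiplication.
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
print(np.dot(a, b))
```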
print(r,
      sheet.cell(row=r, column=1).value,
      sheet.cell(row=r, column=2).value,
      sheet.cell(row=r, column=3).value,
      sheet.cell(row=r, column=4).value,
      sheet.cell(row=r, column=5).value,
      sheet.cell(row=r, column=6).value,
      sheet.cell(row=r, column=7).value)
Reading multiple Excel files into a single Excel file
Paste the sales-jan-2014, Feb and March data files into the download folder.
>>> import pandas as pd
>>> import numpy as np
>>> import glob
all_data.head()
>>> all_data_st.groupby(["status"])[["quantity", "unit price", "ext price"]].mean()
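The steps above (collect the monthly files, stack them into one table, then average by status) can be sketched as follows. This is an illustration under stated assumptions: combine_excel() is a hypothetical helper, the jan and feb frames are made-up stand-ins so the sketch runs without the sales files on disk, and pd.read_excel needs an Excel engine such as openpyxl installed.

```python
import glob

import pandas as pd


def combine_excel(pattern):
    """Read every Excel file matching `pattern` and stack the rows.

    Hypothetical helper; raises ValueError if no file matches.
    """
    frames = [pd.read_excel(path) for path in sorted(glob.glob(pattern))]
    return pd.concat(frames, ignore_index=True)


# Stand-in data (made up) in place of the monthly sales files.
jan = pd.DataFrame({"status": ["gold", "silver"],
                    "quantity": [10, 4],
                    "unit price": [2.0, 5.0]})
feb = pd.DataFrame({"status": ["gold", "silver"],
                    "quantity": [6, 8],
                    "unit price": [4.0, 3.0]})

# Stack the monthly frames into one table with a fresh 0..n-1 index.
all_data = pd.concat([jan, feb], ignore_index=True)

# Double brackets select several columns from the grouped data.
summary = all_data.groupby("status")[["quantity", "unit price"]].mean()
print(summary)
```

With the real files in place, a call such as combine_excel("sales*.xlsx") would replace the stand-in frames; the file pattern is an assumption about how the downloads are named.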
rows = (
(88, 46, 57),
(89, 38, 12),
(23, 59, 78),
(56, 21, 98),
(24, 18, 43),
(34, 15, 67)
)
>>> import xlsxwriter
# Create a workbook; the argument is the filename that we want to create.
>>> workbook = xlsxwriter.Workbook('hello.xlsx')
# Add a worksheet to the workbook before writing to it.
>>> worksheet = workbook.add_worksheet()
# Use the worksheet object to write data via the write() method.
>>> worksheet.write('A1', 'Hello..')
>>> worksheet.write('B1', 'Geeks')
>>> worksheet.write('C1', 'For')
>>> worksheet.write('D1', 'Geeks')
>>> workbook.close()
# The file hello.xlsx is now created in your Python working folder.
>>> print(pivot)
>>>df["Status"] = df["Status"].astype("category")
pd.pivot_table(df,index=["Name","Rep","Manager"])
pd.pivot_table(df,index=["Manager","Rep"])
By default, pivot_table averages the Price column, but we can compute a
count or a sum instead. Doing so is simple with the aggfunc argument and np.sum.
pd.pivot_table(df,index=["Manager","Rep"],values=["Price"],aggfunc=np.sum)
pd.pivot_table(df,index=["Manager","Rep"],values=["Price"],aggfunc=[np.mean,len])
pd.pivot_table(df,index=["Manager","Rep"],values=["Price"],columns=["Product"],aggfunc=[np.sum])
pd.pivot_table(df,index=["Manager","Rep"],values=["Price"],columns=["Product"],aggfunc=[np.sum],fill_value=0)
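To see what these pivot_table calls produce, here is a small self-contained sketch. The rows are hypothetical; only the column names (Manager, Rep, Product, Price) follow the slides. The string "sum" is used for aggfunc in place of np.sum; both request a sum.

```python
import pandas as pd

# Hypothetical rows; only the column names follow the slides.
df = pd.DataFrame({
    "Manager": ["M1", "M1", "M2", "M2"],
    "Rep": ["R1", "R2", "R3", "R3"],
    "Product": ["CPU", "Software", "CPU", "Software"],
    "Price": [30000, 10000, 35000, 5000],
})

# One row per (Manager, Rep), one column per Product, summed Price in
# each cell; fill_value=0 replaces the missing combinations.
table = pd.pivot_table(
    df,
    index=["Manager", "Rep"],
    values="Price",
    columns="Product",
    aggfunc="sum",
    fill_value=0,
)
print(table)
```

Without fill_value=0, the cells for products a rep never sold would show NaN instead of 0.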
>>>pd.set_option('display.max_columns', None)
https://pbpython.com/pandas-pivot-table-explained.html
Region and year wise happiness score
pd.pivot_table(data, index='Region', values="Happiness Score", aggfunc=[np.mean, np.median, min, max, np.std])
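A rough sketch of the region-wise and year-wise summaries, assuming a tiny made-up data set; the Region, Year and Happiness Score values below are hypothetical, and the aggregation functions are passed by name as strings rather than as NumPy functions.

```python
import pandas as pd

# Made-up sample; the real happiness data set has many more rows.
data = pd.DataFrame({
    "Region": ["East", "East", "West", "West"],
    "Year": [2015, 2016, 2015, 2016],
    "Happiness Score": [5.0, 6.0, 7.0, 7.5],
})

# Several summary statistics per region in a single table.
by_region = pd.pivot_table(
    data,
    index="Region",
    values="Happiness Score",
    aggfunc=["mean", "median", "min", "max", "std"],
)
print(by_region)

# Region-by-year view: one column per year, mean score in each cell.
by_region_year = pd.pivot_table(
    data, index="Region", columns="Year", values="Happiness Score"
)
print(by_region_year)
```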
Step 2: Install the platform and set the working directory for
Orange to store its files.
For saving your workflow
This area is known as the CANVAS
NO DATA??