Professional Documents
Culture Documents
Ilovepdf Merged
Ilovepdf Merged
More Python
Lab 2
School of Information Technology Management
Ted Rogers School of Management
Toronto Metropolitan University
Winter 2024
Data Frames groupby() method
In [ ]: #Calculate mean value for each numeric column per each group
df_rank.mean()
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Once groupby object is create we can calculate various statistics for each group:
Note: If single brackets are used to specify the column (e.g. salary), then the output is Pandas
Series object. When double brackets are used the output is a Data Frame
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
When selecting one column, it is possible to use single set of brackets, but
the resulting object will be a Series (not a DataFrame):
In [ ]: #Select column salary:
df['salary']
When we need to select more than one column and/or make the output to
be a DataFrame, we should use double brackets:
In [ ]: #Select column salary:
df[['rank','salary']]
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
If we need to select a range of rows, we can specify the range using ":"
Notice that the first row has a position 0, and the last value in the range is
omitted:
So for 0:10 range the first 10 rows are returned with the positions starting
with 0 and ending with 9
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
If we need to select a range of rows, using their labels we can use method
loc:
In [ ]: #Select rows by their labels:
df_sub.loc[10:20,['rank','sex','salary']]
Out[ ]:
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Out[ ]:
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
We can sort the data by a value in the column. By default the sorting will
occur in ascending order and a new data frame is return.
In [ ]: # Create a new data frame from the original sorted by the column Salary
df_sorted = df.sort_values( by ='service')
df_sorted.head()
Out[ ]:
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Out[ ]:
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Out[ ]:
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Out[ ]:
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Out[ ]:
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Out[ ]:
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
min, max
count, sum, prod
mean, median, mode, mad
std, var
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
agg() method are useful when multiple statistics are computed per column:
In [ ]: flights[['dep_delay','arr_delay']].agg(['min','mean','max'])
Out[ ]:
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
df.method() description
describe Basic statistics (count, mean, std, min, quantiles, max)
kurt kurtosis
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
In [ ]: %matplotlib inline
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
description
distplot histogram
barplot estimate of central tendency for a numeric variable
violinplot similar to boxplot, also shows the probability density of the
data
jointplot Scatterplot
regplot Regression plot
pairplot Pairplot
boxplot boxplot
swarmplot categorical scatterplot
factorplot General categorical plot
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Out[ ]:
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
https://www.learnpytorch.io/
PyTroch
• One of the most popular frameworks to develop
deep neural networks in Python
• It has an easy Interface
• It needs a fewer lines to code in comparison to
other frameworks like TensorFlow
• It is easy to understand (and debug!)
• It is considered to be Pythonic which easily
integrates with the Python data science stack
• You can think about it as NumPy extension to
GPUs!
PyTorch: 3 levels of
abstraction
• Tensor: Imperative numpy ndarray, but could
also run on GPU
• Variable: Node in a computational graph. It
stores data and gradient (gradient is used for
optimizing weights)
• Module: A neural network layer. It stores state
or learnable weights
Artificial Intelligence in Business - ITM 740
Introduction to Python
Week 1
School of Information Technology Management
Ted Rogers School of Management
Toronto Metropolitan University
Winter 2023
Tutorial Content
Overview of Python Libraries for AI
Work with Data
Reading Data
Selecting and Filtering the Data
Data manipulation
Sorting
Grouping
Rearranging
Plotting the data
Descriptive statistics
Inferential statistics
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Link: http://www.numpy.org/
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
built on NumPy
Link: https://www.scipy.org/scipylib/
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
provides tools for data manipulation: reshaping, merging, sorting, slicing, aggregation etc.
Link: http://pandas.pydata.org/
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Link: http://scikit-learn.org/
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
line plots, scatter plots, bar charts, histograms, pie charts etc.
Link: https://matplotlib.org/
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Rename it to Lab 1
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Click on the arrow on the left side of the code window (or Shift + Enter)
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Note: The above command has many optional arguments to fine-tune the data import process.
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Out[3]:
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Can you guess how to view the last few records; Hint:
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Out[4]: dtype('int64')
df.attribute description
dtypes list the types of the columns
columns list the column names
axes list the row labels and column names
ndim number of dimensions
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
df.method() description
head( [n] ), tail( [n] ) first/last n rows
What are the mean values of the first 50 records in the dataset? Hint:
use head() method to subset the first 50 records and then calculate the
mean
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Note: there is an attribute rank for pandas data frames, so to select a column with a name
"rank" we should use method 1.
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Find how many values in the salary column (use count method);
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
Example:
Departments = ['IT', 'HR', 'Marketing']
for x in Departments:
print('the department name is:' + x )
Modified based on: Python for Data Analysis, Research Computing Services, Boston University
profit = 60
if profit > 0:
print('the company profitable')
else:
print('the company is not profitable')
Modified based on: Python for Data Analysis, Research Computing Services, Boston University