Dejene Chala Stat606 Screening Quiz Programming Part
September 4, 2023
1 Python Statements
1. Tasks
a. Initialize a list of numbers: numbers = [3, 7, 1, 9, 2].
b. Implement a loop to iterate through each number in the list.
c. Inside the loop, use a conditional statement to check if the number is even. If it’s even, add
it to a new list called even_numbers.
d. Finally, print both the original list of numbers and the list of even numbers.
[12]: # 1 Answer
number=[3, 7, 1, 9, 2]
#1.a.
for num in number:
    print(num,end=" ")
3 7 1 9 2
[8]: #1.b
[13]: #1.c
print(even)
[14]: #1. d.
print("even: "+str(even), "entire: "+str(number))
even: 2 entire: [3, 7, 1, 9, 2]
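The cells for steps 1.b and 1.c are incomplete in this export, so the following is a minimal sketch of what the task describes, not the original answer. It uses the names given in the task statement (numbers, even_numbers) and the modulo test num % 2 == 0 to detect even values.

```python
# Step a: initialize the list of numbers from the task statement
numbers = [3, 7, 1, 9, 2]

# Steps b and c: loop over the list and collect the even values
even_numbers = []
for num in numbers:
    if num % 2 == 0:  # a number is even when division by 2 leaves no remainder
        even_numbers.append(num)

# Step d: print both lists
print("Original numbers:", numbers)
print("Even numbers:", even_numbers)
```

With this input only the value 2 passes the test, so even_numbers ends up as a one-element list.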
2. Tasks: Write a Python code snippet using list comprehension to achieve this task:
a. Initialize the list of temperatures: fahrenheit_temps = [32, 68, 50, 104, 86].
b. Use list comprehension to convert each temperature from Fahrenheit to Celsius using the
conversion formula provided.
c. Store the converted temperatures in a new list called celsius_temps.
d. Print both the original list of Fahrenheit temperatures and the list of converted Celsius
temperatures.
[21]: # Answer
# 2.a
fahrenheit_temps = [32, 68, 50, 104, 86]
[51]: # 2.b
def cel_temp(fahrenheit_temps):
    return (fahrenheit_temps - 32) * 5 / 9
[54]: # 2.c
df1=pd.DataFrame(data)
[55]: # 2. d.
print(df1)
fahrenheit_temps cel_temps
0 32 0.0
1 68 20.0
2 50 10.0
3 104 40.0
4 86 30.0
Ferhanite_temps Celicius_temps
0 32 0.0
1 68 20.0
2 50 10.0
3 104 40.0
4 86 30.0
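The task asks specifically for a list comprehension, while the cells above go through a helper function and a DataFrame. A minimal sketch of the list-comprehension route, assuming the usual conversion C = (F - 32) * 5/9 that the helper function above also uses:

```python
# Step a: Fahrenheit temperatures from the task statement
fahrenheit_temps = [32, 68, 50, 104, 86]

# Steps b and c: convert every value with a list comprehension
celsius_temps = [(f - 32) * 5 / 9 for f in fahrenheit_temps]

# Step d: print both lists
print("Fahrenheit:", fahrenheit_temps)
print("Celsius:", celsius_temps)
```

The resulting values match the cel_temps column printed above.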
[70]: # 3.
# 3.a. The function that removes duplicates
def Remove_duplicate(Original_data):
    final_list = []
    for num in Original_data:
        if num not in final_list:
            final_list.append(num)
    return final_list
[71]: # 3.b.
numbers=[2, 5, 7, 2, 8, 7, 10, 5]
print(Remove_duplicate(numbers))
[2, 5, 7, 8, 10]
c. Define the Bus class as a subclass of Vehicle. Add an additional attribute capacity and a
method get_capacity() to display the passenger capacity.
d. Create an instance of the Car class and an instance of the Bus class. Use the methods to
display their information, including doors for the car and capacity for the bus.
[287]: # 4
class Vehicle:
    def __init__(self, Make, Model, Year):
        self.Make = Make
        self.Model = Model
        self.Year = Year

    def get_info(self):
        print(f"The Vehicle is produced by {self.Make} company and its model is {self.Model} produced in the year {self.Year} ")

class Car(Vehicle):
    def __init__(self, Make, Model, Year, num_doors):
        super().__init__(Make, Model, Year)
        self.num_doors = num_doors

    def get_doors(self):
        print(f"this Car has {self.num_doors} doors")

class Bus(Vehicle):
    def __init__(self, Make, Model, Year, capacity):
        super().__init__(Make, Model, Year)
        self.capacity = capacity

    def get_capacity(self):
        print(f"this A luxury Bus that has the capacity of accomodating {self.capacity} Passengers")

c = Car("FORD", "ESCORT", 2020, 4)
c.get_info()
c.get_doors()
b = Bus("Ebusco", "Ebusco", 2017, 60)
b.get_info()
b.get_capacity()
The Vehicle is produced by FORD company and its model is ESCORT produced in the year 2020
this Car has 4 doors
The Vehicle is produced by Ebusco company and its model is Ebusco produced in the year 2017
this A luxury Bus that has the capacity of accomodating 60 Passengers
4 Visualization using Python Matplotlib Library
5. Tasks: You are analyzing the performance of different machine learning algorithms on a
dataset. You want to create a bar chart to visualize the accuracy scores achieved by three
algorithms: Random Forest, Support Vector Machine (SVM), and Naive Bayes. Write a Python code
snippet using Matplotlib to achieve this task.
a. Initialize a list of algorithm names: algorithms = ['Random Forest', 'SVM', 'Naive Bayes'].
b. Initialize a list of accuracy scores for each algorithm: accuracy_scores = [0.85, 0.78, 0.92].
c. Use Matplotlib to create a bar chart that displays the accuracy scores for each algorithm.
Label the x-axis with algorithm names and the y-axis with “Accuracy Score”.
d. Provide appropriate labels, title, and color for the bars.
[282]: # Answer
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style
[90]: style.use('ggplot')
plt.figure(figsize=(3,3))
color=['royalblue','red','purple']
plt.bar(algorithms,accuracy_scores,color=color,width=0.5)
plt.xlabel('Algorithm')
plt.ylabel('Accuracy score')
plt.title('Barchart for Accuracy of different Algorithms')
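The plotting cell above uses algorithms and accuracy_scores, but the cell that defines them is missing from this export. A self-contained sketch of steps a to d, using the values from the task statement; the figure size, the y-axis limit and the title wording are illustrative choices, not part of the original answer:

```python
import matplotlib.pyplot as plt

# Steps a and b: names and scores exactly as given in the task statement
algorithms = ["Random Forest", "SVM", "Naive Bayes"]
accuracy_scores = [0.85, 0.78, 0.92]

# Step c: one bar per algorithm
fig, ax = plt.subplots(figsize=(5, 4))  # figure size is an arbitrary choice
bars = ax.bar(algorithms, accuracy_scores,
              color=["royalblue", "red", "purple"], width=0.5)

# Step d: axis labels, title, and a y-axis that covers the full 0-1 accuracy range
ax.set_xlabel("Algorithm")
ax.set_ylabel("Accuracy Score")
ax.set_title("Accuracy of Different Algorithms")
ax.set_ylim(0, 1)
fig.tight_layout()
```

In a notebook the figure renders at the end of the cell; in a plain script, a call to plt.show() would be needed afterwards.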
5 Manipulating and Analyzing Data with Pandas
6. Tasks:
6.1. Import Pandas library to enable data manipulation and analysis.
6.2. Read in the transactions_dataset.csv and parse the date column, and assign the resulting
DataFrame to a variable named transactions.
6.3. Report the following about the transactions DataFrame:
a. The number of rows and columns
b. The names of the columns
c. The datatypes of each column
d. Inspect Data: Quickly inspect the first 10 rows of the transactions data to get a sense of its
structure.
6.4. Check if any values are missing in the date, store_nbr or transactions columns of the
transactions dataset.
6.5. Drop all rows in the transactions DataFrame that contain at least one missing value (NaN),
in place.
6.6. Find and report the number of unique dates in the transactions DataFrame.
6.7. Calculate and report the mean, median, minimum, and maximum of the transactions column
in the transactions DataFrame.
6.8. Create a table that displays the top 10 stores by total transactions from the transactions
DataFrame. Ensure that the stores are sorted in descending order by transaction count.
6.9. Extract month from date column in the transactions DataFrame created in step 2. Add this
extracted month column to the transactions DataFrame.
6.10. Generate a table that shows the total transactions by store and month based on the
transactions DataFrame.
6.11. Create a new DataFrame named grouped to calculate the sum and mean of transactions for
each store and month combination.
6.12. Extract the row corresponding to Store 3 and Month 1 from the grouped DataFrame.
6.13. Then select the column storing the mean of transactions.
6.14. Drop the outer layer of the column Index in grouped dataframe and then reset the row index
to the default integer index.
Answer
transactions=pd.read_csv("G:/Fall2023/stat606/data/transactions_dataset.csv")
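Task 6.2 also asks for the date column to be parsed while reading. One way to do that is the parse_dates argument of pd.read_csv. The sketch below is an assumption-laden stand-in: it reads from an in-memory string instead of the real file, the four rows are copied from the head(10) output shown later in this notebook, and the name transactions_sample is hypothetical.

```python
import io
import pandas as pd

# In-memory stand-in for transactions_dataset.csv (rows taken from this notebook's output)
sample_csv = io.StringIO(
    "date,store_nbr,transactions\n"
    "1/2/2013,6,2143\n"
    "1/2/2013,8,3250\n"
    "1/2/2013,9,2940\n"
    "1/2/2013,10,1293\n"
)

# parse_dates converts the listed columns to datetime while the file is read
transactions_sample = pd.read_csv(sample_csv, parse_dates=["date"])

print(transactions_sample.dtypes)
```

With the real file, the same parse_dates=["date"] argument would be passed alongside the file path used above.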
[290]: # or we can use sorted function also to get the columns in the dataframe
sorted(transactions)
[92]: # 3. a,b:
# the following is used to get the number of rows and columns of the given DataFrame
transactions.shape
(83488, 3)
So the DataFrame has 83488 rows and three columns: date, store_nbr and transactions.
[99]: # 3 c.
# the following line of code was used to get the data type of each column in the dataframe
transactions.dtypes
[100]: # 3. d
# The following line was used to inspect the first 10 rows of the dataframe
transactions.head(10)
From the result we see that date holds dates, while store_nbr and transactions hold whole-number
values with some missing entries (the info() output further below lists both as float64).
[101]: # 4
# The following line of code was used to count the missing values in each column of the dataframe
transactions.isnull().sum()
[101]: date 2
store_nbr 3
transactions 3
dtype: int64
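So, we see that there are 8 missing values in total (2 in date, 3 in store_nbr and 3 in transactions). The number of rows with at least one missing value can be smaller, because a single row may have several missing entries.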
[102]: transactions.isnull().sum().sum()
[102]: 8
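The totals above count missing cells. To count rows that contain at least one missing value, the Boolean mask can be reduced across columns first. A minimal sketch on a made-up three-row frame (the placement of the missing values is purely illustrative, and demo is a hypothetical name):

```python
import numpy as np
import pandas as pd

# Toy frame: row 1 is missing everything, row 2 is missing only transactions
demo = pd.DataFrame({
    "date": ["1/2/2013", None, "1/2/2013"],
    "store_nbr": [6.0, np.nan, 9.0],
    "transactions": [2143.0, np.nan, np.nan],
})

# Number of missing cells over the whole frame
missing_cells = int(demo.isnull().sum().sum())

# Number of rows with at least one missing cell
rows_with_missing = int(demo.isnull().any(axis=1).sum())

print(missing_cells, rows_with_missing)
```

Here four cells are missing but only two rows are affected, which is why the two counts should be reported separately.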
[111]: # 5
transactions.dropna(how='any',inplace=True)
[114]: transactions.head(10)
6 1/2/2013 6.0 2143.0
8 1/2/2013 8.0 3250.0
9 1/2/2013 9.0 2940.0
10 1/2/2013 10.0 1293.0
[296]: # 6.
#The following code was used to get the number of unique dates in the dataframe
ud=transactions['date'].unique()
print(ud.shape)
(1683,)
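So, the result reports 1683 unique values in the date column. Because unique() also counts a missing value (NaN) as a distinct entry, this corresponds to 1682 unique dates if rows with a missing date were still present when the cell ran.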
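One thing to keep in mind when counting unique values: Series.unique() keeps NaN as a distinct entry, whereas Series.nunique() skips it by default. A small sketch on a made-up series (the values are illustrative only):

```python
import numpy as np
import pandas as pd

# Two distinct dates, one repeat, one missing value
dates = pd.Series(["1/1/2013", "1/2/2013", "1/2/2013", np.nan])

count_with_nan = len(dates.unique())   # NaN is counted as its own value
count_without_nan = dates.nunique()    # dropna=True by default, so NaN is ignored

print(count_with_nan, count_without_nan)
```

The two counts differ by exactly one whenever the column contains at least one missing value.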
[297]: # 7.
# We use the aggregate function to get the required summary values
transactions['transactions'].agg(['mean','median','min','max'])
From the result above, we see that the mean, median, minimum and maximum transactions were
1694.6149, 1393.00, 5.00 and 8359.00 respectively
[130]: # 8
TopTen_transactions=transactions.sort_values(by=['transactions'],axis=0,ascending=False)
[131]: TopTen_transactions.head(10)
16570 12/23/2013 44.0 8256.0
33700 12/23/2014 44.0 8120.0
16572 12/23/2013 46.0 8001.0
16619 12/24/2013 46.0 7840.0
16573 12/23/2013 47.0 7727.0
52064 12/24/2015 44.0 7700.0
33748 12/24/2014 44.0 7689.0
70904 12/21/2016 44.0 7597.0
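The table above ranks individual rows (store-day records), so the same store can appear several times. Task 6.8 asks for stores ranked by their total transactions, which suggests grouping by store first. A hedged sketch on a toy frame whose five rows are copied from the output above; toy and top_stores are hypothetical names:

```python
import pandas as pd

# Five store-day records taken from the sorted output above
toy = pd.DataFrame({
    "store_nbr": [44, 44, 46, 46, 47],
    "transactions": [8256, 8120, 8001, 7840, 7727],
})

# Sum transactions per store, sort descending, keep at most 10 stores
top_stores = (
    toy.groupby("store_nbr")["transactions"]
       .sum()
       .sort_values(ascending=False)
       .head(10)
)

print(top_stores)
```

On the full DataFrame the same chain would be applied to transactions instead of toy.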
[302]: # 9
# I have to check if the date is in Date format
transactions.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 83488 entries, 0 to 83487
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 83486 non-null object
1 store_nbr 83485 non-null float64
2 transactions 83485 non-null float64
dtypes: float64(2), object(1)
memory usage: 1.9+ MB
[304]: transactions['date']
[304]: 0 2013-01-01
1 2013-01-02
2 2013-01-02
3 2013-01-02
4 2013-01-02
…
83483 2017-08-15
83484 2017-08-15
83485 2017-08-15
83486 2017-08-15
83487 2017-08-15
Name: date, Length: 83488, dtype: datetime64[ns]
[305]: # get month from date and make it one variable in the transaction data
transactions['date'].dt.year
transactions['month']=transactions['date'].dt.month
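The .dt accessor only works on datetime columns, and the info() output above still lists date as object, so a conversion step must have run in between (that cell is not part of this export). A minimal sketch of such a conversion with pd.to_datetime, assuming the month/day/year layout seen in the printed rows; the toy frame is illustrative:

```python
import pandas as pd

# Date strings in the month/day/year layout that appears in the output above
toy = pd.DataFrame({"date": ["1/2/2013", "12/23/2013", "12/24/2015"]})

# Convert the strings to datetime, then extract the month as an integer column
toy["date"] = pd.to_datetime(toy["date"], format="%m/%d/%Y")
toy["month"] = toy["date"].dt.month

print(toy)
```

Stating the format explicitly avoids any ambiguity between day-first and month-first dates.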
[309]: grouped=transactions.groupby(['month','store_nbr'],as_index=False)['transactions'].agg(['sum'])
[310]: grouped.head()
[307]: # 11
grouped=transactions.groupby(['month','store_nbr'],as_index=False)['transactions'].agg(['sum','mean'])
[311]: print(grouped)
[191]: # To select Month 1 Store 3 from the group dataframe, the following line of code was used.
options = [3]
[192]: # 12.
print(store_3_mo_1)
[312]: #13.
# Then the following line of code was used to select the mean column
print(store_3_mo_1['mean'])
2 3151.428571
Name: mean, dtype: float64
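The cell that builds store_3_mo_1 is missing from this export. One possible route for tasks 6.11 to 6.13, which differs from the as_index=False approach used above, keeps store and month as a row MultiIndex and selects with .loc. The toy numbers below are made up for illustration:

```python
import pandas as pd

# Toy data: store 3 has two records in month 1 and one in month 2
toy = pd.DataFrame({
    "store_nbr": [3, 3, 4, 3],
    "month": [1, 1, 1, 2],
    "transactions": [3000, 3200, 1500, 2900],
})

# Sum and mean of transactions for each (store, month) combination
grouped = toy.groupby(["store_nbr", "month"]).agg({"transactions": ["sum", "mean"]})

# Row for Store 3, Month 1 (a Series indexed by the two-level column labels)
store_3_mo_1 = grouped.loc[(3, 1)]

# Mean of transactions for that row
mean_value = grouped.loc[(3, 1), ("transactions", "mean")]

print(store_3_mo_1)
print(mean_value)
```

Passing a dict to agg produces two-level column labels, (transactions, sum) and (transactions, mean), which is the structure task 6.14 refers to.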
[313]: # 14
#The following line of code was written to drop the outer level of the column index in the dataframe and reset the row index to the default integer index
[195]: print(grouped)
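The code for step 14 did not survive in this export. A minimal sketch of what the task describes, assuming grouped has two-level column labels and a (store, month) row index, as it would when built with a dict passed to agg; the toy data is made up:

```python
import pandas as pd

toy = pd.DataFrame({
    "store_nbr": [3, 3, 4, 3],
    "month": [1, 1, 1, 2],
    "transactions": [3000, 3200, 1500, 2900],
})
grouped = toy.groupby(["store_nbr", "month"]).agg({"transactions": ["sum", "mean"]})

# Drop the outer column level ("transactions"), keeping "sum" and "mean"
grouped.columns = grouped.columns.droplevel(0)

# Move store_nbr and month back into columns and restore the 0..n-1 row index
grouped = grouped.reset_index()

print(grouped)
```

After these two steps the frame has flat columns store_nbr, month, sum and mean.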