
Dejene_Chala_Stat606_Screening_Quiz_Programming_part

September 4, 2023

1 Python Statements
1. Tasks
a. Initialize a list of numbers: numbers = [3, 7, 1, 9, 2].
b. Implement a loop to iterate through each number in the list.
c. Inside the loop, use a conditional statement to check if the number is even. If it’s even, add
it to a new list called even_numbers.
d. Finally, print both the original list of numbers and the list of even numbers.

[11]: import numpy as np
import pandas as pd
import scipy.stats as stats

[12]: # 1.a
numbers = [3, 7, 1, 9, 2]
# 1.b
for num in numbers:
    print(num, end=" ")

3 7 1 9 2

[8]: # 1.c
even_numbers = []
for num in numbers:
    if num % 2 == 0:
        even_numbers.append(num)

[13]: print(even_numbers)

[2]

[14]: # 1.d
print("even: " + str(even_numbers), "entire: " + str(numbers))

even: [2] entire: [3, 7, 1, 9, 2]
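The same even-number filter can also be written as a single list comprehension, a compact alternative to the loop above:

```python
numbers = [3, 7, 1, 9, 2]
# keep only the values divisible by 2
even_numbers = [n for n in numbers if n % 2 == 0]
print(even_numbers)  # → [2]
```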

2. Tasks: Write a Python code snippet using list comprehension to achieve this task:
a. Initialize the list of temperatures: fahrenheit_temps = [32, 68, 50, 104, 86].
b. Use list comprehension to convert each temperature from Fahrenheit to Celsius using the
conversion formula provided.
c. Store the converted temperatures in a new list called celsius_temps.
d. Print both the original list of Fahrenheit temperatures and the list of converted Celsius
temperatures.

[21]: # Answer
# 2.a
fahrenheit_temps = [32, 68, 50, 104, 86]

[51]: # 2.b
# conversion formula: C = (F - 32) * 5/9
def cel_temp(f):
    return (f - 32) * 5 / 9

[52]: # 2.b, 2.c: list comprehension, with the results stored in celsius_temps
celsius_temps = [cel_temp(f) for f in fahrenheit_temps]
print(celsius_temps)

[0.0, 20.0, 10.0, 40.0, 30.0]

[53]: data = {"fahrenheit_temps": [32, 68, 50, 104, 86],
        "celsius_temps": [0.0, 20.0, 10.0, 40.0, 30.0]}

[54]: # 2.c
df1 = pd.DataFrame(data)

[55]: # 2.d
print(df1)

   fahrenheit_temps  celsius_temps
0                32            0.0
1                68           20.0
2                50           10.0
3               104           40.0
4                86           30.0

[64]: # or we can build the DataFrame from the paired lists with zip
celsius_temps = [0.0, 20.0, 10.0, 40.0, 30.0]
com_list = list(zip(fahrenheit_temps, celsius_temps))
df2 = pd.DataFrame(com_list, columns=['Fahrenheit_temps', 'Celsius_temps'])
print(df2)

   Fahrenheit_temps  Celsius_temps
0                32            0.0
1                68           20.0
2                50           10.0
3               104           40.0
4                86           30.0

2 Functions and Methods


3. Tasks: You are tasked with analyzing a dataset for a machine learning project. As a pre-
processing step, you need to remove any duplicate data points from the dataset. Write a Python
code snippet that defines a function to remove duplicates from a list and then uses that function
to process a given list of numbers.
a. Write a function named remove_duplicates that takes a list as its argument and returns a
new list with duplicate elements removed.
b. Next, initialize a list of numbers: numbers = [2, 5, 7, 2, 8, 7, 10, 5]. Use the remove_duplicates
function to process this list and print the resulting list.

[70]: # 3.
# 3.a. The function that removes duplicates
def Remove_duplicate(Original_data):
final_list = []
for num in Original_data:
if num not in final_list:
final_list.append(num)
return final_list

[71]: # 3.b.
numbers=[2, 5, 7, 2, 8, 7, 10, 5]
print(Remove_duplicate(numbers))

[2, 5, 7, 8, 10]
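An alternative sketch: since Python 3.7, `dict.fromkeys` preserves insertion order, so the same order-preserving deduplication fits on one line:

```python
numbers = [2, 5, 7, 2, 8, 7, 10, 5]
# dict keys are unique and keep the first occurrence of each value
deduped = list(dict.fromkeys(numbers))
print(deduped)  # → [2, 5, 7, 8, 10]
```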

3 Object Oriented Programming


4. Tasks: You are working on a project to model different types of vehicles for an autonomous
transportation system. You have a base class Vehicle with attributes like make , model , and year ,
as well as a method get_info() to display information about a vehicle. Now, you need to create two
subclasses: Car and Bus. Both subclasses will inherit from the Vehicle class and add their specific
attributes and methods.
a. Define the Vehicle class with attributes ( make , model , year ) and a method get_info() that
prints the vehicle’s information.
b. Define the Car class as a subclass of Vehicle. Add an additional attribute num_doors and a
method get_doors() to display the number of doors.

c. Define the Bus class as a subclass of Vehicle. Add an additional attribute capacity and a
method get_capacity() to display the passenger capacity.
d. Create an instance of the Car class and an instance of the Bus class. Use the methods to
display their information, including doors for the car and capacity for the bus.

[287]: # 4
class Vehicle:
def __init__ (self,Make,Model,Year):
self.Make=Make
self.Model=Model
self.Year=Year

def get_info(self):
print(f"The Vehicle is produced by {self.Make} company and its model is␣
↪{self.Model} produced in the year {self.Year} ")

class Car(Vehicle):
def __init__(self, Make,Model,Year,num_doors):
super().__init__(Make,Model,Year)
self.num_doors=num_doors

def get_doors(self):
print(f"this Car has {self.num_doors} doors")

class Bus(Vehicle):
def __init__(self, Make,Model,Year,capacity):
super().__init__(Make,Model,Year)
self.capacity=capacity

def get_capacity(self):
print(f"this A luxury Bus that has the capacity of accomodating {self.
↪capacity} Passengers")

c=Car("FORD","ESCORT",2020,4)
c.get_info()
c.get_doors()
b=Bus("Ebusco","Ebusco",2017,60)
b.get_info()
b.get_capacity()

The Vehicle is produced by FORD company and its model is ESCORT produced in the
year 2020
this Car has 4 doors
The Vehicle is produced by Ebusco company and its model is Ebusco produced in
the year 2017
this A luxury Bus that has the capacity of accomodating 60 Passengers

4 Visualization using Python Matplotlib Library
5. Tasks: You are analyzing the performance of different machine learning algorithms on a
dataset. You want to create a bar chart to visualize the accuracy scores achieved by three algo-
rithms: Random Forest, Support Vector Machine (SVM), and Naive Bayes. Write a Python code
snippet using Matplotlib to achieve this task.
a. Initialize a list of algorithm names: algorithms = [‘Random Forest’, ‘SVM’, ‘Naive Bayes’].
b. Initialize a list of accuracy scores for each algorithm: accuracy_scores = [0.85, 0.78, 0.92].
c. Use Matplotlib to create a bar chart that displays the accuracy scores for each algorithm.
Label the x-axis with algorithm names and the y-axis with “Accuracy Score”.
d. Provide appropriate labels, title, and color for the bars.

[282]: # Answer
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style

[283]: algorithms = ['Random Forest', 'SVM', 'Naive Bayes']
accuracy_scores = [0.85, 0.78, 0.92]

[90]: style.use('ggplot')
plt.figure(figsize=(3,3))
color=['royalblue','red','purple']
plt.bar(algorithms,accuracy_scores,color=color,width=0.5)
plt.xlabel('Algorithm')
plt.ylabel('Accuracy score')
plt.title('Barchart for Accuracy of different Algorithms')

[90]: Text(0.5, 1.0, 'Barchart for Accuracy of different Algorithms')
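A small extension of the cell above (a sketch, assuming Matplotlib 3.4+ for `bar_label`; the output filename is illustrative): annotating each bar with its score and saving the figure to disk rather than only returning the title object.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; use plt.show() in an interactive session
import matplotlib.pyplot as plt

algorithms = ['Random Forest', 'SVM', 'Naive Bayes']
accuracy_scores = [0.85, 0.78, 0.92]

fig, ax = plt.subplots(figsize=(4, 3))
bars = ax.bar(algorithms, accuracy_scores, color=['royalblue', 'red', 'purple'])
ax.bar_label(bars, fmt='%.2f')  # print each score on top of its bar
ax.set_xlabel('Algorithm')
ax.set_ylabel('Accuracy Score')
ax.set_title('Barchart for Accuracy of different Algorithms')
ax.set_ylim(0, 1)
fig.savefig('accuracy_barchart.png')
```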

5 Manipulating and Analyzing Data with Pandas
5.0.1 6. Tasks:
6.1. Import Pandas library to enable data manipulation and analysis.
6.2. Read in the transactions_dataset.csv and parse the date column, and assign the resulting
DataFrame to a variable named transactions .
6.3. Report the following about the transactions DataFrame:
a. The number of rows and columns
b. The names of the columns
c. The datatypes of each column
d. Inspect Data: Quickly inspect the first 10 rows of the transactions data to get a sense of its
structure.
6.4. Check if any values are missing in the date , store_nbr or transactions columns of the trans-
actions dataset.
6.5. Drop all rows in the transactions DataFrame that contain at least one missing value (NAN)
inplace.
6.6. Find and report the number of unique dates in the transactions DataFrame.
6.7. Calculate and report the mean, median, minimum, and maximum of the transactions column
in the transactions DataFrame.
6.8. Create a table that displays the top 10 stores by total transactions from the transactions
DataFrame. Ensure that the stores are sorted in descending order by transaction count.

6.9. Extract month from date column in the transactions DataFrame created in step 2. Add this
extracted month column to the transactions DataFrame.
6.10. Generate a table that shows the total transactions by store and month based on the transac-
tions DataFrame.
6.11. Create a new DataFrame named grouped to calculate the sum and mean of transactions for
each store and month combination.
6.12. Extract the row corresponding to Store 3 and Month 1 from the grouped DataFrame.
6.13. Then select the column storing the mean of transactions.
6.14. Drop the outer layer of the column Index in grouped dataframe and then reset the row index
to the default integer index.
Answer

[288]: # Answer to 1, 2
# pandas was already imported above, so the following line reads the dataset into Python
transactions = pd.read_csv("G:/Fall2023/stat606/data/transactions_dataset.csv")
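Task 6.2 also asks to parse the date column at read time; `read_csv` can do this directly with `parse_dates` (sketched here on a hypothetical two-row sample rather than the real file):

```python
import io
import pandas as pd

# hypothetical sample mirroring the layout of transactions_dataset.csv
sample = io.StringIO("date,store_nbr,transactions\n1/1/2013,25,770\n1/2/2013,1,2111\n")
transactions = pd.read_csv(sample, parse_dates=["date"])
print(transactions["date"].dtype)  # datetime64[ns]
```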

[289]: for col in transactions.columns:
    print(col, end=" ")

date store_nbr transactions

[290]: # or we can use the sorted function to get the column names
sorted(transactions)

[290]: ['date', 'store_nbr', 'transactions']

[92]: # 3. a, b:
# the following is used to get the number of rows and columns of the DataFrame
print(transactions.shape)  # so we see that the DataFrame has 83488 rows and 3 columns

(83488, 3)

So we see that there are three columns in the DataFrame: date, store_nbr and transactions

[99]: # 3 c.
# the following gets the data type of each column in the DataFrame
transactions.dtypes

[99]: date             object
store_nbr       float64
transactions    float64
dtype: object

[100]: # 3. d
# the following line inspects the first 10 rows of the DataFrame
transactions.head(10)

[100]: date store_nbr transactions


0 1/1/2013 25.0 770.0
1 1/2/2013 1.0 2111.0
2 1/2/2013 2.0 2358.0
3 1/2/2013 3.0 3487.0
4 1/2/2013 4.0 1922.0
5 1/2/2013 5.0 1903.0
6 1/2/2013 6.0 2143.0
7 1/2/2013 NaN 1874.0
8 1/2/2013 8.0 3250.0
9 1/2/2013 9.0 2940.0

From the result we see that date is read in as an object (string), while store_nbr and
transactions are read in as float64 because they contain missing values

[101]: # 4
# the following identifies the missing values in each column of the DataFrame
transactions.isnull().sum()

[101]: date            2
store_nbr       3
transactions    3
dtype: int64

So we see that there are 8 missing values in the DataFrame in total

[102]: transactions.isnull().sum().sum()

[102]: 8

[111]: # 5
transactions.dropna(how='any',inplace=True)

[114]: transactions.head(10)

[114]:         date  store_nbr  transactions
0   1/1/2013       25.0         770.0
1   1/2/2013        1.0        2111.0
2   1/2/2013        2.0        2358.0
3   1/2/2013        3.0        3487.0
4   1/2/2013        4.0        1922.0
5   1/2/2013        5.0        1903.0
6   1/2/2013        6.0        2143.0
8   1/2/2013        8.0        3250.0
9   1/2/2013        9.0        2940.0
10  1/2/2013       10.0        1293.0

[296]: # 6.
# the following is used to get the number of unique dates in the DataFrame
ud = transactions['date'].unique()
print(ud.shape)

(1683,)

So, from the result we see that there are 1683 unique values of the variable date
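The same count can be read off directly with `nunique()`, sketched here on a small hypothetical series:

```python
import pandas as pd

# hypothetical sample with one duplicated date
dates = pd.Series(["1/1/2013", "1/2/2013", "1/2/2013"])
print(dates.nunique())  # → 2
```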

[297]: # 7.
# We use the aggregate function to get the required summary values
transactions['transactions'].agg(['mean','median','min','max'])

[297]: mean 1694.613056


median 1393.000000
min 5.000000
max 8359.000000
Name: transactions, dtype: float64

[298]: # or we can use the describe function
transactions['transactions'].describe()

[298]: count 83485.000000


mean 1694.613056
std 963.301404
min 5.000000
25% 1046.000000
50% 1393.000000
75% 2079.000000
max 8359.000000
Name: transactions, dtype: float64

From the result above, we see that the mean, median, minimum and maximum of transactions
were 1694.61, 1393.00, 5.00 and 8359.00 respectively

[130]: # 8
TopTen_transactions = transactions.sort_values(by=['transactions'], axis=0, ascending=False)

[131]: TopTen_transactions.head(10)

[131]: date store_nbr transactions


52011 12/23/2015 44.0 8359.0
71010 12/23/2016 44.0 8307.0

16570 12/23/2013 44.0 8256.0
33700 12/23/2014 44.0 8120.0
16572 12/23/2013 46.0 8001.0
16619 12/24/2013 46.0 7840.0
16573 12/23/2013 47.0 7727.0
52064 12/24/2015 44.0 7700.0
33748 12/24/2014 44.0 7689.0
70904 12/21/2016 44.0 7597.0
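Note that task 6.8 asks for the top 10 stores by *total* transactions, while the cell above ranks individual transaction records. A sketch of the per-store aggregation, using hypothetical sample numbers rather than the real dataset:

```python
import pandas as pd

# hypothetical mini-sample standing in for the transactions DataFrame
df = pd.DataFrame({
    "store_nbr":    [44,     44,     46,     47,     46],
    "transactions": [8359.0, 8307.0, 8001.0, 7727.0, 7840.0],
})
# sum transactions per store, then sort descending and keep the top 10 stores
top_stores = (df.groupby("store_nbr")["transactions"]
                .sum()
                .sort_values(ascending=False)
                .head(10))
print(top_stores)
```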

[302]: # 9
# first, check whether the date column is already in datetime format
transactions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 83488 entries, 0 to 83487
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 83486 non-null object
1 store_nbr 83485 non-null float64
2 transactions 83485 non-null float64
dtypes: float64(2), object(1)
memory usage: 1.9+ MB

[303]: # convert it to datetime format as follows
transactions['date'] = pd.to_datetime(transactions['date'])

[304]: transactions['date']

[304]: 0 2013-01-01
1 2013-01-02
2 2013-01-02
3 2013-01-02
4 2013-01-02

83483 2017-08-15
83484 2017-08-15
83485 2017-08-15
83486 2017-08-15
83487 2017-08-15
Name: date, Length: 83488, dtype: datetime64[ns]

[305]: # extract the month from date and add it as a column of the transactions data
transactions['month'] = transactions['date'].dt.month

[309]: # 10
grouped = transactions.groupby(['month', 'store_nbr'],
                               as_index=False)['transactions'].agg(['sum'])

[310]: grouped.head()

[310]: month store_nbr sum


0 1.0 1.0 229203.0
1 1.0 2.0 282101.0
2 1.0 3.0 463260.0
3 1.0 4.0 222075.0
4 1.0 5.0 208297.0
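For task 6.10, the same store-by-month totals can also be laid out as a two-way table with `pivot_table` (a sketch on hypothetical sample values):

```python
import pandas as pd

# hypothetical sample in place of the transactions DataFrame
df = pd.DataFrame({
    "store_nbr":    [1,     1,     2,     2],
    "month":        [1,     2,     1,     2],
    "transactions": [100.0, 200.0, 300.0, 400.0],
})
# stores as rows, months as columns, cell = total transactions
table = df.pivot_table(index="store_nbr", columns="month",
                       values="transactions", aggfunc="sum")
print(table)
```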

[307]: # 11
grouped = transactions.groupby(['month', 'store_nbr'],
                               as_index=False)['transactions'].agg(['sum', 'mean'])

[311]: print(grouped)

month store_nbr sum


0 1.0 1.0 229203.0
1 1.0 2.0 282101.0
2 1.0 3.0 463260.0
3 1.0 4.0 222075.0
4 1.0 5.0 208297.0
.. … … …
636 12.0 49.0 447690.0
637 12.0 50.0 394846.0
638 12.0 51.0 260405.0
639 12.0 53.0 105711.0
640 12.0 54.0 121409.0

[641 rows x 3 columns]

[191]: # To select Month 1 and Store 3 from the grouped DataFrame, the following
# lines of code were used.
options = [3]
store_3_mo_1 = grouped.loc[(grouped['month'] == 1) &
                           grouped['store_nbr'].isin(options)]

[192]: # 12.
print(store_3_mo_1)

month store_nbr sum mean


2 1 3.0 463260.0 3151.428571

[312]: #13.
# Then the following line of code was used to select the mean column
print(store_3_mo_1['mean'])

2 3151.428571
Name: mean, dtype: float64

[313]: # 14
# the column index here is already flat (the groupby was run with as_index=False),
# so the following resets the row index to the default integer index
grouped.reset_index(inplace=True, drop=True)

[195]: print(grouped)

month store_nbr sum mean


0 1 1.0 229203.0 1548.668919
1 1 2.0 282101.0 1919.054422
2 1 3.0 463260.0 3151.428571
3 1 4.0 222075.0 1510.714286
4 1 5.0 208297.0 1407.412162
.. … … … …
636 12 49.0 447690.0 3730.750000
637 12 50.0 394846.0 3290.383333
638 12 51.0 260405.0 2170.041667
639 12 53.0 105711.0 1174.566667
640 12 54.0 121409.0 1011.741667

[641 rows x 4 columns]
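For the "outer layer of the column index" part of task 6.14: when a groupby keeps the group keys in the index (the default `as_index=True`) and aggregates with a list, the result has a two-level column index. A sketch of dropping that outer layer and then resetting the row index, on hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"store_nbr": [1, 1, 2], "transactions": [1.0, 2.0, 3.0]})
g = df.groupby("store_nbr").agg({"transactions": ["sum", "mean"]})
# columns are now a MultiIndex: ('transactions', 'sum'), ('transactions', 'mean')
g.columns = g.columns.droplevel(0)  # drop the outer 'transactions' layer
g = g.reset_index()                 # store_nbr back to a column, default row index
print(g)
```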

