
[This question paper contains two printed pages]

Lab Assessment II, Spring 2024


Royal University of Bhutan
Gyalpozhing College of Information Technology
CSA203 Artificial Intelligence and Machine Learning

Time: 1 Hour, 30 Minutes Max. Marks: 25


Create a new notebook file and name the file with your student number along with
specialization (Example: 0220039_2AI). Mention the question number correctly in each
solution. Failing to do so will result in the deduction of marks.

PART A [15 Marks]

Use the given link below to access the dataset.


https://drive.google.com/file/d/17AaEKRQ6mGTFXN2yLgvxoywWjzYV0XO-/view?usp=sharing

Load and explore the dataset provided to understand its structure, determine the type
of supervised problem (classification or regression), and apply appropriate techniques
and algorithms to solve the problem.

The dataset contains 16 input features, with the final feature serving as the target
variable.

1. Load the dataset as df and split it into training and test sets. Note that the file is
in CSV format, but the delimiter is not the common comma (","); it is a semicolon
(";"). Use the delimiter parameter in the read_csv method to correctly
load the data. [1]
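A minimal sketch of the loading step described above. Since the Drive file is not reproduced here, an inline semicolon-delimited sample with invented column names (age, job, y) stands in for the real dataset; only the delimiter handling and the train/test split are the point.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the assessment CSV: note the semicolon separators.
csv_text = "age;job;y\n30;admin.;no\n45;technician;yes\n39;services;no\n52;admin.;yes\n"

# delimiter=";" tells read_csv that fields are separated by semicolons, not commas.
df = pd.read_csv(io.StringIO(csv_text), delimiter=";")

# Assume the last column is the target, as stated in the question.
X = df.drop(columns=["y"])
y = df["y"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
```

With the real file, replace the io.StringIO(...) argument with the downloaded file's path.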

2. Explore the data to determine which type of supervised learning problem it
represents. Justify your answer. [1]

3. Apply necessary feature engineering techniques using a pipeline. These
techniques include handling missing values by imputation, scaling numerical
features, and encoding categorical features.

Note: The missing values in the dataset are represented by the string “unknown”

3.1. Separate numerical and categorical columns to create separate pipelines
for numerical and categorical features. [0.5]

3.2. Create pipelines for both numeric and categorical features. [1.5]
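One way to sketch steps 3.1 and 3.2, using a small synthetic frame (the real column names will differ). The "unknown" strings are converted to NaN first so the imputers can detect them, per the note above; the imputation strategies and encoder shown are one reasonable choice, not the only valid one.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical sample; the assessment dataset has 16 features.
df = pd.DataFrame({
    "age": [30, 45, 39, 52],
    "balance": [1000.0, 250.0, 70.0, 3200.0],
    "job": ["admin.", "unknown", "services", "admin."],
})

# The dataset marks missing values with the string "unknown".
df = df.replace("unknown", np.nan)

# 3.1 — separate numerical and categorical columns.
num_cols = df.select_dtypes(include="number").columns.tolist()
cat_cols = df.select_dtypes(exclude="number").columns.tolist()

# 3.2 — one pipeline per column type: impute, then scale or encode.
num_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
cat_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# ColumnTransformer routes each column group to its pipeline.
preprocess = ColumnTransformer([
    ("num", num_pipe, num_cols),
    ("cat", cat_pipe, cat_cols),
])
Xt = preprocess.fit_transform(df)
```

The preprocess transformer can then be reused as the first step of every model pipeline in question 4.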
4. Train ensemble models.

4.1. Select three base models and construct a basic stacking ensemble model.
Keep the default algorithm as the final estimator for the stacking
ensemble. Ensure to create a pipeline model for each base model and
train the stacking model accordingly. [3]
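A sketch of the stacking setup in 4.1, assuming the features have already been preprocessed (synthetic numeric data stands in here). The three base models are an illustrative choice; each is wrapped in its own pipeline as the question requires, and final_estimator is left unset so scikit-learn's default (LogisticRegression) is used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed assessment data.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# One pipeline per base model, as the question asks.
base_models = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("dt", make_pipeline(DecisionTreeClassifier(random_state=42))),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
]

# No final_estimator argument: the default (LogisticRegression) is kept.
stack = StackingClassifier(estimators=base_models)
stack.fit(X, y)
score = stack.score(X, y)
```

In the actual solution the preprocessing ColumnTransformer from question 3 would be the first step of each base pipeline, and fitting would use X_train/y_train.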

4.2. Select a boosting ensemble technique and construct a pipeline model
using the chosen meta-estimator. Then train your boosting model. [2]

4.3. Within the pipeline of your boosting model, add a feature selection
technique. Use any preferred feature selection technique to select the top
10 features. [2]
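Steps 4.2 and 4.3 combined in one sketch. GradientBoostingClassifier and SelectKBest with f_classif are one possible choice of boosting technique and feature selector; any equivalent pair satisfies the question. Synthetic 16-feature data stands in for the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline

# Synthetic stand-in: 16 features, mirroring the assessment dataset.
X, y = make_classification(n_samples=200, n_features=16, random_state=0)

boost = Pipeline([
    ("select", SelectKBest(score_func=f_classif, k=10)),  # keep the top 10 features
    ("model", GradientBoostingClassifier(random_state=0)),
])
boost.fit(X, y)
```

Because the selector sits inside the pipeline, it is fitted only on the training data during fit, which avoids leaking test-set information into the feature ranking.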

5. Evaluate the models.


5.1. Use the accuracy, confusion matrix, precision, and recall scores to
evaluate the stacking model. What is your conclusion from the model’s
accuracy? Justify your answer. [2.5]

Note: Pass the parameter pos_label='yes' to the precision and recall score functions.
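The four metrics in 5.1 can be computed as below. The tiny hand-made label lists are invented purely to show the calls, including the pos_label='yes' parameter from the note; in the solution, y_true and y_pred would be y_test and the stacking model's predictions.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hypothetical labels mirroring the dataset's "yes"/"no" target.
y_true = ["yes", "no", "yes", "no", "yes", "no"]
y_pred = ["yes", "no", "no", "no", "yes", "yes"]

acc = accuracy_score(y_true, y_pred)
# labels=... fixes the row/column order of the confusion matrix.
cm = confusion_matrix(y_true, y_pred, labels=["no", "yes"])
# pos_label tells sklearn which class counts as "positive" for string labels.
prec = precision_score(y_true, y_pred, pos_label="yes")
rec = recall_score(y_true, y_pred, pos_label="yes")
```

If the classes are imbalanced, a high accuracy alone can be misleading, which is why the question asks you to justify your conclusion with precision and recall as well.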

5.2. Compare the stacking model and the boosting model. Which one performs
better, and why? [1.5]

PART B [10 Marks]

Use the link below to access the dataset for this task.
https://drive.google.com/file/d/1PZJH6WSZLwhYkFOKNyO57BxHUrUhja0a/view?usp=sharing

6. Load the dataset and create a new data frame named rdf. Name the feature
columns "x" and "y". [1]
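A minimal sketch of question 6. An inline headerless CSV is assumed here in place of the Drive file; only the renaming of the two feature columns is the point.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Part B file: a small headerless CSV.
raw = "1.0,2.0\n3.0,4.0\n5.0,6.0\n"
rdf = pd.read_csv(io.StringIO(raw), header=None)
rdf.columns = ["x", "y"]  # name the two feature columns as the question asks
```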

7. Conduct data exploration using a visualization graph of your preference. What
insights can be drawn from the graph regarding the formation of clusters? [2]

8. Use the elbow method to determine the optimal value of k for the KMeans
algorithm. Provide a reason for the selection of a specific k value. [2]
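The elbow method from question 8 can be sketched as follows. Synthetic 2-D data with three well-separated blobs stands in for rdf[["x", "y"]]; the loop records the KMeans inertia (within-cluster sum of squares) for each k.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for rdf[["x", "y"]]: three obvious clusters.
X, _ = make_blobs(n_samples=150, centers=3, random_state=1)

inertias = []
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=1).fit(X)
    inertias.append(km.inertia_)
# Plot k against inertia and pick the k at the "elbow": the point where
# adding more clusters stops producing a large drop in inertia.
```

The justification the question asks for is exactly this: below the elbow, extra clusters sharply reduce inertia; beyond it, the curve flattens, so larger k only splits genuine clusters.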

9. Create the KMeans model using the k value obtained in the previous step and
train the model. [1]

10. Plot your clustering result showing different clusters in different colors. Also,
plot your cluster centroids. [2]
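A sketch of the plot in question 10, continuing with the synthetic three-blob data used above (the actual data and k come from questions 6–9). Points are colored by their assigned cluster and the centroids are overlaid with a distinct marker.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in; replace with rdf[["x", "y"]].values and your chosen k.
X, _ = make_blobs(n_samples=150, centers=3, random_state=1)
km = KMeans(n_clusters=3, n_init=10, random_state=1).fit(X)

# Color each point by its cluster label, then overlay the centroids.
plt.scatter(X[:, 0], X[:, 1], c=km.labels_, cmap="viridis", s=20)
plt.scatter(km.cluster_centers_[:, 0], km.cluster_centers_[:, 1],
            c="red", marker="X", s=200, label="centroids")
plt.legend()
plt.savefig("clusters.png")
```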

11. Evaluate your clustering model. Is your model good? Justify your answer. [2]
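One common way to answer question 11 is the silhouette score, shown here on the same synthetic blobs; it is one reasonable choice of internal clustering metric, not the only acceptable one.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the Part B data and the k chosen via the elbow method.
X, _ = make_blobs(n_samples=150, centers=3, random_state=1)
km = KMeans(n_clusters=3, n_init=10, random_state=1).fit(X)

# Silhouette ranges from -1 to 1; values near 1 indicate compact,
# well-separated clusters, values near 0 indicate overlapping clusters.
score = silhouette_score(X, km.labels_)
```

A justification would cite the score's value and what it implies about cluster separation, ideally alongside the visual evidence from question 10.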
ALL THE BEST ☺
