Priyanka Rane - C4 - Roll - No.33 - B87 - AIML

Priyanka Rane_C4_Roll.No.33_B87_AIML

Honor Code
“As a member of the SPJIMR student community, I pledge to abide by the rules and regulations of the
examination and conduct myself with the highest levels of fairness and integrity. I will not indulge in
any unfair practices in this examination and will also extend my support to the institute officials to
conduct this examination in a free and fair manner.”

Q1. Solution:
a. The problem statement for building a machine learning prediction model from the
provided glass type classifier dataset is:
To develop a machine learning model that accurately classifies types of glass based
on their refractive index and chemical composition (sodium, magnesium, aluminium,
silicon, potassium, calcium, barium, and iron), with the objective of differentiating
between building windows, containers, headlamps, tableware, and vehicle windows.
b. This is a classification problem because the goal is to categorize the observations into
discrete, predefined labels representing the different types of glass. The target
variable 'Type of glass' is categorical with distinct classes, which is indicative of a
classification task in machine learning.

c. In the provided dataset, the features are the measurable properties or attributes that
the model will use to classify the type of glass. These features are the refractive
index and the chemical composition percentages of the elements present in the glass:

RI (Refractive Index)
Sodium
Magnesium
Aluminium
Silicon
Potassium
Calcium
Barium
Iron

The label, also known as the target variable, is the variable we want to predict using
these features. For this dataset, the label is 'Type of glass', which indicates the
category of glass, such as Building windows, Containers, Headlamps, Tableware, or
Vehicle windows. The features are the inputs for the model, while the label is the
output it predicts.

d. This is a supervised learning problem because the dataset includes labeled examples:
the 'Type of glass' is known for each observation. In supervised learning, the model
learns from the input features and their corresponding labels to make predictions.

e. To figure out what kind of glass we have, we can use two popular classification
methods. The first, a Decision Tree, works like a flowchart: it reaches a decision by
asking a series of questions about the glass's features. The second, a Support Vector
Machine (SVM), copes well with many features at once and can find boundaries between
classes that aren't just straight lines.
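To make the flowchart idea concrete, here is a minimal sketch of a one-question decision tree (often called a decision stump), not a full Decision Tree. It searches for the single "is the value at or below this threshold?" question that makes the fewest mistakes. The barium readings and labels are made up for illustration, and the function names are hypothetical.

```python
from collections import Counter

def majority(labels):
    """Return the most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def best_question(values, labels):
    """Find the 'value <= threshold?' question with the fewest errors.

    Returns (threshold, label_if_yes, label_if_no, errors),
    or None when every value is identical and no split is possible.
    """
    best = None
    # The largest value is skipped because it would leave the "no" side empty.
    for threshold in sorted(set(values))[:-1]:
        yes_side = [l for v, l in zip(values, labels) if v <= threshold]
        no_side = [l for v, l in zip(values, labels) if v > threshold]
        yes_label, no_label = majority(yes_side), majority(no_side)
        errors = (sum(l != yes_label for l in yes_side)
                  + sum(l != no_label for l in no_side))
        if best is None or errors < best[3]:
            best = (threshold, yes_label, no_label, errors)
    return best

# Made-up barium percentages and glass types, for illustration only.
barium = [0.0, 0.0, 0.1, 1.2, 1.6, 2.0]
glass = ["window", "window", "window", "headlamp", "headlamp", "headlamp"]
print(best_question(barium, glass))
```

A real decision tree repeats this search on each side of the split, over every feature, until the groups are pure enough.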

f. How accurate a model is depends on the quality of the data and on which features
it uses. We usually want the model to be right as often as possible. To see if the
model is good, we can look at precision (how many of the items it selected are
relevant), recall (how many of the relevant items it selected), and the F1-score
(which combines precision and recall into a single score). Also, a confusion matrix
is like a report card that shows where the model made right and wrong guesses,
which helps us know whether the model is good for all classes, not just the common ones.
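The metrics above can be computed directly from the true and predicted labels. The following is a minimal sketch using only the Python standard library; the labels are made up for illustration and the function names are hypothetical.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) pair of labels occurs."""
    return Counter(zip(actual, predicted))

def precision_recall_f1(actual, predicted, positive):
    """Compute precision, recall and F1 for one class treated as 'positive'."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Made-up labels, for illustration only.
actual = ["window", "window", "container", "headlamp", "window", "container"]
predicted = ["window", "container", "container", "headlamp", "window", "window"]

print(confusion_matrix(actual, predicted))
print(precision_recall_f1(actual, predicted, positive="window"))
```

For a multi-class problem such as the glass dataset, the same calculation is repeated with each class in turn as the positive class.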

g. To make a prediction model better, we can try a few things: picking the right
features from the data, adjusting the model settings, combining different models to
get the best parts of each, and making sure the training data is varied and covers
many cases. We should also avoid making the model too specific to the training data
(overfitting), check how well it works on new data it hasn't seen before, and keep
updating and improving the model over time.
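One common way to check a model on data it has not seen, which the answer above does not name, is k-fold cross-validation: the rows are divided into k parts, and each part takes one turn as the unseen test set. A minimal sketch of how the row indices could be divided, assuming the rows have already been shuffled (the function name is hypothetical):

```python
def k_fold_indices(n_rows, k=5):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_rows))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

for train, test in k_fold_indices(10, k=5):
    print("train on", train, "test on", test)
```

A model that scores well on every fold is less likely to be overfitted than one that only scores well on the data it was trained on.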

Q2. Solutions:
a. False.
Logistic regression is like a type of sorting hat that decides which of two groups something
belongs to, rather than predicting actual numbers. It's great for answering yes or no
questions, like if an email is spam (yes or no) or if a team will win a game (win or lose). It
works by calculating the chance of something being true or false. This calculation gives a
number between 0 and 1, like a percentage chance. If you want to guess a number like the
price of a house, that's a job for another method called linear regression, not logistic
regression.
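As a rough illustration of how logistic regression turns inputs into a number between 0 and 1, the sketch below applies the sigmoid function to a weighted sum of features and then thresholds the result into a yes/no answer. The weights, bias, and feature values are made up; in practice they would be learned from training data.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features, weights, bias):
    """Probability of the 'yes' class for one observation."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def predict_class(features, weights, bias, threshold=0.5):
    """Turn the probability into a yes (1) or no (0) decision."""
    return 1 if predict_probability(features, weights, bias) >= threshold else 0

# Hypothetical weights and inputs, for illustration only.
weights, bias = [1.0, 1.0], -1.0
print(predict_probability([2.0, 1.0], weights, bias))
print(predict_class([2.0, 1.0], weights, bias))
print(predict_class([0.0, 0.0], weights, bias))
```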
b. False.
A word cloud is like a picture made of words from a bunch of text. The more often a word
shows up in the text, the bigger and bolder it appears in the picture. So, the big words are
the ones we see a lot in the text, and the small ones aren't as common. It's a handy way to
see which words are most important or come up a lot in whatever we're reading or looking
at.
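The counting behind a word cloud can be sketched in a few lines. This example counts word frequencies and scales a font size in proportion to each count; it does not draw anything, and the size range and function names are assumptions chosen for illustration.

```python
import re
from collections import Counter

def word_frequencies(text, top_n=5):
    """Return the top_n most frequent words as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)

def font_sizes(frequencies, min_size=10, max_size=40):
    """Scale sizes so the most frequent word gets max_size."""
    if not frequencies:
        return {}
    top = max(count for _, count in frequencies)
    return {word: min_size + (max_size - min_size) * count / top
            for word, count in frequencies}

text = "glass type glass model glass type"
top_words = word_frequencies(text, top_n=2)
print(top_words)
print(font_sizes(top_words))
```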
c. False.
The training dataset teaches the model what to look for and helps it find patterns. However,
to really see if the model learned well, we don't use the material it was trained on. Instead,
we use a new set of data, called the test dataset. This is like giving the model a final exam on
stuff it hasn't seen in class to make sure it can apply what it learned to new situations. This
helps us check how good the model is at predicting new, unseen data.
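A minimal sketch of holding back a test dataset, using only the standard library. The 25% test fraction and the fixed seed are assumptions for illustration, and the function name is hypothetical.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows, then hold back a fraction as the unseen test set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed keeps the split repeatable
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

train, test = train_test_split(range(20))
print(len(train), "training rows,", len(test), "test rows")
```

The model is fitted on `train` only, and its score is then measured on `test`.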
d. True.
In making a model to catch when someone uses a credit card wrongly, it's really important to
not miss any bad transactions. A "false negative" is when the model thinks a bad transaction
is okay. We don't want this because it means a thief might get away with stealing money,
which is bad for both the person whose card was used and the bank. So, we try really hard
to make sure our model catches as many of these bad transactions as possible. This way, we
keep everyone's money safer and maintain trust in using credit cards.
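One way to reduce false negatives, sketched here as an assumption rather than as the method the answer refers to, is to lower the score threshold at which a transaction is flagged. The fraud scores and labels below are made up for illustration.

```python
def error_counts(scores, is_fraud, threshold):
    """Flag a transaction when its score is at or above the threshold.

    Returns (false_negatives, false_positives).
    """
    pairs = list(zip(scores, is_fraud))
    false_negatives = sum(1 for s, fraud in pairs if fraud and s < threshold)
    false_positives = sum(1 for s, fraud in pairs if not fraud and s >= threshold)
    return false_negatives, false_positives

# Made-up model scores and true labels, for illustration only.
scores = [0.9, 0.4, 0.32, 0.35, 0.1, 0.6]
is_fraud = [True, True, False, True, False, False]

for threshold in (0.5, 0.3):
    fn, fp = error_counts(scores, is_fraud, threshold)
    print(f"threshold={threshold}: missed fraud={fn}, false alarms={fp}")
```

On this toy data, lowering the threshold from 0.5 to 0.3 removes both missed frauds but adds one false alarm, which is the usual trade-off.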
e. False.
Machine learning is not an advanced form of deep learning; rather, deep learning is a subset
of machine learning. Machine learning is a broader field that encompasses various
techniques and algorithms for enabling computers to learn from data and make decisions or
predictions. Deep learning, on the other hand, specifically refers to a set of algorithms based
on artificial neural networks with many layers, or "deep" networks, that can automatically
learn complex patterns in large datasets. Deep learning has become popular due to its ability
to achieve remarkable performance in tasks such as image and speech recognition, natural
language processing, and other areas that require the analysis of vast amounts of
unstructured data.
f. False.
Clustering is an unsupervised learning technique, not supervised. The primary purpose of
clustering is to group data points into clusters based on similarity without prior knowledge
of the group assignments. Unlike supervised learning, where models are trained on labeled
data (data with predefined categories or labels), clustering works with unlabeled data. It
identifies patterns or structures within the data by analyzing the intrinsic characteristics of
the data points and grouping them accordingly. Clustering has important business use cases,
such as customer segmentation, anomaly detection, and market research, where the goal is
to discover natural groupings within the data.
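To show how clustering groups unlabeled data, here is a minimal sketch of k-means on a single numeric feature. k-means is one clustering algorithm among several and is chosen here only as an example; the spending figures and the function name are made up, and the sketch assumes k is at least 2.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Group numbers into k clusters by repeatedly moving each centroid
    to the mean of the values nearest to it. Assumes k >= 2."""
    ordered = sorted(values)
    # Simple deterministic start: evenly spaced picks from the sorted values.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Made-up monthly spend per customer; no labels are given.
spend = [10, 12, 11, 95, 100, 105]
centroids, clusters = kmeans_1d(spend, k=2)
print(centroids)
print(clusters)
```

No group labels are supplied: the two customer segments emerge only from how close the values are to each other.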

Q3. Solution:
High-quality training data is crucial for building effective machine learning and deep learning
models because the accuracy and reliability of these models directly depend on the quality
of the data they are trained on. Poor quality data, characterized by inaccuracies,
inconsistencies, missing values, or bias, can lead to models that perform inadequately, make
incorrect predictions, or fail to generalize from the training data to real-world scenarios.
High-quality data, on the other hand, ensures that the learned patterns are meaningful,
representative of the underlying problem, and applicable to new, unseen data.
For example, consider the marketing communication division of my organisation,
Legrand, a company that specializes in electrical and digital building
infrastructures. If Legrand wants to use machine learning to optimize its email marketing
campaigns by predicting which customers are most likely to engage with certain types of
content, it must have high-quality training data. This data might include historical
engagement metrics, customer demographic information, and previous campaign results.
High-quality training data would be clean, well-organized, accurately labeled, and reflective
of the diverse customer base. Training a model on such data would enable Legrand to
effectively target its marketing efforts, personalize communications, and improve customer
engagement rates, ultimately leading to higher ROI for its marketing campaigns.
Without high-quality data, the model might target the wrong customers, overlook key
demographics, or fail to capture the nuances of customer preferences, leading to suboptimal
marketing strategies and wasted resources.

Reference:
Class presentation & learnings
Orange data mining software tutorials
Course pack case studies for AIML
