SP19-BCS-083 Assignment01

ASSIGNMENT - I
Data Science
7-B
Submitted on: March 24, 2022
Haider Nawaz Janjua

SP19-BCS-083
COMSATS UNIVERSITY ISLAMABAD
Department of Computer Science

Class Assignment – I
CLO-(C1)
Part A
Q1: What is Data Science?
Answer: Data Science is the field in which data is first prepared for
analysis and then analyzed to reveal interesting patterns and insights.
These findings are presented to stakeholders so that they can make
informed decisions.
In short, Data Science is about discovering useful information and insights
from data.
Q2: Discuss the life cycle of Data Science
Answer: The Team Data Science Process (TDSP, by Microsoft) [1], which is
based on CRISP-DM, outlines the following major stages in the life cycle:
1. Business understanding
▪ Objectives are defined (which questions to answer).
▪ Data sources are identified (relevant data that will help in
answering those questions).
2. Data acquisition and understanding
▪ Data is ingested (moved from the source to the target location).
▪ The data is explored.
3. Modeling
▪ Feature engineering (features are created and selected).
▪ Model training (splitting the dataset, building a model, and
evaluating it).
4. Deployment
▪ The model is put into production.
5. Customer acceptance
▪ Confirm that the deployment satisfies the customer's objectives.
[1] https://docs.microsoft.com/en-us/azure/architecture/data-science-process/lifecycle-data
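The model training step of the Modeling stage (split the dataset, build a model, evaluate it) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is invented, the 80/20 split and the straight-line model are arbitrary choices, and none of it is prescribed by TDSP.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical data, invented for illustration: y is roughly 2x + 1 plus noise.
data = [(x, 2 * x + 1 + rng.uniform(-0.5, 0.5)) for x in range(20)]

# Split the dataset: shuffle, then keep 80% for training and 20% for testing.
rng.shuffle(data)
split = len(data) * 4 // 5
train, test = data[:split], data[split:]

# Build the model: least-squares fit of a straight line on the training set.
mean_x = sum(x for x, _ in train) / len(train)
mean_y = sum(y for _, y in train) / len(train)
slope = sum((x - mean_x) * (y - mean_y) for x, y in train) / sum(
    (x - mean_x) ** 2 for x, _ in train
)
intercept = mean_y - slope * mean_x

# Evaluate: mean squared error on the held-out test set.
mse = sum((y - (slope * x + intercept)) ** 2 for x, y in test) / len(test)
print(f"slope={slope:.2f} intercept={intercept:.2f} test MSE={mse:.3f}")
```

The model is evaluated only on the rows it did not see during fitting, which is the reason for splitting the dataset first.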
Q3: List the tools / languages used in this domain along with the
algorithms / techniques applied to get results.
Answer:
1. Tools:
▪ SAS
▪ Apache Spark
▪ Microsoft Excel
▪ Hadoop
2. Languages:
▪ R
▪ Python
▪ Scala
3. Algorithms:
▪ Linear Regression
▪ Logistic Regression
▪ Naïve Bayes
▪ KNN
▪ Support Vector Machine
▪ K-Means
4. Techniques [2]:
▪ Pattern Recognition
▪ Bootstrapping
▪ Resampling
▪ Cross Validation
[2] https://seleritysas.com/blog/2021/01/22/key-data-science-modeling-techniques-used-in-data-evaluation-and-analysis/
Q4: Also highlight any 3 use cases of Data Science.
Answer:
1. Price Optimization:
Analyzing sales at different price points to determine which price
maximizes sales as well as profits.
2. Customer Segmentation and Clustering:
Customers are segmented based on surveys, geographic area, or
shopping habits. This allows for personalized advertising and for a
sales strategy tailored to each segment.
3. Sentiment Analysis:
Text such as feedback, social media posts, and survey responses is
analyzed to better understand users' motivations.
4. Fraud Detection:
Algorithms are used to detect fake profiles, unusual transactions,
and fake insurance claims.
Part B
Question 1
The temperature of a fridge containing a pharmaceutical company's new vaccine is measured
every hour, giving the following data points (in Fahrenheit):
31, 32, 32, 31, 28, 29, 31, 38, 32, 31, 30, 29, 30, 31, 26
• Calculate the mean, median and standard deviation.
Answer:
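A minimal sketch of the calculation using Python's standard statistics module. The variable names are illustrative, and both the sample (n - 1) and population (n) standard deviations are shown because the question does not say which one is intended.

```python
import statistics

# Hourly fridge temperatures in Fahrenheit, copied from the question.
temps = [31, 32, 32, 31, 28, 29, 31, 38, 32, 31, 30, 29, 30, 31, 26]

mean_temp = statistics.mean(temps)        # arithmetic mean
median_temp = statistics.median(temps)    # middle value of the sorted data
mode_temp = statistics.mode(temps)        # most frequent value
sample_sd = statistics.stdev(temps)       # sample standard deviation (n - 1)
population_sd = statistics.pstdev(temps)  # population standard deviation (n)

print(f"mean={mean_temp:.2f} median={median_temp} mode={mode_temp}")
print(f"sample sd={sample_sd:.2f} population sd={population_sd:.2f}")
```

For this data the sketch gives a mean of 30.73, a median and mode of 31, a sample standard deviation of 2.60 and a population standard deviation of 2.52.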
• Plot the data and see if it is skewed or not.

Answer:
No, the data is not noticeably skewed and is approximately normal. This is because 1) the mean
(about 30.7), median (31) and mode (31) are approximately the same; 2) in the sorted data, 7 data
points lie on either side of the middle value; 3) the plot is roughly bell-shaped around 31. The
single high reading of 38 stands apart from the rest and stretches the right tail.
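One simple way to plot this data is a text histogram, sketched below with Python's standard library only. It is a stand-in for a proper chart, not a reproduction of any particular plot; the row layout and the '#' bar character are arbitrary choices.

```python
from collections import Counter

# Hourly fridge temperatures in Fahrenheit, copied from the question.
temps = [31, 32, 32, 31, 28, 29, 31, 38, 32, 31, 30, 29, 30, 31, 26]

counts = Counter(temps)  # missing temperatures count as 0

# One row per temperature from the minimum to the maximum reading;
# the bar length is the number of hours recorded at that temperature.
rows = [f"{t:>3} | {'#' * counts[t]}" for t in range(min(temps), max(temps) + 1)]
print("\n".join(rows))
```

The longest bar is at 31 (five readings), with shorter bars on either side and an isolated bar at 38.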
• Which measure is better, mean, mode or median?
Answer:
Since the data is approximately normal, the choice matters little: the mean, median and mode
are nearly the same (about 31).
Question 2
The scores of students in their maths test:

86, 86, 93, 86, 67, 85, 75, 64, 78, 72, 21
• Calculate the mean, median and standard deviation.
Answer:
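A minimal sketch that works these values out directly from their definitions, so that each step is visible. The variable names are illustrative, and both the sample (n - 1) and population (n) standard deviations are shown because the question does not say which one is intended.

```python
import math

# Maths test scores, copied from the question.
scores = [86, 86, 93, 86, 67, 85, 75, 64, 78, 72, 21]
n = len(scores)

# Mean: sum of the scores divided by how many there are.
mean_score = sum(scores) / n

# Median: middle value of the sorted scores (average of the two middle
# values when the count is even).
ordered = sorted(scores)
mid = n // 2
median_score = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

# Standard deviation: square root of the average squared deviation.
squared_dev = sum((x - mean_score) ** 2 for x in scores)
sample_sd = math.sqrt(squared_dev / (n - 1))
population_sd = math.sqrt(squared_dev / n)

print(f"mean={mean_score:.2f} median={median_score}")
print(f"sample sd={sample_sd:.2f} population sd={population_sd:.2f}")
```

For this data the sketch gives a mean of 73.91, a median of 78, a sample standard deviation of 19.73 and a population standard deviation of 18.81.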
• Plot the data and see if it is skewed or not.

Answer:
Yes, it is skewed (negatively, with a long tail towards the low scores). This is because 1) the
mean (73.9), median (78) and mode (86) are NOT the same; 2) the graph is not symmetric around
the peak; 3) the plot is not bell-shaped.
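The skew can also be checked numerically. The sketch below computes the moment coefficient of skewness (third central moment divided by the cubed population standard deviation); choosing this particular measure is an assumption, since the question only asks for a plot.

```python
# Maths test scores, copied from the question.
scores = [86, 86, 93, 86, 67, 85, 75, 64, 78, 72, 21]
n = len(scores)
mean_score = sum(scores) / n

# Second and third central moments of the data.
m2 = sum((x - mean_score) ** 2 for x in scores) / n
m3 = sum((x - mean_score) ** 3 for x in scores) / n

# Moment coefficient of skewness: 0 for symmetric data, negative when the
# longer tail is on the low side, positive when it is on the high side.
skewness = m3 / m2 ** 1.5
print(f"skewness={skewness:.2f}")
```

The result is clearly negative, which agrees with the long tail created by the score of 21.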
• Which measure is better, mean, mode or median?

Answer:
Mean:
The mean (73.9) is pulled down by an outlier (21 marks), so it is not preferred.
Between Median and Mode:
Most of the scores lie between the mid 60s and the mid 80s. The median (78) sits in the middle
of this range, whereas the mode (86) sits at its upper end and does not represent the scores in
the 60s and 70s.
That is why the median is preferred in this case.
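The effect of the outlier can be illustrated by recomputing both measures without the score of 21. This is a rough sketch using Python's standard statistics module; dropping the outlier is done only for comparison and is not part of the original answer.

```python
import statistics

# Maths test scores, copied from the question.
scores = [86, 86, 93, 86, 67, 85, 75, 64, 78, 72, 21]

# The same scores with the outlier (21) removed, for comparison only.
without_outlier = [s for s in scores if s != 21]

mean_shift = statistics.mean(without_outlier) - statistics.mean(scores)
median_shift = statistics.median(without_outlier) - statistics.median(scores)

print(f"mean moves by {mean_shift:.2f}, median moves by {median_shift:.2f}")
```

Removing the single outlier moves the mean by about 5.3 marks but the median by only 3.5 marks, so the median is the less sensitive of the two.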
