SP19-BCS-083 Assignment01
Data Science
7-B
Submitted on: March 24, 2022
Part A
Q1: What is Data Science?
Answer: Data Science is the field in which data is first prepared for
analysis and then analyzed to reveal interesting patterns and insights.
These findings are presented to stakeholders so that they can make
informed decisions.
[1] https://docs.microsoft.com/en-us/azure/architecture/data-science-process/lifecycle-data
5. Customer acceptance
▪ Confirm that the deployment satisfies the customer’s objectives.
Q3: List down tools / languages used in this domain along with the
algorithms / techniques applied to get results.
Answer:
1. Tools:
▪ SAS
▪ Apache Spark
▪ Microsoft Excel
▪ Hadoop
2. Languages:
▪ R
▪ Python
▪ Scala
3. Algorithms:
▪ Linear Regression
▪ Logistic Regression
▪ Naïve Bayes
▪ KNN
▪ Support Vector Machine
▪ K-Means
4. Techniques [2]:
▪ Pattern Recognition
▪ Bootstrapping
▪ Resampling
▪ Cross Validation
[2] https://seleritysas.com/blog/2021/01/22/key-data-science-modeling-techniques-used-in-data-evaluation-and-analysis/
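Two of the techniques listed above, bootstrapping (a form of resampling) and cross validation, can be illustrated with a minimal sketch. It uses only the Python standard library. The function names `bootstrap_means` and `k_fold_indices`, the fixed seed, and the small `sample` list are all assumptions made for illustration; none of them come from the assignment or from a particular library.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Return the means of n_resamples bootstrap resamples of sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # Resampling step: draw n values from the sample WITH replacement.
        resample = rng.choices(sample, k=n)
        means.append(statistics.mean(resample))
    return means

def k_fold_indices(n_items, k):
    """Return k (train_indices, validation_indices) pairs over 0..n_items-1."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    # Each fold serves once as the validation set; the rest is training data.
    return [(sorted(set(range(n_items)) - set(fold)), fold) for fold in folds]

sample = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # hypothetical illustration data

# Bootstrapping: the spread of the resampled means estimates the
# standard error of the sample mean.
means = bootstrap_means(sample)
print("bootstrap standard error of the mean:", statistics.stdev(means))

# Cross validation: every index appears in exactly one validation fold.
for train_idx, val_idx in k_fold_indices(len(sample), k=3):
    print("train:", train_idx, "validate:", val_idx)
```

In practice a model would be fitted on each training split and scored on the matching validation split; the sketch only shows how the data is partitioned.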
Answer:
1. Price Optimization:
Analyzing multiple price points to determine which price will
maximize both sales and profits.
2. Customer Segmentation and Clustering:
Customers are segmented based on survey responses, geographic area,
or shopping habits. This allows for personalized advertising and for
sales strategies tailored to each segment.
3. Sentiment Analysis:
Text such as feedback, social media posts, and survey responses is
analyzed to better understand users’ motivations.
4. Fraud Detection:
This involves using algorithms to detect fake profiles, unusual
transactions, and fake insurance claims.
Part B
Question 1
31, 32, 32, 31, 28, 29, 31, 38, 32, 31, 30, 29, 30, 31, 26
• Calculate the mean, median, and standard deviation.
Answer:
No, the graph is not skewed; it is approximately normal. This is because 1) the mean, median and mode
values are approximately the same (i.e., 31), 2) the number of data points before and after the peak
is the same, i.e., 7 (symmetric), and 3) we can see the bell shape.
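The first point can be checked directly from the Question 1 data. The sketch below uses Python's standard `statistics` module. Whether the assignment intended the sample or the population standard deviation is not stated, so treating the 15 values as a sample is an assumption, and both forms are computed.

```python
import statistics

# The 15 values given in Question 1.
data = [31, 32, 32, 31, 28, 29, 31, 38, 32, 31, 30, 29, 30, 31, 26]

mean = statistics.mean(data)      # arithmetic mean: 461 / 15
median = statistics.median(data)  # middle (8th) value of the sorted data
mode = statistics.mode(data)      # most frequent value

# Assumption: the values are a sample, so divide by n - 1.
sample_sd = statistics.stdev(data)
# For comparison, the population form divides by n.
population_sd = statistics.pstdev(data)

print(f"mean={mean:.2f} median={median} mode={mode}")
# prints: mean=30.73 median=31 mode=31
print(f"sample sd={sample_sd:.2f} population sd={population_sd:.2f}")
# prints: sample sd=2.60 population sd=2.52
```

The mean (about 30.73) is close to the median and mode (both 31), which matches the "approximately the same (i.e., 31)" observation.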
• Which measure is better, mean, mode or median?
Answer:
Since the data is approximately normal, the choice does not really matter, as the mean, median
and mode are nearly the same.
Question 2
• Calculate the mean, median, and standard deviation.
Answer:
Yes, it is skewed. This is because 1) the mean, median and mode values are NOT the same,
2) the graph is not symmetric around the peak, and 3) we cannot see the bell shape.