
5KS04 Data Science and Statistics L-3, T-0, C-

Section: A
Total students: 76
• Dr. Sheetal S. Dhande-Dandge

• Professor,
• Department of Computer Science & Engineering,
Sipna College of Engineering and
Technology, Amravati 444601
• Maharashtra, India
• Hon. Secretary, CSI Amravati Chapter.
• BOS Member of CSE, SGBA University.
• Member of IT Board, SGBA University.

02-08-2022 Dr. S. S. Dhande-Dandge 1

Course Prerequisite: Basic Knowledge of Mathematics & Discrete Structures

Course Objectives: Throughout the course, students will be expected to demonstrate their understanding of Data
Science and Statistics by being able to do each of the following:
1. Understand the need for data science and statistics.
2. Understand the statistical data analysis techniques used in business decision-making.
3. Understand and apply different data modeling strategies.
4. Apply the learned concepts to skillful data management.

Course Outcomes (Expected Outcome): On completion of the course, the students will be able to:

1. Explain the basics and need of data science.
2. Demonstrate proficiency with statistical analysis of data.
3. Perform linear and multiple linear regression analysis.
4. Develop the ability to build and assess classification-based models.
5. Evaluate outcomes and make decisions based on data.
6. Compare machine learning techniques to solve data science business problems.


Unit I: Introduction to Data Science Hours:6

What Is Data Science? Where Do We See Data Science? How Does Data Science Relate to Other Fields? The Relationship between Data Science and Information Science,
Computational Thinking, Skills for Data Science, Tools for Data Science, Issues of Ethics, Bias, and Privacy in Data Science.
Data: Data types, Data Collection, Data Pre-processing.

Unit II: Statistical Learning & Inference Hours:8

Need of Statistics in Data Science, Measures of central tendency: Mean, Median, Mode, Mid-range. Measures of Dispersion: Range, Variance, Mean deviation, Standard deviation.
Techniques: Introduction, Data Analysis and Data Analytics, Descriptive Analysis, Diagnostic Analytics, Predictive Analytics, Prescriptive Analytics, Exploratory Analysis,
Mechanistic Analysis, Regression

Unit III: Regression and its Techniques Hours:6

Linear Regression, Multiple Linear Regression, Other Considerations in the Regression Model, Comparison of Linear Regression with K-Nearest Neighbors

Unit IV: Classification Hours:6

An Overview of Classification, Why Not Linear Regression? Logistic Regression, Linear Discriminant Analysis, Comparison of Classification Methods

Unit V: Tree Based Methods Hours:6

Tree-Based Methods: The Basics of Decision Trees, Regression and Classification Trees, Trees Versus Linear Models, Advantages and Disadvantages, Bagging, Random Forests,

Unit VI: Supervised and Unsupervised Learning Hours:8

Supervised Learning: Introduction, Logistic Regression, Softmax Regression, Classification with KNN, Decision Tree, Decision Rule, Random Forest, Naïve Bayes,
Support Vector Machine (SVM)
Unsupervised Learning: Introduction, Agglomerative Clustering, Divisive Clustering, Expectation Maximization (EM). Introduction to Reinforcement Learning.

Text Book:

1. Chirag Shah, “A Hands-on Introduction to Data Science”, Cambridge University Press (2020), ISBN: 978-1-108-47244-9.
2. Data Mining: Concepts and Techniques, by Jiawei Han, Jian Pei, Micheline Kamber (Chapter 2: point 2.2).
3. Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani: An Introduction to Statistical Learning with Applications in R, First Edition, 2013, Springer-Verlag New York, ISBN: 978-1-4614-7137-0.

Reference Books:

1. Cathy O’Neil and Rachel Schutt: Doing Data Science, First Edition, 2014, O’Reilly Publications, ISBN: 978-1-449-35865-5.
2. DT Editorial Services, “Big Data, Black Book”, DT Editorial Services, ISBN: 9789351197577, 2016 Edition.


Unit I: Introduction to Data Science Hours:6

What Is Data Science? Where Do We See Data Science? How Does Data
Science Relate to Other Fields? The Relationship between Data Science
and Information Science, Computational Thinking, Skills for Data Science,
Tools for Data Science, Issues of Ethics, Bias, and Privacy in Data Science.
Data: Data types, Data Collection, Data Pre-processing.



1.1 What Is Data Science?
• Data science can be defined as a field of study and practice that involves the
collection, storage, and processing of data in order to derive
important insights into a problem or a phenomenon.
• Such data may be generated by humans (surveys, logs, etc.) or
machines (weather data, road vision, etc.), and could be in different
formats (text, audio, video, augmented or virtual reality, etc.).
• We will also treat data science as an independent field in itself rather
than a subset of another domain, such as statistics or computer
science. This will become clearer as we look at how data science
relates to and differs from various fields and disciplines.


Why is data science so important now?
• Dr. Tara Sinclair, the chief economist at Indeed.com since 2013, said, “the
number of job postings for ‘data scientist’ grew 57%” year-over-year in the first
quarter of 2015.
• Why have both industry and academia recently increased their demand for data
science and data scientists? What changed within the past several years? The
answer is not surprising: we have a lot of data; we continue to generate a
staggering amount of data at an unprecedented and ever-increasing speed;
analyzing data wisely necessitates the involvement of competent and well-trained
practitioners; and analyzing such data can provide actionable insights.
• The “3V model” attempts to lay this out in a simple (and catchy) way.
• These are the three Vs:
• 1. Velocity: The speed at which data is accumulated.
• 2. Volume: The size and scope of the data.
• 3. Variety: The massive array of data and types (structured and unstructured).


1.2 Where Do We See Data Science?
• Figure 1.1 Increase of data volume in last 15 years. (Source: IDC’s Digital Universe Study, December


Where Do We See Data Science?
• 1.2.1 Finance
• 1.2.2 Public Policy
• 1.2.3 Politics
• 1.2.4 Healthcare
• 1.2.5 Urban Planning
• 1.2.6 Education
• 1.2.7 Libraries


How Does Data Science Relate to Other Fields?
• 1.3.1 Data Science and Statistics
• How does the knowledge of these fields blend together?
• Statistician and data visualizer Nathan Yau of FlowingData suggests that data scientists
should have at least three basic skills:
• 1. A strong knowledge of basic statistics (see Chapter 3) and machine learning (see Chapters 8–10)
– or at least enough to avoid mistaking correlation for causation or extrapolating too much
from a small sample size.
• 2. The computer science skills to take an unruly dataset and use a programming language (like R or
Python, see Chapters 5 and 6) to make it easy to analyze.
• 3. The ability to visualize and express their data and analysis in a way that is meaningful to
somebody less conversant in data (see Chapters 2 and 11).
• 1.3.2 Data Science and Computer Science
• Data science and computer science overlap and are mutually supportive. Some of the algorithms
and techniques developed in the computer science field – such as machine learning algorithms,
pattern recognition algorithms, and data visualization techniques – have contributed to the data
science discipline.


How Does Data Science Relate to Other Fields?
• 1.3.4 Data Science and Business Analytics

• Business analytics (BA) refers to the skills, technologies, and practices for continuous
iterative exploration and investigation of past and current business performance to gain
insight and be strategic. BA focuses on developing new perspectives and making sense of
performance based on data and statistics. And that is where data science comes in. To
fulfill the requirements of BA, data scientists are needed for statistical analysis, including
explanatory and predictive modeling and fact-based management, to help drive
successful decision-making.
• There are four types of analytics, each of which holds opportunities for data scientists
in business analytics:
• 1. Decision analytics: supports decision-making with visual analytics that reflect reasoning.
• 2. Descriptive analytics: provides insight from historical data with reporting, score cards, clustering, etc.
• 3. Predictive analytics: employs predictive modeling using statistical and machine learning techniques.
• 4. Prescriptive analytics: recommends decisions using optimization, simulation, etc.
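
To make the difference between descriptive and predictive analytics concrete, here is a minimal Python sketch. The sales figures are invented for illustration, and the function names `describe` and `moving_average_forecast` are hypothetical. The moving average is only a naive stand-in for the statistical and machine learning models that predictive analytics would normally use.

```python
import statistics

# Hypothetical monthly sales figures (invented for illustration only).
monthly_sales = [120, 135, 128, 150, 162, 158]


def describe(values):
    """Descriptive analytics: summarize what the historical data looks like."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def moving_average_forecast(values, window=3):
    """Predictive analytics (naive): forecast the next value as the
    mean of the most recent `window` observations."""
    return statistics.mean(values[-window:])


summary = describe(monthly_sales)
forecast = moving_average_forecast(monthly_sales)

print("Summary of past sales:", summary)
print("Naive forecast for next month:", forecast)
```

The summary looks backward at data already collected, while the forecast makes a statement about a value not yet observed. That is the essential distinction between the two types of analytics.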


How Does Data Science Relate to Other Fields?
• 1.3.3 Data Science and Engineering
• Broadly speaking, engineering in various fields (chemical, civil, computer, mechanical,
etc.) has created demand for data scientists and data science methods. Engineers
constantly need data to solve problems. Data scientists have been called upon to
develop methods and techniques to meet these needs. Likewise, engineers have assisted
data scientists. Data science has benefitted from new software and hardware developed
via engineering, such as the CPU (central processing unit) and GPU (graphics processing
unit), that substantially reduce computing time.
• 1.3.5 Data Science, Social Science, and Computational Social Science
• Computational social science raises inevitable questions about the politics and ethics
often embedded in data science research, particularly when it is based on sociopolitical
problems with real-life applications that have far-reaching consequences. Government
policies, people’s mandates in elections, and hiring strategies in the private sector are
prime examples of such applications.


Takeaways from the first lecture
• Orientation to the syllabus
• The present chapter provided several views on how people think and talk
about data science, how it affects or is connected to various fields, and
what kinds of skills a data scientist should have.
• Using a small example, we practiced (1) data collection, (2) descriptive
statistics, (3) correlation, (4) data visualization, (5) model building, and (6)
extrapolation and regression analysis.
• As we progress through various parts of this book, we will dive into all of
these and more in detail, and learn scientific methods, tools, and
techniques to tackle data-driven problems, helping us derive interesting
and important insights for making decisions in various fields – business,
education, healthcare, policy-making, and more.
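
The steps listed in the small example above (descriptive statistics, correlation, model building, and extrapolation) can be sketched in plain Python. This is an illustrative sketch only. The dataset of study hours and exam scores is invented and is not the example from the textbook, and the helper names `pearson_r`, `fit_line`, and `predict` are hypothetical. Data visualization is left out to keep the sketch dependency-free.

```python
import math
import statistics

# (1) Data collection: hypothetical observations, invented for illustration.
hours = [1, 2, 3, 4, 5]        # hours studied
scores = [52, 58, 61, 67, 72]  # exam score

# (2) Descriptive statistics.
mean_hours = statistics.mean(hours)
mean_score = statistics.mean(scores)


def pearson_r(xs, ys):
    """(3) Correlation: Pearson correlation coefficient of two samples."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_line(xs, ys):
    """(5) Model building: ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return slope, intercept


r = pearson_r(hours, scores)
slope, intercept = fit_line(hours, scores)


def predict(x):
    """(6) Extrapolation: apply the fitted line to an unseen input."""
    return slope * x + intercept


print("mean hours:", mean_hours, "mean score:", mean_score)
print("correlation:", round(r, 3))
print("fitted line: score =", round(slope, 2), "* hours +", round(intercept, 2))
print("predicted score for 6 hours:", round(predict(6), 1))
```

Note that a strong correlation in a sample this small does not establish causation, and extrapolating far beyond the observed range of hours would be unreliable.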
