CH 1
By
Irandufa Indebu (MSc)
irandufa.indebu@haramaya.edu.et
Outline
• Definition of data science
Definition of data science (DS)
• Before defining data science, it is better to start from
the definition of data itself.
• Some people find it difficult to differentiate between data,
information, and knowledge,
• because data for one person may be information for
another person.
• Let us discuss them in terms of the DIKW (data, information,
knowledge, wisdom) paradigm.
Data-to-information-to-knowledge-to-intelligence-to-wisdom
cognitive progression
• Information is processed data.
• It represents a description of relevant data (objects)
in an organized and structured way, for a certain
purpose, or having a certain meaning.
• It is the level of contextualization.
• It answers simple questions such as what, when, where,
and who.
1. High-Level Definition
• Data science is the science of data, or the study of data.
2. Trans-disciplinary Definition
• Data science is a multi-disciplinary field that uses scientific
methods, processes, algorithms, and systems to extract
knowledge and insights from structured, semi-structured and
unstructured data.
• It involves the systematic study of raw data in order to make
insightful observations.
• From those observations one can take relevant actions to
achieve a goal.
• Data acquisition, data cleaning, feature engineering,
modelling, and visualization are some of the major parts of
this field.
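The stages named above can be sketched end to end on a toy example. This is only an illustrative sketch: the records, the field names (hours, score, long_session), and the choice of a least-squares line as the "model" are all assumptions made for the example, not part of the original slides.

```python
# 1. Data acquisition: hypothetical raw records (hours studied, exam score).
raw = [
    {"hours": "2", "score": "50"},
    {"hours": "4", "score": "60"},
    {"hours": "", "score": "70"},   # record with a missing value
    {"hours": "6", "score": "70"},
    {"hours": "8", "score": "80"},
]

# 2. Data cleaning: drop records with missing fields, convert text to numbers.
clean = [
    {"hours": float(r["hours"]), "score": float(r["score"])}
    for r in raw
    if r["hours"] and r["score"]
]

# 3. Feature engineering: derive a new attribute from an existing one
#    (shown for illustration; the simple model below does not use it).
for r in clean:
    r["long_session"] = r["hours"] >= 5

# 4. Modelling: fit score = a + b * hours by ordinary least squares.
n = len(clean)
mean_x = sum(r["hours"] for r in clean) / n
mean_y = sum(r["score"] for r in clean) / n
b = sum((r["hours"] - mean_x) * (r["score"] - mean_y) for r in clean) / sum(
    (r["hours"] - mean_x) ** 2 for r in clean
)
a = mean_y - b * mean_x
print(f"score = {a:.1f} + {b:.1f} * hours")

# 5. Visualization: a crude text bar chart of the observed scores.
for r in clean:
    print(f"{r['hours']:>4.0f} h | {'#' * int(r['score'] // 10)}")
```

In practice each stage would use dedicated tools (databases, data-frame libraries, plotting libraries), but the order of the steps stays the same.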
• From the DIKIW-processing perspective, data science is
a systematic approach to “thinking with wisdom”,
“understanding the domain”, “managing data”,
“computing with data”, “discovering knowledge”,
“communicating with stakeholders”, “acting on insights”,
and “delivering products”.
1. History of data collection
3200 BC in Mesopotamia.
Transactional data include event information such as the sale
of an item, the issuing of an invoice, or a payment.
• In 1970, Edgar F. Codd published a paper that introduced the
relational data model for storing and retrieving data.
• Codd’s paper provided the foundation for SQL.
• SQL (Structured Query Language) is an international standard
for defining database queries.
• Relational databases store data in tables with a structure of
one row per instance and one column per attribute.
• Databases are the natural technology to use for storing
and retrieving structured transactional or operational
data.
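As a minimal sketch of the relational idea, the example below uses Python's built-in sqlite3 module to build a table with one row per instance and one column per attribute, then runs the basic SQL statements INSERT, UPDATE, DELETE, and SELECT. The table name (sales) and its columns and values are hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory database so the example leaves no files behind.
conn = sqlite3.connect(":memory:")

# Hypothetical table: one row per sale (instance), one column per attribute.
conn.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, item TEXT, quantity INTEGER, price REAL)"
)

# INSERT: record three transactions.
conn.executemany(
    "INSERT INTO sales (item, quantity, price) VALUES (?, ?, ?)",
    [("pen", 10, 1.5), ("notebook", 3, 4.0), ("pen", 5, 1.5)],
)

# UPDATE: correct the quantity of the first sale.
conn.execute("UPDATE sales SET quantity = 12 WHERE id = 1")

# DELETE: remove the second sale.
conn.execute("DELETE FROM sales WHERE id = 2")

# SELECT: retrieve the remaining rows.
rows = conn.execute(
    "SELECT id, item, quantity, price FROM sales ORDER BY id"
).fetchall()
for row in rows:
    print(row)

conn.close()
```

Each tuple printed at the end is one row of the table, and each position in the tuple corresponds to one column (attribute).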
• However, as companies have become larger and more
automated, the amount and variety of data generated
have dramatically increased.
• Analyzing these data and applying them in decision making
became another challenge for these companies.
• The other problem was that the data were often stored
in numerous separate databases within the organization.
• Only simple operations such as SELECT, INSERT, UPDATE, and
DELETE were used.
• These challenges led to the development of data
warehouses.
• In a data warehouse, data are taken from across the
organization and integrated, thereby providing a more
comprehensive data set for analysis
• The challenge is not only that the amount of data collected
has grown dramatically, but also that the variety of data
has grown.
• Emails, blogs, photos, tweets, likes, shares, web
searches, video uploads, online purchases, and podcasts
are a few of the data sources.
• If we look at metadata of these events, we can begin to
understand the meaning of the term big data
• Big data are often defined in terms of three Vs:
• Extreme volume of data
• Variety of the data types
• Velocity at which the data must be processed
• The existence of big data led to the development of
new data-processing frameworks.
• Why?
• Because it is not feasible to process these data with an
ordinary database management system.
2. Historical Data Analysis
• When we talk about data analysis, the field of statistics
always comes to mind.
• Statistics is the branch of science that deals with the
collection and analysis of data
• The simplest form of statistical analysis of data is the
summarization of a data set.
• This is done in terms of:
• summary (descriptive) statistics, including measures of
central tendency, such as the arithmetic mean, and measures of
variation, such as the range.
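As a small illustration of summarizing a data set, the sketch below computes measures of central tendency and of variation with Python's built-in statistics module. The exam scores are hypothetical values made up for the example.

```python
import statistics

# Hypothetical data set: exam scores of seven students.
scores = [55, 62, 70, 70, 78, 85, 91]

# Measures of central tendency.
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)

# Measures of variation.
score_range = max(scores) - min(scores)
std_dev = statistics.stdev(scores)  # sample standard deviation

print("mean:", mean_score)
print("median:", median_score)
print("range:", score_range)
print("standard deviation:", round(std_dev, 2))
```

The mean and median describe where the data are centred, while the range and standard deviation describe how widely the values are spread.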
• Data visualization is useful in helping data scientists
explore and understand the data they are working with.
• It can also be useful to communicate the results of a
data science project
• In 1943 Warren McCulloch and Walter Pitts proposed
the first mathematical model of a neural
network.
• In 1948, Claude Shannon published “A Mathematical
Theory of Communication” and by doing so founded
information theory.
• In 1951, Evelyn Fix and Joseph Hodges proposed a
model for discriminatory analysis (classification or
pattern-recognition problem) that became the basis for
modern nearest-neighbor models.
• These postwar developments culminated in 1956 in the
establishment of the field of artificial intelligence at a
workshop at Dartmouth College.
• In the mid-1960s, three important contributions to
machine learning (ML) were made
Emergence and Evolution of Data Science
• In 2001, William S. Cleveland published an action plan
for creating a university department in the
field of data science.
• In 2015, a statement about the role of statistics in data
science was released by a number of ASA leaders, saying
that “statistics and machine learning play a central role
in data science.”
• In recent years, data science has been extended
beyond statistics. Why?
• Because statistics alone cannot own data science: the field
has broader capability requirements that go beyond statistics,
such as computational issues.
• A multidisciplinary view has thus been increasingly
accepted.
Application of Data Science
Healthcare: The healthcare sector, especially, receives great
benefits from data science applications.
Medical Image Analysis
Transport
In the transportation sector, data science is actively making
its mark by creating safer driving environments for drivers.
It is also playing a key role in optimizing vehicle performance
and adding greater autonomy for drivers.
The role of data science has grown manifold with
the introduction of self-driving cars.
Healthcare
In the health-care industry, data science is making great leaps.
The areas of health care that make use of data science
include:
• Medical Image Analysis
• Genetics and Genomics
• Drug Discovery
• Predictive Modeling for Diagnosis
• Health bots or virtual assistants
DATA SCIENCE APPLICATIONS AND EXAMPLES
Thanks