
INTRODUCTION TO DATA SCIENCE

AND ARTIFICIAL INTELLIGENCE


Lecturer: Nguyễn Văn Thiệu
Email: thieu.nguyenvan@phenikaa-uni.edu.vn

Documents:
PhD. Pham Tien Lam
• Identify potential customers
• Recommend products to customers?
• Improve the effectiveness of advertising?
• Automatic door-opening systems?
• Automated customer-response systems?
• Optimize freight transportation?
• Predict stock prices?
• Automatic text translation?
• Data?

https://ourworldindata.org
• Data as models of reality
➡ Geographical
➡ Transport
➡ Cultural
➡ Natural
➡ Scientific
➡ Meteorological
➡ Financial
➡ Statistical
• What you create while using your phone
➡ Make calls
➡ Capture photos
➡ Text messages
➡ Browse the internet
➡ Read books
➡ Learn
➡ ….
• All of your data are collected
https://financesonline.com/how-much-data-is-created-every-day/
• Big data is a term that describes the large volume of data — both structured and
unstructured.
• Big data can be analyzed for insights that lead to better decisions and strategic
business moves.

https://www.weforum.org/agenda/2019/04/how-much-data-is-generated-each-day-cf4bddf29f/
BIG DATA: Volume, Variety, Velocity, Veracity

https://xcelpros.com/erps-make-big-data-and-big-business-a-good-match/
Data Formats in Big Data

• Structured data: database, data warehouse, ERP, CRM
• Semi-structured data: CSV, XML, JSON
• Unstructured data: video, audio, image, document, sensor data
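As a rough illustration of the semi-structured formats named above (CSV, XML, JSON), the sketch below reads the same record from each format using only the Python standard library. The customer record and the helper names (`from_csv`, `from_json`, `from_xml`) are invented for this example and are not part of the lecture material.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical record, invented for illustration only.
csv_text = "id,name,city\n1,An,Hanoi\n"
json_text = '{"id": 1, "name": "An", "city": "Hanoi"}'
xml_text = "<customer><id>1</id><name>An</name><city>Hanoi</city></customer>"


def from_csv(text):
    # csv.DictReader maps each row to the header names; every value is a string.
    return next(csv.DictReader(io.StringIO(text)))


def from_json(text):
    # JSON preserves basic types such as numbers, strings, and nested objects.
    return json.loads(text)


def from_xml(text):
    # Each child element becomes a key; element text is always a string.
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


csv_record = from_csv(csv_text)
json_record = from_json(json_text)
xml_record = from_xml(xml_text)
```

Note that only JSON keeps `id` as a number here; CSV and XML return it as text, which is one reason a data-preparation step usually follows data collection.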
[Figure: Venn diagram of data science skills. Circles: Computer Science, Math and Statistics, Business/Domain Expertise. Overlaps: Machine Learning, Traditional Software, Data Analyst. Center: Data Scientist.]
• Data analyst
• Machine learning engineer
• Deep learning engineer
• Data engineer
• Data scientist
• Risk analyst
• Business analyst
• …
• Descriptive analytics: What happened?
• Diagnostic analytics: Why did it happen?
• Predictive analytics: What will happen?
• Prescriptive analytics: How can we make it happen?
• Data Science is an interdisciplinary field that uses mathematical and statistical
methods to extract insights and knowledge from data.
• Goals: extract valuable information from data and use it to make informed
decisions, whether in the context of a business, a government, or any
other organization.
• Data Science is used in various industries, such as healthcare, finance,
marketing, and more.
• Define your problem
• Data collection: web, databases, logs, APIs, and others
• Data preparation (data model):
➡ Data cleaning: missing data, inconsistencies, duplicates
➡ Data transformation
• Exploratory analysis: gain insight into the data
• Modeling: statistics, machine learning
• Reporting
• Deployment and maintenance
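The data cleaning step above (missing data, inconsistency, duplication) can be sketched in a few lines of plain Python. The records, the `CITY_ALIASES` mapping, and the choice to fill a missing age with the mean are all assumptions made for illustration; real projects would pick strategies that fit their data.

```python
# Hypothetical raw records, invented for illustration.
raw = [
    {"name": "An", "city": "Hanoi", "age": 21},
    {"name": "An", "city": "Hanoi", "age": 21},        # exact duplicate
    {"name": "Binh", "city": "ha noi ", "age": None},  # inconsistent city, missing age
    {"name": "Chi", "city": "Hanoi", "age": 23},
]

# Assumed mapping from known spelling variants to one canonical value.
CITY_ALIASES = {"ha noi": "Hanoi", "hanoi": "Hanoi"}


def clean(records):
    # 1. Inconsistency: normalize whitespace and case, then map known aliases.
    normalized = []
    for r in records:
        key = r["city"].strip().lower()
        normalized.append({**r, "city": CITY_ALIASES.get(key, r["city"].strip())})

    # 2. Duplication: drop exact duplicates while keeping the original order.
    seen, unique = set(), []
    for r in normalized:
        fingerprint = tuple(sorted(r.items(), key=lambda kv: kv[0]))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(r)

    # 3. Missing data: fill missing ages with the mean of the known ages
    #    (one simple choice among many).
    known = [r["age"] for r in unique if r["age"] is not None]
    mean_age = sum(known) / len(known)
    return [{**r, "age": mean_age if r["age"] is None else r["age"]} for r in unique]


cleaned = clean(raw)
```

After cleaning, three records remain, all with the city spelled "Hanoi" and no missing ages.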
• DATA: Raw facts about the world
• INFORMATION: Processed or summarized raw facts that can be used in decision-making
• KNOWLEDGE: Information that leads to valuable actions
• Data mining

The extraction of actionable information from (usually) very large datasets; it is the subject of extreme hype, fear, and interest.
HTTPS://WWW.HACKSTER.IO/AMIT-MANIAR/INDUSTRY-4-0-CONNECTING-TRADITIONAL-HARDWARE-TO-INTERNET-BD83CE
• Business with data

https://www.visualcapitalist.com/how-big-tech-makes-their-billions-2022
• Phenikaa
Big data and the needs of society
• A machine that can act like a human?
Siri bot, Cortana bot, Sophia robot, Boston Dynamics robots
• Artificial Intelligence (AI) is the field of computer science focused on creating machines that can
perform tasks that typically require human intelligence, such as learning,
problem-solving, and pattern recognition.
• Goal: to create machines that can perform tasks previously
performed by humans, and do so more efficiently and accurately.
• AI is applied in various industries, for example in autonomous vehicles, speech
recognition, recommendation systems, and more.
[Figure: diagram relating Artificial Intelligence, Machine Learning, Deep Learning, Statistics, and Data Mining]
[Figure: workflow of a traditional computer program compared with Artificial Intelligence, both expressed through the function y = F(x)]
1. Problem setting
2. Data collection: D = [(x_i, y_i), i = 1, 2, ..., m]
3. Modeling and training models
4. Model selection
5. Deploy a suitable model (use the best model to make predictions)
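A minimal sketch of steps 2 to 5, assuming a tiny made-up dataset D and two hand-written candidate models (a constant predictor and a least-squares line). The data values, the function names, and the use of mean squared error on a held-out validation set are illustrative assumptions, not the lecture's prescribed method.

```python
# Step 2: a tiny made-up dataset D = [(x_i, y_i)], roughly following y = 2x + 1.
D = [(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8), (4, 9.1), (5, 11.0), (6, 12.9), (7, 15.2)]
train, valid = D[:6], D[6:]


def fit_constant(data):
    # Candidate model A: always predict the mean of y.
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y


def fit_line(data):
    # Candidate model B: least-squares line y = a*x + b.
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def mse(model, data):
    # Mean squared error of a model on a list of (x, y) pairs.
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


# Step 3: train every candidate on the training part of D.
candidates = {"constant": fit_constant(train), "line": fit_line(train)}

# Step 4: model selection by error on data not used for training.
scores = {name: mse(model, valid) for name, model in candidates.items()}
best_name = min(scores, key=scores.get)

# Step 5: use the selected model to predict for a new input.
best_model = candidates[best_name]
prediction = best_model(8)
```

On this data the line has a much smaller validation error than the constant predictor, so it is the model that gets used for prediction.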
" A computer program is said to learn from experience E with respect to some class of
tasks T and performance measure P, if its performance at tasks in T, as measured by P,
improves with experience E."

Tom Mitchell

“A computer program is said to be able to learn from experience (data) E to perform a task T with performance P if it performs task T better (as measured by performance P) when data E is available.”
• Machine Learning is a subfield of AI that involves creating algorithms that can learn from data and
make predictions or decisions without being explicitly programmed to do so.
• It is important because it makes it possible to automate tasks and make predictions
based on data, which can lead to improved decision-making and increased
efficiency.
• Types of Machine Learning:
➡ supervised learning
➡ unsupervised learning
➡ semi-supervised learning
➡ reinforcement learning
Supervised Learning

Training a model on labeled data and making predictions based on that training data.
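As one possible illustration of learning from labeled data, the sketch below uses a 1-nearest-neighbour classifier, chosen here only because it is short; the slides do not name a specific algorithm. The measurements and labels are made up.

```python
# Labeled training data (made up): (height_cm, weight_kg) -> label.
training = [
    ((150, 45), "small"),
    ((155, 50), "small"),
    ((180, 80), "large"),
    ((185, 90), "large"),
]


def predict(point, labeled):
    # 1-nearest-neighbour: return the label of the closest training example,
    # measured by squared Euclidean distance.
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    _, nearest_label = min(labeled, key=lambda pair: dist2(pair[0], point))
    return nearest_label


label_a = predict((152, 48), training)
label_b = predict((182, 85), training)
```

The labels in `training` are what make this supervised: every prediction is copied from an example whose correct answer was known in advance.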
Unsupervised Learning

Training a model on unlabeled data and finding patterns or relationships within the data.
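A small sketch of finding structure in unlabeled data, using k-means clustering on one-dimensional values. k-means is an assumption picked for illustration, as are the data values and the starting centers.

```python
def kmeans_1d(values, centers, iterations=10):
    # Plain k-means on 1-D data: alternate between assigning each value to
    # its nearest center and moving each center to the mean of its cluster.
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Unlabeled, made-up measurements that visibly form two groups.
values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.5]
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])
```

No labels are given anywhere; the two groups emerge only from the distances between the values.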
Semi-supervised Learning

• Semi-supervised learning is a combination of supervised and unsupervised learning.
• It uses a small amount of labeled data and a large amount of unlabeled data, which provides the benefits of both unsupervised and supervised learning while avoiding the challenge of obtaining a large amount of labeled data.
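One simple way to combine a few labeled points with many unlabeled ones is self-training: repeatedly give the unlabeled point closest to an already labeled point the label of that neighbour. The sketch below assumes this strategy and made-up one-dimensional data; it is only one of several semi-supervised approaches.

```python
# Two labeled points and six unlabeled points (all values made up).
labeled = [(1.0, "A"), (9.0, "B")]
unlabeled = [2.0, 3.0, 4.0, 6.0, 7.0, 8.0]


def self_train(labeled, unlabeled):
    labeled = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        # Pick the unlabeled point that is closest to any labeled point,
        # i.e. the one whose pseudo-label is the most trustworthy right now.
        best = min(remaining, key=lambda u: min(abs(u - x) for x, _ in labeled))
        # Copy the label of its nearest labeled neighbour (a pseudo-label).
        _, nearest_label = min(labeled, key=lambda p: abs(p[0] - best))
        labeled.append((best, nearest_label))
        remaining.remove(best)
    return labeled


result = dict(self_train(labeled, unlabeled))
```

Starting from only two labels, the labels spread outward step by step until every point has one.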
Comparison among SL, SSL, and UL
Reinforcement Learning
Creating models that can learn through trial and error, making decisions based on rewards and punishments.
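Trial-and-error learning from rewards and punishments can be sketched with a two-action "bandit" problem and an epsilon-greedy rule. The reward probabilities, the +1/-1 reward scheme, and the function name `run_bandit` are assumptions for illustration; full reinforcement learning problems also involve states, which this sketch leaves out.

```python
import random


def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)               # fixed seed so the run is repeatable
    estimates = [0.0] * len(reward_probs)   # current value estimate per action
    counts = [0] * len(reward_probs)        # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(reward_probs))
        else:
            # Exploit: take the action that currently looks best.
            action = max(range(len(estimates)), key=lambda i: estimates[i])
        # Environment feedback: +1 is a reward, -1 is a punishment.
        reward = 1.0 if rng.random() < reward_probs[action] else -1.0
        counts[action] += 1
        # Incremental average of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


# Action 1 is rewarded far more often than action 0 (probabilities are made up).
estimates, counts = run_bandit([0.2, 0.8])
best_action = max(range(len(estimates)), key=lambda i: estimates[i])
```

The agent is never told which action is better; it discovers this from the feedback it receives and gradually favours the better action.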
Differences between ML types
[Figure: machine learning workflow with four stages: Preprocessing, Learning, Evaluation, Prediction. Elements shown: raw data, labels, training dataset, test dataset, learning algorithm, final model, new data. Preprocessing: feature extraction/selection, dimensionality reduction, sampling. Learning: model selection, cross-validation, performance metrics.]
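Cross-validation, mentioned in the workflow above, can be sketched as follows: the data is split into k folds, and each fold is used once for evaluation while the model is trained on the rest. The helper names, the trivial "mean of y" model, and the data are assumptions made for this example.

```python
def k_fold_indices(n_samples, k):
    # Split indices 0..n_samples-1 into k consecutive folds of near-equal size.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(data, k, fit, score):
    # Each fold is held out once; the model is trained on the remaining folds.
    results = []
    for held_out in k_fold_indices(len(data), k):
        held = set(held_out)
        train = [d for i, d in enumerate(data) if i not in held]
        test = [data[i] for i in held_out]
        results.append(score(fit(train), test))
    return results


# Made-up example: the "model" is just the mean of y, scored by mean absolute error.
data = [(x, 2.0 * x) for x in range(10)]


def fit_mean(rows):
    return sum(y for _, y in rows) / len(rows)


def mean_abs_error(model, rows):
    return sum(abs(model - y) for _, y in rows) / len(rows)


fold_scores = cross_validate(data, k=5, fit=fit_mean, score=mean_abs_error)
```

Averaging `fold_scores` gives a performance estimate that depends less on one particular train/test split.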
• Introduction
• Python Programming
• Statistical Analysis
• Introduction to Machine Learning
• Supervised Learning
➡ Regression
➡ Classification
• Unsupervised Learning
➡ Clustering
➡ Dimensionality reduction
• Attendance: 5%
• Exercises: 5%
• Mid-term test: 40%
• Final exam: 50%
Trevor Hastie, Robert Tibshirani, Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition. Springer Series in Statistics.
[Figure: tool categories: Visualization, ML, DL, IDE, Scientific computing, Data processing/analysis, Natural Language Processing]
