
1. List the 6 steps in the data mining process


- Business understanding
- Data understanding
- Data preparation
- Modeling
- Evaluation
- Deployment

2. Give an example of the 6 steps in the data mining process (e.g., UEH or any
business case) - Minh Hương

MOMO case:

- Business Understanding → business purpose: increase profit, increase loyalty rate
- Data Understanding → type of data: transaction records (data, users, ...)
- Data Preparation → fill in the missing data and remove noisy data
- Modeling → using formulas (Excel, other tools, ...)
- Evaluation → check the logic/relevance of the model
- Deployment → implement in business operations and adjust along the way

3. List all types of data - Anh Khoa


- Text

- Web data

- Images/videos

- Sound

- Audio

- Date/time

- Numeric data

4. What are the differences between qualitative/categorical and
quantitative/numeric data? Give an example. - Anh Khoa

| Quantitative data | Qualitative data |
|---|---|
| Countable or measurable, relating to numbers | Descriptive, relating to words and language |
| Tells us how many, how much, or how often | Describes certain attributes, and helps us to understand the "why" or "how" behind certain behaviors |
| Fixed and universal, "factual" | Dynamic and subjective, open to interpretation |
| Gathered by measuring and counting things | Gathered through observations and interviews |
| Analyzed using statistical analysis | Analyzed by grouping the data into meaningful themes or categories |

Example:

- Qualitative data: Male/ Female, Excellent/ Good/ Average

- Quantitative data: height, weight
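To make the distinction concrete, here is a minimal Python sketch that labels each column of a small, made-up dataset. The function name `column_kind` and the sample records are hypothetical, and the rule it uses (numbers are quantitative, everything else is qualitative) is only a rough heuristic: a category stored as a numeric code, such as 1 = Male, would be mislabeled.

```python
def column_kind(values):
    """Return 'quantitative' if every value is a number, else 'qualitative'."""
    def is_number(v):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    return "quantitative" if all(is_number(v) for v in values) else "qualitative"


# Hypothetical sample data, matching the examples in the notes.
records = {
    "gender": ["Male", "Female", "Female"],
    "rating": ["Excellent", "Good", "Average"],
    "height_cm": [172.0, 160.5, 158.0],
    "weight_kg": [68, 52, 49],
}

kinds = {name: column_kind(vals) for name, vals in records.items()}
for name, kind in kinds.items():
    print(f"{name}: {kind}")
```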

5. What are the differences between missing data and noisy data? How to
solve this problem? - KLinh

DIFFERENCES

| MISSING DATA | NOISY DATA |
|---|---|
| Data that is not available, or not enough, when needed | Meaningless data that can't be interpreted by machines |
| Objective reasons (does not exist at the time of input, crashes, etc.) and subjective reasons (human agent) | Objective reasons (data collection tools, transmission errors, technology limitations, etc.) and subjective reasons (human agent) |
| Fill the missing data in the data preparation step | Remove the noisy data in the data preparation step |

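HOW TO SOLVE:
- Missing data: fill it in during the data preparation step
- Noisy data: remove it during the data preparation step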
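As a rough illustration of handling both problems during data preparation, the Python sketch below removes noisy values and then fills in missing ones. The notes do not say which methods to use, so the choices here are assumptions: noise is treated as any value outside a plausible range, and missing entries (`None`) are filled with the mean of the remaining values. The function names and the sample `amounts` list are hypothetical.

```python
from statistics import mean


def remove_noisy(values, low, high):
    """Drop values outside the plausible range [low, high]; keep missing entries."""
    return [v for v in values if v is None or low <= v <= high]


def fill_missing(values):
    """Replace None entries with the mean of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical transaction amounts: one missing value, one impossible negative value.
amounts = [120.0, None, 80.0, -5.0, 100.0]

# Remove noise first so that it does not distort the mean used for filling.
cleaned = fill_missing(remove_noisy(amounts, low=0, high=1000))
print(cleaned)
```

Noise is removed before filling so that the impossible value does not pull the mean up or down.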

6. How to prevent missing data when inputting data?

- Design a good database

- Add constraints to the form to validate the data before saving it into the
database
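Both ideas can be sketched with Python's built-in `sqlite3` module. The table `customer`, its columns, and the helper `save_customer` are hypothetical names chosen for illustration. `NOT NULL` and `CHECK` constraints in the table design reject incomplete rows, so missing data cannot be saved in the first place.

```python
import sqlite3

# In-memory database, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT    NOT NULL CHECK (length(trim(name)) > 0),
        age  INTEGER NOT NULL CHECK (age BETWEEN 0 AND 150)
    )
    """
)


def save_customer(name, age):
    """Try to insert a row; return True if saved, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer (name, age) VALUES (?, ?)", (name, age)
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    save_customer("Linh", 21),  # complete row
    save_customer(None, 21),    # missing name
    save_customer("An", None),  # missing age
]
print(results)
```

In a real input form the same checks would usually also run before the data reaches the database, so the user sees the problem right away.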

7. What is an outlier? How to find the outliers in data? - Kim Ngân

● What is an outlier?

Data (objects) that do not follow the general characteristics/behavior of the
dataset (object).

● How to find the outliers in data?


1. Statistical Distribution-Based
2. Distance-Based
3. Density-Based
4. Deviation-Based
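As an example of the first approach (statistical distribution-based), the Python sketch below flags values that fall far outside the interquartile range (IQR). The notes do not name a specific test, so the IQR rule and the conventional 1.5 multiplier are assumptions, and the function name `find_outliers_iqr` and the sample data are hypothetical.

```python
from statistics import quantiles


def find_outliers_iqr(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical measurements with one value that clearly does not fit.
data = [10, 12, 11, 13, 12, 11, 95]
print(find_outliers_iqr(data))
```

The other three approaches follow the same pattern but change the test: distance to neighboring points, local density, or deviation from the expected characteristics of the group.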
