Welcome to Scribd!

Upload 5

Uploaded by

0% found this document useful (0 votes)

2 views1 page

The document discusses various aspects of data extraction and preparation for analysis. It notes that data must be selected carefully based on the goals of building a predictive model and should reflect real world situations. Poor data choice or data not representative of the system can lead to inaccurate models. Data preparation, though seemingly less problematic, requires significant resources and time as data comes from different sources in various formats and must be cleaned, normalized and transformed into a consistent format for analysis. The document also defines different types of categorical and numerical data.

Original Description:

Original Title

upload5

Copyright

Available Formats

DOCX, PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as DOCX, PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as docx, pdf, or txt

0% found this document useful (0 votes)

2 views1 page

Upload 5

Uploaded by

Anil Arora

Copyright:

Available Formats

Download as DOCX, PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as docx, pdf, or txt

Jump to Page

You are on page 1of 1

Search inside document

Data Extraction

Once the problem has been defined, the first step is to obtain the data in order to
perform the analysis. The data must be chosen with the basic purpose of building the
predictive model, and so data selection is crucial for the success of the analysis as well.
The sample data collected must reflect as much as possible the real world, that is, how
the system responds to stimuli from the real world. For example, if you’re using huge
datasets of raw data and they are not collected competently, these may portray false or
unbalanced situations.
Thus, poor choice of data, or even performing analysis on a dataset that’s not
perfectly representative of the system, will lead to models that will move away from the
system under study.
The search and retrieval of data often require a form of intuition that goes beyond
mere technical research and data extraction. This process also requires a careful
understanding of the nature and form of the data, which only good experience and
knowledge in the problem’s application field can provide.
Regardless of the quality and quantity of data needed, another issue is using the best
data sources.

Data Preparation
Among all the steps involved in data analysis, data preparation, although seemingly
less problematic, in fact requires more resources and more time to be completed. Data
are often collected from different data sources, each of which will have data in it with a
different representation and format. So, all of these data will have to be prepared for the
process of data analysis.
The preparation of the data is concerned with obtaining, cleaning, normalizing, and
transforming data into an optimized dataset, that is, in a prepared format that’s normally
tabular and is suitable for the methods of analysis that have been scheduled during the
design phase.
Many potential problems can arise, including invalid, ambiguous, or missing values,
replicated fields, and out-of-range data.
Data Exploration/Visualization
Exploring the data involves essentially searching the data in a graphical or statistical
presentation in order to find patterns, connections, and relationships. Data visualization
is the best tool to highlight possible patterns.

Types of Data
Data can be divided into two distinct categories:
• Categorical (nominal and ordinal)
• Numerical (discrete and continuous)
Categorical data are values or observations that can be divided into groups or
categories. There are two types of categorical values: nominal and ordinal. A nominal
variable has no intrinsic order that is identified in its category. An ordinal variable
instead has a predetermined order.
Numerical data are values or observations that come from measurements. There are
two types of numerical values: discrete and continuous numbers. Discrete values can be
counted and are distinct and separated from each other. Continuous values, on the other
hand, are values produced by measurements or observations that assume any value
within a defined range.

Datawarehousing
Document10 pages
Datawarehousing
Harshit Jain
No ratings yet
1708443470801
Document71 pages
1708443470801
Ronald Cruz
No ratings yet
Unit 1
Document27 pages
Unit 1
studyexpress12
No ratings yet
Data Mining - Prashant
Document10 pages
Data Mining - Prashant
Kunal Kubal
No ratings yet
Data Processing and Analysis
Document38 pages
Data Processing and Analysis
Surendra Nagar
100% (1)
Data Is A Collection of Facts
Document4 pages
Data Is A Collection of Facts
fartoobusy.kareem
No ratings yet
Intro Lectures To DSA
Document17 pages
Intro Lectures To DSA
rosieteelamae3
No ratings yet
Data Mining Unit-1 Notes
Document18 pages
Data Mining Unit-1 Notes
bkharthik1
No ratings yet
General Data Analyst Interview Questions
Document7 pages
General Data Analyst Interview Questions
Meetanshi Gor
No ratings yet
Data Mining
Document22 pages
Data Mining
vrkatevarapu
No ratings yet
Unit 2 FDS
Document13 pages
Unit 2 FDS
Cherry
No ratings yet
Math211101020
Document12 pages
Math211101020
Saba Shaheen
No ratings yet
Introduction To Data Mining For Business Analytics
Document51 pages
Introduction To Data Mining For Business Analytics
Sherwin Lopez
No ratings yet
Unit 2
Document24 pages
Unit 2
MEENAL AGRAWAL
No ratings yet
Data Analysis
Document7 pages
Data Analysis
Abdo Ali
No ratings yet
Statistics For Data Science - 1
Document38 pages
Statistics For Data Science - 1
Akash Srivastava
100% (1)
Dw&bi PR2,3
Document6 pages
Dw&bi PR2,3
Dhanraj Deore
No ratings yet
PTDLKT
Document11 pages
PTDLKT
Nguyễn Lê Phương Uyên
No ratings yet
Research Methodology (Data Analysis)
Document7 pages
Research Methodology (Data Analysis)
Masood Shaikh
No ratings yet
Bit
Document4 pages
Bit
nithikuttan29
No ratings yet
Quantitative Data Analysis Guide
Document6 pages
Quantitative Data Analysis Guide
Saheed Lateef Ademola
100% (1)
Important Question of Introduction of Data Science
Document10 pages
Important Question of Introduction of Data Science
harisetan13
No ratings yet
Quantitative Data Analysis Guide
Document6 pages
Quantitative Data Analysis Guide
Brahim Kaddafi
No ratings yet
Data Analysis
Document6 pages
Data Analysis
Arun Vidya
No ratings yet
Kenny-230718-Top 60+ Data Analyst Interview Questions and Answers For 2023
Document39 pages
Kenny-230718-Top 60+ Data Analyst Interview Questions and Answers For 2023
vanjchao
No ratings yet
UNIT 1 Exploratory Data Analysis
Document21 pages
UNIT 1 Exploratory Data Analysis
divyar674
No ratings yet
DMW - Unit 1
Document21 pages
DMW - Unit 1
Priya Bhalerao
No ratings yet
DEV Lecture Notes Unit I
Document72 pages
DEV Lecture Notes Unit I
prins l
No ratings yet
Data Analysis & Interpretation
Document5 pages
Data Analysis & Interpretation
vish4wish
No ratings yet
Qualitative Research
Document9 pages
Qualitative Research
suhail iqbal sipra
No ratings yet
Data Analysis in Quantitative, Qualutative, and Mix Method.
Document18 pages
Data Analysis in Quantitative, Qualutative, and Mix Method.
Nadiya Nadiya
No ratings yet
Knowledge Discovery in Databases
Document17 pages
Knowledge Discovery in Databases
Sarvesh Dharme
No ratings yet
Sample Theory & Que. - UGC NET GP-1 Data Interpretation (UNIT-7)
Document23 pages
Sample Theory & Que. - UGC NET GP-1 Data Interpretation (UNIT-7)
Harsh Vardhan Singh Hvs
100% (1)
Notes Chap 1 Business Statistics
Document6 pages
Notes Chap 1 Business Statistics
Ee Jia En
No ratings yet
Unit 2
Document58 pages
Unit 2
radhikakumbhar2978
No ratings yet
Statistical Modeling For Data Analysis
Document24 pages
Statistical Modeling For Data Analysis
fisha adane
No ratings yet
Data Science2
Document7 pages
Data Science2
Kaushal
No ratings yet
Confidential Data 23
Document14 pages
Confidential Data 23
rc
No ratings yet
03-Data Science Methodology
Document8 pages
03-Data Science Methodology
abdessalemdjoudi
No ratings yet
Demystifying Data Analysis Methods
Document4 pages
Demystifying Data Analysis Methods
baceprot z
No ratings yet
Module 1 - Introduction To Data Analytics
Document21 pages
Module 1 - Introduction To Data Analytics
Harikrishna Vallapuneni
No ratings yet
Phase Ii
Document46 pages
Phase Ii
Veera Lakshmi
No ratings yet
Data Mining and Data Profiling - Nargis Hamid Monami
Document7 pages
Data Mining and Data Profiling - Nargis Hamid Monami
Nargis Hamid Monami
No ratings yet
Data Mining
Document18 pages
Data Mining
admin ker
100% (1)
1.2 - Data Processing
Document25 pages
1.2 - Data Processing
Ranveer Sehedeva
No ratings yet
What Is Data Analysis GRP 4
Document7 pages
What Is Data Analysis GRP 4
wilson philipo
No ratings yet
Exercises 5
Document5 pages
Exercises 5
Bhuvana Eswari
No ratings yet
crimevictimservices13_2-08Web
Document4 pages
crimevictimservices13_2-08Web
vklgp144
No ratings yet
Data Analysis Section of Research Paper
Document4 pages
Data Analysis Section of Research Paper
rdssibwgf
100% (1)
Soma.35.Data Analysis - Final
Document7 pages
Soma.35.Data Analysis - Final
Amna iqbal
100% (2)
Data Preprocessing: Enhancing Data for Analysis. The Art of Preprocessing
From Everand
Data Preprocessing: Enhancing Data for Analysis. The Art of Preprocessing
Daniel Garfield
No ratings yet
What Exactly Is Data Science
Document15 pages
What Exactly Is Data Science
s6652565
No ratings yet
Confidential Data 24
Document15 pages
Confidential Data 24
rc
No ratings yet
Las q2 3is Melcweek2
Document7 pages
Las q2 3is Melcweek2
Jaylyn Marie Sapopo
No ratings yet
Research 2 Chapter 8 - 082441
Document13 pages
Research 2 Chapter 8 - 082441
Bin Dump
No ratings yet
Advanced Data Analytics Assignment
Document6 pages
Advanced Data Analytics Assignment
Olwethu N Mahlathini (Lethu)
No ratings yet
Data Analytics Basics - Unlocked
Document59 pages
Data Analytics Basics - Unlocked
Akshat
No ratings yet
Unit 2 Data Analytics
Document16 pages
Unit 2 Data Analytics
Tinku The Blogger
No ratings yet
III CS Datamining - Unlocked
Document68 pages
III CS Datamining - Unlocked
Jana Jana
No ratings yet
Decoding Data: Navigating the World of Numbers for Actionable Insights
From Everand
Decoding Data: Navigating the World of Numbers for Actionable Insights
Ryan Mitchell
No ratings yet
Spectra Bluetooth TX and RX BTI-010
Document5 pages
Spectra Bluetooth TX and RX BTI-010
PatoRMT
No ratings yet
Panasonic CCTV WJ nx400 PDF
Document2 pages
Panasonic CCTV WJ nx400 PDF
Than Htike Aung
No ratings yet
Instrumenting The World With Wireless Sensor Networks
Document4 pages
Instrumenting The World With Wireless Sensor Networks
shariifcqaadir985
No ratings yet
Create A New Standard in The SolidWorks Hole Wizard PDF
Document19 pages
Create A New Standard in The SolidWorks Hole Wizard PDF
pokosferi
No ratings yet
Hash MAC1
Document31 pages
Hash MAC1
Nitin Pandey
No ratings yet
Top 10 Userexits in SD
Document29 pages
Top 10 Userexits in SD
ahdave16
100% (3)
Yahsat Antenna Poining Guide R1a
Document16 pages
Yahsat Antenna Poining Guide R1a
Riyas Perumbadan
No ratings yet
SIM800 Series - STK - Application Note - V1.02
Document22 pages
SIM800 Series - STK - Application Note - V1.02
atulindore2
No ratings yet
ER Diagram Examples
Document2 pages
ER Diagram Examples
pranay639
100% (1)
BCA Advanced
Document43 pages
BCA Advanced
Manasvi Mehta
No ratings yet
Rel-11 Description 20120621
Document265 pages
Rel-11 Description 20120621
Bhaskar Kumar
100% (1)
AIS 4 PLANNING PHASE Topic 2 PDF
Document36 pages
AIS 4 PLANNING PHASE Topic 2 PDF
Razmen Ramirez Pinto
No ratings yet
Makeup Lab Task 1 Solution (BCSM-F18-079)
Document4 pages
Makeup Lab Task 1 Solution (BCSM-F18-079)
Waqar Ghafoor
No ratings yet
1-Intro To Computer System Servicing
Document9 pages
1-Intro To Computer System Servicing
loida mae
No ratings yet
B-Hero App: 1) Background/ Problem Statement
Document7 pages
B-Hero App: 1) Background/ Problem Statement
Marvin Lucas
No ratings yet
Scout APM Announces Python Application Support For Error Monitoring Tool
Document3 pages
Scout APM Announces Python Application Support For Error Monitoring Tool
PR.com
No ratings yet
PaperCut MF - Toshiba Embedded Manual
Document32 pages
PaperCut MF - Toshiba Embedded Manual
Justin Ross
No ratings yet
Cisco Aironet PDF
Document9 pages
Cisco Aironet PDF
Carmen Rosa Lopez
No ratings yet
What Is Identity and Access Management - Guide To IAM
Document8 pages
What Is Identity and Access Management - Guide To IAM
harsh
No ratings yet
Prathyusha
Document7 pages
Prathyusha
Neelam Yadav
No ratings yet
TL16C550CIPT
Document50 pages
TL16C550CIPT
saber
No ratings yet
Com - Rentplease.save - Loader Logcat
Document291 pages
Com - Rentplease.save - Loader Logcat
deby
No ratings yet
8 Ways To Find The IMEI Number On A Mobile Phone - WikiHow
Document7 pages
8 Ways To Find The IMEI Number On A Mobile Phone - WikiHow
izmcnorton
No ratings yet
Tutorial Windev 20
Document453 pages
Tutorial Windev 20
Damasio Mancuello Rodas
No ratings yet
75X120MM 双面彩印折页 128G铜版纸: Foldable Bluetooth Keyboard User Manual
Document1 page
75X120MM 双面彩印折页 128G铜版纸: Foldable Bluetooth Keyboard User Manual
Pa Mide
No ratings yet
Sequence Diagram For Library Management: 11481A0573
Document9 pages
Sequence Diagram For Library Management: 11481A0573
Jeffery Brennan
No ratings yet
Smart Note Taker PPT 578efd034544d
Document33 pages
Smart Note Taker PPT 578efd034544d
Shreya Reddy
No ratings yet
MB-300 Exam Questions
Document20 pages
MB-300 Exam Questions
Ocie Abshire
No ratings yet
FAQ On Printing at ComCen UTown PDF
Document3 pages
FAQ On Printing at ComCen UTown PDF
revo17
No ratings yet
TC3064en-Ed02 Installation Procedure For OmniVista8770 R5.1.16.01
Document70 pages
TC3064en-Ed02 Installation Procedure For OmniVista8770 R5.1.16.01
Jean-philippe Demolin
No ratings yet