Pre 3

You might also like

Download as doc, pdf, or txt
Download as doc, pdf, or txt
You are on page 1of 2

Big Data: Types of data used in Analytics

Data types involved in Big Data analytics are many like Structured Data, Semi structured data
and Unstructured Data

Structured Data

Any data that can be stored, accessed and processed in the form of fixed format is termed as a
'structured' data.

A Student table in a database is an example of Structured Data

Student_ID  Student Name  Gender  Department 


2365  Rajesh Kulkarni  Male  Finance
3398  Pratibha Joshi  Female  Marketing
7465  Shushil Roy  Male  HR
7500  Shubhojit Das  Male  Finance 
7699  Priya Sane  Female  Finance 

Characteristics of Structured Data:

 Data conforms to a data model and has easily identifiable structure


 Data is stored in the form of rows and columns
Example : Database
 Data is well organized so, Definition, Format and Meaning of data is explicitly known
 Data resides in fixed fields within a record or file
 Similar entities are grouped together to form relations or classes
 Entities in the same group have same attributes
 Easy to access and query, So data can be easily used by other programs
 Data elements are addressable, so efficient to analyze and process

Unstructured Data

Any data with unknown form or the structure is classified as unstructured data. Unstructured
data is information that either does not have a pre-defined data model or is not organized in a
pre-defined manner. One of the most common types of unstructured data is text. Unstructured
text is generated and collected in a wide range of forms, including Word documents, email
messages, PowerPoint presentations, survey responses, transcripts of call center interactions, and
posts from blogs and social media sites. Other types of unstructured data include images, audio
and video files.

Characteristics of Unstructured Data


 Data neither conforms to a data model nor has any structure.
 Data can not be stored in the form of rows and columns as in Databases
 Data does not follows any semantic or rules
 Data lacks any particular format or sequence
 Data has no easily identifiable structure
 Due to lack of identifiable structure, it cannot used by computer programs easily

Examples of Unstructured Data:

 Web pages
 Images (JPEG, GIF, PNG, etc.)
 Videos
 Memos
 Reports
 Word documents and PowerPoint presentations
 Surveys

Semi structured Data

Semi-structured data is the data which does not conforms to a data model but has some
structure

Characteristics of semi-structured Data


 Data does not conform to a data model but has some structure.
 Data can not be stored in the form of rows and columns as in Databases
 Semi-structured data contains tags and elements (Metadata) which is used to group data
and describe how the data is stored
 Similar entities are grouped together and organised in a hierarchy
 Entities in the same group may or may not have the same attributes or properties
 Does not contains sufficient metadata which makes automation and management of data
difficult
 Size and type of the same attributes in a group may differ
 Due to lack of a well defined structure, it can not used by computer programs easily

Examples of Semi – structured Data


 E-mails
 XML and other markup languages
 TCP/IP packets
 Zipped files
 Integration of data from different sources
 Web pages

Dr. Preyal Sanghavi


Associate Professor
RBIMS

You might also like