M191861 Data Exploration Assignment

Name Ashton Takudzwa
Surname Sibanda
Level 4.1
Reg-number M191861
Programme Information Systems
Module Data Mining
Course code ISH414
Assignment 1
1) Summary statistics
Summary statistics are quantities that describe and summarize a set of data values.
Summary statistics measure:
a. Location – these are summary statistics that describe the central or typical value in a
data set. They give a sense of the middle or center of the data set.
Types of location are:
• Mean, which is the average of the values in a data set.
• Median, which is the middle value once the data set is sorted: half of the values are less
than the median and half are greater. With an even number of values, it is the average of the
two middle values.
• Mode, which is the value that is repeated the most in a data set. A data set can have more
than one mode.
Example of location statistics:
Fig1
Value   Value (sorted)
35      21
42      22
78      35
22      42
56      42
50      50
42      56
78      78
21      78
87      87
MEAN = 51.1
MEDIAN = 46
MODE = 42, 78
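The Fig1 figures can be checked with a short Python sketch. It assumes Python 3.8 or later (for statistics.multimode); the variable names are chosen here for illustration only.

```python
import statistics

# Values from Fig1, in their original (unsorted) order
values = [35, 42, 78, 22, 56, 50, 42, 78, 21, 87]

mean = statistics.mean(values)        # sum of the values divided by their count
median = statistics.median(values)    # even count, so the two middle values are averaged
modes = statistics.multimode(values)  # every value tied for the highest frequency

print("Sorted:", sorted(values))
print("MEAN =", mean)
print("MEDIAN =", median)
print("MODE =", modes)
```

multimode is used instead of mode because this data set has two values (42 and 78) that occur equally often.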
b. Spread describes how dispersed or varied the data is.
Types of spread are:
• Standard deviation, which describes the amount of variation in the given data set.
• Minimum, which is the smallest value in a data set.
• Maximum, which is the largest value in a data set.
• Variance, which is the square of the standard deviation.
• Range, which is the difference between the maximum and the minimum.
Example of spread using Fig1:
Range = 87 - 21 = 66
Variance = 548.767 (sample variance, dividing by n - 1 = 9)
Standard deviation = 23.426
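The spread figures can be reproduced in the same way. This is a minimal sketch that assumes the sample formulas (dividing by n - 1), which is what the values above correspond to; the variable names are illustrative.

```python
import statistics

# Values from Fig1
values = [35, 42, 78, 22, 56, 50, 42, 78, 21, 87]

value_range = max(values) - min(values)        # maximum minus minimum
sample_variance = statistics.variance(values)  # sample variance, divides by n - 1
sample_std = statistics.stdev(values)          # square root of the sample variance

print("Range =", value_range)
print("Variance =", round(sample_variance, 3))
print("Standard deviation =", round(sample_std, 3))
```

Using statistics.pvariance and statistics.pstdev instead would divide by n and give smaller values.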
c. Shape describes the shape of the distribution for a given set of values.
Types of shape are:
• Skewness, which describes whether data values are asymmetrically distributed.
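As a rough illustration, skewness can be computed for the Fig1 values. The sketch below assumes the moment coefficient of skewness (third central moment divided by the second central moment raised to the power 1.5), which is one common definition among several.

```python
# Values from Fig1
values = [35, 42, 78, 22, 56, 50, 42, 78, 21, 87]

n = len(values)
mean = sum(values) / n

# Second and third central moments (population form, dividing by n)
m2 = sum((x - mean) ** 2 for x in values) / n
m3 = sum((x - mean) ** 3 for x in values) / n

# Moment coefficient of skewness: positive means a longer tail to the right
skewness = m3 / m2 ** 1.5

print("Skewness =", round(skewness, 3))
```

A value near zero suggests a roughly symmetric distribution, a positive value a longer right tail, and a negative value a longer left tail.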
2) Visualisation
Data visualization is a graphical representation of information and data. By using visual
elements like charts, graphs, and maps, data visualization tools provide an accessible way to see
and understand trends, outliers, and patterns in data.
The uses of data visualization are as follows:
• It is a powerful way to explore data with presentable results.
• Its primary use is in the pre-processing portion of the data mining process.
• It supports the data cleaning process by finding incorrect and missing values.
• It supports variable derivation and selection, that is, determining which variables to include
and which to discard in the analysis.
• It also plays a role in combining categories as part of the data reduction process.
Data Visualization Techniques
• Box plots
• Histograms
• Heat maps
• Charts
• Tree maps
• Word Cloud/Network diagram
An example of visualisation is a pie chart.
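A histogram, one of the techniques listed above, can also be sketched in plain text without any plotting library. The example below reuses the Fig1 values; the bin width of 20 is an arbitrary choice made here for illustration.

```python
from collections import Counter

# Values from Fig1
values = [35, 42, 78, 22, 56, 50, 42, 78, 21, 87]
bin_width = 20  # assumed bin width, chosen only for this illustration

# Map each value to the lower edge of its bin, e.g. 35 -> 20 and 87 -> 80
counts = Counter((v // bin_width) * bin_width for v in values)

# Print one row per bin, with one '#' per value that falls in the bin
for lower in sorted(counts):
    upper = lower + bin_width - 1
    print(f"{lower:>2}-{upper:<2} | {'#' * counts[lower]}")
```

Even in this crude form, the bars show that most values fall between 20 and 59.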
3) Online analytical processing (OLAP)
Online Analytical Processing (OLAP) can be defined as a set of tools and approaches for representing data from
multiple dimensions. In a broader sense, it includes a set of practices aimed at modelling
data and databases and creating specific analytical solutions. OLAP systems are capable of combining
classic tables into a sort of table of tables, which can be visualized as a 3D OLAP cube for simplicity.
Examples of analysis include financial modelling, budget forecasting, production planning, and
determining broad sales and distribution trends.
OLAP databases are divided into one or more cubes. The cubes are designed in such a way that
creating and viewing reports becomes easy.
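The cube idea can be sketched in a few lines of Python. This is a toy illustration, not how an OLAP server is implemented: the sales figures, the three dimensions (product, region, quarter) and the helper names roll_up and slice_cube are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, quarter) -> units sold
cube = {
    ("Laptop", "North", "Q1"): 10,
    ("Laptop", "South", "Q1"): 4,
    ("Laptop", "North", "Q2"): 7,
    ("Phone", "North", "Q1"): 12,
    ("Phone", "South", "Q2"): 9,
}
DIMENSIONS = ("product", "region", "quarter")


def roll_up(cube, keep):
    """Aggregate the cube, keeping only the named dimensions."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for key, units in cube.items():
        totals[tuple(key[i] for i in idx)] += units
    return dict(totals)


def slice_cube(cube, dimension, value):
    """Fix one dimension to a single value and return the matching cells."""
    i = DIMENSIONS.index(dimension)
    return {k: v for k, v in cube.items() if k[i] == value}


by_product = roll_up(cube, ["product"])      # total units per product
by_region = roll_up(cube, ["region"])        # total units per region
q1_only = slice_cube(cube, "quarter", "Q1")  # cells for the first quarter only

print(by_product)
print(by_region)
print(q1_only)
```

Rolling up and slicing are two of the typical operations that make reports over a cube easy to produce.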
Types of OLAP Servers
There are four types of OLAP servers:
• Relational OLAP (ROLAP)
ROLAP servers are placed between a relational back-end server and client front-end tools. To
store and manage warehouse data, ROLAP uses a relational or extended-relational DBMS.
• Multidimensional OLAP (MOLAP)
MOLAP uses array-based multidimensional storage engines for multidimensional views of data. With
multidimensional data stores, the storage utilization may be low if the data set is sparse. Therefore,
many MOLAP servers use two levels of data storage representation to handle dense and sparse data
sets.
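The storage utilization problem can be illustrated with a small sketch. It is not how any particular MOLAP server stores data; the cube shape and the cell values are hypothetical. A dense array reserves a slot for every possible cell, while a sparse structure stores only the cells that hold data.

```python
# Hypothetical cube of 4 products x 3 regions x 4 quarters
shape = (4, 3, 4)

# Sparse representation: only the non-empty cells, keyed by their coordinates
sparse_cells = {(0, 0, 0): 10, (0, 1, 0): 4, (2, 0, 3): 7}

# Dense representation: one slot for every possible cell, mostly zeros here
dense = [[[0] * shape[2] for _ in range(shape[1])] for _ in range(shape[0])]
for (p, r, q), units in sparse_cells.items():
    dense[p][r][q] = units

total_slots = shape[0] * shape[1] * shape[2]
utilization = len(sparse_cells) / total_slots  # fraction of slots holding data

print("Dense slots:", total_slots)
print("Non-empty cells:", len(sparse_cells))
print(f"Utilization: {utilization:.1%}")
```

With only 3 of 48 slots in use, the dense array wastes most of its space, which is why a separate representation for sparse regions is useful.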
• Hybrid OLAP (HOLAP)
Hybrid OLAP is a combination of both ROLAP and MOLAP. It offers the higher scalability of ROLAP
and the faster computation of MOLAP. HOLAP servers can store large volumes of detailed
information, while the aggregations are stored separately in a MOLAP store.
• Specialized SQL Servers
Specialized SQL servers provide advanced query language and query processing support for SQL
queries over star and snowflake schemas in a read-only environment.
