M191861 Data Exploration Assignment
Surname Sibanda
Level 4.1
Reg-number M191861
Assignment 1
1) Summary statistics
These are quantities that define and summarize a set of data values.
a. Location – these are summary statistics that describe the central or typical value in a data set, giving a sense of its middle or center. The mean, median and mode are measures of location.
Fig 1
Value   Value (sorted)
35      21
42      22
78      35
22      42
56      42
50      50
42      56
78      78
21      78
87      87

MEAN = 51.1
MODE = 42, 78
MEDIAN = 46
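The location statistics in Fig 1 can be checked with a short sketch. It assumes Python's standard `statistics` module; the assignment does not say which tool produced the figures. `statistics.multimode` is used instead of `statistics.mode` because it returns every most-frequent value, and this data set has two modes.

```python
import statistics

# The ten values from Fig 1, in their original (unsorted) order.
values = [35, 42, 78, 22, 56, 50, 42, 78, 21, 87]

mean = statistics.mean(values)        # sum / count = 511 / 10
median = statistics.median(values)    # average of the two middle sorted values: (42 + 50) / 2
modes = statistics.multimode(values)  # every value that occurs with the highest frequency

print(f"Mean = {mean}")
print(f"Median = {median}")
print(f"Mode = {modes}")
```

Run as written, it prints Mean = 51.1, Median = 46.0 and Mode = [42, 78], matching Fig 1.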
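b. Spread – these are summary statistics that describe how much the values in a data set vary.
Standard deviation – describes the amount of variation of the values around the mean.
Minimum – the smallest value in a data set.
Maximum – the largest value in a data set.
Variance – the square of the standard deviation.
Range – the difference between the maximum and the minimum.
For the data in Fig 1 (minimum 21, maximum 87):
Range = 87 - 21 = 66
Variance = 548.767 (sample variance, dividing by n - 1 = 9)
Standard deviation = √548.767 ≈ 23.43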
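A similar sketch, again assuming Python's `statistics` module, reproduces the spread measures. `statistics.variance` and `statistics.stdev` divide by n - 1 (sample statistics), which is the convention that gives the variance of 548.767 reported above; `statistics.pvariance` divides by n and is shown only for comparison.

```python
import statistics

# The ten values from Fig 1.
values = [35, 42, 78, 22, 56, 50, 42, 78, 21, 87]

minimum = min(values)
maximum = max(values)
value_range = maximum - minimum  # 87 - 21 = 66

# Sample variance: sum of squared deviations from the mean divided by (n - 1).
sample_variance = statistics.variance(values)
# Sample standard deviation: square root of the sample variance.
sample_sd = statistics.stdev(values)

# Population variance divides by n instead of (n - 1).
population_variance = statistics.pvariance(values)

print(f"Minimum = {minimum}, Maximum = {maximum}, Range = {value_range}")
print(f"Sample variance = {sample_variance:.3f}")
print(f"Standard deviation = {sample_sd:.3f}")
print(f"Population variance = {population_variance:.3f}")
```

This reports a range of 66, a sample variance of 548.767, a standard deviation of about 23.426 and a population variance of 493.890, which shows why it matters to state which divisor was used.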
2) Visualisation
Visualisation supports data exploration in several ways:
It supports the data cleaning process by revealing incorrect and missing values.
It helps with variable derivation and selection, that is, deciding which variables to include in or discard from the analysis.
It plays a role in combining categories as part of the data reduction process.
Common visualisation techniques include:
Box plots
Histograms
Heat maps
Charts
Tree maps
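A box plot is drawn from a five-number summary (lower whisker, lower quartile, median, upper quartile, upper whisker), and points far beyond the quartiles are commonly flagged as possible outliers, which is one way a box plot helps find incorrect values during data cleaning. The sketch below computes these numbers for the Fig 1 data using only the standard library. The helper name `box_plot_summary` and the 1.5 × IQR fence rule (Tukey's convention) are assumptions, and `statistics.quantiles` uses its default "exclusive" method, so other tools may produce slightly different quartiles.

```python
import statistics


def box_plot_summary(values, k=1.5):
    """Hypothetical helper: return the numbers a box plot is drawn from.

    Outliers are values outside Tukey's fences, Q1 - k*IQR and Q3 + k*IQR.
    """
    q1, median, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr

    outliers = [v for v in values if v < low_fence or v > high_fence]
    inside = [v for v in values if low_fence <= v <= high_fence]

    return {
        "whisker_low": min(inside),   # smallest value inside the fences
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": max(inside),  # largest value inside the fences
        "outliers": outliers,
    }


values = [35, 42, 78, 22, 56, 50, 42, 78, 21, 87]  # data from Fig 1
print(box_plot_summary(values))
```

For these ten values the quartiles are 31.75 and 78.0, the whiskers run from 21 to 87, and no value falls outside the fences, so no outliers are flagged.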
3) Online Analytical Processing (OLAP)
OLAP can be defined as a set of tools and approaches for representing data across multiple dimensions. In a broader sense, it covers practices for modelling data and databases and for building specific analytical solutions. OLAP systems combine classic two-dimensional tables into a kind of "table of tables", which for simplicity can be visualised as a 3D OLAP cube.
Examples of OLAP analysis include financial modelling, budget forecasting, production planning, and identifying broad sales and distribution trends.
OLAP databases are divided into one or more cubes. The cubes are designed so that creating and viewing reports is easy.
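The cube idea can be sketched as a mapping from one member of each dimension to a measure. The example below is a minimal illustration, not how any particular OLAP product stores data. The dimensions (region, product, quarter), the sales figures and the helper names `slice_cube` and `roll_up` are all hypothetical, invented only to show two operations commonly used on cubes: slicing (fixing one dimension) and rolling up (summing over the dimensions you drop).

```python
from collections import defaultdict

# Hypothetical toy sales figures, invented purely for illustration.
# Each key is one cell of a 3D cube: (region, product, quarter) -> sales.
DIMENSIONS = ("region", "product", "quarter")
cube = {
    ("North", "Laptops", "Q1"): 120,
    ("North", "Laptops", "Q2"): 135,
    ("North", "Phones", "Q1"): 200,
    ("North", "Phones", "Q2"): 210,
    ("South", "Laptops", "Q1"): 90,
    ("South", "Laptops", "Q2"): 110,
    ("South", "Phones", "Q1"): 160,
    ("South", "Phones", "Q2"): 175,
}


def slice_cube(cube, dimension, member):
    """Keep only the cells where one dimension is fixed to a single member."""
    idx = DIMENSIONS.index(dimension)
    return {key: value for key, value in cube.items() if key[idx] == member}


def roll_up(cube, keep):
    """Sum the measure over every dimension not listed in `keep`."""
    idxs = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[tuple(key[i] for i in idxs)] += value
    return dict(totals)


# Slice: only the Q1 cells (a 2D region x product table).
print(slice_cube(cube, "quarter", "Q1"))
# Roll-up: total sales per region across all products and quarters.
print(roll_up(cube, ["region"]))
```

With these made-up figures the roll-up gives 665 for North and 535 for South. A report such as "sales per region" is then a single aggregation over the cube rather than a new query over raw tables, which is the sense in which cubes make reports easy to create.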