Analytics Notes


Four types of data-

Nominal, Ordinal, Interval and ratio

Nominal- categories with no order. In nominal data there is no order, no meaningful difference between values
and no absolute zero.

Ordinal- Has a meaningful order, like 1, 2, 3 or bad, good, excellent. It is used for ranking and
rating. The differences between ranks are not meaningful, and there is no absolute zero in ordinal data.

Interval- Ordered, and the differences between values are meaningful, but zero does not mean "nothing".
Ex- a water temperature of 0 degrees Celsius does not mean there is no temperature; it is the point at which water freezes.

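Ratio- Like interval data, but with an absolute (true) zero, so ratios between values are meaningful: 10 kg is twice as heavy as 5 kg. This absolute zero is the only difference between interval and ratio.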

                       Nominal     Ordinal        Interval              Ratio
Ordered                N           Y              Y                     Y
Meaningful difference  N           N              Y                     Y
Absolute zero          N           N              N                     Y
Example                Red, blue   Good, bad      Temperature           Length, Weight
Central tendency       Mode        Mode, Median   Mode, Median, Mean    Mode, Median, Mean
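As a minimal sketch of the last row of the table, the Python function below returns only the averages that are meaningful for each level of measurement. The sample values, the function name `central_tendency` and the bad < good < excellent ranking are hypothetical; it uses only the standard `statistics` module.

```python
from statistics import mean, median, median_low, mode

# Hypothetical ranking for the ordinal example (bad < good < excellent).
RANK = {"bad": 1, "good": 2, "excellent": 3}


def central_tendency(values, level):
    """Return only the measures of central tendency that suit the level of measurement."""
    if level == "nominal":
        return {"mode": mode(values)}
    if level == "ordinal":
        label_of = {rank: label for label, rank in RANK.items()}
        ranks = [RANK[v] for v in values]
        # median_low always returns one of the actual ranks, so it maps back to a label
        return {"mode": mode(values), "median": label_of[median_low(ranks)]}
    if level in ("interval", "ratio"):
        return {"mode": mode(values), "median": median(values), "mean": mean(values)}
    raise ValueError(f"unknown level of measurement: {level}")


print(central_tendency(["red", "blue", "red", "green"], "nominal"))
print(central_tendency(["bad", "good", "good", "excellent"], "ordinal"))
print(central_tendency([10.0, 15.0, 20.0, 15.0], "interval"))  # temperatures in degrees C
print(central_tendency([2.0, 4.0, 4.0, 6.0], "ratio"))         # weights in kg
```

Taking a mean of ordinal labels would assume equal gaps between ranks, which the table rules out, so the ordinal branch stops at the median.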

Descriptive analysis-
Types of summaries that can be obtained through descriptive analysis-

1.) Distribution- Distribution shows us the frequency of different outcomes (or data points) in a
population or sample. It can be represented as numbers in a list, table or graph. Ex- a list
showing the number of people with each hair color.
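A frequency distribution like the hair color example can be sketched with Python's `collections.Counter`; the sample list below is hypothetical.

```python
from collections import Counter

# Hypothetical sample: the hair color of each person surveyed.
hair = ["black", "brown", "black", "blonde", "brown", "black", "red"]

freq = Counter(hair)  # absolute frequency of each outcome
n = len(hair)

print(f"{'color':<8}{'count':>6}{'share':>8}")
for color, count in freq.most_common():
    # relative frequency = count / sample size
    print(f"{color:<8}{count:>6}{count / n:>8.2f}")
```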

Using visualizations is common practice in descriptive statistics.

Descriptive statistics

Central tendency- Mean, Median, Mode

Variability- Range, Standard Deviation, Interquartile Range
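The measures listed above can all be computed with Python's standard `statistics` module. This is a minimal sketch on a small hypothetical sample; note that the interquartile range depends on how quartiles are defined, and `statistics.quantiles` uses its default `'exclusive'` method here.

```python
import statistics


def describe(data):
    """Central tendency and variability for a numeric sample."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles, default 'exclusive' method
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": max(data) - min(data),
        "std_dev": statistics.stdev(data),  # sample standard deviation (divides by n - 1)
        "iqr": q3 - q1,
    }


print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
```

`statistics.pstdev` would give the population standard deviation instead, which divides by n rather than n - 1.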


Skewness-

Negative Skewness or Left Skew-

Left tail is longer

Positive Skewness or Right Skew-

Right tail is longer

Value of Skewness (Rule of Thumb)

Skew = 0 means perfect symmetry.

Skew between 0 and +/- 0.5 means approximately symmetric.

Skew between +/- 0.5 and +/- 1 means moderately skewed.

Skew greater than +1 or less than -1 means highly skewed.
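A sketch of how a skewness value could be computed and labelled with these rule-of-thumb bands. It assumes the simple moment-based (Fisher-Pearson) coefficient g1 = m3 / m2^1.5 without small-sample correction, so results may differ slightly from software that applies one; the function names are hypothetical.

```python
def skewness(data):
    """Sample skewness g1 = m3 / m2**1.5 (Fisher-Pearson coefficient, no small-sample correction)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5


def skew_label(s):
    """Apply the rule-of-thumb bands from the notes above."""
    size = abs(s)
    if size == 0:
        return "perfectly symmetric"
    if size <= 0.5:
        label = "approximately symmetric"
    elif size <= 1:
        label = "moderately skewed"
    else:
        label = "highly skewed"
    tail = "right (positive)" if s > 0 else "left (negative)"
    return f"{label}, longer {tail} tail"


data = [1, 1, 1, 1, 10]  # hypothetical sample with one large value
s = skewness(data)
print(round(s, 2), skew_label(s))
```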

Kurtosis (less than 3 is ok)-


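A normal distribution has kurtosis 3, which is an excess kurtosis (kurtosis minus 3) of 0. The two rules below use excess kurtosis.

Excess kurtosis <0 means the peak is lower and broader and the tails are lighter (shorter).

Excess kurtosis >0 means the peak is higher and thinner and the tails are heavier (longer).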

The sample kurtosis is a useful measure of whether there is a problem with outliers in a data set.
Larger kurtosis indicates a more serious outlier problem, and may lead the researcher to choose
alternative statistical methods.
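To illustrate the outlier point, the sketch below computes moment-based sample excess kurtosis (m4 / m2^2 - 3 with no small-sample correction; this formula is an assumption, and statistical packages often use a bias-corrected version) for a hypothetical data set with and without one extreme value.

```python
def excess_kurtosis(data):
    """Sample excess kurtosis g2 = m4 / m2**2 - 3 (no bias correction); a normal distribution gives about 0."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


clean = [4, 5, 5, 6, 6, 6, 7, 7, 8]  # hypothetical, roughly bell-shaped
with_outlier = clean + [30]          # the same data plus one extreme value

print(round(excess_kurtosis(clean), 2))
print(round(excess_kurtosis(with_outlier), 2))
```

Adding the single value 30 raises the excess kurtosis from -0.75 to about 4.76, which is the kind of jump that signals an outlier problem.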

Data Warehouse-
A data warehouse is a process for collecting and managing data from varied sources to
provide meaningful business insight.
It is typically used to connect and analyze data from heterogeneous (different) sources.
It is a blend of technologies and components which aids the strategic use of data.
It is the electronic storage of a large amount of information by a business.
It is a process of transforming data into information and making it available to users in a
timely manner to make a difference.
A data warehouse is not a product but an environment.
It is an architectural construct of an information system which provides users with current
and historical decision support information.

How does a data warehouse work?

A data mart is a data storage system that contains information specific to an organizational
business unit. It contains a small, selected part of the data that the company stores in a larger
storage system. A company uses a data mart to analyze department-specific information
more efficiently.

[Figure: Supplier, Customer and Sales databases feed a data warehouse, which feeds data marts used by the Financial Manager and Marketing.]

1) A data warehouse works as a central repository where information arrives from one or more
data sources.

2) Data flows into a data warehouse from transactional systems and other relational databases.

3) The data may be structured, semi-structured or unstructured.
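The flow described above can be sketched with Python's built-in `sqlite3` module, using in-memory databases as stand-ins. The two source systems, the table names (`orders`, `customers`, `fact_sales`, `dim_customer`) and the data are all hypothetical; a real warehouse would use dedicated ETL tooling rather than this simple copy step.

```python
import sqlite3

# Hypothetical source systems: two separate transactional databases.
sales_db = sqlite3.connect(":memory:")
sales_db.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
sales_db.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(1, 10, 250.0), (2, 11, 80.0), (3, 10, 120.0)])

crm_db = sqlite3.connect(":memory:")
crm_db.execute("CREATE TABLE customers (id INTEGER, name TEXT, region TEXT)")
crm_db.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                   [(10, "Asha", "North"), (11, "Ben", "South")])

# The warehouse: a separate central repository that receives data from both sources.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE fact_sales (order_id INTEGER, customer_id INTEGER, amount REAL)")
dw.execute("CREATE TABLE dim_customer (customer_id INTEGER, name TEXT, region TEXT)")

# Extract rows from each source and load them into the warehouse tables.
dw.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
               sales_db.execute("SELECT order_id, customer_id, amount FROM orders"))
dw.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
               crm_db.execute("SELECT id, name, region FROM customers"))
dw.commit()

# Data that came from two different systems can now be analyzed in one query.
rows = dw.execute("""
    SELECT c.region, SUM(f.amount)
    FROM fact_sales AS f JOIN dim_customer AS c ON f.customer_id = c.customer_id
    GROUP BY c.region ORDER BY c.region
""").fetchall()
print(rows)
```

The analytical query runs against the warehouse copy, so it does not touch the source databases.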

Benefits of a transaction processing system-

1) Operations handling- handles multiple transactions simply and without errors.
2) Untapped markets- Reach and gain customers around the world.
3) Online Transaction systems- Easy to use and very efficient for online shoppers.

Types of data warehouse-

Enterprise data warehouse (EDW)

Operational data store (ODS)

Data mart

Enterprise data warehouse-

It is a centralized warehouse.

It provides decision support services across the enterprise.

It offers a unified approach for organizing and representing data.

It also provides the ability to classify data according to the subject and give access accordingly.

Operational data store (ODS)-

An ODS is a data store required when neither the data warehouse nor the OLTP systems support an
organization's reporting needs.

In an ODS, the data warehouse is refreshed in real time.

It is widely preferred for routine activities like storing employee records.

Data mart-

Subset of data warehouse

Specially designed for a particular line of business, such as sales or finance.

In an independent data mart, data can be collected directly from sources.
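A data mart as a subset of the warehouse can be sketched as a simple filter. This hypothetical example builds a dependent data mart (one fed from the warehouse) rather than the independent kind mentioned above, which would load directly from the source systems; the row layout and the function name `build_data_mart` are assumptions.

```python
# Hypothetical warehouse rows covering several lines of business.
warehouse = [
    {"dept": "sales", "metric": "revenue", "value": 1200},
    {"dept": "finance", "metric": "expenses", "value": 700},
    {"dept": "sales", "metric": "units", "value": 45},
    {"dept": "hr", "metric": "headcount", "value": 12},
]


def build_data_mart(rows, dept):
    """Copy only the rows for one line of business; the warehouse itself is left unchanged."""
    return [dict(row) for row in rows if row["dept"] == dept]


sales_mart = build_data_mart(warehouse, "sales")
print(sales_mart)
```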


OLTP

OLTP (online transaction processing) enables the real-time execution of a large number of database
transactions by a large number of people, typically over the internet.

A database transaction is a change, insertion, deletion or query of data.
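A minimal OLTP-style transaction, sketched with Python's `sqlite3` module: a hypothetical money transfer whose two updates either both commit or are both rolled back. The `accounts` table and the `transfer` function are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
             "balance REAL NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts as one transaction: both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint failed, so the whole transfer was rolled back


print(transfer(conn, 1, 2, 30.0))   # enough funds: committed
print(transfer(conn, 2, 1, 500.0))  # would overdraw account 2: rolled back
print(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
```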

OLAP

OLAP (online analytical processing) is a technology that organizes large business databases and supports complex analysis.

It can be used to perform complex analytical queries without negatively affecting transaction
systems.
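OLAP-style analysis can be sketched as aggregating the same facts along different dimensions. This hypothetical Python example rolls revenue up by region, by region and quarter, and slices to a single quarter; the data and the `roll_up` helper are assumptions, and real OLAP tools work on cubes in dedicated engines, so treat it only as an illustration of the idea.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, quarter, product, revenue).
facts = [
    ("North", "Q1", "laptop", 500), ("North", "Q1", "phone", 300),
    ("North", "Q2", "laptop", 450), ("South", "Q1", "phone", 200),
    ("South", "Q2", "laptop", 600), ("South", "Q2", "phone", 250),
]
DIMS = {"region": 0, "quarter": 1, "product": 2}  # column position of each dimension


def roll_up(rows, *dims):
    """Sum revenue over every dimension not listed in dims (an OLAP-style roll-up)."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[DIMS[d]] for d in dims)
        totals[key] += row[3]
    return dict(totals)


print(roll_up(facts, "region"))
print(roll_up(facts, "region", "quarter"))
# Slice: keep only Q2, then roll up by product.
print(roll_up([r for r in facts if r[1] == "Q2"], "product"))
```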
