
College of Hospitality and Tourism Management

Course Overview

Course No. HMPE 201


Course Code HMPE 201
Descriptive Title DATA ANALYTICS IN THE HOSPITALITY INDUSTRY
Credit Units 4
School Year / Term 2nd semester SY 2020-2021
Mode of Delivery Modular
Name of Instructor/Professor DR. DINAH F. CATAMCO, JOMARIE C. SALAR
Course Description This course enables a student to extract meaningful information from
hospitality data, to better position the hospitality enterprise for success in
the marketplace.
Course Outcomes • Extract meaningful information from hospitality data
• Develop knowledge in managing useful data
• Develop an appreciation for turning data into useful
information that helps the hospitality enterprise
succeed in the marketplace
• Analyse and visualize customer and hospitality data
• Forecast demand in the hospitality industry
• Discuss the importance of data analytics in the hospitality
industry
• Apply useful knowledge from hospitality data in decision
making.
SLSU Vision A high quality corporate science and technology University
SLSU Mission SLSU will produce science and technology leaders and competitive
professionals, generate breakthrough research in science and
technology based disciplines, transform and improve the quality of
life in the communities in the service area, be self – sufficient and
financially viable.

HMPE 201- Data Analytics in the Hospitality Industry



Module Guide
How to navigate this module

Hi, welcome to this module, “The Basics of Data Analytics”. This module covers the
following topics:

1. The Data
2. Visualization of data
3. Data pre-processing
Upon reading this module and answering the assessments provided, you will be able to:

1. Determine the various types of data and their characteristics, components, attributes, and
relationships
2. Define data visualization
3. Elucidate how data visualization generates useful information through various
techniques
4. Explain the different steps of data preprocessing.

Everything you learn in this module is important for completing the
laboratory activities in the laboratory guide attached to this module.
The module uses illustrative examples and graphics to make the topics easy to
understand. The references used are research outputs published on
reputable research sites, published books and e-books, and learning materials related to data
analytics.


LESSON 1

THE DATA

Intended Learning Outcome

At the end of this lesson, you will be able to:

1. Describe various types of data.


2. Explain the different data attributes.
3. Identify data characteristics and components.

Now get started

Introduction

Data analytics is the science of analysing raw datasets in order to derive a conclusion
regarding the information they hold. It enables us to discover patterns in the raw data and draw
valuable information from them. Data analytics processes and techniques may use applications
incorporating machine learning algorithms, simulation, and automated systems. These systems and
algorithms turn unstructured data into a form that people can use. The findings are interpreted and used
to help organizations understand their clients better, analyse their promotional campaigns,
customize content, create content strategies, and develop products. Data analytics helps
organizations maximize market efficiency and improve their earnings.
_______________________________________________________
Keywords
Database system, data warehouse, Data objects
Data attributes, patterns, association, correlation
_______________________________________________________

Let’s Learn

Because data is so important in our lives, it must be stored and processed properly,
without error. When dealing with datasets, the category of the data plays
an important role in determining which preprocessing strategy will work for a particular set and
which type of statistical analysis should be applied for the best results.
Let’s dive into some of the commonly used categories of data.

Database System

A database system, also called a database management system (DBMS), consists of a
collection of interrelated data, known as a database, and a set of software programs to manage and
access the data. The software programs provide mechanisms for defining database structures and
data storage; for specifying and managing concurrent, shared, or distributed data access; and for
ensuring the consistency and security of the stored information despite system crashes or attempts
at unauthorized access. A relational database is a collection of tables, each of which is assigned a
unique name. Each table consists of a set of attributes (columns or fields) and usually stores a
large set of tuples (records or rows). Each tuple in a relational table represents an object
identified by a unique key and described by a set of attribute values (Han, Kamber & Pei, 2012).
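To make the idea of a relational table concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name guest, its columns, and the sample rows are hypothetical and are used only for illustration.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE guest ("
    " guest_id INTEGER PRIMARY KEY,"  # unique key that identifies each tuple
    " name TEXT NOT NULL,"
    " country TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO guest (guest_id, name, country) VALUES (?, ?, ?)",
    [(1, "Ana", "Philippines"), (2, "Ken", "Japan"), (3, "Liza", "Philippines")],
)

# Each row (tuple) is one object, described by its attribute values.
rows = conn.execute(
    "SELECT guest_id, name FROM guest WHERE country = ? ORDER BY guest_id",
    ("Philippines",),
).fetchall()
print(rows)
```

The query returns only the tuples whose country attribute matches, which is the kind of managed access to stored data that a DBMS provides.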

Data Warehouse

A data warehouse is a large collection of business data used to help an organization make
decisions. The concept of the data warehouse has existed since the 1980s, when it was developed
to help transition data from merely powering operations to fuelling decision support systems that
reveal business intelligence. The large amount of data in data warehouses comes from different
sources: internal applications such as marketing, sales, and finance; customer-facing apps;
and external partner systems, among others.

On a technical level, a data warehouse periodically pulls data from those apps and
systems; then, the data goes through formatting and import processes to match the data already
in the warehouse. The data warehouse stores this processed data so it’s ready for decision makers
to access. How frequently data pulls occur, or how data is formatted, etc., will vary depending on
the needs of the organization.
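The pull, format, and store steps described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the two source systems (front_desk and restaurant), their field names, and the warehouse layout are all hypothetical.

```python
from datetime import date

# Hypothetical source extracts, each with its own field names and formats.
front_desk = [{"guest": "Ana", "paid": "1,500.00", "day": "2021-03-01"}]
restaurant = [{"customer": "Ken", "total": 320.5, "date": "01/03/2021"}]

def from_front_desk(rec):
    """Format a front-desk record to match the warehouse layout."""
    return {
        "source": "front_desk",
        "customer": rec["guest"],
        "amount": float(rec["paid"].replace(",", "")),
        "date": date.fromisoformat(rec["day"]),
    }

def from_restaurant(rec):
    """Format a restaurant record (day/month/year dates) the same way."""
    day, month, year = (int(part) for part in rec["date"].split("/"))
    return {
        "source": "restaurant",
        "customer": rec["customer"],
        "amount": float(rec["total"]),
        "date": date(year, month, day),
    }

# One "pull": format every source record, then load it into the warehouse.
warehouse = [from_front_desk(r) for r in front_desk]
warehouse += [from_restaurant(r) for r in restaurant]

# Once the formats match, a decision maker can query across sources.
total_sales = sum(row["amount"] for row in warehouse)
print(total_sales)
```

In practice this work is usually done by dedicated warehouse tooling; the sketch only shows why formatting has to happen before data from different systems can be combined.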

Framework of Data Warehouse


Source: https://copycoding.com/datawarehouse/architecture.html

Data Objects and Attribute Types

Data sets are made up of data objects. A data object represents an entity—in a sales
database, the objects may be customers, store items, and sales; in a medical database, the objects
may be patients; in a university database, the objects may be students, professors, and courses.
Data objects are typically described by attributes. Data objects can also be referred to as samples,
examples, instances, data points, or objects. If the data objects are stored in a database, they are
data tuples. That is, the rows of a database correspond to the data objects, and the columns
correspond to the attributes. In this section, we define attributes and look at the various attribute
types.


Data Attributes

An attribute is a data field, representing a characteristic or feature of a data object. The
nouns attribute, dimension, feature, and variable are often used interchangeably in the literature.
The term dimension is commonly used in data warehousing. Machine learning literature tends to
use the term feature, while statisticians prefer the term variable. Data mining and database
professionals commonly use the term attribute, and we do here as well. Attributes describing a
customer object can include, for example, customer ID, name, and address. Observed values for a
given attribute are known as observations. A set of attributes used to describe a given object is
called an attribute vector (or feature vector). The distribution of data involving one attribute (or
variable) is called univariate. A bivariate distribution involves two attributes, and so on.
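As a small illustration of these terms, the sketch below stores a few data objects as Python dictionaries. The customer records and attribute names are hypothetical.

```python
# Hypothetical customer data objects; each dict is one object (one row).
customers = [
    {"customer_id": 101, "name": "Ana", "nights_stayed": 3, "amount_spent": 4500.0},
    {"customer_id": 102, "name": "Ken", "nights_stayed": 1, "amount_spent": 1800.0},
    {"customer_id": 103, "name": "Liza", "nights_stayed": 5, "amount_spent": 7200.0},
]

attributes = ["nights_stayed", "amount_spent"]

# Attribute (feature) vector: the chosen attribute values of one object.
vector_of_first = [customers[0][a] for a in attributes]

# Univariate data: the observations of a single attribute across objects.
nights = [c["nights_stayed"] for c in customers]

# Bivariate data: pairs of observations for two attributes.
nights_vs_spend = [(c["nights_stayed"], c["amount_spent"]) for c in customers]

print(vector_of_first, nights, nights_vs_spend)
```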

The type of an attribute is determined by the set of possible values (nominal, binary,
ordinal, or numeric) the attribute can have. In the following subsections, we introduce each type.

• Nominal Attribute
Nominal means “relating to names.” The values of a nominal attribute are
symbols or names of things. Each value represents some kind of category,
code, or state, and so nominal attributes are also referred to as categorical.
The values do not have any meaningful order.

Example: Marital Status, Country, Gender, Race, Hair Colour

• Ordinal Attribute
An ordinal attribute is an attribute with possible values that have a
meaningful order or ranking among them, but the magnitude between
successive values is not known.

Example: Grade, Educational Level, Satisfaction Level, Socio-economic
Status, Income Bracket

• Numeric Attribute
A numeric attribute is quantitative; that is, it is a measurable quantity,
represented in integer or real values. Numeric attributes can be interval-scaled
or ratio-scaled.
• Interval-Scaled Attributes
Interval-scaled attributes are measured on a scale of
equal-size units. The values of interval-scaled attributes
have order and can be positive, 0, or negative. Thus, in
addition to providing a ranking of values, such attributes
allow us to compare and quantify the difference between
values.

Example: Temperature in degrees Celsius, Calendar Year

• Ratio-Scaled Attributes
A ratio-scaled attribute is a numeric attribute with an
inherent zero-point. That is, if a measurement is ratio-scaled,
we can speak of a value as being a multiple (or ratio) of
another value. In addition, the values are ordered, and we can
also compute the difference between values, as well as the
mean, median, and mode.

Example: Number of Nights Stayed, Amount Spent, Years of Experience
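The attribute type decides which summaries make sense. The following sketch, using hypothetical guest survey values, shows one meaningful operation for each type discussed above.

```python
from statistics import mean, mode

# Hypothetical guest survey data, one list per attribute.
country = ["PH", "JP", "PH", "US", "PH"]                   # nominal
satisfaction = ["low", "high", "medium", "high", "high"]   # ordinal
room_temp_c = [22.0, 24.0, 20.0, 23.0, 21.0]               # interval-scaled
amount_spent = [1500.0, 3000.0, 1500.0, 6000.0, 3000.0]    # ratio-scaled

# Nominal: no order, so only counting and the mode are meaningful.
most_common_country = mode(country)

# Ordinal: order is meaningful, so the middle value can be found from ranks.
rank = {"low": 1, "medium": 2, "high": 3}
levels = sorted(satisfaction, key=rank.get)
median_satisfaction = levels[len(levels) // 2]

# Interval-scaled: differences are meaningful (24 is 4 degrees warmer than 20).
temp_difference = max(room_temp_c) - min(room_temp_c)

# Ratio-scaled: a true zero makes ratios meaningful (6000 is 4 times 1500).
spend_ratio = max(amount_spent) / min(amount_spent)
average_spend = mean(amount_spent)

print(most_common_country, median_satisfaction, temp_difference,
      spend_ratio, average_spend)
```

Note that the ranks assigned to the satisfaction levels only encode order; the gap between "low" and "medium" is not assumed to equal the gap between "medium" and "high".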


Patterns, Association and Correlation

When data is organized, patterns or trends can emerge and be drawn out from the
organized data.

Frequent patterns, as the name suggests, are patterns that occur frequently in data. There
are many kinds of frequent patterns, including frequent item sets, frequent subsequences (also
known as sequential patterns), and frequent substructures. A frequent item set typically refers to
a set of items that often appear together in a transactional data set: for example, milk and bread,
which are frequently bought together in grocery stores by many customers. A frequently
occurring subsequence, such as the pattern that customers tend to purchase first a laptop,
followed by a digital camera, and then a memory card, is a (frequent) sequential pattern. A
substructure can refer to different structural forms (e.g., graphs, trees, or lattices) that may be
combined with item sets or subsequences. If a substructure occurs frequently, it is called a
(frequent) structured pattern. Mining frequent patterns leads to the discovery of interesting
associations and correlations within data.
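A frequent item set can be found by simple counting. The sketch below counts how often each pair of items appears in the same basket; the transactions and the minimum support threshold of 3 are assumptions chosen for illustration, and real frequent-pattern mining algorithms are considerably more efficient.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions (each set is one customer's basket).
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"coffee", "eggs"},
]

min_support = 3  # assumed threshold: a pair must appear in at least 3 baskets

pair_counts = Counter()
for basket in transactions:
    # Sort so that each pair is always counted under the same key.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

frequent_pairs = {pair: n for pair, n in pair_counts.items() if n >= min_support}
print(frequent_pairs)
```

Here only bread and milk appear together often enough to count as a frequent item set.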

Think about what the picture says!

Association of Data

According to IBM, a data association is a user-defined grouping of related groups and
elements. It can consist of one or more groups along with some or all of the elements within
those groups.

Correlation of Data
Correlation means that the values of one attribute tend to move in coordination with the values of another.


Let’s sum up

A data warehouse is a large collection of business data used to help an
organization make decisions.
Nominal means “relating to names.”
Data sets are made up of data objects.
A database system is also called a database management system (DBMS).

Let’s assess what you have learned in this lesson

Learning Assessment Task 2.1.1

Identification. You will be given 10 identification items covering all the topics in
this lesson. You will be rated based on your correct answers.
Instruction: Identify each answer from the scrambled words in the box. Write your
answer in the space provided before the number.
Scrambled Words

_________________________ 1. The values of this attribute are symbols or names.
_________________________ 2. An attribute with an inherent zero point.
_________________________ 3. A large collection of data.
_________________________ 4. Also known as a data trend.
_________________________ 5. Data moves in coordination with another.
_________________________ 6. An attribute whose values are measured on a scale of equal-size units.
_________________________ 7. An attribute with possible values that have a meaningful order.
_________________________ 8. It represents an entity in a database.
_________________________ 9. A set of data held in a computer.
_________________________ 10. A grouping of related data.


Learning Assessment Task 2.1.2

Describe and Explain. Describe the four data attributes and give at least five
examples of each attribute. Then explain why each example belongs to that attribute.

Your description will be rated using the writing rubric, on a scale of 1-5 (poor to excellent).
Your explanation will be rated using the writing rubric, on a scale of 1-5 (poor to excellent).

Write your answer here.

______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________.

HM PE 201 – DATA ANALYTICS IN HOSPITALITY INDUSTRY


_________________________________________

Excellent (5) - The answer demonstrates excellent composition skills including clear and
thought-provoking ideas, appropriate and effective organization, lively and convincing
supporting materials, effective diction, and sentence skills, and perfect or near-perfect
mechanics including spelling and punctuation. The writing perfectly accomplishes the
objectives of the task.

Good (4) - The answer shows strong composition skills including clear and thought-provoking
ideas, although development, diction, and sentence-style may suffer minor flaws.
Shows careful and acceptable use of mechanics. The writing effectively accomplishes the goals
of the task.
Average (3) - The answer demonstrates competent composition skills including adequate
development and organization, although the development of ideas may be trite, assumptions
may be unsupported in more than one area and the diction and syntax may not be clear and
effective. Minimally accomplishes the goals of the task.

Fair (2) - The answer demonstrates composition skills that may be flawed in either the clarity of
the ideas, the development, or the organization. Diction, syntax, and mechanics may seriously
affect clarity. Minimally accomplishes the majority of the goals of the task.
Poor (1) - Composition skills may be flawed in two or more areas. Diction, syntax, and
mechanics are excessively flawed. Fails to accomplish the goals of the task.


LESSON 2

DATA VISUALIZATION

Intended Learning Outcome

At the end of this lesson, you will be able to:

1. Visualize data in a 2D scatterplot.
2. Elucidate how data visualization generates useful information.

Now get started


Introduction

With so much information being collected through data analysis in the business
world today, each business must have a way to paint a picture of that data so it can be interpreted. Data
visualization gives a clear idea of what the information means by giving it visual context through
maps or graphs. Data visualization can help by delivering data in the most efficient way possible.
As one of the essential steps in the business intelligence process, data visualization takes the raw
data, models it, and delivers the data so that conclusions can be reached. In advanced analytics,
data scientists are creating machine learning algorithms to better compile essential data into
visualizations that are easier to understand and interpret.

_______________________________________________________
Keywords
Pixel-oriented visualization, geometric projection visualization
Icon-based visualization, hierarchical visualization
_______________________________________________________

Let’s Learn

Data visualization uses visual data to communicate information in a manner that is
universal, fast, and effective. This practice can help companies identify which areas need to be
improved, which factors affect customer satisfaction and dissatisfaction, and what to do with
specific products (where should they go and who should they be sold to). Visualized data gives
stakeholders, business owners, and decision-makers a better prediction of sales volumes and
future growth.

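Pixel-oriented visualization techniques. The task of the knowledge discovery and data
mining process is to extract knowledge from data such that the resulting knowledge is useful in a
given application. Obviously, only the user can determine whether the resulting knowledge
satisfies this requirement. Moreover, what one user may find useful is not necessarily useful to
another user. Pixel-oriented techniques support this judgment by mapping each data value to a
coloured pixel and arranging the pixels of each dimension in its own window, so that the user can
inspect large amounts of data at once.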

Figure 1. Pixel-oriented visualization attributes
Source: https://www.slideshare.net/phakhwan22/02-data-41812563


Geometric Projection Visualization Techniques. A drawback of pixel-oriented visualization
techniques is that they cannot help us much in understanding the distribution of data in a
multidimensional space. For example, they do not show whether there is a dense area in a
multidimensional subspace. Geometric projection techniques help users find interesting
projections of multi-dimensional data sets. The central challenge the geometric projection
techniques try to address is how to visualize a high-dimensional space on a 2-D display.
A scatter plot displays 2-D data points using Cartesian coordinates. A third dimension can
be added using different colours or shapes to represent different data points.

Figure 2. Visualization of 2D data using scatterplot


Source: http://www.industrial-electronics.com/data-mining_2b.html
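The idea of a 2-D scatter plot with a third dimension shown by shape can be tried without any plotting software. The sketch below draws points on a grid of characters; the function name text_scatter, the grid size, and the sample points are hypothetical, with the marker ("o" or "x") standing in for a third attribute such as hotel type.

```python
def text_scatter(points, width=21, height=11):
    """Render (x, y, marker) points on a character grid."""
    xs = [x for x, _, _ in points]
    ys = [y for _, y, _ in points]
    x_min, y_min = min(xs), min(ys)
    x_span = (max(xs) - x_min) or 1  # avoid dividing by zero
    y_span = (max(ys) - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y, marker in points:
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = marker  # row 0 is the top of the plot
    return "\n".join("".join(line) for line in grid)

# Hypothetical points: (x, y, marker), where the marker is the third dimension.
points = [(0, 0, "o"), (10, 5, "x"), (20, 10, "o")]
plot = text_scatter(points)
print(plot)
```

Each point lands at the cell where its x column and y row meet, and the character used shows which group the point belongs to.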

A 3-D scatter plot uses three axes in a Cartesian coordinate system. If it also uses colour,
it can display up to 4-D data points.

Figure 3. Visualization of 3D scatterplot
Source: http://www.industrial-electronics.com/data-mining_2b.html


The scatter-plot matrix technique is a useful extension to the scatter plot. For an n-dimensional
data set, a scatter-plot matrix is an n × n grid of 2-D scatter plots that provides a
visualization of each dimension with every other dimension.

Figure 4. Visualization of the Iris data set using a scatter-plot matrix. Source:
http://support.sas.com/documentation/cdl/en/grstatproc/61948/HTML/default/images/gsgscmat.gif
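To see where the n × n grid comes from, the sketch below lists the panels of a scatter-plot matrix for the four Iris attributes. It only enumerates which pair of attributes each panel would show; the underscore-style attribute names are an assumed spelling, and no drawing is done.

```python
from itertools import product

# The four attributes of the Iris data set (n = 4 dimensions).
dimensions = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# One panel per (row, column) pair: the row attribute goes on the y-axis
# and the column attribute on the x-axis.
panels = list(product(dimensions, repeat=2))

n = len(dimensions)
diagonal = [(d, d) for d in dimensions]           # an attribute against itself
distinct = [(y, x) for y, x in panels if y != x]  # n * (n - 1) remaining panels

print(len(panels), len(diagonal), len(distinct))
```

With four dimensions there are 16 panels in total, of which 12 compare two different attributes.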

To visualize n-dimensional data points, the parallel coordinates technique draws n equally
spaced axes, one for each dimension, parallel to one of the display axes.

Figure 5. Visualization that uses parallel coordinates. Source:
www.stat.columbia.edu/∼cook/movabletype/archives/2007/10/parallel coordi.thml.
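In a parallel coordinates display, each data point becomes a line that crosses every axis at the height of its value on that dimension. The sketch below computes those crossing heights; the function name parallel_coordinates and the hotel records are hypothetical, and rescaling each dimension to the range 0 to 1 is one common choice rather than the only one.

```python
def parallel_coordinates(records):
    """Return, for each record, its polyline as (axis_index, height) pairs.

    Each dimension is rescaled to the range 0..1 so that all axes share
    the same drawing height.
    """
    n_dims = len(records[0])
    lows = [min(r[d] for r in records) for d in range(n_dims)]
    highs = [max(r[d] for r in records) for d in range(n_dims)]
    polylines = []
    for r in records:
        line = []
        for d in range(n_dims):
            span = (highs[d] - lows[d]) or 1  # avoid dividing by zero
            line.append((d, (r[d] - lows[d]) / span))
        polylines.append(line)
    return polylines

# Hypothetical hotels: (room rate, occupancy in %, guest rating)
hotels = [(100, 60, 3.0), (200, 80, 4.0), (300, 70, 5.0)]
lines = parallel_coordinates(hotels)
print(lines)
```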
Hierarchical visualization techniques. Hierarchical data is data that can be arranged in the form
of a tree. Each item of data defines a node in the tree, and each node may have a collection of other
nodes as child nodes. The relationship between the parent nodes and the child nodes forms a tree
network. The formal definition of a tree is that the graph formed by the nodes and edges (defined
between parent and child node) is both connected and contains no cycles.
The following properties of a tree are of more practical use from the point of view of displaying
visualizations:
• One node, called the root node, has no parent.
• All other nodes have exactly one parent.
• Nodes with no children are termed leaf nodes. Nodes with children are
termed interior nodes.

For all nodes in a tree, there is a single unique path up the tree going from parent to parent up to the root node.
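The tree properties above can be checked on a small example. The sketch below stores a hypothetical hotel organisation chart as child-to-parent links; the department names are made up for illustration.

```python
# Hypothetical hotel organisation chart stored as child -> parent links.
# Because each node is a dictionary key, it can have only one parent.
parent = {
    "General Manager": None,  # the root node has no parent
    "Front Office": "General Manager",
    "Food and Beverage": "General Manager",
    "Reception": "Front Office",
    "Restaurant": "Food and Beverage",
    "Bar": "Food and Beverage",
}

roots = [node for node, p in parent.items() if p is None]
has_children = {p for p in parent.values() if p is not None}
interior_nodes = sorted(has_children)
leaf_nodes = sorted(node for node in parent if node not in has_children)

def path_to_root(node):
    """Follow parent links upward; in a tree this path is unique."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

print(roots, interior_nodes, leaf_nodes, path_to_root("Bar"))
```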
“Worlds-within-Worlds,” also known as n-Vision, is a representative hierarchical visualization
method.

Figure 6. “Worlds-within-Worlds” (also known as n-Vision). Source:
http://graphics.cs.columbia.edu/projects/AutoVisual/images/1.dipstick.5.gif

Tree-maps are another example of hierarchical visualization methods. They display
hierarchical data as a set of nested rectangles. In the Newsmap example in Figure 7, all news
stories are organized into seven categories, each shown in a large rectangle of a unique color.
Within each category (i.e., each rectangle at the top level), the news stories are further
partitioned into smaller subcategories.

Figure 7. Newsmap: Use of tree-maps to visualize Google news headline stories.
Source: www.cs.umd.edu/class/spring2005/cmsc838s/viz4all/ss/newsmap.png.
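The nested rectangles of a tree-map can be computed with a simple proportional split. The sketch below is a basic slice-style layout written for illustration, not necessarily the layout that Newsmap itself uses; the function name slice_layout, the categories, and the story counts are hypothetical.

```python
def slice_layout(items, x, y, width, height, horizontal=True):
    """Split a rectangle into strips whose areas are proportional to size.

    items is a list of (name, size) pairs; returns (name, x, y, w, h) tuples.
    """
    total = sum(size for _, size in items)
    rects = []
    offset = 0.0
    for name, size in items:
        share = size / total
        if horizontal:
            rects.append((name, x + offset, y, width * share, height))
            offset += width * share
        else:
            rects.append((name, x, y + offset, width, height * share))
            offset += height * share
    return rects

# Hypothetical story counts per news category.
categories = [("World", 50), ("Business", 30), ("Sports", 20)]
top_level = slice_layout(categories, 0, 0, 100, 60)

# Nest subcategories inside the "World" rectangle, slicing the other way.
_, wx, wy, ww, wh = top_level[0]
world_parts = slice_layout([("Asia", 3), ("Europe", 1)], wx, wy, ww, wh,
                           horizontal=False)
print(top_level)
print(world_parts)
```

A category with more stories receives a larger rectangle, and its subcategories share that rectangle in the same proportional way.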


Visualizing complex data and relations. Many newer visualization techniques are
dedicated to complex, non-numeric data such as text and social networks. For example, many people
on the Web tag various objects such as pictures, blog entries, and product reviews. A tag cloud is a
visualization of statistics of user-generated tags. Often, in a tag cloud, tags are listed
alphabetically or in a user-preferred order.

Figure 8. Using a tag cloud to visualize popular Web site tags. Source: A snapshot of
www.flickr.com/photos/tags/, January 23, 2010
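The statistics behind a tag cloud are simple counts. The sketch below counts hypothetical tags and scales each count to a font size, a common way of showing how often a tag is used; the tags and the font-size range of 10 to 30 points are assumptions.

```python
from collections import Counter

# Hypothetical tags attached by users to hotel review posts.
tags = ["beach", "pool", "beach", "breakfast", "beach", "pool", "spa"]

counts = Counter(tags)
min_size, max_size = 10, 30  # assumed font-size range in points
low, high = min(counts.values()), max(counts.values())

def font_size(count):
    """Scale a tag's count linearly into the font-size range."""
    if high == low:
        return max_size
    return min_size + (count - low) * (max_size - min_size) / (high - low)

# Tags listed alphabetically, with size showing how often each was used.
cloud = [(tag, font_size(counts[tag])) for tag in sorted(counts)]
print(cloud)
```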

Icon-based Visualization Techniques. These techniques use small icons to represent
multidimensional data values. We look at two popular icon-based techniques: Chernoff faces and stick figures.

• Chernoff faces make use of the ability of the human mind to
recognize small differences in facial characteristics and to
assimilate many facial characteristics at once.
• The stick figure visualization technique maps multidimensional
data to five-piece stick figures, where each figure has four
limbs and a body.

Figure 9. Chernoff faces. Each face represents an n-dimensional data point (n ≤ 18).


To really understand how we get information through visualization, let us answer the “Think in a
minute!” exercise below.

Think in a minute!
Look at the data in the scatterplot. Tell me what you can see.

[Scatterplot: International tourist arrivals (Y axis, -0.50 to 0.75) plotted against Carbon dioxide emission (X axis, 0.0 to 2.5)]

These are data on international tourist arrivals and carbon dioxide emission for a group
of countries.
Write your answer here.

______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
____________________________________________________________________________________________________________.
Let’s take a look at your answer.

“One piece of information we can get from the 2D scatterplot is that, as carbon dioxide emission
increases, international tourist arrivals are sporadic and later drop to nearly 0% as the carbon
dioxide emission of a given country increases to 100%.”

See, from the pattern or trend of the points in a scatterplot we can generate useful
information.

Interpreting data in a 2D scatterplot

Please remember that a scatterplot has two axes: the X axis and the Y axis. The X axis shows the
independent variable and the Y axis shows the dependent variable. To interpret the data trend in a
scatterplot, read the data from left to right. In the case above, we interpret the data
by reading from 0.0 on the Carbon dioxide emission axis rightward to 1.0 and so forth.


Take another example: weather temperature vs. cups of coffee

Weather Temperature (X)    Cups of coffee (Y)
5                          3
10                         7
15                         10
20                         15
25                         18

Here is the scatterplot for weather temperature vs. cups of coffee:

[Scatterplot: Cups of coffee (Y axis, 0 to 20) plotted against Weather temperature (X axis, 0 to 30)]

As we can see from the trend in the scatterplot, the number of cups of coffee increases as the weather
temperature increases.
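The upward trend in the weather temperature vs. cups of coffee data can also be put into numbers. The sketch below fits a straight line to the five data points by least squares; fitting a line is an added illustration and is not a step required by this module.

```python
# Data from the weather temperature vs. cups of coffee example.
temperature = [5, 10, 15, 20, 25]  # X
cups = [3, 7, 10, 15, 18]          # Y

n = len(temperature)
mean_x = sum(temperature) / n
mean_y = sum(cups) / n

# Least-squares slope and intercept of the line y = intercept + slope * x.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(temperature, cups))
s_xx = sum((x - mean_x) ** 2 for x in temperature)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

print(round(slope, 2), round(intercept, 2))  # 0.76 -0.8
```

A positive slope of about 0.76 matches the uphill pattern: in this data set, each additional degree of temperature goes with roughly three quarters of a cup more coffee.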

Remember: You interpret a scatterplot by looking for trends in the data as you go from
left to right:

• If the data show an uphill pattern as you move from left to right, this indicates a positive
relationship between X and Y. As the X-values increase (move right), the Y-values tend to
increase (move up).

• If the data show a downhill pattern as you move from left to right, this indicates a negative
relationship between X and Y. As the X-values increase (move right), the Y-values tend to
decrease (move down).

• If the data don’t seem to resemble any kind of pattern (even a vague one), then no relationship
is evident between X and Y.
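The three cases above can be approximated with the Pearson correlation coefficient, which is positive for uphill patterns, negative for downhill patterns, and near zero when no linear pattern is visible. In the sketch below, the function names, the 0.3 cut-off, and the downhill and scattered sample data are assumptions made for illustration; the function also assumes that neither list is constant.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

def describe_trend(xs, ys, threshold=0.3):
    """Label the trend; the 0.3 cut-off is an assumed rule of thumb."""
    r = pearson_r(xs, ys)
    if r >= threshold:
        return "positive"
    if r <= -threshold:
        return "negative"
    return "no clear linear relationship"

uphill = describe_trend([5, 10, 15, 20, 25], [3, 7, 10, 15, 18])
downhill = describe_trend([1, 2, 3, 4, 5], [9, 7, 6, 4, 1])
scattered = describe_trend([1, 2, 3, 4, 5], [4, 1, 5, 1, 4])
print(uphill, downhill, scattered)
```

A value near zero only rules out a straight-line trend, so it is still worth looking at the scatterplot itself for curved patterns.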


Let’s sum up

Icon-based visualization techniques use small icons to represent
multidimensional data values.
A tag cloud is a visualization of statistics of user-generated tags. Often, in a tag
cloud, tags are listed alphabetically or in a user-preferred order.
Hierarchical data is data that can be arranged in the form of a tree.
Data visualization uses visual data to communicate information in a manner
that is universal, fast, and effective.

Let’s assess what you have learned in this lesson

Learning Assessment Task 2.2.1

Illustration. You will be given a set of data to illustrate on a 2D scatterplot. You
may draw the scatterplot in the box below. You will be rated based on the correct data points.
Please follow the steps below in placing the data in the scatterplot.

Follow these simple steps:

1. First, find the value for x on the x-axis.
2. Next, find the value for y on the y-axis.
3. Locate the intersection of the x-value and the y-value.
4. Finally, plot the point on your graph at that spot.

Table 1. Data set of hotel room supply vs. hotel room demand

Name of Hotel    Hotel room demand (Y), in %    Hotel room supply (X), in %
A                10                             6
B                9                              9
C                5                              10
D                9                              13
E                7                              5
F                12                             10
G                6                              15
H                4                              10
I                2                              9
J                15                             10


Draw the scatterplot here.

Learning Assessment Task 2.2.2


Interpretation. Based on your answer in Learning Assessment Task 2.2.1, write your
observation and interpretation of the data set plotted in the scatterplot. You will be rated
based on the writing rubric with a rating scale of 1-5 (poor-excellent). Write your answer here.

______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________

______________________________________________________________________________________________________________
______________________________________________________________________________________________________________

______________________________________________________________________________________________________________
______________________________________________________________________________________________________________


HM PE 201 – DATA ANALYTICS IN HOSPITALITY INDUSTRY


_________________________________________

Excellent (5) - The answer demonstrates excellent composition skills including clear and
thought-provoking ideas, appropriate and effective organization, lively and convincing
supporting materials, effective diction, and sentence skills, and perfect or near-perfect
mechanics including spelling and punctuation. The writing perfectly accomplishes the
objectives of the task.

Good (4) - The answer shows strong composition skills including clear and thought-provoking
ideas, although development, diction, and sentence-style may suffer minor flaws.
Shows careful and acceptable use of mechanics. The writing effectively accomplishes the goals
of the task.

Average (3) - The answer demonstrates competent composition skills including adequate
development and organization, although the development of ideas may be trite, assumptions
may be unsupported in more than one area and the diction and syntax may not be clear and
effective. Minimally accomplishes the goals of the task.
Fair (2) - The answer demonstrates composition skills that may be flawed in either the clarity of
the ideas, the development, or the organization. Diction, syntax, and mechanics may seriously
affect clarity. Minimally accomplishes the majority of the goals of the task.
Poor (1) - Composition skills may be flawed in two or more areas. Diction, syntax, and
mechanics are excessively flawed. Fails to accomplish the goals of the task.

Learning Assessment Task 2.2.3

Explanation. Based on your experience in plotting the data set in a 2D scatterplot, explain how
useful information can be generated from the data. You will be rated based on the writing
rubric with a rating scale of 1-5 (poor-excellent).

Write your answer here.


______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________.

LESSON 3

DATA PREPROCESSING

Intended Learning Outcome

At the end of this lesson, you will be able to:

1. Explain the different steps of data preprocessing.

Now get started

Introduction

Data preprocessing is a data mining technique that involves transforming raw data into an
understandable format. Real-world data is often incomplete, inconsistent, and/or lacking in
certain behaviours or trends, and is likely to contain many errors. Data preprocessing is a
proven method of resolving such issues. Most people who work with data agree that insights and
analysis are only as good as the data behind them: garbage data in, garbage analysis out. Data
cleaning, also referred to as data cleansing or data scrubbing, is therefore one of the most
important steps for an organization that wants to build a culture of decision-making based on
quality data.

_______________________________________________________
Keywords
Data cleaning, Data preprocessing
Data reduction, Data transformation
_______________________________________________________

Let’s Learn

To make the process easier, data preprocessing is divided into four stages: data cleaning,
data integration, data reduction, and data transformation.

Data cleaning. It is the process of fixing or removing incorrect,
corrupted, incorrectly formatted, duplicate, or incomplete data
within a dataset. When combining multiple data sources, there are
many opportunities for data to be duplicated or mislabeled. If data
is incorrect, outcomes and algorithms are unreliable, even though
they may look correct.
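
The cleaning steps described above can be sketched in a few lines of Python. This is only a minimal illustration, not a prescribed method: the booking records, the field names (guest, room_type, nights), and the function clean_bookings are all hypothetical, and the rule used for "incomplete" (a missing name or an invalid number of nights) is an assumption.

```python
def clean_bookings(rows):
    """Return cleaned copies of booking records (hypothetical structure)."""
    cleaned = []
    seen = set()
    for row in rows:
        # Fix incorrectly formatted values: trim spaces, unify letter case.
        guest = (row.get("guest") or "").strip().title()
        room_type = (row.get("room_type") or "").strip().lower()
        nights = row.get("nights")

        # Remove incomplete or corrupted records (assumed rule).
        if not guest or not isinstance(nights, int) or nights <= 0:
            continue

        # Remove duplicates, compared after formatting has been fixed.
        key = (guest, room_type, nights)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"guest": guest, "room_type": room_type, "nights": nights})
    return cleaned


# Made-up sample records, for illustration only.
raw = [
    {"guest": "  maria santos ", "room_type": "Deluxe", "nights": 2},
    {"guest": "Maria Santos", "room_type": "deluxe ", "nights": 2},   # duplicate
    {"guest": "Juan Cruz", "room_type": "standard", "nights": None},  # incomplete
    {"guest": "", "room_type": "suite", "nights": 1},                 # missing name
    {"guest": "Ana Reyes", "room_type": "Suite", "nights": 3},
]

result = clean_bookings(raw)
for record in result:
    print(record)
```

Of the five raw records, only two remain after cleaning. The two spellings of the same booking are recognized as duplicates only because the formatting is fixed before the comparison.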

Data Integration. It is a data preprocessing technique that combines data from multiple
heterogeneous data sources into a coherent data store and provides a unified view of the
data. These sources may include multiple data cubes, databases, or flat files.
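
As a rough sketch of data integration, the example below joins two different kinds of sources into one unified view: reservation rows as they might come from a database, and guest profiles stored in a flat CSV file. The data, the field names, and the function integrate are hypothetical and serve only as an illustration.

```python
import csv
import io

# Source 1: reservation records, e.g. rows from a database table (hypothetical).
reservations = [
    {"guest_id": 101, "room": "Deluxe", "nights": 2},
    {"guest_id": 102, "room": "Suite", "nights": 1},
    {"guest_id": 103, "room": "Standard", "nights": 4},
]

# Source 2: guest profiles kept in a flat CSV file (hypothetical content).
profiles_csv = """guest_id,name,country
101,Maria Santos,Philippines
102,Ken Tanaka,Japan
"""


def integrate(reservations, profiles_csv):
    """Join both sources on guest_id into one unified list of records."""
    profiles = {}
    for row in csv.DictReader(io.StringIO(profiles_csv)):
        # CSV values are read as text, so convert the key to match source 1.
        profiles[int(row["guest_id"])] = row

    unified = []
    for res in reservations:
        profile = profiles.get(res["guest_id"], {})
        unified.append({
            "guest_id": res["guest_id"],
            "name": profile.get("name"),        # None when no profile exists
            "country": profile.get("country"),
            "room": res["room"],
            "nights": res["nights"],
        })
    return unified


unified = integrate(reservations, profiles_csv)
for record in unified:
    print(record)
```

Guest 103 has a reservation but no profile, so the unified record keeps the reservation and leaves the name and country empty. Gaps like this are typically handled afterwards in the data cleaning stage.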

Data reduction. Data reduction produces a condensed representation of the original data that
is much smaller in volume yet preserves the quality of the original data. The methods of data
reduction are explained below.

1. Data Cube Aggregation. This technique aggregates data into a simpler form. For example,
suppose the data you gathered for your analysis covers the years 2012 to 2014 and includes
your company's revenue for every quarter (three months). If the analysis concerns annual
sales rather than quarterly figures, the data can be aggregated so that the result gives the
total sales per year instead of per quarter.
2. Dimension reduction. When we come across attributes that are only weakly relevant, we keep
just the attributes required for our analysis. This reduces data size by eliminating outdated
or redundant features.

Step-wise Forward Selection. The selection begins with an empty set of attributes. At each
step, the best of the remaining original attributes is added to the set, based on its
relevance. In statistics, relevance is commonly judged with a significance test, that is, a
p-value.

Step-wise Backward Selection. This selection starts with the complete set of attributes in
the original data and, at each step, eliminates the worst attribute remaining in the set.

Combination of Forward and Backward Selection. At each step, it selects the best attribute
and removes the worst from among the remaining attributes, saving time and making the
process faster.
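
Returning to the data cube aggregation example (quarterly revenue for 2012 to 2014 rolled up into annual totals), the idea can be sketched as follows. The revenue figures are made up for illustration, and the function name aggregate_by_year is hypothetical.

```python
from collections import defaultdict

# Hypothetical quarterly revenue: (year, quarter, revenue). Not real data.
quarterly_revenue = [
    (2012, "Q1", 200), (2012, "Q2", 300), (2012, "Q3", 250), (2012, "Q4", 350),
    (2013, "Q1", 220), (2013, "Q2", 310), (2013, "Q3", 270), (2013, "Q4", 400),
    (2014, "Q1", 250), (2014, "Q2", 330), (2014, "Q3", 290), (2014, "Q4", 430),
]


def aggregate_by_year(records):
    """Sum the quarterly revenue into one total per year."""
    totals = defaultdict(int)
    for year, _quarter, revenue in records:
        totals[year] += revenue
    return dict(totals)


annual_revenue = aggregate_by_year(quarterly_revenue)
print(annual_revenue)  # {2012: 1100, 2013: 1200, 2014: 1300}
```

Twelve quarterly records are reduced to three annual records. The quarterly detail is no longer available, which is acceptable only when the analysis needs yearly totals.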

Data Compression. The data compression technique reduces the size of files using different
encoding mechanisms (for example, Huffman encoding and run-length encoding).

There are two types of compression:

1. Lossless Compression. Encoding techniques such as run-length encoding allow a simple and
modest reduction in data size. Lossless compression uses algorithms that restore the exact
original data from the compressed data.
2. Lossy Compression. Methods such as the discrete wavelet transform and PCA (principal
component analysis) are examples of this type of compression. For example, the JPEG image
format uses lossy compression, yet the decompressed image still conveys the same meaning as
the original. In lossy compression, the decompressed data may differ from the original data
but remain useful enough to retrieve information from.
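
To make lossless compression concrete, here is a minimal sketch of run-length encoding. Each run of repeated symbols is stored once together with its length, and decoding restores the original exactly. The occupancy log (O for occupied, V for vacant) and the function names are hypothetical.

```python
def rle_encode(text):
    """Encode a string as a list of (symbol, run_length) pairs."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            # Same symbol as the previous one: extend the current run.
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            # Different symbol: start a new run.
            runs.append((ch, 1))
    return runs


def rle_decode(runs):
    """Rebuild the original string from (symbol, run_length) pairs."""
    return "".join(ch * count for ch, count in runs)


# Hypothetical daily room status: O = occupied, V = vacant.
occupancy = "O" * 4 + "V" * 3 + "O" * 6 + "V"

encoded = rle_encode(occupancy)
decoded = rle_decode(encoded)

print(encoded)               # [('O', 4), ('V', 3), ('O', 6), ('V', 1)]
print(decoded == occupancy)  # True: nothing was lost
```

Run-length encoding only pays off when the data contains long runs of repeated values. On data with few repeats, the encoded form can be larger than the original.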

Numerosity Reduction. In this reduction technique, the actual data is replaced with a
mathematical model or a smaller representation of the data. With a parametric method, only
the model parameters need to be stored instead of the actual data. Non-parametric methods
include clustering, histograms, and sampling.
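
A parametric form of numerosity reduction can be sketched with a straight-line model: instead of keeping every data point, only the two parameters of the fitted line (intercept and slope) are stored. The monthly booking counts below are made up, and choosing a simple least-squares line is an assumption made for illustration, since the lesson does not prescribe a particular model.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: month number and number of bookings in that month.
months = [1, 2, 3, 4, 5]
bookings = [12, 15, 15, 19, 20]

# Only these two numbers need to be stored in place of the data points.
intercept, slope = fit_line(months, bookings)

# The stored model can later reproduce approximate values.
approximated = [intercept + slope * m for m in months]
print(round(intercept, 2), round(slope, 2))
print([round(v, 1) for v in approximated])
```

The reproduced values are close to the original counts but not identical, so this kind of reduction trades some accuracy for a much smaller representation.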


Data Transformation. The data are transformed or consolidated into forms appropriate for
mining, so that the mining process may be more efficient and the patterns found may be easier
to understand. Data discretization is one form of data transformation.

Strategies for data transformation include the following:

1. Smoothing, which works to remove noise from the data. Techniques include binning,
regression, and clustering.

2. Attribute construction (or feature construction), where new attributes are constructed
from the given set of attributes to help the mining process.
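
Smoothing by binning can be illustrated with a short sketch. The values are sorted, split into bins of equal size, and each value is replaced by the mean of its bin, which evens out small fluctuations (noise). The prices are illustrative values only, and smoothing by bin means is just one of several binning variants.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, group them into bins, and replace each by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


# Illustrative prices, for demonstration only.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]

smoothed = smooth_by_bin_means(prices, 3)
print(smoothed)  # [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

The nine original values become three distinct values, one per bin: (4, 8, 15) becomes 9, (21, 21, 24) becomes 22, and (25, 28, 34) becomes 29.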

Discretization & Concept Hierarchy Operation. Data discretization techniques divide the range
of a continuous attribute into intervals. The many distinct values of the attribute are then
replaced by a small number of interval labels.

As a result, mining results are presented in a concise and easily understandable way.

1. Top-down discretization. If we first consider one or a few points (so-called breakpoints
or split points) to divide the whole range of attribute values and then repeat this on the
resulting intervals, the process is known as top-down discretization, also called
splitting.
2. Bottom-up discretization. If we first consider all the continuous values as potential
split points and then discard some of them by merging neighboring values into intervals, the
process is called bottom-up discretization, also known as merging.
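
The effect of discretization can be sketched by replacing continuous values with interval labels at chosen split points. The nightly room rates, the split points (3000 and 5000), and the labels are all hypothetical. In practice, a discretization method would choose the split points from the data, whereas here they are fixed by hand to keep the example small.

```python
from bisect import bisect_right


def discretize(values, split_points, labels):
    """Replace each continuous value with the label of the interval it falls in.

    split_points must be sorted, and labels needs one more entry than
    split_points (one label per interval).
    """
    if len(labels) != len(split_points) + 1:
        raise ValueError("labels must have exactly one more entry than split_points")
    return [labels[bisect_right(split_points, v)] for v in values]


# Hypothetical nightly room rates in pesos.
room_rates = [1800, 2500, 3200, 4100, 5600, 7400]

# Two split points create three intervals:
# below 3000, from 3000 up to (not including) 5000, and 5000 and above.
split_points = [3000, 5000]
labels = ["budget", "mid-range", "premium"]

rate_classes = discretize(room_rates, split_points, labels)
print(rate_classes)
# ['budget', 'budget', 'mid-range', 'mid-range', 'premium', 'premium']
```

Six different rates are reduced to three labels, which makes patterns such as "most bookings are in the mid-range class" easier to see and to report.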

Let’s sum up

Data discretization techniques divide the range of a continuous attribute into intervals.

The data compression technique reduces the size of files using different encoding
mechanisms.

Data integration is a technique that combines data from multiple heterogeneous data sources
into a coherent data store and provides a unified view of the data.

Data cleaning is the process of fixing or removing data that does not belong in your dataset.

Data transformation is the process of converting data from one format or structure into
another.


Let’s assess what you have learned in this lesson

Learning Assessment Task 2.3.1


Search and Discuss. You are tasked to search the internet for one hotel establishment in Pasay
City and the number of rooms it has available for guests. You also need to find the 2020
foreign tourist arrivals in the Philippines, particularly in Pasay City. Then apply the
preprocessing steps you learned in this lesson to the data sets you have gathered, and answer
the following:
 What did you encounter while addressing the requirements?
 How did you apply the preprocessing steps to the data sets you searched for?
 Why is it important to preprocess the data?
 What have you learned in this lesson?
For each question, you will be rated using the writing rubric with a scale of 1-5 (poor-excellent).

Write your answer here.


______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________

______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________.

______________________________________________________________________________________________________________
______________________________________________________________________________________________________________

______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________.
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________.

