
68 Data Mining and Data Warehousing in the Airline Industry

Data Mining and Data Warehousing in the Airline Industry

Mark Revels
Western Kentucky University

Hélène Nussbaumer
Embry-Riddle Aeronautical University

Organizations are constantly looking to enhance their decision-making activities in order to
improve business processes and build a competitive advantage. Each day they collect and store
large amounts of data that may be analyzed to reduce costs, increase revenues, improve
efficiencies, and predict future trends and customer behaviors. Data mining, which is the
automated extraction of predictive information from large databases, helps connect large
volumes of this heterogeneous data and allows organizations to analyze it from multiple
perspectives. Designed for query and analysis rather than transaction processing, a data
warehouse is a relational database that centralizes data coming from multiple sources. It
translates information into common models, names, and definitions while also providing a means
to make information available for decision making. Although data mining and data warehousing
are powerful tools for organizations, they can present several challenges. The airline industry
collects and stores large amounts of heterogeneous data from a wide variety of sources. Studying
the successes and failures of this industry in conducting data mining and data warehousing
activities, as airlines struggle in an increasingly competitive environment, can be beneficial to
other economic sectors as well.

Keywords: Data analysis, competitive advantage, prediction analysis, airline industry

Introduction
In a global environment, organizations continually strive to enhance their decision-making
activities in order to improve business processes and retain a competitive advantage. Although
organizations collect and store a large amount of information on a daily basis, this data comes
from multiple disparate sources, thus making it difficult to aggregate and review. Yet to
compete in today’s highly competitive environment, organizations need ready access to timely
and complex analysis on an aggregated view of quality data (Kleissner, 1998). In recent years,
advances in technology and business processes, improved data management, increased
information availability, and decreased storage costs have facilitated the development of data
mining and data warehousing, both of which can support an organization’s efforts to efficiently
utilize its data.

The automated process of first extracting the data, then analyzing it from different perspectives,
and finally summarizing it into useful information is known as data mining. Data mining helps
connect a large volume of heterogeneous data and provides organizations with knowledge they
can exploit to predict future trends and behaviors, decrease costs, increase revenues, and improve
processes. Data warehousing, on the other hand, is the process of centralized data management

Electronic copy available at: https://ssrn.com/abstract=2519737
Academy of Business Research Journal, Vol. III, 2013 69

and retrieval. It helps transform vast amounts of data into useful and reliable information that
organizations can leverage to help them remain efficient and competitive. Thus, data
warehouses not only provide the foundation for powerful data analysis techniques such as data
mining, but they also empower users with the information they need for decision-making activities.

While data mining and data warehousing can be powerful tools for organizations, they also
present challenges. The outcome of data mining is only as good as the data analyzed or the user’s
ability to understand and interpret the information. Furthermore, the implementation of a data
warehouse is often a long and resource-intensive process. Many organizations may not have the
necessary expertise to set up and then maintain a data warehouse. Also, systems integration, data
integrity and security, and other limitations affect the ability of the organization to utilize the
information to improve its decision-making activities. Finally, the cost associated with the
infrastructure needed for data management may be prohibitive for some organizations.

As both the amount of information available for collection and storage capabilities increase,
organizations may turn to newer technologies such as cloud computing to provide them with the
flexible, dynamic, and elastic infrastructure they need while reducing their overall cost.
Moreover, companies may enhance their decision-making activities and increase their
economic advantage over competitors by tapping into under-used sources of information
such as text and knowledge.

The airline industry runs multiple types of operations simultaneously, such as customer service,
baggage handling, flight scheduling, ticket sales, and overall business management. As a result,
airlines collect and store large amounts of heterogeneous data from a wide variety of sources that
they can leverage to identify opportunities to improve processes, reduce costs, and increase
revenues. However, older legacy systems, complex interactions between business partners,
multiple mergers and acquisitions, and systems interoperability challenges may impact their
ability to successfully leverage critical information. This can affect their ability to compete in a
highly competitive environment.

In this paper, we will first define and explain data mining and data warehousing and how airlines
can utilize them. We will then discuss issues related to data mining and data warehousing and
how they affect the airline industry. We will also look at a data warehouse implementation
success story. Finally, we will consider future technologies and address how airlines can benefit
from them.

Data Mining and Data Warehousing

Data mining is the process of objectively exploring and analyzing large quantities of data that
have previously been collected (West, 2008). Through the use of automated tools, data mining serves
to uncover hidden patterns that would otherwise have remained unknown. Data mining uses a
wide variety of mathematical tools such as regression, cluster analysis, and decision trees in
order to find meaningful patterns and trends that can be leveraged to support business decisions.
Thus, data mining entails not just the collection and management of data but also analysis
and prediction (Seifert, 2006).

Data warehousing provides a centralized repository for corporate data and information assets. A
data warehouse is not identical to the organization’s database used for transaction processing;
it may be defined as a relational database that is designed for query and analysis
rather than for transaction processing. This process of centralized data management and retrieval
relies on the data warehouse, which is defined as a subject-oriented, integrated, time-variant,
non-volatile collection of data in support of management's decision-making process. In other words,
a data warehouse is organized by subjects, not by applications, and is constructed by integrating
multiple data sources. It maintains both historical and current data, and once the data is loaded, it
cannot be modified.
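
The four properties in this definition can be made concrete with a small sketch. It uses
Python's built-in sqlite3 module purely for illustration; the table name fact_bookings, its
columns, and the trigger names are hypothetical, and a production warehouse would more likely
enforce non-volatility through load procedures and permissions than through triggers.

```python
import sqlite3


def build_warehouse():
    """Create an in-memory table that mimics the four warehouse properties."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Subject-oriented: one table per business subject (bookings),
        -- not one per source application.
        CREATE TABLE fact_bookings (
            customer_id   TEXT NOT NULL,
            source_system TEXT NOT NULL,  -- integrated: rows from many systems
            snapshot_date TEXT NOT NULL,  -- time-variant: history is kept
            fare_usd      REAL NOT NULL
        );
        -- Non-volatile: reject any change to rows that are already loaded.
        CREATE TRIGGER no_update BEFORE UPDATE ON fact_bookings
        BEGIN SELECT RAISE(ABORT, 'warehouse rows are read-only'); END;
        CREATE TRIGGER no_delete BEFORE DELETE ON fact_bookings
        BEGIN SELECT RAISE(ABORT, 'warehouse rows are read-only'); END;
    """)
    return conn


def load(conn, rows):
    """Append new rows; loading is the only permitted write."""
    conn.executemany("INSERT INTO fact_bookings VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def update_is_rejected(conn):
    """Return True if the database refuses to modify loaded rows."""
    try:
        conn.execute("UPDATE fact_bookings SET fare_usd = 0")
        return False
    except sqlite3.DatabaseError:
        return True
```

Loading rows from two hypothetical source systems and then calling update_is_rejected shows
that history accumulates while existing rows stay unchanged.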

How Does Data Mining Work and What Can it Do?

Data mining uses sophisticated mathematical algorithms to automatically and systematically
analyze large amounts of data to find relationships and evaluate the probability of future events.
Based on users’ open-ended queries, data mining software facilitates the knowledge discovery
process by analyzing relationships and patterns in stored transaction data. Thus, the first step in
the data mining process is the collection of information and data, usually through the use of a
data warehouse. However, collecting data is not enough; business users need to locate that data
and refine it in order to use it. Next, the organization needs to develop a model for known
situations and apply it to unknown ones. Since a model uses an algorithm to act on a set of data,
end users can run queries to determine possible relationships and define a solution to a problem
they want to solve.
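
As a minimal sketch of developing a model for known situations and applying it to unknown
ones, the example below fits a least-squares regression line (one of the mathematical tools
mentioned earlier) and uses it to predict an unseen case. The function names and the booking
figures are invented for illustration and are not taken from any airline.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Known situations (hypothetical): days before departure vs. seats sold.
days_out = [30, 20, 10, 5]
seats_sold = [40, 60, 80, 90]
model = fit_line(days_out, seats_sold)

# Unknown situation: estimate seats sold 15 days before departure.
estimate = predict(model, 15)
```

The model only captures the relationship present in the sample data; as the paper notes
later, interpreting whether that relationship is meaningful remains the analyst's job.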

Not only does data mining help connect large volumes of heterogeneous data, it also allows
organizations to analyze data from multiple perspectives, categorize it, and use the information
to predict future trends and behaviors, decrease costs, increase revenues, and improve processes.
Additionally, data mining reduces time-consuming inquiries and allows organizations to
make decisions proactively. Companies can leverage data mining techniques to improve
customer loyalty through market segmentation, understand what their competitors are doing,
forecast sales, monitor business performance, and detect fraud, waste, and abuse
(Anderson-Lehman et al., 2004, p. 163).
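
Market segmentation is commonly approached with cluster analysis. The sketch below is a
deliberately small one-dimensional k-means, assuming customers are described only by a
hypothetical "flights per year" figure and that the starting centroids are chosen by hand;
real segmentation would use many more attributes and a tested library.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Group numeric values around the nearest of the given centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical flights per year for six customers.
flights_per_year = [2, 3, 4, 20, 22, 25]
centers, segments = kmeans_1d(flights_per_year, [0.0, 30.0])
# segments[0] holds occasional flyers, segments[1] frequent flyers.
```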

How Does Data Warehousing Work?

Data warehousing centralizes data coming from a multitude of sources within an organization
and uses consistent and repeatable processes for loading the operational data that supports
day-to-day business operations into support databases. Additionally, data warehousing translates
heterogeneous information into common models, names, and definitions and provides a means to
make information available for decision making. Moreover, a data warehouse presents the
information according to specific subjects, integrates data from multiple sources, and displays it
in one form. It also stores historical data that remains consistent regardless of when it is
accessed; the data kept in the warehouse will not change. Finally, the architecture is open and
scalable and built in such a way that it can support the future expansion of data.
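
The translation of heterogeneous information into common names and definitions can be
sketched as a simple mapping step. The source systems, field names, and cleaning rules below
are assumptions made for the example; an actual load process would be driven by the
organization's own data dictionary.

```python
# Hypothetical mapping from each source system's field names to common names.
FIELD_MAP = {
    "reservations": {"pax_id": "customer_id", "dep": "origin", "fare": "fare_usd"},
    "loyalty": {"member_no": "customer_id", "from_airport": "origin",
                "price_usd": "fare_usd"},
}


def to_common_model(source, record):
    """Rename source-specific fields and normalize their values."""
    mapping = FIELD_MAP[source]
    row = {common: record[local] for local, common in mapping.items()}
    row["origin"] = row["origin"].strip().upper()       # one airport-code format
    row["fare_usd"] = round(float(row["fare_usd"]), 2)  # one numeric type
    row["source_system"] = source                       # keep lineage
    return row


rows = [
    to_common_model("reservations", {"pax_id": "C1", "dep": " sdf ", "fare": "320.5"}),
    to_common_model("loyalty", {"member_no": "C1", "from_airport": "SDF",
                                "price_usd": 410}),
]
```

After this step both records share the same keys and formats, so they can be stored and
queried together regardless of where they originated.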

Data Warehouse Design: Bottom-up

Kimball (Ballard et al., 1998) supports the bottom-up design and believes that individual data
marts, smaller data warehouses that can function independently or can be interconnected to form
a global integrated data warehouse, should follow a “bus structure” with all common elements
defined for the entire organization. The bus structure allows data marts to be located on one
server or across multiple ones with the data warehouse being a virtual entity representing the
sum of the data marts (Exforsys, 2005). Kimball’s data warehouse architecture reverses the
position of the data marts and the data warehouse: data marts, each giving only a small view of
organizational data, are created first and later grouped together into a larger data warehouse.

Kimball’s design allows organizations to plan and design data marts without needing the more
global data warehouse to be in place. This approach contributes to more immediate results from
data marts and offers faster payback. However, a bottom-up design can lead to data redundancy
and inconsistency, and the use of multiple data marts may increase the load on the operational
system (Ballard et al., 1998).
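
One way to picture the bus structure is as independent data marts that share commonly
defined elements, with the warehouse existing only as a combined view over them. The sketch
below assumes a shared customer dimension and two hypothetical marts; all names and figures
are illustrative rather than taken from Kimball or the cited sources.

```python
# Common element defined once for the whole organization.
CUSTOMERS = {"C1": {"tier": "gold"}, "C2": {"tier": "basic"}}

# Two independently built data marts keyed by the same customer_id.
SALES_MART = [
    {"customer_id": "C1", "fare_usd": 320.0},
    {"customer_id": "C2", "fare_usd": 150.0},
    {"customer_id": "C1", "fare_usd": 410.0},
]
BAGGAGE_MART = [{"customer_id": "C1", "mishandled_bags": 1}]


def virtual_warehouse_view():
    """Combine the marts on demand; nothing is copied into a central store."""
    view = {cid: {"tier": dim["tier"], "revenue_usd": 0.0, "mishandled_bags": 0}
            for cid, dim in CUSTOMERS.items()}
    for row in SALES_MART:
        view[row["customer_id"]]["revenue_usd"] += row["fare_usd"]
    for row in BAGGAGE_MART:
        view[row["customer_id"]]["mishandled_bags"] += row["mishandled_bags"]
    return view


view = virtual_warehouse_view()
```

The combination works only because both marts use the same customer identifier; without
that shared definition, the redundancy and inconsistency noted above become likely.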

Data Warehouse Design: Top-down

The top-down warehouse design advocated by Inmon takes the opposite view to that of
Kimball. Indeed, Inmon believes that all corporate data should be transferred from various
OnLine Transaction Processing (OLTP) systems into a centralized place in order to be analyzed
(Exforsys, 2005). As such, the data warehouse needs to be in place before implementing data
marts. Inmon further argues that data should be subject-oriented and data marts
department-specific (Exforsys, 2005).

Inmon’s top-down design works well when organizations have a good centralized information
system and offers the advantage of greater integration and more consistent data definitions
(Ballard et al., 1998). However, this model requires more planning and design work, thus
leading to a higher cost.

Data Warehouse Design: Hybrid

A hybrid data warehouse design seeks to integrate the speed and user-oriented approach of the
bottom-up design with the integration and enterprise-wide data consistency of the top-down
design (Exforsys, 2005).

While the implementation of a hybrid design may be more complex, leveraging both designs may
allow organizations to alleviate the issues each design presents. Careful planning and monitoring
of the implementation process can give enterprises the opportunity to benefit from both designs
(Ballard et al., 1998).

Data Mining and Data Warehousing in the Airline Industry


The airline industry runs multiple types of operations simultaneously, including customer
service, baggage handling, flight scheduling, and overall business operations management. As a
result, airlines collect and store vast amounts of heterogeneous data through multiple systems (e.g.
Frequent Flyer Programs and Central Reservation Systems). They can leverage the information
to enhance their customer knowledge and thus improve their customer service by offering
individually targeted communication, which can lead to increased revenues. Airlines can also
utilize data warehouse and data mining tools to analyze aviation safety data and enhance internal
airline safety analysis. For example, American Airlines (AA) did this when it partnered
with the MITRE Corporation and implemented MITRE’s Aviation Safety Data Mining
Workbench to analyze AA’s Aviation Safety Action Program (ASAP) data (Nazeri, 2003).

Wilber (2008) notes how data mining can contribute to aviation safety by analyzing daily flight
data and pilot incident reports to identify potential problems such as unsafe landing or takeoff
practices, difficult landing approaches, and risks associated with midair or ground collisions.
Following the analysis of the data from thousands of flights, US Airways changed its landing
checklist to avoid un-stabilized approaches (Wilber, 2008). Finally, airlines can use their data to
enhance maintenance reliability or understand pilot workload and improve efficiencies
throughout the organization.

Vendors also offer airline-based products. For example, Oracle sells an extendable and
customizable off-the-shelf data warehouse framework called the Oracle Airline Data Model
(OADM). OADM is designed to provide “a single scalable repository for transactional and
historical data that can be used to provide real-time business intelligence and strategic insights”
(Oracle Data Sheet, 2011, p. 1). The pre-built data warehouse is aligned with existing airline
industry data formats, which facilitates not only a rapid implementation, but also interoperability
with other systems. Additionally, the OADM incorporates pre-built data mining, On-line
Analytical Processing (OLAP), and dimensional models, which provide airlines with
industry-specific metrics they can leverage to improve their business processes (Oracle Data
Sheet, 2011, p. 2).

Technological Issues

Over the years, technological advances and the increase in business interactions and transactions
contributed to a sharp rise in the amount of data available. While legacy systems can manage
pre-defined queries, they often are unable to support more complex, ad hoc analytics (Inmon and
Valente, 2010, p. 1). Moreover, the integration of data can be extremely complex and unknown
issues with the source systems can exist. Additionally, system crashes, downtime or overload
can affect an organization’s ability to effectively utilize its data for timely decision-making
activities (van Gelder, nd).

A key implementation challenge for data warehouses is integrating conflicting or redundant data
from different sources. Furthermore, the size of the database and query complexity will affect
the type of system needed by organizations. Finally, interoperability, or “the ability of a
computer system and/or data to work with other systems or data using common standards or
processes” (Seifert, 2006, p. CRS-17), proves to be a critical factor. Indeed, interoperability can
affect the efficiency of the data mining process and organizations often run into issues when they
attempt to data mine information across legacy systems that do not communicate well with one
another.

Data Quality, Integrity, and Security Issues

The value of reports that management can run depends on the quality of the data, which includes
its consistency and accuracy. In order to create a data warehouse, heterogeneous information
from multiple sources must be collected, cleaned, and translated into a common language. As
large numbers of sources are combined, the integrity of the data may be compromised or some
data may not be captured, thus affecting the quality of the query outcome.

As organizations decide which security measures to implement to protect their data, they must
understand that one type of security solution may not fit all their needs and should thus consider
different levels of security. However, data encryption should be carefully considered as it can
affect a company’s ability to use the information. Indeed, Agosta (2003) explains that “blanket,
global encryption degrades performance, lessens availability and requires complex encryption
key administration” (p. 2). Another security issue concerns web data mining. While web data
mining can be beneficial to businesses, it may present ethical and privacy challenges as users
may not be aware that their information is being collected or may not know how the organization
plans to use the information collected. Consequently, web users cannot provide informed consent
as to how an organization utilizes the data (van Wel and Royakkers, 2004). This can result in
potential violations of users’ privacy, especially when the data is used for a purpose other than
its original purpose.

Data Mining Limitations

The value of data mining is only as good as the ability of the users to interpret the significance of
the patterns and relationships (Seifert, 2006). Moreover, data mining only allows for the
discovery of connections between variables; it does not ascertain the causal relationship. As
Seifert (2006) explains, this determination relies on the skills of experts and its success requires
skilled technical and analytical specialists who can structure the analysis and interpret the output
that is created. In addition, the data analysis can only be as good as the data that is being
analyzed and any integrity issues will affect the query outcome. According to a 2006 CRS
Report for Congress, the presence of duplicate records, the lack of data standards, the timeliness
of updates, and human error can all significantly impact the effectiveness of the more complex
data mining techniques, many of which are sensitive to subtle differences that may exist in the
data (Seifert, 2006).
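
To illustrate how duplicate records and missing data standards interact, the sketch below
flags records that become identical once names and e-mail addresses are normalized. The
record layout and the normalization rules are assumptions for the example; real
deduplication usually needs fuzzier matching than this.

```python
from collections import defaultdict


def normalize(record):
    """Reduce a record to a comparison key with consistent case and spacing."""
    name = " ".join(record["name"].lower().split())
    email = record["email"].strip().lower()
    return (name, email)


def find_duplicates(records):
    """Return lists of record positions that share the same normalized key."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        groups[normalize(record)].append(index)
    return [indexes for indexes in groups.values() if len(indexes) > 1]


# Hypothetical customer records entered by different systems.
customers = [
    {"name": "Ann  Lee", "email": "ANN@EXAMPLE.COM "},
    {"name": "ann lee", "email": "ann@example.com"},
    {"name": "Bo Chan", "email": "bo@example.com"},
]
duplicates = find_duplicates(customers)
```

Without the normalization step the first two records would count as different customers,
which is the kind of subtle difference that can distort a mining result.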

Other issues include “mission creep”, which Seifert (2006) defines as “the use of data for
purposes other than those for which the data was originally collected” (p. CRS-18). As data is
being analyzed, unexpected patterns or relationships may emerge, thus leading the organization to
abandon its original query. Also, data mining must be relevant to essential business processes
and objectives in order to impact an organization: it requires a company to understand its
business as well as its data. Moreover, organizations should know what kind of data is collected
and what decision they want to make with the information. Finally, companies should have a
clear and precise understanding of the problem they seek to solve as the data provided is only as
good as the questions asked.

Data Warehousing Issues

The implementation of a data warehouse is often a long-term, time-consuming, and
resource-intensive process. Organizations may not have the necessary expertise to set up and
maintain a data warehouse or they may over-estimate the needs of the system, thus leading to
higher costs. Furthermore, the capability of a data warehouse to efficiently collect and store
information depends on the reliability of the system. Also, while the unit cost of storage is
decreasing, the infrastructure cost for data management is increasing (Inmon, nd). In other words,
the larger the number of bytes managed by a vendor, the higher the cost of the infrastructure needed
to manage the bytes. As Inmon (nd) explains, “for several terabytes of storage managed by IBM
or Teradata in a data warehouse environment the cost of the infrastructure to manage those
terabytes of data may be from $500,000 to $1,000,000 per terabyte” (p. 4).

Also, data warehouses capture only a fraction of the information needed by managers for
decision-making activities and they often cannot collect, retrieve, and disperse workers’
knowledge. This would require organizations to implement a knowledge warehouse, which
necessitates a complex and expensive system to enable the collection of multiple forms of
knowledge feeds and support not only the transformation of tacit to explicit knowledge, but also
the spread of explicit knowledge throughout the organization.

Challenges for the Airline Industry and a Success Story

The airline industry faces interoperability issues as it uses multiple and complex information
technology (IT) systems to support its operations. Also, many airlines operate older legacy
systems, which can create IT systems integration issues and make it difficult to deploy data
warehouses and data mining software. Multiple mergers and acquisitions can also complicate the
integration and reconciliation of conflicting or redundant data and can compromise data integrity
and security. Additionally, implementing a data warehouse can be costly and cash-strapped
airlines may be unable to secure the necessary funding. Another challenge for airlines as they
collect information on passengers is the need to balance the privacy of their customers with the
requirements of providing government agencies with the necessary information to support
national security efforts. Finally, airlines must be able to exchange data and information with
their multiple business partners, thus complicating further the need for IT systems integration.

In the mid-1990s, Continental Airlines (Continental) ranked the lowest among major U.S.
airlines with regard to on-time performance, mishandled baggage, and customer complaints.
When Gordon Bethune took over as CEO, he received approval from the Board of Directors to
implement the “Go Forward Plan”. In his plan, Bethune highlighted four critical areas for
Continental to focus on, including “better understand customers’ needs, change its costs and
cash flow, make reliability a reality, and create a stronger organizational culture”
(Anderson-Lehman et al., 2004, p. 164). Historically, Continental had outsourced its operational
systems and received only a limited set of scheduled reports and no support for ad hoc queries.
There was also no consistent approach to data management and reporting and a lack of corporate
data infrastructure (Anderson-Lehman et al., 2004, p. 164).

As part of the “Go Forward Plan”, Continental decided to develop an enterprise data warehouse,
which the CIO identified as “core to Continental strategy and thus should not be outsourced”
(Anderson-Lehman et al., 2004, p. 164). After Continental returned to profitability and ranked
first in the airline industry on several performance metrics, Bethune expanded the vision with the
“First to Favorite” strategy (Anderson-Lehman et al., 2004, p. 166). To support this strategy, the
need for real-time data became increasingly important. Because the data warehouse team had
anticipated this possibility and built from the onset an architecture that could handle real-time
data, Continental was able to quickly move to real-time information. The airline leveraged the
real-time information to improve the recovery of lost airline reservations, customer value
analysis, marketing insight, flight management dashboard, and fraud investigations.

Over six years, Continental invested US $30 million in hardware and software that realized over
US $500 million in increased revenues and cost savings, and went from “worst to first”
(Anderson-Lehman et al., 2004, p. 163). The real-time technology proved critical to supporting the
company’s “First to Favorite” strategy. The data warehouse received information from 25
operational systems and two external data sources. The data was loaded either in real time or in
batch, and critical information from analysis in the data warehouse was fed back to the
operational systems (Anderson-Lehman et al., 2004). The real-time information came from
multiple sources including the mainframe reservation system, satellite feeds from airplanes, and
a central customer database. Finally, users were able to access the information in multiple ways,
either through standard query interfaces or custom-built applications (Anderson-Lehman et al.,
2004). Key factors in the Continental success story include a strong data warehouse team,
data warehouse governance, alignment of the data warehouse with business needs to secure
funding, an architecture designed to support real-time Business Intelligence (BI) needs, the
decision to show users what real-time BI could offer, the creation of common standards to
integrate decision support and operational systems, and changes to downstream
decision-making and business processes.

Emerging Technologies
As organizations increasingly rely on BI to enhance their decision-making activities and
facilitate their operational processes, they seek to avoid some of the issues associated with data
warehousing (e.g. system stability, cost). Additionally, companies realize that the information
needed for decision-making activities comes in a format other than transaction-oriented data and
requires different means to capture it. As a result, they increasingly turn to new technologies
to meet their evolving data warehousing and data mining needs.

Cloud Computing

Cloud computing, which is the concept of hosting services over the Internet, may be a potential
solution. The service provided via cloud computing can take the form of either
Software-as-a-Service (SaaS) or Platform-as-a-Service (PaaS). While the latter allows customers
to run their own applications on the cloud, the former gives them the opportunity to access the
application via the Internet (van Gelder, nd).

The use of cloud computing for data warehousing presents several advantages. First,
organizations no longer need in-house experts to build and maintain the system hosting their data
warehouse as this activity is handled by the vendor. Second, the risk of over-provisioning, which
is over-estimating the system needs, is limited since organizations no longer hold the
responsibility of estimating the computational power needed. Third, because cloud computing is
elastic, the amount of resources can automatically increase or decrease based on the data
warehouse needs (e.g., a data warehouse may have less usage at night, thus requiring fewer
resources). In a PaaS model, customers may save money as they only pay for the computational
power that is actually used (pay-per-use). Finally, cloud computing can facilitate workload
management by detecting changes in workload in real time.
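
The difference between provisioning for peak load and paying per use can be shown with
simple arithmetic. The hourly demand profile and the rate of 1.0 per node-hour below are
invented, and the sketch assumes the same unit rate in both cases, which real pricing may
not honor.

```python
def fixed_cost(hourly_demand, rate_per_node_hour):
    """Cost when capacity is sized for the busiest hour and kept all day."""
    peak = max(hourly_demand)
    return peak * len(hourly_demand) * rate_per_node_hour


def pay_per_use_cost(hourly_demand, rate_per_node_hour):
    """Cost when capacity follows demand hour by hour (elastic)."""
    return sum(hourly_demand) * rate_per_node_hour


# Hypothetical nodes needed per hour: quiet night, busy day, moderate evening.
demand = [2] * 8 + [10] * 12 + [4] * 4

fixed = fixed_cost(demand, 1.0)          # 10 nodes for 24 hours
elastic = pay_per_use_cost(demand, 1.0)  # only the node-hours actually used
savings = fixed - elastic
```

Under these assumptions the elastic option costs 152 units against 240 for fixed capacity;
the gap narrows as demand becomes flatter or as the per-use rate rises.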

While cloud computing offers advantages, it also presents risks. First, organizations are
dependent not only upon the Internet but also upon the infrastructure of the cloud provider for
uploading the necessary data into the data warehouse for cloud storage. This can lead to cost and
performance issues. Another performance issue exists when transferring large amounts of data
from the cloud storage to virtual nodes for computing. Second, organizations lose some control
over their information and data, which can result in security and trust issues. Indeed,
organizations that use cloud computing to store their data may not know where the data is
hosted, including in which country (Brodkin, 2008). Third, wide area network (WAN) latency can
be an issue for applications running in the cloud (van Gelder, nd). Fourth, the cloud provider
may be acquired by another entity, which could make it challenging for organizations to
readily access their data. Finally, if the cloud provider does not replicate the data and application
infrastructure across multiple sites, organizations could lose their information in case of system
failure (Brodkin, 2008).

Textual Data

To date, the majority of the data stored in data warehouses is transaction-oriented (Inmon and
Valente, 2010). However, another key source of information for business decision-making
exists in the form of textual data (e.g. email, contracts, chat logs). Textual data represents the
most abundant and common data in enterprises today, yet it is also the least used for corporate
decision-making (Inmon and Valente, 2010). One of the main reasons lies with the form of the
data itself: text does not comply with any computer-based rules and thus lacks the structure
required by databases. As Inmon and Valente (2010) explain, “…text is not repetitive and is
unstructured; databases are built for repetitive and structured data” (p. 1). Although few
solutions exist to handle textual data, one emerging technology is Textual Extract, Transform,
and Load (ETL). Textual ETL facilitates the process of transforming text into a form and structure
that meets the requirements of a database, thus giving organizations an opportunity to tap into
this large source of corporate knowledge.
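
The general idea of turning free text into a database-ready structure can be shown with a toy
example. This is not the Textual ETL technology described by Inmon and Valente; the pattern
for flight numbers, the issue keywords, and the sample message are all assumptions made for
illustration.

```python
import re

# Hypothetical pattern: two capital letters followed by up to four digits.
FLIGHT = re.compile(r"\b([A-Z]{2})\s?(\d{1,4})\b")
# Hypothetical keywords an airline might want to track in customer messages.
ISSUES = ("delay", "baggage", "refund")


def text_to_row(message):
    """Extract a fixed set of fields from an unstructured message."""
    match = FLIGHT.search(message)
    lowered = message.lower()
    return {
        "flight": f"{match.group(1)}{match.group(2)}" if match else None,
        "issues": [word for word in ISSUES if word in lowered],
    }


row = text_to_row("My bag on CO 1234 never arrived; baggage desk promised a refund.")
```

Each message yields a row with the same fields, which is the repetitive structure a database
expects; whatever the pattern and keywords miss is lost, which hints at why the paper
describes analysis of textual data as an ad hoc process.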
However, leveraging textual data presents several challenges. First, the volume of data will
expand exponentially, thus requiring databases and data warehouses with larger capabilities.
Unless the cost of storage decreases, organizations may be unable to afford the increased storage
needs. Second, the IT infrastructure may not be able to support the amount of data that needs to
be loaded. Third, textual data is less structured than “pure” data and as a result, any analysis
requires an ad hoc process at all times (Inmon and Valente, 2010). Finally, organizations will
need to assess the age of the data and decide when to archive it.

Knowledge Management

Although most organizations know how to leverage data warehouses and data mining to provide
decision-makers with key information, computers hold only a fraction of the information
necessary for decision-making activities. Another critical source of information is the
knowledge of employees, which often exists only in their minds, and organizations struggle or
fail to capture this critical intellectual asset. To facilitate capturing,
coding, retrieving, and disseminating knowledge across an organization and to give workers an
intelligent analysis platform that improves the knowledge management process, companies can
leverage existing technologies (e.g., data warehousing and data mining) to implement a knowledge
warehouse. To implement a knowledge warehouse successfully, Nemati et al. (2002)
recommend six technological requirements: a data/knowledge acquisition module; two feedback
loops; an extraction, transformation, and loading module; a knowledge warehouse (storage) module;
an analysis workbench; and a communication manager/user interface module.

However, creating a knowledge warehouse presents challenges. Knowledge takes multiple types and
forms, and the architecture needs to support the transformation of tacit knowledge
to create new knowledge. Additionally, a knowledge warehouse must support the storage,
initiation, execution, and management of knowledge analysis tasks and the associated
implementation technology. It must also be tied to the appropriate knowledge paradigm.
Moreover, the knowledge warehouse architecture must support two feedback loops, on-line
knowledge extraction and real-time storage request (Nemati et al., 2002). While a knowledge
warehouse can provide organizations with valuable and untapped information, the complexity of
implementation demands a substantial amount of organizational time, effort, and resources and may
involve multiple business units.

Future Applications in the Airline Industry


In a highly competitive environment, airlines continuously strive to retain their competitive
advantage. However, their complex and often outdated information technology infrastructure
may prevent them from fully leveraging critical information and data. Cloud computing gives
them a unique opportunity to move away from inefficient legacy systems and leverage digital
technologies and high-speed, high-capacity computing power (SITA, 2011). Indeed, in a
challenging economic environment with tight cash constraints, airlines must be able to scale their
resource needs up and down quickly. They need a dynamic system to help them retain their
competitive advantage.

Since cloud computing follows a pay-per-use approach, airlines can improve their cash flows by
paying only for the resources they actually use. Furthermore, airlines can lower, or even
avoid, the upfront capital investment usually required by traditional data warehouses by using a
third-party cloud computing vendor for their data warehousing needs. Cloud computing can also
help reduce total cost of ownership (TCO) by shifting the cost of data center management to a
third-party vendor. In addition, cloud computing can improve the real-time analytics of data,
thus providing airlines with better business intelligence and leading to increased revenue and
enhanced customer loyalty. Finally, mid-market airlines can benefit from the agility and speed
of smaller carriers while at the same time accessing the level of data and processing power of a
Tier One airline.
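
The cash-flow argument can be made concrete with a small calculation. The sketch below is illustrative only: every figure (capital outlay, monthly operating cost, price per terabyte, monthly usage) is invented, the function names are hypothetical, and the model deliberately ignores factors such as data transfer fees, depreciation, and contract minimums.

```python
def on_premises_cost(months, upfront_capital, monthly_operations):
    """Total cost of an owned warehouse: capital paid up front plus fixed running costs."""
    return upfront_capital + months * monthly_operations


def pay_per_use_cost(monthly_usage_tb, price_per_tb):
    """Total cost of a cloud warehouse billed only for the capacity used each month."""
    return sum(usage * price_per_tb for usage in monthly_usage_tb)


# Invented six-month demand profile in terabytes, with seasonal peaks.
MONTHLY_USAGE_TB = [10, 40, 15, 10, 60, 20]

owned = on_premises_cost(
    months=len(MONTHLY_USAGE_TB), upfront_capital=200_000, monthly_operations=5_000
)
cloud = pay_per_use_cost(MONTHLY_USAGE_TB, price_per_tb=500)

print(f"Owned warehouse, six months: {owned}")
print(f"Pay-per-use, six months:     {cloud}")
```

With these assumed numbers the pay-per-use option is cheaper because capacity sits idle in low-demand months under the owned model. With sustained high usage the comparison could reverse, so the inputs matter more than the formula.
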

However, airlines must carefully consider the services offered by the cloud provider and
investigate not only how the data is segregated among the various clients, but also the technical
support provided in case of system failure or illegal activity. Furthermore, airlines collect data
on passengers that they need to submit to federal agencies (e.g., Federal Aviation Administration)
to comply with national security measures. The loss of control that comes with cloud computing
could lead to the exposure of confidential and sensitive data.

In addition to cloud computing, textual data and knowledge warehouses may provide airlines
with the edge they need to conduct business operations in a highly competitive
market. Indeed, multiple mergers and acquisitions generate additional knowledge and an increased
amount of textual data, which airlines can leverage to enhance their decision-making
activities, streamline their work processes, and retain their competitive advantage.
However, the cost of implementing and maintaining a knowledge warehouse or a system that
could support textual data may be prohibitive.

Summary
Organizations collect and store vast amounts of data in different formats and databases, and they
now have more information at their disposal than ever before. In order to retain their competitive
advantage, organizations require timely and complex analysis of integrated data that can reveal
previously unknown, unavailable, or untapped information and so enhance their decision-making
activities. Yet organizations often find it challenging to access this data and struggle to
determine the best approach for using it. Over the years, advances in data capture, computer
processing power, disk storage capabilities, and statistical software have not only facilitated the
integration of information from various databases into data warehouses but also dramatically
increased the accuracy of analysis while driving down the cost.

Data mining and data warehousing can facilitate the centralization of heterogeneous data coming
from a multitude of sources within an organization and the automated extraction of predictive
information from large databases. This allows organizations to analyze data from multiple
perspectives, identify opportunities for improvement, and enhance their decision-making
activities. However, data mining and data warehousing present some challenges. While data
mining can be a powerful tool for organizations, it also needs to be relevant to underlying
business processes so that the data mining process itself does not impact operations. Data
integrity and the ability of end users to interpret the query outcome also affect the value provided
by data mining software. In addition, when implementing a data warehouse, organizations must
consider potential integration issues arising not only from legacy systems, but also from conflicting or
redundant data. They must also carefully assess the cost associated with the project and identify
which expenses are one-time and which are recurring.

As the amount of data collected and stored increases, the infrastructure to support the data
mining and warehousing processes becomes more complex, and enterprises need more flexibility
and the capability to store ever-growing amounts of data and information. New technologies such
as cloud computing can facilitate the process of warehousing and mining vast amounts
of information. Additionally, textual data and knowledge warehouses can give organizations
access to vast amounts of untapped and underused data. However, organizations need to assess
the risks of performance issues (e.g., loading large amounts of data, reliance on the Internet), loss
of control, and WAN latency associated with cloud computing, as well as the cost and complexity
involved with textual data and data warehouses.

The airline industry collects and stores large amounts of heterogeneous data from a wide variety
of sources. In an environment where competition is constantly increasing, it is critical that airlines
strategically leverage technology to access, analyze, and use their organizational data in order to
enhance their decision-making activities and improve their business processes. The Continental
case study shows that, while the investment to support data mining and data warehousing
activities can be significant, the quantifiable benefits they offer can be much larger.

References
Agosta, L. (2003). Special issues with data warehousing security. Information Management.
Retrieved April 29, 2012 from
http://www.information-management.com/issues/20031001/7398-1.html

Anderson-Lehman, R., Watson, H.J., Wixom, B.H., & Hoffer, J.A. (2004). Continental Airlines
flies high with real-time business intelligence. MIS Quarterly Executive, 3(4), pp. 163-176.

Ballard, C., Herreman, D., Schau, D., Bell, R., Kim, E., & Valencic, A. (1998). Data modeling
techniques for data warehousing. International Technical Support Organization, pp. 15-22.
Retrieved April 10, 2012 from
http://www.redbooks.ibm.com/redbooks/pdfs/sg242238.pdf

Brodkin, J. (2008). Gartner: Seven cloud-computing security risks. NetworkWorld. Retrieved
May 17, 2012 from http://www.networkworld.com/news/2008/070208-cloud.html

Exforsys Inc. (2005). Design of data warehouse: Kimball vs. Inmon. Retrieved April 10, 2012
from
http://www.exforsys.com/tutorials/msas/data-warehouse-design-kimball-vs-inmon.html

Inmon, W.H. (n.d.). Some straight talk about the costs of data warehousing. An Inmon Consulting
White Paper. Retrieved May 5, 2012 from
http://www.dataupia.com/pdfs/industryanalysts/reg/Inmon_White_Paper.pdf

Inmon, B., & Valente, G. (2010). A peek into the future: The next wave of data warehousing.
Enterprise Systems. Retrieved May 2, 2012 from
http://esj.com/articles/2010/02/03/next-wave-dw.aspx

Kleissner, C. (1998). Data mining for the enterprise. Proceedings of the 31st Annual Hawaii
International Conference on System Sciences. Retrieved June 27, 2012 from
http://www.informatik.uni-trier.de/~ley/db/conf/hicss/index.html

Nazeri, Z. (2003). Application of aviation safety data mining workbench at American Airlines.
The MITRE Corporation.

Nemati, H.R., Steiger, D.M., Iyer, L.S., & Herschel, R.T. (2002). Knowledge warehouse: An
architectural integration of knowledge management, decision support, artificial
intelligence and data warehousing. Decision Support Systems, 33, 143-161. Retrieved
April 23, 2012 from
http://www.sciencedirect.com/science/article/pii/S0167923601001415

Oracle Data Sheet. (2011). Oracle Airline Data Model. Retrieved May 2, 2012 from
http://www.oracle.com/us/products/database/airline-data-model-ds-1419516.pdf

Seifert, J.W. (2006). Data mining: An overview. CRS Report for Congress. Retrieved April 22,
2012 from http://www.au.af.mil/au/awc/awcgate/crs/rl31798.pdf

SITA. (2011). Cloud computing: An opportunity for blue sky thinking. New Frontiers Paper.
Retrieved May 12, 2012 from
http://www.aticloud.com/pdf/259_SITA_New_Frontiers_Paper_(new_brand)_14.06.11_v2.6(LoRes).pdf

van Gelder, K. (n.d.). Elastic data warehousing in the cloud: Is the sky really the limit? Retrieved
May 9, 2012 from http://homepages.cwi.nl/~boncz/msc/2011-KeesvanGelder.pdf

van Wel, L., & Royakkers, L. (2004). Ethical issues in web data mining. Ethics and
Information Technology, 6, 129-140. Retrieved April 22, 2012 from
http://alexandria.tue.nl/repository/freearticles/612259.pdf

West, D. (2008). What is data mining? [YouTube Video]. Retrieved April 13, 2012 from
http://www.youtube.com/watch?v=wqpMyQMi0to

Wilber, D.Q. (2008). Avoiding plane crashes by crunching numbers. The Washington Post.
Retrieved April 30, 2012 from
http://www.washingtonpost.com/wp-dyn/content/article/2008/01/12/AR2008011202407.html

Wong, C. (2009). Finance reporting. Oracle Business Intelligence Enterprise Edition (OBIEE)
Training. Retrieved May 19, 2012 from
http://fiscaff.sfsu.edu/services/onlineform/forms/pdf/fms_finance_reporting.pdf
