
Satinderpal Kaur

MBA3(D)

What is Business Analytics?


Business Analytics (BA) is not a new phenomenon. It has been around for many years, but predominantly in companies operating in technically oriented environments. Only recently has it made its breakthrough, and we can see more and more companies, especially in the financial and telecom sectors, adopt business analytics in order to support business processes and improve performance. So what does business analytics refer to?
Business Analytics is translating data into information that business owners need to make informed decisions and investments. It is the difference between running a business on a hunch or intuition and looking at collected data and predictive analysis. It is a way of organizing and converting data into information to help answer questions about the business. It leads to better decision making by looking for patterns and trends in the data and by forecasting the impact of decisions before they are taken.
BA can serve the whole company, and all C-level executives can take advantage of it. For example, Chief Marketing Officers (CMOs) can use BA to gain better customer insight and enhance customer loyalty. Chief Financial Officers (CFOs) can better manage financial performance and use financial forecasts. Chief Risk Officers (CROs) can get a holistic view of risk, fraud and compliance information across the organization and take action. Chief Operating Officers (COOs) can get better insight into supply chains and operations and enhance efficiency.
Companies use business analytics for data-driven decision making. To be successful, they need to treat their data as a corporate asset and leverage it for competitive advantage. Successful business analytics depends on data quality, on skilled analysts who understand both the technologies and the business, and on organizational commitment to data-driven decision making.
Examples of BA uses include:

Exploring data to find new relationships and patterns (data mining)


Explaining why a certain result occurred (statistical analysis, quantitative analysis)
Experimenting to test previous decisions (A/B testing, multivariate testing)
Forecasting future results (predictive modeling, predictive analytics)
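The experimentation item above can be made concrete. Below is a minimal A/B-testing sketch in Python, using made-up visitor and conversion counts for two page variants; the two-proportion z-test is one standard way to judge whether variant B genuinely outperforms variant A.

```python
import math

# Hypothetical A/B test data: visitors and conversions for two page variants.
control = {"visitors": 5000, "conversions": 400}    # variant A
treatment = {"visitors": 5000, "conversions": 460}  # variant B

p_a = control["conversions"] / control["visitors"]      # observed rate A
p_b = treatment["conversions"] / treatment["visitors"]  # observed rate B

# Pooled two-proportion z-test: is the difference likely real or just noise?
pooled = (control["conversions"] + treatment["conversions"]) / \
         (control["visitors"] + treatment["visitors"])
se = math.sqrt(pooled * (1 - pooled) *
               (1 / control["visitors"] + 1 / treatment["visitors"]))
z = (p_b - p_a) / se

print(f"conversion A={p_a:.3f}, B={p_b:.3f}, z={z:.2f}")
# |z| > 1.96 corresponds to significance at the 5% level
```

With these made-up numbers z comes out a little above 1.96, so the lift from B would be judged significant at the 5% level.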

Why Is Business Analytics Important?


Becoming an analytics-driven organization helps companies extract insights from their enterprise data and achieve cost reduction, revenue increase and improved competitiveness. This is why business analytics is one of the top priorities for CIOs. An IBM study shows that CFOs in organizations that make extensive use of analytics report revenue growth of 36 percent or more, a 15 percent greater return on invested capital and twice the rate of growth in EBITDA (earnings before interest, taxes, depreciation and amortization).
Chandigarh Business School


Business Analytics helps you make better, faster decisions and automate processes. It helps you address the right questions and stay one step ahead of your competition. Some of the basic questions in a retail environment could be:

How big should a store be?


What market segments should be targeted?
How should a certain market segment be targeted in terms of products, styles, price
points, store environment, location?
Who are our customers?
How should space in the store be allocated to the various product groups and price
points for maximum profitability?
What mitigation strategies are effective and cost-efficient, for example changes to packaging, fixtures, or placement of product?
What is the best customer loyalty program for our customers?
What is the optimal staffing level on the sales floor?
How many checkouts are optimal in a store?
Would acquisition of a particular store brand improve profitability?
Would creation of a new store brand improve profitability?

DATA MODELING
Data modeling is a process used to define and analyze data requirements needed to support
the business processes within the scope of corresponding information systems in
organizations. Therefore, the process of data modeling involves professional data modelers
working closely with business stakeholders, as well as potential users of the information
system.
According to Hoberman, data modeling is the process of learning about the data, and the data model is the end result of that process: a blueprint that presents the data in an easily understandable form for decision making.
For example, a company wants to build a guest house (the end result). It calls a building architect (the data modeler), who establishes what the building must provide (the business requirements) and then draws up a blueprint for how to build it (the developed data model).
In other words
Data modeling is the formalization and documentation of existing processes and events that
occur during application software design and development. Data modeling techniques and
tools capture and translate complex system designs into easily understood representations of
the data flows and processes, creating a blueprint for construction and/or re-engineering.


A data model can be thought of as a diagram or flowchart that illustrates the relationships between data elements.
There are several different approaches to data modeling, including:
1) Conceptual Data Model
2) Logical Data Model
3) Physical Data Model
1) Conceptual data model
A conceptual data model identifies the highest-level relationships between the different
entities. Features of conceptual data model include:

Includes the important entities and the relationships among them.

No attribute is specified.

No primary key is specified.

The figure below is an example of a conceptual data model.


2) Logical Data Model


A logical data model describes the data in as much detail as possible, without regard to how
they will be physically implemented in the database. Features of a logical data model include:

Includes all entities and relationships among them.

All attributes for each entity are specified.

The primary key for each entity is specified.

Foreign keys (keys identifying the relationship between different entities) are
specified.

Normalization occurs at this level.

3) Physical Data Model


A physical data model represents how the model will be built in the database. A physical
database model shows all table structures, including column name, column data type, column
constraints, primary key, foreign key, and relationships between tables. Features of a physical
data model include:

Specification of all tables and columns.

Foreign keys are used to identify relationships between tables.

De-normalization may occur based on user requirements.

Physical considerations may cause the physical data model to be quite different from
the logical data model.
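The difference between the three levels shows up clearly once the physical model is written down as actual table definitions. A minimal sketch using Python's sqlite3 module, with hypothetical Customer and Order entities (the attribute names are illustrative, not taken from the text):

```python
import sqlite3

# A minimal sketch of a physical data model: at this level, tables,
# columns, data types, primary keys and foreign keys are all spelled out.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL
);
CREATE TABLE "order" (
    order_id    INTEGER PRIMARY KEY,
    order_date  TEXT NOT NULL,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
""")

# The database catalogue now reflects the model: two tables with their columns.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['customer', 'order']
```

The conceptual model would mention only Customer, Order and the line between them; the logical model would add the attributes and keys; only the physical model commits to column types and table layout as above.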

DATA MODELING TECHNIQUES


Data Modeling identifies the data or information a system needs to be able to store, maintain,
or provide access to. Business analysts often use a combination of diagrams, textual
descriptions, and matrices to model data. Each modeling technique helps us analyze and
communicate different information about the data-related requirements. In this article, we'll look at 4 different data modeling techniques and discuss when and how to use each technique.
1) Entity Relationship Diagram
An Entity Relationship Diagram (ERD) models entities, relationships, and attributes. Here's a simple ERD (a more complex ERD is included in the Visual Model Sample Pack).


In this example, Customer and Order are entities. The items listed inside each entity, such as Customer Name, are attributes of the entity. The line connecting Customer and Order shows the relationship between the two entities, specifically that a Customer can have 0 to many (or any number of) orders.
ERDs can be used to model data at multiple levels of specificity, from the low-level physical
database model to mid-level logical database model, to the high-level business domain
model.
An ERD is a good choice if you have multiple concepts or database tables and are analyzing the boundaries of each concept or table. By defining the attributes, you figure out what belongs with each entity. By defining the relationships, you figure out how each entity relates to the other entities in your model.
2) Data Matrix
A Data Matrix provides more detailed information about the data model and can take a
variety of different forms. Typically a Data Matrix is captured in a spreadsheet format and
contains a list of attributes, along with additional information about each attribute. Some
common types of additional information that might be captured in a column in a data matrix
include the following:

Data Type

Allowable Values

Required or Optional

Sample Data

Notes

A Data Matrix is a good choice when it's necessary to analyze detailed information about
each attribute in your data model. This information is often used to design and build the
physical database and so is needed by the data architect or database developer. A sample data matrix is included with the Data Model sample in the Visual Model Sample Pack.
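The idea of a Data Matrix can be sketched in a few lines of Python: one record per attribute, carrying the data type, required/optional flag, allowable values and sample data listed above, plus a small validator that checks a data record against those rules. All attribute names here are hypothetical.

```python
# A minimal sketch of a Data Matrix captured as plain records,
# one row per attribute (attribute names are made up).
data_matrix = [
    {"attribute": "customer_name", "data_type": "text",
     "required": True,  "allowable_values": None,         "sample": "A. Sharma"},
    {"attribute": "segment",       "data_type": "text",
     "required": True,  "allowable_values": {"retail", "wholesale"},
     "sample": "retail"},
    {"attribute": "discount_pct",  "data_type": "number",
     "required": False, "allowable_values": range(0, 51), "sample": 10},
]

def validate(record, matrix):
    """Check one data record against the matrix's rules."""
    errors = []
    for row in matrix:
        name, value = row["attribute"], record.get(row["attribute"])
        if value is None:
            if row["required"]:
                errors.append(f"{name}: missing required value")
            continue
        allowed = row["allowable_values"]
        if allowed is not None and value not in allowed:
            errors.append(f"{name}: {value!r} not allowed")
    return errors

print(validate({"customer_name": "B. Kaur", "segment": "online"}, data_matrix))
# ["segment: 'online' not allowed"]
```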
(Consider a marketer whose sales are growing across every state of India and who wants to know why. No single piece of information can explain it; a number of factors, such as good quality and lower price, may be driving the increase.)
3) Data Mapping Specification
A Data Mapping Specification shows how information stored in two different databases
connect to each other. The databases are often part of two different information technology
systems which may be owned by your organization, your organization and a third party
vendor, or two cooperating organizations.
For example, when I worked for an online job board company, we created a data mapping specification to define how we'd import job content from some of our bigger clients who did not wish to manually input the details of each job using our employer portal.
Any time you are connecting two systems together through a data exchange or import, a data
mapping specification will be a good choice. A sample data mapping specification and
template are included in the Business Analyst Template Toolkit.
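A data mapping specification can be sketched as a simple source-to-target field map. The field names below are hypothetical, in the spirit of the job-board example above:

```python
# A minimal sketch of a data mapping specification: how fields in a
# client's job feed map onto our own job table (field names made up).
mapping = {
    "JobTitle":  "title",
    "City":      "location_city",
    "SalaryMax": "salary_ceiling",
}

def import_record(source_row, mapping):
    """Translate one source record into the target system's layout."""
    return {target: source_row[source] for source, target in mapping.items()}

client_row = {"JobTitle": "Data Analyst", "City": "Chandigarh", "SalaryMax": 900000}
print(import_record(client_row, mapping))
# {'title': 'Data Analyst', 'location_city': 'Chandigarh', 'salary_ceiling': 900000}
```

A real specification would also record data types, transformations and default values for each mapped field, but the core of the document is exactly this source-to-target correspondence.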
4) Data Flow Diagram
A Data Flow Diagram illustrates how information flows through, into, and out of a system.
Data Flow Diagrams can be created using a simple workflow diagram or one of two formal
notations listed in the BABOK Guide: the Yourdon notation or the Gane-Sarson notation.
A Data Flow Diagram does not tell you much about what data is created or maintained by a
system, but it does tell you a lot about how the data flows through the system or a set of interconnected systems. A Data Flow Diagram shows the data stores, data processes, and data
outputs.
A Data Flow Diagram is a good choice if your data goes through a lot of processing, as it
helps clarify when and how those processes are executed. Then, each data store could be
modeled using an ERD and/or Data Matrix and each process using a Data Mapping
Specification. Samples of data flow diagrams in all three notations are included in the Visual
Model Sample Pack.

MULTIDIMENSIONAL MODELING
Dimensional Data Model
A dimensional data model is most often used in data warehousing systems. This is different from the third normal form (3NF), commonly used for transactional (OLTP) systems. As you can imagine, the same data would then be stored differently in a dimensional model than in a 3NF model. To understand dimensional data modeling, let's define some of the terms commonly used in this type of modeling:

Dimension: A category of information. For example, the time dimension.


Attribute: A unique level within a dimension. For example, Month is an attribute in the
Time Dimension.
Hierarchy: The specification of levels that represents the relationship between different attributes within a dimension. For example, one possible hierarchy in the Time dimension is Year > Quarter > Month > Day.
Fact Table: A fact table is a table that contains the measures of interest. For example, sales
amount would be such a measure. This measure is stored in the fact table with the appropriate
granularity. For example, it can be sales amount by store by day. In this case, the fact table
would contain three columns: A date column, a store column, and a sales amount column.

SCHEMA
1) Star Schema
In the star schema design, a single object (the fact table) sits in the middle and is radially connected to other surrounding objects (dimension lookup tables) like a star. Each dimension is represented as a single table. The primary key in each dimension table is related to a foreign key in the fact table.
Sample star schema


All measures in the fact table are related to all the dimensions that the fact table is related to. In other words, they all have the same level of granularity.
A star schema can be simple or complex. A simple star consists of one fact table; a complex
star can have more than one fact table.
Let's look at an example: Assume our data warehouse keeps store sales data, and the different
dimensions are time, store, product, and customer. In this case, the figure on the left
represents our star schema. The lines between two tables indicate that there is a primary key /
foreign key relationship between the two tables. Note that different dimensions are not related
to one another.
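The star schema example above can be sketched directly in SQL. The following Python/sqlite3 snippet builds a small fact table with two of the dimensions named in the text (store and product; the data is made up) and runs a typical star query that joins outward from the fact table:

```python
import sqlite3

# A minimal star-schema sketch: one fact table radially joined to two
# dimension lookup tables. Every fact row carries foreign keys into the
# dimensions; the measure (amount) lives only in the fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, store_name TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE fact_sales  (store_id INTEGER, product_id INTEGER, amount REAL);

INSERT INTO dim_store   VALUES (1, 'Sector 17'), (2, 'Panchkula');
INSERT INTO dim_product VALUES (10, 'Tea'), (11, 'Coffee');
INSERT INTO fact_sales  VALUES (1, 10, 500), (1, 11, 300), (2, 10, 200);
""")

# A typical star query: join outward from the fact table and aggregate.
rows = conn.execute("""
    SELECT s.store_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_store s ON s.store_id = f.store_id
    GROUP BY s.store_name
    ORDER BY s.store_name
""").fetchall()
print(rows)  # [('Panchkula', 200.0), ('Sector 17', 800.0)]
```

Note that, exactly as the text says, the two dimension tables are related to the fact table but not to each other.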

2) Snowflake Schema
The snowflake schema is an extension of the star schema, where each point of the star
explodes into more points. In a star schema, each dimension is represented by a single
dimensional table, whereas in a snowflake schema, that dimensional table is normalized into
multiple lookup tables, each representing a level in the dimensional hierarchy.
Sample snowflake schema


For example, consider a Time dimension that consists of 2 different hierarchies:

1. Year > Month > Day
2. Week > Day
We will have 4 lookup tables in a snowflake schema: A lookup table for year, a lookup table
for month, a lookup table for week, and a lookup table for day. Year is connected to Month,
which is then connected to Day. Week is only connected to Day. A sample snowflake schema
illustrating the above relationships in the Time Dimension is shown to the right.
The main advantage of the snowflake schema is the improvement in query performance due to minimized disk storage requirements and joins against smaller lookup tables. The main disadvantage of the snowflake schema is the additional maintenance effort needed due to the increased number of lookup tables.
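The two Time-dimension hierarchies above can be sketched by deriving the four lookup tables from plain dates. This is only an illustration using Python's datetime module; the keys chosen for each lookup table are assumptions:

```python
from datetime import date, timedelta

# A minimal sketch of the snowflake Time dimension described above,
# built from plain dates: four lookup tables (year, month, week, day),
# with day linking up to both month and week, and month linking to year.
years, months, weeks, days = {}, {}, {}, {}
d = date(2023, 12, 28)
for _ in range(7):                       # seven days straddling a year boundary
    iso = d.isocalendar()                # (ISO year, ISO week, ISO weekday)
    month_key = (d.year, d.month)
    week_key = (iso[0], iso[1])
    years.setdefault(d.year, {"year": d.year})
    months.setdefault(month_key, {"month": d.month, "year_key": d.year})
    weeks.setdefault(week_key, {"week": week_key[1]})
    days[d] = {"month_key": month_key, "week_key": week_key}
    d += timedelta(days=1)

# Year -> Month -> Day and Week -> Day both resolve through the day table.
print(len(years), len(months), len(weeks), len(days))  # 2 2 2 7
```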

3) Fact Constellation
A fact constellation (also called a galaxy schema) contains multiple fact tables that share common dimension tables. It can be viewed as a collection of star schemas, and it is used when several business processes (for example, sales and shipments) must be analyzed against the same dimensions.

DATA MART


A data mart is a segment of a data warehouse that can provide data for reporting and analysis
on a section, unit, department or operation in the company, e.g. sales, payroll, production.
Data marts are sometimes complete individual data warehouses, which are usually smaller than the corporate data warehouse. A data mart is an indexing and extraction system. Instead of putting the data from all the departments of a company into one warehouse, a data mart contains the database of a separate department and can come up with information using multiple databases when asked.
IT managers of any growing company often face the question of whether they should make use of data marts or instead switch over to the more complex and more expensive data warehousing. These tools are readily available in the market, but pose a dilemma to IT managers.
Difference between Data Warehousing and Data Mart
It is important to note that there are significant differences between these two tools, though they may serve the same purpose. Firstly, a data mart contains the programs, data, software and hardware of a specific department of a company. There can be separate data marts for finance, sales, production or marketing. All these data marts are different, but they can be coordinated. The data mart of one department is different from the data mart of another department, and though indexed, this system is not suitable for a huge database, as it is designed to meet the requirements of a particular department.
Data warehousing is not limited to a particular department; it represents the database of a complete organization. The data stored in a data warehouse is more detailed, though indexing is light as it has to store huge amounts of information. It is also difficult to manage and takes a long time to process. It follows that data marts are quick and easy to use, as they make use of small amounts of data. Data warehousing is also more expensive for the same reason.

DATA WAREHOUSING
A data warehouse is a collection of data marts representing historical data from different
operations in the company. This data is stored in a structure optimized for querying and data
analysis as a data warehouse. Table design, dimensions and organization should be consistent
throughout a data warehouse so that reports or queries across the data warehouse are
consistent. A data warehouse can also be viewed as a database for historical data from
different functions within a company. This is the place where all the data of a company is
stored. It is actually a very fast computer system with a large storage capacity. It contains data from all the departments of the company and is constantly updated to delete redundant data. This tool can answer complex queries pertaining to its data.
DATA INTEGRATION


Data integration involves combining data from several disparate sources, which are stored using various technologies, to provide a unified view of the data. Data integration becomes increasingly important when merging the systems of two companies or consolidating applications within one company to provide a unified view of the company's data assets. The latter initiative is often called a data warehouse.
Probably the best-known implementation of data integration is building an enterprise's data warehouse. A data warehouse enables a business to perform analyses based on the data it contains, which would not be possible on the data available in the source systems alone. The reason is that the source systems may not contain corresponding data; even though data items are identically named, they may refer to different entities.

EXTRACT ,TRANSFORM AND LOAD


The term ETL, which stands for extract, transform, and load, describes a three-stage process in database usage and data warehousing. It enables integration and analysis of data stored in different databases and heterogeneous formats. After it is collected from multiple sources (extraction), the data is reformatted and cleansed for operational needs (transformation). Finally, it is loaded into a target database, data warehouse or data mart to be analyzed. Most extraction and transformation tools also enable loading of the data into the end target. Besides data warehousing and business intelligence, ETL tools can also be used to move data from one operational system to another.

EXTRACT
The purpose of the extraction process is to reach the source systems and collect the data needed for the data warehouse.
Usually data is consolidated from different source systems that may use a different data
organization or format so the extraction must convert the data into a format suitable for
transformation processing. The complexity of the extraction process may vary and it depends
on the type of source data. The extraction process also includes selection of the data as the
source usually contains redundant data or data of little interest.
For the ETL extraction to be successful, it requires an understanding of the data layout. A good ETL tool additionally enables storage of an intermediate version of the data being extracted. This is called a "staging area" and makes reloading raw data possible in case of further loading problems, without re-extraction. The raw data should also be backed up and archived.
TRANSFORM


The transform stage of an ETL process involves applying a series of rules or functions to the extracted data. It includes validation of records and their rejection if they are not acceptable, as well as an integration step. The amount of manipulation needed in the transformation process depends on the data. Good data sources will require little transformation, whereas others may require one or more transformation techniques to meet the business and technical requirements of the target database or data warehouse. The most common processes used for transformation are conversion, clearing duplicates, standardizing, filtering, sorting, translating and looking up or verifying whether the data sources are inconsistent. A good ETL tool must enable building complex processes and extending its library so that custom user functions can be added.
LOAD
Loading is the last stage of the ETL process; it loads the extracted and transformed data into a target repository. There are various ways in which ETL tools load the data. Some physically insert each record as a new row into the table of the target warehouse using a built-in SQL INSERT statement, whereas others link the extraction, transformation, and loading processes for each record from the source. The loading part is usually the bottleneck of the whole process. To increase efficiency with larger volumes of data, we may need to skip SQL and data recovery or apply an external high-performance sort, which additionally improves performance.
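The three stages described above can be sketched end to end in a few lines. This is a deliberately tiny illustration, not a real ETL tool: the "source systems" are in-memory CSV strings with made-up data, and the target warehouse is an in-memory SQLite table.

```python
import sqlite3

# Two hypothetical source systems with different formats; one has a duplicate.
source_a = "order_id,amount\n1,100\n2,250\n2,250\n"
source_b = "order_id;amount\n3;75\n"

def extract(text, sep):
    """Extract: pull raw rows out of a source, normalising its format."""
    lines = text.strip().splitlines()[1:]          # skip the header row
    return [tuple(line.split(sep)) for line in lines]

def transform(rows):
    """Transform: cleanse (deduplicate) and convert types."""
    seen, out = set(), []
    for order_id, amount in rows:
        key = (int(order_id), float(amount))
        if key not in seen:
            seen.add(key)
            out.append(key)
    return out

# Load: insert the cleansed records into the target warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_orders (order_id INTEGER, amount REAL)")
rows = transform(extract(source_a, ",") + extract(source_b, ";"))
warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?)", rows)

total = warehouse.execute("SELECT COUNT(*), SUM(amount) FROM fact_orders").fetchone()
print(total)  # (3, 425.0)
```

The duplicate row from source A is dropped during the transform stage, so only three clean records reach the warehouse.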

An ideal ETL architecture contains a data warehouse


Below you'll find the ideal ETL architecture supporting the three major steps in ETL.


DATA WAREHOUSE
DEFINITION
Different people have different definitions for a data warehouse. The most popular definition
came from Bill Inmon, who provided the following:
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile
collection of data in support of management's decision making process.
Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For
example, "sales" can be a particular subject.
Integrated: A data warehouse integrates data from multiple data sources. For example,
source A and source B may have different ways of identifying a product, but in a data
warehouse, there will be only a single way of identifying a product.
Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data
from 3 months, 6 months, 12 months, or even older data from a data warehouse. This
contrasts with a transaction system, where often only the most recent data is kept. For example, a transaction system may hold the most recent address of a customer, whereas a data warehouse can hold all addresses associated with a customer.
Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a
data warehouse should never be altered.

Ralph Kimball provided a more concise definition of a data warehouse:


A data warehouse is a copy of transaction data specifically structured for query and
analysis.
This is a functional view of a data warehouse. Kimball did not address how the data warehouse is built, as Inmon did; rather, he focused on the functionality of a data warehouse.
Functionality of Data Warehouses
Data warehouses exist to facilitate complex, data-intensive and frequent ad hoc queries. Data warehouses must provide far greater and more efficient query support than is demanded of transactional databases. The data warehouse access component supports enhanced spreadsheet functionality, efficient query processing, structured queries, ad hoc queries, data mining and materialized views. In particular, enhanced spreadsheet functionality includes support for state-of-the-art spreadsheet applications as well as for OLAP application programs. These provide preprogrammed functionality such as the following:
Roll-up: Data is summarized with increasing generalization
Drill-down: Increasing levels of detail are revealed

Pivot: Cross tabulation (that is, rotation) is performed


Slice and dice: Performing projection operations on the dimensions
Sorting: Data is sorted by ordinal value
Selection: Data is available by value or range
Derived or computed attributes: Attributes are computed by operations on stored and derived values.
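Roll-up and drill-down, the first two operations above, can be sketched with a small cube of made-up sales data recorded at (year, quarter, region) granularity:

```python
from collections import defaultdict

# Made-up sales recorded at (year, quarter, region) granularity.
sales = [
    (2023, "Q1", "North", 100), (2023, "Q2", "North", 150),
    (2023, "Q1", "South", 80),  (2024, "Q1", "North", 120),
]

def roll_up(rows, keep):
    """Summarise with increasing generalisation: aggregate away detail,
    keeping only the dimension positions listed in `keep`."""
    totals = defaultdict(int)
    for *dims, measure in rows:
        totals[tuple(dims[i] for i in keep)] += measure
    return dict(totals)

# Roll up to year level (dimension 0): quarters and regions disappear.
print(roll_up(sales, keep=[0]))     # {(2023,): 330, (2024,): 120}
# Drill down: re-introduce the quarter dimension to reveal more detail.
print(roll_up(sales, keep=[0, 1]))  # {(2023, 'Q1'): 180, (2023, 'Q2'): 150, (2024, 'Q1'): 120}
```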

OLTP (ON-LINE TRANSACTION PROCESSING)


OLTP (ON-LINE TRANSACTION PROCESSING) is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE). The main emphasis for OLTP systems is on very fast query processing, maintaining data integrity in multi-access environments, and effectiveness measured by the number of transactions per second. In an OLTP database there is detailed and current data, and the schema used to store transactional data is the entity model (usually 3NF).


OLTP System deals with operational data. Operational data are those data involved in the
operation of a particular system.
Example: In a banking system, you withdraw an amount from your account. Then the Account Number, Withdrawal Amount, Available Amount, Balance Amount, Transaction Number etc. are operational data elements.
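The banking example can be sketched as one short OLTP transaction: check funds, debit the account, and commit, or roll back on failure. A minimal illustration using Python's sqlite3 with made-up data:

```python
import sqlite3

# A tiny OLTP sketch: one account, one short withdrawal transaction.
bank = sqlite3.connect(":memory:")
bank.execute("CREATE TABLE account (account_no TEXT PRIMARY KEY, balance REAL)")
bank.execute("INSERT INTO account VALUES ('ACC-1', 5000.0)")
bank.commit()

def withdraw(conn, account_no, amount):
    """One OLTP transaction: check funds, debit, commit or roll back."""
    try:
        cur = conn.execute(
            "UPDATE account SET balance = balance - ? "
            "WHERE account_no = ? AND balance >= ?",
            (amount, account_no, amount))
        if cur.rowcount != 1:
            raise ValueError("insufficient funds or unknown account")
        conn.commit()
    except Exception:
        conn.rollback()
        raise

withdraw(bank, "ACC-1", 1200.0)
balance = bank.execute(
    "SELECT balance FROM account WHERE account_no = 'ACC-1'").fetchone()[0]
print(balance)  # 3800.0
```

Note how the emphasis matches the text: the transaction is short, touches one current record, and must leave the data consistent whether it succeeds or fails.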

OLAP (ON-LINE ANALYTICAL PROCESSING)


OLAP (ON-LINE ANALYTICAL PROCESSING) is characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations. For OLAP systems, response time is an effectiveness measure. OLAP applications are widely used in data mining. In an OLAP database there is aggregated, historical data, stored in multi-dimensional schemas (usually a star schema).
OLAP deals with historical or archival data, i.e. data that is archived over a long period of time. Data from OLTP is collected over a period of time and stored in a very large database called a data warehouse. Data warehouses are highly optimized for read (SELECT) operations.
Example: If we collect the last 10 years' data about flight reservations, the data can give us much meaningful information, such as trends in reservations. This may yield useful information like the peak time of travel and what kinds of people are traveling in the various classes (Economy/Business), etc.

In other words, OLAP is the ability to analyze metrics in different dimensions such as time, geography, gender, product, etc. For example, sales for the company are up. What region is
most responsible for this increase? Which store in this region is most responsible for the
increase? What particular product category or categories contributed the most to the increase?
Answering these types of questions in order means that you are performing an OLAP
analysis. Depending on the underlying technology used, OLAP can be broadly divided into
two different camps: MOLAP and ROLAP. A discussion of the different OLAP types can
be found in the MOLAP, ROLAP, and HOLAP section.

MOLAP, ROLAP, And HOLAP:


In the OLAP world, there are mainly two different types: Multidimensional OLAP
(MOLAP) and Relational OLAP (ROLAP). Hybrid OLAP (HOLAP) refers to technologies
that combine MOLAP and ROLAP.

MOLAP:
This is the more traditional way of OLAP analysis. In MOLAP, data is stored in a
multidimensional cube. The storage is not in the relational database, but in proprietary
formats.

Advantages:

Excellent performance:
MOLAP cubes are built for fast data retrieval and are optimal for slicing and dicing operations.
Can perform complex calculations:
All calculations have been pre-generated when the cube is created. Hence, complex
calculations are not only doable, but they return quickly.
Disadvantages:

Limited in the amount of data it can handle:


Because all calculations are performed when the cube is built, it is not possible to include a
large amount of data in the cube itself. This is not to say that the data in the cube cannot be
derived from a large amount of data. Indeed, this is possible. But in this case, only
summary-level information will be included in the cube itself.

Requires additional investment:


Cube technology is often proprietary and may not already exist in the organization. Therefore, to adopt MOLAP technology, chances are that additional investments in human and capital resources will be needed.
ROLAP:
This methodology relies on manipulating the data stored in the relational database to
give the appearance of traditional OLAP's slicing and dicing functionality. In essence, each
action of slicing and dicing is equivalent to adding a "WHERE" clause in the SQL statement.
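That equivalence can be sketched directly: each slice or dice adds one condition to the WHERE clause of the SQL statement sent to the relational store. A minimal Python/sqlite3 illustration with made-up data (note that column names are interpolated into the SQL here, which is acceptable only in a sketch, never with untrusted input):

```python
import sqlite3

# A tiny relational "cube": region x product, with a sales measure.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
db.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("North", "Tea", 100), ("North", "Coffee", 60), ("South", "Tea", 40),
])

def slice_total(conn, **filters):
    """Each filter key/value pair contributes one WHERE condition."""
    where = " AND ".join(f"{col} = ?" for col in filters) or "1=1"
    sql = f"SELECT COALESCE(SUM(amount), 0) FROM sales WHERE {where}"
    return conn.execute(sql, tuple(filters.values())).fetchone()[0]

print(slice_total(db))                                 # 200.0 (whole cube)
print(slice_total(db, region="North"))                 # 160.0 (slice)
print(slice_total(db, region="North", product="Tea"))  # 100.0 (dice)
```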
Advantages:

Can handle large amounts of data:


The data size limitation of ROLAP technology is the limitation on data size of the
underlying relational database. In other words, ROLAP itself places no limitation on data
amount.


Can leverage functionalities inherent in the relational database: Often, a relational database already comes with a host of functionalities. ROLAP technologies, since they sit on top of the relational database, can therefore leverage these functionalities.
Disadvantages:

Performance can be slow:


Because each ROLAP report is essentially a SQL query (or multiple SQL queries) in the
relational database, the query time can be long if the underlying data size is large.
Limited by SQL functionalities:
Because ROLAP technology mainly relies on generating SQL statements to query the
relational database, and SQL statements do not fit all needs (for example, it is difficult to
perform complex calculations using SQL), ROLAP technologies are therefore traditionally
limited by what SQL can do. ROLAP vendors have mitigated this risk by building into the
tool out-of-the-box complex functions as well as the ability to allow users to define their own
functions.
HOLAP:
HOLAP technologies attempt to combine the advantages of MOLAP and ROLAP. For
summary-type information, HOLAP leverages cube technology for faster performance. When
detail information is needed, HOLAP can "drill through" from the cube into the underlying
relational data.


DATA MINING
DEFINITION OF 'DATA MINING'
Data mining is a process used by companies to turn raw data into useful information. By using software to look for patterns in large batches of data, businesses can learn more about their customers and develop more effective marketing strategies, as well as increase sales and decrease costs. Data mining depends on effective data collection and warehousing as well as computer processing.


Data mining is a logical process used to search through large amounts of data in order to find useful information. The goal of this technique is to find patterns that were previously unknown. Once these patterns are found, they can further be used to make certain decisions for the development of the business.
The three steps involved are:
1) Exploration
2) Pattern identification
3) Deployment
1) Exploration: In this first step, the data is cleaned and transformed into a suitable form, and the important variables and the nature of the data are determined based on the problem.
2) Pattern identification: Once the data has been explored, refined and defined for the specific variables, the second step is pattern identification: identifying and choosing the patterns that make the best predictions.
3) Deployment: The chosen patterns are deployed to achieve the desired outcome.
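The three steps above can be sketched in miniature in Python; the raw daily sales values below are invented for illustration, and the "pattern" is deliberately trivial (a mean plus a threshold rule):

```python
# Toy walk-through of exploration -> pattern identification -> deployment.
# The raw records are hypothetical daily sales figures.
raw = [" 12 ", "15", None, "9", "18 ", "bad"]

# 1) Exploration: clean the data and transform it into a usable form,
#    dropping missing or malformed values.
cleaned = []
for v in raw:
    try:
        cleaned.append(int(str(v).strip()))
    except ValueError:
        continue

# 2) Pattern identification: fit a simple summary pattern (here, the mean).
mean = sum(cleaned) / len(cleaned)

# 3) Deployment: apply the pattern to new observations to support decisions.
def is_unusual(value, threshold=1.5):
    """Flag a new observation well above the historical average."""
    return value > mean * threshold

print(cleaned, mean, is_unusual(25))  # [12, 15, 9, 18] 13.5 True
```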
ADVANTAGES AND DISADVANTAGES OF DATA MINING

ADVANTAGES OF DATA MINING:


1) Marketing/Retailing:

Data mining can aid direct marketers by providing them with useful and accurate trends about their customers' purchasing behavior.

Based on these trends, marketers can direct their marketing attention to their customers with greater precision.

For example, marketers at a software company may advertise their new software to consumers who have a long history of software purchases.

In addition, data mining may also help marketers in predicting which products their
customers may be interested in buying.

Through this prediction, marketers can surprise their customers and make the customer's shopping experience a pleasant one.

Retail stores can also benefit from data mining in similar ways.


For example, using the trends provided by data mining, store managers can arrange shelves, stock certain items, or offer a discount that will attract their customers.
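A minimal sketch of this kind of trend-finding is simple co-occurrence counting over shopping baskets; the baskets below are hypothetical:

```python
# Count which product pairs are bought together most often,
# the kind of trend a retailer might mine from transaction logs.
from collections import Counter
from itertools import combinations

transactions = [  # hypothetical shopping baskets
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "jam"},
    {"bread", "butter", "milk"},
]

pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# The most frequent pair suggests which items to shelve or discount together.
top_pair, count = pair_counts.most_common(1)[0]
print(top_pair, count)  # ('bread', 'butter') 3
```

Real market-basket analysis uses the same idea at scale, with measures such as support and confidence rather than raw counts.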

2) Banking/Crediting:

Data mining can assist financial institutions in areas such as credit reporting and loan
information.

For example, by examining previous customers with similar attributes, a bank can estimate the level of risk associated with a given loan.

In addition, data mining can also assist credit card issuers in detecting potentially fraudulent credit card transactions.

Although data mining is not 100% accurate in predicting fraudulent charges, it does help credit card issuers reduce their losses.
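A toy sketch of this idea, estimating risk from past customers with a similar income, might look like the following (the loan records are invented, and a real credit model uses many more attributes than income alone):

```python
# Estimate loan risk as the default rate among past customers
# with a similar income. All records here are hypothetical.
past_loans = [  # (annual income, defaulted?)
    (25_000, True), (30_000, True), (55_000, False),
    (60_000, False), (28_000, True), (70_000, False),
    (35_000, False),
]

def estimated_risk(income, band=10_000):
    """Default rate among past customers within `band` of this income."""
    similar = [defaulted for inc, defaulted in past_loans
               if abs(inc - income) <= band]
    return sum(similar) / len(similar) if similar else None

print(estimated_risk(27_000))  # 0.75
```

The same "compare against similar past cases" logic underlies fraud detection, where unusual transactions are flagged because they deviate from a customer's historical pattern.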

3) Law enforcement:

Data mining can aid law enforcers in identifying criminal suspects as well as
apprehending these criminals by examining trends in location, crime type, habit, and other
patterns of behaviors.

4) Researchers:

Data mining can assist researchers by speeding up their data analysis, thus allowing them more time to work on other projects.
DISADVANTAGES OF DATA MINING
1) Privacy Issues:

Personal privacy has always been a major concern. In recent years, with the widespread use of the Internet, concerns about privacy have increased tremendously. Because of privacy issues, some people do not shop on the Internet. They are afraid that somebody may gain access to their personal information and then use it in an unethical way, thus causing them harm.

Although it is against the law to sell or trade personal information between organizations, the selling of personal information has occurred. For example, according to the Washington Post, in 1998 CVS sold its patients' prescription purchase records to another company.


In addition, American Express also sold its customers' credit card purchases to another company. What CVS and American Express did clearly violated privacy law, because they sold personal information without the consent of their customers.

The selling of personal information may also harm these customers, because they do not know what the buying companies plan to do with the information they have purchased.

2) Security issues:

Although companies have a lot of personal information about us available online, they
do not have sufficient security systems in place to protect that information.

For example, Ford Motor Credit Company recently had to inform 13,000 consumers that their personal information, including Social Security numbers, addresses, account numbers and payment history, was accessed by hackers who broke into a database belonging to the Experian credit reporting agency.

This incident illustrates that companies are willing to disclose and share your personal information but are not taking care of it properly. With so much personal information available, identity theft could become a real problem.

3) Misuse of information/inaccurate information:

Trends obtained through data mining and intended for marketing or other ethical purposes may be misused.

Unethical businesses or people may use the information obtained through data mining to take advantage of vulnerable people or to discriminate against certain groups.

In addition, data mining techniques are not 100 percent accurate; mistakes do happen, and they can have serious consequences.
