
Concept Check

1. b
2. a
3. c
4. b
5. d
6. a
7. c
8. c
9. c
10. b
11. c
12. b
13. a

Discussion Questions

14. (SO 1) How does data differ from information? Data is the basic facts
collected from a transaction. Information is data that has been manipulated
by summarizing, categorizing, or analyzing to make that data useful to a
decision maker.
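
The distinction can be illustrated with a minimal Python sketch. The transaction values and the function name summarize_by_category are hypothetical, chosen only for illustration: the raw list is data, and the category totals are one possible form of information.

```python
from collections import defaultdict

# Hypothetical raw transaction data: (product category, sale amount).
transactions = [
    ("books", 12.50),
    ("music", 9.99),
    ("books", 30.00),
    ("music", 5.01),
]

def summarize_by_category(rows):
    """Turn raw data into information by totaling sales per category."""
    totals = defaultdict(float)
    for category, amount in rows:
        totals[category] += amount
    return {category: round(total, 2) for category, total in totals.items()}

sales_by_category = summarize_by_category(transactions)
print(sales_by_category)
```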

15. (SO 1) Why is it important for companies to store transaction data? There are
four reasons that a company must collect and store transaction data. Those
reasons are: 1) to complete transactions; 2) for follow-up on later transactions
or for reference regarding future transactions with the same entity; 3) to
prepare external reports such as financial statements; and 4) to provide
information to management as they attempt to run the organization efficiently
and effectively.

16. (SO 2) Which type of data storage media is most appropriate when a single
record of data must be accessed frequently and quickly? Random access
storage works best for situations in which a single record must be accessed
quickly and easily.

17. (SO 3) Identify one type of business that would likely use real-time data
processing rather than batch processing. Describe the advantages of real-
time processing to this type of business. A business that sells items on a web
site, such as Amazon, would be likely to use real-time data processing. This
is true because the system must be able to determine information such as
whether an item ordered is currently in stock. The main advantage of real-
time processing is its ability to provide information immediately. There are
many examples of the need for real-time data processing. Airline reservation
systems are another example.

18. (SO 4) Differentiate between data redundancy and concurrency. Data
redundancy occurs when the same data are stored in more than one file.
Thus, there is redundant, or repeated, data. Concurrency means that all of
the multiple instances of the same data are exactly the same. It is harder to
achieve concurrency when there is much data redundancy.
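
As a rough illustration, the Python sketch below keeps the same customer in two separate files and checks whether the copies still agree. The file contents, the customer ID, and the helper out_of_sync are assumptions made up for this example.

```python
# Two hypothetical departmental files that each store the same customer.
sales_file = {"C001": {"name": "Pat Lee", "address": "12 Oak St"}}
service_file = {"C001": {"name": "Pat Lee", "address": "12 Oak St"}}

def out_of_sync(file_a, file_b):
    """Return the customer IDs whose redundant copies no longer match."""
    shared_ids = file_a.keys() & file_b.keys()
    return sorted(cid for cid in shared_ids if file_a[cid] != file_b[cid])

before = out_of_sync(sales_file, service_file)   # copies agree

# The customer moves, but only the sales department records the change.
sales_file["C001"]["address"] = "98 Elm Ave"
after = out_of_sync(sales_file, service_file)    # copies now disagree

print(before, after)
```

The more files that hold a copy of the address, the more places an update can be missed, which is why redundancy makes concurrency harder to achieve.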

19. (SO 4) What is the term for the software program(s) that monitors and
organizes the database and controls access and use of data? Describe how
this software controls shared access. This software system is called a
Database Management System, or DBMS. The DBMS manages the access
of users or processes to the online database. The DBMS manages the data
sharing by updating the data available to users immediately upon recording
any changes.

20. (SO 4) Describe the trade-offs of using the hierarchical model of database
storage. A hierarchical model database is very efficient for processing large
volumes of similar transactions. It is not efficient for accessing or processing
a single record from a large database. Therefore, it works well with batch
processing, but would not be efficient in those situations where accessing a
single record, or answering flexible queries, is necessary.

21. (SO 4) Describe the organization of a flat file database. Flat file records are
two dimensional tables with rows and columns. The records are stored in text
format in sequential order, and all processing must occur sequentially. No
relationships are defined between records. These systems must use batch
processing only and batches must be processed in sequence. The system
makes the processing of large volumes of similar transactions very efficient.

22. (SO 4) What four conditions are required for all types of databases? 1) Items
in a column must all be the same data type. 2) Each column must be
uniquely named. 3) Each row must be unique in at least one column. 4)
Each intersection of a row and column must contain only one data item.

23. (SO 4) Within a hierarchical database, what is the name for the built-in
linkages in data tables? Which data relationships can be contained in a
hierarchical database? Record pointers are used to link a record to the next
record having the same attribute. Using a record pointer system, one-to-one
and one-to-many relationships can be represented in a hierarchical database.

24. (SO 4) Which database models are built on the inverted tree structure? What
are the disadvantages of using the inverted tree structure for a database?
Both the hierarchical database model and the network database model are
based on an inverted tree structure. The network model is more complex
because it uses more than one inverted tree structure. This allows two or
more paths into the data. Two disadvantages are that new data cannot be
added until all related information is known, and deleting a parent record can
delete all child records.

25. (SO 4) Which database model is used most frequently in the modern
business world? Why do you believe it is frequently used? The relational
database model is now used most frequently. It is frequently used because it
is the most flexible database model. An English-like query language, SQL,
can be used to retrieve data from the database in a very flexible manner. In
addition, the increasing computer power and decreasing cost of computing
power have made any inefficiencies in a relational database less significant.

26. (SO 5) How is the primary key used in a relational database? The primary
key is the unique identifier for each record in the table and it is used to sort,
index, and access records from that table.

27. (SO 5) What language is used to access data from a relational database?
Why is the language advantageous when accessing data? Structured Query
Language, SQL, is the language used to access data in a relational database.
Its advantage is its English-like query language that allows easy access to the
data in the database and presentation in a manner most useful to the user.
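
A short sketch of both ideas, using Python's built-in sqlite3 module with an in-memory database. The customer table and its rows are hypothetical. The first statement defines a primary key, the SELECT shows the English-like style of SQL, and the final insert shows the primary key rejecting a duplicate identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " customer_id INTEGER PRIMARY KEY,"   # unique identifier for each record
    " name TEXT NOT NULL,"
    " city TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO customer (customer_id, name, city) VALUES (?, ?, ?)",
    [(1, "Avery", "Miami"), (2, "Blake", "Chicago"), (3, "Casey", "Miami")],
)

# An English-like query: which customers are in Miami, sorted by name?
miami_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM customer WHERE city = ? ORDER BY name", ("Miami",)
    )
]

# The primary key rejects a second record with the same identifier.
try:
    conn.execute("INSERT INTO customer VALUES (1, 'Drew', 'Dallas')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(miami_names, duplicate_rejected)
```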

28. (SO 5) Which type of database model has the most flexibility for querying?
How does this flexibility assist management? The relational database model
is the most flexible database model for querying. It provides important
assistance to managers through its flexibility in answering an unlimited
number of queries about customers, products, vendors, or any other
information in the database.

29. (SO 5) What are the first three rules of normalization? What is meant by the
statement that the rules of normalization are additive? 1) Eliminate repeating
groups. 2) Eliminate redundant data. 3) Eliminate columns not dependent on
the primary key. Additive means that if a table meets the third rule, it has also
met the preceding rules: one and two.
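
The three rules can be sketched in Python on a small, made-up order file. All names and values here are hypothetical. Each step starts from the result of the step before it, which is one way to see why the rules are additive.

```python
# Hypothetical unnormalized order records: a repeating group of items, plus
# customer and product details repeated inside every order.
raw_orders = [
    {"order_id": 1, "customer_id": "C1", "customer_city": "Miami",
     "items": [("P1", "Widget", 2), ("P2", "Gadget", 1)]},
    {"order_id": 2, "customer_id": "C1", "customer_city": "Miami",
     "items": [("P1", "Widget", 5)]},
]

# Rule 1: eliminate repeating groups (one row per order line).
order_lines = [
    (o["order_id"], product_id, qty)
    for o in raw_orders
    for product_id, _name, qty in o["items"]
]

# Rule 2: eliminate redundant data (a product name is stored once per
# product, not once per order line).
products = {pid: name for o in raw_orders for pid, name, _qty in o["items"]}

# Rule 3: eliminate columns not dependent on the primary key (customer city
# depends on the customer, not on the order).
customers = {o["customer_id"]: o["customer_city"] for o in raw_orders}
orders = [(o["order_id"], o["customer_id"]) for o in raw_orders]

print(order_lines, products, customers, orders)
```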

30. (SO 6) Differentiate between a data warehouse and an operational database.
An operational database is the database in which data is continually updated
as transactions are processed. The operational database includes data for
the current fiscal year and it supports day-to-day operations and record
keeping for the transaction processing systems. The data warehouse is an
integrated collection of enterprise-wide data that includes five to 10 years of
non-volatile data, and it is used to support management in decision making
and planning. Periodically, new data is uploaded to the data warehouse from
the operational data, but other than this updating process, the data in the data
warehouse does not change.
Alaa.aliasrei@gmail.com ‫@ عالء هحسن شحن‬Aliasrei‫صفحة الباحث العلوي هجانا‬ ‫تلكرام‬

31. (SO 7) How is data mining different than data warehousing? Data mining is
the use of data analysis tools to analyze data in a data warehouse. Tools
such as OLAP are used in data mining. An example of data mining is
analyzing sales data to determine customer buying patterns. The data
warehouse is the database in which the data to be analyzed is stored.

32. (SO 7) How has Anheuser-Busch used data warehousing and data mining
successfully? Anheuser-Busch has used a data warehouse and data mining
to analyze sales history, price-to-consumer, holidays and special events, daily
temperature, and forecasted data such as anticipated temperature to create
forecasts of sales by store and by product. Data are used by salespeople
and distributors to rearrange displays, rotate stock, and inform stores of
promotion campaigns. Using these buying trends, Anheuser-Busch creates
promotional campaigns, new products, and local or ethnic target marketing.

33. (SO 7) Identify and describe the analytical tools in OLAP. The analytical
tools that are usually part of OLAP are: drill-down, consolidation, pivoting,
time-series analysis, exception reports, and what-if simulations. Drill-down is
the successive expansion of data into more detail, going from high-level data
to successively lower levels of data. Consolidation is the aggregation or
collection of similar data; it is the opposite of drill-down in that consolidation
takes detailed data and summarizes it into larger groups. Pivoting is
examining data from different perspectives. Time-series analysis is the
comparison of figures over several successive time periods to uncover trends.
Exception reports present variances from expectations. What-if simulations
use changing variables to examine interactions between different parts of the
business.
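
Three of these tools (consolidation, drill-down, and pivoting) can be sketched in a few lines of Python. The sales figures and the helper total_by are hypothetical, and a real OLAP tool would do far more than this; the sketch only shows the idea of viewing the same facts at different levels and from different perspectives.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, store, month, amount).
sales = [
    ("East", "Store A", "Jan", 100),
    ("East", "Store A", "Feb", 120),
    ("East", "Store B", "Jan", 80),
    ("West", "Store C", "Jan", 200),
    ("West", "Store C", "Feb", 150),
]

def total_by(rows, key_index):
    """Aggregate the amount column by the dimension at key_index."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[3]
    return dict(totals)

# Consolidation: summarize detailed rows into regional totals.
by_region = total_by(sales, 0)

# Drill-down: expand the East total into its individual stores.
east_rows = [row for row in sales if row[0] == "East"]
east_by_store = total_by(east_rows, 1)

# Pivoting: look at the same data from a different perspective (by month).
by_month = total_by(sales, 2)

print(by_region, east_by_store, by_month)
```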

34. (SO 8) Differentiate between centralized data processing and distributed data
processing. In centralized data processing, data processing and databases
are stored and maintained in a central location. In distributed data
processing, the processing and the databases are dispersed to different
geographic locations of the organization. A distributed database is actually a
collection of smaller databases dispersed across several computers on a
computer network.

35. (SO 8) What are the “clients” and “servers” in a client/server distributed
database system? Servers are computers or processes that manage files
and databases, printers, or networks. Clients are usually PCs or workstations
that run the applications. Clients rely on servers for resources such as files,
printers, and even processing power.

36. (SO 9) Why is control over unauthorized access so important in a database
environment? Data are valuable resources that must be protected with good
internal controls such as those that prevent unauthorized access. Access
controls help prevent unauthorized users from accessing, altering, or
destroying data in the database. The database is such a critical resource for
most organizations that they must ensure the data is accurate and complete.

37. (SO 9) What are some internal control measures that could prevent a hacker
from altering data in your company’s database? Measures that prevent
hackers from accessing and altering data include authentication and hacking
controls such as login procedures, passwords, security tokens, biometric
controls, firewalls, encryption, intrusion detection, and vulnerability
assessment. In addition to these controls, the database management system
(DBMS) must be set up so that each authorized user has a limited view
(schema) of the database.

38. (SO 9) Why are data considered a valuable resource that is worthy of
extensive protection? The database of an organization is a critically important
component of the organization. Data are a valuable resource that must be
protected with good internal controls. Missing or incorrect data can have a
negative impact on the ability to conduct the necessary business processes.

Brief Exercises

39. (SO 2) Arrange the following data storage concepts in order from smallest to
largest, in terms of their size: File, Record, Database, Character, and Field.
The hierarchy of terms is character, field, record, file, and database.

40. (SO 2) Think of a telephone book as a database. Identify the fields likely to
be used in this database. If you were constructing this database, how many
spaces would you allow for each field? The fields and suggested sizes that
would usually be needed are: last name (24), first name (24), middle initial or name
(24), address line 1 (50), address line 2 (50), apartment number (12), city
(24), state (2), zip code (9), phone number (10). For businesses, a field for
business name (40) would be used rather than last name and first name. The
number of spaces for each field can vary. Of course, fields such as zip code
and phone number are more certain. It is important that the field size be
slightly larger than the longest item to appear in that field. In the case of
items for which we know the size precisely, the field size can be set
accordingly. For example, zip codes will never include more than 9 digits.

41. (SO 3) Suppose that a large company uses batch processing for recording its
inventory purchases. Other than its slow response time, what would be the
most significant problem with using a batch processing system for recording
inventory purchases? A company would not know its true inventory balance
until the batch of transactions was processed. There would be no online,
current balance of inventory to be used to respond to inquiries from
managers, employees, or customers. Therefore, purchases and sales of
inventory might need to be delayed until the batch processing occurs and new
balances are known. This delay can cause the company to maintain higher
or lower levels of inventory than may be desired. With a longer time to place
an order, the company might need to maintain higher inventory levels to avoid
a stock out.

42. (SO 4) Arrange the following database models in order from earliest
development to most recent: Network Databases, Hierarchical Databases,
Flat File Databases, and Relational Databases. The historical order is flat file,
hierarchical, network, and relational databases.

43. (SO 4) Categorize each of the following as one-to-one, one-to-many, or
many-to-many relationships.
• Subsidiary ledgers and general ledgers. This is best categorized as a
one-to-many relationship. A general ledger account, such as accounts
receivable, could have many supporting sub-accounts in the accounts
receivable subsidiary ledger. It is also true that a general ledger would
have many subsidiary ledgers (accounts receivable, accounts payable,
inventory, payroll).
• Transactions and special journals. This is best categorized as a one-to-many
relationship. A special journal, such as a sales journal, would have
many supporting transactions recorded in the special journal.
• General ledgers to trial balances. This is best categorized as a one-to-one
relationship. For each time period, one set of general ledger balances
would result in one trial balance.

44. (SO 6) How might a company use both an operational database and a data
warehouse in the preparation of its annual report? A company would use the
operational database for the current fiscal year reports, but may need past
information from the data warehouse to prepare comparative financial
statements from previous years. The company might also use the data
warehouse to examine and report important trends in financial information.

45. (SO 7) Using Anheuser-Busch’s BudNet example presented in this chapter,
think about the list of queries that might be valuable if a company like Gap Inc.
used data mining to monitor its customers’ buying behavior. The Gap could
use queries related to: the effects of promotional pricing; dates or holiday
buying patterns; dates when seasonal style updates should be introduced in
stores; regional clothing preferences; ethnic group clothing patterns; and Gap
sales in relation to competitors.

Problems

46. (SO 3) Differentiate between batch processing and real-time processing.
What are the advantages and disadvantages of each form of data
processing? Which form is more likely to be used by a doctor’s office in
preparing the monthly patient bills?

Batch processing occurs when similar transactions are grouped into a batch
and that batch is processed as a group. The alternative to batch processing
is real-time processing. Real-time processing occurs when transactions are
processed as soon as they are entered. Real-time processing is interactive
because the transaction is processed immediately.

The advantages of batch processing are that it is an efficient way to process a
large volume of like transactions; it is less complex than real-time systems; it
is easier to control and maintain an audit trail; and the data can be stored in
less complex, sequential storage. The major disadvantage of batch
processing is the slow response time. Balances are not updated in real-time
and therefore, management does not have current information at all times.

The major advantage of real-time processing is the rapid response time.
Since balances are updated in real-time, management always has current
information. The disadvantages of real-time processing are that it is less
efficient for processing large volumes of like transactions; it is more complex
than batch systems; it is more difficult to control and maintain an audit trail;
and data must be stored in random access databases.

Monthly processing of patient bills could be batch processing. There would
be a high volume of like transactions at month-end.
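
The difference in when balances become current can be sketched in Python. The item names, quantities, and the post helper are hypothetical. The real-time loop updates the balance after every entry, while the batch version leaves the balance stale until the whole group is posted.

```python
# Hypothetical purchase transactions for one day: (item, quantity).
purchases = [("widget", 10), ("gadget", 4), ("widget", 5)]

def post(balances, item, qty):
    """Post one purchase to the inventory balances."""
    balances[item] = balances.get(item, 0) + qty

# Real-time: each transaction is posted as soon as it is entered, so the
# balance is current after every entry.
realtime = {"widget": 0, "gadget": 0}
realtime_history = []
for item, qty in purchases:
    post(realtime, item, qty)
    realtime_history.append(realtime["widget"])

# Batch: transactions accumulate and are posted together later, so the
# balance is out of date until the batch runs.
batch = {"widget": 0, "gadget": 0}
pending = list(purchases)
balance_before_batch = batch["widget"]
for item, qty in pending:
    post(batch, item, qty)
balance_after_batch = batch["widget"]

print(realtime_history, balance_before_batch, balance_after_batch)
```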

47. (SO 4) Allibyr Company does not use a database system; rather, it maintains
separate data files in each of its departments. Accordingly, when a sale
occurs, the transaction is initially recorded in the sales department. Next,
documentation is forwarded from the sales department to the accounting
department so that the transaction can be recorded there. Finally, the
customer service group is notified so its records can be updated. Describe
the data redundancy and concurrency issues that are likely to arise under this
scenario at Allibyr.

There would be much data redundancy in this system. For example,
customer name, address, and other contact information must be maintained
in separate files in both the sales department and the customer service
department. Customer service and the sales department would have nearly
identical fields in their data, but maintained in separate files.

It may take hours or days for the sale documentation to move from one
department to the next. Therefore, not all departments have the same
information stored in their files at the same time. After a sale is recorded in
the sales department, it may be days before that sale is recognized in the
customer service department. Therefore, on any given day, managers in the
two departments will be operating with feedback from data sets that do not
match. If someone in the sales department needs to check with customer
service regarding a particular sale, it is possible that the customer service
department has not yet received information for that sale. This lengthens
response time in answering queries or following up on orders.

48. (SO 6) List and describe the steps involved in building a data warehouse.

The steps are: identify the important data to be stored in the data warehouse;
standardize that data across the enterprise; scrub or cleanse the data; and
upload that data to the data warehouse.

Identifying the proper data requires examining user needs and high-impact
processes (HIPs). HIPs are the processes that are critically important and
that must be executed correctly if the organization is to survive and thrive.
Data needed by users and data from HIPs should be in the data warehouse.
The data must then be standardized across the enterprise. Various subunits
within the enterprise might have conflicting definitions or field names for the
same type of data. The designers of the data warehouse must design a
standard format for the data. The data must also be scrubbed or cleansed to
remove errors and inconsistencies in the data. The data must then be
uploaded to the data warehouse. Also there should be a periodic upload of
data from the operational databases into the data warehouse.
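
The standardize, cleanse, and upload steps can be sketched in Python as follows. The subunit extracts, field names, and helper functions are all hypothetical, and a plain list stands in for the data warehouse.

```python
# Hypothetical extracts from two subunits that name and format the same
# data differently.
sales_dept = [{"CustName": " pat lee ", "Amt": "100.50"},
              {"CustName": "", "Amt": "20.00"}]      # missing name: an error
service_dept = [{"customer": "SAM ROY", "amount": "75"}]

def standardize(record, name_field, amount_field):
    """Map a subunit's field names onto one enterprise-wide format."""
    return {"customer_name": record[name_field], "amount": record[amount_field]}

def cleanse(record):
    """Scrub a record; return None if it cannot be repaired."""
    name = record["customer_name"].strip().title()
    if not name:
        return None
    return {"customer_name": name, "amount": float(record["amount"])}

standardized = (
    [standardize(r, "CustName", "Amt") for r in sales_dept]
    + [standardize(r, "customer", "amount") for r in service_dept]
)
cleansed = [c for c in map(cleanse, standardized) if c is not None]

warehouse = []               # stand-in for the data warehouse
warehouse.extend(cleansed)   # upload the cleansed records

print(warehouse)
```

Dropping the record with the missing name is only one possible cleansing policy; a real project might instead route such records to someone for correction.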

49. (SO 8) Describe the advantages and disadvantages of using a distributed
database and distributed data processing. Do you think the advantages are
worthwhile? Explain your answer.

The advantages are: 1) Reduced hardware cost. Distributed systems use
networks of smaller computers rather than a single mainframe computer.
This configuration is much less costly to purchase and maintain. 2) Improved
responsiveness. Access is faster since data can be located at the site of the
greatest demand for that data. Processing speed is improved since the
processing workload is spread over several computers. 3) Easier incremental
growth. As the organization grows or requires additional computing
resources, new sites can be added quickly and easily. Adding smaller,
networked computers is easier and less costly than adding a new mainframe
computer. 4) Increased user control and user involvement. If data and
processing are distributed locally, the local users have more control over the
data. This control also allows users to be more involved in the maintenance
of the data and users are therefore more satisfied. 5) Automatic integrated
backup. When data and processing are distributed across several
computers, the failure of any single site is not as harmful. Other computers
within the network can take on extra processing or data storage to make up
for the loss of any single site.

The disadvantages are: 1) Increased difficulty of managing, controlling, and
maintaining integrity of the data. 2) Increased likelihood of concurrency
problems.

Yes, I think it is worthwhile to have distributed, local control of the data and
automated, integrated backup of a distributed system. However, greater
attention must be paid to controls that ensure the security and concurrency of
the data in a distributed system.

50. (SO 10) Describe the ethical obligations of companies to their online
customers.

A company must put processes and safeguards into place to protect the
privacy and confidentiality of customer data. The ten privacy practices
described by the AICPA Trust Services Principles are a good source of the
guidelines a company should follow.

Web Exercises

51. (SO 6) Using an internet search engine, search for the terms “data
warehouse” + Amazon + Netezza Corporation. Netezza assisted Amazon in
implementing a new data warehouse system in mid-2005. Describe some
features of this new data warehouse and why it is so important to
Amazon.com’s business.

In May of 2005, Amazon announced that it would be using a Netezza data
analysis system. The announcement on Netezza’s web site indicates that the
“Netezza Performance Server system is the market-leading data warehouse
appliance, built specifically to analyze terabytes of detailed data 10 to 50
times faster than existing data warehouse options, at half the cost.”

The “Netezza data warehouse appliance can query data in near real-time,
delivering faster and deeper analysis capabilities on terabytes of information.”
The nature of Amazon’s online sales of books and other items requires
examining data quickly. For example, when a person orders a book on
Amazon.com, the web site gives feedback to the customer about books that
other customers buy. Therefore there is a need to quickly analyze the book
purchased by a customer, and extract the other books that customers who
bought that book also buy.

In addition, Amazon needs to analyze customer buying habits and patterns so
that it can appropriately plan delivery, set sale prices, and offer promotions.

(SO 8) Oracle is the world’s largest enterprise software company. Its business is
based on information: helping people use it, share it, manage it, and protect it.
Using an Internet search engine, search using the terms “Google” and
“distributed database”. Describe how and why Google uses a distributed database.
What problems is Google encountering related to its distributed database?

Google uses distributed databases in a few ways. One way is to store user settings and
user ID information for the many Google services such as Gmail, Google Alerts,
personalized home page, and Google Answers. This distributed database of
user settings is an Oracle database. Whenever a user logs in, and from
wherever they log in, the distributed database can access and provide the
appropriate user settings. An article at
http://www.oracle.com/customers/snapshots/google-oracle-berkeley-db-casestudy.pdf
indicates that: “Read requests—such as when an existing user returns
and logs in—are routed to any device in the partition, be it a master or a replica. However,
updates to the database—such as the addition of a new user or modification of an existing
user’s data—are routed only to the partition master. That master system records the change
and then propagates it to the replicas. It considers the update complete when more than half
the replicas have recorded the change. This design ensures that if a system failure occurs,
the nodes can replicate the change among themselves and throughout the rest of the
partition.”

Google also uses a distributed database in its search processes. It
is a distributed database system that breaks search queries into fragments
and distributes them to multiple computers in a network to get faster results.

The problem that Google has encountered recently is that it has been sued
by Northeastern University and a company named Jarg for patent
infringement. The suit claims that Google illegally uses this patented
distributed database technology developed at Northeastern University.
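
The "more than half the replicas" rule in the quoted passage can be illustrated with a toy Python model. This is only a sketch under stated assumptions: the Partition class, the reachable parameter, and the sample keys are invented for illustration and do not describe how Google's or Oracle's software is actually implemented.

```python
class Partition:
    """Toy model of one partition: a master plus replicas (all hypothetical)."""

    def __init__(self, replica_count):
        self.master = {}
        self.replicas = [{} for _ in range(replica_count)]

    def write(self, key, value, reachable):
        """Record a change on the master, then propagate it to the replicas
        whose indexes are in `reachable`. Return True if more than half of
        the replicas recorded the change."""
        self.master[key] = value
        acknowledged = 0
        for index, replica in enumerate(self.replicas):
            if index in reachable:
                replica[key] = value
                acknowledged += 1
        return acknowledged > len(self.replicas) / 2

partition = Partition(replica_count=5)
# Three of five replicas record the change: the update counts as complete.
complete = partition.write("user42", {"theme": "dark"}, reachable={0, 1, 2})
# Only two of five replicas record the change: not yet complete.
incomplete = partition.write("user43", {"theme": "light"}, reachable={0, 4})

print(complete, incomplete)
```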

52. (SO 10) List and describe the ten privacy practices recommended by the
AICPA Trust Services Principles’ Privacy Framework. If you have ever made
a purchase online, you have likely seen these practices in use. Provide any
examples from your own personal experience.

Management. An organization should properly manage privacy. This means
that a person or persons must be assigned responsibility to oversee privacy
practices. This person should ensure that privacy practices are documented
and followed.
Notice. Notice implies that companies provide their privacy policy to
customers. In many cases, the privacy policy is posted on the website. The
notice should inform customers of the purpose for collecting the information
and how it will be used.
Choice and consent. The company should tell customers of any choices the
customer might have to opt out of providing information. Also, the online
company should gain consent from the customer to collect, use, and retain
the information. Consent can be implicit or explicit.
Collection. The company should limit collection of customer data to that data
for which they have received consent.
Use and retention. The company should only use customer data as
disclosed in its privacy policy and should retain such data only as long as
necessary.
Access. The company should provide access that would allow the customer
to view, edit, or delete personal information.
Disclosure to third parties. Any data collected from customers should not
be shared with third parties unless the customer has given consent.
Security for privacy. The organization should have protections to ensure
that customer data is not lost, destroyed, altered, or subject to unauthorized
access.
Quality. Protection and controls should exist to ensure that the data has
quality. Quality means it is accurate and complete.
Monitoring and enforcement. Privacy practices should be continuously
monitored to ensure they are followed.

Examples that students provide will differ from student to student. Examples
would be things such as viewing the privacy policies on the web site, granting
consent for data collected, etc.

Cases

53. The following relationships appear in the screen capture.


a. Suppliers table is related to Products table. It is a one-to-many
relationship. The linked field is SupplierID.
b. Categories table is related to Products table. It is a one-to-many
relationship. The linked field is CategoryID.
c. Products table is related to Order Details table. It is a one-to-many
relationship. The linked field is ProductID.
d. Order Details table is related to Orders table. It is a one-to-many
relationship. The linked field is OrderID.
e. Employees table is related to Orders table. It is a one-to-many
relationship. The linked field is EmployeeID.
f. Customers table is related to Orders table. It is a one-to-many
relationship. The linked field is CustomerID.
g. Shippers table is related to Orders table. It is a one-to-many relationship.
The linked field is ShipperID.

54. Shuttle-Eze operates airport shuttle vans in twelve large cities. Those cities
are Los Angeles, San Diego, San Francisco, Phoenix, Las Vegas, Houston,
Dallas, Chicago, New York City, Washington D.C., Miami, and Orlando.
Shuttle-Eze operates passenger vans to shuttle travelers to and from the
airports for a $25 fee per person.

Required:
a. Design the tables that the company would need in its database to operate
these shuttles. Remember that they must collect, record, and track
information about customers, payments, flights, gates, vans, drivers, and
pick-up and delivery addresses. There may be other types of data not
mentioned in the previous sentence that you would wish to add. The
tables you design should have attributes (columns) for each critical piece
of data. See Exhibit 13-4 for the concept of the table layouts. The tables
you design should meet the first three rules of data normalization.
b. Describe the advantages and disadvantages of using a centralized
database for Shuttle-Eze.

There is no single correct answer to this question. Also, student
responses are not likely to be comprehensive enough to represent the
entire set of data needed. The question will, however, generate thought
and discussion about which data to keep and how to store it in tables. It
also illustrates how complex a database can be with the full scope of data
required and the interrelationships between data. A sample answer
follows.

Part a: The tables proposed are:

Customer table, including fields for Customer ID; first name; last name;
address line 1; address line 2; city; state; zip code; home phone; business
phone; cell phone; fax number; billing address line 1; billing address line 2;
billing city; billing state; billing zip code; credit card number; credit card
expiration date

Reservation table, including fields for Reservation ID; Customer ID; arrival
date; arrival time; arrival flight airline; arrival flight number; arrival gate;
departure date; departure time; departure flight airline; departure flight
number; departure gate; deliver to address; deliver to city;
deliver to state; deliver to phone; pick up address; pick up city; pick up
state; pick up zip; pick up phone number

Employee table, including fields for Employee ID; first name; last name; address
line 1; address line 2; city; state; zip code; SSN; birth date; hire date;
review date; pay rate; federal exemptions; state exemptions; fields for all
other pay deductions

Driver table, including fields for Employee ID; city assigned to; license
type

Van table, including fields for VanID; Van make; van model; van VIN; city
assigned to; current driver assigned; last service date; mileage at last
service
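
As a rough sketch of how two of these tables might be defined and linked, the Python example below uses the built-in sqlite3 module with an in-memory database. Only a few of the proposed fields are included, and the column names, data types, and sample rows are assumptions made for illustration. The Customer ID in the reservation table acts as the link back to the customer table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces links only if enabled
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    first_name  TEXT NOT NULL,
    last_name   TEXT NOT NULL
);
CREATE TABLE reservation (
    reservation_id INTEGER PRIMARY KEY,
    customer_id    INTEGER NOT NULL REFERENCES customer(customer_id),
    arrival_date   TEXT,
    pick_up_city   TEXT
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Pat', 'Lee')")
conn.execute("INSERT INTO reservation VALUES (10, 1, '2024-05-01', 'Miami')")
conn.execute("INSERT INTO reservation VALUES (11, 1, '2024-06-15', 'Chicago')")

# One customer, many reservations: join on the shared Customer ID.
cities = [row[0] for row in conn.execute(
    "SELECT r.pick_up_city FROM reservation r"
    " JOIN customer c ON c.customer_id = r.customer_id"
    " WHERE c.last_name = 'Lee' ORDER BY r.reservation_id")]

# A reservation for an unknown customer is rejected.
try:
    conn.execute("INSERT INTO reservation VALUES (12, 99, '2024-07-01', 'Dallas')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(cities, orphan_rejected)
```

Because the customer's name is stored once in the customer table, it does not have to be repeated on each reservation, which is in keeping with the normalization rules the case asks for.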

Part b: A centralized database could be advantageous for several
reasons. First, it avoids the need to keep several distributed databases
concurrent. If each location maintains a separate database, it would be
more difficult to share customer information between locations. A single
customer may use the company’s services in Miami and Chicago. With a
single, centralized database, once the customer’s basic data is entered, it
need not be reentered when that customer uses the services in another
city. Additionally, vans might be shuttled between cities if needed. For
example, a shortage of vans in Orlando might be fixed by driving a van or
vans from Miami. A centralized database could easily be changed to
reflect the new location of the van. If separate databases were maintained,
the van details would have to be reentered at the new location.

55. Kroger, a large, nation-wide grocery company, maintains a customer reward
system titled the “Kroger Plus Shopper’s Card”. Customers who enroll in this
system are entitled to discounts on products at Kroger stores and Kroger
gasoline. To earn discounts and other rewards, the shopper must use the
“Kroger Plus” card at the time of checkout. The card has a bar code that
identifies the customer. For Kroger, the system allows the opportunity to
determine customer buying patterns and to use this data for data mining.

Required:
Using a Web search engine, search for “data mining” and grocery. Describe
what type of information grocery stores collect that they can use for data
mining purposes. Also, describe how grocery chains use data mining to
improve performance.

Grocery stores use these customer reward systems, sometimes called
“loyalty cards”, to induce customer loyalty and to track purchases. A grocery
store could collect data such as products and brands purchased. So as an
example, if a customer’s buying habits include frequent purchase of Tyson
chicken products, the store’s system can be set to print chicken or Tyson
coupons at the time of check-out. Such data mining can assist the grocery
chain in identifying customer buying habits and thereby increase sales
through targeted promotions.

Many grocers hire a firm named Catalina to assist in the data mining. One
article says the following: “Catalina collects loyalty and bulk sales data from
more than 20,000 stores, then uses it to create pictures of shoppers over
time, says CEO L. Dick Buell. The picture gets clearer the more data stores
collect. For example, Catalina can help retailers determine that someone lives
in an upscale area, buys diapers, and may be interested in high-end baby
food.” (http://www.usatoday.com/tech/news/internetprivacy/2006-07-11-data-
mining_x.htm)

Many web sites report that one Midwest
chain found a connection between men who buy diapers on Thursday and
increased purchase of beer by those men. Other web sites suggest that this is an
urban myth, and whether it is true appears to be unresolved.

This data collection has also caused privacy concerns. Some have suggested
that grocery chains do not need loyalty cards to collect data about sales of
products. They believe grocers could collect such data by analyzing sales in
general and inventory levels.
