ADS 2 Lab


Lab Work

Practical No. 4-5


Q4. Case study of any distributed and parallel database management system.

A4. A DISTRIBUTED SYSTEM IN A ROOM:

The most common distributed system is a one-site system. A single site avoids many of the
operational and organizational problems of distribution, because it is run just like a centralized
system. Some of the advantages of geographic centralization of a distributed computer system are:

* LANs give high-speed, low-cost, reliable connections among nodes.

* Only one operations staff is required.

* Hardware and software maintenance are concentrated in one location, and are hands-on rather than
thin-wire remote.

* It is easier to manage the physical security of premises and data.

A one-site distributed system differs from a classical system: it consists of many loosely coupled processors rather than one giant processor. The arguments for choosing a distributed architecture rather than a centralized one are:

* CAPACITY: No single processor is powerful enough for the job.

* MODULAR GROWTH: Processing power can be added in small increments.

* AVAILABILITY: Loose coupling gives excellent fault containment.

* PRICE: The distributed system is cheaper than the alternatives.

* REFLECT ORGANIZATION STRUCTURE: Each functional unit has one or more nodes that perform its function.

IBM's JES, TPF-HPO, IMS Data Sharing, and DEC's VAX Cluster are examples of using loosely coupled processors to build a large system-in-a-room out of modular components. A European bank gives a good example of a Tandem customer using this architecture.
The bank began by supporting its 100 Automated Tellers, 2500 human tellers, inter-bank switch, and administrative services for retail banking [Sammer]. The initial system supported a classic memo-post application: data captured during the day from tellers and ATMs is processed by nightly batch runs to produce the new master file, monthly statements to customers, interbank settlements, and management reports. The peak online load is 50 transactions per second.

In addition to designing the user (teller/customer) interface, the bank took considerable care in designing the operator interfaces and setting up procedures so that the system is simple to operate. Special emphasis was put on monitoring, diagnosing, and repairing communications lines and remote equipment. The application and device drivers maintain a detailed status and history of each terminal, line, and device. This information is kept in an online relational database. Another application formats and displays this information to the operator, thus helping him to understand the situation and to execute and interpret diagnostics. Recurring operator activities (batch runs) are managed automatically by the system. Batch work against online data is broken into many mini-batch jobs so that, at any one time, most of the data is available for online access.
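The case study does not spell out how the mini-batch jobs are written, but the idea can be sketched in a few lines. The following Java/JDBC outline is only illustrative: the connection URL, the captured_entries table, its columns, and the id range are all hypothetical assumptions, and the bank's real system predates JDBC.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class MiniBatchPosting {

    // Rows posted per mini-batch transaction; small enough that locks are
    // held only briefly and most of the data stays available online.
    private static final int CHUNK_SIZE = 500;

    public static void main(String[] args) throws SQLException {
        // Hypothetical connection URL, table, and column names.
        try (Connection con = DriverManager.getConnection(
                "jdbc:example://bank/retail", "batchuser", "secret")) {
            con.setAutoCommit(false);                    // one commit per chunk

            long maxId = 1_000_000L;                     // assume ids 1..maxId were captured today
            for (long first = 1; first <= maxId; first += CHUNK_SIZE) {
                long last = Math.min(first + CHUNK_SIZE - 1, maxId);
                try (PreparedStatement ps = con.prepareStatement(
                        "UPDATE captured_entries SET posted = 'Y' " +
                        "WHERE posted = 'N' AND entry_id BETWEEN ? AND ?")) {
                    ps.setLong(1, first);
                    ps.setLong(2, last);
                    ps.executeUpdate();
                }
                con.commit();                            // release locks between chunks
            }
        }
    }
}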
The initial system consisted of 32 processors with 64 disc spindles. The processors were divided into three production nodes and two small development nodes connected via Tandem's FOX fiber optic ring [Horst].

Once retail banking was operational, the next application to be implemented was cash management. It allows customers to buy or sell foreign currency, and to transfer assets among accounts in different currencies. This is a separate part of the banking business, so it has an "arms-length" relationship to the retail banking system.

Although it runs as part of the bank's distributed system, the cash management system does not
directly read and write the retail banking databases. Rather, it sends messages to the retail banking
system server processes that debit and credit accounts. These retail banking servers access the retail
database using the procedures implemented by the retail banking organization. Money transfers
between the two bank departments must be atomic -- it would be improper to debit a savings account
and not credit a foreign currency account. The transaction mechanism (locks and logs) automatically
makes such cross-organization multi-server transactions atomic. This interaction between retail and
cash-management banking exemplifies the case for the requestor-server architecture. Even though
the data was "local", the designers elected to go through a server so that the two parts of the
application could be independently managed and reorganized. Later, electronic mail was added to
the system. The bank is now going completely online, eliminating the problems implicit in a memo-
post nightly-batch system.
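The text does not show the actual server interfaces, but the requestor-server pattern it describes can be sketched as follows. RetailBankingServer and TransactionManager are hypothetical Java interfaces standing in for the bank's retail-banking server processes and the transaction mechanism; this is not Tandem's actual API.

interface RetailBankingServer {
    void debit(String accountNo, long amountInCents);   // implemented by the retail banking organization
    void credit(String accountNo, long amountInCents);
}

interface TransactionManager {
    void begin();
    void commit();
    void abort();
}

class CashManagementRequestor {
    private final RetailBankingServer retail;
    private final TransactionManager tm;

    CashManagementRequestor(RetailBankingServer retail, TransactionManager tm) {
        this.retail = retail;
        this.tm = tm;
    }

    // Transfer money from a retail savings account to a foreign-currency
    // account. Both server calls run under one transaction, so the debit
    // and the credit either both happen or neither does.
    void transfer(String savingsAccount, String currencyAccount, long cents) {
        tm.begin();
        try {
            retail.debit(savingsAccount, cents);    // message to retail banking server
            retail.credit(currencyAccount, cents);  // message to retail banking server
            tm.commit();                            // locks and logs make the pair atomic
        } catch (RuntimeException e) {
            tm.abort();                             // undo any partial work
            throw e;
        }
    }
}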

The bank has demonstrated linear growth -- it can add processors and discs to the system and get a
proportional growth in throughput. In addition, the bank's development staff has been able to add
applications which build on the existing application servers. The system consists of 68 processors
(over 100 mips) with over 200 megabytes of main memory and 90 spindles of duplexed disc.
The processors are divided into 10 nodes: each node consists of two to sixteen processors. Particular
nodes are functionally specialized: two handle the long-haul network, three others do retail banking,
one does cash management and one does electronic mail, while others are reserved for auditors or
for developers. This functional division of nodes simplifies system tuning and gives each department
control over its resources. Although many processors and discs are involved, programs have a
"single-system-image", that is all files, processes and terminals appear to be local. This software
illusion is made a reality by connecting all the processors together with high-speed (100 Mbit/sec) local networks (Figure 2). The entire system is housed in a single building. All management and operations staff are in one place; only the terminals are remote. This is easier to manage than supervising multiple computer rooms and staff spread over a broad geographic area.

This system could have been a large (100 mips, 200 MB, 100 discs) centralized computer system if
such a computer were available. The additional desires for modular growth, availability, and low cost
each forced the choice of a distributed hardware and software architecture.
Figure 2. A one-site system of 9 nodes connected via a local network (dark line). The nodes are functionally specialized; some functions are large enough to require multiple nodes.

Case Study Overview Of Parallel Database

The case study presented in this appendix provides techniques for designing new applications for use
with Oracle Parallel Server. You can also use these techniques to evaluate existing applications and
determine how well suited they are for migration to Oracle Parallel Server.

Note:

Always remember that your goal is to minimize contention: doing so results in optimized
performance.

This case study assumes you have made an initial database design. To optimize your design for
Parallel Server, follow the methodology suggested here.

Develop an initial database design.

Analyze access to tables.


Analyze transaction volume.

Decide how to partition users and data.

Decide how to partition indexes, if necessary.

Choose high or low granularity locking.

Implement and tune your design.

Case Study: From Initial Database Design to Oracle Parallel Server

This case study demonstrates analytical techniques in practice. Although your applications will
differ, this example helps you to understand the process. The topics in this section are:

"Eddie Bean" Catalog Sales

Tables

Users

Application Profile

"Eddie Bean" Catalog Sales

The case study is about the fictitious "Eddie Bean" catalog sales company. This company has many
order entry clerks processing telephone orders for various products. Shipping clerks fill orders and
accounts receivable clerks handle billing. Accounts payable clerks handle orders for supplies and
services the company requires internally. Sales managers and financial analysts run reports on the
data. This company's financial application has three business processes operating on a single
database:

Order entry

Accounts payable

Accounts receivable

Tables

Tables from the Eddie Bean database include:


Table A-1 "Eddie Bean" Sample Tables

Table Contents

ORDER_HEADER

Order number, customer name and address.

ORDER_ITEMS

Products ordered, quantity, and price.

ORGANIZATIONS

Names, addresses, phone numbers of customers and suppliers.

ACCOUNTS_PAYABLE

Tracks the company's internal purchase orders and payments for supplies and services.

BUDGET

Balance sheet of the company's expenses and income.

FORECASTS

Projects future sales and records current performance.

Users

Various application users access the database to perform different functions:

Order entry clerks

Accounts payable clerks

Accounts receivable clerks

Shipping clerks

Sales manager

Financial analyst

Application Profile

Operation of the Eddie Bean application is fairly consistent throughout the day: order entry, order
processing, and shipping occur all day. These functions are not, for example, segregated into separate
one-hour time slots.
About 500 orders are entered per day. Each order header is updated about 4 times during its lifetime.
So we expect about 4 times as many updates as inserts. There are many selects, because many
employees are querying order headers: people doing sales work, financial work, shipping, tracing the
status of orders, and so on.

There are on average 4 items per order. Order items are never updated: an item may be deleted and
another item entered. The ORDER_HEADER table has four indexes. Each of the other tables has a
primary key index only.
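Worked through, these figures give a rough daily DML profile for the two order tables. The short Java snippet below simply carries out that arithmetic; the only inputs are the numbers stated above.

public class VolumeEstimate {
    public static void main(String[] args) {
        int ordersPerDay = 500;      // orders entered per day
        int updatesPerHeader = 4;    // each order header is updated about 4 times
        int itemsPerOrder = 4;       // average items per order

        int headerInserts = ordersPerDay;                    // 500 ORDER_HEADER inserts/day
        int headerUpdates = ordersPerDay * updatesPerHeader; // ~2,000 ORDER_HEADER updates/day
        int itemInserts   = ordersPerDay * itemsPerOrder;    // ~2,000 ORDER_ITEMS inserts/day

        System.out.printf("ORDER_HEADER: %d inserts, %d updates per day%n",
                headerInserts, headerUpdates);
        System.out.printf("ORDER_ITEMS:  %d inserts per day%n", itemInserts);
    }
}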

Budget and forecast activity has a much lower volume than activity on the order tables. The BUDGET
and FORECASTS tables are read frequently but modified infrequently. Forecasts are updated more often
than budgets, and are deleted once they go into actuals.

The vast bulk of the deletes are performed as a nightly batch job. This maintenance activity does not,
therefore, need to be included in the analysis of normal functioning of the application.

Q5. Study and use of open source data mining tool: Weka.

A5. Weka offers four main functionalities: the command-line interface (CLI), the Explorer, the
Experimenter, and the Knowledge Flow. The Explorer supports defining a data source, preparing data,
applying machine learning algorithms, and visualizing results. The Experimenter is used mainly for
comparing the performance of different algorithms on the same dataset. Weka's key features include:

a. A learning method can be applied to a dataset and its output analyzed to learn more about the data.

b. Learned models can be used to generate predictions on new instances.

c. Predictions can be chosen by applying several different learners and comparing their performance.
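As a concrete illustration of points (a) through (c), the following minimal Java sketch uses Weka's programmatic API to load a dataset, build a classifier, evaluate it with cross-validation, and make a prediction. The ARFF file name and the choice of the J48 decision-tree learner are only examples.

import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class WekaDemo {
    public static void main(String[] args) throws Exception {
        // Load an ARFF dataset (the file name here is only an example).
        Instances data = new DataSource("weather.nominal.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);     // last attribute is the class

        // (a) Apply a learning method (the J48 decision tree) and inspect its output.
        J48 tree = new J48();
        tree.buildClassifier(data);
        System.out.println(tree);

        // (c) Compare learners by evaluating them, e.g. with 10-fold cross-validation.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new J48(), data, 10, new Random(1));
        System.out.println(eval.toSummaryString());

        // (b) Use the learned model to predict the class of an instance.
        double predicted = tree.classifyInstance(data.instance(0));
        System.out.println("Predicted class: "
                + data.classAttribute().value((int) predicted));
    }
}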

Although Weka supports many model evaluation procedures and metrics, many data survey and
visualization methods are absent. Weka is also more oriented towards classification and regression
problems and less towards descriptive statistics and clustering methods. Support for big data, text
mining, and semi-supervised learning is also currently limited.
