Week 2

DISTRIBUTED SYSTEMS DEVELOPMENT
WEEK 2
Lecturer: Edison Kagona
BIT-6 S1-205
Distributed Systems Vs Centralised Systems
Distributed Systems Vs Centralised Systems
Distribution Transparency
• A distributed system should hide the fact that its processes
and resources are physically distributed across multiple
computers. A transparent distributed system is one that is
able to present itself to users and applications as if it were
only a single computer system
• Types of transparency:
• With Access transparency we wish to hide differences in
machine architectures, like machines with different operating
systems, each having their own file-naming conventions.
These should all be hidden from users and applications.
• With Location transparency users should not tell where a

resource is physically located in the system. Location
transparency can be achieved by assigning only logical
names to the resources, e.g. the URL
http://www.prenhall.com/index.html. gives no clue about the
location of Prentice Hall's main Web server or whether
index.html has always been at its current location or was
recently moved there.
• With Migration transparency resources can be
moved without affecting how those resources can be
accessed.
• With Relocation transparency resources can be

relocated while they are being accessed without the
user or application noticing anything.
• In Replication transparency resources are
replicated to increase availability or to improve
performance by placing a copy close to the place
where it is accessed.
• Replication transparency hides the fact that several

copies of a resource exist. It is necessary that all
replicas have the same name and the system should
also support location transparency.
• With Concurrent transparency resources can be
shared but each user does not notice that the other is
making use of the same resource e.g. two
independent users may each have stored their files
on the same file server or may be accessing the
same tables in a shared database. Concurrent
access to a shared resource should leave that
resource in a consistent state.
• Consistency can be achieved through locking

mechanisms where users are, in turn, given
exclusive access to the desired resource.
• With Failure transparent a user does not notice that a
resource fails to work properly, and that the system
subsequently recovers from that failure. The main
difficulty in masking failures lies in the inability to
distinguish between a dead resource and a painfully
slow resource.
• Distribution transparency is generally good but there

are situations in which attempting to completely hide all
distribution aspects from users is not a good idea. An
example is requesting your electronic newspaper to
appear in your mailbox before 7 A.M. local time, as
usual, while you are currently at the other end of the
world living in a different time zone. Your morning paper
will not be the morning paper you are used to
Degree of Transparency
• Likewise, a wide-area distributed system that connects
a process to distant areas cannot hide the fact that the
message will be delivered at the same time. Signal
transmission is not only limited by the speed of light but
also by limited processing capacities of the
intermediate switches.
• There is also a trade-off between a high degree of

transparency and the performance of a system. For
example, many Internet applications repeatedly try to
contact a server before finally giving up. Attempting to
mask a server failure before trying another one may
slow down the system.
• Another example is where we need to guarantee that
several replicas, located on different continents, need to
be consistent all the time i.e. if one copy is changed, that
change should be propagated to all copies before
allowing any other operation. It is clear that a single
update operation may now even take several seconds to
complete and this cannot be hidden from users.
• There are situations where hiding distribution is not a

good idea and it is best to actually expose distribution
rather than trying to hide it e.g. for an office worker who
wants to print a file from a notebook computer it is better
to send the print job to a busy nearby printer, rather than
to an idle one at corporate headquarters in a different
country.
• Other arguments against distribution transparency
also exist. It may be much better to make distribution
clear so that the user and application developer will
much better understand the behavior of a distributed
system.
• Aiming for distribution transparency may be a nice

goal when designing and implementing distributed
systems, but that it should be considered together
with other issues such as performance and
comprehensibility.
Openness
• The openness of a computer system is the
characteristic that determines whether the system
can be extended and re-implemented in various
ways. It is determined by the degree to which new
resource sharing services can be added and be made
available for use by a variety of client programs.
• The system may be extended at the hard ware level

by the addition of computers to the network and at the
software level by the introduction of new services and
the re-implementation of old ones, enabling
application programs to share resources. A further
benefit for open systems is their independence from
individual vendors
Openness
• Openness can be achieved when the specification and
documentation of the key software interfaces of the
components of a system are made available to software
developers (published).
• The designers of the Internet protocols introduced a series

of documents called 'Requests For Comments', or RFCs,
each of which is known by a number. These have continued
to form the basis of the technical documentation of the
Internet. This series includes discussions as well as the
specifications of protocols. Copies can be obtained from
www.ietf.org
• RFCs are not the only means of publication. For example,

CORBA is published through a series of technical
documents, including a complete specification of the
interfaces of its services. www.omg.org
Openness
• The challenge to designers of distributed systems is to tackle
the complexity of distributed systems consisting of many
components engineered by different people.
• In summary:
o Open systems are characterized by the fact that their key
interfaces are published.
o Open distributed systems are based on the provision of a

uniform communication mechanism and published
interfaces for access to shared resources.
o Open distributed systems can be constructed from

heterogeneous hardware and software, possibly from
different vendors. But the conformance of each component
to the published standard must be carefully tested and
verified if the system is to work correctly.
Scalability
• It describes the ability of the system to dynamically adjust its

own computing performance by changing
available computing resources and scheduling methods.
• Scalability of a system can be measured along three different

dimensions.
o A system can be scalable with respect to its size; we can
easily add more users and resources to the system.
o Geographically scalable system; the users and resources
may lie far apart.
o Administratively scalable; it can still be easy to manage even
if it spans many independent administrative organizations.
• Unfortunately, a system that is scalable in one or more of these

dimensions often exhibits some loss of performance as the
system scales up.
Scalability Problems
Scaling with respect to size:

• If more users or resources need to be supported, limitations
of centralized services, data, and algorithms are realized.
• Many services are implemented by means of only a single

server running on a specific machine in the distributed
system. The server can become a bottleneck as the number
of users and applications grows.
• Using only a single server however is sometimes unavoidable

e.g. for a service managing highly confidential information
such as medical records, bank accounts, etc. It is best to
implement those services by means of a single server in a
highly secured separate room, and protected from other parts
of the distributed system through special network
components.
• Just as bad as centralized services are centralized data, e.g.

keeping track of the telephone numbers and addresses of 50
million people would undoubtedly saturate all the
communication lines into and out of it. Likewise, if the
Internet’s Domain Name System (DNS) is implemented as a
single table and each request to resolve a URL had to be
forwarded to that one and only DNS server, it would be hard
to use it.
• Centralized algorithms are also a bad idea. In a large

distributed system, an enormous number of messages have
to be routed over many lines. Collecting and transporting all
the input and output information would again be a bad idea
because these messages would overload part of the network.
Geographical scalability
• It is currently hard to scale existing distributed
systems that were designed for LANs because they
are based on synchronous communication. A client
requesting service, blocks until a reply is sent back.
Building interactive applications using synchronous
communication in wide-area systems requires a
great deal of care.
• Another problem that hinders geographical

scalability is that communication in wide-area
networks is inherently unreliable, and virtually
always point-to-point.
Administrative Scalability
• A major problem here is conflicting policies with
respect to resource usage (and payment),
management, and security. Many components of a
distributed system that reside within a single
domain can often be trusted by users that operate
within that same domain. This trust however, does
not expand naturally across domain boundaries.
• If a distributed system expands into another

domain, two types of security measures need to be
taken:
1. The distributed system has to protect itself against

malicious attacks from the new domain. Users from
the new domain may have only read access to the
file system in its original domain. Likewise, facilities
such as expensive image setters or high-
performance computers may not be made available
to foreign users.
2. The new domain has to protect itself against

malicious attacks from the distributed system.
Scaling Techniques
There are basically three techniques for scaling:

1. Hiding communication latencies
2. Distribution
3. Replication
Scaling Techniques
• Hiding communication latencies is important to achieve

geographical scalability. You try to avoid waiting for responses
to remote (and potentially distant) service requests as much
as possible.
• For example, when a service has been requested from a

remote machine, an alternative to waiting for a reply from the
server is to do other useful work at the requester's side. When
a reply comes in, the application is interrupted and a special
handler is called to complete the previously-issued request.
• Alternatively, a new thread of control can be started to perform

the request. Although it blocks waiting for the reply, other
threads in the process can continue.
Scaling Techniques
• Some applications cannot make effective use of

asynchronous communication like in interactive
applications where a user sends a request and he
will generally have nothing better to do than to wait
for the answer.
• In such cases, a much better solution is to reduce

the overall communication, by moving part of the
computation that is normally done at the server to
the client process requesting the service
Scaling Techniques
• The Distribution Scaling Technique involves taking a

component, splitting it into smaller parts, and subsequently
spreading those parts across the system. An excellent
example of distribution is the Internet Domain Name System
(DNS).
• The DNS name space is hierarchically organized into a tree of

domains, which are divided into nonoverlapping zones, as
shown below. The names in each zone are handled by a
single name server.
• To resolve the name nl. vu.cs.flits, it is first passed to the

server of zone 21 which returns the address of the server for
zone 22, to which the rest of name, vu.cs.flits, can be handed.
Scaling Techniques
Scaling Techniques
• The server for 22 will return the address of the server

for zone 23, which is capable of handling the last part
of the name and will return the address of the
associated host.
• This example illustrates how the naming service, as

provided by DNS, is distributed across several
machines, thus avoiding that a single server has to
deal with all requests for name resolution
Scaling Techniques
• The replication technique replicates components across a

distributed system. It increases availability of the
component and also helps to balance the load between
components leading to better performance.
• In geographically widely-dispersed systems, having a copy

nearby can hide much of the communication latency
problems mentioned before.
• Caching is a special form of replication since it results in

making a copy of a resource in the proximity of the client
accessing that resource. In contrast to replication, caching
is a decision made by the client of a resource, and not by
the owner of a resource. Caching also happens on demand
whereas replication is often planned in advance.
Scaling Techniques
• One serious problem of replication and caching is

that modifying one copy makes that copy different
from the others and this leads to consistency
problems. The extent to which inconsistencies can
be tolerated depends highly on the usage of a
resource.
Scaling Techniques
• When considering these scaling techniques, size

scalability is the least problematic. Simply increasing
the capacity of a machine will the situation.
• Geographical scalability is a much tougher problem,

however, combining distribution, replication, and
caching techniques with different forms of
consistency will often prove sufficient in many cases.
• Administrative scalability seems to be the most

difficult one, because we need to solve nontechnical
problems (e.g., politics of organizations and human
collaboration).
Pitfalls
• By following a number of design principles,

distributed systems that strongly adhere to the
goals discussed before can be developed.
• Distributed systems differ from traditional software

because components are dispersed across a
network. Not taking this dispersion into account
makes so many systems complex and results in
mistakes.
Pitfalls
• The following false assumptions are often made

when developing a distributed application for the
first time:
1. The network is reliable.
2. The network is secure.
3. The network is homogeneous.
4. The topology does not change.
5. Latency is zero.
6. Bandwidth is infinite.
7. Transport cost is zero.
8. There is one administrator.
TYPES OF DISTRIBUTED SYSTEMS
 Distributed Computing Systems
 Distributed Information Systems
 Distributed Pervasive (Embedded)
Systems

Week 2

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Week 2

Uploaded by

Copyright:

Available Formats

DISTRIBUTED SYSTEMS DEVELOPMENT

Lecturer: Edison Kagona

• With Location transparency users should not tell where a

• With Relocation transparency resources can be

• Replication transparency hides the fact that several

• Consistency can be achieved through locking

• Distribution transparency is generally good but there

• There is also a trade-off between a high degree of

• There are situations where hiding distribution is not a

• Aiming for distribution transparency may be a nice

• The system may be extended at the hard ware level

• The designers of the Internet protocols introduced a series

• RFCs are not the only means of publication. For example,

o Open distributed systems are based on the provision of a

o Open distributed systems can be constructed from

• It describes the ability of the system to dynamically adjust its

• Scalability of a system can be measured along three different

• Unfortunately, a system that is scalable in one or more of these

Scaling with respect to size:

• Many services are implemented by means of only a single

• Using only a single server however is sometimes unavoidable

• Just as bad as centralized services are centralized data, e.g.

• Centralized algorithms are also a bad idea. In a large

• Another problem that hinders geographical

• If a distributed system expands into another

1. The distributed system has to protect itself against

2. The new domain has to protect itself against

There are basically three techniques for scaling:

• Hiding communication latencies is important to achieve

• For example, when a service has been requested from a

• Alternatively, a new thread of control can be started to perform

• Some applications cannot make effective use of

• In such cases, a much better solution is to reduce

• The Distribution Scaling Technique involves taking a

• The DNS name space is hierarchically organized into a tree of

• To resolve the name nl. vu.cs.flits, it is first passed to the

• The server for 22 will return the address of the server

• This example illustrates how the naming service, as

• The replication technique replicates components across a

• In geographically widely-dispersed systems, having a copy

• Caching is a special form of replication since it results in

• One serious problem of replication and caching is

• When considering these scaling techniques, size

• Geographical scalability is a much tougher problem,

• Administrative scalability seems to be the most

• By following a number of design principles,

• Distributed systems differ from traditional software

• The following false assumptions are often made

 Distributed Computing Systems

 Distributed Information Systems

 Distributed Pervasive (Embedded)

You might also like