Understanding Distributed Systems Basics

Uploaded by yabsram94
Copyright © All Rights Reserved

Chapter One

Introduction
Motivation
 Before the mid-80s, computers were
 Very expensive (hundreds of thousands or even millions of dollars)
 Very slow (a few thousand instructions per second)
 Not connected among themselves
 After the mid-80s: two major developments
 Cheap and powerful microprocessor-based computers appeared
(including smartphones)
 Computer networks
 LANs at speeds ranging from 10 to 1000 Mbps (now even 10, 40, and 100 Gbps)
 WANs at speeds ranging from 64 Kbps to gigabits per second
 Consequence
 It became feasible to use a large network of computers to work on the same application, in contrast to the old centralized systems, which consisted of a single computer and its peripherals
Objectives of the Chapter
 We will discuss
 Alternative definitions of distributed systems
 Why we need distribution and problems in distribution
 Goals of a distributed system: transparency, openness, and
scalability
 Three types of distributed systems
 Distributed computing systems
 Distributed information systems
 Distributed pervasive systems

1.1 Introduction and Definition
 A distributed system is a collection of autonomous computing
elements that appears to its users as a single coherent system
(Tanenbaum & Van Steen)
 This definition has two aspects:
 Hardware: Autonomous computing elements, also referred to as
nodes, be they hardware devices or software processes
 We have to manage which nodes are part of the system
through group management mechanisms
 A group can be open (any device can send messages to any
other) or closed (only members of the group can
communicate with each other)
 A distributed system is usually organized as an overlay
network (a network which is built on top of another network,
e.g., P2P networks)
 Software: Single coherent system; users or applications
perceive a single system ⇒ nodes need to collaborate
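The open and closed groups mentioned above can be sketched as a simple membership check. This is a minimal illustration, not a real membership protocol; the `Group` class and the node names `n1`, `n2` are hypothetical.

```python
class Group:
    """Hypothetical membership manager for the nodes of a distributed system."""

    def __init__(self, closed: bool) -> None:
        self.closed = closed
        self.members: set[str] = set()

    def join(self, node: str) -> None:
        # In a closed group a separate admission mechanism would normally
        # decide who may join; this sketch simply accepts the request.
        self.members.add(node)

    def leave(self, node: str) -> None:
        self.members.discard(node)

    def may_send(self, sender: str, receiver: str) -> bool:
        if not self.closed:
            # Open group: any node may send messages to any other node.
            return True
        # Closed group: only members may communicate with each other.
        return sender in self.members and receiver in self.members


group = Group(closed=True)
group.join("n1")
group.join("n2")
print(group.may_send("n1", "n2"))        # True
print(group.may_send("outsider", "n1"))  # False
```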
Introduction and Definition (2)
 Other Definitions
A distributed system is a system designed to support the
development of applications and services which can exploit a
physical architecture consisting of multiple, autonomous
processing elements that do not share primary memory but
cooperate by sending asynchronous messages over a
communication network (Blair & Stefani)

A distributed system is one that stops you getting any work done
when a machine you have never even heard of crashes (Leslie Lamport)
A distributed system is one in which the failure of a computer you
didn’t even know existed can render your own computer unusable
(Lamport)

Introduction and Definition (3)
 In summary
 The computers that make up a distributed system can be large or small, and they are geographically dispersed, which is why it is called a distributed system
 The size may vary from a handful of devices to millions of
computers
 The interconnection network may be wired, wireless, or a
combination of both
 Distributed systems are often dynamic in the sense that computers can join or leave, so the topology and performance change continuously

1.1.1 Why Distributed?
 Resource and Data Sharing
 Printers, databases, multimedia servers, ...
 Availability, Reliability
 The loss of some instances can be hidden
 Scalability, Extensibility
 The system grows with demand (e.g., extra servers)
 Performance
 Huge power (CPU, memory, ...) available
 Inherent distribution, communication
 Organizational distribution, e-mail, video

1.1.2 Problems of Distribution
 Concurrency, Security
 Clients must not disturb each other
 Privacy
 e.g., when a user preference profile is built, for instance with cookies
 Unwanted communication such as spam
 Partial failure
 We often do not know where an error occurred (e.g., when a remote procedure call (RPC) receives no reply)
 Location, Migration, Relocation, Replication
 Clients must be able to find their servers
 Heterogeneity
 Hardware, platforms, languages, management

1.1.3 Characteristics of Distributed Systems
 Differences between the computers and the ways they
communicate are hidden from users
 Users and applications can interact with a distributed system in a
consistent and uniform way regardless of location
 Distributed systems should be easy to expand and scale
 A distributed system is normally continuously available, even though parts of it may fail (partial failures)

1.1.4 Middleware and Distributed Systems
 To support heterogeneous computers and networks and to provide a
single-system view, a distributed system is often organized by means
of a layer of software called middleware that extends over multiple
machines

A distributed system organized as middleware; note that the middleware layer extends over multiple machines and offers each application the same interface

Ack: most diagrams in all slides are taken from the textbook
Middleware and Distributed Systems (2)
 In a sense, middleware is to a distributed system what an operating system is to a computer: a manager of resources
 In addition, it offers services such as
 Facilities for interapplication communication
 Security services
 Accounting services
 Masking of and recovery from failures

1.2 Goals of a Distributed System
 A distributed system should easily connect users with resources
(printers, computers, storage facilities, data, files, Web pages, ...)
 Some of the reasons
 Economics: sharing resources such as printers and high-
speed computers
 To collaborate and exchange information, e.g., BitTorrent
 e-commerce: buying and selling goods
 etc.
 A distributed system should be transparent (invisible): hide the fact
that the resources and processes are distributed across multiple
computers
 A distributed system should be open
 A distributed system should be scalable

Goals of a Distributed System (2)
 Transparency in a Distributed System
 A distributed system that is able to present itself to users and
applications as if it were only a single computer system is said
to be transparent
 Different forms of transparency in a distributed system
Access: Hide differences in machine architecture, data representation (endianness, file naming, ...) and how an object is accessed
Location: Hide where an object is physically located; where is [Link] (naming)
Migration: Hide that an object may move to another location
Relocation: Hide that an object may be moved to another location while in use; e.g., mobile users using their wireless laptops while moving from place to place
Note: an object refers to a resource or a process
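Access transparency often involves hiding differences in data representation such as endianness. A minimal sketch, assuming both sides agree to exchange unsigned 32-bit integers in network byte order (big-endian), using Python's standard `struct` module; `encode_u32` and `decode_u32` are hypothetical helper names.

```python
import struct


def encode_u32(value: int) -> bytes:
    # "!" selects network byte order (big-endian), whatever the host CPU uses.
    return struct.pack("!I", value)


def decode_u32(data: bytes) -> int:
    # unpack returns a tuple; the single field is its first element.
    return struct.unpack("!I", data)[0]


wire = encode_u32(1)
print(wire)              # b'\x00\x00\x00\x01'
print(decode_u32(wire))  # 1
# The same integer in little-endian layout, as an x86 host stores it natively:
print(struct.pack("<I", 1))  # b'\x01\x00\x00\x00'
```

Because every sender converts to the agreed wire format and every receiver converts back, neither side needs to know the other's native byte order.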
Goals of a Distributed System (3)
Replication: Hide that an object is replicated (for availability and performance); all replicas have the same name
Concurrency: Hide that an object may be shared by several competing users; the object must be left in a consistent state, e.g., through locking
Failure: Hide the failure and recovery of an object
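Concurrency transparency through locking can be illustrated with a shared object guarded by a lock. In this sketch the `Account` class and `client` function are hypothetical, and threads within one process stand in for competing users; a distributed system would need a distributed locking or concurrency-control mechanism instead.

```python
import threading


class Account:
    """Shared object; a lock keeps it consistent under concurrent use."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self._lock = threading.Lock()

    def deposit(self, amount: int) -> None:
        # The read-modify-write of `balance` runs under the lock, so two
        # concurrent deposits cannot interleave and lose an update.
        with self._lock:
            self.balance += amount


def client(account: Account, times: int) -> None:
    # One of several competing users; it is unaware of the others.
    for _ in range(times):
        account.deposit(1)


account = Account(0)
clients = [threading.Thread(target=client, args=(account, 10_000)) for _ in range(4)]
for t in clients:
    t.start()
for t in clients:
    t.join()
print(account.balance)  # 40000
```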

 But trying to achieve full distribution transparency may be impossible, or may not be a good idea
 There are communication latencies that cannot be hidden
 Completely hiding failures of networks and nodes is impossible
 You cannot distinguish a slow computer from a failing one
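The impossibility of telling a slow computer from a failing one shows up in any timeout-based check. In this sketch a thread stands in for a remote node, and `remote_node`, `call_with_timeout` and the delays are hypothetical; all the caller observes is whether a reply arrived before the timeout.

```python
import threading
import time


def remote_node(delay: float) -> str:
    # Hypothetical remote node; `delay` models how long it takes to answer.
    time.sleep(delay)
    return "reply"


def call_with_timeout(delay: float, timeout: float) -> str:
    result: list[str] = []
    worker = threading.Thread(
        target=lambda: result.append(remote_node(delay)), daemon=True
    )
    worker.start()
    worker.join(timeout)  # wait at most `timeout` seconds
    if result:
        return result[0]
    # No reply in time: the node is only suspected to have failed.
    # From here we cannot tell a crashed node from a merely slow one.
    return "suspected"


print(call_with_timeout(delay=0.01, timeout=1.0))  # reply
print(call_with_timeout(delay=0.5, timeout=0.05))  # suspected, although the node is alive
```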

Goals of a Distributed System (4)
 Openness in a Distributed System
 A distributed system should be open
 We need well-defined interfaces
 Interoperability
 Components of different origin can communicate
 Portability
 Components work on different platforms
 Another goal of an open distributed system is flexibility and extensibility: it should be easy to configure the system out of different components and to add new components or replace existing ones; this is easier said than done
 An open distributed system is a system that offers services according to standard rules that describe the syntax and semantics of those services; e.g., protocols in networks
 Standards are therefore a necessity
Goals of a Distributed System (5)
 In distributed systems, such services are often specified through interfaces, which are described using an Interface Definition Language (IDL)
 An IDL specifies only syntax: the names of the functions, types of parameters, return values, possible exceptions, ...
 Semantics is given informally by means of natural language
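As an illustration of an interface that fixes only syntax, the sketch below uses a Python `typing.Protocol` in the role an IDL specification would play. This is an analogy, not a real IDL; the `FileService` interface, its operations and the `InMemoryFileService` implementation are hypothetical.

```python
from typing import Protocol


class FileService(Protocol):
    """Hypothetical interface in the role of an IDL specification.

    It fixes only the syntax: operation names, parameter and return types,
    and (in prose) the possible exceptions.
    """

    def read(self, name: str, offset: int, count: int) -> bytes:
        """May raise FileNotFoundError."""
        ...

    def write(self, name: str, offset: int, data: bytes) -> int:
        ...


class InMemoryFileService:
    """One possible implementation. Nothing in FileService says that `read`
    must return what `write` stored; that semantics lives only in prose."""

    def __init__(self) -> None:
        self.files: dict[str, bytearray] = {}

    def read(self, name: str, offset: int, count: int) -> bytes:
        if name not in self.files:
            raise FileNotFoundError(name)
        return bytes(self.files[name][offset:offset + count])

    def write(self, name: str, offset: int, data: bytes) -> int:
        buf = self.files.setdefault(name, bytearray())
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))  # pad a gap with zero bytes
        buf[offset:offset + len(data)] = data
        return len(data)


service: FileService = InMemoryFileService()  # structural match, checked statically
service.write("notes.txt", 0, b"hello")
print(service.read("notes.txt", 1, 3))  # b'ell'
```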

Goals of a Distributed System (6)
 Scalability in Distributed Systems
 A distributed system should be scalable; there are three
dimensions
 Size: adding more users and resources to the system without
any noticeable loss of performance
 Geographically: users and resources may be far apart but the
fact that communication delays may be significant is hardly
noticed
 Administratively: should be easy to manage even if it spans
many independent administrative organizations
 However, scaling a system along any of these dimensions may introduce performance and other problems

Goals of a Distributed System (7)
 Size scalability problems leading to low performance
Centralized services: A single server (or a cluster of servers) for all users, mostly for security reasons
Centralized data: A single on-line telephone book
Centralized algorithms: Doing routing based on complete information
 In these cases there are essentially three root causes for a server
becoming a bottleneck:
 The computational capacity, limited by the CPUs
 The storage capacity, including the I/O transfer rate
 The network between the user and the centralized service
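To see why a single centralized server turns into a bottleneck, one can model it (as an assumption) as an M/M/1 queue: with arrival rate λ and service rate μ, the utilization is U = λ/μ and the mean response time is R = S/(1 - U), where S = 1/μ is the service time. The server capacity below is hypothetical.

```python
def mean_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean response time R = S / (1 - U) of an M/M/1 queue (a modeling assumption)."""
    if arrival_rate >= service_rate:
        raise ValueError("utilization >= 1: the request queue grows without bound")
    utilization = arrival_rate / service_rate  # U = lambda / mu
    service_time = 1.0 / service_rate          # S = 1 / mu
    return service_time / (1.0 - utilization)


# Hypothetical server that can handle 100 requests/s (S = 10 ms).
for load in (50, 90, 99):
    print(f"{load} req/s -> {mean_response_time(load, 100) * 1000:.0f} ms")
```

As the load approaches the server's capacity, the response time grows from about 20 ms to about 1 s, even though the service time per request never changes.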

Goals of a Distributed System (8)
 A major problem in administrative scalability is that of conflicting
policies with respect to resource usage (and payment),
management, and security

 Scaling Techniques: how to solve performance problems


 Performance problems arise from limitations in the capacity of servers and networks (and, for geographical scalability, from high latency and mostly unreliable links)
 Simply improving their capacity (e.g., by increasing memory,
upgrading CPUs, or replacing network modules) is often a
solution, referred to as scaling up
 When it comes to scaling out, that is, expanding the distributed
system by essentially deploying more machines, there are three
techniques: hiding communication latencies, distribution of work,
and replication

Goals of a Distributed System (9)
 Hide Communication Latencies (for geographical scalability)
 Try to avoid waiting for responses to remote service requests
 Let the requester do other useful work
 i.e., construct requesting applications that use only
asynchronous communication instead of synchronous
communication; when a reply arrives the application is
interrupted
 This works well for batch processing and parallel applications, since independent tasks can be scheduled while another task waits for communication to complete; for non-parallel programs, multithreading can be used instead
 Hiding communication latencies is in general not applicable to interactive applications
 For interactive applications, try to reduce communication by moving part of the job to the client; e.g., when a user fills in a form to access a database, the client can check the entries before sending them
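The idea of not waiting for each reply can be sketched with Python's `asyncio`, where an event loop resumes the requester when a reply is ready (rather than interrupting it). The remote service is simulated with `asyncio.sleep`; `remote_request` and the latencies are hypothetical.

```python
import asyncio
import time


async def remote_request(name: str, latency: float) -> str:
    # Hypothetical remote service; asyncio.sleep stands in for network latency.
    await asyncio.sleep(latency)
    return f"reply from {name}"


async def main() -> list[str]:
    # All three requests are in flight at once, so the client waits roughly
    # max(latency) instead of sum(latency).
    return await asyncio.gather(
        remote_request("A", 0.2),
        remote_request("B", 0.2),
        remote_request("C", 0.2),
    )


start = time.perf_counter()
replies = asyncio.run(main())
elapsed = time.perf_counter() - start
print(replies)              # ['reply from A', 'reply from B', 'reply from C']
print(f"{elapsed:.1f} s")   # roughly 0.2 s rather than 0.6 s
```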
Goals of a Distributed System (10)
 e.g., checking the completeness of mandatory fields
[Figure: a form with the fields First Name, Last Name and E-Mail, passed between client and server through the steps Check Form and Process Form]
(a) A server checking the correctness of field entries (b) A client doing the job
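A minimal sketch of the client doing the job in (b): the client checks that the mandatory fields are filled in, and that the e-mail address looks plausible, before anything is sent to the server. The field names, the loose e-mail pattern and the address `mukeb@example.com` are hypothetical.

```python
import re

MANDATORY = ("first_name", "last_name", "email")
# Deliberately loose pattern: something@something.something
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def check_form(form: dict[str, str]) -> list[str]:
    """Client-side check; returns a list of problems (empty means OK to send)."""
    problems = [f"missing {field}" for field in MANDATORY if not form.get(field, "").strip()]
    email = form.get("email", "")
    if email and not EMAIL_RE.fullmatch(email):
        problems.append("malformed email")
    return problems


form = {"first_name": "MULUNEH", "last_name": "KEBEDE", "email": "mukeb@example.com"}
print(check_form(form))                       # []
print(check_form({"first_name": "MULUNEH"}))  # ['missing last_name', 'missing email']
```

The server should still validate its input, since a client-side check can be bypassed; the point is only to avoid a round trip for obviously incomplete forms.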

 Shipping code to the client is supported in Web applications, historically through Java applets and ActiveX controls (with some security issues), and today mainly through JavaScript
Goals of a Distributed System (11)
 Partitioning and Distribution
 Means splitting a component into smaller parts and spreading
those parts across the system
 e.g., DNS - Domain Name System (abebe@[Link])
 Divide the name space into nonoverlapping zones
 For details, see later in Chapter 5 - Naming

An example of dividing the (original) DNS name space into zones
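Partitioning the name space into nonoverlapping zones means each name is handled by exactly one zone, the most specific one covering it. A minimal sketch of that lookup by longest suffix match, with a hypothetical zone table (`example.edu` and the server names are made up); real DNS resolution proceeds by referrals starting at the root, which this sketch does not model.

```python
# Hypothetical zone table: each zone is identified by the domain suffix it covers.
ZONES = {
    "": "root-server",
    "edu": "edu-server",
    "example.edu": "example-edu-server",
    "cs.example.edu": "cs-example-edu-server",
}


def responsible_server(name: str) -> str:
    """Return the server of the most specific zone whose suffix matches `name`."""
    labels = name.lower().rstrip(".").split(".")
    # Try the longest suffix first, then strip labels from the left.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in ZONES:
            return ZONES[suffix]
    return ZONES[""]


print(responsible_server("www.cs.example.edu"))  # cs-example-edu-server
print(responsible_server("mail.example.edu"))    # example-edu-server
print(responsible_server("example.org"))         # root-server
```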

Goals of a Distributed System (12)
 Replication
 Replicate components across a distributed system to increase
availability and for load balancing, leading to better performance
 Also, in geographically widely dispersed systems, having a copy
nearby can hide much of the communication latency problems
 Replication is decided by the owner of a resource
 Caching (a special form of replication) also reduces
communication latency; decided by the user
 However, caching and replication may lead to consistency problems
(see Chapter 7 - Consistency and Replication)
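The consistency problem that comes with caching can be shown with a client-side cache whose entries expire after a fixed time-to-live (TTL). A minimal sketch: `TTLCache`, the TTL of 10 seconds and the explicit `now` argument (used instead of a real clock so the example is deterministic) are all assumptions for illustration.

```python
class TTLCache:
    """Client-side cache: entries expire `ttl` seconds after they were fetched."""

    def __init__(self, fetch, ttl: float) -> None:
        self.fetch = fetch   # function that contacts the (remote) server
        self.ttl = ttl
        self.entries = {}    # key -> (value, time fetched)

    def get(self, key, now: float):
        if key in self.entries:
            value, fetched_at = self.entries[key]
            if now - fetched_at < self.ttl:
                return value  # served locally: no network round trip
        value = self.fetch(key)
        self.entries[key] = (value, now)
        return value


server = {"x": 1}  # hypothetical master copy held by the server
cache = TTLCache(server.__getitem__, ttl=10.0)
print(cache.get("x", now=0.0))   # 1 (fetched from the server)
server["x"] = 2                  # the server's copy is updated
print(cache.get("x", now=5.0))   # 1 (stale: the entry has not expired yet)
print(cache.get("x", now=11.0))  # 2 (expired, so it is fetched again)
```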

Pitfalls when Developing Distributed Systems
 Many pitfalls stem from false assumptions made by first-time developers of distributed systems; these assumptions concern properties that are specific to distributed systems and do not arise in nondistributed applications
 The network is reliable (making it difficult to achieve failure
transparency)
 The network is secure
 The network is homogeneous
 The topology does not change
 Latency is zero
 Bandwidth is infinite
 Transport cost is zero
 There is one administrator
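Why "latency is zero" and "bandwidth is infinite" are dangerous assumptions can be seen with a simple model in which delivering a message takes the latency plus the transmission time (size divided by bandwidth). The link figures below are hypothetical.

```python
def transfer_time(size_bytes: int, latency_s: float, bandwidth_bps: float) -> float:
    """Time to deliver one message: latency plus transmission time (a simple model)."""
    return latency_s + size_bytes * 8 / bandwidth_bps


# Hypothetical WAN link: 50 ms latency, 100 Mbps bandwidth.
LATENCY, BANDWIDTH = 0.050, 100e6

small = transfer_time(1_000, LATENCY, BANDWIDTH)         # one 1 KB message
bulk = transfer_time(1_000_000_000, LATENCY, BANDWIDTH)  # one 1 GB transfer
chatty = 100 * small                                     # 100 sequential 1 KB messages

print(f"{small:.5f} s")   # 0.05008 s: almost all latency
print(f"{bulk:.2f} s")    # 80.05 s: almost all bandwidth
print(f"{chatty:.3f} s")  # 5.008 s: latency paid once per message
```

Small messages are dominated by latency and bulk transfers by bandwidth; a "chatty" design that sends many sequential messages pays the latency every time.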

1.3 Types of Distributed Systems
 Three types: distributed computing systems, distributed information
systems, and distributed pervasive systems
 Distributed Computing Systems
 Used for high-performance computing tasks
 Two groups: cluster computing and grid/cloud computing
 Cluster Computing
 A collection of similar workstations or PCs (homogeneous,
running the same operating system), closely connected by
means of a high-speed LAN
 Used for parallel programming in which a single compute-intensive program is run in parallel on multiple machines
 Each compute node runs the same operating system extended
with typical middleware functions for communication, storage,
fault tolerance, etc.
Types of Distributed Systems (2)

An example of a cluster computing system


 A master node runs middleware (containing libraries for parallel programs) and controls the other compute nodes; it
 Allocates tasks
 Provides an interface to users, etc.
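The master node's task allocation can be sketched as a master/worker pattern. In this sketch threads in one process stand in for compute nodes and a queue stands in for the cluster middleware; `master`, `worker` and the sum-of-squares task are hypothetical. Because of CPython's global interpreter lock, these threads show the allocation pattern, not a real parallel speedup.

```python
import queue
import threading


def worker(tasks: queue.Queue, results: queue.Queue) -> None:
    # A compute node: repeatedly take a task, run it, report the result.
    while True:
        task = tasks.get()
        if task is None:  # stop signal from the master: no more work
            break
        index, n = task
        results.put((index, sum(i * i for i in range(n))))


def master(inputs: list[int], num_nodes: int) -> list[int]:
    tasks, results = queue.Queue(), queue.Queue()
    nodes = [threading.Thread(target=worker, args=(tasks, results)) for _ in range(num_nodes)]
    for node in nodes:
        node.start()
    for index, n in enumerate(inputs):  # allocate tasks
        tasks.put((index, n))
    for _ in nodes:                     # one stop signal per node
        tasks.put(None)
    for node in nodes:
        node.join()
    collected = [results.get() for _ in inputs]
    # Results may arrive in any order; sort by task index before returning.
    return [value for _, value in sorted(collected)]


print(master([10, 100, 1000], num_nodes=2))  # [285, 328350, 332833500]
```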
Types of Distributed Systems (3)
 Grid Computing
 “Resource sharing and coordinated problem solving in dynamic,
multi-institutional virtual organizations” (Ian Foster)
 High degree of heterogeneity: no assumptions are made
concerning hardware, operating systems, networks,
administrative domains, security policies, etc.
 Globus is a software system for Grid Computing
 Read about the Globus Alliance at [Link]
 From the perspective of grid computing, a next logical step is to
simply outsource the entire infrastructure that is needed for
compute-intensive applications
 That is what cloud computing is all about: providing the facilities
to dynamically construct an infrastructure and compose what is
needed from available services

Types of Distributed Systems (4)
 Cloud Computing
 A recent and more general approach to grid computing
 It is an alternative to maintaining huge local infrastructures; one of the important reasons to migrate to a cloud environment is that it may be much cheaper than maintaining a local computing infrastructure
 Cloud computing is characterized by an easily usable and
accessible pool of virtualized resources
 Which and how resources are used can be configured
dynamically, providing the basis for scalability: if more work
needs to be done, a customer can simply acquire more
resources
 It is generally based on a pay-per-use model in which
guarantees are offered by means of customized service level
agreements (SLAs)
 Note: Details on how specific cloud computations are actually
carried out are generally hidden
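The "acquire more resources when more work needs to be done" idea, together with pay-per-use billing, can be sketched as follows. This is a simplification under stated assumptions: the capacity of 200 requests/s per virtual server and the price of 0.10 per instance-hour are hypothetical, and real providers use their own scaling policies and pricing.

```python
import math


def instances_needed(load_rps: float, capacity_per_instance_rps: float) -> int:
    # Enough virtual servers to carry the current load, but at least one.
    return max(1, math.ceil(load_rps / capacity_per_instance_rps))


def hourly_cost(instances: int, price_per_instance_hour: float) -> float:
    # Pay-per-use: the customer pays only for what is currently allocated.
    return instances * price_per_instance_hour


# Hypothetical figures: each virtual server handles 200 requests/s
# and costs 0.10 (in some currency) per hour.
for load in (150, 900, 4000):
    n = instances_needed(load, 200)
    print(f"{load} req/s -> {n} instance(s), {hourly_cost(n, 0.10):.2f} per hour")
```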

Types of Distributed Systems (5)

 Clouds are organized into four layers


a. Hardware: The lowest layer is formed by the means to manage
the necessary hardware: processors, routers, power and cooling
systems
 It is implemented at data centers and contains the resources
that customers never see directly
Types of Distributed Systems (6)
b. Infrastructure: Provides customers an infrastructure consisting of
virtual storage and computing resources
 Cloud computing revolves around allocating and managing virtual storage devices and virtual servers

c. Platform: This layer provides to a cloud computing customer what an operating system provides to application developers: the means to easily develop and deploy applications that need to run in a cloud
 In practice, an application developer is offered a vendor-specific API, which includes calls for uploading and executing a program in that vendor’s cloud

Types of Distributed Systems (7)
d. Application: Actual applications run in this layer and are offered to
users for further customization
 Well-known examples include those found in office suites (text
processors, spreadsheet applications, presentation applications,
and so on)
 They are executed in the vendor’s cloud
 They can be compared to the traditional suite of applications
that are shipped when installing an operating system

Types of Distributed Systems (8)
 Types of services
 Cloud-computing providers offer these layers to their customers
through various interfaces (including command-line tools,
programming interfaces, and Web interfaces), leading to three
different types of services
 Infrastructure as a Service (IaaS): covering the hardware and
infrastructure layer (basic infrastructure)
 Platform as a Service (PaaS): covering the platform layer
(system-level services)
 Software as a Service (SaaS): covering the application layer (actual applications)
 Some obstacles in cloud computing
 Provider/vendor lock-in
 Security and privacy issues
 Dependency on the availability of services
Types of Distributed Systems (9)
 Distributed Information Systems
 An organization may have many networked applications
 Problem: interoperability
 The issue is how to integrate applications into an enterprise-
wide information system
 A networked application simply consists of a server running that
application (often including a database) and making it available
to remote programs, called clients
 Such clients send a request to the server for executing a specific
operation, after which a response is sent back
 Integration at the lowest level: wrap a number of requests into a
single larger request and have it executed as a distributed
transaction; all or none of the requests would be executed
 Integration should also take place by letting applications
communicate directly with each other, i.e., Enterprise Application
Integration (EAI)
Types of Distributed Systems (10)
a. Transaction Processing Systems
 Consider database applications
 Special primitives are required to program transactions, supplied
either by the underlying distributed system or by the language
runtime system
 Exact list of primitives depends on the type of application;
procedure calls, ordinary statements, etc. can also be included
BEGIN_TRANSACTION: Mark the start of a transaction
END_TRANSACTION: Terminate the transaction and try to commit
ABORT_TRANSACTION: Kill the transaction and restore the old values
READ: Read data from a file, a table, etc.
WRITE: Write data to a file, a table, etc.
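The primitives in the table can be mimicked by a toy in-memory transaction that keeps its writes in a private workspace until it commits. A minimal sketch under that assumption (one transaction at a time, no concurrency control); the `Transaction` class and its method names are hypothetical, with comments mapping them to the primitives.

```python
class Transaction:
    """Toy transaction using a private workspace (a sketch, no concurrency control)."""

    def __init__(self, store: dict) -> None:  # BEGIN_TRANSACTION
        self.store = store
        self.workspace: dict = {}

    def read(self, key):                       # READ: own writes first, then the store
        if key in self.workspace:
            return self.workspace[key]
        return self.store[key]

    def write(self, key, value) -> None:       # WRITE: only the private copy changes
        self.workspace[key] = value

    def end(self) -> None:                     # END_TRANSACTION: commit
        self.store.update(self.workspace)
        self.workspace = {}

    def abort(self) -> None:                   # ABORT_TRANSACTION
        # The store was never modified, so restoring the old values
        # amounts to discarding the workspace.
        self.workspace = {}


store = {"seats": 10}

t1 = Transaction(store)
t1.write("seats", t1.read("seats") - 1)
t1.abort()
print(store)  # {'seats': 10}

t2 = Transaction(store)
t2.write("seats", t2.read("seats") - 1)
t2.end()
print(store)  # {'seats': 9}
```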

Types of Distributed Systems (11)
 The Transaction Model
 The model for transactions comes from the world of business
 A supplier and a retailer negotiate on
 Price
 Delivery date
 Quality
 etc.
 Until the deal is concluded they can continue negotiating or one
of them can terminate
 But once they have reached an agreement they are bound by
law to carry out their part of the deal
 Transactions between processes are similar to this scenario

Types of Distributed Systems (12)
 e.g., assume the following banking operation
 Withdraw an amount x from account 1
 Deposit the amount x to account 2
 What happens if there is a problem after the first activity is
carried out?
 Group the two operations into one transaction; either both are
carried out or neither
 We need a way to roll back when a transaction is not completed
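The banking example can be tried with Python's built-in `sqlite3` module, which accepts explicit `BEGIN`, `COMMIT` and `ROLLBACK` statements. In this sketch the table layout, the account numbers and the `crash_after_withdraw` flag (used to inject a failure between the two operations) are hypothetical.

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode, so the
# explicit BEGIN / COMMIT / ROLLBACK statements below control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")


def balances() -> dict[int, int]:
    return dict(conn.execute("SELECT id, balance FROM account"))


def transfer(x: int, src: int, dst: int, crash_after_withdraw: bool = False) -> bool:
    conn.execute("BEGIN")                                    # BEGIN_TRANSACTION
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (x, src))
        if crash_after_withdraw:
            # Hypothetical fault injected between the two operations.
            raise RuntimeError("failure after the withdrawal")
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (x, dst))
        conn.execute("COMMIT")                               # END_TRANSACTION
        return True
    except Exception:
        conn.execute("ROLLBACK")                             # ABORT_TRANSACTION
        return False


print(transfer(30, 1, 2), balances())                             # True {1: 70, 2: 80}
print(transfer(30, 1, 2, crash_after_withdraw=True), balances())  # False {1: 70, 2: 80}
```

In the second call the withdrawal is rolled back, so the total amount of money (150) is the same as before the failed transfer.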

Types of Distributed Systems (13)
 e.g., reserving seats from Manchester to Lalibela via Heathrow and Addis Ababa Bole airports

(a)
BEGIN_TRANSACTION
    reserve Man → Heathrow;
    reserve Heathrow → Bole;
    reserve Bole → Lalibela;
END_TRANSACTION

(b)
BEGIN_TRANSACTION
    reserve Man → Heathrow;
    reserve Heathrow → Bole;
    reserve Bole → Lalibela full ⇒ ABORT_TRANSACTION;

(a) Transaction to reserve three flights commits
(b) Transaction aborts when the third flight is unavailable

Types of Distributed Systems (14)
 Properties of transactions, often referred to as ACID
 Atomic: to the outside world, the transaction happens
indivisibly; a transaction either happens completely or not at all;
intermediate states are not seen by other processes
 Consistent: the transaction does not violate system invariants;
e.g., in an internal transfer in a bank, the amount of money in
the bank must be the same as it was before the transfer (the
law of conservation of money); this invariant may be violated for a
brief period during the transaction, but the violation is not visible to
other processes
 Isolated or Serializable: concurrent transactions do not interfere
with each other; if two or more transactions are running at the
same time, the final result must look as though all transactions
had run sequentially in some order
 Durable: once a transaction commits, the changes are
permanent; see later in Chapter 8 - Fault Tolerance
Types of Distributed Systems (15)
 Classification of Transactions
 A transaction could be flat or nested
 Flat Transaction
 Consists of a series of operations that satisfy the ACID
properties
 Simple and widely used but with some limitations
 Does not allow partial results to be committed or aborted
 i.e., atomicity is also partly a weakness
 In our airline reservation example, we may want to
accept the first two reservations and find an alternative
one for the last
 Some transactions may take too much time

Types of Distributed Systems (16)
 Nested Transaction
 Constructed from a number of subtransactions; it is logically decomposed into a hierarchy of subtransactions; e.g., the flight reservation can be split into three subtransactions, each accessing a different database
 The top-level transaction forks off children that run in parallel on different machines, to gain performance or for programming simplicity
 Each child may itself execute one or more subtransactions
 Permanence (durability) applies only to the top-level transaction; if the top-level transaction aborts, the commits of its children must be undone
[Figure: a nested transaction with two subtransactions accessing two different (independent) databases: an airline database and a hotel database]
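The rule that only the top-level commit is permanent can be sketched with a toy nested transaction. Assumptions: children run one after another in a single process (not in parallel on different machines), sibling conflicts are ignored, and the `Txn` class with its `child`, `commit` and `abort` methods is hypothetical.

```python
class Txn:
    """Toy nested transaction (a sketch): a child's commit only passes its
    writes up to the parent; only the top-level commit touches the database."""

    def __init__(self, db: dict, parent: "Txn | None" = None) -> None:
        self.db = db
        self.parent = parent
        self.workspace: dict = {}

    def child(self) -> "Txn":
        return Txn(self.db, parent=self)

    def write(self, key, value) -> None:
        self.workspace[key] = value

    def commit(self) -> None:
        if self.parent is not None:
            # Subtransaction commit: provisional, merged into the parent.
            self.parent.workspace.update(self.workspace)
        else:
            # Top-level commit: the only point at which changes become permanent.
            self.db.update(self.workspace)
        self.workspace = {}

    def abort(self) -> None:
        # Discards this transaction's writes, including those merged in by
        # committed children, so their commits are undone as well.
        self.workspace = {}


db = {"flight": "free", "hotel": "free"}

top = Txn(db)
airline, hotel = top.child(), top.child()
airline.write("flight", "booked")
airline.commit()
hotel.write("hotel", "booked")
hotel.commit()
top.abort()  # e.g., the trip is cancelled after both children committed
print(db)    # {'flight': 'free', 'hotel': 'free'}

top = Txn(db)
sub = top.child()
sub.write("flight", "booked")
sub.commit()
top.commit()
print(db)    # {'flight': 'booked', 'hotel': 'free'}
```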

Types of Distributed Systems (17)
b. Enterprise Application Integration
 How to integrate applications independent from their databases
 Application components should be able to communicate directly
with each other and not merely by means of the request/reply
behavior that was supported by transaction processing systems
 Applications can communicate with each other by means of middleware
 There are different communication models
 RPC (Remote Procedure Call)
 MOM (Message-Oriented Middleware)
 Multicast Communication
 See later in Chapter 4 - Communication
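Message-oriented middleware can be approximated, for illustration only, by an in-process queue: the sender puts messages on the queue and continues without waiting for a reply, and the receiver consumes them at its own pace. Real MOM keeps queues on separate machines and often stores messages persistently; `sales_app`, `inventory_app` and the `None` end marker are hypothetical.

```python
import queue
import threading

# An in-process queue stands in for the middleware's message queue (assumption).
orders: queue.Queue = queue.Queue()


def sales_app() -> None:
    # Sender: puts messages on the queue and carries on; it awaits no reply.
    for order_id in range(3):
        orders.put({"order": order_id, "item": "widget"})
    orders.put(None)  # hypothetical end-of-stream marker


def inventory_app(log: list) -> None:
    # Receiver: takes messages off the queue whenever it is ready.
    while (msg := orders.get()) is not None:
        log.append(f"reserving stock for order {msg['order']}")


log: list = []
receiver = threading.Thread(target=inventory_app, args=(log,))
receiver.start()
sales_app()
receiver.join()
print(log)  # ['reserving stock for order 0', 'reserving stock for order 1', 'reserving stock for order 2']
```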

Types of Distributed Systems (18)

Middleware as a communication facilitator in enterprise application integration

Types of Distributed Systems (19)
 Distributed Pervasive Systems
 The distributed systems discussed so far are characterized by
their stability; fixed nodes having high-quality connection to a
network
 There are also mobile and embedded computing devices which
are small, battery-powered, mobile, and with a wireless
connection leading to what are generally referred to as
pervasive systems

 There are three different types of pervasive systems (with an overlap between them)
 Ubiquitous computing systems, mobile systems, and sensor
networks
 For details, read pages 40 - 50

[ Diversion
 Different approaches to distribution - Lost in the forest of
distribution
 Distributed System
 N autonomous computers (sites): n administrators, n
data/control flows
 An interconnection network
 User view: one single (virtual) system
 (traditional) programmer view: client-server
 Parallel System
 1 computer, n nodes: one administrator, one scheduler, one
power source
 Memory: it depends (shared or separate)
 Programmer view: one single machine executing parallel
codes; various programming models (message passing,
distributed shared memory, …)
Diversion (2)
 Cluster Computing
 Use of PCs interconnected by a (high performance) network as
a parallel (cheap) machine
 Network Computing
 From LAN (cluster) computing to WAN computing
 Set of machines distributed over a MAN/WAN that are used to
execute parallel loosely coupled codes
 Depending on the infrastructure, network computing comes in
many flavours: grid/cloud computing, P2P, Internet computing,
etc.
a. Grid/Cloud Computing
 Grid Computing: “Resource sharing and coordinated problem
solving in dynamic, multi-institutional virtual organizations”
(Ian Foster)
 Cloud Computing: A general term for anything that involves
delivering hosted services over the Internet

Diversion (3)
b. Peer-to-Peer Computing
 A site is both client and server
 Application: mostly file sharing, but also others like Internet
Telephony (Skype)
 2 approaches:
 Centralized management: Napster
 Distributed management: Gnutella, Kazaa
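Distributed management as in Gnutella relies on flooding: a query is forwarded to all neighbours, with a time-to-live (TTL) counter limiting how many hops it travels. The sketch below simulates this over a hypothetical five-peer overlay; the peers, files and `flood_search` function are made up, and a real peer forwards queries itself instead of one central loop doing so.

```python
from collections import deque

# Hypothetical overlay: peer -> neighbours, and peer -> files it shares.
NEIGHBOURS = {
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
    "D": ["B", "C", "E"], "E": ["D"],
}
FILES = {"E": {"song.mp3"}, "C": {"notes.txt"}}


def flood_search(start: str, filename: str, ttl: int) -> set[str]:
    """Forward the query to all neighbours, decrementing the TTL at each hop;
    each peer handles a given query only once."""
    hits, seen = set(), {start}
    frontier = deque([(start, ttl)])
    while frontier:
        peer, remaining = frontier.popleft()
        if filename in FILES.get(peer, set()):
            hits.add(peer)
        if remaining == 0:
            continue  # TTL exhausted: do not forward any further
        for neighbour in NEIGHBOURS[peer]:
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, remaining - 1))
    return hits


print(flood_search("A", "song.mp3", ttl=3))  # {'E'}
print(flood_search("A", "song.mp3", ttl=2))  # set(): E is three hops away
```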
c. Internet Computing
 Use of (idle) computers interconnected by the Internet for processing high-throughput applications
 Programmer view: a single master, n servants
]
