Subject: DC Semester:VIII
File Models:
1. Unstructured and structured files
In the unstructured model, a file is an unstructured sequence of bytes. The interpretation
of the meaning and structure of the data stored in the files is up to the application (e.g.,
UNIX and MS-DOS). Most modern operating systems use the unstructured file model.
In the structured model (rarely used now), a file appears to the file server as an ordered
sequence of records. Records of different files of the same file system can be of different
sizes.
2. Mutable and immutable files
Based on the modifiability criteria, files are of two types, mutable and immutable. Most
existing operating systems use the mutable file model. An update performed on a file
overwrites its old contents to produce the new contents.
In the immutable model, rather than updating the same file, a new version of the file is
created each time a change is made to the file contents and the old version is retained
unchanged. The problems in this model are increased use of disk space and increased
disk activity.
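The immutable model described above can be sketched as a toy versioned store. This is an illustrative example, not code from any real file system; the class and method names are invented:

```python
# Toy sketch of the immutable file model: every update creates a new
# version and all earlier versions are retained unchanged.
# ImmutableStore and its methods are hypothetical names.
class ImmutableStore:
    def __init__(self):
        self._versions = {}   # filename -> list of version contents

    def write(self, name, data):
        # Instead of overwriting, append a new version of the file.
        self._versions.setdefault(name, []).append(data)
        return len(self._versions[name]) - 1   # new version number

    def read(self, name, version=-1):
        # Default: read the latest version; older versions stay readable.
        return self._versions[name][version]

store = ImmutableStore()
store.write("notes.txt", b"v1")
store.write("notes.txt", b"v2")   # the old version is still retained
```

The retained versions are exactly the source of the model's two drawbacks noted above: they consume extra disk space and generate extra disk activity.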
File Accessing Models
This depends on the method used for accessing remote files and the unit of data access.
1. Accessing remote files
A distributed file system may use one of the following models to service a client’s file
access request when the accessed file is remote:
a. Remote service model
Processing of a client's request is performed at the server's node. The client's request
for file access is delivered across the network as a message to the server, the server
machine performs the access request, and the result is sent back to the client. The design
needs to minimize the number of messages sent and the overhead per message.
b. Data-caching model
This model attempts to reduce the network traffic of the previous model by caching the
data obtained from the server node. It takes advantage of the locality found in file
access patterns. A replacement policy such as LRU (least recently used) is used to keep
the cache size bounded.
While this model reduces network traffic, it has to deal with the cache coherency
problem during writes: the local cached copy of the data needs to be updated, the
original file at the server node needs to be updated, and copies in any other caches need
to be updated.
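The data-caching model with an LRU replacement policy can be sketched as below. This is an illustrative example: `ClientCache` is an invented name, and `fetch_remote` is a hypothetical stand-in for the actual network request to the file server.

```python
from collections import OrderedDict

# Sketch of a client-side cache for the data-caching model, with LRU
# replacement to keep the cache size bounded. fetch_remote stands in
# for the remote-service request to the file server.
class ClientCache:
    def __init__(self, capacity, fetch_remote):
        self.capacity = capacity
        self.fetch_remote = fetch_remote
        self._cache = OrderedDict()          # key -> data, in LRU order

    def read(self, key):
        if key in self._cache:               # cache hit: no network access
            self._cache.move_to_end(key)
            return self._cache[key]
        data = self.fetch_remote(key)        # cache miss: contact the server
        self._cache[key] = data
        if len(self._cache) > self.capacity: # evict the least recently used
            self._cache.popitem(last=False)
        return data

calls = []
cache = ClientCache(2, lambda k: calls.append(k) or f"data:{k}")
cache.read("a")
cache.read("a")   # second read is served locally, no server contact
```

The locality assumption is visible in the example: the second `read("a")` causes no call to `fetch_remote`, which is where the model's network-traffic savings come from.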
Comparison of the data-caching model with the remote service model:
Prof.S.S. Aloni [Link] Computer Engineering
File Caching Schemes
Every distributed file system uses some form of caching. The reasons are:
1. Better performance, since repeated accesses to the same information can be handled locally,
avoiding additional network accesses and disk transfers. This is due to locality in file access patterns.
2. It contributes to the scalability and reliability of the distributed file system, since data can be
cached at the client node and served without contacting the server.
Key decisions to be made in file-caching scheme for distributed systems:
1. Cache location
2. Modification Propagation
3. Cache Validation
Cache Location
This refers to the place where the cached data is stored. Assuming that the original location of a
file is on its server's disk, there are three possible cache locations in a distributed file system:
1. Server's main memory
In this case a cache hit costs one network access.
It does not contribute to the scalability and reliability of the distributed file system,
since every cache hit requires accessing the server.
Advantages:
a. Easy to implement
b. Totally transparent to clients
c. Easy to keep the original file and the cached data consistent.
2. Client's disk
In this case a cache hit costs one disk access. This is somewhat slower than having the
cache in the server's main memory, which is also simpler to implement.
Advantages:
a. Provides reliability against crashes, since modifications to cached data survive a
client crash; they would be lost if the cache were kept in main memory.
b. Large storage capacity.
c. Contributes to scalability and reliability because on a cache hit the access request
can be serviced locally without the need to contact the server.
3. Client's main memory
Eliminates both the network access cost and the disk access cost. This technique is not preferred
to a client's disk cache when a large cache size and increased reliability of cached data are
desired.
Advantages:
a. Maximum performance gain.
b. Permits workstations to be diskless.
c. Contributes to reliability and scalability.
Modification Propagation
When the cache is located on client nodes, a file's data may simultaneously be cached on
multiple nodes. The caches can become inconsistent when the file data is changed by
one of the clients and the corresponding data cached at other nodes is not changed or discarded.
There are two design issues involved:
1. When to propagate modifications made to cached data to the corresponding file server.
2. How to verify the validity of cached data.
The modification propagation scheme used has a critical effect on the system's performance and
reliability. Techniques used include:
a. Write-through scheme.
When a cache entry is modified, the new value is immediately sent to the server for updating the
master copy of the file.
Advantage:
High degree of reliability and suitability for UNIX-like semantics.
This is due to the fact that the risk of updated data getting lost in the event of a client crash is
very low since every modification is immediately propagated to the server having the master
copy.
Disadvantage:
This scheme is suitable only where the ratio of read-to-write accesses is fairly large; it does
not reduce network traffic for writes.
This is because every write access has to wait until the data is written to the master
copy on the server. Hence the advantages of data caching apply only to read accesses, because the
server is involved in all write accesses.
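A minimal sketch of the write-through scheme, under the assumption that a plain `server` dictionary stands in for the remote master copy (all names here are illustrative):

```python
# Write-through: each write updates the local cache entry and is
# immediately propagated to the master copy at the server. The
# 'server' dict stands in for the remote master copy.
server = {}   # master copies held at the server
cache = {}    # client-side cache

def write_through(key, value):
    cache[key] = value    # update the cached entry
    server[key] = value   # immediately update the master copy

write_through("f1", b"new")
```

Because the server copy is updated on every call, a client crash loses almost nothing, which is the reliability property claimed above; the price is that every write still crosses the network.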
b. Delayed-write scheme.
To reduce network traffic for writes the delayed-write scheme is used. In this case, the new data
value is only written to the cache and all updated cache entries are sent to the server at a later
time.
There are three commonly used delayed-write approaches:
i. Write on ejection from cache
Modified data in the cache is sent to the server only when the cache-replacement policy has
decided to eject it from the client's cache. This can result in good performance, but there can
be a reliability problem since some server data may be outdated for a long time.
ii. Periodic write
The cache is scanned periodically and any cached data that has been modified since the
last scan is sent to the server.
iii. Write on close
Modification to cached data is sent to the server when the client closes the file. This does
not help much in reducing network traffic for those files that are open for very short
periods or are rarely modified.
Advantages of delayed-write scheme:
1. Write accesses complete more quickly because the new value is written only to the client
cache. This results in a performance gain.
2. Modified data may be deleted before it is time to send it to the server (e.g.,
temporary data). Since such modifications need not be propagated to the server, this results in a
major performance gain.
3. Gathering of all file updates and sending them together to the server is more efficient
than sending each update separately.
Disadvantage of delayed-write scheme:
Reliability can be a problem since modifications not yet sent to the server from a client’s cache
will be lost if the client crashes.
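The delayed-write scheme, in its periodic-write form, can be sketched as follows. The `server`, `cache`, and `dirty` names are illustrative, with a plain dictionary standing in for the remote master copies:

```python
# Delayed write (periodic-write variant): writes go only to the local
# cache; dirty entries are flushed to the server in a batch later.
server = {}     # master copies at the server
cache = {}      # client-side cache
dirty = set()   # keys modified since the last flush

def delayed_write(key, value):
    cache[key] = value   # fast: touches only the local cache
    dirty.add(key)

def flush():             # run periodically (or on close / on ejection)
    for key in dirty:
        server[key] = cache[key]   # gather and send updates together
    dirty.clear()

delayed_write("f1", b"v1")
delayed_write("f1", b"v2")   # overwritten before the flush: only v2 is ever sent
flush()
```

The example shows both sides of the trade-off discussed above: the intermediate value `b"v1"` never reaches the server (a performance gain), but everything in `dirty` would be lost if the client crashed before `flush()` ran.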
Cache Validation schemes
The modification propagation policy only specifies when the master copy of a file on the server
node is updated upon modification of a cache entry. It does not tell anything about when the file
data residing in the cache of other nodes is updated.
File data may simultaneously reside in the caches of multiple nodes. A client's cache entry
becomes stale as soon as some other client modifies the data corresponding to that cache entry in
the master copy of the file on the server.
It becomes necessary to verify if the data cached at a client node is consistent with the master
copy. If not, the cached data must be invalidated and the updated version of the data must be
fetched again from the server.
There are two approaches to verify the validity of cached data: the client-initiated approach and
the server-initiated approach.
Client-initiated approach
The client contacts the server and checks whether its locally cached data is consistent with the
master copy. Two approaches may be used:
1. Checking before every access.
This defeats the purpose of caching because the server needs to be contacted on every
access.
2. Periodic checking.
A check is initiated every fixed interval of time.
Disadvantage of the client-initiated approach: if the frequency of the validity check is high, the
cache validation approach generates a large amount of network traffic and consumes precious server
CPU cycles.
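One common way to implement the client-initiated check is to compare version numbers; the sketch below assumes that scheme, and all names in it (`server_versions`, `validate`, etc.) are invented for illustration:

```python
# Client-initiated cache validation via version numbers: at each check
# the client compares its cached version against the server's version
# and refetches the data when the cached copy is stale.
server_versions = {"f1": 3}          # server's current version per file
server_data = {"f1": b"latest"}      # server's master-copy data
cache = {"f1": (2, b"stale")}        # client cache: (cached version, data)

def validate(key):
    cached_version, data = cache[key]
    if cached_version != server_versions[key]:   # stale: invalidate
        data = server_data[key]                  # refetch updated data
        cache[key] = (server_versions[key], data)
    return data

validate("f1")   # detects version 2 < 3, refetches from the server
```

Note that even when the cached copy turns out to be valid, the version comparison itself requires contacting the server, which is exactly why frequent checks generate the network traffic described above.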
Server-initiated approach
A client informs the file server when opening a file, indicating whether a file is being opened for
reading, writing, or both. The file server keeps a record of which client has which file open and
in what mode.
The server monitors the file usage modes of the different clients and reacts whenever it
detects a potential for inconsistency. For example, if a file is open for reading, other clients may be
allowed to open it for reading, but opening it for writing cannot be allowed. Likewise, a new client
cannot open a file in any mode if the file is already open for writing.
When a client closes a file, it notifies the server, sending along any modifications made to
the file. The server then updates its record of which client has which file open in which mode.
When a new client makes a request to open an already open file and if the server finds that the
new open mode conflicts with the already open mode, the server can deny the request, queue the
request, or disable caching by asking all clients having the file open to remove that file from their
caches.
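The server-side bookkeeping described above can be sketched as a simple open-mode table. This is an illustrative example only; `open_table` and `request_open` are invented names, and of the three possible reactions to a conflict (deny, queue, disable caching) the sketch implements only denial:

```python
# Server-initiated approach: the (stateful) server records which client
# has which file open in what mode, and denies an open request that
# would create a read/write inconsistency.
open_table = {}   # filename -> list of (client, mode) entries

def request_open(client, name, mode):   # mode: "r" or "w"
    entries = open_table.setdefault(name, [])
    # A writer excludes everyone else; any open file excludes a new writer.
    if any(m == "w" for _, m in entries) or (mode == "w" and entries):
        return False   # deny (a real server could also queue the request)
    entries.append((client, mode))
    return True

request_open("c1", "f", "r")        # granted
request_open("c2", "f", "r")        # concurrent readers are allowed
ok = request_open("c3", "f", "w")   # denied: file already open for reading
```

The `open_table` is precisely the per-client state that makes this approach require stateful file servers, the disadvantage noted below.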
Note: On the web, the cache is used in read-only mode so cache validation is not an issue.
Disadvantage: It requires that file servers be stateful. Stateful file servers have a distinct
disadvantage over stateless file servers in the event of a failure.
File Replication
High availability is a desirable feature of a good distributed file system and file replication is the
primary mechanism for improving file availability.
A replicated file is a file that has multiple copies, with each copy stored on a separate file server.
Difference Between Replication and Caching
1. A replica of a file is associated with a server, whereas a cached copy is normally
associated with a client.
2. The existence of a cached copy is primarily dependent on the locality in file access
patterns, whereas the existence of a replica normally depends on availability and
performance requirements.
3. As compared to a cached copy, a replica is more persistent, widely known, secure,
available, complete, and accurate.
4. A cached copy is contingent upon a replica. Only by periodic revalidation with respect
to a replica can a cached copy be useful.
Advantages of Replication
1. Increased Availability:
Alternate copies of replicated data can be used when the primary copy is unavailable.
2. Increased Reliability:
Due to the presence of redundant data files in the system, recovery from catastrophic
failures (e.g., hard drive crash) becomes possible.
3. Improved response time:
It enables data to be accessed either locally or from a node whose access time is lower
than that of the primary copy.
4. Reduced network traffic:
If a file's replica is available on a file server that resides on the client's node, the client's
access request can be serviced locally, resulting in reduced network traffic.
5. Improved system throughput:
Several clients' requests to access a file can be serviced in parallel by different servers,
resulting in improved system throughput.
6. Better scalability:
Multiple file servers are available to service client requests due to file replication.
This improves scalability.
Replication Transparency
Replication of files should be transparent to the users so that multiple copies of a replicated file
appear as a single logical file to its users. This calls for the assignment of a single identifier/name
to all replicas of a file.
In addition, replication control should be transparent, i.e., the number and locations of replicas of
a replicated file should be hidden from the user. Thus, replication control must be handled
automatically in a user-transparent manner.
Multicopy Update Problem
Maintaining consistency among copies when a replicated file is updated is a major design issue
of a distributed file system that supports file replication.
1. Read-only replication
In this case the update problem does not arise. This method is too restrictive.
2. Read-Any-Write-All Protocol
A read operation on a replicated file is performed by reading any copy of the file and a
write operation by writing to all copies of the file. Before updating any copy, all copies
need to be locked, then they are updated, and finally the locks are released to complete
the write.
Disadvantage: A write operation cannot be performed if any of the servers having a copy
of the replicated file is down at the time of the write operation.
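The read-any-write-all protocol can be sketched over a small set of replicas. This is an illustrative example (the locking step is elided, and `replicas`, `write_all`, etc. are invented names):

```python
# Read-any-write-all: a read uses any one copy; a write must update
# every copy, so it fails if any replica server is down.
replicas = [{"up": True, "data": {}} for _ in range(3)]

def read(key):
    for r in replicas:
        if r["up"]:                # read any available copy
            return r["data"].get(key)

def write_all(key, value):
    if not all(r["up"] for r in replicas):   # any server down -> write fails
        return False
    for r in replicas:             # (locking elided) update all the copies
        r["data"][key] = value
    return True

write_all("f", b"v1")
replicas[1]["up"] = False
ok = write_all("f", b"v2")         # fails while one replica server is down
```

The failed second write demonstrates the stated disadvantage: availability of reads improves with more replicas, but availability of writes gets worse, since every copy must be reachable.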
3. Available-Copies Protocol
A read operation on a replicated file is performed by reading any copy of the file and a
write operation by writing to all available copies of the file. Thus if a file server with a
replica is down, its copy is not updated. When the server recovers after a failure, it brings
itself up to date by copying from other servers before accepting any user request.
4. Primary-Copy Protocol
For each replicated file, one copy is designated as the primary copy and all the others are
secondary copies. Read operations can be performed using any copy, primary or
secondary. But write operations are performed only on the primary copy. Each server
having a secondary copy updates its copy either by receiving notification of changes from
the server having the primary copy or by requesting the updated copy from it.
E.g., for UNIX-like semantics, when the primary-copy server receives an update request,
it immediately orders all the secondary-copy servers to update their copies. Some form of
locking is used and the write operation completes only when all the copies have been
updated. In this case, the primary-copy protocol is simply another method of
implementing the read-any-write-all protocol.
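The primary-copy protocol under UNIX-like semantics can be sketched as below, with plain dictionaries standing in for the primary and secondary servers (all names are illustrative, and locking is again elided):

```python
# Primary-copy protocol: writes go only to the primary copy, which then
# orders every secondary to update; reads may use any copy.
primary = {}
secondaries = [{}, {}]

def write(key, value):
    primary[key] = value       # the write is performed only on the primary
    for s in secondaries:      # primary immediately orders the secondaries
        s[key] = value         # to update (UNIX-like semantics)

def read(key, copy=None):
    # Any copy, primary or secondary, may serve a read.
    return (copy if copy is not None else primary)[key]

write("f", b"v1")
```

With immediate propagation as shown, the effect is the same as read-any-write-all; a lazier variant would let secondaries pull the updated copy later, trading consistency for write latency.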