2 GFS

The Google File System (GFS) is a distributed file system designed and implemented by Google to meet the rapidly growing data storage needs of its data processing systems. GFS provides fault tolerance, high performance, reliability, availability and scalability on inexpensive commodity hardware. It uses a master-slave architecture with a single master and multiple chunkservers. Files are divided into fixed-size chunks which are replicated and stored across chunkservers for reliability. The master manages metadata and controls chunk replication and placement for load balancing.

Uploaded by

Vijay


Google File System

By Dinesh Amatya
Google File System (GFS)
Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung

• Designed and implemented to meet the rapidly growing demands of Google's data processing needs
• A scalable distributed file system for large distributed data-intensive applications
• Provides fault tolerance while running on inexpensive commodity hardware
• Performance, reliability, availability, scalability
GFS Design Overview: Assumptions
• Component failure is the norm: disks, machines and links are expected to fail, so the system must constantly monitor itself and detect, tolerate and recover from failures.
• Files are huge: the system stores millions of files, most of which are multi-GB, so large files must be stored efficiently.
• Most modifications are appends: random writes are practically non-existent. Once written, files are seldom modified and are mostly read sequentially.
• Two types of reads: large streaming reads and small random reads
• Must efficiently implement well-defined semantics for multiple clients that concurrently append to the same file
• Sustained bandwidth is more important than low latency
GFS Design Overview: Interface

• Provides a familiar file system interface
– Create, Delete, Open, Close, Read, Write
• Files are organized hierarchically in directories and identified by pathnames
• GFS also has snapshot and record append operations
• Snapshot creates a copy of a file or a directory tree at low cost
• Record append allows multiple clients to append data to the same file concurrently while guaranteeing the atomicity of each individual client's append
GFS Design Overview: Architecture

→ Single master
→ Multiple chunkservers
→ Multiple clients
→ Files are divided into fixed-size chunks
→ Each chunk is identified by a chunk handle assigned by the master
→ Master
- maintains metadata (namespace, access control information, mapping from files to chunks, current locations of chunks)
- controls activities such as chunk lease management, garbage collection of orphaned chunks, and chunk migration between chunkservers
- communicates with each chunkserver through HeartBeat messages to give it instructions and collect its state
GFS Design Overview: Architecture

→ GFS client code linked into each application implements the file system API and communicates with the master and chunkservers to read or write data on behalf of the application
→ Clients interact with the master for metadata operations, but all data-bearing communication goes directly to the chunkservers
→ Neither the client nor the chunkserver caches file data (clients do cache metadata)
GFS Design Overview: Single Master
→ Vastly simplifies the design
→ Enables the master to make sophisticated chunk placement and replication decisions using global knowledge
→ However, its involvement in reads and writes must be minimized so that it does not become a bottleneck
→ Clients never read or write file data through the master. Instead, a client asks the master which chunkservers it should contact, caches this information for a limited time, and interacts with the chunkservers directly for many subsequent operations.
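
The read path above can be sketched as follows. This is only an illustration, not GFS code: `FakeMaster`, `Client`, and the 60-second cache lifetime are hypothetical names and values chosen for the example. The one grounded detail is that a fixed chunk size lets the client turn a byte offset into a chunk index before asking the master.

```python
import time

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB fixed chunk size


def chunk_index(byte_offset: int) -> int:
    """Translate a byte offset within a file into a chunk index."""
    return byte_offset // CHUNK_SIZE


class FakeMaster:
    """Hypothetical stand-in for the master: serves metadata only, never file data."""

    def __init__(self, table):
        self.table = table  # (path, chunk index) -> (chunk handle, replica locations)
        self.lookups = 0    # counts how often clients had to contact the master

    def lookup(self, path, index):
        self.lookups += 1
        return self.table[(path, index)]


class Client:
    """Hypothetical client that caches chunk locations for a limited time."""

    def __init__(self, master, ttl_seconds=60.0):  # cache lifetime is an assumption
        self.master = master
        self.ttl = ttl_seconds
        self.cache = {}  # (path, index) -> (expiry time, handle, locations)

    def locate(self, path, byte_offset):
        key = (path, chunk_index(byte_offset))
        entry = self.cache.get(key)
        if entry is None or entry[0] <= time.monotonic():
            # Cache miss or expired entry: one metadata round trip to the master.
            handle, locations = self.master.lookup(*key)
            entry = (time.monotonic() + self.ttl, handle, locations)
            self.cache[key] = entry
        return entry[1], entry[2]
```

Repeated reads within the same chunk hit the cache, so the master sees one lookup per chunk rather than one per read.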
GFS Design Overview: Chunk Size

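→ 64 MB, much larger than typical file system block sizes

Why?
- Advantages
. Reduces client-master interaction
. Reduces network overhead (a client can keep a persistent connection to a chunkserver)
. Reduces the size of the metadata stored on the master
- Disadvantage
. Hot spots: a small file may consist of a single chunk, so many clients accessing it can overload the chunkservers that store it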
GFS Design Overview: Metadata

→ Three major types

- File and chunk namespaces
- File to chunk mapping
- Location of chunk replicas
→ All kept in memory
- Fast
- Quick global scans (garbage collection, reorganization)
- Less than 64 bytes of metadata per 64 MB chunk
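
A back-of-the-envelope calculation shows why keeping metadata in memory is feasible. The sketch below treats 64 bytes per chunk as an upper bound, assumes every chunk is full, and ignores namespace metadata; the function name is made up for the example.

```python
CHUNK_SIZE = 64 * 2**20    # 64 MB per chunk
METADATA_PER_CHUNK = 64    # bytes per chunk, treated as an upper bound


def metadata_bytes(data_bytes: int) -> int:
    """Upper bound on master memory for the chunk metadata covering data_bytes."""
    chunks = -(-data_bytes // CHUNK_SIZE)  # ceiling division
    return chunks * METADATA_PER_CHUNK


one_pib = 2**50
print(metadata_bytes(one_pib) / 2**30)  # 1.0, i.e. about 1 GiB of metadata per PiB of data
```

The ratio is roughly one byte of chunk metadata per megabyte of file data, which is why a single machine's memory can hold the map for a very large cluster.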
GFS Design Overview: Metadata

→ Chunk locations
- not stored persistently; the master obtains them by polling chunkservers at startup
- kept up to date through HeartBeat messages
→ Operation log
- contains a historical record of critical metadata changes
- the only persistent record of metadata
- the master recovers its file system state by replaying this log
- is critical, hence replicated on multiple remote machines
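GFS Design Overview: Consistency Model
→ A file region can be:
+ consistent: all clients see the same data regardless of which replica they read from
+ defined: consistent, and clients see what the mutation wrote in its entirety; this is the result when a mutation succeeds without interference from concurrent writers
+ undefined (but consistent): the result of concurrent successful mutations; all clients see the same data, but it may not reflect what any one mutation wrote
+ inconsistent: the result of a failed mutation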
GFS Design Overview: Consistency Model
→ After a sequence of successful mutations, the mutated file region is guaranteed to be defined
→ GFS achieves this by
- applying mutations to a chunk in the same order on all its replicas
- using chunk version numbers to detect any replica that has become stale because it missed mutations while its chunkserver was down
GFS Design Overview: System Interaction
→ Leases and mutation order
→ Data flow
→ Atomic record appends
→ Snapshot
System Interaction : Lease

→ Mutation: an operation that changes the contents or metadata of a chunk

→ Leases maintain a consistent mutation order across replicas
→ Designed to minimize management overhead at the master
→ Master grants a chunk lease to one replica, which is called the primary
→ Primary picks a serial order for all mutations to the chunk, and all replicas follow it
→ Initial lease timeout of 60 seconds; the primary can request extensions
→ The master can revoke a lease before it expires
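
The lease mechanism can be illustrated with a small in-memory model. This is a sketch under simplifying assumptions: the `Replica` and `Primary` classes are hypothetical, time is passed in explicitly, and network failures and the master's role in granting the lease are left out.

```python
LEASE_SECONDS = 60  # initial lease timeout


class Replica:
    """Hypothetical replica that applies mutations strictly in serial order."""

    def __init__(self):
        self.applied = []  # mutations in the order they were applied

    def apply(self, serial, mutation):
        if serial != len(self.applied):
            raise ValueError("mutation arrived out of order")
        self.applied.append(mutation)


class Primary(Replica):
    """The replica currently holding the lease for a chunk."""

    def __init__(self, secondaries, now):
        super().__init__()
        self.secondaries = secondaries
        self.lease_expiry = now + LEASE_SECONDS

    def extend_lease(self, now):
        # In this sketch an extension simply restarts the timeout.
        self.lease_expiry = now + LEASE_SECONDS

    def mutate(self, mutation, now):
        if now >= self.lease_expiry:
            raise RuntimeError("lease expired; the client must ask the master again")
        serial = len(self.applied)       # the primary picks the serial order
        self.apply(serial, mutation)
        for replica in self.secondaries:  # secondaries follow the same order
            replica.apply(serial, mutation)
        return serial
```

Because only the lease holder assigns serial numbers, every replica ends up with the same mutation sequence without the master being involved in each write.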
System Interaction : Data Flow

→ Decouples data flow from control flow

→ Control flow
- Client → primary → secondaries
→ Data flow
- Pushed linearly along a carefully picked chain of chunkservers
. Each machine forwards the data to the closest machine that has not yet received it
. Pipelined to exploit full-duplex links
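
The benefit of pipelining can be estimated with the ideal transfer time given in the GFS paper, B/T + R·L, for B bytes pushed to R replicas over links of throughput T with per-hop latency L. The sketch below compares it with a store-and-forward baseline; that baseline and both function names are assumptions made for the comparison, not something taken from the slides.

```python
def pipelined_transfer_seconds(num_bytes, replicas, throughput_bps, hop_latency_s):
    """Ideal time to push num_bytes through a pipelined chain: B/T + R*L."""
    return num_bytes * 8 / throughput_bps + replicas * hop_latency_s


def sequential_transfer_seconds(num_bytes, replicas, throughput_bps, hop_latency_s):
    """Assumed baseline: each hop receives the full payload before forwarding it."""
    return replicas * (num_bytes * 8 / throughput_bps + hop_latency_s)


# 1 MB to 3 replicas over 100 Mbps links with 1 ms latency per hop.
print(pipelined_transfer_seconds(1_000_000, 3, 100e6, 0.001))
print(sequential_transfer_seconds(1_000_000, 3, 100e6, 0.001))
```

With these inputs the pipelined chain takes roughly 83 ms, against roughly 243 ms for the store-and-forward baseline, because each chunkserver starts forwarding as soon as it begins receiving.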
System Interaction : Data Flow

Fig: Write Control and Data Flow

System Interaction : Atomic Record Appends
→ Multiple clients can append to the same file concurrently; each client's append is atomic
→ The client supplies only the data; GFS chooses the offset, appends the record atomically at least once, and returns that offset to the client
→ Follows the same lease-based control flow and pipelined data flow as a write
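System Interaction : Snapshot
→ Makes a copy of a file or a directory tree almost instantaneously
→ Steps
- Master revokes outstanding leases on the chunks of the files to be snapshotted
- Logs the operation to disk
- Duplicates the metadata in memory; a chunk is copied only when it is next written (copy-on-write)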
GFS Design Overview: Master Operation
→ executes all namespace operations
→ manages chunk replicas throughout the system
→ makes placement decisions and creates new chunks
→ ensures chunks are fully replicated
→ balances load across all chunkservers
→ reclaims unused storage
Master Operation: Namespace management and locking
→ logically represents its namespace as a lookup table mapping full pathnames to metadata
→ each node in the namespace tree (a file or directory name) has a read-write lock
→ to access /d1/d2/leaf, an operation acquires read locks on /d1 and /d1/d2, and a read or write lock on /d1/d2/leaf
→ concurrent mutations in the same directory are allowed. Each operation acquires
- a read lock on the directory name
- a write lock on the file name
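
The locking rule can be made concrete with a short sketch that computes the lock set for a pathname and checks whether two operations conflict. The function names `locks_for` and `conflicts` are hypothetical, and the model tracks only which locks are requested, not how they are acquired.

```python
def locks_for(path: str, write: bool):
    """Return the (pathname, mode) pairs an operation on `path` must acquire."""
    parts = path.strip("/").split("/")
    ancestors = ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
    locks = [(p, "read") for p in ancestors]             # read locks on every ancestor
    locks.append((path, "write" if write else "read"))   # read or write lock on the leaf
    return locks


def conflicts(a, b) -> bool:
    """Two lock sets conflict if they share a pathname and either side wants a write lock."""
    held = dict(a)
    return any(p in held and "write" in (mode, held[p]) for p, mode in b)


create_f1 = locks_for("/d1/f1", write=True)
create_f2 = locks_for("/d1/f2", write=True)
print(conflicts(create_f1, create_f2))  # False: both hold only a read lock on /d1
```

Two creations in the same directory proceed in parallel because neither needs a write lock on the directory itself, while an operation that write-locks /d1 would conflict with both.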
Master Operation: Replica Placement
→ hundreds of chunkservers spread across many racks
→ two purposes: maximize data reliability and availability, and maximize network bandwidth utilization
→ spread chunk replicas across racks
→ tradeoff: write traffic for a chunk has to flow through multiple racks
Master Operation: Creation, Re-replication, Re-balancing
→ factors considered when placing newly created replicas
- place new replicas on chunkservers with below-average disk utilization
- limit the number of "recent" creations on each chunkserver
- spread replicas of a chunk across racks
→ re-replication priority
- chunks that are blocking client progress
- chunks of live files over chunks that belong to recently deleted files
- how far the chunk is from its replication goal
Master Operation: Creation, Re-replication, Re-balancing
→ master picks the highest-priority chunk and instructs some chunkserver to copy the chunk data directly from an existing valid replica
→ periodically examines the current replica distribution and moves replicas for better disk space and load balancing
→ prefers to remove replicas on chunkservers with below-average free space
→ gradually fills up a new chunkserver rather than instantly swamping it with new chunks
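
One way to picture the re-replication priority is as a sort key over under-replicated chunks. The sketch below is an assumption-heavy illustration: the `ChunkState` record is hypothetical, and the slides name the factors but not how they are weighted against each other, so the ordering of the tuple is a guess.

```python
from dataclasses import dataclass


@dataclass
class ChunkState:
    """Hypothetical view of a chunk as seen by the master."""
    handle: str
    live_replicas: int
    goal: int = 3                  # replication goal; 3 is assumed as the default
    file_deleted: bool = False     # chunk belongs to a recently deleted file
    blocking_client: bool = False  # a client is waiting on this chunk


def priority_key(chunk: ChunkState):
    """Smaller tuples sort first. The relative weighting of factors is an assumption."""
    deficit = chunk.goal - chunk.live_replicas
    return (not chunk.blocking_client, chunk.file_deleted, -deficit)


def next_to_replicate(chunks):
    """Pick the highest-priority under-replicated chunk, or None if all meet their goal."""
    under = [c for c in chunks if c.live_replicas < c.goal]
    return min(under, key=priority_key) if under else None
```

Under this key, a chunk blocking a client is cloned first, and among the rest a live chunk that has lost two replicas is handled before one that has lost a single replica.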
Master Operation: Garbage Collection
→ GFS does not immediately reclaim the physical storage when a file is deleted
→ does so lazily during regular garbage collection
→ the deleted file is renamed to a hidden name
→ it can still be read under the new, special name and can be undeleted by renaming it back
→ removed during the master's regular namespace scan once the hidden file has existed for more than 3 days
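
Lazy deletion can be modelled with a toy namespace. This is a sketch, not the master's actual data structure: the `Namespace` class and the hidden-name scheme are hypothetical, and timestamps are passed in as plain seconds.

```python
GRACE_SECONDS = 3 * 24 * 3600  # three-day grace period


class Namespace:
    """Hypothetical in-memory namespace with lazy deletion."""

    def __init__(self):
        self.files = {}   # visible path -> metadata
        self.hidden = {}  # hidden name -> (original path, deletion time, metadata)

    def delete(self, path, now):
        """Deletion only renames the file to a hidden name; nothing is reclaimed yet."""
        hidden_name = f"{path}.deleted.{now}"  # naming scheme is made up for this sketch
        self.hidden[hidden_name] = (path, now, self.files.pop(path))
        return hidden_name

    def undelete(self, hidden_name):
        """Renaming the hidden file back restores it."""
        path, _, metadata = self.hidden.pop(hidden_name)
        self.files[path] = metadata

    def scan(self, now):
        """Regular background scan: drop hidden files older than the grace period."""
        expired = [name for name, (_, deleted_at, _) in self.hidden.items()
                   if now - deleted_at > GRACE_SECONDS]
        for name in expired:
            del self.hidden[name]
        return expired
```

Until a scan runs after the grace period has passed, the file's metadata is intact, so an accidental deletion is cheap to reverse.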
Master Operation: Stale Replica Detection
→ replicas may become stale if a chunkserver fails and misses mutations to the chunk while it is down
→ for each chunk, the master maintains a chunk version number to distinguish between up-to-date and stale replicas
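
Version-based staleness detection can be sketched as below. The `ChunkRecord` class is hypothetical, and the sketch assumes the version is bumped whenever a new lease is granted, with only reachable, up-to-date replicas being told the new number.

```python
class ChunkRecord:
    """Hypothetical master-side record of one chunk and its replicas' versions."""

    def __init__(self, replicas):
        self.version = 1
        self.replica_versions = {cs: 1 for cs in replicas}

    def grant_lease(self, reachable):
        """Bump the chunk version and inform reachable replicas that were up to date."""
        current = self.version
        self.version += 1
        for cs in reachable:
            if self.replica_versions[cs] == current:
                self.replica_versions[cs] = self.version

    def stale(self):
        """Chunkservers whose replica version lags behind the master's."""
        return sorted(cs for cs, v in self.replica_versions.items() if v < self.version)


record = ChunkRecord(["cs1", "cs2", "cs3"])
record.grant_lease(reachable=["cs1", "cs2"])  # cs3 is down and misses the bump
print(record.stale())  # ['cs3']
```

A replica that was down keeps its old version number, so the master can recognize it as stale when the chunkserver reports in again.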
GFS Design Overview: Fault Tolerance
→ one of the greatest challenges in designing the system is dealing with frequent component failures
Fault Tolerance: High Availability

→ Fast Recovery
→ Chunk Replication
→ Master Replication
Fault Tolerance: Data Integrity

→ uses checksumming to detect corruption of stored data
→ each chunk is broken up into 64 KB blocks, each with a corresponding 32-bit checksum
→ on a read, the chunkserver verifies the checksums of the blocks being read before returning data
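
A minimal sketch of block-level checksumming follows. The slides only say the checksum is 32 bits wide; using CRC-32 from Python's `zlib` module is an assumption, and the function names are made up for the example.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB checksum blocks


def block_checksums(chunk: bytes) -> list:
    """Compute one 32-bit checksum per 64 KB block (CRC-32 is an assumed choice)."""
    return [zlib.crc32(chunk[i:i + BLOCK_SIZE]) for i in range(0, len(chunk), BLOCK_SIZE)]


def read_verified(chunk: bytes, checksums: list, offset: int, length: int) -> bytes:
    """Verify every block overlapping the requested range before returning data."""
    if length <= 0:
        return b""
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    for b in range(first, last + 1):
        block = chunk[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]
        if zlib.crc32(block) != checksums[b]:
            raise IOError(f"checksum mismatch in block {b}")
    return chunk[offset:offset + length]
```

Only the blocks that overlap the requested range are checked, so a small read pays for verifying at most a couple of 64 KB blocks rather than the whole chunk.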