Google File System
The Google File System is a scalable distributed file system for large distributed data-intensive applications.
It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients.
Assumptions and Design Goals of the Google File System
- The system is built from many inexpensive commodity components that often fail.
- It constantly monitors itself and detects, tolerates, and recovers promptly from component failures on a routine basis.
- The system stores a modest number of large files.
- We expect a few million files, each typically 100 MB or larger in size.
- The workloads primarily consist of two kinds of reads: large streaming reads and small random reads.
- In large streaming reads, individual operations typically read hundreds of KBs, more commonly 1 MB or more.
- A small random read typically reads a few KBs at some arbitrary offset.
- The workloads also have many large, sequential writes that append data to files.
- The system must efficiently implement well-defined semantics for multiple clients that concurrently append to the same file.
- Atomicity with minimal synchronization overhead is essential (a sketch of this append pattern follows this list).
- High sustained bandwidth is more important than low latency.
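As a toy illustration of that append workload, the Go sketch below models many producers funneling records into one file. The gfsFile type and RecordAppend method are invented stand-ins for a GFS client handle, and the mutex stands in for the serialization that GFS performs on the server side, so the producers themselves need no coordination.

```go
package main

import (
	"fmt"
	"sync"
)

// gfsFile is a hypothetical stand-in for a GFS file handle.
// RecordAppend is assumed to append the record atomically at an offset
// chosen by the file system, in the spirit of the paper's record append;
// the mutex models server-side serialization, not client-side locking.
type gfsFile struct {
	mu   sync.Mutex
	data [][]byte
}

func (f *gfsFile) RecordAppend(record []byte) (offset int, err error) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.data = append(f.data, record)
	return len(f.data) - 1, nil // the file system picks the offset
}

func main() {
	f := &gfsFile{}
	var wg sync.WaitGroup
	for p := 0; p < 4; p++ {
		wg.Add(1)
		go func(producer int) { // producers never coordinate with each other
			defer wg.Done()
			for i := 0; i < 3; i++ {
				rec := []byte(fmt.Sprintf("producer %d record %d", producer, i))
				if _, err := f.RecordAppend(rec); err != nil {
					fmt.Println("append failed:", err)
				}
			}
		}(p)
	}
	wg.Wait()
	fmt.Println("records stored:", len(f.data)) // always 12, no torn records
}
```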
Architecture of the Google File System
A Google File System cluster consists of a single master and multiple chunk servers, and it is accessed by multiple clients.
Each of these is typically a commodity Linux machine running a user-level server process.
- Files are divided into fixed-size chunks (64 MB by default).
- Each chunk is identified by an immutable and globally unique 64-bit chunk handle assigned by the master at the time of chunk creation.
- Chunk servers store chunks on local disks as Linux files and read or write chunk data specified by a chunk handle and byte range (see the sketch after this list).
- For reliability, each chunk is replicated on multiple chunk servers.
- By default, we store three replicas, though users can designate different replication levels for different regions of the file namespace.
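To make the chunk abstraction concrete, here is a minimal Go sketch of how a chunk server might map a 64-bit chunk handle plus byte range onto an ordinary local Linux file. The directory layout, file naming, and function names are assumptions for illustration, not the actual GFS implementation.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// ChunkHandle is the immutable, globally unique 64-bit identifier
// the master assigns when a chunk is created.
type ChunkHandle uint64

// chunkPath maps a handle to a local Linux file; this layout is an
// assumption made for illustration.
func chunkPath(root string, h ChunkHandle) string {
	return filepath.Join(root, fmt.Sprintf("%016x.chunk", uint64(h)))
}

// readChunk serves a (handle, byte range) request the way a chunk
// server would: it is an ordinary read on an ordinary local file.
func readChunk(root string, h ChunkHandle, offset int64, length int) ([]byte, error) {
	f, err := os.Open(chunkPath(root, h))
	if err != nil {
		return nil, err
	}
	defer f.Close()
	buf := make([]byte, length)
	n, err := f.ReadAt(buf, offset)
	if err != nil && n == 0 {
		return nil, err
	}
	return buf[:n], nil // a short read at end of chunk is still valid data
}

func main() {
	fmt.Println(chunkPath("/var/gfs/chunks", ChunkHandle(0x1234abcd)))
	if _, err := readChunk("/var/gfs/chunks", 0x1234abcd, 0, 4096); err != nil {
		fmt.Println("read failed (expected on a machine without chunks):", err)
	}
}
```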
The master maintains all file system metadata.
- This includes the namespace, access control information, the mapping from files to chunks, and the current locations of chunks.
- It also controls system-wide activities such as chunk lease management, garbage collection of orphaned chunks, and chunk migration between chunk servers.
- The master periodically communicates with each chunk server in Heartbeat messages to give it instructions and collect its state (sketched after this list).
- Clients interact with the master for metadata operations, but all data-bearing communication goes directly to the chunk servers.
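A minimal sketch of the master's in-memory state, with Go types and field names invented for illustration: the namespace maps file names to ordered chunk handle lists, while replica locations are soft state rebuilt from chunk server heartbeats rather than stored persistently, as the paper describes.

```go
package main

import (
	"fmt"
	"time"
)

type ChunkHandle uint64

// Master metadata sketch; the names here are assumptions.
type Master struct {
	// Namespace: full path name -> the ordered chunks making up the file.
	files map[string][]ChunkHandle
	// Replica locations are soft state, refreshed by heartbeats rather
	// than persisted by the master.
	locations map[ChunkHandle][]string // handle -> chunk server addresses
	lastSeen  map[string]time.Time     // chunk server -> last heartbeat
}

// HandleHeartbeat records which chunks a server reports holding.
func (m *Master) HandleHeartbeat(server string, held []ChunkHandle) {
	m.lastSeen[server] = time.Now()
	for _, h := range held {
		m.locations[h] = appendUnique(m.locations[h], server)
	}
}

func appendUnique(servers []string, s string) []string {
	for _, existing := range servers {
		if existing == s {
			return servers
		}
	}
	return append(servers, s)
}

func main() {
	m := &Master{
		files:     map[string][]ChunkHandle{"/logs/web.0": {1, 2}},
		locations: map[ChunkHandle][]string{},
		lastSeen:  map[string]time.Time{},
	}
	m.HandleHeartbeat("cs-07:9000", []ChunkHandle{1, 2})
	fmt.Println(m.locations[1]) // [cs-07:9000]
}
```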
The interactions for a simple read are as follows (a code sketch follows the list):
- First, using the fixed chunk size, the client translates the file name and byte offset specified by the application into a chunk index within the file.
- Then, it sends the master a request containing the file name and chunk index.
- The master replies with the corresponding chunk handle and locations of the replicas.
- The client caches this information using the file name and chunk index as the key.
- The client then sends a request to one of the replicas, most likely the closest one.
- The request specifies the chunk handle and a byte range within that chunk.
- Further reads of the same chunk require no more client-master interaction until the cached information expires or the file is reopened.
- In fact, the client typically asks for multiple chunks in the same request and the master can also include the information for chunks immediately following those requested.
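The read path above boils down to a little arithmetic plus a cache lookup. The Go sketch below assumes the paper's 64 MB default chunk size and invents a lookupMaster function, with a canned reply, as a stand-in for the client-to-master metadata RPC.

```go
package main

import "fmt"

const chunkSize = 64 << 20 // 64 MB, the paper's default chunk size

type chunkInfo struct {
	handle   uint64
	replicas []string
}

type cacheKey struct {
	file  string
	index int64
}

// The cache is keyed by (file name, chunk index), as described above.
var metadataCache = map[cacheKey]chunkInfo{}

// lookupMaster stands in for the client-to-master metadata request;
// the reply here is purely illustrative.
func lookupMaster(file string, index int64) chunkInfo {
	return chunkInfo{handle: 0xabc0 + uint64(index),
		replicas: []string{"cs-01:9000", "cs-02:9000", "cs-03:9000"}}
}

// locate translates an application-level (file, byte offset) into the
// chunk handle and replica list needed to issue the actual read.
func locate(file string, offset int64) (chunkInfo, int64) {
	index := offset / chunkSize  // fixed chunk size -> chunk index
	within := offset % chunkSize // byte range is relative to the chunk
	key := cacheKey{file, index}
	info, ok := metadataCache[key]
	if !ok { // only on a cache miss do we ask the master, then cache the reply
		info = lookupMaster(file, index)
		metadataCache[key] = info
	}
	return info, within
}

func main() {
	info, within := locate("/logs/web.0", 150<<20) // read at offset 150 MB
	fmt.Printf("chunk %#x at offset %d, replicas %v\n",
		info.handle, within, info.replicas)
	// A second read of the same chunk hits the cache: no master round trip.
	locate("/logs/web.0", 151<<20)
}
```

Because the chunk size is fixed and known to every client, the offset-to-index translation happens entirely on the client, which helps keep the master off the data path.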