This article presents the Google File System (GFS), which the authors introduced to handle Google's very large data processing needs. GFS targets four objectives: high performance, scalability, reliability, and availability. These objectives are not easy to achieve, and there are many obstacles. To cope with component failures, which can undermine reliability and availability, the authors built in constant monitoring, error detection, fault tolerance, and automatic recovery. Handling large files is becoming increasingly important as data volumes continue to grow rapidly, so they revisited design parameters such as I/O operation and block sizes. They also favor append operations over overwrites, both to optimize performance and to guarantee atomicity. Flexibility and simplicity were further design considerations. GFS supports the following operations: create, delete, open, close, read, write, snapshot (creates a copy of a file or directory tree at low cost), and record append (lets multiple clients append data to the same file concurrently). The authors made six assumptions when designing GFS. First, the system must be able to detect, tolerate, and recover from component failures. Second, large files are the common case and should be managed efficiently. Third, reads are frequent, so applications are expected to batch and sort their small reads to improve performance. Fourth, workloads mostly write large files that are rarely modified once written but are often appended to, so the design favors append over update or overwrite. Fifth, since multiple clients might append to the same file at the same time, well-defined semantics for concurrent access are required. Sixth, they felt that high sustained bandwidth is more important... middle of the paper... the primary master is down. GFS ensures data integrity by using checksums to detect corrupted data. GFS also has diagnostic tools to debug and isolate problems and to analyze performance.
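To make the checksum idea concrete, here is a minimal sketch of block-level corruption detection. The GFS paper describes chunks being divided into 64 KB blocks, each with a 32-bit checksum; the use of CRC-32 from Python's `zlib` module and the function names `block_checksums` and `find_corrupted_blocks` are assumptions made for illustration, not the actual GFS implementation.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB blocks, the block size described in the GFS paper


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return one 32-bit checksum per fixed-size block of data.

    CRC-32 is an assumption; it stands in for whatever 32-bit
    checksum a real chunkserver would use.
    """
    return [
        zlib.crc32(data[offset:offset + block_size])
        for offset in range(0, len(data), block_size)
    ]


def find_corrupted_blocks(
    data: bytes, expected: list[int], block_size: int = BLOCK_SIZE
) -> list[int]:
    """Return the indices of blocks whose checksum no longer matches."""
    actual = block_checksums(data, block_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


# Hypothetical usage: checksum a 200 KB buffer, then damage one byte.
chunk = bytes(200 * 1024)           # 200 KB of zeros -> 4 blocks
stored = block_checksums(chunk)     # checksums recorded at write time

damaged = bytearray(chunk)
damaged[70 * 1024] ^= 0xFF          # byte 70 KB lies in block 1 (64-128 KB)

print(find_corrupted_blocks(bytes(damaged), stored))  # prints [1]
```

Checking per block rather than per file means a reader only has to verify the blocks it actually touches, and a mismatch pinpoints which block should be restored from another replica.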
The GFS design and implementation team evaluated GFS in three ways: micro-benchmarks, measurements of real-world clusters, and a workload breakdown. They tried to address the bottlenecks they found. During the design and implementation of GFS, the team faced both operational and technical challenges, some of which involved disk and Linux issues. GFS provides a location-independent namespace, replication, and high fault tolerance. However, GFS does not cache file data. In conclusion, GFS is better suited to day-to-day bulk data processing than to instant transactions such as online banking. The GFS team reported that GFS met Google's storage needs.