AI and ML Solutions Driving Modern Farming and Urban Innovation

HDFS: The Backbone of Big Data - A Review of High Availability, Scalability, and Performance Using Quorum Journal Manager and Dynamic Federated Metadata Management

Author(s): Bhagavathi Santos Kumar and Virendra Kumar Shrivastava *

Pp: 180-195 (16)

DOI: 10.2174/9798898812102125030016

Abstract

The Hadoop Distributed File System (HDFS) plays a crucial role in managing large-scale data across distributed clusters. Its basic structure comprises the NameNode, which manages the file system namespace and client interactions, and the DataNodes, which handle storage and data arrangement duties. HDFS faces constraints, however, particularly in scaling the NameNode, which is responsible for managing metadata; these constraints limit namespace scalability and performance. This paper therefore discusses two architectures. The first, the Quorum Journal Manager (QJM), works with an active and a standby NameNode and enables the sharing of edit logs between them. To mitigate the risk of metadata corruption from split-brain scenarios, the system ensures that only a single NameNode writes to the JournalNodes. Through the QJM, HDFS attains High Availability (HA), enabling swift failover in the event of machine crashes or scheduled maintenance. The second, Dynamic Federated Metadata Management (DFMM), addresses the limitations of the current Hadoop architecture by distributing metadata management across multiple federated components. By managing metadata dynamically at exabyte scale, HDFS can overcome the problems of tightly coupled block storage, namespace scalability, and performance bottlenecks. This work summarizes the key findings and presents a comparative analysis of both architectures. QJM ensures the high availability, reliability, and fault tolerance of the NameNode in HDFS. DFMM enhances the fault tolerance, scalability, and performance of distributed file systems by distributing metadata across multiple servers or nodes.
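The single-writer guarantee described above can be illustrated with a small sketch. This is a simplified toy model in Python, not Hadoop's implementation: the class names JournalNode and QuorumWriter, their methods, and the use of an increasing epoch number to fence a stale writer are assumptions made for illustration. It shows two ideas only: a write counts as durable once a majority of JournalNodes accept it, and a NameNode holding an older epoch can no longer write after a newer NameNode has taken over.

```python
class JournalNode:
    """Toy JournalNode: stores edits and remembers the highest epoch it has promised."""

    def __init__(self):
        self.promised_epoch = 0
        self.edits = []
        self.up = True  # set to False to simulate a crashed node

    def new_epoch(self, epoch):
        # Accept a new writer only if its epoch is strictly newer.
        if not self.up or epoch <= self.promised_epoch:
            return False
        self.promised_epoch = epoch
        return True

    def journal(self, epoch, edit):
        # Reject edits from a writer whose epoch has been superseded (fencing).
        if not self.up or epoch < self.promised_epoch:
            return False
        self.promised_epoch = epoch
        self.edits.append(edit)
        return True


class QuorumWriter:
    """Toy NameNode-side writer that needs a majority of JournalNodes to agree."""

    def __init__(self, name, journal_nodes):
        self.name = name
        self.nodes = journal_nodes
        self.epoch = None

    def _is_majority(self, acks):
        return acks > len(self.nodes) // 2

    def become_active(self, epoch):
        # Claim the writer role; succeeds only with a majority of promises.
        acks = sum(jn.new_epoch(epoch) for jn in self.nodes)
        if self._is_majority(acks):
            self.epoch = epoch
            return True
        return False

    def write(self, edit):
        # An edit is durable once a majority of JournalNodes have accepted it.
        if self.epoch is None:
            return False
        acks = sum(jn.journal(self.epoch, edit) for jn in self.nodes)
        return self._is_majority(acks)


# Three JournalNodes tolerate the loss of one.
jns = [JournalNode() for _ in range(3)]
nn1 = QuorumWriter("nn1", jns)
nn2 = QuorumWriter("nn2", jns)

nn1.become_active(1)
nn1.write("mkdir /a")
jns[2].up = False          # one JournalNode crashes; writes still reach a majority
nn1.write("mkdir /b")
nn2.become_active(2)       # failover: nn2 claims a newer epoch
stale_ok = nn1.write("mkdir /c")   # nn1 is fenced and cannot write
fresh_ok = nn2.write("mkdir /d")
```

In this sketch the fenced writer's edit is rejected by every reachable JournalNode, so the shared log holds only the edits of whichever NameNode owned the newest epoch at the time of each write. A real deployment also has to recover in-flight edits during failover, which this model leaves out.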


Keywords: DFMM, Dynamic federated metadata management, Hadoop, HDFS, HDFS federation, QJM, Quorum journal manager.