The Hadoop Distributed File System (HDFS) plays a
crucial role in managing large-scale data across distributed clusters. The basic structure
of HDFS comprises the NameNode, which manages the file system namespace and
client interactions, and the DataNodes, which handle data storage and block
management. HDFS faces constraints, however, particularly in scaling the
NameNode, which is responsible for managing metadata; these constraints limit
namespace scalability and performance. This paper therefore discusses two
architectures. The first, based on the Quorum Journal Manager (QJM), enables the
active and standby NameNodes to share edit logs. To mitigate
the risk of metadata corruption from split-brain scenarios, the system ensures that only
a single NameNode writes to the JournalNodes at a time. Using QJM,
HDFS attains High Availability (HA), enabling swift failover in the event of machine
crashes or scheduled maintenance. Dynamic Federated Metadata Management
(DFMM), the second architecture, addresses the limitations of the current Hadoop
architecture by distributing metadata management across multiple federated
components. By dynamically managing metadata at exabyte scale, HDFS can overcome
the problems of tightly coupled block storage, limited namespace scalability, and
performance bottlenecks. This work
summarizes the key findings and presents a comparative analysis of both architectures.
QJM ensures the high availability, reliability, and fault tolerance of the
NameNode in Hadoop’s HDFS, while DFMM enhances the fault tolerance,
scalability, and performance of distributed file systems by distributing metadata across
multiple servers or nodes.
Keywords: DFMM, Dynamic federated metadata management, Hadoop, HDFS, HDFS federation, QJM, Quorum journal manager.
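
The single-writer guarantee mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes epoch-based fencing over a majority quorum of JournalNodes, which is how QJM is generally described; the names JournalNode, become_writer, and write_edit are hypothetical and do not correspond to the Hadoop API. The sketch is a toy model, not the actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class JournalNode:
    """Toy stand-in for a JournalNode (hypothetical, not the Hadoop class)."""
    promised_epoch: int = 0
    edits: list = field(default_factory=list)

    def new_epoch(self, epoch: int) -> bool:
        # Promise to accept only writers holding this epoch or a higher one.
        if epoch > self.promised_epoch:
            self.promised_epoch = epoch
            return True
        return False

    def journal(self, epoch: int, edit: str) -> bool:
        # Reject edits from a writer whose epoch is stale (a fenced NameNode).
        if epoch < self.promised_epoch:
            return False
        self.edits.append(edit)
        return True


def has_quorum(acks: int, total: int) -> bool:
    # A strict majority of JournalNodes must acknowledge.
    return acks > total // 2


def become_writer(nodes, epoch: int) -> bool:
    # A NameNode becomes the writer once a majority promises its epoch.
    acks = sum(node.new_epoch(epoch) for node in nodes)
    return has_quorum(acks, len(nodes))


def write_edit(nodes, epoch: int, edit: str) -> bool:
    # An edit is durable once a majority of JournalNodes accepts it.
    acks = sum(node.journal(epoch, edit) for node in nodes)
    return has_quorum(acks, len(nodes))


if __name__ == "__main__":
    nodes = [JournalNode() for _ in range(3)]
    print(become_writer(nodes, 1))           # active NameNode claims epoch 1
    print(write_edit(nodes, 1, "mkdir /a"))  # accepted by the quorum
    print(become_writer(nodes, 2))           # standby takes over with epoch 2
    print(write_edit(nodes, 1, "mkdir /b"))  # old active is rejected
```

In this model, a former active NameNode that still believes it is the writer cannot corrupt the shared edit log, because a majority of JournalNodes has already promised a higher epoch to its successor.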