Hadoop Distributed File System as a System for Handling Big Data


Hadoop Distributed File System (HDFS) is a file system designed to store very large files with streaming data access, running on clusters of commodity hardware. A file in HDFS is split into blocks when its size exceeds the block size; depending on the version and configuration, the default block size is typically 64, 128, or 256 MB, and each block is replicated three times by default. This architecture lets HDFS store and process substantial amounts of data – structured, semi-structured, and unstructured.
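The splitting and replication described above can be sketched as a small back-of-the-envelope calculation (illustrative only, not HDFS code; the function name and defaults are assumptions):

```python
import math

def plan_blocks(file_size_mb, block_size_mb=128, replication=3):
    """Illustrative sketch: how many blocks a file occupies in HDFS-style
    storage, and the raw capacity consumed once every block is replicated."""
    block_count = math.ceil(file_size_mb / block_size_mb)  # last block may be partial
    raw_storage_mb = file_size_mb * replication            # triple replication by default
    return block_count, raw_storage_mb

# A 500 MB file with 128 MB blocks occupies 4 blocks and 1500 MB of raw storage.
print(plan_blocks(500))  # (4, 1500)
```

Note that the final block only holds the remainder of the file; HDFS does not pad it out to the full block size.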

An HDFS cluster consists of a management node (NameNode) and data storage nodes (DataNodes). The NameNode is a dedicated server whose software manages the file system namespace: it stores the directory tree of files as well as the metadata of files and directories (Kaur, Bagga, & Mann, 2017).
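A toy model can make the NameNode's bookkeeping concrete. This is an assumption for illustration, not Hadoop's implementation: it only captures the idea that the NameNode maps file paths to blocks and blocks to the DataNodes holding their replicas.

```python
class ToyNameNode:
    """Illustrative sketch of NameNode metadata: namespace plus block map."""

    def __init__(self):
        self.namespace = {}   # file path -> ordered list of block IDs
        self.block_map = {}   # block ID -> list of DataNode addresses

    def create_file(self, path, block_ids):
        self.namespace[path] = list(block_ids)

    def register_replica(self, block_id, datanode):
        # DataNodes report which blocks they hold; the NameNode records it.
        self.block_map.setdefault(block_id, []).append(datanode)

    def get_block_locations(self, path):
        # What a client asks for before reading: each block's replica locations.
        return [(b, self.block_map.get(b, [])) for b in self.namespace[path]]

nn = ToyNameNode()
nn.create_file("/logs/app.log", ["blk_1", "blk_2"])
nn.register_replica("blk_1", "datanode-a")
nn.register_replica("blk_1", "datanode-b")
print(nn.get_block_locations("/logs/app.log"))
# [('blk_1', ['datanode-a', 'datanode-b']), ('blk_2', [])]
```

The key design point the sketch reflects is that the NameNode holds only metadata; the file contents themselves never pass through it.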

A DataNode is one of the many cluster servers whose software is responsible for file operations and for working with data blocks. DataNodes are required components of an HDFS cluster: they write and read data and execute commands from the NameNode to create, delete, and replicate blocks (Kadam, Deshmukh, & Dhainje, 2015). Each DataNode also periodically sends status messages (heartbeats) and serves read and write requests from clients of the HDFS file system.
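The heartbeat mechanism mentioned above follows a common pattern, sketched here under assumed names and a made-up timeout; real HDFS timeouts and bookkeeping differ:

```python
class HeartbeatMonitor:
    """Illustrative sketch: track the last heartbeat per DataNode and treat
    a node as live only if it reported within the timeout window."""

    def __init__(self, timeout_s=10.0):
        self.timeout_s = timeout_s
        self.last_seen = {}  # DataNode ID -> timestamp of last heartbeat

    def heartbeat(self, datanode_id, now):
        self.last_seen[datanode_id] = now

    def live_nodes(self, now):
        return [d for d, t in self.last_seen.items() if now - t <= self.timeout_s]

m = HeartbeatMonitor(timeout_s=10.0)
m.heartbeat("dn1", now=0.0)
m.heartbeat("dn2", now=0.0)
m.heartbeat("dn1", now=8.0)
print(m.live_nodes(now=12.0))  # ['dn1'] -- dn2 missed its window
```

When a node misses its window, the NameNode can schedule re-replication of that node's blocks elsewhere, which is what makes triple replication self-healing.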

When reading a file from HDFS, the client obtains the locations of the file's blocks from the NameNode and then reads the blocks sequentially from the data nodes itself, selecting the nearest node for each block. Because the client pulls data directly from the DataNodes, HDFS scales to many concurrent clients: traffic is spread across the data nodes rather than funneled through a single management node.
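The replica-selection step of that read path can be sketched as follows. The distance function is a stand-in assumption (real HDFS computes distance from the cluster's network topology, e.g. same node, same rack, different rack):

```python
def choose_replicas(block_locations, distance):
    """Illustrative sketch: for each block, pick the replica with the
    lowest distance cost from the client.

    block_locations: list of (block_id, [datanode, ...]) as returned
    by the NameNode; distance: fn(datanode) -> numeric cost."""
    return [(block_id, min(nodes, key=distance))
            for block_id, nodes in block_locations]

locations = [("blk_1", ["dnA", "dnB"]), ("blk_2", ["dnB", "dnC"])]
cost = {"dnA": 2, "dnB": 0, "dnC": 1}  # assumed costs: dnB is on the client's rack
print(choose_replicas(locations, cost.get))
# [('blk_1', 'dnB'), ('blk_2', 'dnB')]
```

Reading each block from its nearest replica keeps cross-rack traffic low, which matters because inter-rack bandwidth is usually the scarcest resource in the cluster.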

References

Kadam, A. M., Deshmukh, P. K., & Dhainje, P. B. (2015). A review on distributed file system in Hadoop. International Journal of Engineering Research & Technology (IJERT), 4(5), 14–18.

Kaur, G., Bagga, S., & Mann, K. S. (2017). Hadoop approach to cluster based cache oblivious Peano Curves. In 2017 IEEE 7th International Advance Computing Conference (IACC) (pp. 115–120). Hyderabad, India: Institute of Electrical and Electronics Engineers.
