How does the CAP Theorem apply on HDFS?

Ask Time:2019-11-11T13:55:48         Author:Pallav Doshi

I just started reading about Hadoop and came across the CAP Theorem. Can you please shed some light on which two of the three CAP properties apply to an HDFS system?

Author:Pallav Doshi, reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/58796173/how-does-the-cap-theorem-apply-on-hdfs
Shariq Ehsan :

Argument for Consistency

The documentation states clearly:

"The consistency model of a Hadoop FileSystem is one-copy-update-semantics; that of a traditional local POSIX filesystem."

(One-copy-update semantics means that all processes accessing or updating a given file see its contents as if only a single copy of the file existed.)

The documentation goes on to say:

"Create. Once the close() operation on an output stream writing a newly created file has completed, in-cluster operations querying the file metadata and contents MUST immediately see the file and its data."

"Update. Once the close() operation on an output stream writing a newly created file has completed, in-cluster operations querying the file metadata and contents MUST immediately see the new data."

"Delete. once a delete() operation on a path other than “/” has completed successfully, it MUST NOT be visible or accessible. Specifically, listStatus(), open(), rename() and append() operations MUST fail."

These guarantees point to the presence of "Consistency" in HDFS.

Source: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/filesystem/introduction.html

Argument for Partition Tolerance

HDFS provides High Availability for both Name Nodes and Data Nodes.

Source: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html

Argument for Lack of Availability

The documentation states clearly (under the section "Operations and failures"):

"The time to complete an operation is undefined and may depend on the implementation and on the state of the system."

This indicates that "Availability", in the CAP sense, is missing in HDFS.

Source: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/filesystem/introduction.html

Given the above arguments, I believe HDFS supports "Consistency" and "Partition Tolerance" but not "Availability" in the context of the CAP theorem.
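To make the create/close and delete guarantees quoted in this answer a bit more concrete, here is a minimal toy model in Python. It is only a sketch: ToyFileSystem, its methods, and the path /data/report.txt are all made up for illustration and are not the Hadoop API or how HDFS is implemented. The only thing it tries to show is the "single copy" idea, where data becomes visible to every reader once close() returns and a deleted path can no longer be opened.

```python
class ToyFileSystem:
    """Hypothetical in-memory stand-in illustrating one-copy-update semantics.

    This is NOT the Hadoop FileSystem API; all names here are assumptions.
    """

    def __init__(self):
        # path -> bytes; every reader consults this one authoritative copy
        self._files = {}

    def create(self, path):
        return _ToyOutputStream(self, path)

    def open(self, path):
        if path not in self._files:
            raise FileNotFoundError(path)
        return self._files[path]

    def list_status(self, path):
        if path not in self._files:
            raise FileNotFoundError(path)
        return {"path": path, "length": len(self._files[path])}

    def delete(self, path):
        if path == "/":
            # The quoted rule only covers paths other than "/".
            raise ValueError("root is outside the quoted delete rule")
        # Returns True only if something was actually removed.
        return self._files.pop(path, None) is not None


class _ToyOutputStream:
    def __init__(self, fs, path):
        self._fs = fs
        self._path = path
        self._buffer = bytearray()

    def write(self, data):
        self._buffer.extend(data)

    def close(self):
        # What readers see before close() is outside the quoted guarantees;
        # this toy simply publishes nothing until close() completes.
        self._fs._files[self._path] = bytes(self._buffer)


def demo():
    fs = ToyFileSystem()
    out = fs.create("/data/report.txt")
    out.write(b"hello")
    out.close()

    # After close(): both contents and metadata reflect the written data.
    contents = fs.open("/data/report.txt")
    length = fs.list_status("/data/report.txt")["length"]

    # After delete(): the path must no longer be accessible.
    deleted = fs.delete("/data/report.txt")
    try:
        fs.open("/data/report.txt")
        visible_after_delete = True
    except FileNotFoundError:
        visible_after_delete = False

    return contents, length, deleted, visible_after_delete
```

Calling demo() walks through create, close, read, delete, and a failed read in that order. A real cluster has to provide the same observable behaviour across many machines, which is where the trade-off against availability comes in.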
2020-12-15T17:16:36