Hadoop and HBase integration

Asked: 2016-03-28T00:02:10    Author: chvs2000

I am new to big data technologies, and I have a question about how HBase is integrated with Hadoop. What is meant by "HBase sits on top of HDFS"? My understanding is that HDFS is a collection of structured and unstructured data distributed across multiple nodes, and HBase is structured data.

How is HBase integrated with Hadoop to provide real-time access to the underlying data? Do we have to write special jobs to build indexes and such? In other words, is there an additional layer between HBase and HDFS that holds data in a structure HBase understands?

Author: chvs2000, reproduced under the CC BY-SA 4.0 license with a link to the original source and this disclaimer.
Link to original article: https://stackoverflow.com/questions/36249463/hadoop-and-hbase-integration
Harsh J:

HDFS is a distributed filesystem; you can do most regular filesystem operations on it, such as listing the files in a directory, writing a regular file, or reading part of a file. It's not simply "a collection of structured or unstructured data" any more than your EXT4 or NTFS filesystems are.

HBase is an in-memory key-value store which may persist to HDFS (this isn't a hard requirement; you can run HBase on any distributed filesystem). For any key read request, HBase first checks its runtime memory caches to see whether it has the value cached, and otherwise visits its stored files on HDFS to seek to and read out the specific value. HBase offers various configurations to control how the cache is utilised, but HBase's speed comes from a combination of caching and indexed persistence (faster, seek-based file reads).

HBase's file-based persistence on HDFS does the key indexing automatically when it writes, so its users never need to index manually. These files are regular HDFS files, but in a format specialised for HBase's usage, known as HFiles.

These articles are slightly dated, but still reflect the architecture HBase uses very well: http://blog.cloudera.com/blog/2012/06/hbase-write-path/ and http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/. They should help if you want to dig deeper.
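To make the read and write paths described above more concrete, here is a minimal sketch in Python. It is a toy, single-process model under loudly stated assumptions: `ToyStore` and `ToyHFile` are hypothetical names, not HBase classes or APIs, and the block size and flush threshold are arbitrary. Writes go to an in-memory buffer (standing in for HBase's MemStore); when the buffer fills, it is flushed as one sorted, immutable file whose block index is built as part of the write, which is why users never index anything by hand. Reads check memory first, then use each file's index to seek to a single block instead of scanning the whole file. Real HBase has much more machinery (a separate block cache, versioned cells, write-ahead logging, compactions) that this sketch does not model.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class ToyHFile:
    """Hypothetical, simplified stand-in for an HFile (not HBase's real format).

    Entries are stored sorted and split into fixed-size blocks. A sparse
    index holding the first key of each block is built while the file is
    written, so no separate indexing job is needed.
    """

    def __init__(self, entries: List[Tuple[str, str]], block_size: int = 4):
        # Entries must arrive sorted by key, as they do when a buffer is flushed.
        self.blocks: List[List[Tuple[str, str]]] = [
            entries[i:i + block_size] for i in range(0, len(entries), block_size)
        ]
        # Sparse index: first key of every block, created as a side effect of writing.
        self.index: List[str] = [block[0][0] for block in self.blocks]
        self.blocks_read = 0  # counts simulated block reads ("seeks")

    def get(self, key: str) -> Optional[str]:
        # Binary-search the index for the only block that could contain the key.
        pos = bisect.bisect_right(self.index, key) - 1
        if pos < 0:
            return None  # key sorts before the first key in this file
        self.blocks_read += 1  # read exactly one block, never the whole file
        for k, v in self.blocks[pos]:
            if k == key:
                return v
        return None


class ToyStore:
    """Hypothetical model of a store: an in-memory buffer plus flushed files."""

    def __init__(self, flush_threshold: int = 8):
        self.memstore: Dict[str, str] = {}  # stands in for HBase's MemStore
        self.files: List[ToyHFile] = []  # oldest first, newest last
        self.flush_threshold = flush_threshold

    def put(self, key: str, value: str) -> None:
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Persist buffered writes as one sorted, indexed, immutable file.
        if self.memstore:
            self.files.append(ToyHFile(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, key: str) -> Optional[str]:
        # 1. Serve from memory when possible.
        if key in self.memstore:
            return self.memstore[key]
        # 2. Otherwise consult files from newest to oldest, each via its index.
        for hfile in reversed(self.files):
            value = hfile.get(key)
            if value is not None:
                return value
        return None


if __name__ == "__main__":
    store = ToyStore(flush_threshold=8)
    for i in range(20):
        store.put(f"row{i:02d}", f"value{i}")
    print(store.get("row05"))  # value5 (found in the oldest flushed file)
    print(store.get("row19"))  # value19 (still in the in-memory buffer)
```

With a flush threshold of 8, twenty puts produce two flushed files and leave four entries in memory, so `row05` is served from the oldest file after reading a single block, while `row19` comes straight from memory.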
2016-03-27T17:11:06
Boo Radley:

HDFS is a distributed file system, and HBase is a NoSQL database that typically uses HDFS to store its data.

You should read up on these technologies, since your structured/unstructured comparison is not correct.

Update

You should check out the Google File System, MapReduce, and Bigtable papers if you are interested in the origins of these technologies.

Ghemawat, Sanjay, Howard Gobioff, and Shun-Tak Leung. "The Google file system." ACM SIGOPS Operating Systems Review 37.5 (2003).
Dean, Jeffrey, and Sanjay Ghemawat. "MapReduce: simplified data processing on large clusters." Communications of the ACM 51.1 (2008): 107-113.
Chang, Fay, et al. "Bigtable: A distributed storage system for structured data." ACM Transactions on Computer Systems (TOCS) 26.2 (2008): 4.
2016-03-27T16:29:51