
Hadoop & Hive as warehouse: daily data deliveries

Ask Time:2013-04-20T15:40:46         Author:caliph


I am evaluating the combination of Hadoop & Hive (& Impala) as a replacement for a large data warehouse. I already set up a version, and read performance is great.

Can somebody give me a hint on what concept should be used for daily data deliveries to a table? I have a table in Hive based on a file I put into HDFS. But now new transactional data is coming in on a daily basis. How do I add it to the table in Hive? Inserts are not possible. HDFS cannot append. So what's the general concept I need to follow?

Any advice or direction to documentation is appreciated.

Best regards!

Author:caliph, reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/16117968/hadoop-hive-as-warehouse-daily-data-deliveries
Jeremiah Peschka :

Hive allows data to be appended to a table - the underlying implementation of how this happens in HDFS doesn't matter. There are a number of things you can do to append data:

1. INSERT - You can just append rows to an existing table.
2. INSERT OVERWRITE - If you have to process data, you can perform an INSERT OVERWRITE to re-write a table or partition.
3. LOAD DATA - You can use this to bulk insert data into a table and, optionally, use the OVERWRITE keyword to wipe out any existing data.
4. Partition your data.
5. Load data into a new table and swap the partition in.

Partitioning is great if you know you're going to be performing date-based searches, and it gives you the ability to use options 1, 2, & 3 at either the table or partition level.
2013-04-22T16:24:08
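
The options above can be sketched in HiveQL. This is only an illustration under assumed names - the table `transactions`, the staging table `staging_transactions`, and all paths and dates are hypothetical, and the exact syntax may vary by Hive version:

```sql
-- A table partitioned by load date: each daily delivery lands in its own
-- HDFS directory, so "appending" becomes adding a new partition.
CREATE TABLE transactions (
  txn_id   BIGINT,
  amount   DOUBLE,
  txn_time STRING
)
PARTITIONED BY (load_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- Option 3: bulk-load the day's delivery file into its own partition.
LOAD DATA INPATH '/staging/transactions_2013-04-22.tsv'
INTO TABLE transactions
PARTITION (load_date = '2013-04-22');

-- Option 1: append rows via INSERT ... SELECT (Hive writes new files
-- under the hood, so no HDFS append is needed).
INSERT INTO TABLE transactions PARTITION (load_date = '2013-04-22')
SELECT txn_id, amount, txn_time FROM staging_transactions;

-- Option 2: re-process and replace just that day's partition.
INSERT OVERWRITE TABLE transactions PARTITION (load_date = '2013-04-22')
SELECT txn_id, amount, txn_time FROM staging_transactions;

-- Option 5: prepare the data in a separate HDFS directory, then swap the
-- partition in by pointing the table at that location.
ALTER TABLE transactions ADD PARTITION (load_date = '2013-04-23')
LOCATION '/warehouse/prepared/2013-04-23';
```

With this layout, a daily delivery is just one LOAD DATA or ADD PARTITION statement per day, and queries that filter on `load_date` only scan the relevant partitions.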